CouchDB as MapReduce framework?

Hendrijk Templehov Sat, 13 Sep 2008 11:55:00 -0700

Hi there,

I am currently trying to dive into Map/Reduce, clustering (k-means,
canopy, etc.), large data sets and so on...


To be honest, I am a bit confused about all the stuff out there: I had
a small look at Apache Hadoop, the pythonic Disco-Framework and
others. Today, I found CouchDB. And, what shall I say, I really like
the lightweight feeling of CouchDB.

I experimented a bit with the CouchDB map/reduce views, but I am
wondering if I can take map/reduce further in this context. More
precisely, is it possible to run more than one map/reduce job over a
complete dataset, with a later job working on the result of an earlier
one?

The simplest example I can imagine is borrowed from the usual
map/reduce example: word_count.

Imagine I have a database where some (not all!) documents have a
field called "fulltext", and I want to count all words in that field.
The common map/reduce approach would consist of two jobs: first, get
all documents with that field; second, count the words in those
documents. I know that with CouchDB I could run that in one job, but
for more complex examples it would be nice to map/reduce the query
result set further.
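
To make the two-job idea concrete, here is a minimal sketch in plain
Python. It runs in memory and does not use CouchDB's view API; the
helper map_reduce, the sample documents and their "_id" values are all
made up for illustration. The point is only that the second job takes
the output of the first job as its input.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run one in-memory map/reduce pass (hypothetical helper).

    map_fn yields (key, value) pairs for each record; values are grouped
    by key and each group is passed to reduce_fn.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Made-up documents; only some of them carry a "fulltext" field.
docs = [
    {"_id": "a", "fulltext": "couchdb is relaxing"},
    {"_id": "b", "title": "no fulltext here"},
    {"_id": "c", "fulltext": "couchdb is lightweight"},
]


# Job 1: keep only the documents that have a "fulltext" field.
def select_fulltext(doc):
    if "fulltext" in doc:
        yield doc["_id"], doc["fulltext"]


selected = map_reduce(docs, select_fulltext, lambda key, values: values[0])


# Job 2: count words, using the result of job 1 as input.
def emit_words(item):
    _doc_id, text = item
    for word in text.split():
        yield word, 1


word_counts = map_reduce(selected.items(), emit_words,
                         lambda key, values: sum(values))
print(word_counts)
```

In this toy case both steps collapse easily into one map function, which
is the single-job variant mentioned above; the chained form only pays
off once the second job really depends on the first job's result.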

Another example: I have a document set where all documents have the
fields "link_to", "permalink" and "date_published". Now I want to
know which articles got a backlink last Sunday. So, first I would
create a view giving me all documents with "date_published" = last
Sunday. In a second step I would emit all documents whose permalink
matches a "link_to" value in that query result.

That sounds a bit like a relational database issue, and I know CouchDB
isn't designed to replace an RDBMS, but a query like that should be
possible. I know there are workarounds for these examples that let
you handle them with a single map/reduce view, but if you look at
more complex map/reduce algorithms (see also: Apache Mahout), it
would be great if one could combine the accessibility of CouchDB
with a full-featured map/reduce framework.
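
The backlink query can be sketched the same way, again in plain Python
rather than as CouchDB views. The sample documents, the function names
and the assumption that "link_to" holds a single permalink (or nothing)
are mine and only serve as illustration.

```python
from datetime import date

# Made-up documents; "link_to" is assumed to be one permalink or None.
docs = [
    {"permalink": "/posts/1", "link_to": None,
     "date_published": date(2008, 9, 1)},
    {"permalink": "/posts/2", "link_to": "/posts/1",
     "date_published": date(2008, 9, 7)},
    {"permalink": "/posts/3", "link_to": "/posts/1",
     "date_published": date(2008, 9, 8)},
    {"permalink": "/posts/4", "link_to": "/posts/3",
     "date_published": date(2008, 9, 7)},
]

LAST_SUNDAY = date(2008, 9, 7)


def published_on(documents, day):
    """Step 1: the 'view' of all documents published on the given day."""
    return [doc for doc in documents if doc["date_published"] == day]


def backlinked_articles(documents, linking_docs):
    """Step 2: permalinks that appear as a link_to target in step 1's result."""
    targets = {doc["link_to"] for doc in linking_docs if doc["link_to"]}
    return [doc["permalink"] for doc in documents
            if doc["permalink"] in targets]


sunday_docs = published_on(docs, LAST_SUNDAY)
backlinked = backlinked_articles(docs, sunday_docs)
print(backlinked)
```

The second step needs the complete result of the first step before it
can start, which is exactly the kind of dependency between jobs I am
asking about.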

Is this possible with CouchDB?


Thank you in advance for your comments.


Kind regards,
Hendrijk
