Is this architecture possible (Hadoop, HBase)?

Raghava Mutharaju Mon, 15 Feb 2010 00:06:43 -0800

Hello all,

      I am relatively new to MapReduce and haven't used HBase at all. Is the
following architecture possible?


A distributed key-value store (HBase) is used, so each value would have a
timestamp associated with it. Map and Reduce tasks are executed iteratively.
In each iteration, Map should take as input the values that were added to the
store in the previous iteration (perhaps the ones with the latest timestamp?).
Reduce should take as input Map's output as well as the <key,value> pairs from
the store whose key(s) match the key(s) that Reduce has to process in the
current iteration. The output of Reduce goes back to the store.
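To make the intended data flow concrete, here is a small in-memory simulation
in plain Java. It is a sketch under stated assumptions, not HBase or Hadoop
code: the TimestampedStore class, the use of the iteration number as the
timestamp, and the toy job (breadth-first distances from a source node) are
all hypothetical choices for illustration. Each iteration's Map reads only the
cells written by the previous iteration, each Reduce sees both Map's output and
what the store already holds for its key, and the loop stops once an
iteration writes nothing new.

```java
import java.util.*;

/**
 * In-memory sketch of the iterative scheme (NOT HBase/Hadoop code).
 * TimestampedStore is a hypothetical stand-in for a versioned table;
 * the "timestamp" is simply the iteration number.
 * Toy job (assumption): BFS distances, where each iteration's new cells
 * form the frontier that the next Map phase reads.
 */
public class IterativeStoreSketch {

    /** A value tagged with the iteration (timestamp) that wrote it. */
    record Cell(int timestamp, int value) {}

    /** Hypothetical versioned store: key -> all cells written for that key. */
    static class TimestampedStore {
        final Map<Integer, List<Cell>> cells = new TreeMap<>();

        void put(int key, int ts, int value) {
            cells.computeIfAbsent(key, k -> new ArrayList<>()).add(new Cell(ts, value));
        }

        /** Map input: only the cells written at exactly timestamp ts. */
        Map<Integer, Integer> writtenAt(int ts) {
            Map<Integer, Integer> out = new TreeMap<>();
            cells.forEach((k, list) -> list.stream()
                    .filter(c -> c.timestamp() == ts)
                    .forEach(c -> out.put(k, c.value())));
            return out;
        }

        /** Reduce-side input: all stored values for one key. */
        List<Integer> valuesFor(int key) {
            List<Integer> vals = new ArrayList<>();
            for (Cell c : cells.getOrDefault(key, List.of())) vals.add(c.value());
            return vals;
        }
    }

    static Map<Integer, Integer> run(Map<Integer, List<Integer>> graph, int source) {
        TimestampedStore store = new TimestampedStore();
        store.put(source, 0, 0);      // iteration 0 seeds the store
        int iteration = 0;
        while (true) {
            // Map phase: read only what the previous iteration added.
            Map<Integer, List<Integer>> shuffled = new TreeMap<>();
            store.writtenAt(iteration).forEach((node, dist) -> {
                for (int next : graph.getOrDefault(node, List.of())) {
                    shuffled.computeIfAbsent(next, k -> new ArrayList<>()).add(dist + 1);
                }
            });
            iteration++;
            // Reduce phase: combine Map output with existing store values per key.
            boolean wroteSomething = false;
            for (Map.Entry<Integer, List<Integer>> e : shuffled.entrySet()) {
                if (store.valuesFor(e.getKey()).isEmpty()) {   // key not reached before
                    store.put(e.getKey(), iteration, Collections.min(e.getValue()));
                    wroteSomething = true;
                }
            }
            if (!wroteSomething) break;   // fixpoint: nothing new for the next Map
        }
        // Each key was written exactly once, so its first cell holds the result.
        Map<Integer, Integer> result = new TreeMap<>();
        store.cells.forEach((k, list) -> result.put(k, list.get(0).value()));
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = Map.of(
                1, List.of(2, 3),
                2, List.of(4),
                3, List.of(4),
                4, List.of(1));
        System.out.println(run(graph, 1));   // prints {1=0, 2=1, 3=1, 4=2}
    }
}
```

In real HBase, the writtenAt step would presumably correspond to a Scan
restricted to a time range (Scan.setTimeRange), and the reduce-side lookup to a
per-key Get against the same table; how well that performs at scale is a
separate question.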

If this is possible, which classes (e.g., InputFormat, or the run() method of
Reducer) should be extended so that the above operation takes place instead of
the regular one? If this is not possible, are there any alternative ways to
achieve the same result?

Thank you.

PS: I have posted the same question on the mapreduce-user Apache mailing list
(but haven't received any replies yet). I found many MapReduce topics on this
mailing list as well, so I thought I would post it here too.

Regards,
Raghava.
