Hello All,

This is my first message to the list, so please feel free to refer me to
other posts, blogs, etc. to get me up to speed.  I understand that HBase and
MapReduce work alongside each other, that is, that each can feed the other
data.  I have two sets of use cases for my application: one that requires
batch-style calculations in parallel, for which MapReduce is perfect, and
one that requires interactive calculations, which I'm not sure how to
accomplish in HBase.  By interactive calculation, I mean that a user makes a
request to HBase that requires some transformation of the data in HDFS (say
an aggregation or an allocation) and wants the results returned
immediately.  Here are my questions:

1.  What is the mechanism by which you can build your own calculations that
return results quickly in HBase?  Is it just Java classes or some other
technique?
2.  For these types of calculations, does HBase handle acquiring the data if
it's distributed across multiple boxes like MapReduce does, or do I have to
write my own algorithms that seek out the data on the right nodes?
3.  Is it possible to break up the work across multiple nodes and then bring
it together like MapReduce does, but without the performance penalty of
using the MapReduce framework?  In other words, if HBase knows that files
A-D are on node 1 and E-G are on node 2, can I write a function that says
"sum up X on node 1 locally and Y on node 2 locally" and bring it back to me
combined?
4.  Are there ways to guarantee that the computation will happen in-memory
on the local column store, or is this the only place that such calculations
happen?
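
To make question 3 concrete, here is a rough sketch in plain Java of the
pattern I have in mind.  It is only an illustration and uses no HBase
classes: the "nodes" are simulated with threads, and the names
ScatterGatherSum, sumAcrossNodes and valuesByNode are hypothetical ones I
made up for this example.  Each node sums its own values locally, and only
the partial sums are brought back and combined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java illustration only: no HBase classes are used here.
// Each map entry stands in for the data held by one (hypothetical) node.
class ScatterGatherSum {

    // Sums each node's values in its own task ("locally"), then adds the
    // partial sums together on the caller's side.
    static long sumAcrossNodes(Map<String, List<Long>> valuesByNode)
            throws InterruptedException, ExecutionException {
        if (valuesByNode.isEmpty()) {
            return 0L;
        }
        ExecutorService pool = Executors.newFixedThreadPool(valuesByNode.size());
        try {
            // Scatter: one summing task per node.
            List<Future<Long>> partials = new ArrayList<>();
            for (List<Long> localValues : valuesByNode.values()) {
                Callable<Long> localSumTask = () -> {
                    long localSum = 0L;
                    for (long v : localValues) {
                        localSum += v;
                    }
                    return localSum;
                };
                partials.add(pool.submit(localSumTask));
            }
            // Gather: combine the per-node partial sums.
            long total = 0L;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<Long>> valuesByNode = Map.of(
                "node1", List.of(1L, 2L, 3L),
                "node2", List.of(4L, 5L, 6L));
        System.out.println(sumAcrossNodes(valuesByNode)); // prints 21
    }
}
```

What I am really asking is whether HBase offers a supported way to run the
"local sum" step next to the data, so that only the small partial results
cross the network.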

Apologies for what must be very basic questions.  Any pointers would be
really appreciated.  Thank you.

Best Regards,

Nenshad

-- 
Nenshad D. Bardoliwalla
Twitter: http://twitter.com/nenshad
Book: http://www.driventoperform.net
Blog: http://bardoli.blogspot.com
