Hello All,

This is my first message to the list, so please feel free to refer me to other posts, blogs, etc. to get me up to speed.

I understand that HBase and MapReduce work side by side, that is, that they can feed each other data. I have two sets of use cases for my application: one that requires batch-style calculations in parallel, which MapReduce is perfect for, and one that requires interactive calculations, which I'm not sure how to accomplish in HBase. By interactive calculation, I mean that a user makes a request to HBase that requires some transformation of the data in HDFS (say, an aggregation or an allocation) and wants the results returned immediately.

Here are my questions:

1. What is the mechanism by which you can build your own calculations that return results quickly in HBase? Is it just Java classes, or some other technique?

2. For these types of calculations, does HBase handle acquiring the data if it's distributed across multiple boxes, as MapReduce does, or do I have to write my own algorithms that seek out the data on the right nodes?

3. Is it possible to break up the work across multiple nodes and then bring it together like a MapReduce job, but without the performance penalty of using the MapReduce framework? In other words, if HBase knows that files A-D are on node 1 and files E-G are on node 2, can I write a function that says "sum up X on node 1 locally and Y on node 2 locally" and brings the combined result back to me?

4. Are there ways to guarantee that the computation will happen in-memory on the local column store, or is this the only place that such calculations happen?

Apologies for what must be very basic questions. Any pointers are really appreciated.

Thank you.

Best Regards,
Nenshad

--
Nenshad D. Bardoliwalla
Twitter: http://twitter.com/nenshad
Book: http://www.driventoperform.net
Blog: http://bardoli.blogspot.com
