HDFS is a double-edged sword. Because it is a raw file system, you can feed its files straight to a MapReduce program, although you may need to define your own InputSplits to chop the input into pieces of a suitable size.
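
To make the "chop down the input size" part concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of size-based splitting. The class and method names are made up for this illustration. The real FileInputFormat has more rules (configurable minimum and maximum split sizes, some slack on the last split, non-splittable compressed files), so treat it as a toy model rather than the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: not the Hadoop API. All names here are hypothetical.
class FileSplitSketch {

    // A byte range of the file that would be handed to a single map task.
    static final class Range {
        final long offset;
        final long length;
        Range(long offset, long length) { this.offset = offset; this.length = length; }
    }

    // Cut a file of fileSize bytes into consecutive ranges of at most splitSize bytes.
    static List<Range> splits(long fileSize, long splitSize) {
        if (splitSize <= 0) throw new IllegalArgumentException("splitSize must be positive");
        List<Range> out = new ArrayList<>();
        for (long off = 0; off < fileSize; off += splitSize) {
            out.add(new Range(off, Math.min(splitSize, fileSize - off)));
        }
        return out;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 200 MB file with 64 MB splits: three full ranges plus an 8 MB tail.
        System.out.println(splits(200 * mb, 64 * mb).size() + " map tasks");
    }
}
```

In this model the number of map tasks follows from the file size and the split size alone; nothing about the records inside the file is known to the framework.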

OTOH, HBase holds structured data (well, sort of!), using a file format on top of HDFS to store the schema, and hence comes with predefined InputSplits that make it easy to get started on a MapReduce program. From an API-simplicity point of view, HBase can get you started relatively faster because of this (assuming your data is already in HBase).
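
As a rough picture of what "predefined InputSplits" means here: my understanding is that the table input format normally produces one split per region, so a table with 100 regions would typically get 100 map tasks. The sketch below is plain Java with hypothetical names, not the HBase API; it only shows how region start keys translate into row-key ranges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: not the HBase API. All names here are hypothetical.
class RegionSplitSketch {

    // A row-key range [startRow, endRow) served to one map task; "" means unbounded.
    static final class KeyRange {
        final String startRow;
        final String endRow;
        KeyRange(String startRow, String endRow) { this.startRow = startRow; this.endRow = endRow; }
        @Override public String toString() { return "[" + startRow + ", " + endRow + ")"; }
    }

    // Given the sorted start keys of a table's regions (the first is ""),
    // emit one key range per region.
    static List<KeyRange> splits(List<String> regionStartKeys) {
        List<KeyRange> out = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String end = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : "";
            out.add(new KeyRange(regionStartKeys.get(i), end));
        }
        return out;
    }

    public static void main(String[] args) {
        // A table with three regions, split at row keys "g" and "p".
        for (KeyRange r : splits(Arrays.asList("", "g", "p"))) {
            System.out.println(r);
        }
    }
}
```

The point is that the split boundaries come from the table layout itself, so you do not have to write any splitting logic of your own.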

Refer to:
http://wiki.apache.org/hadoop/Hbase/MapReduce

Although the wiki marks them as deprecated, in practice it is advisable to stick with the *.mapred.* packages for some time, since the newer *.mapreduce.* packages are not mature enough at this point.

The decision has entirely to do with the kind of data you have and whether you can identify it by a primary key that suits your application, which is all HBase in its rudimentary form needs.

On the other hand, if having a schema and defining a primary key for your data does not fit your app naturally, you can stick with HDFS and a custom InputSplit suited to your data. HBase provides a lot more than HDFS in terms of scanning and row-id ordering; if you do not need these features, storing the data in HDFS should be just fine.
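
To illustrate what row-id ordering buys you, here is a small sketch in which a TreeMap stands in for a table sorted by row key. It is plain Java, not the HBase API, and the row keys and values are invented for the example.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only: a TreeMap stands in for a table sorted by row key. Not the HBase API.
class RowKeyScanSketch {

    // Return the rows whose key lies in [startRow, stopRow), in key order.
    static NavigableMap<String, String> scan(NavigableMap<String, String> table,
                                             String startRow, String stopRow) {
        return table.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("user#0042", "alice");
        table.put("user#0007", "bob");
        table.put("order#0001", "book");
        // Only the "user#" rows are visited, already sorted by key.
        // "user$" works as the stop row because '$' sorts right after '#'.
        for (Map.Entry<String, String> e : scan(table, "user#", "user$").entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

With a flat file in HDFS you would have to read every record and filter; if your job always reads everything anyway, that difference does not matter and plain HDFS is enough.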




On 1/28/10 6:20 PM, Otis Gospodnetic wrote:
I asked a similar question recently:
http://search-hadoop.com/[email protected]||hbase%20mapreduce%20otis%20TableInputFormat


Otis



----- Original Message ----
From: "[email protected]"<[email protected]>
To: [email protected]
Sent: Thu, January 28, 2010 8:02:39 PM
Subject: Hbase as Map/Reduce source

Hi,

I want to clearly understand HBase as a Map/Reduce source.
Basically, if a table has 100 regions, it means 100 map tasks will be started,
right?
What's the difference between HDFS and HBase as a Map/Reduce source?
Thanks




Fleming Chiu(邱宏明)
707-6128
[email protected]
Meat-free Monday: eat vegetarian and save the Earth (Meat Free Monday Taiwan)


