What about the case where I want to analyse data that has updated and deleted records? In that scenario HBase would be a better M/R source than raw HDFS files, is that correct?
Fleming Chiu (邱宏明)
707-6128
[email protected]
週一無肉日吃素救地球 (Meat Free Monday Taiwan)

On 2010/01/29 11:05 AM, Kay Kay <kaykay.uni...@gmail.com> wrote:

HDFS is a double-edged sword. Being a raw file system, you can feed it to a MapReduce program, although it might be necessary to define InputSplits as appropriate to chop down the input size. On the other hand, HBase is structured data (well, sort of!) that uses a file format on top of HDFS to store the schema, and hence comes with predefined InputSplits that make it easy to get started on a MapReduce program. From an API-simplicity point of view, HBase can get you started relatively faster because of this (assuming you already have your data in HBase). Refer to http://wiki.apache.org/hadoop/Hbase/MapReduce . Although the wiki says deprecated, in practice it is suggested to stick with the *.mapred.* packages for some time, since the underlying *.mapreduce.* packages are not mature enough at this point.

The decision has entirely to do with the kind of data you have and whether you can identify it by a primary key amenable to your application, which is all HBase in its rudimentary form needs. On the other hand, if having a schema and defining a primary key for your data seems non-orthogonal to your app, you can stick with HDFS and a custom InputSplit, depending on your data. HBase provides a lot more than HDFS in terms of scanning and row-id ordering, so if these features are not necessary for what you do, then storing data in HDFS should be just about OK.

On 1/28/10 6:20 PM, Otis Gospodnetic wrote:
> I asked a similar question recently:
> http://search-hadoop.com/[email protected]||hbase%20mapreduce%20otis%20TableInputFormat
>
> Otis
>
> ----- Original Message ----
>
>> From: "[email protected]" <[email protected]>
>> To: [email protected]
>> Sent: Thu, January 28, 2010 8:02:39 PM
>> Subject: Hbase as Map/Reduce source
>>
>> Hi,
>>
>> I want to understand clearly how HBase works as a Map/Reduce source.
>> Basically, if a table has 100 regions, does that mean 100 maps will be started?
>> What's the difference between HDFS and HBase as a Map/Reduce source?
>> Thanks
>>
>> Fleming Chiu (邱宏明)
>> 707-6128
>> [email protected]
>> 週一無肉日吃素救地球 (Meat Free Monday Taiwan)
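[Editor's sketch] To illustrate the point above about HBase shipping predefined InputSplits (and the "100 regions means 100 maps" question): TableInputFormat creates one split per region, so each region is read by one map task. Below is a minimal, hedged sketch of an HBase-backed map-only job. It uses the org.apache.hadoop.hbase.mapreduce classes rather than the older *.mapred.* ones recommended in the thread; the table name "mytable" and the row-counting logic are hypothetical and only for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class RowCountExample {

  // TableMapper fixes the input types for us: the row key
  // (ImmutableBytesWritable) and the row contents (Result).
  static class RowCountMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
        throws IOException, InterruptedException {
      // Emit one count per row scanned; a real job would inspect the columns.
      context.write(new Text("rows"), ONE);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hbase-row-count-example");
    job.setJarByClass(RowCountExample.class);

    // The Scan defines which column families / qualifiers each map task reads.
    Scan scan = new Scan();
    scan.setCaching(500);          // larger scanner caching for MR jobs
    scan.setCacheBlocks(false);    // don't fill the block cache from a full scan

    // TableInputFormat (wired up by this helper) creates one InputSplit per
    // region of "mytable", so a 100-region table yields 100 map tasks.
    TableMapReduceUtil.initTableMapperJob(
        "mytable",              // hypothetical input table
        scan,
        RowCountMapper.class,
        Text.class,
        IntWritable.class,
        job);

    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

By contrast, the plain-HDFS route Kay Kay describes would use an ordinary FileInputFormat (for example TextInputFormat) or a custom InputFormat/InputSplit, with splits derived from file blocks rather than from table regions.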
