Re: MapReduce newbie questions

Jonathan Gray Thu, 09 Jul 2009 11:15:49 -0700

First, I recommend upgrading to the latest HBase 0.19 release, 0.19.3.


You have a few choices, but in short you want to use filters.

http://hadoop.apache.org/hbase/docs/r0.19.3/api/org/apache/hadoop/hbase/filter/package-summary.html

Specifically, you should look at the RegExpRowFilter:

http://hadoop.apache.org/hbase/docs/r0.19.3/api/org/apache/hadoop/hbase/filter/RegExpRowFilter.html

You could set up the regular expression to only return stuff from the month you want. Inside the MR job you would know every row returned would come from the month in question and would be able to look at the key to determine the agency_id and day.
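Something like the following plain-Java sketch shows the kind of expression I mean for June 2009, and how the map side could pull agency_id and day back out of the key. The class and method names are made up for illustration, and it only uses java.util.regex. I haven't wired it to RegExpRowFilter here, so check that class's javadoc for exactly how it takes the expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MonthRowRegex {

    // Matches '<agency_id>_yyyy-MM-dd' keys that fall in one month, e.g. "2009-06".
    // Assumes yearMonth contains only digits and '-', which are literal in a regex.
    static Pattern monthPattern(String yearMonth) {
        return Pattern.compile("^(\\d+)_" + yearMonth + "-(\\d{2})$");
    }

    // Returns {agency_id, day} for a matching key, or null if the key is outside the month.
    static int[] agencyAndDay(Pattern month, String rowKey) {
        Matcher m = month.matcher(rowKey);
        if (!m.matches()) {
            return null;
        }
        return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
    }

    public static void main(String[] args) {
        Pattern june = monthPattern("2009-06");
        // This regex string is (by assumption) what you would hand to the row filter.
        System.out.println(june.pattern());
        for (String key : new String[] {"208_2009-06-08", "208_2009-07-01", "17_2009-06-30"}) {
            int[] parts = agencyAndDay(june, key);
            System.out.println(parts == null
                    ? key + " -> not in June"
                    : key + " -> agency " + parts[0] + ", day " + parts[1]);
        }
    }
}
```

The anchors (^ and $) keep the match strict whether the filter uses a full match or a substring search.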


There's an example in the TableInputFormatBase (TIFB) docs:

http://hadoop.apache.org/hbase/docs/r0.19.3/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html
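To make the "inside the MR job" part concrete, here's a plain-Java sketch of the arithmetic only. The names are hypothetical and it uses no Hadoop or HBase classes, so it is not the HBase mapred API: each daily row is keyed down to its month and every column is summed into the '<agency_id>_yyyy-MM' row. In a real job the map would emit the monthly key and the reduce would do the summing. I used double for brevity; for turnover you may prefer BigDecimal to avoid rounding drift.

```java
import java.util.Map;
import java.util.TreeMap;

public class MonthlyRollup {

    // dailyRows: row key '<agency_id>_yyyy-MM-dd' -> (column "family:qualifier" -> value).
    // Returns one row per '<agency_id>_yyyy-MM' key holding the per-column sums.
    // Assumes every key is well formed; a key without '-' would throw here.
    static Map<String, Map<String, Double>> rollUp(Map<String, Map<String, Double>> dailyRows) {
        Map<String, Map<String, Double>> monthly = new TreeMap<>();
        for (Map.Entry<String, Map<String, Double>> row : dailyRows.entrySet()) {
            // Map side: drop the trailing "-dd" to get the monthly key.
            String key = row.getKey();
            String monthKey = key.substring(0, key.lastIndexOf('-'));
            Map<String, Double> sums = monthly.computeIfAbsent(monthKey, k -> new TreeMap<>());
            // Reduce side: add each column's value into that column's running total.
            for (Map.Entry<String, Double> cell : row.getValue().entrySet()) {
                sums.merge(cell.getKey(), cell.getValue(), Double::sum);
            }
        }
        return monthly;
    }

    public static void main(String[] args) {
        // Made-up values for two days, just to exercise the sums.
        Map<String, Map<String, Double>> daily = new TreeMap<>();
        daily.put("208_2009-06-01", Map.of("looks:FRAMUC", 123.0, "bookings:FRAMUC", 15.0));
        daily.put("208_2009-06-02", Map.of("looks:FRAMUC", 456.0, "bookings:FRAMUC", 34.0));
        // Prints {208_2009-06={bookings:FRAMUC=49.0, looks:FRAMUC=579.0}}
        System.out.println(rollUp(daily));
    }
}
```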


Best of luck!

JG


Michael Hauck wrote:

Hi,

I'm new to HBase MapReduce and want to do the following:

- create daily statistics with SQL queries against an SQL database
- store the statistic results in HBase
- run a daily MapReduce job on those results to compute monthly statistics

I stored this data in hbase table 'route_conversion_statistics'.
My keys have the format '<agency_id>_yyyy-MM-dd' like '208_2009-06-08'
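For reference, this key scheme is easy to handle with java.time (Java 8+). The sketch below uses hypothetical class and method names and no HBase code: it builds a daily key, derives the '<agency_id>_yyyy-MM' monthly key from it, and computes the first and last daily keys of a month for one agency.

```java
import java.time.LocalDate;
import java.time.YearMonth;

public class RowKeys {

    // Builds a daily key such as "208_2009-06-08"; LocalDate.toString() is ISO yyyy-MM-dd.
    static String dailyKey(int agencyId, LocalDate day) {
        return agencyId + "_" + day;
    }

    // Maps a daily key to its monthly key: "208_2009-06-08" -> "208_2009-06".
    static String monthlyKey(String dailyKey) {
        int sep = dailyKey.indexOf('_');
        if (sep < 0) {
            throw new IllegalArgumentException("expected <agency_id>_yyyy-MM-dd, got " + dailyKey);
        }
        LocalDate day = LocalDate.parse(dailyKey.substring(sep + 1)); // ISO yyyy-MM-dd
        return dailyKey.substring(0, sep) + "_" + YearMonth.from(day); // YearMonth prints yyyy-MM
    }

    // First and last daily keys of a month for one agency, e.g. 208_2009-06-01 .. 208_2009-06-30.
    static String[] monthRange(int agencyId, YearMonth month) {
        return new String[] {dailyKey(agencyId, month.atDay(1)), dailyKey(agencyId, month.atEndOfMonth())};
    }

    public static void main(String[] args) {
        System.out.println(monthlyKey("208_2009-06-08"));
        String[] range = monthRange(208, YearMonth.of(2009, 6));
        System.out.println(range[0] + " .. " + range[1]);
    }
}
```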
My column families are 'looks', 'bookings', 'turnover', 'paxcount'

For example:
The row '208_2009-06-08' has about 30000 columns like this:

looks:FRAMUC, value=123
looks:FRALAX, value=456
...
bookings:FRAMUC, value=15
bookings:FRALAX, value=34
...
turnover:FRAMUC, value=1534.34
turnover:FRALAX, value=4574.35
...
paxcount:FRAMUC, value=356
paxcount:FRALAX, value=5676
...

Now I want to create a new row with the corresponding key '208_2009-06' and put the sum of all columns from '208_2009-06-01' to '208_2009-06-30' into it.


What is the best practice to do this?
How can I scan over this monthly range?


I use Hadoop 0.19.1 and HBase 0.19.1.

Thanks,
Michael
