First, I recommend upgrading to the latest HBase 0.19 release, 0.19.3.

You have a few choices, but in short you want to use filters.

http://hadoop.apache.org/hbase/docs/r0.19.3/api/org/apache/hadoop/hbase/filter/package-summary.html

Specifically, you should look at the RegExpRowFilter:

http://hadoop.apache.org/hbase/docs/r0.19.3/api/org/apache/hadoop/hbase/filter/RegExpRowFilter.html


You could set up the regular expression so that it only returns rows from the month you want. Inside the MR job you would then know that every row returned comes from the month in question, and you could parse the key to determine the agency_id and day.
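
To make that concrete, here is a rough sketch in plain Java (java.util.regex only, no HBase classes) of what the month expression and the key parsing could look like for keys such as '208_2009-06-08'. The class and method names (MonthKeyRegex, monthRegex, agencyId, dayOfMonth, monthlyKey) are made up for illustration, and it assumes the agency_id never contains an underscore. The string returned by monthRegex is what you would hand to the filter; check the RegExpRowFilter Javadoc for how it applies the expression to the row key.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: builds the month regex and
// parses keys of the form <agency_id>_yyyy-MM-dd.
class MonthKeyRegex {

    // Key layout assumed from the question: agency id, underscore, date.
    private static final Pattern KEY =
        Pattern.compile("([^_]+)_(\\d{4}-\\d{2})-(\\d{2})");

    // Expression covering every daily key of one month, e.g. "2009-06".
    static String monthRegex(String yearMonth) {
        return "[^_]+_" + yearMonth + "-\\d{2}";
    }

    private static Matcher parse(String rowKey) {
        Matcher m = KEY.matcher(rowKey);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected row key: " + rowKey);
        }
        return m;
    }

    // "208_2009-06-08" -> "208"
    static String agencyId(String rowKey) {
        return parse(rowKey).group(1);
    }

    // "208_2009-06-08" -> 8
    static int dayOfMonth(String rowKey) {
        return Integer.parseInt(parse(rowKey).group(3));
    }

    // Key of the monthly row the sums would go into:
    // "208_2009-06-08" -> "208_2009-06"
    static String monthlyKey(String rowKey) {
        Matcher m = parse(rowKey);
        return m.group(1) + "_" + m.group(2);
    }

    public static void main(String[] args) {
        String regex = monthRegex("2009-06");
        String key = "208_2009-06-08";
        System.out.println(regex);
        System.out.println(key.matches(regex));
        System.out.println(agencyId(key) + " " + dayOfMonth(key) + " " + monthlyKey(key));
    }
}
```

In the map step you could emit monthlyKey(rowKey) as the output key, so that all daily rows of one agency and month end up in the same reduce call, where the per-column sums can be computed.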

There's an example in the TableInputFormatBase (TIFB) docs:

http://hadoop.apache.org/hbase/docs/r0.19.3/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html


Best of luck!

JG


Michael Hauck wrote:
Hi,

I'm new to HBase MapReduce and want to do the following:

- create daily statistics with SQL queries against a SQL database
- store the statistics results in HBase
- run a daily MapReduce job on those results to compute monthly statistics

I stored this data in the HBase table 'route_conversion_statistics'.
My keys have the format '<agency_id>_yyyy-MM-dd' like '208_2009-06-08'
My ColumnFamilies are 'looks', 'bookings', 'turnover', 'paxcount'

For example:
The row '208_2009-06-08' has about 30000 columns like this:

looks:FRAMUC, value=123
looks:FRALAX, value=456
...
bookings:FRAMUC, value=15
bookings:FRALAX, value=34
...
turnover:FRAMUC, value=1534.34
turnover:FRALAX, value=4574.35
...
paxcount:FRAMUC, value=356
paxcount:FRALAX, value=5676
...

Now I want to create a new row with the corresponding key '208_2009-06'
and put the sum of all columns from '208_2009-06-01' to '208_2009-06-30' into it.

What is the best practice to do this?
How can I scan over this monthly range?


I use Hadoop 0.19.1 and HBase 0.19.1.

Thanks,
Michael
