I have the following kind of data (a typical store sales record): {product, date,
store_name} --> number
I understand that if I choose the following row key design, I will be able to
quickly aggregate over a leading part of the key (product, or product and date).
row key -- product:date:store_name
column name -- number
In other words, I can efficiently run the following queries, since each one is
just an HBase scan over adjacent rows:
1) SELECT SUM(num) FROM sale_history_table WHERE product="hammer" GROUP BY
product
2) SELECT SUM(num) FROM sale_history_table WHERE product="hammer" AND
date="12/04/2009" GROUP BY product, date
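
To illustrate why queries 1 and 2 only touch adjacent rows, here is a minimal Python sketch. It is a toy model, not the HBase client API: a sorted list stands in for HBase's sorted row keys, and the store name DOWNTOWN, the extra dates, and the numbers are made-up sample data.

```python
from bisect import bisect_left

# Toy stand-in for an HBase table: rows kept sorted by row key, as HBase does.
# Row key layout: product:date:store_name -> number (sample values are made up).
rows = sorted({
    "hammer:12/03/2009:SFO_AIRPORT": 2,
    "hammer:12/04/2009:DOWNTOWN": 3,
    "hammer:12/04/2009:SFO_AIRPORT": 5,
    "nail:12/04/2009:SFO_AIRPORT": 40,
}.items())
keys = [key for key, _ in rows]

def prefix_scan(prefix):
    """Yield (key, number) for the contiguous run of rows whose key starts with prefix."""
    i = bisect_left(keys, prefix)  # jump straight to the first candidate row
    while i < len(rows) and keys[i].startswith(prefix):
        yield rows[i]
        i += 1

def sum_prefix(prefix):
    """Sum the numbers of all rows under a key prefix."""
    return sum(number for _, number in prefix_scan(prefix))

total_hammer = sum_prefix("hammer:")                 # query 1
total_hammer_day = sum_prefix("hammer:12/04/2009:")  # query 2
```

Both sums read only the rows under the prefix and stop at the first key that no longer matches.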
However, the following query is very inefficient, because to answer it I
basically need to scan the whole section of data containing "hammer":
3) SELECT SUM(num) FROM sale_history_table WHERE product="hammer" AND
store_name="SFO_AIRPORT" GROUP BY product, store_name
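
The cost of query 3 can be seen in the same kind of toy model (again plain Python, not HBase code; DOWNTOWN, the extra dates, and the numbers are made-up sample data). Because store_name is the last key component, every "hammer" row has to be read and then filtered:

```python
# Toy model: a sorted list of (row_key, number) pairs standing in for the table.
rows = sorted({
    "hammer:12/03/2009:SFO_AIRPORT": 2,
    "hammer:12/04/2009:DOWNTOWN": 3,
    "hammer:12/04/2009:SFO_AIRPORT": 5,
    "nail:12/04/2009:SFO_AIRPORT": 40,
}.items())

def sum_for_store(product, store):
    """Query 3: read every row of the product, keep only the matching store.

    Returns (total, rows_scanned) so the amount of wasted reading is visible.
    """
    total = 0
    scanned = 0
    for key, number in rows:
        if not key.startswith(product + ":"):
            continue  # outside the product's key range
        scanned += 1
        _product, _date, store_name = key.split(":")
        if store_name == store:
            total += number
    return total, scanned

total, scanned = sum_for_store("hammer", "SFO_AIRPORT")
```

Here three "hammer" rows are read to sum two of them; with many dates and stores per product, the share of rows read and discarded grows accordingly.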
Can someone give me advice on how I should design my HBase schema if I
choose to use native HBase (I am thinking a second table may help with case 3,
but have not come up with a design)?
(I understand Zohmg is good at this kind of problem, but I'd rather use it
as a last resort.)
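
As a starting point for discussion, the second-table idea could take a naive shape like the sketch below. This is an assumption rather than a worked-out design: the second table, its product:store_name:date key order, and the put helper are all hypothetical, and plain Python dicts stand in for the two HBase tables.

```python
from bisect import bisect_left

by_date = {}   # main table:   product:date:store_name -> number
by_store = {}  # second table: product:store_name:date -> number (hypothetical)

def put(product, date, store, number):
    """Write one sales record to both tables (application-side dual write)."""
    by_date[f"{product}:{date}:{store}"] = number
    by_store[f"{product}:{store}:{date}"] = number

def sum_prefix(table, prefix):
    """Sum the contiguous run of sorted keys that start with prefix."""
    keys = sorted(table)
    total = 0
    for key in keys[bisect_left(keys, prefix):]:
        if not key.startswith(prefix):
            break  # left the prefix range, nothing further can match
        total += table[key]
    return total

# Made-up sample data.
put("hammer", "12/03/2009", "SFO_AIRPORT", 2)
put("hammer", "12/04/2009", "DOWNTOWN", 3)
put("hammer", "12/04/2009", "SFO_AIRPORT", 5)

store_total = sum_prefix(by_store, "hammer:SFO_AIRPORT:")  # query 3
day_total = sum_prefix(by_date, "hammer:12/04/2009:")      # query 2
```

With this layout query 3 becomes a prefix scan on the second table, at the price of writing every record twice and keeping the two tables consistent.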
Thanks,
Sean