How to do "Group By" in HBase

Sean Thu, 01 Apr 2010 01:46:29 -0700








I have the following kind of data (a typical store sales record): {product, 
date, store_name} --> number

I understand that if I choose the following row key design, I will be able to 
quickly GROUP BY store_name. 

row key -- product:date:store_name
column name -- number

In other words, I can efficiently run the following queries, because each one 
is just an HBase scan over adjacent rows: 
1) SELECT SUM(num) FROM sale_history_table WHERE product="hammer" GROUP BY 
product 
2) SELECT SUM(num) FROM sale_history_table WHERE product="hammer" AND 
date="12/04/2009" GROUP BY product, date 
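
To make the adjacent-scan idea concrete, here is a minimal Python sketch. It 
does not use the HBase client; a sorted in-memory list stands in for the 
table, and the sample rows, the ISO date format in the keys, and the name 
prefix_sum are all assumptions for illustration only.

```python
from bisect import bisect_left

# Hypothetical stand-in for an HBase table: rows kept sorted by row key
# (product:date:store_name), each mapping to the "number" column.
ROWS = sorted([
    ("hammer:2009-12-03:SFO_AIRPORT", 4),
    ("hammer:2009-12-04:DOWNTOWN", 2),
    ("hammer:2009-12-04:SFO_AIRPORT", 3),
    ("nail:2009-12-04:SFO_AIRPORT", 50),
])
KEYS = [key for key, _ in ROWS]


def prefix_sum(prefix):
    """Sum 'number' over the contiguous run of rows whose key starts with prefix."""
    i = bisect_left(KEYS, prefix)  # jump straight to the first candidate row
    total = 0
    while i < len(ROWS) and KEYS[i].startswith(prefix):
        total += ROWS[i][1]
        i += 1  # stop as soon as the prefix no longer matches
    return total


print(prefix_sum("hammer:"))             # query 1: all hammer rows
print(prefix_sum("hammer:2009-12-04:"))  # query 2: hammer rows on one date
```

Both queries touch only the rows that share the key prefix, which is why they 
map well onto this row key design.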

However, the following query is very inefficient, because to answer it I 
basically need to scan the whole section of data containing "hammer": 

3) SELECT SUM(num) FROM sale_history_table WHERE product="hammer" AND 
store_name="SFO_AIRPORT" GROUP BY product, store_name 
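
The cost of case 3 can be sketched the same way. Again this is plain Python 
with a sorted list standing in for the table, not the HBase API; the sample 
rows and the name sum_for_store are hypothetical. Because store_name is the 
last key component, every "hammer" row has to be examined and filtered.

```python
from bisect import bisect_left

# Hypothetical rows, sorted by row key product:date:store_name.
ROWS = sorted([
    ("hammer:2009-12-03:SFO_AIRPORT", 4),
    ("hammer:2009-12-04:DOWNTOWN", 2),
    ("hammer:2009-12-04:SFO_AIRPORT", 3),
    ("nail:2009-12-04:SFO_AIRPORT", 50),
])
KEYS = [key for key, _ in ROWS]


def sum_for_store(product, store):
    """Return (sum, rows_examined) for one product at one store."""
    prefix = product + ":"
    i = bisect_left(KEYS, prefix)
    total = 0
    examined = 0
    # The scan cannot skip dates, so it walks the whole product range.
    while i < len(ROWS) and KEYS[i].startswith(prefix):
        examined += 1
        if KEYS[i].rsplit(":", 1)[1] == store:  # filter on the trailing store_name
            total += ROWS[i][1]
        i += 1
    return total, examined


print(sum_for_store("hammer", "SFO_AIRPORT"))
```

The number of rows examined grows with all sales of the product, not just 
those at the requested store.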

Can someone give me advice on how I should design my HBase schema if I 
choose to use native HBase? (I am thinking a second table may help with case 
3, but I have not come up with a design yet.) 



(I understand Zohmg is good at this kind of problem, but I'd rather keep it 
as a last resort.) 

Thanks,
Sean
                                          