> From: Sean Bigdatafun
> What I am thinking of is the following scenario:
> -- 1) I want to store my hourly web traffic into a fact
> table hourly into Table A
> -- 2) I want to invoke map-reduce to generate aggregated
> table like trends/web-usage-summary into Table B
> -- 3) I want to serve end user's query from Table B.
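
For step 2, a plain TableMapper/TableReducer rollup job along these lines could do it. Rough sketch below -- the table names "A" and "B", the column families, and the <page>#<hour> row key layout are placeholders of mine, not anything from your mail:

// Hypothetical step-2 job: scan hourly rows in "A", sum a per-page hit
// counter, write one summary row per page to "B". Names are placeholders.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;

public class HourlyRollup {

  static class HitMapper extends TableMapper<ImmutableBytesWritable, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      // Assumed row key layout: <page>#<hour>; emit (page, hits) pairs.
      byte[] page = Bytes.toBytes(Bytes.toString(value.getRow()).split("#")[0]);
      long hits = Bytes.toLong(value.getValue(Bytes.toBytes("t"), Bytes.toBytes("hits")));
      ctx.write(new ImmutableBytesWritable(page), new LongWritable(hits));
    }
  }

  static class SumReducer
      extends TableReducer<ImmutableBytesWritable, LongWritable, ImmutableBytesWritable> {
    @Override
    protected void reduce(ImmutableBytesWritable page, Iterable<LongWritable> hits, Context ctx)
        throws IOException, InterruptedException {
      long total = 0;
      for (LongWritable h : hits) total += h.get();
      Put put = new Put(page.get());
      put.add(Bytes.toBytes("s"), Bytes.toBytes("hits"), Bytes.toBytes(total));
      ctx.write(page, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hourly-rollup");
    job.setJarByClass(HourlyRollup.class);
    Scan scan = new Scan();
    scan.setCaching(500);        // fewer RPC round trips for the scan
    scan.setCacheBlocks(false);  // don't pollute the block cache with scan data
    TableMapReduceUtil.initTableMapperJob("A", scan, HitMapper.class,
        ImmutableBytesWritable.class, LongWritable.class, job);
    TableMapReduceUtil.initTableReducerJob("B", SumReducer.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}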
I have successfully done something like this in the past, on an experimental cluster. You must adjust the size of the cluster from time to time (we started with 15 nodes and went to 25) and spend time, via trial and error, tuning the MapReduce tasktracker and job specs to ensure the scanning query load on the cluster does not push user query latency out of tolerance.

There has been some recent talk about introducing QoS into HBase RPC: https://issues.apache.org/jira/browse/HBASE-2782. What is proposed there is narrowly scoped to META. But I could see an RPC QoS scheme with adjustable priorities:

  META (highest)
  Get
  Put/Delete/Scanner.next

which would, as a rule, minimize single-row query latency at the expense of everything but system operations, if that is your choice, and possibly do it well enough that you don't need to tune anything beyond the RPC QoS priorities.

Regarding HBase RPC in general, we are going to need to think about supporting security features in Hadoop and dynamic runtime method extension for coprocessors. That causes one to give HBASE-2182 (https://issues.apache.org/jira/browse/HBASE-2182) a hard look.

- Andy
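
P.S. To make the priority idea above a bit more concrete, here is a toy sketch of the kind of call classification and queueing I have in mind. Everything in it (the class name, the priority constants, the method-name matching) is invented for illustration; it is not the HBASE-2782 proposal or actual HBase RPC code:

// Toy illustration only: rank incoming RPC calls by method name so handlers
// drain higher-priority work first.
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

public class RpcQosSketch {

  static final int PRI_META  = 0; // META lookups, highest
  static final int PRI_GET   = 1; // single-row reads
  static final int PRI_OTHER = 2; // Put/Delete/Scanner.next and the rest

  static int priorityOf(String method) {
    if (method.equals("getClosestRowBefore")) return PRI_META; // META lookups
    if (method.equals("get") || method.equals("exists")) return PRI_GET;
    return PRI_OTHER;
  }

  /** A call waiting for a handler: just method name and arrival order here. */
  static class Call {
    final String method;
    final long seq;
    Call(String method, long seq) { this.method = method; this.seq = seq; }
  }

  /** META first, then Gets, then everything else; FIFO within each tier. */
  static final Comparator<Call> BY_PRIORITY = new Comparator<Call>() {
    public int compare(Call a, Call b) {
      int d = priorityOf(a.method) - priorityOf(b.method);
      if (d != 0) return d;
      return a.seq < b.seq ? -1 : (a.seq > b.seq ? 1 : 0);
    }
  };

  public static void main(String[] args) throws InterruptedException {
    PriorityBlockingQueue<Call> callQueue =
        new PriorityBlockingQueue<Call>(64, BY_PRIORITY);
    callQueue.put(new Call("next", 1));                // Scanner.next
    callQueue.put(new Call("put", 2));
    callQueue.put(new Call("get", 3));
    callQueue.put(new Call("getClosestRowBefore", 4)); // META lookup
    while (!callQueue.isEmpty()) {
      // Drains as: getClosestRowBefore, get, next, put
      System.out.println(callQueue.take().method);
    }
  }
}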
