> From: Sean Bigdatafun
> What I am thinking of is the following scenario:
> -- 1) I want to store my hourly web traffic into a fact
> table hourly into Table A
> -- 2) I want to invoke map-reduce to generate aggregated
> table like trends/web-usage-summary into Table B
> -- 3) I want to serve end user's query from Table B.
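
For step 2, a plain TableMapper/TableReducer rollup job along these lines could do it. Rough sketch below -- the table names "A" and "B", the column families, and the <page>#<hour> row key layout are placeholders of mine, not anything from your mail:

// Hypothetical step-2 job: scan hourly rows in "A", sum a per-page hit
// counter, write one summary row per page to "B". Names are placeholders.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;

public class HourlyRollup {

  static class HitMapper extends TableMapper<ImmutableBytesWritable, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      // Assumed row key layout: <page>#<hour>; emit (page, hits) pairs.
      byte[] page = Bytes.toBytes(Bytes.toString(value.getRow()).split("#")[0]);
      long hits = Bytes.toLong(value.getValue(Bytes.toBytes("t"), Bytes.toBytes("hits")));
      ctx.write(new ImmutableBytesWritable(page), new LongWritable(hits));
    }
  }

  static class SumReducer
      extends TableReducer<ImmutableBytesWritable, LongWritable, ImmutableBytesWritable> {
    @Override
    protected void reduce(ImmutableBytesWritable page, Iterable<LongWritable> hits, Context ctx)
        throws IOException, InterruptedException {
      long total = 0;
      for (LongWritable h : hits) total += h.get();
      Put put = new Put(page.get());
      put.add(Bytes.toBytes("s"), Bytes.toBytes("hits"), Bytes.toBytes(total));
      ctx.write(page, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hourly-rollup");
    job.setJarByClass(HourlyRollup.class);
    Scan scan = new Scan();
    scan.setCaching(500);        // fewer RPC round trips for the scan
    scan.setCacheBlocks(false);  // don't pollute the block cache with scan data
    TableMapReduceUtil.initTableMapperJob("A", scan, HitMapper.class,
        ImmutableBytesWritable.class, LongWritable.class, job);
    TableMapReduceUtil.initTableReducerJob("B", SumReducer.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}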
I have successfully done something like this in the past, on an experimental cluster. You must adjust the size of the cluster from time to time (we started with 15 nodes and went to 25) and spend time, via trial and error, tuning the MapReduce tasktracker and job specs to ensure the scanning query load on the cluster does not push user query latency out of tolerance.

There has been some recent talk about introducing QoS into HBase RPC: https://issues.apache.org/jira/browse/HBASE-2782. What is proposed there is narrowly scoped to META. But I could see an RPC QoS scheme with adjustable priorities:

  META (highest)
  Get
  Put/Delete/Scanner.next

which would, as a rule, minimize single-row query latency at the expense of everything but system operations, if that is your choice, and possibly do it well enough that you don't need to tune anything beyond the RPC QoS priorities.

Regarding HBase RPC in general, we are going to need to think about supporting security features in Hadoop and dynamic runtime method extension for coprocessors. That causes one to give HBASE-2182 (https://issues.apache.org/jira/browse/HBASE-2182) a hard look.

- Andy
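
P.S. To make the priority idea above a bit more concrete, here is a toy sketch of the kind of call classification and queueing I have in mind. Everything in it (the class name, the priority constants, the method-name matching) is invented for illustration; it is not the HBASE-2782 proposal or actual HBase RPC code:

// Toy illustration only: rank incoming RPC calls by method name so handlers
// drain higher-priority work first.
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

public class RpcQosSketch {

  static final int PRI_META  = 0; // META lookups, highest
  static final int PRI_GET   = 1; // single-row reads
  static final int PRI_OTHER = 2; // Put/Delete/Scanner.next and the rest

  static int priorityOf(String method) {
    if (method.equals("getClosestRowBefore")) return PRI_META; // META lookups
    if (method.equals("get") || method.equals("exists")) return PRI_GET;
    return PRI_OTHER;
  }

  /** A call waiting for a handler: just method name and arrival order here. */
  static class Call {
    final String method;
    final long seq;
    Call(String method, long seq) { this.method = method; this.seq = seq; }
  }

  /** META first, then Gets, then everything else; FIFO within each tier. */
  static final Comparator<Call> BY_PRIORITY = new Comparator<Call>() {
    public int compare(Call a, Call b) {
      int d = priorityOf(a.method) - priorityOf(b.method);
      if (d != 0) return d;
      return a.seq < b.seq ? -1 : (a.seq > b.seq ? 1 : 0);
    }
  };

  public static void main(String[] args) throws InterruptedException {
    PriorityBlockingQueue<Call> callQueue =
        new PriorityBlockingQueue<Call>(64, BY_PRIORITY);
    callQueue.put(new Call("next", 1));                // Scanner.next
    callQueue.put(new Call("put", 2));
    callQueue.put(new Call("get", 3));
    callQueue.put(new Call("getClosestRowBefore", 4)); // META lookup
    while (!callQueue.isEmpty()) {
      // Drains as: getClosestRowBefore, get, next, put
      System.out.println(callQueue.take().method);
    }
  }
}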
