[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014908#comment-13014908
 ] 

Himanshu Vashishtha commented on HBASE-1512:
--------------------------------------------

Thanks for the suggestions Ted.

a) Added generics support to the AggregationClient. As Ted suggested, a 
ColumnInterpreter gives the client a chance to describe the cell value type. 
I made it generic: the client now passes a ColumnInterpreter object along 
with each agg function call. AggregationClient has one such implementation, 
for a client whose cell value is a long. Other cell value types can be 
supported with a similar approach.
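
To illustrate the idea in (a), here is a minimal sketch and not the code in 
the patch. The names ColumnInterpreterSketch, Interpreter, getValue, add and 
sum are made up for this example; the real ColumnInterpreter signatures are 
in the attached files. It assumes each cell holds an 8-byte big-endian long.

```java
import java.nio.ByteBuffer;
import java.util.List;

class ColumnInterpreterSketch {

    /** Hypothetical stand-in for ColumnInterpreter: decodes and adds cell values of type T. */
    interface Interpreter<T> {
        T getValue(byte[] cellValue);   // decode raw cell bytes, or null if not decodable
        T add(T a, T b);                // null-tolerant addition
    }

    /** Interpreter for cells that hold an 8-byte long (an assumption for this sketch). */
    static class LongInterpreter implements Interpreter<Long> {
        public Long getValue(byte[] cellValue) {
            if (cellValue == null || cellValue.length != Long.BYTES) {
                return null;
            }
            return ByteBuffer.wrap(cellValue).getLong();
        }

        public Long add(Long a, Long b) {
            if (a == null) return b;
            if (b == null) return a;
            return a + b;
        }
    }

    /** A generic aggregate: the caller supplies the interpreter along with the call. */
    static <T> T sum(List<byte[]> cells, Interpreter<T> interpreter) {
        T total = null;
        for (byte[] cell : cells) {
            total = interpreter.add(total, interpreter.getValue(cell));
        }
        return total;
    }

    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    public static void main(String[] args) {
        List<byte[]> cells = List.of(encode(3L), encode(4L), encode(5L));
        System.out.println(sum(cells, new LongInterpreter()));
    }
}
```

Supporting another cell type would mean writing another Interpreter 
implementation; sum itself does not change.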

b) While the client can define the cell value type by implementing 
ColumnInterpreter, I still think the average and standard deviation will be 
double values. So I added wrappers around these methods to support the generic 
functionality. Please refer to AggregationClient.getStdParams and getAvgParams. 
Let me know if this is unintuitive. I think it is right, though :)
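
As a rough sketch of why the results in (b) are doubles regardless of cell 
type: assume the per-region calls hand back a sum, a sum of squares and a row 
count (that is my assumption about what getStdParams and getAvgParams 
collect; the class and method names below are hypothetical), and that a 
population standard deviation is wanted.

```java
class AvgStdSketch {

    /** Average from an aggregated sum and row count; NaN when there are no rows. */
    static double average(double sum, long rowCount) {
        if (rowCount == 0) {
            return Double.NaN;
        }
        return sum / rowCount;
    }

    /** Population standard deviation from sum, sum of squares and row count. */
    static double std(double sum, double sumOfSquares, long rowCount) {
        if (rowCount == 0) {
            return Double.NaN;
        }
        double mean = sum / rowCount;
        // Var(X) = E[X^2] - (E[X])^2; clamp at 0 to absorb rounding error.
        double variance = Math.max(sumOfSquares / rowCount - mean * mean, 0.0);
        return Math.sqrt(variance);
    }

    public static void main(String[] args) {
        // Long cell values 2, 4, 4, 4, 5, 5, 7, 9: sum = 40, sum of squares = 232.
        System.out.println(average(40, 8));
        System.out.println(std(40, 232, 8));
    }
}
```

Even with long cell values the mean and the square root are generally not 
integers, so returning double from these two calls keeps the generic type 
for the inputs only.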

c) Added a filter parameter to each of the agg functions. The filter is passed 
along with the call and set on the Scan object at the region level during 
scanning. For row count, if the client provides a filter, that filter is 
used. If neither a filter nor a qualifier is provided, FirstKeyValueFilter is 
used.
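
The row count rule in (c) can be sketched as below. This is only an 
illustration: filters are represented by plain strings, RowCountFilterSketch 
and chooseFilter are hypothetical names, and the "qualifier but no filter" 
branch (no filter is added) is my assumption, since that case is not spelled 
out above.

```java
import java.util.Optional;

class RowCountFilterSketch {

    // Stand-in for the default filter named in the comment above.
    static final String DEFAULT_FILTER = "FirstKeyValueFilter";

    /** Picks the filter to set on the Scan for a row count call. */
    static Optional<String> chooseFilter(String clientFilter, String qualifier) {
        if (clientFilter != null) {
            return Optional.of(clientFilter);    // a client-supplied filter always wins
        }
        if (qualifier == null) {
            return Optional.of(DEFAULT_FILTER);  // neither filter nor qualifier given
        }
        return Optional.empty();                 // assumption: qualifier only, no filter
    }

    public static void main(String[] args) {
        System.out.println(chooseFilter("PrefixFilter", null));
        System.out.println(chooseFilter(null, null));
        System.out.println(chooseFilter(null, "q1"));
    }
}
```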

d) Added more test cases for testing filter use cases (44 in total :)). 

e) Refactored the "done" variable as suggested by Ted.

> Coprocessors: Support aggregate functions
> -----------------------------------------
>
>                 Key: HBASE-1512
>                 URL: https://issues.apache.org/jira/browse/HBASE-1512
>             Project: HBase
>          Issue Type: Sub-task
>          Components: coprocessors
>            Reporter: stack
>         Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregating facility, facility generally where you want to calculate 
> some meta info on your table, it seems like it wouldn't be too hard making a 
> filter type that could run a function server-side and return the result ONLY 
> of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server 
> returns all data to client and count is done by client counting up row keys.  
> A bunch of time and resources have been wasted returning data that we're not 
> interested in.  With this new filter type, the counting would be done 
> server-side and then it would make up a new result that was the count only 
> (kinda like mysql when you ask it to count, it returns a 'table' with a count 
> column whose value is count of rows).   We could have it so the count was 
> just done per region and return that.  Or we could maybe make a small change 
> in scanner too so that it aggregated the per-region counts.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
