[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927634#action_12927634
 ] 

Mingjie Lai commented on HBASE-1512:
------------------------------------

Himanshu,

The patch looks good, but it doesn't provide the whole picture of the solution. 
Some important questions about this feature remain unanswered:

1) What interface is provided to end users? HTableInterface.sum(...), 
HTableInterface.min/max()? Do we need shell support?

2) How should the interface be implemented? (By utilizing coprocessors.)

3) How do we make sure the coprocessor is loaded properly when the feature is 
available?
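For reference, a minimal sketch of how (1) and (2) could fit together, assuming each region computes a small partial result server-side and the client merges the partials into one answer. `RegionPartial`, `computeInRegion` and `merge` are hypothetical names, not existing HBase API; the per-region lists of longs stand in for the values a coprocessor would read. Requires Java 16+ (records).

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch only: none of these names are HBase API.
 * Each region returns a small partial result; the client merges them.
 */
public class AggregateSketch {

    /** Partial result computed inside one region (hypothetical). */
    record RegionPartial(long sum, Optional<Long> min, Optional<Long> max, long rowCount) {}

    /** Stand-in for the coprocessor running over one region's values. */
    static RegionPartial computeInRegion(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        Optional<Long> min = values.stream().min(Long::compare);
        Optional<Long> max = values.stream().max(Long::compare);
        return new RegionPartial(sum, min, max, values.size());
    }

    /** Client-side merge: sum and rowCount add up; min/max skip empty regions. */
    static RegionPartial merge(List<RegionPartial> partials) {
        long sum = 0;
        long rows = 0;
        Optional<Long> min = Optional.empty();
        Optional<Long> max = Optional.empty();
        for (RegionPartial p : partials) {
            sum += p.sum();
            rows += p.rowCount();
            if (p.min().isPresent() && (min.isEmpty() || p.min().get() < min.get())) {
                min = p.min();
            }
            if (p.max().isPresent() && (max.isEmpty() || p.max().get() > max.get())) {
                max = p.max();
            }
        }
        return new RegionPartial(sum, min, max, rows);
    }

    public static void main(String[] args) {
        // Three "regions", one of them empty.
        List<List<Long>> regions = List.of(List.of(3L, 9L), List.of(), List.of(-4L, 7L, 1L));
        List<RegionPartial> partials =
                regions.stream().map(AggregateSketch::computeInRegion).toList();
        System.out.println(merge(partials));
    }
}
```

Empty regions contribute nothing to min/max, which is why the partials carry Optional values instead of a sentinel such as Long.MIN_VALUE that could be confused with real data.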

Your patch addresses part of (2), and it only provides max() and countRow() 
implementations. 

IMO, ProcessResultsFromCP isn't necessary. It doesn't provide any real 
convenience that would reduce developers' effort. 

Thanks. 

> Coprocessors: Support aggregate functions
> -----------------------------------------
>
>                 Key: HBASE-1512
>                 URL: https://issues.apache.org/jira/browse/HBASE-1512
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: stack
>         Attachments: 1512.zip
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregating facilities (generally, anywhere you want to calculate 
> some meta info on your table), it seems like it wouldn't be too hard to make a 
> filter type that could run a function server-side and return ONLY the result 
> of the aggregation or whatever.
> For example, say you just want to count rows: currently you scan, the server 
> returns all data to the client, and the client does the count by counting up 
> row keys. A bunch of time and resources are wasted returning data that we're 
> not interested in. With this new filter type, the counting would be done 
> server-side and it would then make up a new result that was the count only 
> (kinda like MySQL when you ask it to count: it returns a 'table' with a count 
> column whose value is the count of rows). We could have it so the count was 
> just done per region and returned that way. Or we could maybe make a small 
> change in the scanner too so that it aggregated the per-region counts.
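The two counting strategies in the description can be simulated in a few lines. This is a hypothetical sketch, not HBase code: `Region` and `Row` are plain records standing in for regions and rows, `countInRegion` stands in for the server-side function, and `aggregatedCount` stands in for the "small change in scanner" option. Requires Java 16+ (records).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical simulation of client-side vs. server-side row counting.
 * No HBase classes are used; a "region" is just a named list of rows.
 */
public class RowCountSketch {

    record Row(String key, byte[] value) {}

    record Region(String name, List<Row> rows) {}

    /** Today: every row (key and value) is shipped to the client, which counts keys. */
    static long countClientSide(List<Region> regions) {
        long count = 0;
        for (Region r : regions) {
            for (Row row : r.rows()) { // each Row would cross the wire here
                count++;
            }
        }
        return count;
    }

    /** Proposed: the function runs in the region and returns only a number. */
    static long countInRegion(Region region) {
        return region.rows().size();
    }

    /** Option 1: return one count per region. */
    static Map<String, Long> perRegionCounts(List<Region> regions) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Region r : regions) {
            counts.put(r.name(), countInRegion(r));
        }
        return counts;
    }

    /** Option 2: the scanner adds up the per-region counts into one total. */
    static long aggregatedCount(List<Region> regions) {
        return perRegionCounts(regions).values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[1024]; // data the client never needed
        List<Region> regions = List.of(
                new Region("region-a", List.of(new Row("row1", payload), new Row("row2", payload))),
                new Region("region-b", List.of(new Row("row3", payload))));
        System.out.println(countClientSide(regions));
        System.out.println(perRegionCounts(regions));
        System.out.println(aggregatedCount(regions));
    }
}
```

Both server-side options return one long per region instead of every row; returning per-region counts keeps the server side simple, while aggregating in the scanner gives the single MySQL-style count the description mentions.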

-- 
This message is automatically generated by JIRA.