[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662054#comment-14662054
 ] 

nicu marasoiu commented on HBASE-1512:
--------------------------------------

Hi,

Related to this issue, or more generally, do you know of a solution using 
HBase coprocessors for:
1. multiple metric columns e.g. group by (d1,..,dn) sum(c1) sum(c2)
2. custom metric columns e.g. group by (d1,..,dn) sum(c1) hyperlogUniq(c2)
3. sharing the components with map-reduce to run the same query for larger 
inputs
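
One way to picture the three items above is a partial aggregate state that can be both updated row by row and merged with other partial states. The sketch below is plain Java and only illustrative: `GroupByPartial`, `Row`, and `State` are hypothetical names, not HBase or MapReduce APIs, and an exact `HashSet` stands in for a HyperLogLog-style sketch. Because `merge` is the only step that combines partials, the same `State` could in principle be used per region on the server side or in a map-reduce combiner/reducer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types are HBase or MapReduce APIs.
final class GroupByPartial {

    /** One input row: dimension values (d1..dn) plus two metric columns. */
    static final class Row {
        final List<String> dims;
        final long c1;
        final long c2;

        Row(List<String> dims, long c1, long c2) {
            this.dims = dims;
            this.c1 = c1;
            this.c2 = c2;
        }
    }

    /** Mergeable partial state for one group: sum(c1), sum(c2), distinct(c2). */
    static final class State {
        long sumC1;
        long sumC2;
        // Assumption: an exact set stands in for a HyperLogLog-style sketch.
        final Set<Long> distinctC2 = new HashSet<>();

        void add(Row r) {
            sumC1 += r.c1;
            sumC2 += r.c2;
            distinctC2.add(r.c2);
        }

        void merge(State other) {
            sumC1 += other.sumC1;
            sumC2 += other.sumC2;
            distinctC2.addAll(other.distinctC2);
        }
    }

    /** Folds the rows of one partition (e.g. one region) into per-group states. */
    static Map<List<String>, State> aggregate(List<Row> rows) {
        Map<List<String>, State> out = new HashMap<>();
        for (Row r : rows) {
            out.computeIfAbsent(r.dims, k -> new State()).add(r);
        }
        return out;
    }

    /** Combines per-partition results into one result, group by group. */
    static Map<List<String>, State> merge(List<Map<List<String>, State>> partials) {
        Map<List<String>, State> out = new HashMap<>();
        for (Map<List<String>, State> partial : partials) {
            for (Map.Entry<List<String>, State> e : partial.entrySet()) {
                out.computeIfAbsent(e.getKey(), k -> new State()).merge(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> regionA = Arrays.asList(
                new Row(Arrays.asList("us", "web"), 3, 10),
                new Row(Arrays.asList("us", "web"), 4, 20));
        List<Row> regionB = Arrays.asList(
                new Row(Arrays.asList("us", "web"), 5, 10),
                new Row(Arrays.asList("de", "app"), 7, 30));
        Map<List<String>, State> total =
                merge(Arrays.asList(aggregate(regionA), aggregate(regionB)));
        State us = total.get(Arrays.asList("us", "web"));
        // prints "12 40 2"
        System.out.println(us.sumC1 + " " + us.sumC2 + " " + us.distinctC2.size());
    }
}
```

A real approximate-distinct sketch would replace the `HashSet`, as long as it supports the same add and merge operations.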

Please advise,
Nicu

> Coprocessors: Support aggregate functions
> -----------------------------------------
>
>                 Key: HBASE-1512
>                 URL: https://issues.apache.org/jira/browse/HBASE-1512
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Coprocessors
>            Reporter: stack
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.92.0
>
>         Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> addendum_1512.txt, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, 
> patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, 
> patch-1512-9.txt, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregation facilities (generally, anything where you want to calculate 
> some meta info on your table), it seems like it wouldn't be too hard to make a 
> filter type that could run a function server-side and return ONLY the result 
> of the aggregation or whatever.
> For example, say you just want to count rows: currently you scan, the server 
> returns all data to the client, and the client does the count by tallying row keys.  
> A bunch of time and resources have been wasted returning data that we're not 
> interested in.  With this new filter type, the counting would be done 
> server-side and then it would make up a new result that was the count only 
> (kinda like mysql when you ask it to count, it returns a 'table' with a count 
> column whose value is count of rows).   We could have it so the count was 
> just done per region and return that.  Or we could maybe make a small change 
> in scanner too so that it aggregated the per-region counts.  
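
The per-region counting idea in the description above can be sketched roughly as follows. This is a plain-Java simulation, not the HBase coprocessor or scanner API: `RegionCountSketch`, `countRegion`, and `totalCount` are hypothetical names, and each region is modelled as a simple list of row keys. The point is only that each "server" returns a single number and the caller adds the numbers up, instead of shipping every row to the client.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation; regions are modelled as in-memory lists of row keys.
final class RegionCountSketch {

    /** Stand-in for the server side: counts one region's rows and returns only the count. */
    static long countRegion(List<String> regionRowKeys) {
        long count = 0;
        for (String ignoredRowKey : regionRowKeys) {
            count++;
        }
        return count;
    }

    /** Stand-in for the client or scanner side: sums the per-region counts. */
    static long totalCount(List<List<String>> regions) {
        long total = 0;
        for (List<String> region : regions) {
            total += countRegion(region);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("row-a", "row-b"),
                Arrays.asList("row-c"),
                Collections.<String>emptyList());
        // prints 3
        System.out.println(totalCount(regions));
    }
}
```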



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
