[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662054#comment-14662054 ]
nicu marasoiu commented on HBASE-1512:
--------------------------------------

Hi,

Do you know whether, in relation to this issue or more generally, there is a solution with HBase coprocessors for:
1. multiple metric columns, e.g. group by (d1, ..., dn) sum(c1) sum(c2)
2. custom metric columns, e.g. group by (d1, ..., dn) sum(c1) hyperlogUniq(c2)
3. sharing the components with MapReduce, to run the same query on larger inputs

Please advise,
Nicu

> Coprocessors: Support aggregate functions
> -----------------------------------------
>
>                 Key: HBASE-1512
>                 URL: https://issues.apache.org/jira/browse/HBASE-1512
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Coprocessors
>            Reporter: stack
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.92.0
>
>         Attachments: 1512.zip, AggregateCpProtocol.java,
>                      AggregateProtocolImpl.java, AggregationClient.java,
>                      ColumnInterpreter.java, addendum_1512.txt,
>                      patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt,
>                      patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt,
>                      patch-1512-8.txt, patch-1512-9.txt, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and
> other aggregating facilities, generally where you want to calculate some meta
> info on your table, it seems like it wouldn't be too hard to make a filter
> type that could run a function server-side and return ONLY the result of the
> aggregation or whatever.
> For example, say you just want to count rows: currently you scan, the server
> returns all data to the client, and the count is done by the client counting
> up row keys. A bunch of time and resources have been wasted returning data
> that we're not interested in. With this new filter type, the counting would be
> done server-side and then it would make up a new result that was the count
> only (kinda like MySQL: when you ask it to count, it returns a 'table' with a
> count column whose value is the count of rows). We could have it so the count
> was just done per region and return that.
> Or we could maybe make a small change in the scanner too so that it
> aggregated the per-region counts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
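The issue's idea (aggregate per region, return only the result, combine the per-region results on the client) and the first two questions in the comment can be illustrated with a plain-Java sketch. It deliberately uses no HBase classes: the "regions" are in-memory lists, and Row, Partial, Hll, aggregateRegion, mergeRegions and sampleRegions are hypothetical names, not the API in the attached AggregateProtocolImpl/AggregationClient/ColumnInterpreter files. What it tries to show, under those assumptions, is that several metrics, including a custom one such as a HyperLogLog-style distinct count, fit this model as long as each metric's per-region state can be merged. The HLL below is a simplified textbook variant, not a production implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RegionAggregationSketch {

    /** Hypothetical row: group-by dimensions plus two metric columns c1 and c2. */
    record Row(List<String> dims, long c1, long c2) {}

    /** Simplified HyperLogLog with 2^p registers; two sketches merge by element-wise max. */
    static final class Hll {
        private final int p;
        private final byte[] reg;

        Hll(int p) { this.p = p; this.reg = new byte[1 << p]; }

        /** splitmix64-style mixer used as the 64-bit hash. */
        private static long mix(long value) {
            long z = value + 0x9e3779b97f4a7c15L;
            z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
            z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
            return z ^ (z >>> 31);
        }

        void add(long value) {
            long h = mix(value);
            int idx = (int) (h >>> (64 - p));   // top p bits choose the register
            long rest = h << p;                  // remaining 64 - p bits
            int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
            if (rank > reg[idx]) reg[idx] = (byte) rank;
        }

        void merge(Hll other) {
            for (int i = 0; i < reg.length; i++) reg[i] = (byte) Math.max(reg[i], other.reg[i]);
        }

        double estimate() {
            int m = reg.length;
            double sum = 0;
            int zeros = 0;
            for (byte r : reg) {
                sum += Math.pow(2, -r);
                if (r == 0) zeros++;
            }
            double e = (0.7213 / (1 + 1.079 / m)) * m * m / sum;
            // Small-range correction: fall back to linear counting.
            if (e <= 2.5 * m && zeros > 0) e = m * Math.log((double) m / zeros);
            return e;
        }
    }

    /** Mergeable partial state for one group: sum(c1), sum(c2) and an estimate of distinct(c2). */
    static final class Partial {
        long sumC1, sumC2;
        final Hll uniqC2 = new Hll(10);

        void add(Row r) { sumC1 += r.c1(); sumC2 += r.c2(); uniqC2.add(r.c2()); }

        void merge(Partial o) { sumC1 += o.sumC1; sumC2 += o.sumC2; uniqC2.merge(o.uniqC2); }
    }

    /** "Region side": group one region's rows; only these small partials would be returned. */
    static Map<List<String>, Partial> aggregateRegion(List<Row> region) {
        Map<List<String>, Partial> out = new HashMap<>();
        for (Row r : region) out.computeIfAbsent(r.dims(), k -> new Partial()).add(r);
        return out;
    }

    /** "Client side": merge the per-region partials group by group. */
    static Map<List<String>, Partial> mergeRegions(List<Map<List<String>, Partial>> parts) {
        Map<List<String>, Partial> total = new HashMap<>();
        for (Map<List<String>, Partial> part : parts) {
            part.forEach((k, v) -> total.computeIfAbsent(k, x -> new Partial()).merge(v));
        }
        return total;
    }

    /** Two hypothetical regions whose c2 values overlap on 400..599. */
    static List<List<Row>> sampleRegions() {
        List<Row> r1 = new ArrayList<>();
        List<Row> r2 = new ArrayList<>();
        for (long i = 0; i < 600; i++) r1.add(new Row(List.of(i % 2 == 0 ? "even" : "odd"), 1, i));
        for (long i = 400; i < 1000; i++) r2.add(new Row(List.of(i % 2 == 0 ? "even" : "odd"), 1, i));
        return List.of(r1, r2);
    }

    public static void main(String[] args) {
        List<Map<List<String>, Partial>> parts = new ArrayList<>();
        for (List<Row> region : sampleRegions()) parts.add(aggregateRegion(region));
        mergeRegions(parts).forEach((k, v) -> System.out.printf(
                "%s sum(c1)=%d sum(c2)=%d uniq(c2)~%.0f%n", k, v.sumC1, v.sumC2, v.uniqC2.estimate()));
    }
}
```

Because the two sample regions share the c2 values 400..599, simply adding per-region distinct counts would over-count (600 per group instead of 500); merging the sketch registers avoids that, while the plain sums can just be added. Since the client side only needs partial states plus a merge function, the same Partial.merge could plausibly be reused as a MapReduce combiner/reducer for larger inputs, but that wiring is an assumption and is not shown here.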