[
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926831#action_12926831
]
Himanshu Vashishtha commented on HBASE-1512:
--------------------------------------------
With the 2001 patch, the basic infrastructure required by these functions is
available. I wrote a test class to cover some of them, but I am unsure how
'generic' they should be.
Here, I assumed the user knows the table in context and the return types he
gets from the Coprocessor impls, so the input/output types of these agg
operations will be the same as well. He therefore builds the agg function
classes with those 'types'. I think this is a rather skewed assumption and it
needs further clarification. What are the expectations for the 'end interface'?
I have attached the new/modified classes (two new, one modified):
a) ProcessResultsFromCP: to be implemented by the agg functions (can be part of
the Batch class).
b) TestAggFunctions: has the test case using the agg functions
c) HTable: one method to execute the aggregation functions.
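
To make the question about 'generic'-ness concrete, here is a minimal sketch of
how such a callback could be typed. It is not the code in 1512.zip: only the
name ProcessResultsFromCP comes from the attachment list above, while the
methods update() and result(), the classes SumLong and MaxLong, and the helper
aggregate() are hypothetical names chosen for illustration. It assumes each
region returns one partial value of type R and the client folds them together.

```java
import java.util.Arrays;
import java.util.List;

public class AggSketch {

    /** Hypothetical shape: R is the type each region's coprocessor call returns. */
    interface ProcessResultsFromCP<R> {
        void update(R partialFromRegion); // called once per region result
        R result();                       // final aggregated value
    }

    /** Sums per-region partial values, e.g. per-region row counts. */
    static final class SumLong implements ProcessResultsFromCP<Long> {
        private long total = 0L;

        @Override
        public void update(Long partialFromRegion) {
            if (partialFromRegion != null) {
                total += partialFromRegion;
            }
        }

        @Override
        public Long result() {
            return total;
        }
    }

    /** Keeps the largest per-region value; result() is null if no region answered. */
    static final class MaxLong implements ProcessResultsFromCP<Long> {
        private Long max = null;

        @Override
        public void update(Long partialFromRegion) {
            if (partialFromRegion != null && (max == null || partialFromRegion > max)) {
                max = partialFromRegion;
            }
        }

        @Override
        public Long result() {
            return max;
        }
    }

    /** Stand-in for the client-side step: fold the per-region results into one value. */
    static <R> R aggregate(List<R> perRegionResults, ProcessResultsFromCP<R> fn) {
        for (R partial : perRegionResults) {
            fn.update(partial);
        }
        return fn.result();
    }

    public static void main(String[] args) {
        // Made-up per-region row counts, as if three regions had answered.
        List<Long> perRegionCounts = Arrays.asList(120L, 75L, 305L);
        System.out.println("total rows = " + aggregate(perRegionCounts, new SumLong()));
        System.out.println("max region = " + aggregate(perRegionCounts, new MaxLong()));
    }
}
```

With a single type parameter the input and output types are forced to be the
same, which is exactly the assumption questioned above. A function such as
average would need a separate intermediate type (sum and count) and a different
final type.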
There is a high probability that I have twisted the desired feature entirely, so
please feel free to 'lambaste' the code and its underlying assumptions.
PS: I was thinking of making this jira a sub-item of jira 2469, but couldn't
come up with something worth mentioning.
> Coprocessors: Support aggregate functions
> -----------------------------------------
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Attachments: 1512.zip
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and
> other aggregating facility, facility generally where you want to calculate
> some meta info on your table, it seems like it wouldn't be too hard making a
> filter type that could run a function server-side and return the result ONLY
> of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server
> returns all data to client and count is done by client counting up row keys.
> A bunch of time and resources have been wasted returning data that we're not
> interested in. With this new filter type, the counting would be done
> server-side and then it would make up a new result that was the count only
> (kinda like mysql when you ask it to count, it returns a 'table' with a count
> column whose value is count of rows). We could have it so the count was
> just done per region and return that. Or we could maybe make a small change
> in scanner too so that it aggregated the per-region counts.