[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662054#comment-14662054 ] nicu marasoiu commented on HBASE-1512: -- Hi, Do you know if, related to this issue, or generally, is there a solution with HBase coprocessors for: 1. multiple metric columns e.g. group by (d1,..,dn) sum(c1) sum(c2) 2. custom metric columns e.g. group by (d1,..,dn) sum(c1) hyperlogUniq(c2) 3. sharing the components with map-reduce to run the same query for larger inputs Please advise, Nicu Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: Coprocessors Reporter: stack Assignee: Himanshu Vashishtha Fix For: 0.92.0 Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, addendum_1512.txt, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512-9.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027260#comment-13027260 ] Hudson commented on HBASE-1512: --- Integrated in HBase-TRUNK #1888 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1888/]) Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Assignee: Himanshu Vashishtha Fix For: 0.92.0 Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, addendum_1512.txt, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512-9.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024894#comment-13024894 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review536 --- I think its almost there. This patch won't compile (see below for why). I'd be game for applying the next version. This patch has come on a long way. Lets make new issues after applying it for issues found in it (This patch does include a nice set of tests). - Michael On 2011-04-23 16:39:37, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-23 16:39:37) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024893#comment-13024893 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review535 --- I think its almost there. This patch won't compile (see below for why). I'd be game for applying the next version. This patch has come on a long way. Lets make new issues after applying it for issues found in it (This patch does include a nice set of tests). /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment1116 I agree w/ the review that suggested we spell out 'agg' rather than use the abbreviation, especially in javadoc. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment1117 if should be 'where'. Should we throw an exception if multiple families supplied so users are not surprised when they don't get answers for multiple families? /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment1118 I'd say leave implementation details out of the public javadoc (the bit about calling private methods) /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment1119 Does Scan do this test? Internally? (I'm not sure) /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment1120 'should' or 'does'? I think you want to say the latter? /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment1121 Why this javadoc? Don't we inherit javadoc from the Interface? /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment1122 Whats this? We do nothing on serialization? Is that right? It could be. It just strikes me as a little odd. Maybe put a comment in here to say 'nothing to serialize'? /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment1123 Do we agree that AggregateCpProtocol was not a good name, that rather it should be AggregateProtocol since cp is in the package name? I see you have a AP later in this patch. Let me go look at it. I think I see whats going on... you didn't mean to include this in the patch? /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment1124 Otherwise, this Interface looks good. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment1125 Yeah, this class shouldn't be included either. - Michael On 2011-04-23 16:39:37, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-23 16:39:37) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes.
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024900#comment-13024900 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review537 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment1126 In my version, I have: public interface ColumnInterpreterT extends Serializable { There is no need to extend Writable. - Ted On 2011-04-23 16:39:37, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-23 16:39:37) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024906#comment-13024906 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- bq. On 2011-04-25 18:31:17, Ted Yu wrote: bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java, line 99 bq. https://reviews.apache.org/r/585/diff/3-6/?file=15641#file15641line99 bq. bq. In my version, I have: bq. public interface ColumnInterpreterT extends Serializable { bq. bq. There is no need to extend Writable. bq. bq. Michael Stack wrote: bq. Ok. Then we should remove that from the patch (Good one Ted) Whoops. Sorry, did you say Serializeable Ted as in java.io.serializable? We don't want j.i.s. Thats what Writable replaces. - Michael --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review537 --- On 2011-04-23 16:39:37, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-23 16:39:37) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024907#comment-13024907 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review540 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment1129 Yes, but it seems Writable is the standard way to go in Hadoop for these RPCs. No big issue either way. Since it doesn't have any state, there is nothing to serialize here. It seems we can have make this as static util class (?). - Himanshu On 2011-04-23 16:39:37, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-23 16:39:37) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024911#comment-13024911 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review538 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment1127 I exchanged emails with Himanshu about supporting multiple column families. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment1131 I prefer using Serializable for the interpreter which is stateless. It is supported by HbaseObjectWritable. - Ted On 2011-04-23 16:39:37, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-23 16:39:37) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024917#comment-13024917 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review542 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment1132 ok. now it throws an exception when 1 families are defined. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment1133 removed the javadoc related to private method calls /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment1134 I didn't want it to propagate to the server just to return an exception. I thought that aggregate functions should work on distinct set of rows, ie, startRow endRow should always be true (except when they are empty). There is a check in HTable- getStartKeysInRange() that throws an exception when startRow endRow, but I needed the boundary condition too. Please let me know if this condition we should remove this condition. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment1135 Yes, its not there in later versions. - Himanshu On 2011-04-23 16:39:37, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-23 16:39:37) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024919#comment-13024919 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- bq. On 2011-04-25 18:58:37, Ted Yu wrote: bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java, line 99 bq. https://reviews.apache.org/r/585/diff/3-6/?file=15641#file15641line99 bq. bq. I prefer using Serializable for the interpreter which is stateless. bq. bq. It is supported by HbaseObjectWritable. bq. My personal opinion is it will not go well with others if one uses Serializable. It is supported in HBaseObjectWritable only for legacy reasons I believe. - Himanshu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review538 --- On 2011-04-23 16:39:37, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-23 16:39:37) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024926#comment-13024926 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review546 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment1140 There are 10 classes which implement Serializable under src/main/java/org/apache/hadoop/hbase/rest/model/ And: public class PairT1, T2 implements Serializable ./src/main/java/org/apache/hadoop/hbase/util/Pair.java public static class CustomSerializable implements Serializable { ./src/test/java/org/apache/hadoop/hbase/io/TestHbaseObjectWritable.java - Ted On 2011-04-23 16:39:37, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-23 16:39:37) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024932#comment-13024932 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- bq. On 2011-04-25 19:44:07, Ted Yu wrote: bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java, line 99 bq. https://reviews.apache.org/r/585/diff/3-6/?file=15641#file15641line99 bq. bq. There are 10 classes which implement Serializable under src/main/java/org/apache/hadoop/hbase/rest/model/ bq. bq. And: bq. public class PairT1, T2 implements Serializable bq. ./src/main/java/org/apache/hadoop/hbase/util/Pair.java bq. public static class CustomSerializable implements Serializable { bq. ./src/test/java/org/apache/hadoop/hbase/io/TestHbaseObjectWritable.java My guess is that for REST, its because of the REST engine we use. For Pair, we should probably change it to be Writable. Same for CustomSerializable. Otherwise, Ted, if you +1 Himanshu's patch, I'll commit it. - Michael --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review546 --- On 2011-04-25 19:53:33, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-25 19:53:33) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq./src/main/java/org/apache/hadoop/hbase/client/CursorCallable.java PRE-CREATION bq./src/main/java/org/apache/hadoop/hbase/client/CursorCp.java PRE-CREATION bq./src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 1088894 bq./src/main/java/org/apache/hadoop/hbase/client/HTable.java 1088894 bq./src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1088894 bq./src/main/java/org/apache/hadoop/hbase/client/Scan.java 1088894 bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024933#comment-13024933 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/ --- (Updated 2011-04-25 19:53:33.129920) Review request for hbase and Gary Helmling. Changes --- Changes as per Stack's review. Major changes include: a) LongColumnInterpreter still implements Writable (though with empty read/write methods). b) Exception is thrown in case of more than one family is defined. Summary --- This patch provides reference implementation for aggregate function support through Coprocessor framework. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html Himanshu Vashishtha started the work. I provided some review comments and some of the code. This addresses bug HBASE-1512. https://issues.apache.org/jira/browse/HBASE-1512 Diffs (updated) - /src/main/java/org/apache/hadoop/hbase/client/CursorCallable.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/client/CursorCp.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 1088894 /src/main/java/org/apache/hadoop/hbase/client/HTable.java 1088894 /src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1088894 /src/main/java/org/apache/hadoop/hbase/client/Scan.java 1088894 /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION Diff: https://reviews.apache.org/r/585/diff Testing --- TestAggFunctions passes. Thanks, Ted Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024934#comment-13024934 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review548 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment1145 I am fine with using Writable. - Ted On 2011-04-25 19:53:33, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-25 19:53:33) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq./src/main/java/org/apache/hadoop/hbase/client/CursorCallable.java PRE-CREATION bq./src/main/java/org/apache/hadoop/hbase/client/CursorCp.java PRE-CREATION bq./src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 1088894 bq./src/main/java/org/apache/hadoop/hbase/client/HTable.java 1088894 bq./src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1088894 bq./src/main/java/org/apache/hadoop/hbase/client/Scan.java 1088894 bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024940#comment-13024940 ] stack commented on HBASE-1512: -- @Himanshu I think you uploaded the wrong patch for v8. Its all about CursorCallable... Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023573#comment-13023573 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/ --- (Updated 2011-04-23 16:39:37.773505) Review request for hbase and Gary Helmling. Changes --- import statement of TestAggregateProtocol is removed in LongColumnInterpreter Summary --- This patch provides reference implementation for aggregate function support through Coprocessor framework. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html Himanshu Vashishtha started the work. I provided some review comments and some of the code. This addresses bug HBASE-1512. https://issues.apache.org/jira/browse/HBASE-1512 Diffs (updated) - /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION Diff: https://reviews.apache.org/r/585/diff Testing --- TestAggFunctions passes. Thanks, Ted Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022607#comment-13022607 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- bq. On 2011-04-15 21:16:05, Ted Yu wrote: bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java, line 84 bq. https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line84 bq. bq. One family is fine for the moment. FYI, more families can easily be added as client is passing the Scan object as such. It boils down to fetching the value of a cell from the columndescripter's getValue which takes family, qualifier and kv value; And then grouping it with other family values, but again I believe it is not the purpose of this jira == aggregating over multiple families. Please correct me if I am wrong. - himanshu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review490 --- On 2011-04-18 17:14:30, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-18 17:14:30) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021101#comment-13021101 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review493 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment1002 done /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment1003 done /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment1004 done /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment1005 done /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment1006 ok, using only Bytes.toLong now. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment1007 done /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment1008 done /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment1009 removed the repeatition in the doc. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment1010 Column interpreter is more genericised now. It supports HBase cell data type and its promoted data type. For doing these computations, we will use this promoted data type. So, in case a cell value is int, we will be using a long type while computing sum to handle overflow. In case of finding max and min, we will still use the cell data type. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment1011 coprocessor implementation returns a over all sum and row count, so no need to use a double/float in the return type. It is used at the Client side (AggregationClient). /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment1012 done /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment1013 added class java doc. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment1014 refactored it /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment1015 setting start/end rows does it. So, no need of checking now. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment1016 done /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment1017 now, current version returns a PairListS, Long where the list contains the sum and sum of squares and Long is the row count. I can have a more specific object, but it seems it has to be added in the rpc stack (implementing Writable). Please comment if that is _the_ right way. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java https://reviews.apache.org/r/585/#comment1001 You mean to rename the class or just the javadoc (sorry I missed this review statement initially) /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java https://reviews.apache.org/r/585/#comment1018 This class is more genericised now. It defines two parameters type T,Sbq. , where T is the cell value type and S is the promoted data type. S is used for doing arithmetic computations, T is used for finding min, max operation. - himanshu On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021121#comment-13021121 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/ --- (Updated 2011-04-18 17:14:30.424344) Review request for hbase and Gary Helmling. Changes --- Latest update from Himanshu. Summary --- This patch provides reference implementation for aggregate function support through Coprocessor framework. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html Himanshu Vashishtha started the work. I provided some review comments and some of the code. This addresses bug HBASE-1512. https://issues.apache.org/jira/browse/HBASE-1512 Diffs (updated) - /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION Diff: https://reviews.apache.org/r/585/diff Testing --- TestAggFunctions passes. Thanks, Ted Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020202#comment-13020202 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review469 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment914 sure /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment913 As per the current usage, it is one instance per thread. This method is called from the concrete coprocessor implementation deployed at region level. Though this instance is a singleton, but its a stateless, hence threadsafe. I can change it to AtomicLong if you say so. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment919 As this class implements Writable, it is handled by HBaseObjectWritable such that it writes its full class name onto the stream (and goes while reading it at server side). Since this is a stateless, I don't have anything to read write as such. No need to call super. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment922 Ok. Will do all the formatting changes. - himanshu On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020203#comment-13020203 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- bq. On 2011-04-15 06:10:48, himanshu vashishtha wrote: bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java, line 66 bq. https://reviews.apache.org/r/585/diff/4/?file=15695#file15695line66 bq. bq. As per the current usage, it is one instance per thread. This method is called from the concrete coprocessor implementation deployed at region level. Though this instance is a singleton, but its a stateless, hence threadsafe. bq. I can change it to AtomicLong if you say so. just to clarify, I meant the CP instance is a singleton (pardon my not so good English). - himanshu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review469 --- On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020209#comment-13020209 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review468 --- A few comments in the below. See what you think. This is close to commit I'd say. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment916 I'd say change the name of this class to AggregateProtocol. Leave off the Cp' since its in the package name already. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment911 'Gives' rather than 'It gives'. Are you repeating yourself i the javadoc here? /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment912 Good /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment917 Call this class AggregateImplementation? It'll implement AggregateProtocol. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment918 Class comment explaining what this class does? /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment923 Why this? When we just made an empty one? And whats the '//' on end of the line. Oh, you did this each time through loop so you only work on one return at a time /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment924 FYI there is an 'equals' in Bytes so you don't have to do compareTo...0 /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment929 hash code is what? Can you print out encodedName? Thats better for identifying regions. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment930 Its nice that this all genericized. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment939 This three part test is used in all methods? Might be big enough to move out to a method (Not important) /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java https://reviews.apache.org/r/585/#comment940 Missing period. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java https://reviews.apache.org/r/585/#comment941 Missing period. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java https://reviews.apache.org/r/585/#comment942 Should you say ColumnInterpreter for AggregateProtocol? /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java https://reviews.apache.org/r/585/#comment943 You should call it TestAggregateProtocol or TestAggregateCoprocessor... it should be name of class under test with a Test prefix. - Michael On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq.
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020219#comment-13020219 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review470 --- Looks like this is coming along nicely. Some overall comments: * A fair amount of naming should be cleaned up. For the client facing methods in AggregationClient, I would go for the simplest names: max() instead of getMaximum(), min() instead of getMinimum(), etc. But that is a matter of personal preference. * Think about providing simpler overloaded versions of the methods. Seven parameters is a lot if you don't always care about all of them. * Look more closely at the parameterization of some of the methods. I'm not sure it's sufficient for getSum(), getAvg(), getStd(), where the return types may differ from the actual column value types. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment920 Don't abbreviate in the javadoc comments. This is part of the end user documentation so you need to spell it all out: agg - aggregation RS - region server cp impls - name the actual coprocessor /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment921 agg - aggregation /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment925 Remove trailing whitespace /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment931 Overload this with some briefer versions? This is a real mouthful if you don't actually need all 7 parameters. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment932 Again add some overloaded simpler versions of this. Do you always need a filter? What about column qualifier? Maybe in most cases you do, just seeing if simplify usage in some cases. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment927 What is a row num? Is this the number of rows? How about using row count instead? It's more consistent with other HBase tools. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment928 name getRowCount() instead? /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment934 Remove trailing whitespace /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment935 Maybe a bit more description of the actual usage here. The client needs to pass an instance of this in AggregationClient methods right? Javadoc should make clear it's purpose and how to use it. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment936 Drop the Cp from the name, it's extraneous. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment937 Is it correct that sum should always return the same type as the individual values? If the values are Integers, you would want to return Long, right? Otherwise you risk overflowing the max value. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment938 Is the type correlation correct here as well? Individual values may be Integers, but you may want a double for the average. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment944 Same as with getAvg(), wouldn't you want this to possibly return a different type than the individual column values? Like return a double even if the column values are ints? /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment945 I don't think you need the word Protocol in here. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment946 Can you just set start and end rows on the scanner instead of checking each row? /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment948 If there are no results from the scanner would this return the value from
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020276#comment-13020276 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review474 --- /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment958 I think returning ci.getMinValue() is fine because AggregationClient would further consolidate partial results. If change is really needed, it should be made in AggregationClient. - Ted On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020315#comment-13020315 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- bq. On 2011-04-15 11:50:08, Ted Yu wrote: bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java, line 73 bq. https://reviews.apache.org/r/585/diff/4/?file=15697#file15697line73 bq. bq. I think returning ci.getMinValue() is fine because AggregationClient would further consolidate partial results. bq. If change is really needed, it should be made in AggregationClient. I don't agree. This leaves no way to distinguish between a valid result of Long.MIN_VALUE and _no_ result. What happens for an empty table? I think returning Long.MIN_VALUE (or whatever might be the case) for an empty table is broken. - Gary --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review474 --- On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020377#comment-13020377 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review476 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment960 w.r.t. Gary's comment, we need another boolean flag in MaxCallBack so that we can distinguish whether MaxCallBack.update() has been called. Currently ci.getMinValue() would be returned if there is no qualifying row (possibly due to the effect of Filter). - Ted On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020381#comment-13020381 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- bq. On 2011-04-15 18:02:26, Ted Yu wrote: bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java, line 98 bq. https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line98 bq. bq. w.r.t. Gary's comment, we need another boolean flag in MaxCallBack so that we can distinguish whether MaxCallBack.update() has been called. bq. Currently ci.getMinValue() would be returned if there is no qualifying row (possibly due to the effect of Filter). MaxCallBack.update() will still be called once per region, even if no rows matched. It will just return the initial value that was set. This is why I think the initial value should be null. So when update() is called with null, it can be handled appropriately. In the same vein, I think max in MaxCallBack should initialize to null as well. - Gary --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review476 --- On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020384#comment-13020384 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review478 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment962 The following code would produce NPE: Long l = null; if (l Long.MIN_VALUE) { - Ted On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020385#comment-13020385 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- bq. On 2011-04-15 18:18:57, Ted Yu wrote: bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java, line 98 bq. https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line98 bq. bq. The following code would produce NPE: bq. Long l = null; bq. if (l Long.MIN_VALUE) { bq. Yes, all this code needs to handle nulls. I think that goes without saying. - Gary --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review478 --- On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020395#comment-13020395 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review481 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment965 Of these, I think only two are optional. colQualifier filer. OK? /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment966 Agreed it should be initialized null to handle null resultset. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment967 same as the max one above. Yes, in case of a null qualifier, it computes the value for the overall family /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment968 ok /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment969 ok /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment970 Yes, more description should be added. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment971 hmm. the return type can be different. I will make it more generic to have a different return type. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment972 I thought about it. I return a list but I see its not a right one to pass as one element contains the sum and other contains the rowCount. So, it should be like a Pair as you said. I will look into it. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment973 its in the Interface? Shall it be repeated here too? Ok, will do the name refactoring. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment974 ok, will use the equals method. I thought since it is an internal scanner (local to a region), it should not cross out the boundaries while setting start-end rows. Will check it (should also improve performance). /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment975 right. a null is more pertinent here. will do it. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment976 yes, the current one does return min value. But as you said, it will return null now. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment977 I thought about it and then just left it as its only three line of code and a separate call will be kind of over-refactoring. But once I set the start-end row as suggested by Gary, it should become more light. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment978 yes indeed. It occurred to me while I saw Stack's review last night and here you are :). I will do it. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment979 ok. And what if I need to send more than 2 parameters as in case of Standard deviation? /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java https://reviews.apache.org/r/585/#comment980 you mean a pojo with these many attributes. Is there exists such an object that i can reuse (should be rpc compatible-- like implementing writable). /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java https://reviews.apache.org/r/585/#comment981 So yes, will do all the name/space/grammar refactorings as suggested. Thanks a lot to all of you folks for this wonderful review. - himanshu On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020399#comment-13020399 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review483 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment984 I think the startKey and endKey can be optional as well. Basically that means scanning the whole region. - Ted On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020403#comment-13020403 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- bq. On 2011-04-15 19:06:58, Ted Yu wrote: bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java, line 84 bq. https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line84 bq. bq. I think the startKey and endKey can be optional as well. bq. Basically that means scanning the whole region. bq. bq. himanshu vashishtha wrote: bq. These start-end keys are used to locate the interested regions. Do you mean whole _table_? If so, it will be like setting HConstants.START_ROW/STOP_ROW which are essentially empty byte arrays. This would be a bigger change, but maybe it would make sense to have the client pass a Scan object? Then you could specify start/end row, time range, multiple column qualifiers, filter? It's starting to look like we're duplicating most of these arguments when there's already a good way of passing them. What do you think? - Gary --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review483 --- On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020405#comment-13020405 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- bq. On 2011-04-15 19:06:58, Ted Yu wrote: bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java, line 84 bq. https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line84 bq. bq. I think the startKey and endKey can be optional as well. bq. Basically that means scanning the whole region. bq. bq. himanshu vashishtha wrote: bq. These start-end keys are used to locate the interested regions. Do you mean whole _table_? If so, it will be like setting HConstants.START_ROW/STOP_ROW which are essentially empty byte arrays. bq. bq. Gary Helmling wrote: bq. This would be a bigger change, but maybe it would make sense to have the client pass a Scan object? Then you could specify start/end row, time range, multiple column qualifiers, filter? bq. bq. It's starting to look like we're duplicating most of these arguments when there's already a good way of passing them. What do you think? Yes, am wondering why it didn't occur to me before! As a matter of fact, we are creating a Scan object at region level. So, with passing the Scan object to the Aggregation client, it will call the appropriate HTable method (the existing one), but the CP's method will take the Scan object as a parameter, and let the client have its liberty. But it needs some code changes, like in validation stuff for one. (I was thinking that it was good to go and now there is so much room for improvement. Good stuff). - himanshu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review483 --- On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020414#comment-13020414 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- bq. On 2011-04-15 19:06:58, Ted Yu wrote: bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java, line 84 bq. https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line84 bq. bq. I think the startKey and endKey can be optional as well. bq. Basically that means scanning the whole region. bq. bq. himanshu vashishtha wrote: bq. These start-end keys are used to locate the interested regions. Do you mean whole _table_? If so, it will be like setting HConstants.START_ROW/STOP_ROW which are essentially empty byte arrays. bq. bq. Gary Helmling wrote: bq. This would be a bigger change, but maybe it would make sense to have the client pass a Scan object? Then you could specify start/end row, time range, multiple column qualifiers, filter? bq. bq. It's starting to look like we're duplicating most of these arguments when there's already a good way of passing them. What do you think? bq. bq. himanshu vashishtha wrote: bq. Yes, am wondering why it didn't occur to me before! As a matter of fact, we are creating a Scan object at region level. So, with passing the Scan object to the Aggregation client, it will call the appropriate HTable method (the existing one), but the CP's method will take the Scan object as a parameter, and let the client have its liberty. But it needs some code changes, like in validation stuff for one. bq. (I was thinking that it was good to go and now there is so much room for improvement. Good stuff). In continuation of what I earlier said, in the current design we assume that client is interested in one family only. Shall this needs to be change too. I am refactoring these methods to let the client pass a Scan object to the AggregationClient class, but a scan object as such can have multi families in it. Shall we need to change this assumption. I don't see any issue with it as such, but this is something I didn't plan originally and it needs change in test cases. Please comment. - himanshu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review483 --- On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020418#comment-13020418 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review488 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment989 This is the first code review that evolves into a design session in my career - exciting. I think we should relax the initial assumption. - Ted On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020426#comment-13020426 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- bq. On 2011-04-15 20:21:01, Ted Yu wrote: bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java, line 84 bq. https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line84 bq. bq. This is the first code review that evolves into a design session in my career - exciting. bq. I think we should relax the initial assumption. I still think that I would go with one family, as the families are quite separate entities as such(HTable design wise), and I don't see any usage of doing aggregates on accumulated column families. If that is what is needed probably suggests some schema design rethinking. The point I raised was that the object we are now riding upon supports multiple families (which is very relevant for scanning a table), but we don't need it as per real usage. So, shall we support or not, this is the point of consideration. Moreover, as the requirements are evolving (and I guess they will continue to do so), it might change again. I am happy as long as it is moving in the right direction. - himanshu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review488 --- On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020431#comment-13020431 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review490 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment993 One family is fine for the moment. - Ted On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020506#comment-13020506 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- bq. On 2011-04-15 19:06:58, Ted Yu wrote: bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java, line 84 bq. https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line84 bq. bq. I think the startKey and endKey can be optional as well. bq. Basically that means scanning the whole region. bq. bq. himanshu vashishtha wrote: bq. These start-end keys are used to locate the interested regions. Do you mean whole _table_? If so, it will be like setting HConstants.START_ROW/STOP_ROW which are essentially empty byte arrays. bq. bq. Gary Helmling wrote: bq. This would be a bigger change, but maybe it would make sense to have the client pass a Scan object? Then you could specify start/end row, time range, multiple column qualifiers, filter? bq. bq. It's starting to look like we're duplicating most of these arguments when there's already a good way of passing them. What do you think? bq. bq. himanshu vashishtha wrote: bq. Yes, am wondering why it didn't occur to me before! As a matter of fact, we are creating a Scan object at region level. So, with passing the Scan object to the Aggregation client, it will call the appropriate HTable method (the existing one), but the CP's method will take the Scan object as a parameter, and let the client have its liberty. But it needs some code changes, like in validation stuff for one. bq. (I was thinking that it was good to go and now there is so much room for improvement. Good stuff). bq. bq. himanshu vashishtha wrote: bq. In continuation of what I earlier said, in the current design we assume that client is interested in one family only. Shall this needs to be change too. bq. I am refactoring these methods to let the client pass a Scan object to the AggregationClient class, but a scan object as such can have multi families in it. Shall we need to change this assumption. I don't see any issue with it as such, but this is something I didn't plan originally and it needs change in test cases. Please comment. I refactored a agg method 1512 as per today's review (using scan object plus others) and its working fine (test passes for the method that i change). May be I need to add more boundary conditions to test the scan object. I have some stuff for tonight/tomorrow, so will complete this by tomorrow night or by Sunday. I hope that should be ok(?) - himanshu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review483 --- On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020187#comment-13020187 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- bq. On 2011-04-13 08:35:42, Ted Yu wrote: bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java, line 143 bq. https://reviews.apache.org/r/585/diff/3/?file=15640#file15640line143 bq. bq. This is the type parameter for return value. ok - Michael --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review440 --- On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020188#comment-13020188 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- bq. On 2011-04-13 06:23:56, himanshu vashishtha wrote: bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java, line 81 bq. https://reviews.apache.org/r/585/diff/3/?file=15640#file15640line81 bq. bq. I use Eclipse formatter (which says it is using Apache's standard, and it is inserting these spaces. I tried to edit the setting to make it work, but couldn't find the way for these extra spaces between doc and arg list. I removed them manually, but want to know the standard approach. None of this kinda of white space is tolerated (well, I'm not too bad about it but others are watching the codebase and will complain loudly if they see these) - Michael --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review438 --- On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020190#comment-13020190 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review466 --- This is review of diff between v3 and v4. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment894 Good /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment895 Yeah, method doesn't take a T, it returns it /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java https://reviews.apache.org/r/585/#comment896 This is good. - Michael On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020198#comment-13020198 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review467 --- Here is more. Submitting now in case I lose it. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment897 You have double the class comment here. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment898 Use Bytes.SIZEOF_LONG instead of '8' /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment899 You could autobox and just return Bytes.toLong... I tihnk that'll work (not important). If you are going to use Long, you might use Long.value of because JVM can cache often used Long instances: http://download.oracle.com/javase/1.5.0/docs/api/java/lang/Long.html#valueOf(long) /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment900 Is there one instance of this class per thread? Will it be accessed concurrently? Should the base Long be an AtomicLong? Or we need to synchronize updates on the KV? /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment901 I'd name this multiply rather than 'mult' /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment902 You don't need to call through to the super for it to serialize the Writable? And again for the write? /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment903 Say 'Defines' rather than 'It defines' /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment904 Good /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment905 Again here say 'Gives' rather than 'It gives..' /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment906 Good doc. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment907 Whats a region level? Try and have your max doc similar to your min doc with min/max the only diff. Makes it easier on the reader. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment909 Why capital 'Q' on qualifier? /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment910 Return is T? /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java https://reviews.apache.org/r/585/#comment908 Why this empty line? - Michael On 2011-04-13 08:37:14, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-13 08:37:14) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq.
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019227#comment-13019227 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review438 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment855 ill effects of copy-paste. will change. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment854 Removed it. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment852 I use Eclipse formatter (which says it is using Apache's standard, and it is inserting these spaces. I tried to edit the setting to make it work, but couldn't find the way for these extra spaces between doc and arg list. I removed them manually, but want to know the standard approach. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment853 Yes, will do it. Thanks. - himanshu On 2011-04-12 04:41:49, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-12 04:41:49) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019248#comment-13019248 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review440 --- /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment863 This is the type parameter for return value. - Ted On 2011-04-12 04:41:49, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-12 04:41:49) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019250#comment-13019250 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/ --- (Updated 2011-04-13 08:37:14.182698) Review request for hbase and Gary Helmling. Changes --- Himanshu updated the patch according to Stack's suggestions. Summary --- This patch provides reference implementation for aggregate function support through Coprocessor framework. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html Himanshu Vashishtha started the work. I provided some review comments and some of the code. This addresses bug HBASE-1512. https://issues.apache.org/jira/browse/HBASE-1512 Diffs (updated) - /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION Diff: https://reviews.apache.org/r/585/diff Testing --- TestAggFunctions passes. Thanks, Ted Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019209#comment-13019209 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review436 --- I read half the patch. Will finish in morning. Comments below. This utility looks really great. Hurry up and finish it! /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment828 Its 2011! /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment829 There is xtra white space here and elsewhere in this block. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment830 should be 'handler' /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment831 Do you want to make this actual javadoc link; e.g. {@link Aggr} Is AggrationClient misspelled? /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment832 Is this comment still right? Says 8 byte long (Ted's blog seems to indicate this is not longer the case) /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment833 Nice javadoc. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment834 Why this constructor? We'll have a null conf? Will that be dangerous later? NPEs? /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment835 White space /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment836 Looks like this comment is no longer true? The method has been genericized? /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment837 Should you reuse the passed configuration else you are making a new COnnection per invocation. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment838 Whats this? The return? /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment839 Reuse passed conf? /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment840 Whats this? Xtra white space. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment841 Reuse conf creating HTable. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment842 Whats this? This prob. is in all subsequent methods... the xtra white space too. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java https://reviews.apache.org/r/585/#comment843 This needs to be passed the conf. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java https://reviews.apache.org/r/585/#comment844 2011 - Michael On 2011-04-12 04:41:49, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-12 04:41:49) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq.
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018677#comment-13018677 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/ --- Review request for hbase and Gary Helmling. Summary --- This patch provides reference implementation for aggregate function support through Coprocessor framework. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html Himanshu Vashishtha started the work. I provided some review comments and some of the code. This addresses bug HBASE-1512. https://issues.apache.org/jira/browse/HBASE-1512 Diffs - /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION Diff: https://reviews.apache.org/r/585/diff Testing --- TestAggFunctions passes. Thanks, Ted Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018686#comment-13018686 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/#review429 --- /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java https://reviews.apache.org/r/585/#comment817 While working on a different jira, I saw that I am using wrong (old) key for registering the CP. It was working because in the code that follows this, Agg CP is loaded explicitly (line #102-106). One can update this either using the Region CP specific key: CoprocessorHost.REGION_COPROCESSOR_CONF_KEY, and remove the explicit loading below (and remove the explicit loading code below), OR entirely delete this statement. - himanshu On 2011-04-12 02:28:10, Ted Yu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/585/ bq. --- bq. bq. (Updated 2011-04-12 02:28:10) bq. bq. bq. Review request for hbase and Gary Helmling. bq. bq. bq. Summary bq. --- bq. bq. This patch provides reference implementation for aggregate function support through Coprocessor framework. bq. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. bq. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html bq. bq. Himanshu Vashishtha started the work. I provided some review comments and some of the code. bq. bq. bq. This addresses bug HBASE-1512. bq. https://issues.apache.org/jira/browse/HBASE-1512 bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION bq. /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/585/diff bq. bq. bq. Testing bq. --- bq. bq. TestAggFunctions passes. bq. bq. bq. Thanks, bq. bq. Ted bq. bq. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018703#comment-13018703 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/ --- (Updated 2011-04-12 04:28:32.453024) Review request for hbase and Gary Helmling. Changes --- Switched to CoprocessorHost.REGION_COPROCESSOR_CONF_KEY and removed the manual loading of CPs. Summary --- This patch provides reference implementation for aggregate function support through Coprocessor framework. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html Himanshu Vashishtha started the work. I provided some review comments and some of the code. This addresses bug HBASE-1512. https://issues.apache.org/jira/browse/HBASE-1512 Diffs (updated) - /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION Diff: https://reviews.apache.org/r/585/diff Testing --- TestAggFunctions passes. Thanks, Ted Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018709#comment-13018709 ] jirapos...@reviews.apache.org commented on HBASE-1512: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/585/ --- (Updated 2011-04-12 04:41:49.068986) Review request for hbase and Gary Helmling. Changes --- Switched to CoprocessorHost.REGION_COPROCESSOR_CONF_KEY and removed the manual loading of CPs. Summary --- This patch provides reference implementation for aggregate function support through Coprocessor framework. ColumnInterpreter interface allows client to specify how the value's byte array is interpreted. Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html Himanshu Vashishtha started the work. I provided some review comments and some of the code. This addresses bug HBASE-1512. https://issues.apache.org/jira/browse/HBASE-1512 Diffs (updated) - /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION Diff: https://reviews.apache.org/r/585/diff Testing --- TestAggFunctions passes. Thanks, Ted Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016496#comment-13016496 ] Himanshu Vashishtha commented on HBASE-1512: Thanks for review Ted. What I think in the divide method is returning Double.NaN if either of operand is null. Any operation on null should give null. Ok for the name refactoring. I don't have any strong feeling for making a separate class out if it at this point of time, as it doesn't add much on its own. But will do it if you say so. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016519#comment-13016519 ] Ted Yu commented on HBASE-1512: --- I think returning Double.NaN is fine. Normally either of operand being null would lead to NPE. For making a separate class, it would be easier for users to produce other ColumnInterpreter classes based on LongColumnInterpreter. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016585#comment-13016585 ] Ted Yu commented on HBASE-1512: --- Version 4 is awesome. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014733#comment-13014733 ] Ted Yu commented on HBASE-1512: --- In AggregateProtocolImpl, I think the boolean done should be renamed. It actually indicates whether more rows exist after the current one. The following loop condition may confuse someone: {code} } while (done); {code} Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014908#comment-13014908 ] Himanshu Vashishtha commented on HBASE-1512: Thanks for the suggestions Ted. a) Added generics functionality to the AggregationClient. As suggested by Ted, there should be a ColumnInterpreter thing to give the client a chance to describe the cell value type. I made this thing generic, in the sense that now client is supposed to give the column interpreter object along with the agg function calls. AggregationClient has such a implementation where client says that its cell value is a long. Other cell values can be used with a similar approach. b) While client can define the cell value type by implementing ColumnInterpreter,I still think the average and Standard deviation will be a double value. So, I added a wrapper on these methods to support the generic functionality. Please refer to AggreagationClient.getStdParams getAvgParams. Let me know if it is un-intuitive. I think it is right though :) c) Added a filter to each of the agg functions. They are just passed along with the call, and are stuffed in the Scan object at the region level during scanning. In case of row count, if client provides a filter, that one will be used. If neither a filter nor a qualifier is provided, FirstKeyValueFilter is used. d) Added more test cases for testing filter use cases (44 in total :)). e) refactored the done variable as suggested by Ted. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014942#comment-13014942 ] Ted Yu commented on HBASE-1512: --- For LongColumnInterpreter.divide(), if l2 is null, I think we should return Double.NaN I would write: {code} if (l2 == null) return Double.NaN; if (l1 == null) return 0; {code} I think the following method can be named getAvgArgs (argument in place of parameter): {code} private R ListR getAvgParams(final byte[] tableName, {code} But I don't have strong opinion here. getAvgParamsAsArray() of AvgCallBack can be named getAvgParams() because its return type is List. Overall, version 3 is great. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014947#comment-13014947 ] Ted Yu commented on HBASE-1512: --- Also, I think it is time to move LongColumnInterpreter out into its own file under src/main/java/org/apache/hadoop/hbase/client/coprocessor/. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014091#comment-13014091 ] Ted Yu commented on HBASE-1512: --- Pardon me for attaching source files directly. svn diff doesn't recognize the changes I made on top of patch-1512-2.txt Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, patch-1512-2.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013745#comment-13013745 ] Ted Yu commented on HBASE-1512: --- This feature is very useful. Is it possible to pass some class to AggregateProtocolImpl which can interpret the type of value based on colFamily:colQualifier ? I tried adding type parameter (for type of value) to AggregateCpProtocol but encountered various compilation errors. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013820#comment-13013820 ] Ted Yu commented on HBASE-1512: --- I think AggregationClient should have a ctor which accepts Configuration and saves it. Then Configuration can be used to point to a table in remote cluster: {code} HTable table = new HTable(conf, tableName); {code} Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013830#comment-13013830 ] Ted Yu commented on HBASE-1512: --- A 4 byte value can represent float. 8 byte value can represent double. As for the return type, Long, I tried to make AggregateCpProtocol generic but wasn't successful. e.g. AggregateCpProtocolLong.class wouldn't compile. Since AggregateCpProtocol is interface, I cannot instantiate and obtain class afterward. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011750#comment-13011750 ] Himanshu Vashishtha commented on HBASE-1512: Stack: Thanks for the review. I have revamped the patch and also incorporated your suggestions. There were bunch of discrepancies regarding the boundary conditions you mentioned in the previous version, where at the region level there was no knowledge of the exact start/stop rows as given by the user. To achieve this, I modified the agg functions signatures to include start/stop rows at the region level. Following are some key aspects for this version: a) startEow endRow is an essential condition now (other than when one is doing a full table scan, where startRow and endRow both are empty byte array). This helps in handling boundary conditions where the start row provided by the user is start row of a region (the default scanner impl returns null because it is a non-get query). Moreover, it is also aligned with the logic of these functions, where one is finding max, min, row count etc. b) For all computations like avg, sum, max etc, it is assumed the cell value is a long value (8 bytes); if this is not the case, that cell value is skipped from the computation c) For all functions, column family is essential (if it is null, an ioe is returned). For max, min, avg, sum,std, when no column qualifier is provided, I aggregate all the values in that family. So, a sum for such a case is group sum of all CQ's values for one row key. I think it is a right approach. Please advice here. d) Now in case of rowcount, one can use FirstKeyValueFilter for optimisation. But it may give wrong result in case user has also provided a column qualifier. In such a case, the first value returned by the scanner might belong to other qualifier, but the FirstKeyValueFilter will set its flag to skip to next row, but that value is filtered out from the result set. Its overall effect is that row is not counted and scanner moves to the next row. I used this only when there is no column qualifier. ( I confirmed this during my testing, but will be good to have some comments here). d) As suggested, I have added bunch of boundary test cases for each of the six agg functions. Please let me know in case some more are to be added. e) Yes, its the client (here AggregationtClient), that will perform the reduce phase, where individual results from all the target regions are received and accumulated. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927634#action_12927634 ] Mingjie Lai commented on HBASE-1512: Himanshu. The patch looks good. But it doesn't provide the whole picture of the solution. There are still some important questions unanswered for this feature: 1) what's the interface provided to end users? HTableInterface.sum(...), HTableInterface.min/max()? Do we need shell support? 2) how to implement the interface? (by utilizing coprocessor) 3) how to make sure the coprocessor loaded properly if the feature is available. You patch addresses part of (2). And it only provides max() and countRow() implementation. IMO I don't think ProcessResultsFromCP is necessary. It doesn't really provide any convenience for developers to reduce development effort. Thanks. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Reporter: stack Attachments: 1512.zip Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927706#action_12927706 ] Gary Helmling commented on HBASE-1512: -- Thanks for the patch Himanshu! For the scope of the functionality and what sort of aggregation functions you might cover, you might want to start with a comparison of common SQL functions (ex. http://dev.mysql.com/doc/refman/5.5/en/group-by-functions.html). I don't know if you really need to implement all of them, but a good start would probably be: * COUNT * AVG * MIN * MAX * STD * SUM (just my opinion of course). All of these would need some form of server side function, and in some cases the client/server coordination might be a little tricky when you have to span regions. The client side interface for these also has it's own needs. Does it make sense to be able to combine different client side aggregation functions with unmatched server side functions? Would you want to take a client side minimum of the per-region maximum values returned from the row range? As far as I can see, you would mostly want a single client function paired with a given server-side method. I do see that the raw HTable.coprocessorExec() interface is a bit clumsy for these types of operations. You really want to be able to return a single value, not a value per region. But I think you can create some client helper methods to hide that complexity. For the current client classes ProcessResultsFromCP seems to have a lot of overlap with Batch.Callback. The main difference being that HTable.processResultsFromCP() allows you to pass a list of instances (as opposed to a single Batch.Callback). If using a single Callback instance is limiting, we could allow use of a list of Callbacks, or provide a Batch.callbackList() factory method that allows chaining multiple instances together. But for the common cases here, it seems like you'll want a single client side function (min, max, etc) paired with a single server-side invocation (min, max, etc.), so the current Batch.Callback would probably suffice. So as an example on the client side, you could provide a client wrapper in the form: {{{ public class Aggregations { private static class ClientSum implements Batch.CallbackLong { private long sum; public void update(byte[] region, byte[] row, Long value) { sum += value; } public long getValue() { return sum; } } public static long sum(HTable table, byte[] start, byte[] end, byte[] family, byte[] col) { ClientSum sum = new ClientSum(); table.coprocessorExec(AggFunctionProtocol.class, start, end, new Batch.CallAggFunctionProtocol,Long() { public Long call(AggFunctionProtocol instance) { return instance.sum(family, col); } }, sum); return sum.getValue(); } }}} And so on for the other types of operations... Then clients can just call Aggregations.sum() with the right args. There may be better ways to do it, this is just an illustration. :) And, please, if you see ways that HTable.coprocessorExec() can be improved to make this easier, comment on HBASE-2002! Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Reporter: stack Attachments: 1512.zip Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12926831#action_12926831 ] Himanshu Vashishtha commented on HBASE-1512: With the 2001 patch, the basic infrastructure required by these functions is available. I wrote a test class to cover some of these, but am confused about their degree of 'generic'-ness. Here, I assumed the user is aware of the table in context and the return types he is getting from the Coprocessor impls, and so the input/output types of these agg operations will also be the same. Therefore he builds agg function classes with those 'types'. I think it is kind of skewed assumption and seeks further clarification. What are the expectations from the 'end interface'? I have attached the new/modified classes (2/1). a) ProcessResultsFromCP: to be implemented by the agg functions (can be part of the Batch class). b) TestAggFunctions: has the test case using the agg functions c) HTable: one method to execute the aggregation functions. There is high probability that I have twisted the desired feature entirely, so please feel free to 'lambaste' the code and its underlying assumptions. PS: I was thinking to make this jira a sub item for jira 2469, but couldn't come up with some thing worth mentioning. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Reporter: stack Attachments: 1512.zip Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.