[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2015-08-07 Thread nicu marasoiu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662054#comment-14662054
 ] 

nicu marasoiu commented on HBASE-1512:
--

Hi,

Is there a solution based on HBase coprocessors, either related to this issue 
or in general, for the following?
1. multiple metric columns, e.g. group by (d1,..,dn) sum(c1) sum(c2)
2. custom metric columns, e.g. group by (d1,..,dn) sum(c1) hyperlogUniq(c2)
3. sharing the components with map-reduce, so the same query can run on larger 
inputs

Please advise,
Nicu

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: Coprocessors
Reporter: stack
Assignee: Himanshu Vashishtha
 Fix For: 0.92.0

 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 addendum_1512.txt, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, 
 patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, 
 patch-1512-9.txt, patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facilities (generally, any facility where you want to 
 calculate some meta info on your table), it seems like it wouldn't be too hard 
 to make a filter type that could run a function server-side and return ONLY 
 the result of the aggregation or whatever.
 For example, say you just want to count rows. Currently you scan, the server 
 returns all data to the client, and the count is done by the client counting 
 up row keys.  A bunch of time and resources are wasted returning data that 
 we're not interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is the count of rows).  We could have it so the count was 
 just done per region and returned.  Or we could maybe make a small change 
 in the scanner too so that it aggregated the per-region counts.
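
 As a rough illustration of the per-region counting idea described above, here 
 is a minimal sketch. It is not code from any HBASE-1512 patch: the names 
 RegionSketch, countRows and totalRows are made up for this example, and 
 regions are simulated with in-memory lists instead of real HBase calls. Each 
 "region" returns only a number, and the "client" adds the numbers up.

```java
import java.util.List;

// Hypothetical stand-in for a region: it only holds the row keys it serves.
final class RegionSketch {
    private final List<String> rowKeys;

    RegionSketch(List<String> rowKeys) {
        this.rowKeys = rowKeys;
    }

    // "Server-side" step: count locally and return just the number,
    // so no row data has to travel back to the caller.
    long countRows() {
        return rowKeys.size();
    }
}

final class RowCountSketch {
    // "Client-side" step: combine one small partial result per region.
    static long totalRows(List<RegionSketch> regions) {
        long total = 0;
        for (RegionSketch region : regions) {
            total += region.countRows();
        }
        return total;
    }

    public static void main(String[] args) {
        List<RegionSketch> regions = List.of(
            new RegionSketch(List.of("a", "b", "c")),
            new RegionSketch(List.of("m", "n")));
        System.out.println(totalRows(regions)); // prints 5
    }
}
```

 The same split (a partial result per region, a cheap merge on the caller) 
 would also apply to sums, minimums and maximums, since those can be combined 
 from partial results.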



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027260#comment-13027260
 ] 

Hudson commented on HBASE-1512:
---

Integrated in HBase-TRUNK #1888 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1888/])



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024894#comment-13024894
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review536
---


I think it's almost there.  This patch won't compile (see below for why).  I'd 
be game for applying the next version.  This patch has come a long way.  
Let's open new issues, after applying it, for any problems found in it (this 
patch does include a nice set of tests).

- Michael


On 2011-04-23 16:39:37, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-23 16:39:37)
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte array is interpreted.
bq.  Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and some of the code.
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  Diffs
bq.  -
bq.  
bq.  /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION
bq.  /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION
bq.  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION
bq.  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION
bq.  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION
bq.  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  Thanks,
bq.  
bq.  Ted
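
The quoted summary says the interpreter lets the client decide how a value's 
byte array is read. A minimal sketch of that idea follows. It is an assumption 
for illustration only: ValueInterpreterSketch, LongInterpreterSketch and 
SumSketch are hypothetical names, the two-method interface is far smaller than 
whatever the patch defines, and values are assumed to be 8-byte big-endian 
longs.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical, simplified interface; the one in the patch may differ.
interface ValueInterpreterSketch<T> {
    T decode(byte[] raw);    // how to read one stored value
    T add(T left, T right);  // how to combine two decoded values
}

// Interprets each value as an 8-byte big-endian long (an assumption).
final class LongInterpreterSketch implements ValueInterpreterSketch<Long> {
    @Override
    public Long decode(byte[] raw) {
        return ByteBuffer.wrap(raw).getLong();
    }

    @Override
    public Long add(Long left, Long right) {
        return left + right;
    }
}

final class SumSketch {
    // The aggregation code never looks at the bytes itself; it delegates
    // decoding and addition to the interpreter supplied by the caller.
    static <T> T sum(List<byte[]> cells, ValueInterpreterSketch<T> interpreter) {
        T total = null;
        for (byte[] cell : cells) {
            T value = interpreter.decode(cell);
            total = (total == null) ? value : interpreter.add(total, value);
        }
        return total; // null when there were no cells
    }

    // Test helper: encode a long the same way decode() expects it.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    public static void main(String[] args) {
        List<byte[]> cells = List.of(encode(40L), encode(2L));
        System.out.println(sum(cells, new LongInterpreterSketch())); // prints 42
    }
}
```

Keeping the decoding behind an interface means a different value encoding only 
needs a different interpreter, not a different aggregation routine.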





[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024893#comment-13024893
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review535
---


I think it's almost there.  This patch won't compile (see below for why).  I'd 
be game for applying the next version.  This patch has come a long way.  
Let's open new issues, after applying it, for any problems found in it (this 
patch does include a nice set of tests).


/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1116

I agree w/ the review that suggested we spell out 'agg' rather than use the 
abbreviation, especially in javadoc.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1117

'if' should be 'where'.  Should we throw an exception if multiple families are 
supplied, so users are not surprised when they don't get answers for multiple 
families?
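
One way such a fail-fast check could look is sketched below. This is only an 
illustration of the suggestion, not the check in the patch: FamilyCheckSketch 
and rejectMultipleFamilies are hypothetical names, and families are modelled 
as plain strings.

```java
import java.util.Set;

final class FamilyCheckSketch {
    // Hypothetical helper (not from the patch): fail fast when more than one
    // column family is requested, instead of silently ignoring the extras.
    static void rejectMultipleFamilies(Set<String> families) {
        if (families.size() > 1) {
            throw new IllegalArgumentException(
                "Only one column family is supported, got " + families.size());
        }
    }

    public static void main(String[] args) {
        rejectMultipleFamilies(Set.of("cf1")); // accepted, no exception
        try {
            rejectMultipleFamilies(Set.of("cf1", "cf2"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```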



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1118

I'd say leave implementation details out of the public javadoc (the bit 
about calling private methods)



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1119

Does Scan do this test?  Internally? (I'm not sure)



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1120

'should' or 'does'?  I think you want to say the latter?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1121

Why this javadoc?  Don't we inherit javadoc from the Interface?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1122

What's this?  We do nothing on serialization?  Is that right?  It could be.  
It just strikes me as a little odd.  Maybe put a comment in here to say 
'nothing to serialize'?
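
For reference, a stateless class with commented no-op serialization methods 
might look like the sketch below. It is an assumption, not the code under 
review: WritableSketch is a local stand-in that mirrors the two methods of 
Hadoop's Writable interface so the example compiles without Hadoop, and 
StatelessInterpreterSketch and serializedSize are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two methods as Hadoop's Writable interface.
interface WritableSketch {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

final class StatelessInterpreterSketch implements WritableSketch {
    @Override
    public void write(DataOutput out) throws IOException {
        // Nothing to serialize: this class has no fields.
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Nothing to deserialize: this class has no fields.
    }

    // Helper for the demo: how many bytes does write() actually emit?
    static int serializedSize(WritableSketch writable) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writable.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        System.out.println(serializedSize(new StatelessInterpreterSketch())); // prints 0
    }
}
```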



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment1123

Do we agree that AggregateCpProtocol was not a good name, that rather it 
should be AggregateProtocol since cp is in the package name?

I see you have an AggregateProtocol (AP) later in this patch.  Let me go look 
at it.

I think I see what's going on... you didn't mean to include this in the 
patch?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment1124

Otherwise, this Interface looks good.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment1125

Yeah, this class shouldn't be included either.


- Michael



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024900#comment-13024900
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review537
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1126

In my version, I have:
public interface ColumnInterpreter<T> extends Serializable {

There is no need to extend Writable.


- Ted




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024906#comment-13024906
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-25 18:31:17, Ted Yu wrote:
bq.   /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java, line 99
bq.   https://reviews.apache.org/r/585/diff/3-6/?file=15641#file15641line99
bq.  
bq.   In my version, I have:
bq.   public interface ColumnInterpreter<T> extends Serializable {
bq.   
bq.   There is no need to extend Writable.
bq.  
bq.  Michael Stack wrote:
bq.  Ok.  Then we should remove that from the patch (Good one Ted)

Whoops.  Sorry, Ted, did you say Serializable, as in java.io.Serializable?  We 
don't want j.i.s.  That's what Writable replaces.


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review537
---




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024907#comment-13024907
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review540
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1129

Yes, but it seems Writable is the standard way to go in Hadoop for these 
RPCs. No big issue either way.
Since it doesn't have any state, there is nothing to serialize here. It 
seems we could make this a static utility class (?).


- Himanshu




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024911#comment-13024911
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review538
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1127

I exchanged emails with Himanshu about supporting multiple column families.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1131

I prefer using Serializable for the interpreter, which is stateless.

It is supported by HbaseObjectWritable.



- Ted




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024917#comment-13024917
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review542
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1132

OK. Now it throws an exception when > 1 families are defined.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1133

removed the javadoc related to private method calls



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1134

I didn't want it to propagate to the server just to return an exception. I 
thought that aggregate functions should work on a distinct set of rows, i.e., 
startRow < endRow should always be true (except when they are empty). 
There is a check in HTable.getStartKeysInRange() that throws an exception 
when startRow > endRow, but I needed the boundary condition too.
Please let me know if we should remove this condition.
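
A client-side range check along these lines could be sketched as follows. This 
is an illustration under assumptions, not the code in the patch: 
RowRangeCheckSketch, isValidRange and validateRange are hypothetical names, an 
empty array is taken to mean "unbounded", and rows are compared as unsigned 
bytes with java.util.Arrays.compareUnsigned (Java 9 or later) rather than any 
HBase utility.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RowRangeCheckSketch {
    // Valid when either bound is empty (unbounded) or the start row sorts
    // strictly before the end row; equal rows are rejected as well.
    static boolean isValidRange(byte[] startRow, byte[] endRow) {
        if (startRow.length == 0 || endRow.length == 0) {
            return true;
        }
        return Arrays.compareUnsigned(startRow, endRow) < 0;
    }

    // Fail on the client so a bad range never reaches the server.
    static void validateRange(byte[] startRow, byte[] endRow) {
        if (!isValidRange(startRow, endRow)) {
            throw new IllegalArgumentException("startRow must sort before endRow");
        }
    }

    public static void main(String[] args) {
        byte[] a = "row-a".getBytes(StandardCharsets.UTF_8);
        byte[] b = "row-b".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidRange(a, b));           // true
        System.out.println(isValidRange(b, a));           // false
        System.out.println(isValidRange(a, a));           // false (boundary case)
        System.out.println(isValidRange(new byte[0], a)); // true (unbounded start)
    }
}
```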



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment1135

Yes, it's not there in later versions.


- Himanshu


 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in 

[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024919#comment-13024919
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-25 18:58:37, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java,
 line 99
bq.   https://reviews.apache.org/r/585/diff/3-6/?file=15641#file15641line99
bq.  
bq.   I prefer using Serializable for the interpreter which is stateless.
bq.   
bq.   It is supported by HbaseObjectWritable.
bq.  

In my opinion, using Serializable will not go down well with others. I believe 
it is supported in HbaseObjectWritable only for legacy reasons.


- Himanshu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review538
---





 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  
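
The idea in the description above (count per region, then combine the per-region counts) can be sketched as a toy model. This is plain Java with no HBase classes: a "region" is just an in-memory list of row keys, and every name here is hypothetical.

```java
import java.util.List;

/** Toy model only: a "region" here is just a list of row keys held in memory. */
final class RegionCountSketch {
    private RegionCountSketch() {}

    /** What each region would compute locally and return: a single number. */
    static long countRows(List<String> regionRows) {
        return regionRows.size();
    }

    /** What the client (or scanner) would do: add up the per-region counts. */
    static long totalRows(List<List<String>> regions) {
        long total = 0;
        for (List<String> region : regions) {
            total += countRows(region);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> regions = List.of(
            List.of("row-a", "row-b"),
            List.of("row-m"),
            List.of("row-x", "row-y", "row-z"));
        System.out.println(totalRows(regions)); // prints 6
    }
}
```

The point of the split is that only one number per region crosses the network, rather than every row.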

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024926#comment-13024926
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review546
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1140

There are 10 classes which implement Serializable under 
src/main/java/org/apache/hadoop/hbase/rest/model/

And:
public class Pair<T1, T2> implements Serializable
./src/main/java/org/apache/hadoop/hbase/util/Pair.java
  public static class CustomSerializable implements Serializable {
./src/test/java/org/apache/hadoop/hbase/io/TestHbaseObjectWritable.java


- Ted





 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024932#comment-13024932
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-25 19:44:07, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java,
 line 99
bq.   https://reviews.apache.org/r/585/diff/3-6/?file=15641#file15641line99
bq.  
bq.   There are 10 classes which implement Serializable under 
src/main/java/org/apache/hadoop/hbase/rest/model/
bq.   
bq.   And:
bq.   public class Pair<T1, T2> implements Serializable
bq.   ./src/main/java/org/apache/hadoop/hbase/util/Pair.java
bq. public static class CustomSerializable implements Serializable {
bq.   
./src/test/java/org/apache/hadoop/hbase/io/TestHbaseObjectWritable.java

My guess is that for REST, it's because of the REST engine we use.  For Pair, we 
should probably change it to be Writable.  Same for CustomSerializable.

Otherwise, Ted, if you +1 Himanshu's patch, I'll commit it.


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review546
---


On 2011-04-25 19:53:33, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-25 19:53:33)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq./src/main/java/org/apache/hadoop/hbase/client/CursorCallable.java 
PRE-CREATION 
bq./src/main/java/org/apache/hadoop/hbase/client/CursorCp.java PRE-CREATION 
bq./src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
1088894 
bq./src/main/java/org/apache/hadoop/hbase/client/HTable.java 1088894 
bq./src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 
1088894 
bq./src/main/java/org/apache/hadoop/hbase/client/Scan.java 1088894 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq.
/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512.txt



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024933#comment-13024933
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

(Updated 2011-04-25 19:53:33.129920)


Review request for hbase and Gary Helmling.


Changes
---

Changes as per Stack's review.
Major changes include:
a) LongColumnInterpreter still implements Writable (though with empty 
read/write methods).
b) An exception is thrown if more than one family is defined.
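
For illustration, the two changes listed above might look roughly like the sketch below. It is not the code from the patch. `WritableSketch` is a local stand-in that declares the same two methods as Hadoop's `Writable`, so the example compiles without Hadoop on the classpath, and the other class and method names are hypothetical. How the patch treats a request with zero families is not stated in this thread, so the sketch simply accepts it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Set;

/** Local stand-in with the same two methods as Hadoop's Writable. */
interface WritableSketch {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

/** Change (a): a stateless interpreter has nothing to serialize, so both methods are empty. */
final class StatelessLongInterpreter implements WritableSketch {
    @Override
    public void write(DataOutput out) throws IOException {
        // no state to write
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // no state to read
    }
}

final class ChangesSketch {
    private ChangesSketch() {}

    /** Returns how many bytes the object writes when serialized. */
    static int serializedSize(WritableSketch w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Change (b): reject a request that names more than one column family. */
    static void requireSingleFamily(Set<String> families) {
        if (families.size() > 1) {
            throw new IllegalArgumentException(
                "only one family is supported, got " + families.size());
        }
    }
}
```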


Summary
---

This patch provides reference implementation for aggregate function support 
through Coprocessor framework.
ColumnInterpreter interface allows client to specify how the value's byte array 
is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs (updated)
-

  /src/main/java/org/apache/hadoop/hbase/client/CursorCallable.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/client/CursorCp.java PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 1088894 
  /src/main/java/org/apache/hadoop/hbase/client/HTable.java 1088894 
  /src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1088894 
  /src/main/java/org/apache/hadoop/hbase/client/Scan.java 1088894 
  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024934#comment-13024934
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review548
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1145

I am fine with using Writable.


- Ted





 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024940#comment-13024940
 ] 

stack commented on HBASE-1512:
--

@Himanshu I think you uploaded the wrong patch for v8.  It's all about 
CursorCallable...

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-23 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023573#comment-13023573
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

(Updated 2011-04-23 16:39:37.773505)


Review request for hbase and Gary Helmling.


Changes
---

import statement of TestAggregateProtocol is removed in LongColumnInterpreter


Summary
---

This patch provides reference implementation for aggregate function support 
through Coprocessor framework.
ColumnInterpreter interface allows client to specify how the value's byte array 
is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs (updated)
-

  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512-6.txt, patch-1512-7.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-20 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022607#comment-13022607
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 21:16:05, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 84
bq.   https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line84
bq.  
bq.   One family is fine for the moment.

FYI, more families can easily be added, since the client passes the Scan object 
as is. It boils down to fetching the value of a cell from the 
ColumnInterpreter's getValue, which takes the family, qualifier and KeyValue, 
and then grouping it with the values from the other families. But again, I 
believe aggregating over multiple families is not the purpose of this JIRA. 
Please correct me if I am wrong. 
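
To make the extension concrete, here is a rough sketch of how a getValue that takes family, qualifier and cell could feed a sum across several families. The patch under review deliberately supports a single family, so this is only an illustration of the idea in the comment. `CellSketch`, `InterpreterSketch` and `MultiFamilySum` are hypothetical stand-ins, not HBase classes, and the 8-byte big-endian encoding is an assumption.

```java
import java.nio.ByteBuffer;
import java.util.List;

/** Hypothetical stand-in for a KeyValue: family, qualifier and raw value bytes. */
final class CellSketch {
    final String family;
    final String qualifier;
    final byte[] value;

    CellSketch(String family, String qualifier, byte[] value) {
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
    }
}

/** Mirrors the getValue shape described above; not the real HBase interface. */
interface InterpreterSketch<T> {
    T getValue(String family, String qualifier, CellSketch cell);
}

/** Reads an 8-byte big-endian value as a long; returns null for any other cell. */
final class LongInterpreterSketch implements InterpreterSketch<Long> {
    @Override
    public Long getValue(String family, String qualifier, CellSketch cell) {
        if (!cell.family.equals(family) || !cell.qualifier.equals(qualifier)) {
            return null; // cell belongs to a different column
        }
        if (cell.value.length != Long.BYTES) {
            return null; // not a long-encoded value
        }
        return ByteBuffer.wrap(cell.value).getLong();
    }
}

final class MultiFamilySum {
    private MultiFamilySum() {}

    /** Encodes a long the way the interpreter above expects to read it. */
    static byte[] longBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    /** Sums the given qualifier across every requested family. */
    static long sum(List<CellSketch> cells, List<String> families,
                    String qualifier, InterpreterSketch<Long> interpreter) {
        long total = 0;
        for (String family : families) {
            for (CellSketch cell : cells) {
                Long v = interpreter.getValue(family, qualifier, cell);
                if (v != null) {
                    total += v;
                }
            }
        }
        return total;
    }
}
```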


- himanshu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review490
---


On 2011-04-18 17:14:30, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-18 17:14:30)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq.
/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.





[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-18 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021101#comment-13021101
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review493
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1002

done



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1003

done




/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1004

done



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment1005

done



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1006

ok, using only Bytes.toLong now.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1007

done



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment1008

done



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment1009

removed the repetition in the doc.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment1010

The column interpreter is more generic now. It supports both the HBase cell 
data type and its promoted data type. Arithmetic computations use the promoted 
data type: if a cell value is an int, we use a long while computing the sum, to 
handle overflow. For finding max and min, we still use the cell data type.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment1011

The coprocessor implementation returns an overall sum and row count, so there 
is no need to use a double/float in the return type. The double is used on the 
client side (AggregationClient).
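
To illustrate the split described above, here is a minimal sketch of a client 
that receives one (sum, row count) pair per region and divides only once, at 
the end. The names RegionPartial and ClientSideAverage are hypothetical and are 
not taken from the patch; the real AggregationClient may be structured 
differently.

```java
import java.util.List;

/** Hypothetical per-region partial result: an integral sum and a row count. */
final class RegionPartial {
    final long sum;
    final long rowCount;

    RegionPartial(long sum, long rowCount) {
        this.sum = sum;
        this.rowCount = rowCount;
    }
}

final class ClientSideAverage {
    /** Merges the integral partials and performs the only floating-point step. */
    static double average(List<RegionPartial> partials) {
        long totalSum = 0;
        long totalRows = 0;
        for (RegionPartial p : partials) {
            totalSum += p.sum;
            totalRows += p.rowCount;
        }
        if (totalRows == 0) {
            return Double.NaN; // no rows at all: the average is undefined
        }
        return (double) totalSum / totalRows;
    }
}
```

Keeping the server-side return types integral means the regions never lose 
precision; only the final division on the client produces a double.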



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment1012

done



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment1013

added class java doc.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment1014

refactored it



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment1015

Setting start/end rows does it, so there is no need for the check now.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment1016

done



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment1017

Now, the current version returns a Pair<List<S>, Long>, where the list 
contains the sum and the sum of squares, and the Long is the row count. I could 
use a more specific object, but it seems it would have to be added to the rpc 
stack (implementing Writable). Please comment if that is _the_ right way.
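
As a rough sketch of why the sum, the sum of squares and the row count are 
enough, the client can merge those three numbers across regions and derive the 
(population) standard deviation from them. StdPartial and ClientSideStd are 
hypothetical names, and doubles are used here only to keep the example short; 
the patch itself uses the promoted type S.

```java
import java.util.List;

/** Hypothetical per-region partial: sum, sum of squares and row count. */
final class StdPartial {
    final double sum;
    final double sumOfSquares;
    final long rowCount;

    StdPartial(double sum, double sumOfSquares, long rowCount) {
        this.sum = sum;
        this.sumOfSquares = sumOfSquares;
        this.rowCount = rowCount;
    }
}

final class ClientSideStd {
    /** Population standard deviation: sqrt(E[x^2] - (E[x])^2). */
    static double std(List<StdPartial> partials) {
        double sum = 0;
        double sumSq = 0;
        long n = 0;
        for (StdPartial p : partials) {
            sum += p.sum;
            sumSq += p.sumOfSquares;
            n += p.rowCount;
        }
        if (n == 0) {
            return Double.NaN; // undefined without any rows
        }
        double mean = sum / n;
        double variance = sumSq / n - mean * mean;
        // Guard against a tiny negative variance caused by rounding.
        return Math.sqrt(Math.max(variance, 0.0));
    }
}
```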



/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1001

Do you mean to rename the class or just the javadoc? (Sorry, I missed this 
review statement initially.)



/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java
https://reviews.apache.org/r/585/#comment1018

This class is more generic now. It defines two type parameters, <T, S>, 
where T is the cell value type and S is the promoted data type. S is used for 
arithmetic computations; T is used for the min and max operations.
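
A simplified sketch of the two-parameter idea follows. It is not the 
ColumnInterpreter interface from the patch: PromotingInterpreter, 
IntCellInterpreter and GenericAggregates are hypothetical names, and the method 
set is an assumption chosen to keep the example self-contained.

```java
import java.nio.ByteBuffer;
import java.util.List;

/** Hypothetical interpreter: T is the cell value type, S the promoted type. */
interface PromotingInterpreter<T, S> {
    T decode(byte[] raw);   // interpret the cell's byte array
    S promote(T value);     // widen for arithmetic
    S add(S a, S b);        // arithmetic happens in S
    int compare(T a, T b);  // min/max stay in T
}

/** Cells hold 4-byte ints; sums are carried as longs. */
final class IntCellInterpreter implements PromotingInterpreter<Integer, Long> {
    @Override public Integer decode(byte[] raw) { return ByteBuffer.wrap(raw).getInt(); }
    @Override public Long promote(Integer value) { return value.longValue(); }
    @Override public Long add(Long a, Long b) { return a + b; }
    @Override public int compare(Integer a, Integer b) { return Integer.compare(a, b); }
}

final class GenericAggregates {
    /** Sums in the promoted type; returns null when there are no cells. */
    static <T, S> S sum(PromotingInterpreter<T, S> ci, List<byte[]> cells) {
        S total = null;
        for (byte[] raw : cells) {
            S v = ci.promote(ci.decode(raw));
            total = (total == null) ? v : ci.add(total, v);
        }
        return total;
    }

    /** Finds the maximum in the cell type; returns null when there are no cells. */
    static <T, S> T max(PromotingInterpreter<T, S> ci, List<byte[]> cells) {
        T best = null;
        for (byte[] raw : cells) {
            T v = ci.decode(raw);
            if (best == null || ci.compare(v, best) > 0) {
                best = v;
            }
        }
        return best;
    }
}
```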


- himanshu


On 2011-04-13 08:37:14, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-13 08:37:14)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 

[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-18 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13021121#comment-13021121
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

(Updated 2011-04-18 17:14:30.424344)


Review request for hbase and Gary Helmling.


Changes
---

Latest update from Himanshu.


Summary
---

This patch provides reference implementation for aggregate function support 
through Coprocessor framework.
ColumnInterpreter interface allows client to specify how the value's byte array 
is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs (updated)
-

  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted





[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020202#comment-13020202
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review469
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment914

sure



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment913

As per the current usage, it is one instance per thread. This method is 
called from the concrete coprocessor implementation deployed at region level. 
Though this instance is a singleton, it is stateless and hence threadsafe. 
I can change it to AtomicLong if you say so.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment919

As this class implements Writable, it is handled by HBaseObjectWritable, 
which writes its full class name onto the stream (and reads it back at server 
side). Since this class is stateless, I don't have anything to read or write 
as such. 
No need to call super.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment922

Ok. Will do all the formatting changes.


- himanshu


On 2011-04-13 08:37:14, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-13 08:37:14)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020203#comment-13020203
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 06:10:48, himanshu vashishtha wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java,
 line 66
bq.   https://reviews.apache.org/r/585/diff/4/?file=15695#file15695line66
bq.  
bq.   As per the current usage, it is one instance per thread. This method 
is called from the concrete coprocessor implementation deployed at region 
level. Though this instance is a singleton, but its a stateless, hence 
threadsafe. 
bq.   I can change it to AtomicLong if you say so.

just to clarify, I meant the CP instance is a singleton (pardon my not so good 
English).


- himanshu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review469
---







[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020209#comment-13020209
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review468
---


A few comments below. See what you think.  This is close to commit, I'd 
say.


/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment916

I'd say change the name of this class to AggregateProtocol.  Leave off the 
'Cp' since it's in the package name already.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment911

'Gives' rather than 'It gives'.  Are you repeating yourself in the javadoc 
here?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment912

Good



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment917

Call this class AggregateImplementation?  It'll implement AggregateProtocol.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment918

Class comment explaining what this class does?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment923

Why this?  When we just made an empty one?  And what's the '//' on the end 
of the line?

Oh, you did this each time through the loop so you only work on one return 
at a time.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment924

FYI there is an 'equals' in Bytes so you don't have to do compareTo...0



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment929

What is the hash code?  Can you print out encodedName? That's better for 
identifying regions.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment930

It's nice that this is all genericized.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment939

This three-part test is used in all methods?  Might be big enough to move 
out to a method. (Not important.)



/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java
https://reviews.apache.org/r/585/#comment940

Missing period.



/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java
https://reviews.apache.org/r/585/#comment941

Missing period.



/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java
https://reviews.apache.org/r/585/#comment942

Should you say ColumnInterpreter for AggregateProtocol?



/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java
https://reviews.apache.org/r/585/#comment943

You should call it TestAggregateProtocol or TestAggregateCoprocessor... it 
should be name of class under test with a Test prefix.


- Michael



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020219#comment-13020219
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review470
---


Looks like this is coming along nicely.  Some overall comments:

* A fair amount of naming should be cleaned up.  For the client facing methods 
in AggregationClient, I would go for the simplest names:  max() instead of 
getMaximum(), min() instead of getMinimum(), etc.  But that is a matter of 
personal preference.

* Think about providing simpler overloaded versions of the methods.  Seven 
parameters is a lot if you don't always care about all of them.

* Look more closely at the parameterization of some of the methods.  I'm not 
sure it's sufficient for getSum(), getAvg(), getStd(), where the return types 
may differ from the actual column value types.


/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment920

Don't abbreviate in the javadoc comments.  This is part of the end user 
documentation so you need to spell it all out:

agg - aggregation
RS - region server
cp impls - name the actual coprocessor



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment921

agg - aggregation



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment925

Remove trailing whitespace




/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment931

Overload this with some briefer versions?  This is a real mouthful if you 
don't actually need all 7 parameters.
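
One way the suggestion could look is sketched below: a single full-parameter 
method plus shorter overloads that delegate to it with defaults. This is a toy 
over an in-memory sorted map, not the AggregationClient signatures from the 
patch; OverloadSketch, its parameters and the convention that null means 
"unbounded" or "no filter" are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.function.LongPredicate;

final class OverloadSketch {
    /**
     * Full version: startRow is inclusive, endRow is exclusive, and a null
     * bound or filter means "not restricted". Returns null if nothing matches.
     */
    static Long max(NavigableMap<String, Long> rows, String startRow,
                    String endRow, LongPredicate filter) {
        Long best = null;
        for (Map.Entry<String, Long> e : rows.entrySet()) {
            String key = e.getKey();
            if (startRow != null && key.compareTo(startRow) < 0) continue;
            if (endRow != null && key.compareTo(endRow) >= 0) continue;
            long v = e.getValue();
            if (filter != null && !filter.test(v)) continue;
            if (best == null || v > best) {
                best = v;
            }
        }
        return best;
    }

    /** Row range only, no filter. */
    static Long max(NavigableMap<String, Long> rows, String startRow, String endRow) {
        return max(rows, startRow, endRow, null);
    }

    /** Whole table, no filter. */
    static Long max(NavigableMap<String, Long> rows) {
        return max(rows, null, null, null);
    }
}
```

Callers that only want a whole-table maximum then write max(rows) and never 
see the optional parameters.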



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment932

Again add some overloaded simpler versions of this.  Do you always need a 
filter?  What about column qualifier?  Maybe in most cases you do; just seeing 
if we can simplify usage in some cases.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment927

What is a row num?  Is this the number of rows?  How about using row 
count instead?  It's more consistent with other HBase tools.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment928

name getRowCount() instead?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment934

Remove trailing whitespace



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment935

Maybe a bit more description of the actual usage here.  The client needs to 
pass an instance of this in AggregationClient methods right?  Javadoc should 
make clear its purpose and how to use it.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment936

Drop the Cp from the name, it's extraneous.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment937

Is it correct that sum should always return the same type as the individual 
values?  If the values are Integers, you would want to return Long, right?  
Otherwise you risk overflowing the max value.
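
The overflow risk is easy to reproduce in plain Java. The sketch below is 
independent of the patch (SumOverflowDemo is a hypothetical name) and only 
contrasts accumulating int values in an int with accumulating them in a long.

```java
final class SumOverflowDemo {
    /** Accumulates in the value type; silently wraps around on overflow. */
    static int sumAsInt(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    /** Accumulates in the wider type; each int is widened before adding. */
    static long sumAsLong(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }
}
```

For the input {Integer.MAX_VALUE, 1}, sumAsInt wraps around to 
Integer.MIN_VALUE, while sumAsLong returns 2147483648.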



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment938

Is the type correlation correct here as well?  Individual values may be 
Integers, but you may want a double for the average.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment944

Same as with getAvg(), wouldn't you want this to possibly return a 
different type than the individual column values?  Like return a double even if 
the column values are ints?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment945

I don't think you need the word Protocol in here.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment946

Can you just set start and end rows on the scanner instead of checking each 
row?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment948

If there are no results from the scanner would this return the value from 

[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020276#comment-13020276
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review474
---



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment958

I think returning ci.getMinValue() is fine because AggregationClient would 
further consolidate partial results.
If change is really needed, it should be made in AggregationClient.


- Ted







[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020315#comment-13020315
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 11:50:08, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java, 
line 73
bq.   https://reviews.apache.org/r/585/diff/4/?file=15697#file15697line73
bq.  
bq.   I think returning ci.getMinValue() is fine because AggregationClient 
would further consolidate partial results.
bq.   If change is really needed, it should be made in AggregationClient.

I don't agree.  This leaves no way to distinguish between a valid result of 
Long.MIN_VALUE and _no_ result.  What happens for an empty table?  I think 
returning Long.MIN_VALUE (or whatever might be the case) for an empty table is 
broken.
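
To make the ambiguity concrete, here is a minimal sketch. The class and method names are hypothetical and are not part of the patch under review; it only assumes that a long-valued interpreter's getMinValue() is Long.MIN_VALUE.

```java
import java.util.List;

// Hypothetical illustration only; none of these names come from the patch.
final class SentinelMaxDemo {
    // Sentinel style: start from Long.MIN_VALUE, which is what
    // ci.getMinValue() is assumed to return for a long-valued column.
    static long maxWithSentinel(List<Long> values) {
        long max = Long.MIN_VALUE;
        for (long v : values) {
            if (v > max) {
                max = v;
            }
        }
        return max;
    }

    // Null style: null means "no qualifying value was seen".
    static Long maxOrNull(List<Long> values) {
        Long max = null;
        for (Long v : values) {
            if (max == null || v > max) {
                max = v;
            }
        }
        return max;
    }
}
```

With the sentinel version, an empty input and an input whose only value is Long.MIN_VALUE produce the same answer; the null version keeps the two cases apart.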


- Gary


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review474
---




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020377#comment-13020377
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review476
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment960

w.r.t. Gary's comment, we need another boolean flag in MaxCallBack so that 
we can distinguish whether MaxCallBack.update() has been called.
Currently ci.getMinValue() would be returned if there is no qualifying row 
(possibly due to the effect of Filter).
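
A rough sketch of the flag idea follows. FlaggedMaxCallback is a simplified, hypothetical stand-in and does not reproduce the real MaxCallBack or its update() signature; it also assumes update() is only invoked when a partial result carries a real value.

```java
// Hypothetical, simplified stand-in for the MaxCallBack discussed above.
final class FlaggedMaxCallback {
    private long max = Long.MIN_VALUE; // sentinel start, like ci.getMinValue()
    private boolean updated = false;   // the proposed extra flag

    // Assumed to be called once per partial result that carries a real value.
    void update(long partialMax) {
        updated = true;
        if (partialMax > max) {
            max = partialMax;
        }
    }

    boolean hasResult() {
        return updated;
    }

    // Returns null instead of the sentinel when nothing was ever reported.
    Long getMax() {
        if (!updated) {
            return null;
        }
        return max;
    }
}
```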


- Ted




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020381#comment-13020381
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 18:02:26, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 98
bq.   https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line98
bq.  
bq.   w.r.t. Gary's comment, we need another boolean flag in MaxCallBack 
so that we can distinguish whether MaxCallBack.update() has been called.
bq.   Currently ci.getMinValue() would be returned if there is no 
qualifying row (possibly due to the effect of Filter).

MaxCallBack.update() will still be called once per region, even if no rows 
matched.  It will just return the initial value that was set.  This is why I 
think the initial value should be null.  So when update() is called with null, 
it can be handled appropriately.

In the same vein, I think max in MaxCallBack should initialize to null as 
well.
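
The null-initialized variant could look roughly like this. NullAwareMaxCallback is a hypothetical, simplified class; the real callback takes additional arguments that are omitted here.

```java
// Hypothetical sketch of a callback whose running max starts as null.
final class NullAwareMaxCallback {
    private Long max = null; // null until some region reports a real value

    // Called once per region; regionMax is null when no row matched there.
    void update(Long regionMax) {
        if (regionMax == null) {
            return; // nothing to merge from this region
        }
        if (max == null || regionMax > max) {
            max = regionMax;
        }
    }

    // Null means that no region had a qualifying row.
    Long getMax() {
        return max;
    }
}
```

Because the null check comes before any comparison, a region with no matching rows never causes unboxing of a null value.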


- Gary


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review476
---




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020384#comment-13020384
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review478
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment962

The following code would produce NPE:
  Long l = null;
  if (l > Long.MIN_VALUE) {
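
The exception comes from auto-unboxing: comparing a Long with a relational operator unboxes it first, and unboxing null throws. A small self-contained sketch (the class and method names are made up for illustration, and the choice of `>` is an assumption since any relational operator behaves the same way):

```java
// Hypothetical demo of the unboxing NPE and a null-safe alternative.
final class UnboxingNpeDemo {
    // Returns true if comparing the boxed value throws a NullPointerException.
    static boolean comparisonThrows(Long l) {
        try {
            // The comparison unboxes l, which fails when l is null.
            boolean ignored = l > Long.MIN_VALUE;
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Null-safe variant: test for null before the comparison unboxes.
    static boolean isGreaterThanMin(Long l) {
        return l != null && l > Long.MIN_VALUE;
    }
}
```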



- Ted




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020385#comment-13020385
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 18:18:57, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 98
bq.   https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line98
bq.  
bq.   The following code would produce NPE:
bq.  Long l = null;
bq.  if (l > Long.MIN_VALUE) {
bq.  

Yes, all this code needs to handle nulls.  I think that goes without saying.


- Gary


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review478
---




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020395#comment-13020395
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review481
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment965

Of these, I think only two are optional: colQualifier & filter. OK?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment966

Agreed, it should be initialized to null to handle a null result set.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment967

Same as the max one above. Yes, in case of a null qualifier, it computes 
the value for the whole family.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment968

ok



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment969

ok



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment970

Yes, more description should be added.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment971

Hmm, the return type can be different. I will make it more generic so that 
it can have a different return type.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment972

I thought about it. I return a list, but I see it's not the right type to 
pass, as one element contains the sum and the other contains the rowCount. 
So it should be a Pair, as you said. I will look into it.
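
A sketch of what such a pair-based partial result could look like. SumAndCount and average() are hypothetical names, not classes from the patch or from HBase, and long-valued sums are assumed.

```java
import java.util.List;

// Hypothetical sketch: each region returns (sum, rowCount) and the client
// combines the partials into an average.
final class AvgPartialDemo {
    // Hypothetical pair type; not an HBase class.
    static final class SumAndCount {
        final long sum;
        final long rowCount;

        SumAndCount(long sum, long rowCount) {
            this.sum = sum;
            this.rowCount = rowCount;
        }
    }

    // Combines per-region partials; returns null when no rows were counted.
    static Double average(List<SumAndCount> partials) {
        long sum = 0;
        long rows = 0;
        for (SumAndCount p : partials) {
            sum += p.sum;
            rows += p.rowCount;
        }
        if (rows == 0) {
            return null;
        }
        return (double) sum / rows;
    }
}
```

Named fields avoid the problem with a bare list, where the caller has to remember which index holds the sum and which holds the row count.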



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment973

It's in the interface? Should it be repeated here too?
OK, will do the name refactoring.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment974

OK, will use the equals method.
I thought that since it is an internal scanner (local to a region), it would 
not cross the region boundaries when setting start-end rows. Will check it 
(it should also improve performance).



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment975

right. a null is more pertinent here. will do it.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment976

yes, the current one does return min value. But as you said, it will return 
null now.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment977

I thought about it and then left it, as it's only three lines of code and 
a separate call would be over-refactoring. But once I set the start-end 
rows as suggested by Gary, it should become lighter.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment978

yes indeed. It occurred to me while I saw Stack's review last night and 
here you are :). I will do it.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment979

OK. And what if I need to send more than 2 parameters, as in the case of 
standard deviation?
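
For illustration, one common way to make standard deviation mergeable across regions is to carry three values per region: sum, sum of squares, and count. The sketch below uses hypothetical names and the simple (numerically naive) formula; it is not taken from the patch.

```java
// Hypothetical sketch of a three-field partial result for standard deviation.
final class StdDevPartialDemo {
    static final class Partial {
        final double sum;
        final double sumOfSquares;
        final long count;

        Partial(double sum, double sumOfSquares, long count) {
            this.sum = sum;
            this.sumOfSquares = sumOfSquares;
            this.count = count;
        }

        // Partials from different regions combine by field-wise addition.
        Partial merge(Partial other) {
            return new Partial(sum + other.sum,
                               sumOfSquares + other.sumOfSquares,
                               count + other.count);
        }
    }

    // Population standard deviation: sqrt(E[x^2] - (E[x])^2).
    static double populationStdDev(Partial p) {
        if (p.count == 0) {
            throw new IllegalArgumentException("no rows to aggregate");
        }
        double mean = p.sum / p.count;
        double variance = p.sumOfSquares / p.count - mean * mean;
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.sqrt(Math.max(0.0, variance));
    }
}
```

For example, merging a region holding {2, 4, 4, 4} (sum 14, sum of squares 52) with one holding {5, 5, 7, 9} (sum 26, sum of squares 180) gives mean 5 and variance 232/8 - 25 = 4.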



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
https://reviews.apache.org/r/585/#comment980

You mean a POJO with this many attributes? Is there an existing object 
that I can reuse (it should be RPC compatible, e.g. by implementing Writable)?



/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java
https://reviews.apache.org/r/585/#comment981

So yes, will do all the name/space/grammar refactorings as suggested. 

Thanks a lot to all of you folks for this wonderful review.


- himanshu



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020399#comment-13020399
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review483
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment984

I think the startKey and endKey can be optional as well.
Basically that means scanning the whole region.


- Ted




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020403#comment-13020403
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 19:06:58, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 84
bq.   https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line84
bq.  
bq.   I think the startKey and endKey can be optional as well.
bq.   Basically that means scanning the whole region.
bq.  
bq.  himanshu vashishtha wrote:
bq.  These start-end keys are used to locate the regions of interest. Do you 
mean the whole _table_? If so, it will be like setting 
HConstants.START_ROW/STOP_ROW which are essentially empty byte arrays.

This would be a bigger change, but maybe it would make sense to have the client 
pass a Scan object?  Then you could specify start/end row, time range, multiple 
column qualifiers, filter?

It's starting to look like we're duplicating most of these arguments when 
there's already a good way of passing them.  What do you think?


- Gary


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review483
---


On 2011-04-13 08:37:14, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-13 08:37:14)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides a reference implementation for aggregate function support through the Coprocessor framework.
bq.  The ColumnInterpreter interface allows the client to specify how the value's byte array is interpreted.
bq.  Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION 
bq.    /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.
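
The summary quoted above says the interpreter lets the client specify how a 
value's byte array is interpreted. A rough sketch of that idea follows. It is 
not the interface from the patch: ValueInterpreter and LongInterpreter are 
hypothetical, simplified names, and the 8-byte big-endian encoding is an 
assumption of this example.

```java
import java.nio.ByteBuffer;
import java.util.List;

public class InterpreterSketch {
    /** Hypothetical, simplified stand-in for the interpreter idea; not the patch's interface. */
    interface ValueInterpreter<T> {
        T decode(byte[] raw);    // turn stored bytes into a typed value
        T add(T left, T right);  // combine two typed values
        T zero();                // identity element for add
    }

    /** Treats each cell as an 8-byte big-endian long (an assumption of this sketch). */
    static final class LongInterpreter implements ValueInterpreter<Long> {
        public Long decode(byte[] raw) {
            if (raw == null || raw.length != Long.BYTES) {
                throw new IllegalArgumentException("expected exactly 8 bytes");
            }
            return ByteBuffer.wrap(raw).getLong();
        }

        public Long add(Long left, Long right) {
            return left + right;
        }

        public Long zero() {
            return 0L;
        }
    }

    /** Generic sum: the aggregation logic never looks at the bytes itself. */
    static <T> T sum(List<byte[]> cells, ValueInterpreter<T> interpreter) {
        T total = interpreter.zero();
        for (byte[] cell : cells) {
            total = interpreter.add(total, interpreter.decode(cell));
        }
        return total;
    }

    /** Helper for the example: encode a long as 8 big-endian bytes. */
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }
}
```

Swapping in a different interpreter (for example one that decodes doubles) 
would leave the sum method untouched, which seems to be the point of the 
abstraction.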





[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020405#comment-13020405
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 19:06:58, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 84
bq.   https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line84
bq.  
bq.   I think the startKey and endKey can be optional as well.
bq.   Basically that means scanning the whole region.
bq.  
bq.  himanshu vashishtha wrote:
bq.  These start-end keys are used to locate the interested regions. Do you 
mean whole _table_? If so, it will be like setting 
HConstants.START_ROW/STOP_ROW which are essentially empty byte arrays.
bq.  
bq.  Gary Helmling wrote:
bq.  This would be a bigger change, but maybe it would make sense to have 
the client pass a Scan object?  Then you could specify start/end row, time 
range, multiple column qualifiers, filter?
bq.  
bq.  It's starting to look like we're duplicating most of these arguments 
when there's already a good way of passing them.  What do you think?

Yes, I am wondering why it didn't occur to me before! As a matter of fact, we 
are already creating a Scan object at the region level. So, if the client passes 
a Scan object to the AggregationClient, it will still call the appropriate 
(existing) HTable method, but the CP's method will take the Scan object as a 
parameter, leaving the client free to configure it. It does need some code 
changes, in the validation logic for one. 
(I was thinking that it was good to go, and now there is so much room for 
improvement. Good stuff.)


- himanshu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review483
---



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020414#comment-13020414
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 19:06:58, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 84
bq.   https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line84
bq.  
bq.   I think the startKey and endKey can be optional as well.
bq.   Basically that means scanning the whole region.
bq.  
bq.  himanshu vashishtha wrote:
bq.  These start-end keys are used to locate the interested regions. Do you 
mean whole _table_? If so, it will be like setting 
HConstants.START_ROW/STOP_ROW which are essentially empty byte arrays.
bq.  
bq.  Gary Helmling wrote:
bq.  This would be a bigger change, but maybe it would make sense to have 
the client pass a Scan object?  Then you could specify start/end row, time 
range, multiple column qualifiers, filter?
bq.  
bq.  It's starting to look like we're duplicating most of these arguments 
when there's already a good way of passing them.  What do you think?
bq.  
bq.  himanshu vashishtha wrote:
bq.  Yes, am wondering why it didn't occur to me before! As a matter of 
fact, we are creating a Scan object at region level. So, with passing the Scan 
object to the Aggregation client, it will call the appropriate HTable method 
(the existing one), but the CP's method will take the Scan object as a 
parameter, and let the client have its liberty. But it needs some code changes, 
like in validation stuff for one. 
bq.  (I was thinking that it was good to go and now there is so much room 
for improvement. Good stuff).

Continuing from what I said earlier: in the current design we assume that the 
client is interested in one family only. Does this need to change too? 
I am refactoring these methods to let the client pass a Scan object to the 
AggregationClient class, but a Scan object can contain multiple families. 
Should we change this assumption? I don't see any issue with it as such, but it 
is something I didn't plan for originally, and it needs changes in the test 
cases. Please comment.


- himanshu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review483
---



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020418#comment-13020418
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review488
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment989

This is the first code review in my career that has evolved into a design 
session - exciting.
I think we should relax the initial assumption.


- Ted




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020426#comment-13020426
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 20:21:01, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 84
bq.   https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line84
bq.  
bq.   This is the first code review that evolves into a design session in 
my career - exciting.
bq.   I think we should relax the initial assumption.

I still think I would go with one family, as the families are quite separate 
entities (HTable design-wise), and I don't see any use case for doing aggregates 
across multiple column families. If that is what is needed, it probably suggests 
some rethinking of the schema design. The point I raised was that the object we 
are now relying on supports multiple families (which is very relevant for 
scanning a table), but we don't need that in real usage. So the point to 
consider is whether we should support it or not. 
Moreover, as the requirements are evolving (and I guess they will continue to 
do so), it might change again. I am happy as long as it is moving in the right 
direction. 
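
For illustration, the one-family assumption could be enforced with a small 
validation step like the one below. This is a sketch only: SingleFamilyCheck 
and requireSingleFamily are hypothetical names, and the families are modelled 
as a plain set of strings rather than whatever the Scan object actually holds.

```java
import java.util.Set;

public class SingleFamilyCheck {
    /**
     * Returns the only family in the set, or rejects the request.
     * Hypothetical helper; families are modelled as strings for this sketch.
     */
    static String requireSingleFamily(Set<String> families) {
        if (families == null || families.size() != 1) {
            int found = (families == null) ? 0 : families.size();
            throw new IllegalArgumentException(
                "Aggregation expects exactly one column family, got " + found);
        }
        return families.iterator().next();
    }
}
```

Relaxing the assumption later would then only mean replacing this check, not 
changing the method signatures again.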


- himanshu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review488
---



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020431#comment-13020431
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review490
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment993

One family is fine for the moment.


- Ted




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020506#comment-13020506
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 19:06:58, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 84
bq.   https://reviews.apache.org/r/585/diff/4/?file=15694#file15694line84
bq.  
bq.   I think the startKey and endKey can be optional as well.
bq.   Basically that means scanning the whole region.
bq.  
bq.  himanshu vashishtha wrote:
bq.  These start-end keys are used to locate the interested regions. Do you 
mean whole _table_? If so, it will be like setting 
HConstants.START_ROW/STOP_ROW which are essentially empty byte arrays.
bq.  
bq.  Gary Helmling wrote:
bq.  This would be a bigger change, but maybe it would make sense to have 
the client pass a Scan object?  Then you could specify start/end row, time 
range, multiple column qualifiers, filter?
bq.  
bq.  It's starting to look like we're duplicating most of these arguments 
when there's already a good way of passing them.  What do you think?
bq.  
bq.  himanshu vashishtha wrote:
bq.  Yes, am wondering why it didn't occur to me before! As a matter of 
fact, we are creating a Scan object at region level. So, with passing the Scan 
object to the Aggregation client, it will call the appropriate HTable method 
(the existing one), but the CP's method will take the Scan object as a 
parameter, and let the client have its liberty. But it needs some code changes, 
like in validation stuff for one. 
bq.  (I was thinking that it was good to go and now there is so much room 
for improvement. Good stuff).
bq.  
bq.  himanshu vashishtha wrote:
bq.  In continuation of what I earlier said, in the current design we 
assume that client is interested in one family only. Shall this needs to be 
change too. 
bq.  I am refactoring these methods to let the client pass a Scan object to 
the AggregationClient class, but a scan object as such can have multi families 
in it. Shall we need to change this assumption. I don't see any issue with it 
as such, but this is something I didn't plan originally and it needs change in 
test cases. Please comment.

I refactored an agg method for 1512 as per today's review (using a Scan object 
plus others) and it's working fine (the test passes for the method that I 
changed). Maybe I need to add more boundary conditions to test the Scan object. 
I have some stuff for tonight/tomorrow, so I will complete this by tomorrow 
night or by Sunday. I hope that is OK?


- himanshu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review483
---



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-14 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020187#comment-13020187
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-13 08:35:42, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 143
bq.   https://reviews.apache.org/r/585/diff/3/?file=15640#file15640line143
bq.  
bq.   This is the type parameter for return value.

ok


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review440
---




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-14 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020188#comment-13020188
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-13 06:23:56, himanshu vashishtha wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 81
bq.   https://reviews.apache.org/r/585/diff/3/?file=15640#file15640line81
bq.  
bq.   I use the Eclipse formatter (which says it is using Apache's standard), 
and it is inserting these spaces. I tried to edit the settings to make it work, 
but couldn't find a way to stop these extra spaces between the doc and the arg 
list. I removed them manually, but want to know the standard approach.

None of this kind of white space is tolerated (well, I'm not too strict about 
it, but others are watching the codebase and will complain loudly if they see it).


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review438
---




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-14 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020190#comment-13020190
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review466
---


This is a review of the diff between v3 and v4.


/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment894

Good



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment895

Yeah, method doesn't take a T, it returns it



/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java
https://reviews.apache.org/r/585/#comment896

This is good.


- Michael




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-14 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020198#comment-13020198
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review467
---


Here is more.  Submitting now in case I lose it.


/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment897

You have the class comment twice here.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment898

Use Bytes.SIZEOF_LONG instead of '8'
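
A minimal sketch of this point, assuming the value is stored as an 8-byte big-endian long. The class `LongWidthSketch` and its local `SIZEOF_LONG` constant are hypothetical stand-ins so the example runs without HBase on the classpath; in the patch the named constant would be `Bytes.SIZEOF_LONG`.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in class; not part of the patch under review.
class LongWidthSketch {
    // Local stand-in for the named constant the review asks for.
    // Derived from the JDK rather than hard-coded as the literal 8.
    static final int SIZEOF_LONG = Long.SIZE / Byte.SIZE;

    // Encodes a long as a big-endian byte array of SIZEOF_LONG bytes.
    static byte[] encode(long v) {
        return ByteBuffer.allocate(SIZEOF_LONG).putLong(v).array();
    }

    // Returns the decoded long, or null if the value is not exactly one long wide.
    static Long decode(byte[] value) {
        if (value == null || value.length != SIZEOF_LONG) {
            return null;
        }
        return ByteBuffer.wrap(value).getLong(); // ByteBuffer is big-endian by default
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(42L)));
        System.out.println(decode(new byte[3]));
    }
}
```

The named constant documents why the length check exists, which the bare literal does not.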



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment899

You could autobox and just return Bytes.toLong... I think that'll work (not 
important).  If you are going to use Long, you might use Long.valueOf because 
the JVM can cache often-used Long instances: 
http://download.oracle.com/javase/1.5.0/docs/api/java/lang/Long.html#valueOf(long)
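
A small illustration of the suggestion, using only the JDK. The class name `LongValueOfSketch` is hypothetical. Whether two boxed values are the same object depends on the JDK's cache, so the sketch prints the identity check rather than relying on it.

```java
// Hypothetical illustration class; not part of the patch under review.
class LongValueOfSketch {
    // Boxes through the factory method, which may hand back a cached instance.
    // Plain autoboxing ("return v;" with a Long return type) compiles to the same call.
    static Long boxed(long v) {
        return Long.valueOf(v);
    }

    public static void main(String[] args) {
        Long a = boxed(42L);
        Long b = boxed(42L);
        // Small values are typically served from a cache, so this is usually true,
        // but code should not depend on it.
        System.out.println(a == b);
        // Value comparison is what callers should use.
        System.out.println(a.equals(b));
    }
}
```

By contrast, `new Long(42L)` always allocates a fresh object, which is why the factory method is preferred.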



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment900

Is there one instance of this class per thread?  Will it be accessed 
concurrently?  Should the base Long be an AtomicLong?  Or do we need to 
synchronize updates on the KV?
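
To make the question concrete, here is a sketch of what the AtomicLong option would look like if one accumulator really were shared across threads. `SharedSumSketch` is a hypothetical class; whether the interpreter in the patch is actually shared is exactly what the comment above asks, and this sketch does not answer it.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical accumulator shared by several threads; not part of the patch.
class SharedSumSketch {
    private final AtomicLong sum = new AtomicLong();

    // Atomic read-modify-write, so no update is lost under contention.
    void add(long v) {
        sum.addAndGet(v);
    }

    long get() {
        return sum.get();
    }

    // Starts the given number of threads, each adding 1 addsPerThread times.
    static long sumConcurrently(int threads, int addsPerThread) {
        SharedSumSketch acc = new SharedSumSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < addsPerThread; j++) {
                    acc.add(1L);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return acc.get();
    }

    public static void main(String[] args) {
        System.out.println(sumConcurrently(4, 10_000));
    }
}
```

If each thread gets its own instance instead, a plain long field is enough and the atomic is unnecessary overhead.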



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment901

I'd name this multiply rather than 'mult'



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment902

You don't need to call through to the super for it to serialize the 
Writable?  And again for the write?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment903

Say 'Defines' rather than 'It defines'



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment904

Good



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment905

Again here say 'Gives' rather than 'It gives..'



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment906

Good doc.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment907

What's a region level?  Try to make your max doc similar to your min doc, 
with min/max the only difference.  Makes it easier on the reader.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment909

Why capital 'Q' on qualifier?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment910

Return is T?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
https://reviews.apache.org/r/585/#comment908

Why this empty line?



- Michael



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-13 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019227#comment-13019227
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review438
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment855

ill effects of copy-paste. will change.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment854

Removed it.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment852

I use the Eclipse formatter (which says it is using Apache's standard), and it 
is inserting these spaces. I tried to edit the settings to make it work, but 
couldn't find a way to stop these extra spaces between the doc and the arg 
list. I removed them manually, but want to know the standard approach.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment853

Yes, will do it. Thanks.


- himanshu




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-13 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019248#comment-13019248
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review440
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment863

This is the type parameter for return value.


- Ted




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-13 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019250#comment-13019250
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

(Updated 2011-04-13 08:37:14.182698)


Review request for hbase and Gary Helmling.


Changes
---

Himanshu updated the patch according to Stack's suggestions.


Summary
---

This patch provides reference implementation for aggregate function support 
through Coprocessor framework.
ColumnInterpreter interface allows client to specify how the value's byte array 
is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs (updated)
-

  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted





[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-12 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019209#comment-13019209
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review436
---


I read half the patch.  Will finish in morning.  Comments below.  This utility 
looks really great.  Hurry up and finish it!


/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment828

It's 2011!



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment829

There is extra white space here and elsewhere in this block.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment830

should be 'handler'



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment831

Do you want to make this an actual javadoc link, e.g. {@link Aggr}?  Is 
AggrationClient misspelled?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment832

Is this comment still right?  It says 8 byte long (Ted's blog seems to 
indicate this is no longer the case).



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment833

Nice javadoc.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment834

Why this constructor?  We'll have a null conf?  Will that be dangerous 
later?  NPEs?
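
One common way to rule out the NPE concern is to reject a null configuration in the constructor and reuse that one object afterwards. The sketch below is only an illustration: `ConfiguredClientSketch` is hypothetical, `java.util.Properties` stands in for the Hadoop configuration object, and the `"table"` key is made up.

```java
import java.util.Objects;
import java.util.Properties;

// Hypothetical client; Properties is a stand-in for the real configuration type.
class ConfiguredClientSketch {
    private final Properties conf;

    // Fails fast at construction time instead of with an NPE on some later call.
    ConfiguredClientSketch(Properties conf) {
        this.conf = Objects.requireNonNull(conf, "conf must not be null");
    }

    // Every call reads from the same stored configuration rather than building a new one.
    String tableNameOrDefault() {
        return conf.getProperty("table", "default");
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("table", "metrics");
        System.out.println(new ConfiguredClientSketch(p).tableNameOrDefault());
    }
}
```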



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment835

White space



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment836

Looks like this comment is no longer true?  The method has been genericized?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment837

Shouldn't you reuse the passed configuration?  Otherwise you are making a new 
Connection per invocation.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment838

What's this?  The return?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment839

Reuse passed conf?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment840

What's this?  Extra white space.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment841

Reuse conf creating HTable.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment842

What's this?  This problem is in all subsequent methods... the extra white 
space too.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment843

This needs to be passed the conf.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
https://reviews.apache.org/r/585/#comment844

2011


- Michael



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018677#comment-13018677
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

Review request for hbase and Gary Helmling.


Summary
---

This patch provides a reference implementation of aggregate function support 
through the Coprocessor framework.
The ColumnInterpreter interface allows the client to specify how the value's 
byte array is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
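
As a rough illustration of what "interpreting the value's byte array" can mean, here is a minimal sketch. The names ValueInterpreter and LongValueInterpreter are hypothetical and are not the ColumnInterpreter API from the patch. It assumes cell values are stored as 8-byte big-endian longs.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified interface; the ColumnInterpreter in the patch may differ.
interface ValueInterpreter<T> {
    // Decodes a raw cell value, or returns null if it cannot be decoded.
    T getValue(byte[] cellValue);
}

// Assumes the cell holds exactly one 8-byte, big-endian long.
final class LongValueInterpreter implements ValueInterpreter<Long> {
    @Override
    public Long getValue(byte[] cellValue) {
        // Anything that is not exactly 8 bytes is treated as "no value".
        if (cellValue == null || cellValue.length != Long.BYTES) {
            return null;
        }
        // ByteBuffer reads in big-endian order by default.
        return ByteBuffer.wrap(cellValue).getLong();
    }
}
```

An interpreter for another cell type would implement the same interface with a different decoding step.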

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs
-

  
  /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION
  /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  
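
The per-region counting idea described above can be sketched as follows. This is only a toy model under stated assumptions: RegionCountSketch is a hypothetical name, and each region is represented by an in-memory list of row keys rather than by a real HBase region.

```java
import java.util.List;

// Toy model: each inner list stands in for the row keys held by one region.
final class RegionCountSketch {
    // "Server side": a region counts its own rows and returns only the number.
    static long countRegion(List<String> regionRowKeys) {
        return regionRowKeys.size();
    }

    // "Client side": sum the per-region counts instead of receiving every row.
    static long countTable(List<List<String>> regions) {
        long total = 0;
        for (List<String> region : regions) {
            total += countRegion(region);
        }
        return total;
    }
}
```

Only one number per region crosses the wire in this model, which is the saving the description is after.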

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018686#comment-13018686
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review429
---



/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java
https://reviews.apache.org/r/585/#comment817

While working on a different JIRA, I noticed that this test registers the CP 
with the wrong (old) key. It only worked because the code that follows loads 
the Agg CP explicitly (lines #102-106).
Either switch to the region-specific key, 
CoprocessorHost.REGION_COPROCESSOR_CONF_KEY, and remove the explicit loading 
code below, or delete this statement entirely.
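
To show why the wrong key went unnoticed, here is a small stand-alone sketch. It does not use HBase classes: a plain Map stands in for the configuration, CoprocessorConfigSketch is a hypothetical name, and the key string is assumed to be the value behind CoprocessorHost.REGION_COPROCESSOR_CONF_KEY.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-in for configuration-driven coprocessor loading; not HBase code.
final class CoprocessorConfigSketch {
    // Assumption: this matches CoprocessorHost.REGION_COPROCESSOR_CONF_KEY.
    static final String REGION_CP_KEY = "hbase.coprocessor.region.classes";

    // Returns the class names registered under the region key, in order.
    static List<String> regionCoprocessors(Map<String, String> conf) {
        List<String> names = new ArrayList<>();
        String value = conf.get(REGION_CP_KEY);
        if (value == null) {
            return names; // nothing registered under the expected key
        }
        for (String name : value.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }
}
```

In this model a class name stored under any other key is silently ignored, so the coprocessor would only be present if something else loaded it explicitly.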


- himanshu


On 2011-04-12 02:28:10, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-12 02:28:10)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides a reference implementation of aggregate function support through the Coprocessor framework.
bq.  The ColumnInterpreter interface allows the client to specify how the value's byte array is interpreted.
bq.  Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION
bq.    /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION
bq.    /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018703#comment-13018703
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

(Updated 2011-04-12 04:28:32.453024)


Review request for hbase and Gary Helmling.


Changes
---

Switched to CoprocessorHost.REGION_COPROCESSOR_CONF_KEY and removed the manual 
loading of CPs.


Summary
---

This patch provides a reference implementation of aggregate function support 
through the Coprocessor framework.
The ColumnInterpreter interface allows the client to specify how the value's 
byte array is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs (updated)
-

  
  /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION
  /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018709#comment-13018709
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

(Updated 2011-04-12 04:41:49.068986)


Review request for hbase and Gary Helmling.


Changes
---

Switched to CoprocessorHost.REGION_COPROCESSOR_CONF_KEY and removed the manual 
loading of CPs.


Summary
---

This patch provides a reference implementation of aggregate function support 
through the Coprocessor framework.
The ColumnInterpreter interface allows the client to specify how the value's 
byte array is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs (updated)
-

  
  /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION
  /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-06 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016496#comment-13016496
 ] 

Himanshu Vashishtha commented on HBASE-1512:


Thanks for the review, Ted.

For the divide method, I think we should return Double.NaN if either operand 
is null: any operation on null should give null.

OK on the name refactoring.

I don't feel strongly about making a separate class out of it at this point, 
as it doesn't add much on its own, but I will do it if you say so.

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016519#comment-13016519
 ] 

Ted Yu commented on HBASE-1512:
---

I think returning Double.NaN is fine. Normally, a null operand would lead to 
an NPE.
As for making a separate class, it would make it easier for users to write 
other ColumnInterpreter classes modeled on LongColumnInterpreter.

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016585#comment-13016585
 ] 

Ted Yu commented on HBASE-1512:
---

Version 4 is awesome.

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014733#comment-13014733
 ] 

Ted Yu commented on HBASE-1512:
---

In AggregateProtocolImpl, I think the boolean done should be renamed. It 
actually indicates whether more rows exist after the current one.
The following loop condition may confuse someone:
{code}
  } while (done);
{code}
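
A minimal sketch of the loop with a more descriptive flag name might look like this. FakeScanner is a hypothetical stand-in whose next() consumes one row and reports whether more rows remain. It is not the scanner used in AggregateProtocolImpl, and hasMoreRows is only one possible name.

```java
import java.util.Iterator;
import java.util.List;

final class ScanLoopSketch {
    // Hypothetical stand-in for a region scanner over long values.
    static final class FakeScanner {
        private final Iterator<Long> rows;
        long current;

        FakeScanner(List<Long> values) {
            this.rows = values.iterator();
        }

        // Reads one row into 'current'; returns true if more rows remain after it.
        boolean next() {
            current = rows.next();
            return rows.hasNext();
        }
    }

    // Sums all values; the flag name now says what the loop condition means.
    static long sum(List<Long> values) {
        if (values.isEmpty()) {
            return 0L;
        }
        FakeScanner scanner = new FakeScanner(values);
        long total = 0;
        boolean hasMoreRows;
        do {
            hasMoreRows = scanner.next();
            total += scanner.current;
        } while (hasMoreRows);
        return total;
    }
}
```

With this name, `while (hasMoreRows)` reads as "keep going while rows remain" rather than "keep going while done".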


 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014908#comment-13014908
 ] 

Himanshu Vashishtha commented on HBASE-1512:


Thanks for the suggestions, Ted.

a) Added generics to AggregationClient. As Ted suggested, there should be a 
ColumnInterpreter that lets the client describe the cell value type. I made 
this generic: the client now supplies a column interpreter object with each 
aggregate function call. AggregationClient includes one such implementation, 
for cells whose value is a long. Other cell value types can be supported the 
same way.

b) Although the client can define the cell value type by implementing 
ColumnInterpreter, I still think the average and standard deviation should be 
double values. So I added a wrapper around these methods to support the 
generic functionality. Please refer to AggregationClient.getStdParams and 
getAvgParams. Let me know if it is unintuitive. I think it is right though :)

c) Added a filter parameter to each of the aggregate functions. The filter is 
passed along with the call and set on the Scan object at the region level 
during scanning. For row count, if the client provides a filter, that one is 
used. If neither a filter nor a qualifier is provided, FirstKeyOnlyFilter is 
used.

d) Added more test cases covering the filter use cases (44 in total :)). 

e) Renamed the done variable as suggested by Ted.
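
Points a) and b) can be illustrated with a small stand-alone sketch. All names here (CellInterpreter, AverageSketch, Partial) are hypothetical and do not come from the patch. The sketch assumes each region returns a partial result (sum, sum of squares, count) and the client combines them, so that average and standard deviation come out as doubles whatever the cell type is.

```java
import java.util.List;

// Hypothetical, simplified interpreter: maps a typed cell value to a double.
interface CellInterpreter<T> {
    double toDouble(T value);
}

final class AverageSketch {
    // Per-region partial result: sum of values, sum of squares, and value count.
    static final class Partial {
        final double sum;
        final double sumSq;
        final long count;

        Partial(double sum, double sumSq, long count) {
            this.sum = sum;
            this.sumSq = sumSq;
            this.count = count;
        }
    }

    // "Region side": fold the region's values into one partial result.
    static <T> Partial aggregateRegion(List<T> values, CellInterpreter<T> ci) {
        double sum = 0;
        double sumSq = 0;
        for (T value : values) {
            double d = ci.toDouble(value);
            sum += d;
            sumSq += d * d;
        }
        return new Partial(sum, sumSq, values.size());
    }

    // "Client side": the average is a double regardless of the cell type.
    static double average(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    // Population standard deviation from the combined partial results.
    static double std(List<Partial> partials) {
        double sum = 0;
        double sumSq = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            sumSq += p.sumSq;
            count += p.count;
        }
        if (count == 0) {
            return Double.NaN;
        }
        double mean = sum / count;
        // Clamp at zero to guard against tiny negative values from rounding.
        return Math.sqrt(Math.max(0.0, sumSq / count - mean * mean));
    }
}
```

For long cells, the interpreter can be as small as `Long::doubleValue`. Supporting another cell type means supplying a different CellInterpreter, with the combining code unchanged.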

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014942#comment-13014942
 ] 

Ted Yu commented on HBASE-1512:
---

For LongColumnInterpreter.divide(), if l2 is null, I think we should return 
Double.NaN.
I would write:
{code}
  if (l2 == null)
    return Double.NaN;
  if (l1 == null)
    return 0;
{code}
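
For reference, the suggested checks can be placed in a complete method as follows. This is only a sketch: DivideVariantSketch is a hypothetical class name, and the final division line is an assumption about what the rest of divide() does, since the comment shows only the null checks.

```java
final class DivideVariantSketch {
    // Null divisor gives NaN; null dividend is treated as zero, as suggested above.
    static double divide(Long l1, Long l2) {
        if (l2 == null) {
            return Double.NaN;
        }
        if (l1 == null) {
            return 0;
        }
        // Assumed remainder of the method: floating-point division of the two values.
        return l1.doubleValue() / l2.doubleValue();
    }
}
```

Because the l2 check comes first, divide(null, null) also yields NaN rather than 0.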
I think the following method can be named getAvgArgs (argument in place of 
parameter):
{code}
  private <R> List<R> getAvgParams(final byte[] tableName,
{code}
But I don't have a strong opinion here.

getAvgParamsAsArray() of AvgCallBack can be named getAvgParams() because its 
return type is List.

Overall, version 3 is great.

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014947#comment-13014947
 ] 

Ted Yu commented on HBASE-1512:
---

Also, I think it is time to move LongColumnInterpreter out into its own file 
under src/main/java/org/apache/hadoop/hbase/client/coprocessor/.


 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-03-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014091#comment-13014091
 ] 

Ted Yu commented on HBASE-1512:
---

Pardon me for attaching source files directly. svn diff doesn't recognize the 
changes I made on top of patch-1512-2.txt

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-03-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013745#comment-13013745
 ] 

Ted Yu commented on HBASE-1512:
---

This feature is very useful.
Is it possible to pass some class to AggregateProtocolImpl which can interpret 
the type of the value based on colFamily:colQualifier?

I tried adding a type parameter (for the type of the value) to 
AggregateCpProtocol but encountered various compilation errors.

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-03-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013820#comment-13013820
 ] 

Ted Yu commented on HBASE-1512:
---

I think AggregationClient should have a ctor which accepts a Configuration and 
saves it.
Then the Configuration can be used to point to a table in a remote cluster:
{code}
HTable table = new HTable(conf, tableName);
{code}


 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-03-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013830#comment-13013830
 ] 

Ted Yu commented on HBASE-1512:
---

A 4-byte value can represent a float. An 8-byte value can represent a double.

As for the return type, Long, I tried to make AggregateCpProtocol generic but 
wasn't successful.
e.g. AggregateCpProtocol<Long>.class wouldn't compile. Since 
AggregateCpProtocol is an interface, I cannot instantiate it and obtain the class 
afterward.
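
For context on the compile error above: Java does not allow type arguments on a class literal, because generic type information is erased at runtime, so there is only one Class object per generic interface. A common workaround is to take the raw class literal and cast it through a wildcard. The sketch below uses a hypothetical stand-in interface (AggProtocolSketch), not the AggregateCpProtocol from the patch, and is only meant to illustrate the idiom.

```java
public class ClassLiteralSketch {

    /** Hypothetical stand-in for a generic aggregate protocol; not the HBase interface. */
    public interface AggProtocolSketch<T> {
        T sum(byte[] family, byte[] qualifier);
    }

    // "AggProtocolSketch<Long>.class" is rejected by javac: a class literal
    // cannot carry type arguments. The raw literal is legal, and an unchecked
    // cast through Class<?> gives callers a typed handle to the same object.
    @SuppressWarnings("unchecked")
    public static <T> Class<AggProtocolSketch<T>> protocolClass() {
        return (Class<AggProtocolSketch<T>>) (Class<?>) AggProtocolSketch.class;
    }

    public static void main(String[] args) {
        Class<AggProtocolSketch<Long>> c = ClassLiteralSketch.<Long>protocolClass();
        // Same runtime object as the raw literal; only the static type differs.
        System.out.println(c == (Class<?>) AggProtocolSketch.class);
    }
}
```

The cast is unchecked, so the type argument is a promise made by the caller rather than something the runtime verifies.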

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-03-27 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011750#comment-13011750
 ] 

Himanshu Vashishtha commented on HBASE-1512:


Stack: Thanks for the review. 

I have revamped the patch and also incorporated your suggestions. There were 
a bunch of discrepancies in the previous version regarding the boundary 
conditions you mentioned: at the region level there was no knowledge of the 
exact start/stop rows as given by the user. To fix this, I modified the agg 
function signatures to include start/stop rows at the region level.

Following are some key aspects of this version:
a) startRow < endRow is an essential condition now (other than when one is 
doing a full table scan, where startRow and endRow are both empty byte arrays). 
This helps in handling boundary conditions where the start row provided by the 
user is the start row of a region (the default scanner impl returns null because 
it is a non-get query). Moreover, it is also aligned with the logic of these 
functions, where one is finding max, min, row count etc.

b) For all computations like avg, sum, max etc., it is assumed the cell value is 
a long value (8 bytes); if this is not the case, that cell value is skipped 
from the computation.

c) For all functions, column family is essential (if it is null, an IOException 
is returned). 
For max, min, avg, sum, std, when no column qualifier is provided, I aggregate 
all the values in that family. So, a sum for such a case is the group sum of all 
CQs' values for one row key. I think it is the right approach. Please advise here.

d) In the case of rowcount, one can use FirstKeyValueFilter for optimisation. 
But it may give a wrong result if the user has also provided a column qualifier. 
In such a case, the first value returned by the scanner might belong to another 
qualifier; the FirstKeyValueFilter will set its flag to skip to the next row, 
but that value is filtered out from the result set. The overall effect is that 
the row is not counted and the scanner moves to the next row. So I use this 
filter only when there is no column qualifier. (I confirmed this during my 
testing, but it would be good to have some comments here.)

e) As suggested, I have added a bunch of boundary test cases for each of the six 
agg functions. Please let me know in case some more are to be added.

f) Yes, it's the client (here AggregationClient) that will perform the reduce 
phase, where individual results from all the target regions are received and 
accumulated.
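
As a rough illustration of the 8-byte rule in (b) and the client-side reduce phase, here is a minimal plain-Java sketch. It does not use the classes from the patch: LongSumSketch, regionSum, clientReduce and encode are hypothetical names, and cells are modelled as raw byte arrays. It assumes values are stored big-endian, which is what ByteBuffer produces by default.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

public class LongSumSketch {

    /** Region-side step: sum the cells that are exactly 8 bytes, skip all others. */
    public static long regionSum(List<byte[]> cellValues) {
        long sum = 0;
        for (byte[] value : cellValues) {
            if (value == null || value.length != Long.BYTES) {
                continue; // not a long: skipped from the computation
            }
            sum += ByteBuffer.wrap(value).getLong();
        }
        return sum;
    }

    /** Client-side reduce: accumulate the partial sums returned by each region. */
    public static long clientReduce(List<Long> perRegionSums) {
        long total = 0;
        for (long partial : perRegionSums) {
            total += partial;
        }
        return total;
    }

    /** Helper for the example: encode a long as 8 big-endian bytes. */
    public static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    public static void main(String[] args) {
        // Region 1 holds two longs and one 3-byte value that gets skipped.
        long r1 = regionSum(Arrays.asList(encode(5), encode(7), new byte[] {1, 2, 3}));
        long r2 = regionSum(Arrays.asList(encode(-2)));
        System.out.println(clientReduce(Arrays.asList(r1, r2)));
    }
}
```

Silently skipping malformed cells keeps the functions simple, but it also means a sum can look plausible while ignoring part of the data, so a caller may want a separate count of skipped cells.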



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, patch-1512.txt




[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions

2010-11-02 Thread Mingjie Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927634#action_12927634
 ] 

Mingjie Lai commented on HBASE-1512:


Himanshu.

The patch looks good. But it doesn't provide the whole picture of the solution. 
There are still some important questions unanswered for this feature:

1) what's the interface provided to end users? HTableInterface.sum(...), 
HTableInterface.min/max()? Do we need shell support?

2) how to implement the interface? (by utilizing coprocessor)

3) how to make sure the coprocessor is loaded properly if the feature is 
available. 

Your patch addresses part of (2). And it only provides max() and countRow() 
implementations. 

IMO I don't think ProcessResultsFromCP is necessary. It doesn't really provide 
any convenience for developers to reduce development effort. 

Thanks. 

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
 Attachments: 1512.zip



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions

2010-11-02 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927706#action_12927706
 ] 

Gary Helmling commented on HBASE-1512:
--

Thanks for the patch Himanshu!  

For the scope of the functionality and what sort of aggregation functions you 
might cover, you might want to start with a comparison of common SQL functions 
(ex. http://dev.mysql.com/doc/refman/5.5/en/group-by-functions.html).  I don't 
know if you really need to implement all of them, but a good start would 
probably be:

 * COUNT
 * AVG
 * MIN
 * MAX
 * STD
 * SUM

(just my opinion of course).  All of these would need some form of server side 
function, and in some cases the client/server coordination might be a little 
tricky when you have to span regions.
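
To make the cross-region coordination concrete: AVG and STD cannot be obtained by averaging per-region averages, since regions hold different numbers of rows. One way around this, sketched below in plain Java, is for each region to return a small partial (count, sum, sum of squares) that the client merges before computing the final value. The class and method names here are hypothetical, and the population form of the standard deviation is an assumption.

```java
public class PartialStatsSketch {

    /** Per-region partial result; mergeable on the client. */
    public static final class Partial {
        public final long count;
        public final double sum;
        public final double sumSq;

        public Partial(long count, double sum, double sumSq) {
            this.count = count;
            this.sum = sum;
            this.sumSq = sumSq;
        }

        /** What one region would compute over its own values. */
        public static Partial of(double... values) {
            double s = 0, sq = 0;
            for (double v : values) {
                s += v;
                sq += v * v;
            }
            return new Partial(values.length, s, sq);
        }

        /** Client-side combine step: component-wise addition. */
        public Partial merge(Partial other) {
            return new Partial(count + other.count, sum + other.sum, sumSq + other.sumSq);
        }

        public double avg() {
            return count == 0 ? Double.NaN : sum / count;
        }

        /** Population standard deviation: sqrt(E[x^2] - E[x]^2). */
        public double std() {
            if (count == 0) {
                return Double.NaN;
            }
            double mean = sum / count;
            // Clamp at zero to absorb tiny negative values from rounding.
            return Math.sqrt(Math.max(0.0, sumSq / count - mean * mean));
        }
    }

    public static void main(String[] args) {
        Partial regionA = Partial.of(2.0, 4.0);
        Partial regionB = Partial.of(4.0, 4.0, 5.0, 5.0, 7.0, 9.0);
        Partial all = regionA.merge(regionB);
        System.out.println(all.avg() + " " + all.std());
    }
}
```

COUNT, SUM, MIN and MAX need no such treatment, because the client can apply the same function to the per-region results directly.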

The client side interface for these also has its own needs.  Does it make 
sense to be able to combine different client side aggregation functions with 
unmatched server side functions?  Would you want to take a client side minimum 
of the per-region maximum values returned from the row range?  As far as I can 
see, you would mostly want a single client function paired with a given 
server-side method.

I do see that the raw HTable.coprocessorExec() interface is a bit clumsy for 
these types of operations.  You really want to be able to return a single 
value, not a value per region.  But I think you can create some client helper 
methods to hide that complexity.

For the current client classes ProcessResultsFromCP seems to have a lot of 
overlap with Batch.Callback.  The main difference being that 
HTable.processResultsFromCP() allows you to pass a list of instances (as 
opposed to a single Batch.Callback).  If using a single Callback instance is 
limiting, we could allow use of a list of Callbacks, or provide a 
Batch.callbackList() factory method that allows chaining multiple instances 
together.  But for the common cases here, it seems like you'll want a single 
client side function (min, max, etc) paired with a single server-side 
invocation (min, max, etc.), so the current Batch.Callback would probably 
suffice.

So as an example on the client side, you could provide a client wrapper in the 
form:

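{{{
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;

// AggFunctionProtocol is an illustrative protocol interface, not an existing class.
public class Aggregations {
    // Client-side reducer: adds up the per-region sums as they arrive.
    private static class ClientSum implements Batch.Callback<Long> {
        private long sum;
        public void update(byte[] region, byte[] row, Long value) {
            sum += value;
        }
        public long getValue() { return sum; }
    }

    // family and col are final so the anonymous Batch.Call can capture them.
    public static long sum(HTable table, byte[] start, byte[] end,
            final byte[] family, final byte[] col) throws Throwable {
        ClientSum sum = new ClientSum();
        table.coprocessorExec(AggFunctionProtocol.class, start, end,
            new Batch.Call<AggFunctionProtocol,Long>() {
                public Long call(AggFunctionProtocol instance) {
                    return instance.sum(family, col);
                }
            }, sum);
        return sum.getValue();
    }
}
}}}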

And so on for the other types of operations...  Then clients can just call 
Aggregations.sum() with the right args.

There may be better ways to do it, this is just an illustration. :)

And, please, if you see ways that HTable.coprocessorExec() can be improved to 
make this easier, comment on HBASE-2002!




 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
 Attachments: 1512.zip





[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions

2010-10-31 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12926831#action_12926831
 ] 

Himanshu Vashishtha commented on HBASE-1512:


With the 2001 patch, the basic infrastructure required by these functions is 
available. I wrote a test class to cover some of these, but am confused about 
their degree of 'generic'-ness. 

Here, I assumed the user is aware of the table in context and the return types 
he is getting from the Coprocessor impls, and so the input/output types of 
these agg operations will also be the same. Therefore he builds agg function 
classes with those 'types'. I think this is a somewhat skewed assumption and it 
needs further clarification. What are the expectations from the 'end interface'? 

I have attached the new/modified classes (2/1). 
a) ProcessResultsFromCP: to be implemented by the agg functions (can be part of 
the Batch class). 
b) TestAggFunctions: has the test case using the agg functions
c) HTable: one method to execute the aggregation functions.

There is a high probability that I have twisted the desired feature entirely, so 
please feel free to 'lambaste' the code and its underlying assumptions.

PS: I was thinking of making this jira a sub-item of jira 2469, but couldn't 
come up with something worth mentioning.


 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
 Attachments: 1512.zip

