[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2015-08-07 Thread nicu marasoiu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662054#comment-14662054
 ] 

nicu marasoiu commented on HBASE-1512:
--

Hi,

Do you know whether, for this issue or in general, there is a solution with 
HBase coprocessors for:
1. multiple metric columns, e.g. group by (d1,..,dn) sum(c1) sum(c2)
2. custom metric columns, e.g. group by (d1,..,dn) sum(c1) hyperlogUniq(c2)
3. sharing the components with map-reduce to run the same query for larger 
inputs

Please advise,
Nicu

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: Coprocessors
>Reporter: stack
>Assignee: Himanshu Vashishtha
> Fix For: 0.92.0
>
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> addendum_1512.txt, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, 
> patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, 
> patch-1512-9.txt, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregating facilities (generally, anywhere you want to calculate 
> some meta info on your table), it seems like it wouldn't be too hard to make a 
> filter type that could run a function server-side and return ONLY the result 
> of the aggregation.
> For example, say you just want to count rows. Currently you scan, the server 
> returns all data to the client, and the client does the count by counting up 
> row keys. A bunch of time and resources are wasted returning data that we're 
> not interested in.  With this new filter type, the counting would be done 
> server-side and then it would make up a new result that was the count only 
> (kinda like mysql when you ask it to count, it returns a 'table' with a count 
> column whose value is the count of rows).   We could have it so the count was 
> just done per region and returned.  Or we could maybe make a small change 
> in the scanner too so that it aggregated the per-region counts.  
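
A minimal sketch of the idea in the description above, written without any HBase API: each region is modelled as a plain sorted map, countRegion stands in for the function that would run server-side, and the client only adds up the per-region counts. All class and method names here are hypothetical and are not taken from the patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: a "region" is just a sorted map of row key -> value.
class RegionCountSketch {

    // Stand-in for the server-side function: it runs next to the data and
    // returns only the count, never the rows themselves.
    static long countRegion(Map<String, byte[]> region) {
        return region.size();
    }

    // Stand-in for the client: sums the per-region partial counts.
    static long countTable(List<Map<String, byte[]>> regions) {
        long total = 0;
        for (Map<String, byte[]> region : regions) {
            total += countRegion(region);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, byte[]> r1 = new TreeMap<>();
        r1.put("row-a", new byte[] {1});
        r1.put("row-b", new byte[] {2});
        Map<String, byte[]> r2 = new TreeMap<>();
        r2.put("row-m", new byte[] {3});
        System.out.println(countTable(Arrays.asList(r1, r2)));
    }
}
```

Only one long per region crosses the wire in this model, which is the saving the description is after.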



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027260#comment-13027260
 ] 

Hudson commented on HBASE-1512:
---

Integrated in HBase-TRUNK #1888 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1888/])



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024940#comment-13024940
 ] 

stack commented on HBASE-1512:
--

@Himanshu I think you uploaded the wrong patch for v8.  It's all about 
CursorCallable...



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024934#comment-13024934
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review548
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


I am fine with using Writable.


- Ted


On 2011-04-25 19:53:33, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-25 19:53:33)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/client/CursorCallable.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/client/CursorCp.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 1088894 
bq.    /src/main/java/org/apache/hadoop/hbase/client/HTable.java 1088894 
bq.    /src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1088894 
bq.    /src/main/java/org/apache/hadoop/hbase/client/Scan.java 1088894 
bq.    /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION 
bq.    /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.





[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024932#comment-13024932
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-25 19:44:07, Ted Yu wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java, line 99
bq.  > 
bq.  > There are 10 classes which implement Serializable under src/main/java/org/apache/hadoop/hbase/rest/model/
bq.  > 
bq.  > And:
bq.  > public class Pair implements Serializable
bq.  > ./src/main/java/org/apache/hadoop/hbase/util/Pair.java
bq.  >   public static class CustomSerializable implements Serializable {
bq.  > ./src/test/java/org/apache/hadoop/hbase/io/TestHbaseObjectWritable.java

My guess is that for REST, it's because of the REST engine we use.  For Pair, we 
should probably change it to be Writable.  Same for CustomSerializable.

Otherwise, Ted, if you +1 Himanshu's patch, I'll commit it.


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review546
---






[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024933#comment-13024933
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

(Updated 2011-04-25 19:53:33.129920)


Review request for hbase and Gary Helmling.


Changes
---

Changes as per Stack's review.
Major changes include:
a) LongColumnInterpreter still implements Writable (though with empty 
read/write methods).
b) An exception is thrown if more than one family is defined.
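
The two changes above can be sketched as follows. This is an illustration only, not the code from the patch: WritableLike is a local stand-in with the same method shape as Hadoop's Writable so the example compiles without Hadoop on the classpath, and StatelessInterpreter and rejectMultipleFamilies are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Local stand-in with the same method shape as Hadoop's Writable.
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// A stateless interpreter has nothing to serialize, so both methods are empty.
class StatelessInterpreter implements WritableLike {
    @Override
    public void write(DataOutput out) throws IOException { }

    @Override
    public void readFields(DataInput in) throws IOException { }
}

class ReviewChangesSketch {
    // Hypothetical check: reject a request that names more than one family.
    static void rejectMultipleFamilies(List<String> families) throws IOException {
        if (families.size() > 1) {
            throw new IOException("Only one column family is supported, got " + families.size());
        }
    }

    // Number of bytes the object puts on the wire.
    static int serializedSize(WritableLike w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        w.write(new DataOutputStream(bytes));
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(serializedSize(new StatelessInterpreter()));
        rejectMultipleFamilies(Arrays.asList("cf")); // accepted
        try {
            rejectMultipleFamilies(Arrays.asList("cf1", "cf2"));
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```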


Summary
---

This patch provides a reference implementation for aggregate function support 
through the Coprocessor framework.
The ColumnInterpreter interface allows the client to specify how the value's 
byte array is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
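
To illustrate what "specify how the value's byte array is interpreted" can mean, here is a simplified sketch. ValueInterpreter, LongValueInterpreter and InterpreterSketch are hypothetical names and this is not the ColumnInterpreter interface from the patch; it assumes values are stored as 8-byte big-endian longs.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for the ColumnInterpreter idea.
interface ValueInterpreter<T> {
    T decode(byte[] value);   // how to read a cell's raw bytes
    T add(T left, T right);   // how to combine two decoded values
}

// Interprets each value as an 8-byte big-endian long.
class LongValueInterpreter implements ValueInterpreter<Long> {
    @Override
    public Long decode(byte[] value) {
        if (value == null || value.length != Long.BYTES) {
            return null; // not a long; add() below ignores it
        }
        return ByteBuffer.wrap(value).getLong();
    }

    @Override
    public Long add(Long left, Long right) {
        if (left == null) return right;
        if (right == null) return left;
        return left + right;
    }
}

class InterpreterSketch {
    // Generic sum: the aggregation code never inspects the bytes itself.
    static <T> T sum(List<byte[]> values, ValueInterpreter<T> interpreter) {
        T total = null;
        for (byte[] raw : values) {
            total = interpreter.add(total, interpreter.decode(raw));
        }
        return total;
    }

    static byte[] longBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    public static void main(String[] args) {
        List<byte[]> cells = Arrays.asList(longBytes(4), longBytes(6));
        System.out.println(sum(cells, new LongValueInterpreter()));
    }
}
```

Swapping in a different interpreter changes how bytes are read without touching the aggregation loop.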

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs (updated)
-

  /src/main/java/org/apache/hadoop/hbase/client/CursorCallable.java PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/client/CursorCp.java PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 1088894 
  /src/main/java/org/apache/hadoop/hbase/client/HTable.java 1088894 
  /src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1088894 
  /src/main/java/org/apache/hadoop/hbase/client/Scan.java 1088894 
  /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION 

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted





[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024926#comment-13024926
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review546
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


There are 10 classes which implement Serializable under 
src/main/java/org/apache/hadoop/hbase/rest/model/

And:
public class Pair implements Serializable
./src/main/java/org/apache/hadoop/hbase/util/Pair.java
  public static class CustomSerializable implements Serializable {
./src/test/java/org/apache/hadoop/hbase/io/TestHbaseObjectWritable.java


- Ted


On 2011-04-23 16:39:37, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-23 16:39:37)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION 
bq.    /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.





[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024919#comment-13024919
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-25 18:58:37, Ted Yu wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java, line 99
bq.  > 
bq.  > I prefer using Serializable for the interpreter which is stateless.
bq.  > 
bq.  > It is supported by HbaseObjectWritable.

My personal opinion is that using Serializable will not go over well with 
others. I believe it is supported in HbaseObjectWritable only for legacy 
reasons.


- Himanshu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review538
---







[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024917#comment-13024917
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review542
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


OK, it now throws an exception when more than one family is defined.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Removed the javadoc related to private method calls.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


I didn't want it to propagate to the server just to return an exception. I 
thought that aggregate functions should work on a distinct set of rows, i.e., 
startRow < endRow should always be true (except when they are empty). 
There is a check in HTable -> getStartKeysInRange() that throws an exception 
when startRow > endRow, but I needed the boundary condition too.
Please let me know if we should remove this condition.
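
A sketch of the kind of client-side range check described above, under one assumed reading of the condition: an empty key means "unbounded" and is accepted, and otherwise startRow must sort strictly before endRow, so equal keys (the boundary case) are rejected as well. RowRangeSketch and its methods are hypothetical names, not the code in AggregationClient.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class RowRangeSketch {
    // Unsigned lexicographic comparison of two row keys.
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Assumed reading: empty keys are always accepted; otherwise the range
    // must be non-empty, which also rejects startRow == endRow.
    static void validateRange(byte[] startRow, byte[] endRow) throws IOException {
        boolean bounded = startRow.length > 0 && endRow.length > 0;
        if (bounded && compareRows(startRow, endRow) >= 0) {
            throw new IOException("startRow must be less than endRow");
        }
    }

    static byte[] row(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        validateRange(row("a"), row("b"));        // accepted
        validateRange(new byte[0], new byte[0]);  // accepted: whole table
        try {
            validateRange(row("b"), row("b"));    // boundary case: equal keys
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast on the client avoids a round trip to the server that could only come back with an exception.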



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Yes, it's not there in later versions.


- Himanshu






[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024911#comment-13024911
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review538
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


I exchanged emails with Himanshu about supporting multiple column families.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


I prefer using Serializable for the interpreter which is stateless.

It is supported by HbaseObjectWritable.



- Ted


On 2011-04-23 16:39:37, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-23 16:39:37)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq.
/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
> patch-1512-6.txt, patch-1512-7.txt, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregating facility, facility generally where you want to calculate 
> some meta info on your table, it seems like it wouldn't be too hard making a 
> filter type that could run a function server-side and return the result ONLY 
> of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server 
> returns all data to client and count is done by client counting up row keys.  
> A bunch of time and resources have been wasted returning data that we're not 
> interested in.  With this new filter type, the counting would be done 
> server-side and then it would make up a new result that was the count only 
> (kinda like mysql when you ask it to count, it returns a 'table' with a count 
> column whose value is count of rows).   We could have it so the count was 
> just done per region and return that.  Or we could maybe make a small change 
> in scanner too so that it aggregated the per-region counts.  
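
The per-region counting idea in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names (RegionCountSketch, countRowsInRegion, totalCount) are hypothetical, and plain lists of row keys stand in for regions; it does not use the HBase API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: each "region" counts its own rows, the client only adds the partial counts. */
final class RegionCountSketch {

    /** Stand-in for the server-side work of one region: count rows and return only the number. */
    static long countRowsInRegion(List<String> rowKeysInRegion) {
        return rowKeysInRegion.size();
    }

    /** Client-side merge: sum the per-region counts, so no row data crosses the wire. */
    static long totalCount(List<List<String>> regions) {
        long total = 0;
        for (List<String> region : regions) {
            total += countRowsInRegion(region);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
            Arrays.asList("row-a", "row-b"),
            Arrays.asList("row-c"),
            Collections.<String>emptyList());
        System.out.println(totalCount(regions)); // prints 3
    }
}
```

The point of the split is that only one long per region travels back to the client, instead of every row key.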

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024906#comment-13024906
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-25 18:31:17, Ted Yu wrote:
bq.  > 
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java,
 line 99
bq.  > 
bq.  >
bq.  > In my version, I have:
bq.  > public interface ColumnInterpreter extends Serializable {
bq.  > 
bq.  > There is no need to extend Writable.
bq.  
bq.  Michael Stack wrote:
bq.  Ok.  Then we should remove that from the patch (Good one Ted)

Whoops.  Sorry, did you say Serializable, Ted, as in java.io.Serializable?  We 
don't want j.i.S.  That's what Writable replaces.


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review537
---




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024907#comment-13024907
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review540
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


Yes, but it seems Writable is the standard way to go in Hadoop for these 
RPCs. No big issue either way.
Since it doesn't have any state, there is nothing to serialize here. It 
seems we could make this a static util class (?).
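
A minimal sketch of what "nothing to serialize" looks like for a stateless interpreter. Assumptions: the Writable interface below is a local stand-in that mirrors the two methods of Hadoop's Writable (write and readFields) so the sketch compiles on its own, and StatelessInterpreter is a hypothetical name, not the class from the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class StatelessWritableSketch {

    /** Local stand-in with the same two methods as Hadoop's Writable (assumption: kept local on purpose). */
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical stateless interpreter: no fields, so both methods are intentionally empty. */
    static final class StatelessInterpreter implements Writable {
        @Override
        public void write(DataOutput out) throws IOException {
            // nothing to serialize
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // nothing to deserialize
        }
    }

    /** Returns the number of bytes the given object writes when serialized. */
    static int serializedSize(Writable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println(serializedSize(new StatelessInterpreter())); // prints 0
    }
}
```

With empty methods the receiving side only needs the class name to rebuild the object, which is why the choice between Writable and a static utility is mostly a matter of convention here.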


- Himanshu




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024904#comment-13024904
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-25 18:31:17, Ted Yu wrote:
bq.  > 
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java,
 line 99
bq.  > 
bq.  >
bq.  > In my version, I have:
bq.  > public interface ColumnInterpreter extends Serializable {
bq.  > 
bq.  > There is no need to extend Writable.

Ok.  Then we should remove that from the patch.  (Good one, Ted.)


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review537
---




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024900#comment-13024900
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review537
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


In my version, I have:
public interface ColumnInterpreter extends Serializable {

There is no need to extend Writable.


- Ted




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024893#comment-13024893
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review535
---


I think it's almost there.  This patch won't compile (see below for why).  I'd 
be game for applying the next version.  This patch has come a long way.  
Let's make new issues after applying it for issues found in it (this patch does 
include a nice set of tests).


/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


I agree w/ the review that suggested we spell out 'agg' rather than use the 
abbreviation, especially in javadoc.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


'if' should be 'where'.  Should we throw an exception if multiple families 
are supplied, so users are not surprised when they don't get answers for 
multiple families?
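
One way the suggested check could look, as a sketch only. Assumptions: a plain collection of family names stands in for whatever the Scan carries, requireSingleFamily is a hypothetical helper, and IllegalArgumentException is an arbitrary choice; the patch may validate differently.

```java
import java.util.Arrays;
import java.util.Collection;

final class SingleFamilyCheckSketch {

    /**
     * Hypothetical validation: fail fast unless exactly one column family was supplied,
     * so callers are not silently given an answer for only part of their request.
     */
    static String requireSingleFamily(Collection<String> families) {
        if (families == null || families.size() != 1) {
            throw new IllegalArgumentException(
                "Aggregation supports exactly one column family, got: " + families);
        }
        return families.iterator().next();
    }

    public static void main(String[] args) {
        System.out.println(requireSingleFamily(Arrays.asList("cf"))); // prints cf
        try {
            requireSingleFamily(Arrays.asList("cf1", "cf2"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```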



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


I'd say leave implementation details out of the public javadoc (the bit 
about calling private methods)



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Does Scan do this test?  Internally? (I'm not sure)



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


'should' or 'does'?  I think you want to say the latter?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


Why this javadoc?  Don't we inherit javadoc from the Interface?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


What's this?  We do nothing on serialization?  Is that right?  It could be.  
It just strikes me as a little odd.  Maybe put a comment in here to say 
'nothing to serialize'?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Do we agree that AggregateCpProtocol was not a good name, and that it 
should rather be AggregateProtocol, since cp is in the package name?

I see you have an AP (AggregateProtocol) later in this patch.  Let me go look 
at it.

I think I see what's going on... you didn't mean to include this in the 
patch?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Otherwise, this Interface looks good.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


Yeah, this class shouldn't be included either.


- Michael



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024894#comment-13024894
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review536
---


I think it's almost there.  This patch won't compile (see below for why).  I'd 
be game for applying the next version.  This patch has come a long way.  
Let's make new issues after applying it for issues found in it (this patch does 
include a nice set of tests).

- Michael




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-23 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023573#comment-13023573
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

(Updated 2011-04-23 16:39:37.773505)


Review request for hbase and Gary Helmling.


Changes
---

The import statement of TestAggregateProtocol is removed from LongColumnInterpreter.


Summary
---

This patch provides a reference implementation for aggregate function support 
through the Coprocessor framework.
The ColumnInterpreter interface allows the client to specify how the value's 
byte array is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
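
To illustrate the idea of letting the client decide how a value's byte array is interpreted, here is a rough sketch. Assumptions: ValueInterpreter is a simplified, hypothetical interface (the ColumnInterpreter in the patch may have more methods and a different signature), and the long encoding is taken to be 8 bytes, big-endian.

```java
import java.nio.ByteBuffer;

final class InterpreterSketch {

    /** Hypothetical, simplified interface: turns a cell's raw bytes into a typed value. */
    interface ValueInterpreter<T> {
        T getValue(byte[] raw);
    }

    /** Reads the value as an 8-byte big-endian long; returns null for any other length. */
    static final class LongInterpreter implements ValueInterpreter<Long> {
        @Override
        public Long getValue(byte[] raw) {
            if (raw == null || raw.length != Long.BYTES) {
                return null;
            }
            return ByteBuffer.wrap(raw).getLong();
        }
    }

    /** Helper for the example: encodes a long the same way the interpreter decodes it. */
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    /** Aggregate that knows nothing about the encoding; it only relies on the interpreter. */
    static long sum(ValueInterpreter<Long> interpreter, byte[][] cells) {
        long total = 0;
        for (byte[] cell : cells) {
            Long v = interpreter.getValue(cell);
            if (v != null) {
                total += v;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        byte[][] cells = { encode(5), encode(7), new byte[] {1, 2} };
        System.out.println(sum(new LongInterpreter(), cells)); // prints 12 (the 2-byte cell is skipped)
    }
}
```

Keeping the decoding behind an interface means the same aggregate code can be reused for other value types by supplying a different interpreter.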

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs (updated)
-

  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted





[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-20 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022607#comment-13022607
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 21:16:05, Ted Yu wrote:
bq.  > 
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 84
bq.  > 
bq.  >
bq.  > One family is fine for the moment.

FYI, more families can easily be added, since the client already passes the Scan 
object. It boils down to fetching each cell's value via the ColumnInterpreter's 
getValue, which takes the family, qualifier and kv value, and then grouping it 
with the values from the other families. But again, I believe aggregating over 
multiple families is not the purpose of this JIRA. Please correct me if I am wrong.


- himanshu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review490
---







[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-18 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021121#comment-13021121
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

(Updated 2011-04-18 17:14:30.424344)


Review request for hbase and Gary Helmling.


Changes
---

Latest update from Himanshu.


Summary
---

This patch provides a reference implementation of aggregate function support 
through the Coprocessor framework.
The ColumnInterpreter interface lets the client specify how a value's byte array 
is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs (updated)
-

  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted





[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-18 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021101#comment-13021101
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review493
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


done



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


done




/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


done



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


done



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


ok, using only Bytes.toLong now.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


done



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


done



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Removed the repetition in the doc.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


ColumnInterpreter is more generic now. It supports both the HBase cell data 
type and its promoted data type, and the computations use the promoted type. 
So, if a cell value is an int, we use a long while computing the sum, to avoid 
overflow. For max and min, we still use the cell data type.
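
To illustrate the promoted-type idea, here is a minimal sketch. It is not code 
from the patch: the class and method names are hypothetical, and int/long stand 
in for the cell type and its promoted type. It shows a sum wrapping around when 
accumulated in the cell type, the same sum staying correct in the promoted type, 
and max remaining in the cell type.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from the patch.
final class PromotedSumSketch {

    // Summing in the cell type (int): the total silently wraps on overflow.
    static int sumInCellType(List<Integer> cells) {
        int total = 0;
        for (int value : cells) {
            total += value;
        }
        return total;
    }

    // Summing in the promoted type (long): each int is widened before adding.
    static long sumInPromotedType(List<Integer> cells) {
        long total = 0L;
        for (int value : cells) {
            total += value;
        }
        return total;
    }

    // Max never exceeds the largest input, so the cell type is sufficient.
    static int max(List<Integer> cells) {
        if (cells.isEmpty()) {
            throw new IllegalArgumentException("no cells to compare");
        }
        int best = cells.get(0);
        for (int value : cells) {
            best = Math.max(best, value);
        }
        return best;
    }

    public static void main(String[] args) {
        List<Integer> cells = Arrays.asList(Integer.MAX_VALUE, 1);
        System.out.println(sumInCellType(cells));     // wrapped, negative
        System.out.println(sumInPromotedType(cells)); // correct total as a long
        System.out.println(max(cells));               // largest cell value
    }
}
```

The same reasoning is why min and max need no promotion: their result is always 
one of the input values.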



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


The coprocessor implementation returns an overall sum and a row count, so 
there is no need for a double/float in the return type. The double/float is 
used only on the client side (AggregationClient).
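
As a rough sketch of that split (hypothetical names, not the patch's classes): 
each region returns an integral sum and a row count, and only the client turns 
the combined totals into a floating-point average.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the AggregationClient from the patch.
final class ClientSideAverageSketch {

    // Stand-in for what one region's coprocessor call hands back.
    static final class RegionPartial {
        final long sum;
        final long rowCount;

        RegionPartial(long sum, long rowCount) {
            this.sum = sum;
            this.rowCount = rowCount;
        }
    }

    // Client side: merge the integral partials, then divide once as a double.
    static double average(List<RegionPartial> partials) {
        long totalSum = 0L;
        long totalRows = 0L;
        for (RegionPartial partial : partials) {
            totalSum += partial.sum;
            totalRows += partial.rowCount;
        }
        if (totalRows == 0L) {
            return Double.NaN; // assumption: no rows means no defined average
        }
        return (double) totalSum / totalRows;
    }

    public static void main(String[] args) {
        List<RegionPartial> partials = Arrays.asList(
                new RegionPartial(10L, 4L),
                new RegionPartial(5L, 2L));
        System.out.println(average(partials));
    }
}
```

Averaging the per-region averages instead would weight small regions too 
heavily, which is one reason to ship sums and counts rather than ratios.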



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


done



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


Added class Javadoc.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


refactored it



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


Setting the start/end rows takes care of it, so there is no need to check now.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


done



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


The current version now returns a Pair of a List and a Long, where the list 
contains the sum and the sum of squares and the Long is the row count. I could 
return a more specific object, but it seems it would have to be added to the 
RPC stack (by implementing Writable). Please comment on whether that is _the_ 
right way.
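
For reference, a minimal sketch of how a client could turn such partial results 
into a standard deviation. The Partial class below is a hypothetical stand-in 
for the Pair described above, and the population formula 
sqrt(sumSq/n - (sum/n)^2) is an assumption; the patch may compute it differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the patch's return type or client code.
final class StdFromPartialsSketch {

    // Stand-in for the Pair: sums = [sum, sum of squares], plus the row count.
    static final class Partial {
        final List<Long> sums;
        final long rowCount;

        Partial(long sum, long sumOfSquares, long rowCount) {
            this.sums = Arrays.asList(sum, sumOfSquares);
            this.rowCount = rowCount;
        }
    }

    // Merge per-region partials, then apply the population std formula.
    static double std(List<Partial> partials) {
        long sum = 0L;
        long sumSq = 0L;
        long rows = 0L;
        for (Partial partial : partials) {
            sum += partial.sums.get(0);
            sumSq += partial.sums.get(1);
            rows += partial.rowCount;
        }
        if (rows == 0L) {
            return Double.NaN; // assumption: undefined for an empty input
        }
        double mean = (double) sum / rows;
        double variance = (double) sumSq / rows - mean * mean;
        // Guard against a tiny negative value caused by rounding.
        return Math.sqrt(Math.max(variance, 0.0));
    }

    public static void main(String[] args) {
        // Values 2,4,4,4 in one region and 5,5,7,9 in another.
        List<Partial> partials = Arrays.asList(
                new Partial(14L, 52L, 4L),
                new Partial(26L, 180L, 4L));
        System.out.println(std(partials));
    }
}
```

Because sums, sums of squares and counts all add up across regions, the three 
numbers are enough; no region needs to know the global mean.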



/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java


Do you mean to rename the class, or just to change the Javadoc? (Sorry, I 
missed this review comment initially.)



/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java


This class is more generic now. It defines two type parameters [...]

[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020506#comment-13020506
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 19:06:58, Ted Yu wrote:
bq.  > 
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 84
bq.  > 
bq.  >
bq.  > I think the startKey and endKey can be optional as well.
bq.  > Basically that means scanning the whole region.
bq.  
bq.  himanshu vashishtha wrote:
bq.  These start/end keys are used to locate the regions of interest. Do you 
mean the whole _table_? If so, it would be like setting 
HConstants.EMPTY_START_ROW/EMPTY_END_ROW, which are essentially empty byte arrays.
bq.  
bq.  Gary Helmling wrote:
bq.  This would be a bigger change, but maybe it would make sense to have 
the client pass a Scan object?  Then you could specify start/end row, time 
range, multiple column qualifiers, filter?
bq.  
bq.  It's starting to look like we're duplicating most of these arguments 
when there's already a good way of passing them.  What do you think?
bq.  
bq.  himanshu vashishtha wrote:
bq.  Yes, I am wondering why it didn't occur to me before! As a matter of 
fact, we are already creating a Scan object at the region level. So, once the 
Scan object is passed to the AggregationClient, it will call the appropriate 
(existing) HTable method, while the CP's method will take the Scan object as a 
parameter and leave the client free to configure it. But it needs some code 
changes, in the validation logic for one. 
bq.  (I was thinking that it was good to go and now there is so much room 
for improvement. Good stuff).
bq.  
bq.  himanshu vashishtha wrote:
bq.  Continuing from what I said earlier: the current design assumes that the 
client is interested in one family only. Does this need to change too? 
bq.  I am refactoring these methods to let the client pass a Scan object to 
the AggregationClient class, but a Scan object can have multiple families in 
it. Do we need to change this assumption? I don't see any issue with it as 
such, but it is something I didn't plan originally, and it needs changes in 
the test cases. Please comment.

I refactored one aggregate method for 1512 as per today's review (using the 
Scan object, among other changes) and it's working fine (the test passes for 
the method that I changed). Maybe I need to add more boundary conditions to 
test the Scan object. 
I have some stuff on for tonight/tomorrow, so I will complete this by tomorrow 
night or by Sunday. I hope that is OK(?)


- himanshu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review483
---


On 2011-04-13 08:37:14, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-13 08:37:14)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides a reference implementation of aggregate function 
support through the Coprocessor framework.
bq.  The ColumnInterpreter interface lets the client specify how a value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020431#comment-13020431
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review490
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


One family is fine for the moment.


- Ted







[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020426#comment-13020426
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 20:21:01, Ted Yu wrote:
bq.  > 
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 84
bq.  > 
bq.  >
bq.  > This is the first code review that evolves into a design session in 
my career - exciting.
bq.  > I think we should relax the initial assumption.

I still think I would go with one family, as families are quite separate 
entities (HTable design-wise), and I don't see a use case for computing 
aggregates across column families. If that is what is needed, it probably 
suggests some rethinking of the schema design. The point I raised was that the 
Scan object we now rely on supports multiple families (which is very relevant 
for scanning a table), but we don't need that for real usage. So the question 
is whether we should support it or not. 
Moreover, as the requirements are evolving (and I guess they will continue to 
do so), this might change again. I am happy as long as it is moving in the 
right direction. 


- himanshu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review488
---






[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020418#comment-13020418
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review488
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


This is the first code review that evolves into a design session in my 
career - exciting.
I think we should relax the initial assumption.


- Ted







[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020414#comment-13020414
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 19:06:58, Ted Yu wrote:
bq.  > 
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 84
bq.  > 
bq.  >
bq.  > I think the startKey and endKey can be optional as well.
bq.  > Basically that means scanning the whole region.
bq.  
bq.  himanshu vashishtha wrote:
bq.  These start/end keys are used to locate the regions of interest. Do you 
mean the whole _table_? If so, it would be like setting 
HConstants.EMPTY_START_ROW/EMPTY_END_ROW, which are essentially empty byte arrays.
bq.  
bq.  Gary Helmling wrote:
bq.  This would be a bigger change, but maybe it would make sense to have 
the client pass a Scan object?  Then you could specify start/end row, time 
range, multiple column qualifiers, filter?
bq.  
bq.  It's starting to look like we're duplicating most of these arguments 
when there's already a good way of passing them.  What do you think?
bq.  
bq.  himanshu vashishtha wrote:
bq.  Yes, I am wondering why it didn't occur to me before! As a matter of 
fact, we are already creating a Scan object at the region level. So, once the 
Scan object is passed to the AggregationClient, it will call the appropriate 
(existing) HTable method, while the CP's method will take the Scan object as a 
parameter and leave the client free to configure it. But it needs some code 
changes, in the validation logic for one. 
bq.  (I was thinking that it was good to go and now there is so much room 
for improvement. Good stuff).

Continuing from what I said earlier: the current design assumes that the 
client is interested in one family only. Does this need to change too? 
I am refactoring these methods to let the client pass a Scan object to the 
AggregationClient class, but a Scan object can have multiple families in it. 
Do we need to change this assumption? I don't see any issue with it as such, 
but it is something I didn't plan originally, and it needs changes in the 
test cases. Please comment.


- himanshu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review483
---


On 2011-04-13 08:37:14, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-13 08:37:14)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020405#comment-13020405
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 19:06:58, Ted Yu wrote:
bq.  > 
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 84
bq.  > 
bq.  >
bq.  > I think the startKey and endKey can be optional as well.
bq.  > Basically that means scanning the whole region.
bq.  
bq.  himanshu vashishtha wrote:
bq.  These start-end keys are used to locate the interested regions. Do you 
mean whole _table_? If so, it will be like setting 
HConstants.START_ROW/STOP_ROW which are essentially empty byte arrays.
bq.  
bq.  Gary Helmling wrote:
bq.  This would be a bigger change, but maybe it would make sense to have 
the client pass a Scan object?  Then you could specify start/end row, time 
range, multiple column qualifiers, filter?
bq.  
bq.  It's starting to look like we're duplicating most of these arguments 
when there's already a good way of passing them.  What do you think?

Yes, I am wondering why it didn't occur to me before! As a matter of fact, we 
already create a Scan object at the region level. So, if the Scan object is 
passed to the AggregationClient, it will call the appropriate (existing) HTable 
method, but the CP's method will take the Scan object as a parameter and give 
the client that freedom. It does need some code changes, though, in the 
validation code for one. 
(I was thinking that it was good to go, and now there is so much room for 
improvement. Good stuff.)


- himanshu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review483
---



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020403#comment-13020403
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 19:06:58, Ted Yu wrote:
bq.  > 
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 84
bq.  > 
bq.  >
bq.  > I think the startKey and endKey can be optional as well.
bq.  > Basically that means scanning the whole region.
bq.  
bq.  himanshu vashishtha wrote:
bq.  These start-end keys are used to locate the interested regions. Do you 
mean whole _table_? If so, it will be like setting 
HConstants.START_ROW/STOP_ROW which are essentially empty byte arrays.

This would be a bigger change, but maybe it would make sense to have the client 
pass a Scan object?  Then you could specify start/end row, time range, multiple 
column qualifiers, filter?

It's starting to look like we're duplicating most of these arguments when 
there's already a good way of passing them.  What do you think?


- Gary


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review483
---





> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
> patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregating facility, facility generally where you want to calculate 
> some meta info on your table, it seems like it wouldn't be too hard making a 
> filter type that could run a function server-side and return the result ONLY 
> of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server 
> returns all data to client and count is done by client counting up row keys.  
> A bunch of time and resources have been wasted returning data that we're not 
> interested in.  With this new filter type, the counting would be done 
> server-side and then it would make up a new result that was the count only 
> (kinda like mysql when you ask it to count, it returns a 'table' with a count 
> column whose value is count of rows).   We could have it so the count was 
> just done per region and return that.  Or we could maybe make a small change 
> in scanner too so that it aggregated the per-region counts.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020402#comment-13020402
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 19:06:58, Ted Yu wrote:
bq.  > 
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 84
bq.  > 
bq.  >
bq.  > I think the startKey and endKey can be optional as well.
bq.  > Basically that means scanning the whole region.

These start-end keys are used to locate the regions of interest. Do you mean 
the whole _table_? If so, it would be like setting 
HConstants.START_ROW/STOP_ROW, which are essentially empty byte arrays.


- himanshu
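
To make the "empty byte arrays" point concrete, here is a minimal sketch of a 
key-range check in which an empty start or stop key means "unbounded" on that 
side, so that two empty keys cover the whole table. The class RowRange and its 
method are hypothetical and are not part of the patch. The sketch assumes the 
usual scan convention of an inclusive start key and an exclusive stop key, with 
keys compared as unsigned bytes (Arrays.compareUnsigned needs Java 9 or later).

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not code from the patch.
final class RowRange {
    /** Stand-in for an "empty" start/stop key, i.e. no bound on that side. */
    static final byte[] EMPTY = new byte[0];

    /**
     * Returns true when row lies in [start, stop).
     * An empty start or stop array is treated as "no bound" on that side.
     */
    static boolean contains(byte[] start, byte[] stop, byte[] row) {
        boolean afterStart = start.length == 0
                || Arrays.compareUnsigned(row, start) >= 0;
        boolean beforeStop = stop.length == 0
                || Arrays.compareUnsigned(row, stop) < 0;
        return afterStart && beforeStop;
    }
}
```

With both bounds empty, every row key qualifies, which is the whole-table case 
described above.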


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review483
---




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020399#comment-13020399
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review483
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


I think the startKey and endKey can be optional as well.
Basically that means scanning the whole region.


- Ted




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020395#comment-13020395
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review481
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Of these, I think only two are optional: colQualifier and filter. OK?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Agreed, it should be initialized to null to handle a null result set.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Same as the max one above. Yes, in the case of a null qualifier, it computes 
the value over the whole family.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


ok



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


ok



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


Yes, more description should be added.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Hmm, the return type can be different. I will make it more generic so that it 
can have a different return type.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


I thought about it. I return a list, but I see it is not the right type to 
pass, as one element contains the sum and the other contains the rowCount. So 
it should be something like a Pair, as you said. I will look into it.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


It's in the interface. Should it be repeated here too?
OK, will do the name refactoring.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


OK, will use the equals method.
I thought that since it is an internal scanner (local to a region), it would 
not cross the region boundaries when the start-end rows are set. Will check it 
(it should also improve performance).



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


Right, a null is more pertinent here. Will do it.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


Yes, the current one does return the min value. But as you said, it will return 
null now.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


I thought about it and then left it, as it is only three lines of code and a 
separate call would be a kind of over-refactoring. But once I set the start-end 
rows as suggested by Gary, it should become lighter.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


Yes indeed. It occurred to me when I saw Stack's review last night, and here 
you are :). I will do it.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


OK. And what if I need to send more than 2 parameters, as in the case of 
standard deviation?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


You mean a POJO with that many attributes? Does such an object already exist 
that I can reuse (it should be RPC-compatible, e.g. by implementing Writable)?



/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java


So yes, will do all the name/space/grammar refactorings as suggested. 

Thanks a lot to all of you folks for this wonderful review.


- himanshu
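
The questions above about returning a Pair of sum and rowCount, and about 
needing more than two values for standard deviation, can be illustrated with a 
small sketch. It is only an assumption about one possible shape, not the code 
from the patch: PartialStats and its field names are hypothetical, and a real 
version would also have to be RPC-serializable (for example by implementing 
Writable), which is left out here. Each region would return one such object and 
the client would merge them.

```java
// Hypothetical value object for illustration; not part of the patch.
final class PartialStats {
    final long rowCount;        // number of values seen
    final double sum;           // sum of the values
    final double sumOfSquares;  // sum of the squared values

    PartialStats(long rowCount, double sum, double sumOfSquares) {
        this.rowCount = rowCount;
        this.sum = sum;
        this.sumOfSquares = sumOfSquares;
    }

    /** Builds the partial result one region would compute over its own rows. */
    static PartialStats ofValues(long... values) {
        double s = 0;
        double sq = 0;
        for (long v : values) {
            s += v;
            sq += (double) v * v;
        }
        return new PartialStats(values.length, s, sq);
    }

    /** Combines two per-region results; the client would fold all regions this way. */
    PartialStats merge(PartialStats other) {
        return new PartialStats(rowCount + other.rowCount,
                sum + other.sum,
                sumOfSquares + other.sumOfSquares);
    }

    /** Mean over all merged rows, or NaN when there are no rows. */
    double average() {
        return rowCount == 0 ? Double.NaN : sum / rowCount;
    }

    /** Population standard deviation, or NaN when there are no rows. */
    double stdDev() {
        if (rowCount == 0) {
            return Double.NaN;
        }
        double mean = average();
        return Math.sqrt(sumOfSquares / rowCount - mean * mean);
    }
}
```

For example, merging a region holding 2, 4, 4, 4 with one holding 5, 5, 7, 9 
gives eight rows in total, from which average() and stdDev() are derived on the 
client. The sum-of-squares formula is simple to merge but can lose precision 
for large values, so treat it as a sketch rather than a recommendation.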



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020385#comment-13020385
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 18:18:57, Ted Yu wrote:
bq.  > 
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 98
bq.  > 
bq.  >
bq.  > The following code would produce NPE:
bq.  >Long l = null;
bq.  >if (l < Long.MIN_VALUE) {
bq.  >

Yes, all this code needs to handle nulls.  I think that goes without saying.


- Gary


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review478
---




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020384#comment-13020384
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review478
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


The following code would produce NPE:
  Long l = null;
  if (l < Long.MIN_VALUE) {



- Ted
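
As a self-contained illustration of the failure mode described above, the 
sketch below shows that comparing a null Long with a primitive long unboxes the 
null and throws a NullPointerException, and shows one null-safe alternative. 
The class NullSafeCompare and its methods are hypothetical and are not taken 
from AggregationClient.

```java
// Hypothetical helper for illustration only; not code from the patch.
final class NullSafeCompare {
    /** Returns true when candidate should replace current as the running maximum. */
    static boolean isGreater(Long candidate, Long current) {
        if (candidate == null) {
            return false;   // nothing to compare against
        }
        if (current == null) {
            return true;    // the first real value always wins
        }
        return candidate > current;  // safe: both operands are non-null
    }

    /** Returns true if comparing a null Long with a long throws an NPE. */
    static boolean unboxingThrows() {
        Long l = null;
        try {
            // The comparison auto-unboxes l, which fails for null.
            if (l < Long.MIN_VALUE) {
                return false;
            }
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```

The explicit null checks in isGreater() are what any code path that compares 
boxed per-region results would need.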




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020381#comment-13020381
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 18:02:26, Ted Yu wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java, line 98
bq.  >
bq.  > w.r.t. Gary's comment, we need another boolean flag in MaxCallBack so that we can distinguish whether MaxCallBack.update() has been called.
bq.  > Currently ci.getMinValue() would be returned if there is no qualifying row (possibly due to the effect of Filter).

MaxCallBack.update() will still be called once per region, even if no rows 
matched.  It will just return the initial value that was set.  This is why I 
think the initial value should be null.  So when update() is called with null, 
it can be handled appropriately.

In the same vein, I think "max" in MaxCallBack should initialize to null as 
well.
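
A minimal sketch of the null-initialized approach described above, for 
illustration only.  NullableMaxCallback and its method names are hypothetical 
stand-ins, not the MaxCallBack code from the patch, and the sketch assumes each 
region reports either its maximum or null when no row qualified:

```java
/** Hypothetical stand-in for the MaxCallBack discussed above; not the code from the patch. */
final class NullableMaxCallback {
    // null means "no region has reported a value yet".
    private Long max = null;

    /** Called once per region; regionMax is null when no row in that region qualified. */
    synchronized void update(Long regionMax) {
        if (regionMax == null) {
            return; // an empty region contributes nothing
        }
        if (max == null || regionMax > max) {
            max = regionMax;
        }
    }

    /** Returns null only if every region reported null. */
    synchronized Long getMax() {
        return max;
    }
}
```

With this shape no extra boolean flag is needed: a null from getMax() means 
that no row qualified anywhere, and Long.MIN_VALUE remains a legitimate result.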


- Gary


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review476
---


On 2011-04-13 08:37:14, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-13 08:37:14)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte array is interpreted.
bq.  Some of the thoughts are summarized at http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java PRE-CREATION
bq.    /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java PRE-CREATION
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java PRE-CREATION
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java PRE-CREATION
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java PRE-CREATION
bq.    /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java PRE-CREATION
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.





[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020377#comment-13020377
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review476
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


w.r.t. Gary's comment, we need another boolean flag in MaxCallBack so that 
we can distinguish whether MaxCallBack.update() has been called.
Currently ci.getMinValue() would be returned if there is no qualifying row 
(possibly due to the effect of Filter).


- Ted




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020315#comment-13020315
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 11:50:08, Ted Yu wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java, line 73
bq.  >
bq.  > I think returning ci.getMinValue() is fine because AggregationClient would further consolidate partial results.
bq.  > If a change is really needed, it should be made in AggregationClient.

I don't agree.  This leaves no way to distinguish between a valid result of 
Long.MIN_VALUE and _no_ result.  What happens for an empty table?  I think 
returning Long.MIN_VALUE (or whatever might be the case) for an empty table is 
broken.
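
To make the ambiguity concrete, here is a small illustrative sketch.  The class 
and method names are hypothetical and this is not the code under review; it 
only contrasts a sentinel-initialized maximum with a nullable one:

```java
/** Hypothetical illustration; not the AggregateProtocolImpl code under review. */
final class SentinelAmbiguity {
    /** Sentinel style: starts from Long.MIN_VALUE, the way a getMinValue()-based initial value would. */
    static long sentinelMax(long[] values) {
        long max = Long.MIN_VALUE;
        for (long value : values) {
            max = Math.max(max, value);
        }
        return max;
    }

    /** Nullable style: null means there was no value at all. */
    static Long nullableMax(long[] values) {
        Long max = null;
        for (long value : values) {
            if (max == null || value > max) {
                max = value;
            }
        }
        return max;
    }
}
```

sentinelMax() gives the same answer for an empty input and for an input 
holding the single value Long.MIN_VALUE, while nullableMax() returns null for 
the former and Long.MIN_VALUE for the latter.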


- Gary


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review474
---




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020276#comment-13020276
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review474
---



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


I think returning ci.getMinValue() is fine because AggregationClient would 
further consolidate partial results.
If a change is really needed, it should be made in AggregationClient.


- Ted




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-15 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020219#comment-13020219
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review470
---


Looks like this is coming along nicely.  Some overall comments:

* A fair amount of naming should be cleaned up.  For the client facing methods 
in AggregationClient, I would go for the simplest names:  max() instead of 
getMaximum(), min() instead of getMinimum(), etc.  But that is a matter of 
personal preference.

* Think about providing simpler overloaded versions of the methods.  Seven 
parameters is a lot if you don't always care about all of them.

* Look more closely at the parameterization of some of the methods.  I'm not 
sure it's sufficient for getSum(), getAvg(), getStd(), where the return types 
may differ from the actual column value types.


/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Don't abbreviate in the javadoc comments.  This is part of the end user 
documentation so you need to spell it all out:

agg -> aggregation
RS -> region server
cp impls -> name the actual coprocessor



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


agg -> aggregation



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Remove trailing whitespace




/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Overload this with some briefer versions?  This is a real mouthful if you 
don't actually need all 7 parameters.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Again, add some overloaded simpler versions of this.  Do you always need a 
filter?  What about column qualifier?  Maybe in most cases you do, just seeing 
if we can simplify usage in some cases.
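
A rough sketch of what such overloads could look like.  MaxOverloads is a 
hypothetical in-memory stand-in with made-up parameters (row range and a value 
filter); it does not reproduce the real AggregationClient signatures:

```java
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.LongPredicate;

/** Hypothetical in-memory stand-in; the real AggregationClient signatures are not reproduced here. */
final class MaxOverloads {
    private final NavigableMap<String, Long> rows = new TreeMap<>();

    void put(String rowKey, long value) {
        rows.put(rowKey, value);
    }

    /** Full form: rows in [startRow, endRow) that pass the filter; a null bound is open-ended. */
    Long max(String startRow, String endRow, LongPredicate filter) {
        NavigableMap<String, Long> range = rows;
        if (startRow != null) {
            range = range.tailMap(startRow, true);
        }
        if (endRow != null) {
            range = range.headMap(endRow, false);
        }
        Long best = null; // null when nothing qualified
        for (long value : range.values()) {
            if (filter.test(value) && (best == null || value > best)) {
                best = value;
            }
        }
        return best;
    }

    /** No filter: every row in the range qualifies. */
    Long max(String startRow, String endRow) {
        return max(startRow, endRow, value -> true);
    }

    /** Whole table, no filter. */
    Long max() {
        return max(null, null);
    }
}
```

The shorter forms only fill in defaults and delegate, so the full form stays 
the single place where the work is done.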



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


What is a row num?  Is this the number of rows?  How about using "row 
count" instead?  It's more consistent with other HBase tools.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


name getRowCount() instead?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Remove trailing whitespace



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


Maybe a bit more description of the actual usage here.  The client needs to 
pass an instance of this in AggregationClient methods, right?  Javadoc should 
make clear its purpose and how to use it.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Drop the "Cp" from the name, it's extraneous.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Is it correct that sum should always return the same type as the individual 
values?  If the values are Integers, you would want to return Long, right?  
Otherwise you risk overflowing the max value.
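
To illustrate the concern, a minimal sketch with two type parameters, one for 
the cell value type and one for the result type.  WideningSum and its sum() 
signature are hypothetical and are not taken from AggregateCpProtocol:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Hypothetical sketch: T is the cell value type, S the (possibly wider) result type. */
final class WideningSum {
    /** Widens each value to S before adding; returns null when there are no values. */
    static <T, S> S sum(List<T> values, Function<T, S> widen, BinaryOperator<S> add) {
        S total = null;
        for (T value : values) {
            S widened = widen.apply(value);
            total = (total == null) ? widened : add.apply(total, widened);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> cells = Arrays.asList(Integer.MAX_VALUE, 1);
        // Same type in and out: the int addition wraps around.
        Integer narrow = WideningSum.<Integer, Integer>sum(cells, v -> v, Integer::sum);
        // Widened to long before adding: no wrap-around.
        Long wide = WideningSum.<Integer, Long>sum(cells, Integer::longValue, Long::sum);
        System.out.println(narrow + " vs " + wide); // prints "-2147483648 vs 2147483648"
    }
}
```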



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Is the type correlation correct here as well?  Individual values may be 
Integers, but you may want a double for the average.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Same as with getAvg(), wouldn't you want this to possibly return a 
different type than the individual column values?  Like return a double even if 
the column values are ints?
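
A small sketch of one way the average could come back as a double even for 
integer columns.  It assumes, purely for illustration, that each region 
returns a partial sum and a count which the client then combines; 
PartialAverage is a hypothetical name, not a class in the patch:

```java
import java.util.List;

/** Hypothetical per-region partial result: a sum and the number of values behind it. */
final class PartialAverage {
    final long sum;
    final long count;

    PartialAverage(long sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    /** Combines the partials and divides once, in double; null when no values were seen. */
    static Double average(List<PartialAverage> partials) {
        long totalSum = 0L;
        long totalCount = 0L;
        for (PartialAverage partial : partials) {
            totalSum += partial.sum;
            totalCount += partial.count;
        }
        if (totalCount == 0L) {
            return null;
        }
        return (double) totalSum / totalCount;
    }
}
```

Dividing in the column's own integer type would truncate (7 / 3 gives 2), 
which is why the result type here differs from the value type.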



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


I don't think you need the word "Protocol" in here.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


Can you just set start and end rows on the scanner instead of checking each 
row?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


If there are no results from the sc

[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-14 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020209#comment-13020209
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review468
---


A few comments below.  See what you think.  This is close to commit, I'd 
say.


/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


I'd say change the name of this class to AggregateProtocol.  Leave off the 
"Cp" since it's in the package name already.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


'Gives' rather than 'It gives'.  Are you repeating yourself in the javadoc 
here?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Good



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


Call this class AggregateImplementation?  It'll implement AggregateProtocol.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


Class comment explaining what this class does?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


Why this?  When we just made an empty one?  And what's the '//' on the end of 
the line?

Oh, you did this each time through the loop so you only work on one return 
at a time.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


FYI there is an 'equals' in Bytes so you don't have to do compareTo...0



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


Hash code is what?  Can you print out encodedName?  That's better for 
identifying regions.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


It's nice that this is all genericized.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java


This three-part test is used in all methods?  Might be big enough to move 
out to a method (not important).



/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java


Missing period.



/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java


Missing period.



/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java


Should you say ColumnInterpreter for AggregateProtocol?



/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java


You should call it TestAggregateProtocol or TestAggregateCoprocessor... it 
should be the name of the class under test with a Test prefix.


- Michael



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-14 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020203#comment-13020203
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-15 06:10:48, himanshu vashishtha wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java, line 66
bq.  >
bq.  > As per the current usage, it is one instance per thread. This method is called from the concrete coprocessor implementation deployed at region level. Though this instance is a singleton, it is stateless and hence thread-safe.
bq.  > I can change it to AtomicLong if you say so.

Just to clarify, I meant the CP instance is a singleton (pardon my not-so-good 
English).


- himanshu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review469
---




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-14 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020202#comment-13020202
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review469
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


sure



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


As per the current usage, it is one instance per thread. This method is 
called from the concrete coprocessor implementation deployed at region level. 
Though this instance is a singleton, it is stateless and hence thread-safe. 
I can change it to AtomicLong if you say so.
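
As an illustration of the "stateless, hence thread-safe" argument, a minimal 
sketch follows.  StatelessLongInterpreter is a hypothetical stand-in, not the 
LongColumnInterpreter from the patch, and it assumes values are stored as 
8-byte big-endian longs:

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in: no instance fields, so one shared instance can serve many threads. */
final class StatelessLongInterpreter {
    static final StatelessLongInterpreter INSTANCE = new StatelessLongInterpreter();

    private StatelessLongInterpreter() {
    }

    /** Decodes an 8-byte big-endian value using only its argument and locals; null if malformed. */
    Long getValue(byte[] raw) {
        if (raw == null || raw.length != Long.BYTES) {
            return null;
        }
        return ByteBuffer.wrap(raw).getLong();
    }
}
```

Because getValue() touches no shared mutable state, concurrent callers cannot 
interfere with each other; an AtomicLong would only matter if the class kept a 
running value in a field.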



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


As this class implements Writable, it is handled by HBaseObjectWritable, 
which writes its full class name onto the stream (and the same goes when 
reading it on the server side). Since this class is stateless, there is 
nothing to read or write as such. 
No need to call super.
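
A simplified sketch of the idea, under stated assumptions: SketchWritable is a 
local interface with the same two methods as Hadoop's Writable, and 
ClassNameEnvelope is a hypothetical stand-in for the class-name handling 
attributed above to HBaseObjectWritable.  Neither is the real HBase code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

/** Local stand-in with the same two methods as Hadoop's Writable. */
interface SketchWritable {
    void write(DataOutput out) throws IOException;

    void readFields(DataInput in) throws IOException;
}

/** Stateless, so both methods are intentionally empty. */
final class NoStateInterpreter implements SketchWritable {
    @Override
    public void write(DataOutput out) {
        // nothing to write: the class has no fields
    }

    @Override
    public void readFields(DataInput in) {
        // nothing to read back either
    }
}

/** Hypothetical envelope: class name first, then whatever the object itself writes. */
final class ClassNameEnvelope {
    static byte[] serialize(SketchWritable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(value.getClass().getName());
            value.write(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static SketchWritable deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            // The class name alone is enough to rebuild a stateless object.
            Class<?> type = Class.forName(in.readUTF());
            SketchWritable value = (SketchWritable) type.getDeclaredConstructor().newInstance();
            value.readFields(in);
            return value;
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch the serialized form carries only the class name, which is all 
the receiving side needs to create a fresh instance.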



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Ok. Will do all the formatting changes.


- himanshu



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-14 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020198#comment-13020198
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review467
---


Here is more.  Submitting now in case I lose it.


/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


The class comment is duplicated here.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


Use Bytes.SIZEOF_LONG instead of '8'



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


You could autobox and just return Bytes.toLong... I think that'll work (not 
important).  If you are going to use Long, you might use Long.valueOf because 
the JVM can cache often-used Long instances: 
http://download.oracle.com/javase/1.5.0/docs/api/java/lang/Long.html#valueOf(long)
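
A rough sketch of the two suggestions above (a named size constant instead of a literal 8, and Long.valueOf for boxing). The class and method names are invented for illustration. It uses the JDK's Long.BYTES and ByteBuffer rather than HBase's Bytes utility so that it runs without HBase on the classpath, and it assumes the value is stored big-endian.

```java
import java.nio.ByteBuffer;

final class LongDecoding {
  // Named constant instead of the literal 8; Long.BYTES (Java 8+) plays the
  // role that Bytes.SIZEOF_LONG plays in the review comment.
  static final int SIZEOF_LONG = Long.BYTES;

  // Encodes a long as 8 big-endian bytes (ByteBuffer's default byte order).
  static byte[] encode(long v) {
    return ByteBuffer.allocate(SIZEOF_LONG).putLong(v).array();
  }

  // Decodes an 8-byte value, or returns null if the length is wrong.
  static Long decode(byte[] value) {
    if (value == null || value.length != SIZEOF_LONG) {
      return null;
    }
    // Long.valueOf may hand back a cached instance instead of allocating.
    return Long.valueOf(ByteBuffer.wrap(value).getLong());
  }
}
```

Plain autoboxing (returning the primitive from a method declared to return Long) compiles to the same Long.valueOf call, so the two forms behave alike.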



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


Is there one instance of this class per thread?  Will it be accessed 
concurrently?  Should the base Long be an AtomicLong?  Or do we need to 
synchronize updates on the KV?
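
To make the concurrency question concrete, here is a small sketch contrasting the two designs. The names are invented and this is not the patch's code. A method that keeps its running total in a local variable can be shared freely between threads. A total that is shared between threads has to be an AtomicLong or be guarded by synchronization.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.LongStream;

final class SumStrategies {
  // Stateless: the running total is a local variable, so a single shared
  // instance (or a static method) is safe to call from many threads.
  static long statelessSum(long[] values) {
    long total = 0L;
    for (long v : values) {
      total += v;
    }
    return total;
  }

  // Stateful alternative: several threads update one accumulator, so the
  // accumulator must be atomic (a plain long field could lose updates).
  static long sharedSum(long[] values) {
    AtomicLong total = new AtomicLong();
    LongStream.of(values).parallel().forEach(total::addAndGet);
    return total.get();
  }
}
```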



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


I'd name this multiply rather than 'mult'



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


You don't need to call through to the super for it to serialize the 
Writable?  And again for the write?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Say 'Defines' rather than 'It defines'



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Good



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Again here say 'Gives' rather than 'It gives..'



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Good doc.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


What's a region level?  Try to keep your max doc similar to your min doc, 
with min/max the only difference.  That makes it easier on the reader.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Why capital 'Q' on qualifier?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Return is ?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Why this empty line?



- Michael



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-14 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020190#comment-13020190
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review466
---


This is review of diff between v3 and v4.


/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Good



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java


Yeah, method doesn't take a , it returns it



/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java


This is good.


- Michael





> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
> patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregating facility, facility generally where you want to calculate 
> some meta info on your table, it seems like it wouldn't be too hard making a 
> filter type that could run a function server-side and return the result ONLY 
> of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server 
> returns all data to client and count is done by client counting up row keys.  
> A bunch of time and resources have been wasted returning data that we're not 
> interested in.  With this new filter type, the counting would be done 
> server-side and then it would make up a new result that was the count only 
> (kinda like mysql when you ask it to count, it returns a 'table' with a count 
> column whose value is count of rows).   We could have it so the count was 
> just done per region and return that.  Or we could maybe make a small change 
> in scanner too so that it aggregated the per-region counts.  
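
The idea in the description (count on the server, return only the count, then combine the per-region counts) can be sketched in plain Java. Everything here is hypothetical: Region is a toy stand-in that just holds row keys, and no HBase API is involved. It only illustrates that the client receives one number per region rather than the rows themselves.

```java
import java.util.Arrays;
import java.util.List;

final class RegionCountSketch {
  // Hypothetical stand-in for one region; it only holds its row keys.
  static final class Region {
    private final List<String> rowKeys;

    Region(List<String> rowKeys) {
      this.rowKeys = rowKeys;
    }

    // "Server side": counts locally and returns only the count.
    long countRows() {
      return rowKeys.size();
    }
  }

  // "Client side": adds up the per-region counts; no row data is transferred.
  static long totalRows(List<Region> regions) {
    long total = 0L;
    for (Region region : regions) {
      total += region.countRows();
    }
    return total;
  }

  public static void main(String[] args) {
    List<Region> regions = Arrays.asList(
        new Region(Arrays.asList("row-a", "row-b", "row-c")),
        new Region(Arrays.asList("row-m", "row-n")));
    System.out.println(totalRows(regions));
  }
}
```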

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-14 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020187#comment-13020187
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-13 08:35:42, Ted Yu wrote:
bq.  > 
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 143
bq.  > 
bq.  >
bq.  > This is the type parameter for return value.

ok


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review440
---




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-14 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020188#comment-13020188
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--



bq.  On 2011-04-13 06:23:56, himanshu vashishtha wrote:
bq.  > 
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java,
 line 81
bq.  > 
bq.  >
bq.  > I use the Eclipse formatter (which says it is using Apache's standard), 
and it is inserting these spaces. I tried to edit the settings to make it work, 
but couldn't find a way to remove the extra spaces between the doc and the arg 
list. I removed them manually, but want to know the standard approach.

None of this kind of white space is tolerated (well, I'm not too bad about it, 
but others are watching the codebase and will complain loudly if they see it).


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review438
---




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-13 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019250#comment-13019250
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

(Updated 2011-04-13 08:37:14.182698)


Review request for hbase and Gary Helmling.


Changes
---

Himanshu updated the patch according to Stack's suggestions.


Summary
---

This patch provides reference implementation for aggregate function support 
through Coprocessor framework.
ColumnInterpreter interface allows client to specify how the value's byte array 
is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.
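
As a rough illustration of what "the client specifies how the value's byte array is interpreted" can look like, here is a simplified sketch. ValueInterpreter, LongValueInterpreter and MaxAggregator are invented names with invented signatures; they are not the ColumnInterpreter interface from the patch. The aggregation code never looks at the bytes itself and delegates both decoding and comparison to the interpreter.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical, simplified interpreter; not the signature used in the patch.
interface ValueInterpreter<T> {
  T interpret(byte[] raw);   // decode a cell value, or null if it is malformed
  int compare(T a, T b);     // ordering used by min/max style aggregates
}

// Treats each value as an 8-byte big-endian long.
final class LongValueInterpreter implements ValueInterpreter<Long> {
  static byte[] encode(long v) {
    return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
  }

  @Override
  public Long interpret(byte[] raw) {
    if (raw == null || raw.length != Long.BYTES) {
      return null;
    }
    return Long.valueOf(ByteBuffer.wrap(raw).getLong());
  }

  @Override
  public int compare(Long a, Long b) {
    return a.compareTo(b);
  }
}

final class MaxAggregator {
  // Generic max: works for any type the supplied interpreter can decode.
  static <T> T max(List<byte[]> rawValues, ValueInterpreter<T> interpreter) {
    T best = null;
    for (byte[] raw : rawValues) {
      T value = interpreter.interpret(raw);
      if (value != null && (best == null || interpreter.compare(value, best) > 0)) {
        best = value;
      }
    }
    return best;
  }
}
```

Supporting another value type in this sketch would mean writing another ValueInterpreter implementation; MaxAggregator would stay unchanged.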


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs (updated)
-

  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted





[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-13 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019248#comment-13019248
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review440
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


This is the type parameter for return value.


- Ted


On 2011-04-12 04:41:49, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-12 04:41:49)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.





[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-12 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019227#comment-13019227
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review438
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Ill effects of copy-paste. Will change.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Removed it.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


I use the Eclipse formatter (which says it is using Apache's standard), and it 
is inserting these spaces. I tried to edit the settings to make it work, but 
couldn't find a way to remove the extra spaces between the doc and the arg 
list. I removed them manually, but want to know the standard approach.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Yes, will do it. Thanks.


- himanshu




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-12 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019209#comment-13019209
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review436
---


I read half the patch.  Will finish in morning.  Comments below.  This utility 
looks really great.  Hurry up and finish it!


/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


It's 2011!



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


There is extra white space here and elsewhere in this block.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


should be 'handler'



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Do you want to make this an actual javadoc link, e.g. {@link Aggr}?  Is 
AggrationClient misspelled?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Is this comment still right?  It says 8 byte long (Ted's blog seems to 
indicate this is no longer the case).



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Nice javadoc.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Why this constructor?  We'll have a null conf?  Will that be dangerous 
later?  NPEs?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


White space



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Looks like this comment is no longer true?  The method has been genericized?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Shouldn't you reuse the passed configuration?  Otherwise you are making a new 
Connection per invocation.
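
A toy model of why reusing the passed configuration matters. It assumes, as the comment above implies, that connections are cached per configuration instance. Conf and Connection are hypothetical stand-ins and do not represent HBase classes or their real caching rules.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class ConnectionReuseSketch {
  // Hypothetical stand-ins; these are not HBase classes.
  static final class Conf {}
  static final class Connection {}

  // Assumed behaviour: one connection is cached per configuration instance.
  private static final Map<Conf, Connection> CACHE = new IdentityHashMap<>();
  static int connectionsCreated = 0;

  static synchronized Connection connect(Conf conf) {
    return CACHE.computeIfAbsent(conf, c -> {
      connectionsCreated++;
      return new Connection();
    });
  }

  // Anti-pattern: a fresh Conf on every call never hits the cache.
  static Connection callWithNewConf() {
    return connect(new Conf());
  }

  // Preferred: the caller's Conf is passed through and reused.
  static Connection callWithSharedConf(Conf shared) {
    return connect(shared);
  }
}
```

In this model two calls with the same Conf create one connection, while two calls that each build their own Conf create two.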



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


What's this?  The return?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Reuse passed conf?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


What's this?  Extra white space.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Reuse conf creating HTable.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


What's this?  This problem is in all subsequent methods... the extra white 
space too.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


This needs to be passed the conf.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


2011


- Michael


On 2011-04-12 04:41:49, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-12 04:41:49)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq

[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018709#comment-13018709
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

(Updated 2011-04-12 04:41:49.068986)


Review request for hbase and Gary Helmling.


Changes
---

Switched to CoprocessorHost.REGION_COPROCESSOR_CONF_KEY and removed the manual 
loading of CPs.


Summary
---

This patch provides reference implementation for aggregate function support 
through Coprocessor framework.
ColumnInterpreter interface allows client to specify how the value's byte array 
is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs (updated)
-

  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted



> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregating facility, facility generally where you want to calculate 
> some meta info on your table, it seems like it wouldn't be too hard making a 
> filter type that could run a function server-side and return the result ONLY 
> of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server 
> returns all data to client and count is done by client counting up row keys.  
> A bunch of time and resources have been wasted returning data that we're not 
> interested in.  With this new filter type, the counting would be done 
> server-side and then it would make up a new result that was the count only 
> (kinda like mysql when you ask it to count, it returns a 'table' with a count 
> column whose value is count of rows).   We could have it so the count was 
> just done per region and return that.  Or we could maybe make a small change 
> in scanner too so that it aggregated the per-region counts.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018703#comment-13018703
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

(Updated 2011-04-12 04:28:32.453024)


Review request for hbase and Gary Helmling.


Changes
---

Switched to CoprocessorHost.REGION_COPROCESSOR_CONF_KEY and removed the manual 
loading of CPs.


Summary
---

This patch provides reference implementation for aggregate function support 
through Coprocessor framework.
ColumnInterpreter interface allows client to specify how the value's byte array 
is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs (updated)
-

  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted



> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018686#comment-13018686
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review429
---



/src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java


While working on a different jira, I saw that I am using the wrong (old) key 
for registering the CP. It was working because, in the code that follows this, 
the Agg CP is loaded explicitly (lines #102-106).
One can fix this either by using the Region CP specific key, 
CoprocessorHost.REGION_COPROCESSOR_CONF_KEY, and removing the explicit loading 
code below, OR by deleting this statement entirely.


- himanshu


On 2011-04-12 02:28:10, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-12 02:28:10)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-11 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018677#comment-13018677
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

Review request for hbase and Gary Helmling.


Summary
---

This patch provides reference implementation for aggregate function support 
through Coprocessor framework.
ColumnInterpreter interface allows client to specify how the value's byte array 
is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs
-

  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted



> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016585#comment-13016585
 ] 

Ted Yu commented on HBASE-1512:
---

Version 4 is awesome.

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016519#comment-13016519
 ] 

Ted Yu commented on HBASE-1512:
---

I think returning Double.NaN is fine. Normally either operand being null 
would lead to an NPE.
As for making a separate class, it would make it easier for users to write 
other ColumnInterpreter classes based on LongColumnInterpreter.

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-06 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016496#comment-13016496
 ] 

Himanshu Vashishtha commented on HBASE-1512:


Thanks for the review, Ted.

What I have in mind for the divide method is returning Double.NaN if either 
operand is null. Any operation on null should give null.

Ok for the name refactoring.

I don't have any strong feeling about making a separate class out of it at this 
point in time, as it doesn't add much on its own. But I will do it if you say so.

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014947#comment-13014947
 ] 

Ted Yu commented on HBASE-1512:
---

Also, I think it is time to move LongColumnInterpreter out into its own file 
under src/main/java/org/apache/hadoop/hbase/client/coprocessor/.


> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014942#comment-13014942
 ] 

Ted Yu commented on HBASE-1512:
---

For LongColumnInterpreter.divide(), if l2 is null, I think we should return 
Double.NaN.
I would write:
{code}
  if (l2 == null)
    return Double.NaN;
  if (l1 == null)
    return 0;
{code}
I think the following method can be named getAvgArgs (argument in place of 
parameter):
{code}
  private  List getAvgParams(final byte[] tableName,
{code}
But I don't have a strong opinion here.

getAvgParamsAsArray() of AvgCallBack can be named getAvgParams() because its 
return type is List<>.

Overall, version 3 is great.
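
As a minimal sketch of the null handling suggested for divide() above: the class name DivideSketch and the final division step are assumptions for illustration, not the code in the patch.

```java
// Hypothetical stand-in for the divide() method discussed above; not the patch's actual code.
final class DivideSketch {
    /**
     * Divides l1 by l2 as doubles.
     * A missing divisor gives Double.NaN; a missing dividend gives 0.
     */
    static double divide(Long l1, Long l2) {
        if (l2 == null) {
            return Double.NaN; // no divisor: the result is undefined
        }
        if (l1 == null) {
            return 0;          // no dividend: treated as zero
        }
        // Assumed final step: plain floating-point division.
        return l1.doubleValue() / l2.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(divide(10L, 4L));
        System.out.println(divide(null, 4L));
        System.out.println(divide(10L, null));
    }
}
```

Checking l2 first means a call with both operands null yields NaN rather than 0.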

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014908#comment-13014908
 ] 

Himanshu Vashishtha commented on HBASE-1512:


Thanks for the suggestions Ted.

a) Added generics functionality to the AggregationClient. As suggested by Ted, 
there should be a ColumnInterpreter to give the client a chance to describe the 
cell value type. I made this generic, in the sense that the client is now 
supposed to pass a column interpreter object along with the agg function calls. 
AggregationClient has one such implementation, where the client says that its 
cell value is a long. Other cell value types can be handled with a similar 
approach.

b) While the client can define the cell value type by implementing 
ColumnInterpreter, I still think the average and standard deviation will be 
double values. So, I added a wrapper on these methods to support the generic 
functionality. Please refer to AggregationClient.getStdParams & getAvgParams. 
Let me know if it is "un-intuitive". I think it is right though :)

c) Added a filter to each of the agg functions. The filter is just passed along 
with the call and is set on the Scan object at the region level during 
scanning. In the case of row count, if the client provides a filter, that one 
will be used. If neither a filter nor a qualifier is provided, 
FirstKeyValueFilter is used.

d) Added more test cases for testing filter use cases (44 in total :)). 

e) refactored the "done" variable as suggested by Ted.
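
To illustrate points a) and b), here is a minimal sketch of a generic interpreter with a long implementation and an average that is always a double. The names CellInterpreter, LongInterpreter and average are hypothetical simplifications, not the ColumnInterpreter or AggregationClient signatures in the patch, and the 8-byte big-endian encoding is an assumption.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

final class InterpreterSketch {
    /** Hypothetical, simplified interpreter: tells the aggregation code how to read cell bytes as T. */
    interface CellInterpreter<T> {
        T getValue(byte[] cell);   // decode one cell, or return null if it cannot be read
        T add(T a, T b);           // sum that tolerates a null operand
        double toDouble(T value);  // needed because the average is a double for any T
    }

    /** Interprets each cell as an 8-byte big-endian long (an assumption of this sketch). */
    static final class LongInterpreter implements CellInterpreter<Long> {
        public Long getValue(byte[] cell) {
            if (cell == null || cell.length != Long.BYTES) {
                return null;
            }
            return ByteBuffer.wrap(cell).getLong();
        }

        public Long add(Long a, Long b) {
            if (a == null) return b;
            if (b == null) return a;
            return a + b;
        }

        public double toDouble(Long value) {
            return value.doubleValue();
        }
    }

    /** Averages the readable cells; returns Double.NaN when there are none. */
    static <T> double average(List<byte[]> cells, CellInterpreter<T> ci) {
        T sum = null;
        long count = 0;
        for (byte[] cell : cells) {
            T v = ci.getValue(cell);
            if (v != null) {
                sum = ci.add(sum, v);
                count++;
            }
        }
        return (sum == null) ? Double.NaN : ci.toDouble(sum) / count;
    }

    /** Helper that encodes a long the way LongInterpreter expects. */
    static byte[] bytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    public static void main(String[] args) {
        List<byte[]> cells = Arrays.asList(bytes(2), bytes(4), bytes(9));
        System.out.println(average(cells, new LongInterpreter()));
    }
}
```

Supporting another cell type would then only require another CellInterpreter implementation, while average stays unchanged.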

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512.txt


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014733#comment-13014733
 ] 

Ted Yu commented on HBASE-1512:
---

In AggregateProtocolImpl, I think the boolean done should be renamed. It 
actually indicates whether more rows exist after the current one.
The following loop condition may confuse someone:
{code}
  } while (done);
{code}
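
A minimal sketch of how the loop might read after such a rename. RowScanner is a hypothetical stand-in for the region scanner, whose next() is assumed to return true while more rows exist after the current one; hasMoreRows is only one possible name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ScanLoopSketch {
    /** Hypothetical scanner: appends the next row's values to 'out', returns true if more rows follow. */
    interface RowScanner {
        boolean next(List<Long> out);
    }

    /** Builds an in-memory scanner over the given rows, for illustration only. */
    static RowScanner scannerOver(List<List<Long>> rows) {
        Iterator<List<Long>> it = rows.iterator();
        return out -> {
            if (it.hasNext()) {
                out.addAll(it.next());
            }
            return it.hasNext();
        };
    }

    /** Sums every value the scanner returns. */
    static long sum(RowScanner scanner) {
        long total = 0;
        List<Long> results = new ArrayList<>();
        boolean hasMoreRows; // named for what the scanner's return value actually means
        do {
            hasMoreRows = scanner.next(results);
            for (long v : results) {
                total += v;
            }
            results.clear(); // reuse the list for the next row
        } while (hasMoreRows);
        return total;
    }

    public static void main(String[] args) {
        List<List<Long>> rows = Arrays.asList(
            Arrays.asList(1L, 2L), Arrays.asList(3L), Arrays.asList(4L, 5L));
        System.out.println(sum(scannerOver(rows)));
    }
}
```

With this name the loop condition reads as "keep going while more rows exist", which matches what the flag holds.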


> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregating facility, facility generally where you want to calculate 
> some meta info on your table, it seems like it wouldn't be too hard making a 
> filter type that could run a function server-side and return the result ONLY 
> of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server 
> returns all data to client and count is done by client counting up row keys.  
> A bunch of time and resources have been wasted returning data that we're not 
> interested in.  With this new filter type, the counting would be done 
> server-side and then it would make up a new result that was the count only 
> (kinda like mysql when you ask it to count, it returns a 'table' with a count 
> column whose value is count of rows).   We could have it so the count was 
> just done per region and return that.  Or we could maybe make a small change 
> in scanner too so that it aggregated the per-region counts.  



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014564#comment-13014564
 ] 

Ted Yu commented on HBASE-1512:
---

See my thoughts on 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512.txt


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-03-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014091#comment-13014091
 ] 

Ted Yu commented on HBASE-1512:
---

Pardon me for attaching source files directly. svn diff doesn't recognize the 
changes I made on top of patch-1512-2.txt

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512.txt


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-03-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013830#comment-13013830
 ] 

Ted Yu commented on HBASE-1512:
---

A 4-byte value can represent a float; an 8-byte value can represent a double.

As for the return type, Long, I tried to make AggregateCpProtocol generic but 
wasn't successful.
E.g. AggregateCpProtocol.class wouldn't compile. Since 
AggregateCpProtocol is an interface, I cannot instantiate it and obtain the 
class afterward.
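
To illustrate why the byte length alone does not settle the value type, here is a small sketch using `java.nio.ByteBuffer`. It assumes big-endian encoding (ByteBuffer's default); the class and method names are made up for this illustration:

```java
import java.nio.ByteBuffer;

public class CellDecodeSketch {
    // Decoders: the same bytes can be read as either type of matching width.
    static int asInt(byte[] v)       { return ByteBuffer.wrap(v).getInt(); }
    static float asFloat(byte[] v)   { return ByteBuffer.wrap(v).getFloat(); }
    static long asLong(byte[] v)     { return ByteBuffer.wrap(v).getLong(); }
    static double asDouble(byte[] v) { return ByteBuffer.wrap(v).getDouble(); }

    // Encoders, used here only to produce sample cell values.
    static byte[] intBytes(int i)       { return ByteBuffer.allocate(4).putInt(i).array(); }
    static byte[] longBytes(long l)     { return ByteBuffer.allocate(8).putLong(l).array(); }
    static byte[] doubleBytes(double d) { return ByteBuffer.allocate(8).putDouble(d).array(); }
}
```

An 8-byte cell written as a double decodes without error as a long, just to a meaningless number, so the caller has to say which interpretation is intended.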

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-03-30 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013826#comment-13013826
 ] 

Himanshu Vashishtha commented on HBASE-1512:


Thanks for reviewing it Ted.

I will add the constructor. 

Yes, I was thinking about this dependency on a long value for all these 
methods. But the flexibility of storing any data type (by converting it to a 
byte array), even within a specific column family:column qualifier, makes it a 
bit tricky to go for a data type argument. I can have a varying number of data 
types even for one CF:CQ combination. Rather, I was considering adding one 
more check for the int type (4 bytes). But that is just me; it would be great 
to hear what others think.

Adding the type parameter to the AggregateCpProtocol methods creates a 
dependency with AggregationClient. Did you try adding it there too (apart from 
its impl)?


> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-03-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013820#comment-13013820
 ] 

Ted Yu commented on HBASE-1512:
---

I think AggregationClient should have a ctor which accepts a Configuration and 
saves it.
The Configuration can then be used to point to a table in a remote cluster:
{code}
HTable table = new HTable(conf, tableName);
{code}


> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-03-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013745#comment-13013745
 ] 

Ted Yu commented on HBASE-1512:
---

This feature is very useful.
Is it possible to pass some class to AggregateProtocolImpl which can interpret 
the type of the value based on colFamily:colQualifier?

I tried adding a type parameter (for the type of the value) to 
AggregateCpProtocol but encountered various compilation errors.
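
One way such a class could look, as a rough sketch only. The interface name `ColumnInterpreter` and its two methods are assumptions made for this illustration, written in plain Java with no HBase types; cell values are taken to be big-endian 8-byte longs:

```java
import java.nio.ByteBuffer;
import java.util.List;

public class InterpreterSketch {
    /** Hypothetical: turns raw cell bytes into a typed value, or null to skip the cell. */
    interface ColumnInterpreter<T> {
        T getValue(byte[] family, byte[] qualifier, byte[] cellValue);
        T add(T a, T b);
    }

    /** Interprets every cell as an 8-byte long; anything else is skipped. */
    static final class LongInterpreter implements ColumnInterpreter<Long> {
        @Override
        public Long getValue(byte[] family, byte[] qualifier, byte[] cellValue) {
            if (cellValue == null || cellValue.length != Long.BYTES) {
                return null;
            }
            return ByteBuffer.wrap(cellValue).getLong();
        }

        @Override
        public Long add(Long a, Long b) {
            return a + b;
        }
    }

    /** Type-agnostic sum: all knowledge of the value type lives in the interpreter. */
    static <T> T sum(ColumnInterpreter<T> ci, byte[] family, byte[] qualifier,
                     List<byte[]> cells) {
        T total = null;
        for (byte[] cell : cells) {
            T v = ci.getValue(family, qualifier, cell);
            if (v != null) {
                total = (total == null) ? v : ci.add(total, v);
            }
        }
        return total; // null when no cell could be interpreted
    }
}
```

The aggregation code stays generic, and the caller picks the interpreter that matches how a given family:qualifier was written.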

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-03-27 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011750#comment-13011750
 ] 

Himanshu Vashishtha commented on HBASE-1512:


Stack: Thanks for the review. 

I have revamped the patch and incorporated your suggestions. The previous 
version had a number of discrepancies around the boundary conditions you 
mentioned: at the region level there was no knowledge of the exact start/stop 
rows given by the user. To fix this, I modified the agg function signatures to 
include the start/stop rows at the region level.

The following are some key aspects of this version:
a) startRow < endRow is an essential condition now (other than when one is 
doing a full table scan, where startRow and endRow are both empty byte 
arrays). This helps in handling boundary conditions where the start row 
provided by the user is the start row of a region (the default scanner impl 
returns null because it is a non-get query). Moreover, it is also aligned with 
the logic of these functions, where one is finding max, min, row count etc.

b) For all computations like avg, sum, max etc., it is assumed the cell value 
is a long value (8 bytes); if this is not the case, that cell value is skipped 
in the computation.

c) For all functions, the column family is required (if it is null, an 
IOException is thrown). 
For max, min, avg, sum and std, when no column qualifier is provided, I 
aggregate all the values in that family. So a sum in such a case is the group 
sum of all CQ values for one row key. I think it is the right approach. Please 
advise.

d) For rowcount, one can use FirstKeyValueFilter as an optimisation. 
But it may give a wrong result if the user has also provided a column 
qualifier. In that case, the first value returned by the scanner might belong 
to another qualifier; FirstKeyValueFilter sets its flag to skip to the next 
row, while that value is filtered out of the result set. The overall effect is 
that the row is not counted and the scanner moves on to the next row. So I use 
the filter only when there is no column qualifier. (I confirmed this during my 
testing, but it would be good to have some comments here.)

e) As suggested, I have added a bunch of boundary test cases for each of the 
six agg functions. Please let me know in case some more are to be added.

f) Yes, it's the client (here AggregationClient) that will perform the "reduce 
phase", where individual results from all the target regions are received and 
accumulated.
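
The client-side "reduce phase" can be sketched as follows. This is an illustration under assumptions, not the patch's code: the `Partial` type and its fields are hypothetical, each region is assumed to return a count, a sum and a sum of squares, and std is taken to be the population standard deviation:

```java
import java.util.List;

public class ReduceSketch {
    /** Hypothetical per-region partial result. */
    static final class Partial {
        final long count;
        final double sum;
        final double sumSq;

        Partial(long count, double sum, double sumSq) {
            this.count = count;
            this.sum = sum;
            this.sumSq = sumSq;
        }
    }

    /** Combines per-region partials; counts and sums simply add up. */
    static Partial merge(List<Partial> perRegion) {
        long count = 0;
        double sum = 0;
        double sumSq = 0;
        for (Partial p : perRegion) {
            count += p.count;
            sum += p.sum;
            sumSq += p.sumSq;
        }
        return new Partial(count, sum, sumSq);
    }

    /** Average over all regions; NaN when no values were seen. */
    static double avg(Partial total) {
        return total.sum / total.count;
    }

    /** Population standard deviation: sqrt(E[x^2] - E[x]^2). */
    static double std(Partial total) {
        double mean = avg(total);
        return Math.sqrt(total.sumSq / total.count - mean * mean);
    }
}
```

Averaging the per-region averages would weight small regions too heavily, which is why the regions return raw sums and counts and the division happens once on the client.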



> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, patch-1512.txt


[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions

2011-03-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006793#comment-13006793
 ] 

stack commented on HBASE-1512:
--

FYI, lines should be 80 characters or less.

What do you want here, Himanshu?

{code}
+// sleep here is an ugly hack to allow region transitions to finish
+Thread.sleep(5000);
+for (JVMClusterUtil.RegionServerThread t : 
cluster.getRegionServerThreads()) {
+  for (HRegionInfo r : t.getRegionServer().getOnlineRegions()) {
+t.getRegionServer().getOnlineRegion(r.getRegionName()).
+  getCoprocessorHost().
+  load(AggregateProtocolImpl.class, 
//TestAggFunctions.AggFunctionsCT.class,
+  Coprocessor.Priority.USER);
+
+  }
+}
{code}

I think there is probably a better means of waiting on HRS startup than a 
timer (FYI, a delay will always fail up on the Apache build server -- it has a 
special knack for doing the unexpected).
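
A generic alternative to a fixed sleep is to poll for the condition with a deadline. The helper below is a plain-Java sketch; `waitFor` is a hypothetical name, and what the condition should check (for example, that the expected regions are online) is left to the test:

```java
import java.util.function.BooleanSupplier;

public class WaitSketch {
    /**
     * Polls condition every pollMs milliseconds until it holds or timeoutMs
     * elapses. Returns true if the condition held, false on timeout or interrupt.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }
}
```

The test then returns as soon as the cluster is ready on a fast machine and gets the full timeout on a slow one, instead of betting on five seconds being enough.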

Our convention is spaces around operators.  Not...

{code}
+for(int i=0;i
{code}

... or < long length, then something is off and WARN?

This log message is going to drive folks crazy:

{code}
+log.debug("val read in the region is: "+temp);
{code}

In any region of any decent size, there'll almost be a log line per cell?

The below should be inside a finally:

{code}
+  scanner.close();
{code}
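
A small sketch of the close-in-finally pattern, using a hypothetical `FakeScanner` whose read always fails so the effect is visible (none of these names come from the patch):

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseSketch {
    /** Hypothetical scanner that records whether close() was called. */
    static final class FakeScanner implements Closeable {
        boolean closed = false;

        long next() throws IOException {
            throw new IOException("simulated read failure");
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    static long readOne(FakeScanner scanner) throws IOException {
        try {
            return scanner.next();
        } finally {
            scanner.close(); // runs on both the normal and the exceptional path
        }
    }

    /** Returns true if the scanner was closed even though the read failed. */
    static boolean closesOnFailure() {
        FakeScanner scanner = new FakeScanner();
        try {
            readOne(scanner);
        } catch (IOException expected) {
            // the failure itself is not the point of the demonstration
        }
        return scanner.closed;
    }
}
```

On Java 7 and later, a try-with-resources statement over a `Closeable` gives the same guarantee with less code.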

Just throw I'd say, don't bother logging:

{code}
+  log.error("Some error occurred. Aborting the computation"+ 
ie.getMessage());
+  throw new IOException("Aborting the Max aggregate computation");
{code}
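
If the exception is wrapped rather than rethrown as-is, passing the original as the cause keeps its stack trace without a separate log line. A sketch with made-up names, using a parse failure as the stand-in error:

```java
import java.io.IOException;

public class RethrowSketch {
    static long parseCell(String raw) throws IOException {
        try {
            return Long.parseLong(raw);
        } catch (NumberFormatException e) {
            // No logging here: the caller receives the message and the cause.
            throw new IOException("Aborting the max aggregate computation", e);
        }
    }

    /** Returns the cause attached to the failure, or null if parsing succeeded. */
    static Throwable causeOfFailure(String raw) {
        try {
            parseCell(raw);
            return null;
        } catch (IOException e) {
            return e.getCause();
        }
    }
}
```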

Be careful w/ your formatting:

{code}
+}while(done);
+scanner.close();
+}catch (IOException ie){ 
{code}

Try to be consistent.

Here is another example:

{code}
+   /**
+* For a given column family and column Qualifier for a table, it gives its 
sum of all its values at the region level.
+*/
+  
+  @Override
{code}

Why a Long object instead of just a long primitive type?

{code}
+Long sum = 0l;
{code}

This is messy here formatting-wise:

{code}
+  KeyValue val = results.get(0);
+  if(val != null) counter++; // TODO: Or shall it only caters to the row, 
and ignore whether a specific column is null or not. 
+// Or is it like a val can't be null. Need 
to look in to all possible values of keyval!
+}while(done);
+}finally{
+  scanner.close();  
+}

{code}

Do you think we'll always be acting on a column only?  What if someone wants to 
act on a whole column family; i.e. no qualifier?


Be careful w/ your white space.  For instance in the interface you have this:

{code}
+  List getAvg(byte[] colFamily, byte[] colQualifier) throws 
IOException;
+  List getStd(byte[] colFamily, byte[] colQualifier) throws 
IOException;
+  
+  
+ 
+}
{code}

Clear those empty lines.

You should put the javadoc in the Interface on the Interface methods, rather 
than out in the implementations.  That's how it's usually done.  The 
implementations inherit the interface javadoc.

Fill out the javadoc in your client, in particular description of the return in 
each case.

So is stuff being averaged and summed in the client?  How's that done?

Patch looks cool.

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, patch-1512.txt


[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions

2011-03-14 Thread Lianhui Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006777#comment-13006777
 ] 

Lianhui Wang commented on HBASE-1512:
-

Great. Can this feature handle real-time aggregate functions? We currently use 
Redis to do real-time aggregation in memory, but as the data grows it no 
longer fits in memory, so we have hit a bottleneck. I think this feature gives 
us some hope. Does anyone have other suggestions? Thank you.

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, patch-1512.txt


[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions

2011-03-14 Thread coofucoo zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006385#comment-13006385
 ] 

coofucoo zhang commented on HBASE-1512:
---

Hi guys, could you tell me when you plan to add this feature to HBase? I 
think it is a very good feature. We need it very much. Thank you. 

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>  Components: coprocessors
>Reporter: stack
> Attachments: 1512.zip, patch-1512.txt


[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions

2010-11-02 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927706#action_12927706
 ] 

Gary Helmling commented on HBASE-1512:
--

Thanks for the patch Himanshu!  

For the scope of the functionality and what sort of aggregation functions you 
might cover, you might want to start with a comparison of common SQL functions 
(ex. http://dev.mysql.com/doc/refman/5.5/en/group-by-functions.html).  I don't 
know if you really need to implement all of them, but a good start would 
probably be:

 * COUNT
 * AVG
 * MIN
 * MAX
 * STD
 * SUM

(just my opinion of course).  All of these would need some form of server side 
function, and in some cases the client/server coordination might be a little 
tricky when you have to span regions.

The client side interface for these also has its own needs.  Does it make 
sense to be able to combine different client side aggregation functions with 
unmatched server side functions?  Would you want to take a client side minimum 
of the per-region maximum values returned from the row range?  As far as I can 
see, you would mostly want a single client function paired with a given 
server-side method.

I do see that the "raw" HTable.coprocessorExec() interface is a bit clumsy for 
these types of operations.  You really want to be able to return a single 
value, not a value per region.  But I think you can create some client helper 
methods to hide that complexity.

For the current client classes ProcessResultsFromCP seems to have a lot of 
overlap with Batch.Callback.  The main difference being that 
HTable.processResultsFromCP() allows you to pass a list of instances (as 
opposed to a single Batch.Callback).  If using a single Callback instance is 
limiting, we could allow use of a list of Callbacks, or provide a 
Batch.callbackList() factory method that allows chaining multiple instances 
together.  But for the common cases here, it seems like you'll want a single 
client side function (min, max, etc) paired with a single server-side 
invocation (min, max, etc.), so the current Batch.Callback would probably 
suffice.

So as an example on the client side, you could provide a client wrapper in the 
form:

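{{{
import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;

public class Aggregations {
    // Client-side accumulator: adds up the per-region partial sums.
    private static class ClientSum implements Batch.Callback<Long> {
        private long sum;
        public void update(byte[] region, byte[] row, Long value) {
            sum += value;
        }
        public long getValue() { return sum; }
    }

    // AggFunctionProtocol is the illustrative server-side protocol
    // assumed by this example.
    public static long sum(HTable table, byte[] start, byte[] end,
            final byte[] family, final byte[] col) throws Throwable {
        ClientSum sum = new ClientSum();
        table.coprocessorExec(AggFunctionProtocol.class, start, end,
            new Batch.Call<AggFunctionProtocol, Long>() {
                public Long call(AggFunctionProtocol instance)
                        throws IOException {
                    return instance.sum(family, col);
                }
            }, sum);
        return sum.getValue();
    }
}
}}}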

And so on for the other types of operations...  Then clients can just call 
Aggregations.sum() with the right args.

There may be better ways to do it, this is just an illustration. :)
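
For instance, a max accumulator could follow the same shape. The sketch below is plain Java with a hypothetical `Callback` interface that mirrors the `update(region, row, value)` signature used in the example above, so that it runs without HBase on the classpath:

```java
public class CallbackSketch {
    /** Hypothetical stand-in with the same shape as the callback in the example above. */
    interface Callback<R> {
        void update(byte[] region, byte[] row, R value);
    }

    /** Keeps the largest per-region result seen so far. */
    static final class ClientMax implements Callback<Long> {
        private Long max; // null until some region reports a value

        @Override
        public void update(byte[] region, byte[] row, Long value) {
            if (value != null && (max == null || value > max)) {
                max = value;
            }
        }

        Long getValue() {
            return max;
        }
    }

    /** Simulates per-region results arriving one by one at a single callback. */
    static Long maxOf(Long... perRegionValues) {
        ClientMax callback = new ClientMax();
        for (int i = 0; i < perRegionValues.length; i++) {
            byte[] region = new byte[] {(byte) i}; // placeholder region name
            callback.update(region, null, perRegionValues[i]);
        }
        return callback.getValue();
    }
}
```

Unlike the sum, the max has no natural zero, so the accumulator starts out as null and a region with no matching cells can report null without disturbing the result.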

And, please, if you see ways that HTable.coprocessorExec() can be improved to 
make this easier, comment on HBASE-2002!




> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
> Attachments: 1512.zip

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions

2010-11-02 Thread Mingjie Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927634#action_12927634
 ] 

Mingjie Lai commented on HBASE-1512:


Himanshu.

The patch looks good. But it doesn't provide the whole picture of the solution. 
There are still some important questions unanswered for this feature:

1) what's the interface provided to end users? HTableInterface.sum(...), 
HTableInterface.min/max()? Do we need shell support?

2) how to implement the interface? (by utilizing coprocessor)

3) how to make sure the coprocessor is loaded properly if the feature is 
available. 

Your patch addresses part of (2), and it only provides max() and countRow() 
implementations. 

IMO I don't think ProcessResultsFromCP is necessary. It doesn't really provide 
any convenience for developers to reduce development effort. 

Thanks. 

> Coprocessors: Support aggregate functions
> -
>
> Key: HBASE-1512
> URL: https://issues.apache.org/jira/browse/HBASE-1512
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
> Attachments: 1512.zip



[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions

2010-10-31 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926831#action_12926831
 ] 

Himanshu Vashishtha commented on HBASE-1512:


With the HBASE-2001 patch, the basic infrastructure required by these 
functions is available. I wrote a test class to cover some of them, but I am 
unsure how generic they should be. 

Here, I assumed the user knows the table in question and the return types he 
is getting from the Coprocessor impls, so the input/output types of these agg 
operations will be the same as well. He therefore builds agg function classes 
with those 'types'. I think this is a somewhat skewed assumption, and I would 
like further clarification. What are the expectations for the 'end interface'? 

I have attached the new/modified classes (2/1). 
a) ProcessResultsFromCP: to be implemented by the agg functions (can be part of 
the Batch class). 
b) TestAggFunctions: has the test case using the agg functions
c) HTable: one method to execute the aggregation functions.
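
One way to picture the 'generic'-ness question is to keep the aggregation 
code independent of the value type and push decoding into a small 
interpreter. The sketch below is only an illustration under that assumption: 
ValueInterpreter, LongInterpreter and GenericAgg are hypothetical names, not 
the attached classes, and the cells are assumed to hold 8-byte big-endian 
longs.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

// Hypothetical interface: decodes a raw cell value and says how to
// add and compare values of the decoded type.
interface ValueInterpreter<T> {
    T decode(byte[] raw);
    T add(T a, T b);
    int compare(T a, T b);
}

// Interpreter for cells that store an 8-byte big-endian long.
final class LongInterpreter implements ValueInterpreter<Long> {
    public Long decode(byte[] raw) { return ByteBuffer.wrap(raw).getLong(); }
    public Long add(Long a, Long b) { return a + b; }
    public int compare(Long a, Long b) { return Long.compare(a, b); }
}

final class GenericAgg {
    // Sum of all decoded values; null when there are no cells.
    static <T> T sum(List<byte[]> cells, ValueInterpreter<T> vi) {
        T acc = null;
        for (byte[] cell : cells) {
            T v = vi.decode(cell);
            acc = (acc == null) ? v : vi.add(acc, v);
        }
        return acc;
    }

    // Largest decoded value; null when there are no cells.
    static <T> T max(List<byte[]> cells, ValueInterpreter<T> vi) {
        T best = null;
        for (byte[] cell : cells) {
            T v = vi.decode(cell);
            if (best == null || vi.compare(v, best) > 0) {
                best = v;
            }
        }
        return best;
    }

    // Test helper: encodes a long the way LongInterpreter expects.
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    public static void main(String[] args) {
        List<byte[]> cells = Arrays.asList(encode(5L), encode(-2L), encode(11L));
        ValueInterpreter<Long> vi = new LongInterpreter();
        System.out.println("sum=" + sum(cells, vi) + " max=" + max(cells, vi));
    }
}
```

With this shape the user supplies the types once, in the interpreter, and the 
agg functions themselves do not need to know what the table stores.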

There is a high probability that I have twisted the desired feature entirely, 
so please feel free to 'lambaste' the code and its underlying assumptions.

PS: I was thinking of making this jira a sub-item of jira 2469, but couldn't 
come up with anything worth mentioning.





[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions

2010-08-27 Thread Mingjie Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903275#action_12903275
 ] 

Mingjie Lai commented on HBASE-1512:


Hi Himanshu. 

Right now I'm doing some cleanup for coprocessors. Here is the code: 
http://github.com/mlai/hbase. Please use the branch coprocessors_mlai. 

However, our current objective is to use CP to implement role-based access 
control (RBAC) for 0.90. We only need the Coprocessor, RegionObserver, and 
CommandType interfaces for this purpose, so I didn't include MapReduce and 
FilterInterface in the branch (nor will they be in 0.90, I think). 

You can take a look at that branch. It passes all HBase test cases, but we 
still need to improve its exception handling a little. 

If you are interested in the MapReduce implementation, you can also refer to 
the first patch of HBASE-2001. 

Thanks,
Mingjie 





[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions

2010-08-25 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902534#action_12902534
 ] 

Himanshu Vashishtha commented on HBASE-1512:


Hello Andrew,
Can you please tell me where I should start digging and changing things to 
make it work? I have seen your code in the HBASE-2001 patch, and would like to 
work on coprocessors, starting from this jira. 
Will fetching the code from github (as mentioned there) give me the right 
code base? It seems things have changed since then. What is the right way to 
do this?
Thanks,
Himanshu
