[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

[email protected] (JIRA) Fri, 15 Apr 2011 00:22:53 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020219#comment-13020219
 ]


[email protected] commented on HBASE-1512:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review470
-----------------------------------------------------------


Looks like this is coming along nicely.  Some overall comments:

* A fair amount of naming should be cleaned up.  For the client facing methods 
in AggregationClient, I would go for the simplest names:  max() instead of 
getMaximum(), min() instead of getMinimum(), etc.  But that is a matter of 
personal preference.

* Think about providing simpler overloaded versions of the methods.  Seven 
parameters is a lot if you don't always care about all of them.

* Look more closely at the parameterization of some of the methods.  I'm not 
sure it's sufficient for getSum(), getAvg(), getStd(), where the return types 
may differ from the actual column value types.


/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
<https://reviews.apache.org/r/585/#comment920>

    Don't abbreviate in the javadoc comments.  This is part of the end user 
documentation so you need to spell it all out:
    
    agg -> aggregation
    RS -> region server
    cp impls -> name the actual coprocessor



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
<https://reviews.apache.org/r/585/#comment921>

    agg -> aggregation



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
<https://reviews.apache.org/r/585/#comment925>

    Remove trailing whitespace
    



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
<https://reviews.apache.org/r/585/#comment931>

    Overload this with some briefer versions?  This is a real mouthful if you 
don't actually need all 7 parameters.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
<https://reviews.apache.org/r/585/#comment932>

    Again add some overloaded simpler versions of this.  Do you always need a 
filter?  What about column qualifier?  Maybe in most cases you do, just seeing 
if simplify usage in some cases.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
<https://reviews.apache.org/r/585/#comment927>

    What is a row num?  Is this the number of rows?  How about using "row 
count" instead?  It's more consistent with other HBase tools.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
<https://reviews.apache.org/r/585/#comment928>

    name getRowCount() instead?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
<https://reviews.apache.org/r/585/#comment934>

    Remove trailing whitespace



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
<https://reviews.apache.org/r/585/#comment935>

    Maybe a bit more description of the actual usage here.  The client needs to 
pass an instance of this in AggregationClient methods right?  Javadoc should 
make clear it's purpose and how to use it.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
<https://reviews.apache.org/r/585/#comment936>

    Drop the "Cp" from the name, it's extraneous.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
<https://reviews.apache.org/r/585/#comment937>

    Is it correct that sum should always return the same type as the individual 
values?  If the values are Integers, you would want to return Long, right?  
Otherwise you risk overflowing the max value.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
<https://reviews.apache.org/r/585/#comment938>

    Is the type correlation correct here as well?  Individual values may be 
Integers, but you may want a double for the average.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java
<https://reviews.apache.org/r/585/#comment944>

    Same as with getAvg(), wouldn't you want this to possibly return a 
different type than the individual column values?  Like return a double even if 
the column values are ints?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
<https://reviews.apache.org/r/585/#comment945>

    I don't think you need the word "Protocol" in here.



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
<https://reviews.apache.org/r/585/#comment946>

    Can you just set start and end rows on the scanner instead of checking each 
row?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
<https://reviews.apache.org/r/585/#comment948>

    If there are no results from the scanner would this return the value from 
ci.getMinValue()?  Shouldn't it return null instead?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
<https://reviews.apache.org/r/585/#comment947>

    If there are no results from the scanner would this return the value from 
ci.getMaxValue()?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
<https://reviews.apache.org/r/585/#comment949>

    Does this need to be parameterized?  Would we ever want the returned row 
count to not be a Long?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
<https://reviews.apache.org/r/585/#comment950>

    Returning a two entry list here is a bit unclear.  Why not make it a 
Pair<Long,Long> or a specifically typed object?



/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java
<https://reviews.apache.org/r/585/#comment951>

    Again, I would prefer a more clearly typed simple object here instead of a 
list with encoded meanings based on index.


- Gary


On 2011-04-13 08:37:14, Ted Yu wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-04-13 08:37:14)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.      https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.    
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.    
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
bq.    
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
bq.    
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq.    /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



> Coprocessors: Support aggregate functions
> -----------------------------------------
>
>                 Key: HBASE-1512
>                 URL: https://issues.apache.org/jira/browse/HBASE-1512
>             Project: HBase
>          Issue Type: Sub-task
>          Components: coprocessors
>            Reporter: stack
>         Attachments: 1512.zip, AggregateCpProtocol.java, 
> AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
> patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
> patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregating facility, facility generally where you want to calculate 
> some meta info on your table, it seems like it wouldn't be too hard making a 
> filter type that could run a function server-side and return the result ONLY 
> of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server 
> returns all data to client and count is done by client counting up row keys.  
> A bunch of time and resources have been wasted returning data that we're not 
> interested in.  With this new filter type, the counting would be done 
> server-side and then it would make up a new result that was the count only 
> (kinda like mysql when you ask it to count, it returns a 'table' with a count 
> column whose value is count of rows).   We could have it so the count was 
> just done per region and return that.  Or we could maybe make a small change 
> in scanner too so that it aggregated the per-region counts.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

Reply via email to