[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions

stack (JIRA) Mon, 14 Mar 2011 21:52:54 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006793#comment-13006793
 ]


stack commented on HBASE-1512:
------------------------------

FYI, lines should be 80 characters or less.

What you want here Himanshu?

{code}
+    // sleep here is an ugly hack to allow region transitions to finish
+    Thread.sleep(5000);
+    for (JVMClusterUtil.RegionServerThread t : 
cluster.getRegionServerThreads()) {
+      for (HRegionInfo r : t.getRegionServer().getOnlineRegions()) {
+        t.getRegionServer().getOnlineRegion(r.getRegionName()).
+          getCoprocessorHost().
+          load(AggregateProtocolImpl.class, 
//TestAggFunctions.AggFunctionsCT.class,
+              Coprocessor.Priority.USER);
+        
+      }
+    }
{code}

I think there is probably better means of waiting on HRS startup than a timer 
(FYI, a delay will always fail up on the Apache build server -- it has a 
special knack for doing the unexpected).

Our convention is spaces around operators.  Not...

{code}
+    for(int i=0;i<n;i++) {
{code}

... but

{code}
+    for(int i = 0; i < n; i++) {
{code}

See the rest of the code base for examples.  These are not biggies.  I'm just 
pointing them out.

This CP is very cool.

Would suggest you add more examples to your test. You just test one instance of 
each aggregate.  Shove in some edge cases if you can think of them (Debugging a 
unit tests is a million times better than trying to debug it out on live 
cluster).

Do you think you should be a big more defensive here when scanning?  You assume 
a long:

{code}
+        temp = Bytes.toLong(val.getValue());
{code}

Do you think you should check the cell length?  If > or < long length, then 
something is off and WARN?

This log message is going to drive folks crazy:

{code}
+        log.debug("val read in the region is: "+temp);
{code}

In any region of any decent size, there'll almost be a log line per cell?

The below should be inside a finally:

{code}
+      scanner.close();
{code}

Just throw I'd say, don't bother logging:

{code}
+      log.error("Some error occurred. Aborting the computation"+ 
ie.getMessage());
+      throw new IOException("Aborting the Max aggregate computation");
{code}

Be careful w/ your formatting:

{code}
+    }while(done);
+    scanner.close();
+    }catch (IOException ie){ 
{code}

Try to be consistent.

Here is another example:

{code}
+   /**
+    * For a given column family and column Qualifier for a table, it gives its 
sum of all its values at the region level.
+    */
+  
+  @Override
{code}

Why a Long object instead of just a long primitive type?

{code}
+    Long sum = 0l;
{code}

This is messy here formatting-wise:

{code}
+      KeyValue val = results.get(0);
+      if(val != null) counter++; // TODO: Or shall it only caters to the row, 
and ignore whether a specific column is null or not. 
+                                    // Or is it like a val can't be null. Need 
to look in to all possible values of keyval!
+    }while(done);
+    }finally{
+      scanner.close();  
+    }

{code}

Do you think we'll always be acting on a column only?  What if someone wants to 
act on a whole column family; i.e. no qualifier?


Be careful w/ your white space.  For instance in the interface you have this:

{code}
+  List<LongWritable> getAvg(byte[] colFamily, byte[] colQualifier) throws 
IOException;
+  List<LongWritable> getStd(byte[] colFamily, byte[] colQualifier) throws 
IOException;
+  
+  
+ 
+}
{code}

Clear those empty lines.

You should put the javadoc in the Interface on the Interface methods, rather 
than out in the implementations.  Thats how its usually done.  The 
implementations inherit the interface javadoc.

Fill out the javadoc in your client, in particular description of the return in 
each case.

So is stuff being averaged, and summed in the client?  Hows that done?

Patch looks cool.

> Coprocessors: Support aggregate functions
> -----------------------------------------
>
>                 Key: HBASE-1512
>                 URL: https://issues.apache.org/jira/browse/HBASE-1512
>             Project: HBase
>          Issue Type: Sub-task
>          Components: coprocessors
>            Reporter: stack
>         Attachments: 1512.zip, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and 
> other aggregating facility, facility generally where you want to calculate 
> some meta info on your table, it seems like it wouldn't be too hard making a 
> filter type that could run a function server-side and return the result ONLY 
> of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server 
> returns all data to client and count is done by client counting up row keys.  
> A bunch of time and resources have been wasted returning data that we're not 
> interested in.  With this new filter type, the counting would be done 
> server-side and then it would make up a new result that was the count only 
> (kinda like mysql when you ask it to count, it returns a 'table' with a count 
> column whose value is count of rows).   We could have it so the count was 
> just done per region and return that.  Or we could maybe make a small change 
> in scanner too so that it aggregated the per-region counts.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions

Reply via email to