[ https://issues.apache.org/jira/browse/HBASE-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211053#comment-15211053 ]

Paul Wilkinson commented on HBASE-3434:
---------------------------------------

Hey folks, happy to take this on. The current prototype code (based on 
coprocessors) is at 
https://github.com/paulmw/hbase-aggregation/tree/master/src/main/java/aggregation/coprocessor

It's a work in progress for sure, but most of the ideas are in there. It 
aggregates data during flushes and compactions, as well as during gets and 
scans. So counters are implemented simply by adding the coprocessor and 
performing puts: each increment is written as a new value, and those values 
are combined when they are flushed, compacted, or read. It's not limited to 
summation, though: you can plug in a custom value aggregation function (by 
implementing 
https://github.com/paulmw/hbase-aggregation/blob/master/src/main/java/aggregation/coprocessor/ValueAccumulator.java).
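A minimal sketch of the counter idea in plain Java, with no HBase dependencies. The `ValueAccumulator` interface below is a hypothetical simplification (the real one in the linked repository may have a different signature), and encoding each delta as an 8-byte big-endian long is an assumption; it is the same layout HBase's `Bytes.toBytes(long)` produces. The point it illustrates: because the same fold runs at flush/compaction time and at read time, a partially folded cell must give the same final answer as folding everything at once.

```java
import java.nio.ByteBuffer;
import java.util.List;

public class SumAccumulatorSketch {

    /** Hypothetical stand-in for a pluggable value-aggregation function. */
    interface ValueAccumulator {
        /** Folds a group of cell values into a single value. */
        byte[] aggregate(List<byte[]> values);
    }

    /** Sums 8-byte big-endian longs: the aggregation a blind-increment counter needs. */
    static final class LongSumAccumulator implements ValueAccumulator {
        @Override
        public byte[] aggregate(List<byte[]> values) {
            long sum = 0L;
            for (byte[] v : values) {
                sum += ByteBuffer.wrap(v).getLong();
            }
            return encode(sum);
        }
    }

    static byte[] encode(long x) {
        return ByteBuffer.allocate(Long.BYTES).putLong(x).array();
    }

    static long decode(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        ValueAccumulator acc = new LongSumAccumulator();

        // Three blind increments, each written as a plain put of a delta.
        // Nothing reads the current counter value before writing.
        byte[] d1 = encode(5), d2 = encode(-2), d3 = encode(10);

        // Read path (get/scan): fold every stored delta into the current total.
        long atRead = decode(acc.aggregate(List.of(d1, d2, d3)));

        // Flush/compaction path: pre-fold some deltas into one cell;
        // a later read folds that result together with the remaining delta.
        byte[] compacted = acc.aggregate(List.of(d2, d3));
        long afterCompaction = decode(acc.aggregate(List.of(d1, compacted)));

        System.out.println(atRead);          // 13
        System.out.println(afterCompaction); // 13
    }
}
```

The design constraint this exposes is that any plugged-in function needs to be associative (and commutative, if grouping does not preserve order), otherwise folding early during a flush or compaction would change the result a later get or scan returns.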

The decision on which cells to aggregate is also pluggable. The default is 
versions of the same cell 
(https://github.com/paulmw/hbase-aggregation/blob/master/src/main/java/aggregation/coprocessor/DefaultCellAccumulator.java,
 which implements CellAccumulator), but it's easy to imagine the kind of 
multi-level rollup you often get in time series: keeping 1-minute granularity 
for today, 10-minute granularity for the previous 6 days, hourly beyond that, 
etc. As long as the values being combined are consecutive in KeyValue sort 
order, that's still possible in a stateless fashion.
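A sketch of that stateless grouping in plain Java. `SimpleCell`, `CellGrouper`, `timeSeriesRollup`, and the tier boundaries are hypothetical names and choices made for illustration, not the prototype's `CellAccumulator` API, and "today" is approximated as "the last 24 hours". The grouping pass only ever compares a cell with its immediate predecessor, which is what keeps it stateless; it relies on the versions of one column arriving consecutively, newest first, as they do in KeyValue order.

```java
import java.util.ArrayList;
import java.util.List;

public class RollupGroupingSketch {

    /** Hypothetical, simplified cell: column coordinates plus a timestamp in ms. */
    static final class SimpleCell {
        final String row, family, qualifier;
        final long ts;

        SimpleCell(String row, String family, String qualifier, long ts) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.ts = ts;
        }

        boolean sameColumn(SimpleCell o) {
            return row.equals(o.row) && family.equals(o.family) && qualifier.equals(o.qualifier);
        }
    }

    /** Hypothetical stand-in for a pluggable "do these two adjacent cells merge?" policy. */
    interface CellGrouper {
        boolean sameGroup(SimpleCell previous, SimpleCell next);
    }

    static final long MINUTE = 60_000L, HOUR = 60 * MINUTE, DAY = 24 * HOUR;

    /** Default-style policy: all versions of the same cell are merged. */
    static final CellGrouper VERSIONS_OF_SAME_CELL = (a, b) -> a.sameColumn(b);

    /** Bucket width grows with the age of the data (tier boundaries are assumptions). */
    static long widthFor(long ageMillis) {
        if (ageMillis < DAY) return MINUTE;          // "today": 1-minute buckets
        if (ageMillis < 7 * DAY) return 10 * MINUTE; // previous 6 days: 10-minute buckets
        return HOUR;                                 // older: hourly buckets
    }

    /** Multi-level rollup: same column, same age tier, same epoch-aligned bucket. */
    static CellGrouper timeSeriesRollup(long now) {
        return (a, b) -> {
            if (!a.sameColumn(b)) return false;
            long wa = widthFor(now - a.ts);
            long wb = widthFor(now - b.ts);
            return wa == wb && Math.floorDiv(a.ts, wa) == Math.floorDiv(b.ts, wb);
        };
    }

    /** Stateless pass over sorted cells: each cell is compared only with its predecessor. */
    static List<List<SimpleCell>> group(List<SimpleCell> sorted, CellGrouper grouper) {
        List<List<SimpleCell>> groups = new ArrayList<>();
        SimpleCell prev = null;
        for (SimpleCell c : sorted) {
            if (prev == null || !grouper.sameGroup(prev, c)) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(c);
            prev = c;
        }
        return groups;
    }

    public static void main(String[] args) {
        long now = 30 * DAY;
        // Versions of one column, newest first, as they appear in KeyValue order.
        List<SimpleCell> cells = List.of(
                new SimpleCell("r", "f", "q", now - 10_000),               // 10 s old
                new SimpleCell("r", "f", "q", now - 20_000),               // 20 s old
                new SimpleCell("r", "f", "q", now - 2 * DAY - MINUTE),     // ~2 days old
                new SimpleCell("r", "f", "q", now - 2 * DAY - 2 * MINUTE), // ~2 days old
                new SimpleCell("r", "f", "q", now - 10 * DAY));            // 10 days old

        System.out.println(group(cells, VERSIONS_OF_SAME_CELL).size()); // 1
        System.out.println(group(cells, timeSeriesRollup(now)).size()); // 3
    }
}
```

Because the bucket widths divide one another (1, 10, and 60 minutes) and are aligned to the epoch via `Math.floorDiv`, a bucket merged at a finer tier nests inside a single coarser bucket when a later compaction, with a later `now`, re-aggregates it, provided the merged cell keeps a timestamp inside its bucket.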

What's missing so far is a design for how aggregation functions are registered; 
happy to take direction there. It's also possible this could be supported more 
directly in HBase itself, rather than in client land. Again, happy to take 
direction from folks here. Either way, I'm certain we need to retain the 
custom aggregation part of this, rather than just building a better version of 
counters.

> ability to increment a counter without reading original value from storage
> --------------------------------------------------------------------------
>
>                 Key: HBASE-3434
>                 URL: https://issues.apache.org/jira/browse/HBASE-3434
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, regionserver
>            Reporter: dhruba borthakur
>            Assignee: stack
>              Labels: gsoc2016, mentor
>
> There are a bunch of applications that do read-modify-write operations on 
> HBase constructs, e.g. a counter: the counter value has to be read in from 
> HDFS before it can be incremented. We have an application where the number 
> of increments on a counter far outnumbers the number of times the counter is 
> used or read. For these types of applications, it would be very beneficial to 
> not have to read in the counter from disk before it can be incremented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
