[ 
https://issues.apache.org/jira/browse/PHOENIX-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010103#comment-14010103
 ] 

James Taylor commented on PHOENIX-340:
--------------------------------------

Atomic increment is supported now in Phoenix in the form of sequences: 
http://phoenix.incubator.apache.org/sequences.html

When we have support for transactions, we'll have better atomicity guarantees 
as well.

> Support atomic increment
> ------------------------
>
>                 Key: PHOENIX-340
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-340
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: Raymond Liu
>
> At present, if you want to update a specific column and add an increment to 
> it, you can do that with
>   " UPSERT INTO T1 (id, count) SELECT id, count+1 FROM T1 WHERE id = id1 "
> There are several problems here:
> 1. If the row with id = id1 is not there, it won't be inserted with a base 
> value (say, the increment 1). 
> 2. It does not support concurrent updates well; multiple threads running it 
> at the same time will produce incorrect results.
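> To make problem 2 concrete, here is a small self-contained Java sketch (an in-memory analogy, not Phoenix code): the unsynchronized counter plays the role of the read-modify-write in UPSERT ... SELECT count+1, while AtomicLong plays the role of a server-side atomic increment.

```java
import java.util.concurrent.atomic.AtomicLong;

public class IncrementRace {
    // Naive read-modify-write counter, analogous to UPSERT ... SELECT count+1.
    static int naive = 0;
    // Atomic counter, analogous to a server-side atomic increment.
    static final AtomicLong atomic = new AtomicLong();

    public static void main(String[] args) {
        final int threads = 8, perThread = 10_000;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    naive = naive + 1;        // unsynchronized: updates can be lost
                    atomic.incrementAndGet(); // atomic: no updates lost
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        System.out.println("expected = " + (threads * perThread));
        System.out.println("naive    = " + naive);        // often less than expected
        System.out.println("atomic   = " + atomic.get()); // always exactly expected
    }
}
```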
> There is HTable.increment in HBase, which does support atomic increment; the 
> problem is how to surface it in Phoenix.
> There are several ways to do this.
> Per -18: implement "CREATE SEQUENCE". This only works for global counter 
> usage, and is not suitable for embedding in each row, e.g. for page visits, 
> link counts, etc.
> Make UPSERT SELECT support atomic operations. This is the ideal solution, 
> but it might involve too much overhead for normal operations without 
> atomicity requirements. Also, HBase only supports the LONG type for 
> increment, so this won't work for arbitrary data types and should therefore 
> be limited in scope.
> Though we could invent a new DML statement, for an easy showcase of the 
> idea, UPSERT is the closest thing I can reuse. Thus I made the following 
> tweak to the existing UPSERT (adding an INCREASE keyword before VALUES to 
> enable the increment feature), e.g.
>     UPSERT INTO TEST(ID, COUNT) INCREASE VALUES('foo',1);
> That reuses most of the UPSERT VALUES code path and does not introduce much 
> extra overhead. When INCREASE is present in the statement, the values for 
> primary key columns still work as normal values for seeking the row, while 
> the values for non-primary-key columns act as increments.
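> The proposed semantics can be modeled with a toy in-memory table (plain Java, not Phoenix code; it assumes the delta itself serves as the base value when the row is absent, addressing problem 1 above):

```java
import java.util.HashMap;
import java.util.Map;

public class IncreaseUpsertModel {
    // In-memory stand-in for a table: PK value -> COUNT column.
    static final Map<String, Long> table = new HashMap<>();

    // Models UPSERT INTO TEST(ID, COUNT) INCREASE VALUES(id, delta):
    // the PK value seeks the row; the non-PK value acts as an increment,
    // with the delta serving as the base value when the row is absent.
    static long upsertIncrease(String id, long delta) {
        return table.merge(id, delta, Long::sum);
    }

    public static void main(String[] args) {
        upsertIncrease("foo", 1); // row absent: created with COUNT = 1
        upsertIncrease("foo", 1); // row present: COUNT incremented to 2
        System.out.println(table.get("foo")); // 2
    }
}
```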
> I have made an initial version (SHA: 2466ee6) with unit test code, for your 
> reference on the usage and on the issues I mention below.
> Due to limitations of the current Phoenix code structure and framework, 
> there are a few problems in this initial version (SHA: 
> 2466ee6a27d12b6c6bb29ba87ece95466e9df98a):
> 1. Phoenix encodes long/int etc. differently from HBase: it flips the sign 
> bit. This leads to incompatible operations on the same value when using 
> HBase ICV to set an initial value on a non-existent column. 
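> The incompatibility can be shown in plain Java (a sketch, not Phoenix code, assuming the sign-bit flip described above):

```java
import java.nio.ByteBuffer;

public class SignBitFlip {
    // Plain big-endian two's-complement encoding, as HBase increment
    // arithmetic assumes for a stored long.
    static byte[] plain(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    static long decodePlain(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    // Sign-bit-flipped encoding, as Phoenix uses for signed types so that
    // unsigned byte-wise comparison matches signed numeric order.
    static byte[] flipped(long v) {
        return plain(v ^ Long.MIN_VALUE);
    }

    // Unsigned lexicographic comparison, as HBase compares stored bytes.
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < 8; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return 0;
    }

    public static void main(String[] args) {
        // Plain encoding mis-sorts negatives: -1 compares above 1.
        System.out.println(compareUnsigned(plain(-1), plain(1)) > 0);     // true
        // Flipped encoding sorts correctly: -1 compares below 1.
        System.out.println(compareUnsigned(flipped(-1), flipped(1)) < 0); // true
        // The incompatibility: ICV reads Phoenix's flipped bytes as a plain
        // long, so a stored 0 is seen as Long.MIN_VALUE.
        System.out.println(decodePlain(flipped(0))); // Long.MIN_VALUE
    }
}
```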
> 2. UNSIGNED LONG could be used without this initial-value problem; however, 
> negative values are not supported. Not only can you not store a negative 
> value in the column, you also cannot pass a negative value to the UPSERT 
> INCREASE VALUES statement; it won't pass grammar checking.
> For these two issues, even if you don't solve them here, as long as you want 
> to use increment (say, to implement CREATE SEQUENCE) you have to find a way 
> to overcome them and make the data type compatible with HBase. Thus I am 
> wondering: maybe we could create two types for each numeric data type, say a 
> RAW version which does not flip the sign bit, and a flipped version for use 
> in PK columns. They could still share one type in DML, say LONG, but when 
> DDL is executed it would be changed to the corresponding type and stored in 
> the META TABLE. In this way, the user would not need to know the difference, 
> and the code could still deal with them easily without extra logic, maybe 
> even faster, since a normal column's value would no longer need to go 
> through encoding/decoding to flip the sign bit.
> 3. The current mutation plan only accepts PUT/DELETE and implements them via 
> HTable.batch, while HBase increment goes through HTable.increment. The 
> mutation join strategy also only works for simple replacement.
> To overcome this, one needs to hack a lot of fundamental code. So, in my 
> branch, I enhanced MutationState by changing the mutation value from byte[] 
> to a MutationValue class that stores both a byte[] for PUT/DELETE and a long 
> for the increment operation. When joining mutations from multiple DML 
> statements, a later PUT/DELETE overrides the previous mutation, while a 
> later Increment will not override a PUT/DELETE; it is kept alongside, and 
> Increments on the same column accumulate. Upon commit, all PUT/DELETEs are 
> still batched first, then the Increments are done one by one. 
> I am not sure whether there is a better solution, but this approach is the 
> easiest one I could figure out that does not impact the whole framework too 
> much.
> You can test both of the scenarios I mentioned above with the unit test 
> cases.
> At present, since issues 1 and 2 are not addressed well, some cases will 
> fail (so I commented them out). But with the data type solution I described 
> above implemented, I believe this could work quite well.
> Any ideas?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
