[ 
https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692563#comment-13692563
 ] 

Enis Soztutar commented on HBASE-3787:
--------------------------------------

Sergey asked me to elaborate a bit more on my earlier candidate proposal. This
is still light on details and is just food for thought, to be considered
later.

This proposal works only for append and increment operations, since it is
operation-specific rather than a generic solution. It also relies on the
assumptions that distributed counters are the main use case for increment, and
that these counters are mostly written to and only occasionally read.

We will introduce two new KeyValue.Type values, Put_Inc and Put_App, and rely on
cell tags to carry the nonces. These types sort before Put. We can make the
nonce tag part of the sort order as well, if it is set (otherwise, we can
append the nonce to the row key). With this, we need no special nonce handling
on the write side: writes with the same nonce eclipse each other because they
sort identically. We also do not have to keep anything in memory, and regions
can be moved freely between servers. Put_Inc and Put_App will not count against
the maximum number of versions, so they are kept around until they expire.
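The ordering rule above can be sketched with a comparator over a simplified
cell. This is a minimal illustration, not HBase code: Cell, Type, ORDER and
newStore are hypothetical stand-ins rather than HBase's KeyValue comparator
API, a nonce of 0 means "no tag", and a retried RPC is assumed to carry the
same timestamp as the first attempt (as in the data example in this comment).
A TreeSet stands in for the store: because the retried cell compares equal to
the stored one, add() rejects it, which is the "eclipse" effect.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical sketch of the proposed sort order; not HBase's KeyValue API.
public class NonceOrderingSketch {
    // Declaration order is the sort order: the nonce-carrying types sort
    // before a plain PUT with the same row, column and timestamp.
    public enum Type { PUT_INC, PUT_APP, PUT }

    // Simplified cell; nonce == 0 means "no nonce tag".
    public record Cell(String row, String column, long ts, Type type, long nonce, long value) {}

    public static final Comparator<Cell> ORDER = Comparator
            .comparing(Cell::row)
            .thenComparing(Cell::column)
            .thenComparing(Cell::ts, Comparator.reverseOrder()) // newest first
            .thenComparing(Cell::type)                          // PUT_INC, PUT_APP, then PUT
            .thenComparingLong(Cell::nonce);                    // nonce is part of the key; value is not

    // A sorted set stands in for the store: an entry that compares equal is not added.
    public static TreeSet<Cell> newStore() {
        return new TreeSet<>(ORDER);
    }

    public static void main(String[] args) {
        TreeSet<Cell> store = newStore();
        store.add(new Cell("r1", "cf1:q1", 1, Type.PUT, 0, 10));
        store.add(new Cell("r1", "cf1:q1", 1, Type.PUT_INC, 42, 5));
        // The client retries the same increment: same timestamp, same nonce.
        boolean added = store.add(new Cell("r1", "cf1:q1", 1, Type.PUT_INC, 42, 5));
        // prints "retry stored: false, cells: 2"
        System.out.println("retry stored: " + added + ", cells: " + store.size());
    }
}
```

Note that the comparator deliberately ignores the value, so only the key
fields (including the nonce) decide whether a write is a duplicate.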

We can build a grouping KV scanner that collapses Put_Inc cells into the
underlying Put. Since every get is already a scan, when the client reads the
value back it is computed on the fly (until we see a base Put, the version
count does not increase, so we keep scanning and buffering). On compactions, we
can also use this grouping to collapse Put_Inc cells whose nonces have expired.
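The read-side grouping and the compaction-time collapse can be sketched as
follows. This is a minimal illustration under stated assumptions, not a
proposed implementation: it works on an in-memory list holding one row/column,
already merged and sorted newest-first; Cell, read, compact and the nonce-expiry
cutoff expiryTs are hypothetical names; and a column with no base Put is
assumed to start at 0, matching how an increment treats a missing cell.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not HBase code: cells of ONE row/column, sorted newest-first.
public class GroupingScannerSketch {
    public enum Type { PUT_INC, PUT }

    // nonce == 0 means "no nonce tag" (used for plain PUTs).
    public record Cell(long ts, Type type, long nonce, long value) {}

    // Read path: sum the PUT_INC deltas with ts <= maxTs until the first base PUT,
    // then add the base value. A retried PUT_INC (same nonce) is counted once.
    public static long read(List<Cell> newestFirst, long maxTs) {
        long sum = 0;
        Set<Long> seenNonces = new HashSet<>();
        for (Cell c : newestFirst) {
            if (c.ts() > maxTs) {
                continue; // outside the requested time range
            }
            if (c.type() == Type.PUT) {
                return sum + c.value(); // base value found: stop scanning
            }
            if (seenNonces.add(c.nonce())) {
                sum += c.value(); // first time this nonce is seen
            }
        }
        return sum; // assumption: no base PUT means the counter started at 0
    }

    // Compaction: cells with ts <= expiryTs are past the nonce window, so no retry
    // can arrive for them. Fold that whole run into one PUT at its newest timestamp.
    // Simplification: cells below the first base PUT in that run are dropped.
    public static List<Cell> compact(List<Cell> newestFirst, long expiryTs) {
        List<Cell> out = new ArrayList<>();
        int i = 0;
        while (i < newestFirst.size() && newestFirst.get(i).ts() > expiryTs) {
            out.add(newestFirst.get(i++)); // still inside the nonce window: keep as-is
        }
        if (i < newestFirst.size()) {
            List<Cell> expired = newestFirst.subList(i, newestFirst.size());
            long foldedTs = expired.get(0).ts();
            out.add(new Cell(foldedTs, Type.PUT, 0, read(expired, Long.MAX_VALUE)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                new Cell(3, Type.PUT, 0, 100),
                new Cell(2, Type.PUT_INC, 7, 3),
                new Cell(1, Type.PUT_INC, 6, 2),
                new Cell(1, Type.PUT_INC, 6, 2), // retried RPC, same nonce
                new Cell(1, Type.PUT, 0, 10));
        List<Cell> compacted = compact(cells, 2);
        // prints "100 15 15 2"
        System.out.println(read(cells, Long.MAX_VALUE) + " " + read(cells, 2) + " "
                + read(compacted, 2) + " " + compacted.size());
    }
}
```

The values in main are made up. Compacting at cutoff 2 folds the two deltas,
the retried copy and the base Put into a single Put at ts 2, and reads at or
after the cutoff return the same values as before. The sketch also dedupes by
nonce during the scan, on the assumption that retried copies may sit in
different store files until a compaction merges them.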

The data might be sorted as:

    Put,r1,cf1:q1,ts3,val4
    Put_Inc,r1,cf1:q1,ts2,val3 (tag:nonce)
    Put_Inc,r1,cf1:q1,ts1,val2 (tag:nonce)
    Put_Inc,r1,cf1:q1,ts1,val2 (tag:nonce)  => retried RPC, same nonce (second try)
    Put,r1,cf1:q1,ts1,val1

Get -> returns val4.
Get (ts <= ts2) -> returns val3 + val2 + val1 (the retried val2 is counted once).
                
> Increment is non-idempotent but client retries RPC
> --------------------------------------------------
>
>                 Key: HBASE-3787
>                 URL: https://issues.apache.org/jira/browse/HBASE-3787
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 0.94.4, 0.95.2
>            Reporter: dhruba borthakur
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>             Fix For: 0.95.2
>
>         Attachments: HBASE-3787-partial.patch, HBASE-3787-v0.patch, 
> HBASE-3787-v1.patch, HBASE-3787-v2.patch, HBASE-3787-v3.patch, 
> HBASE-3787-v4.patch, HBASE-3787-v5.patch, HBASE-3787-v5.patch
>
>
> The HTable.increment() operation is non-idempotent. The client retries the
> increment RPC a few times (as specified by configuration) before throwing an
> error to the application. This makes it possible for the same increment call
> to be applied twice at the server.
> For increment operations, is it better to use
> HConnectionManager.getRegionServerWithoutRetries()? Another option would be
> to enhance the IPC module so that the RPC server correctly identifies whether
> an RPC is a retry attempt and handles it accordingly.
