[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692563#comment-13692563 ]
Enis Soztutar commented on HBASE-3787:
--------------------------------------

Sergey asked me to elaborate a bit more on my earlier candidate proposal. This is still light on details, and is just food for thought to be considered for later.

This proposal will only work with append and increment type operations, since it is operation specific rather than a generic solution. It also relies on the assumptions that distributed counters are the main use case for the increment operation, and that these counters are mostly written to and less frequently read.

We will introduce two KeyValue.Types, Put_Inc and Put_App, and rely on cell tags to keep nonces around. These sort before Puts. We can make the cell tag nonce a part of the sort order as well, if it is set (otherwise we can append the nonce to the row_key). With this we don't need any specific handling of nonces on the write side, since writes with the same nonce will eclipse each other because they sort the same. Also, we do not have to keep anything in memory, and regions can be moved freely between servers.

Put_Inc and Put_App will not count against the version count, so that we keep them around until they expire. We can build a grouping KV scanner which collapses Put_Incs with the underlying Puts. Since every get is already a scan, when the client wants to read the value back, it is computed on the fly (until we see a base Put, the versions will not increase, so we will keep on scanning and buffering up). On compactions, we can also use this grouping to collapse nonces that have expired.

The data might be sorted as:

  Put,r1,cf1:q1,ts3,val4
  Put_Inc,r1,cf1:q1,ts2,val3 (tag:nonce)
  Put_Inc,r1,cf1:q1,ts1,val2 (tag:nonce)
  Put_Inc,r1,cf1:q1,ts1,val2 (tag:nonce) => idempotent rpc, second try
  Put,r1,cf1:q1,ts1,val1

Get -> will return val4.
Get (ts <= ts2) will return val3 + val2 + val1

> Increment is non-idempotent but client retries RPC
> --------------------------------------------------
>
>                 Key: HBASE-3787
>                 URL: https://issues.apache.org/jira/browse/HBASE-3787
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>   Affects Versions: 0.94.4, 0.95.2
>            Reporter: dhruba borthakur
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>             Fix For: 0.95.2
>
>         Attachments: HBASE-3787-partial.patch, HBASE-3787-v0.patch,
> HBASE-3787-v1.patch, HBASE-3787-v2.patch, HBASE-3787-v3.patch,
> HBASE-3787-v4.patch, HBASE-3787-v5.patch, HBASE-3787-v5.patch
>
>
> The HTable.increment() operation is non-idempotent. The client retries the
> increment RPC a few times (as specified by configuration) before throwing an
> error to the application. This makes it possible that the same increment call
> be applied twice at the server.
> For increment operations, is it better to use
> HConnectionManager.getRegionServerWithoutRetries()? Another option would be
> to enhance the IPC module to make the RPC server correctly identify if the
> RPC is a retry attempt and handle accordingly.
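
To make the grouping read from the comment concrete (collapsing Put_Inc cells onto the underlying Put while scanning), here is a minimal, self-contained Java sketch. It is only an illustration under stated assumptions, not HBase code: the Cell and Type classes, the read and exampleCells methods, and the numeric values standing in for val1..val4 are all hypothetical. The proposal lets retried writes eclipse each other through the sort order; the sketch approximates that with an explicit set of seen nonces.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposal; none of these types are HBase APIs.
final class GroupingReadSketch {
    enum Type { PUT, PUT_INC }

    static final class Cell {
        final Type type;
        final long ts;
        final long value;
        final String nonce; // assumed to be set for PUT_INC, null for PUT

        Cell(Type type, long ts, long value, String nonce) {
            this.type = type;
            this.ts = ts;
            this.value = value;
            this.nonce = nonce;
        }
    }

    // Reads one row/column. Cells must be sorted newest first, with
    // PUT_INC cells ahead of a PUT that has the same timestamp.
    static long read(List<Cell> cells, long maxTs) {
        long sum = 0;
        Set<String> seenNonces = new HashSet<>();
        for (Cell c : cells) {
            if (c.ts > maxTs) {
                continue; // newer than the requested time range
            }
            if (c.type == Type.PUT) {
                return sum + c.value; // base Put reached: stop scanning
            }
            // Stand-in for "same nonce sorts the same": count a retry once.
            if (seenNonces.add(c.nonce)) {
                sum += c.value;
            }
        }
        return sum; // no base Put found: only the buffered increments
    }

    // The layout from the comment, with made-up numbers:
    // val1 = 10, val2 = 2, val3 = 3, val4 = 40.
    static List<Cell> exampleCells() {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell(Type.PUT, 3, 40, null));
        cells.add(new Cell(Type.PUT_INC, 2, 3, "n2"));
        cells.add(new Cell(Type.PUT_INC, 1, 2, "n1"));
        cells.add(new Cell(Type.PUT_INC, 1, 2, "n1")); // second try of the same RPC
        cells.add(new Cell(Type.PUT, 1, 10, null));
        return cells;
    }

    public static void main(String[] args) {
        List<Cell> cells = exampleCells();
        System.out.println(read(cells, Long.MAX_VALUE)); // Get
        System.out.println(read(cells, 2));              // Get (ts <= ts2)
    }
}
```

With these numbers, the unbounded read stops at the newest Put and returns the stand-in for val4, while the read bounded at ts2 returns the stand-in for val3 + val2 + val1, counting the retried increment only once.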