Jeevan Prakash created HBASE-28829:
--------------------------------------
Summary: Increment Inconsistency in Replication
Key: HBASE-28829
URL: https://issues.apache.org/jira/browse/HBASE-28829
Project: HBase
Issue Type: Bug
Components: Replication
Environment: OS: macOS Sonoma 14.6.1
Reporter: Jeevan Prakash
*Issue:*
Increment operations do not converge to a consistent value when two clusters
replicate to each other.
*Setup:*
Consider two HBase clusters, 'cluster1' and 'cluster2', each added as a
replication peer of the other, with replication enabled on both. A table
contains a counter cell with initial value '2'. Replication from 'cluster1' to
'cluster2' is delayed.
*Actions:*
1. Increment the counter by 1 in 'cluster1'.
2. Increment the counter by 2 in 'cluster2' before the edit from step 1 has
been replicated to it.
*Expected Behaviour:*
The value in the counter cell should be 5.
*Actual Behaviour:*
The value in the counter cell is 4.
*Analysis:*
1. After the increment in 'cluster1', the value becomes 3 in 'cluster1'.
2. Replication from 'cluster1' to 'cluster2' is initiated.
3. Because replication from 'cluster1' to 'cluster2' is delayed, the increment
in 'cluster2' is performed before the replicated edit arrives.
4. The value is now 4 in 'cluster2', and it is replicated to 'cluster1'.
5. Because replication is cell-based rather than operation-based, and the
'cluster2' increment is the latest, value 4 from 'cluster2' overrides value 3
in 'cluster1'.
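
The sequence above can be modelled with a small standalone simulation. This is
only a sketch and does not use any HBase API: the 'Cluster' and 'Cell' classes,
the timestamps 100 and 200, and the "newer timestamp wins" rule are assumptions
chosen to mirror the analysis, not the actual replication code path.

```java
import java.util.ArrayList;
import java.util.List;

class CellBasedReplicationDemo {
    // Hypothetical stand-in for a replicated cell: a value plus its write timestamp.
    static final class Cell {
        final long value;
        final long timestamp;
        Cell(long value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    // Hypothetical stand-in for one cluster holding a single counter cell.
    static final class Cluster {
        Cell cell;
        final List<Cell> outbound = new ArrayList<>(); // whole cells waiting to be shipped

        Cluster(long initialValue) { cell = new Cell(initialValue, 0); }

        // A local increment reads the previous value and writes a new full cell.
        void increment(long delta, long now) {
            cell = new Cell(cell.value + delta, now);
            outbound.add(cell);
        }

        // Assumed conflict rule: the cell with the newer timestamp wins.
        void applyReplicated(Cell incoming) {
            if (incoming.timestamp > cell.timestamp) {
                cell = incoming;
            }
        }
    }

    // Returns {value in cluster1, value in cluster2} after both directions have shipped.
    static long[] run() {
        Cluster cluster1 = new Cluster(2);
        Cluster cluster2 = new Cluster(2);

        cluster1.increment(1, 100); // cluster1 holds 3
        cluster2.increment(2, 200); // cluster2 holds 4; cluster1's edit has not arrived yet

        for (Cell c : cluster1.outbound) cluster2.applyReplicated(c); // older cell, no effect
        for (Cell c : cluster2.outbound) cluster1.applyReplicated(c); // newer cell replaces 3

        return new long[] { cluster1.cell.value, cluster2.cell.value };
    }

    public static void main(String[] args) {
        long[] result = run();
        System.out.println("cluster1=" + result[0] + " cluster2=" + result[1]);
    }
}
```

Under these assumptions both clusters end at 4, which matches the actual
behaviour reported above rather than the expected 5.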
*Steps to reproduce:*
Add a coprocessor to 'cluster2' that calls 'Thread.sleep' in 'preWALAppend' to
simulate the replication delay.
*Inference:*
Debugging showed that replication is cell-based: the entire cell is replicated
rather than the operation that produced it. This works for operations such as
put and delete, because conflicts are resolved using the timestamp and version
of the cell. Increment operations, however, depend on the cell's previous
value, so replicating the resulting cell instead of the increment itself
discards a concurrent update made on the other cluster.
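
For contrast, the sketch below models what operation-based replication of the
same scenario would look like. It is purely illustrative and is not how HBase
replicates increments today: the 'Cluster' class and the idea of shipping the
delta instead of the cell are assumptions made for the comparison.

```java
import java.util.ArrayList;
import java.util.List;

class DeltaReplicationSketch {
    // Hypothetical cluster that ships increment amounts instead of whole cells.
    static final class Cluster {
        long value;
        final List<Long> outboundDeltas = new ArrayList<>();

        Cluster(long initialValue) { value = initialValue; }

        // A local increment is applied and queued for the peer as a delta.
        void increment(long delta) {
            value += delta;
            outboundDeltas.add(delta);
        }

        // A replicated delta is added to the local value and not shipped again.
        void applyReplicated(long delta) {
            value += delta;
        }
    }

    // Returns {value in cluster1, value in cluster2} after both directions have shipped.
    static long[] run() {
        Cluster cluster1 = new Cluster(2);
        Cluster cluster2 = new Cluster(2);

        cluster1.increment(1); // cluster1 holds 3
        cluster2.increment(2); // cluster2 holds 4; cluster1's edit has not arrived yet

        for (long d : cluster1.outboundDeltas) cluster2.applyReplicated(d); // 4 + 1
        for (long d : cluster2.outboundDeltas) cluster1.applyReplicated(d); // 3 + 2

        return new long[] { cluster1.value, cluster2.value };
    }

    public static void main(String[] args) {
        long[] result = run();
        System.out.println("cluster1=" + result[0] + " cluster2=" + result[1]);
    }
}
```

Because addition is commutative, the order in which the two deltas arrive does
not matter in this model, and both clusters reach the expected value of 5. The
sketch ignores delivery guarantees: a delta that is delivered twice would be
counted twice.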