[
https://issues.apache.org/jira/browse/HBASE-22839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905603#comment-16905603
]
Bin Shi commented on HBASE-22839:
---------------------------------
[[email protected]], [~lhofhansl], [~divesh.jain], [~priyankporwal]
[~sanjeevln]
Please let me know your thoughts.
> Provide Serial Replication in HBase 1.3 to fix "row keys and timestamps are
> the same but the values are different in the presence of cross-cluster
> replication"
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-22839
> URL: https://issues.apache.org/jira/browse/HBASE-22839
> Project: HBase
> Issue Type: Improvement
> Components: Replication
> Affects Versions: 1.3.4, 1.3.5
> Reporter: Bin Shi
> Priority: Major
> Fix For: 1.3.4, 1.3.5
>
>
> Problem Statement:
> In the cross-cluster replication validation, we found that some cells in the
> master (source) cluster and the slave (destination) cluster can have the same
> row key and the same timestamp but different values. This happens when
> mutations with the same row key are submitted in a batch without specifying
> the timestamp, and the same millisecond-granularity timestamp is assigned
> when they are committed to the WAL.
> When this happens, if a major compaction hasn’t happened yet and you scan
> the table, you can find cells that have the same row key and the same
> timestamp but different values, like the first three rows in the following
> table.
> |Row Key 1|CF0::Column 1|Timestamp 1|Value 1|
> |Row Key 1|CF0::Column 1|Timestamp 1|Value 2|
> |Row Key 1|CF0::Column 1|Timestamp 1|Value 3|
> |Row Key 2|CF0::Column 1|Timestamp 2|Value 4|
> |Row Key 3|CF0::Column 1|Timestamp 4|Value 5|
> The ordering of the first three rows is indeterminate in the presence of
> cross-cluster replication, so after compaction, in the master cluster you will
> see “Row Key 1, CF0::Column 1, Timestamp 1” having Value 3, but in the slave
> cluster you might see the cell having any of the three possible values
> (Value 1, Value 2, or Value 3), which results in data inconsistency between
> the master and slave clusters.
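The effect described above can be illustrated with a small, self-contained sketch. It is a simplified model and not HBase code: the class `SameTimestampDemo` and the method `applyInOrder` are hypothetical names, and the sketch assumes that when several writes share the same row key, column, and timestamp, the write applied last is the one that survives.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of one cluster's store (not HBase code).
class SameTimestampDemo {
    // Applies the values to a single cell coordinate in the given order and
    // returns the value that survives.
    static String applyInOrder(List<String> values) {
        Map<String, String> store = new HashMap<>();
        String coordinate = "Row Key 1/CF0::Column 1/Timestamp 1";
        for (String value : values) {
            // Assumption: with identical coordinates, the later write
            // shadows the earlier one.
            store.put(coordinate, value);
        }
        return store.get(coordinate);
    }

    public static void main(String[] args) {
        // The source applies the batch in client order.
        System.out.println("source: "
            + applyInOrder(Arrays.asList("Value 1", "Value 2", "Value 3")));
        // The sink may apply the same writes in a different order.
        System.out.println("sink:   "
            + applyInOrder(Arrays.asList("Value 2", "Value 3", "Value 1")));
    }
}
```

With the same three writes, the two orderings leave different surviving values, which is the mismatch the validation reports.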
> Root Cause Analysis:
> In HBaseInterClusterReplicationEndpoint.createBatches() of branch-1.3, the
> WAL entries from the same region can be split into different batches
> according to the replication RPC limit, and these batches are shipped by
> ReplicationSource concurrently. As a result, the batches for the same region
> can arrive at the sink on the region servers in the slave cluster and be
> applied to the region in indeterminate order, due to the asynchronous nature
> of cross-cluster replication.
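To make the splitting effect concrete, here is a minimal sketch of a size-capped splitter. It is not the actual createBatches() code: `BatchSplitDemo`, `WalEntry`, `splitBySize`, and `regionsInMultipleBatches` are hypothetical names, and the rule for cutting a batch (start a new batch when the next entry would exceed the limit) is an assumption chosen only to show how one region's entries can end up in more than one batch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not the branch-1.3 implementation.
class BatchSplitDemo {
    static final class WalEntry {
        final String region;
        final int sizeBytes;
        WalEntry(String region, int sizeBytes) {
            this.region = region;
            this.sizeBytes = sizeBytes;
        }
    }

    // Cuts the entry stream into batches that each stay within limitBytes
    // (a single oversized entry still gets a batch of its own).
    static List<List<WalEntry>> splitBySize(List<WalEntry> entries, int limitBytes) {
        List<List<WalEntry>> batches = new ArrayList<>();
        List<WalEntry> current = new ArrayList<>();
        int currentSize = 0;
        for (WalEntry e : entries) {
            if (!current.isEmpty() && currentSize + e.sizeBytes > limitBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(e);
            currentSize += e.sizeBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    // Returns the regions whose entries landed in more than one batch.
    static Set<String> regionsInMultipleBatches(List<List<WalEntry>> batches) {
        Map<String, Integer> firstBatch = new HashMap<>();
        Set<String> split = new TreeSet<>();
        for (int i = 0; i < batches.size(); i++) {
            for (WalEntry e : batches.get(i)) {
                Integer previous = firstBatch.putIfAbsent(e.region, i);
                if (previous != null && previous != i) {
                    split.add(e.region);
                }
            }
        }
        return split;
    }
}
```

Any region reported by `regionsInMultipleBatches` is exposed to reordering once its batches are shipped concurrently.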
> Solution:
> In HBase 3.0.0 and 2.1.0, we provided Serial Replication (HBASE-20046), which
> guarantees that the order of pushing logs to slave clusters is the same as the
> order of requests from the client in the master cluster. It contains two main
> changes:
> # Recording the replication "barriers" in ZooKeeper to synchronize the
> replication across old/failed RS and new RS to provide strict ordering
> semantics even in the presence of a region move or RS failure.
> # Making sure the batches within one region are shipped to the slave clusters
> in order.
> The second change is exactly what we need and is the minimal change needed to
> fix the issue in this JIRA.
> To fix the issue in this JIRA, we have two options:
> # Cherry-pick HBASE-20046 to branch-1.3. Pros: it also fixes the data
> inconsistency issue when there is a region move or RS failure, and helps
> reduce the noise in our cross-cluster replication/backup validation, which is
> our ultimate goal. Cons: the change is big, and I'm not sure for now whether
> the change is self-contained or has other dependencies that also need to be
> ported to branch-1.3; and we need a longer time to validate and stabilize it.
> # Port the minimal change, or make a change equivalent to the second part
> of HBASE-20046, to make sure the batches within one region are shipped to the
> slave clusters in order.
> I prefer option 2 because of the cons of option 1. Thoughts?
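One possible shape for option 2 is sketched below. It is only an illustration under stated assumptions, not the HBASE-20046 patch: `SerialPerRegionShipper`, `WalEntry`, and `ship` are hypothetical names, and the blocking replication RPC is replaced by appending to an in-memory list. The idea is to group WAL entries by region, let different regions ship in parallel, and have each region's entries sent one after another by a single task.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of per-region ordered shipping; not HBase code.
class SerialPerRegionShipper {
    static final class WalEntry {
        final String region;
        final long seqId;
        WalEntry(String region, long seqId) {
            this.region = region;
            this.seqId = seqId;
        }
    }

    // Ships entries and returns, per region, the order in which the stand-in
    // sink received them.
    static Map<String, List<Long>> ship(List<WalEntry> walEntries, int threads) {
        // Group by region, preserving WAL order inside each region.
        Map<String, List<WalEntry>> byRegion = new LinkedHashMap<>();
        for (WalEntry e : walEntries) {
            byRegion.computeIfAbsent(e.region, r -> new ArrayList<>()).add(e);
        }

        Map<String, List<Long>> arrived = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Map.Entry<String, List<WalEntry>> lane : byRegion.entrySet()) {
                // One task per region: regions run concurrently, but a single
                // region's entries are sent strictly one after another.
                pending.add(pool.submit(() -> {
                    List<Long> received = new ArrayList<>();
                    for (WalEntry e : lane.getValue()) {
                        received.add(e.seqId); // stand-in for a blocking RPC
                    }
                    arrived.put(lane.getKey(), received);
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every region to finish
            }
        } catch (Exception ex) {
            throw new RuntimeException("shipping failed", ex);
        } finally {
            pool.shutdown();
        }
        return arrived;
    }
}
```

This keeps cross-region parallelism while removing the within-region reordering; it does not address region moves or RS failure, which is what the barrier mechanism in the first change covers.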