[ 
https://issues.apache.org/jira/browse/HBASE-22839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905603#comment-16905603
 ] 

Bin Shi commented on HBASE-22839:
---------------------------------

[[email protected]], [~lhofhansl], [~divesh.jain], [~priyankporwal] 
[~sanjeevln]

Please let me know your thoughts. 

> Provide Serial Replication in HBase 1.3 to fix "row keys and timestamps are 
> the same but the values are different in the presence of cross-cluster 
> replication"
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-22839
>                 URL: https://issues.apache.org/jira/browse/HBASE-22839
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>    Affects Versions: 1.3.4, 1.3.5
>            Reporter: Bin Shi
>            Priority: Major
>             Fix For: 1.3.4, 1.3.5
>
>
> Problem Statement:
> In cross-cluster replication validation, we found that some cells in the 
> master (source) cluster and the slave (destination) cluster can have the 
> same row key and the same timestamp but different values. This happens when 
> mutations with the same row key are submitted in a batch without an explicit 
> timestamp, and they are all assigned the same millisecond-resolution 
> timestamp when they are committed to the WAL. 
> When this happens, if a major compaction hasn’t happened yet and you scan 
> the table, you can find cells that have the same row key and the same 
> timestamp but different values, like the first three rows in the following 
> table.
> |Row Key 1|CF0::Column 1|Timestamp 1|Value 1|
> |Row Key 1|CF0::Column 1|Timestamp 1|Value 2|
> |Row Key 1|CF0::Column 1|Timestamp 1|Value 3|
> |Row Key 2|CF0::Column 1|Timestamp 2|Value 4|
> |Row Key 3|CF0::Column 1|Timestamp 4|Value 5|
> The ordering of the first three rows is indeterminate in the presence of 
> cross-cluster replication, so after compaction, in the master cluster you 
> will see “Row Key 1, CF0::Column 1, Timestamp 1” having Value 3, but in the 
> slave cluster you might see the cell having any one of the three values 
> (Value 1, Value 2, or Value 3), which results in data inconsistency between 
> the master and slave clusters.
> Root Cause Analysis:
> In HBaseInterClusterReplicationEndpoint.createBatches() of branch-1.3, the 
> WAL entries from the same region can be split into different batches 
> according to the replication RPC limit, and these batches are shipped by 
> ReplicationSource concurrently. As a result, the batches for the same region 
> can arrive at the sink on the region servers in the slave cluster, and be 
> applied to the region, in indeterminate order due to the asynchronous nature 
> of cross-cluster replication.
> Solution:
> In HBase 3.0.0 and 2.1.0, we provided Serial Replication (HBASE-20046), which 
> guarantees that the order of pushing logs to slave clusters is the same as 
> the order of requests from clients in the master cluster. It mainly contains 
> two changes:
>  # Record the replication "barriers" in ZooKeeper to synchronize the 
> replication across old/failed RS and new RS, providing strict ordering 
> semantics even in the presence of a region move or RS failure.
>  # Make sure the batches within one region are shipped to the slave clusters 
> in order.
> The second part of the change is exactly what we need, and it is the minimal 
> change required to fix the issue in this JIRA.
> To fix the issue in this JIRA, we have two options:
>  # Cherry-pick HBASE-20046 to branch-1.3. Pros: it also fixes the data 
> inconsistency issue when there is a region move or RS failure, and helps 
> reduce the noise in our cross-cluster replication/backup validation, which is 
> our ultimate goal. Cons: the change is big, and I'm not sure for now whether 
> the change is self-contained or has other dependencies that would also need 
> to be ported to branch-1.3; we would also need more time to validate and 
> stabilize it.
>  # Port the minimal change, or make a change equivalent to the second part 
> of HBASE-20046, to make sure the batches within one region are shipped to the 
> slave clusters in order.
> I prefer option 2 because of the cons of option 1. Thoughts? 
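
For discussion, here is a minimal sketch of what option 2 could look like in
spirit: entries are grouped by region, each region's batches are shipped
strictly one after another, and different regions still ship in parallel.
This is not the HBASE-20046 patch and not existing branch-1.3 code.
SerialBatchSketch, Entry, createSerialBatches and ship are hypothetical
names, and the sink is a plain callback standing in for the replication RPC.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch only; none of these names come from the HBase code base.
final class SerialBatchSketch {

    /** Stand-in for a WAL entry: owning region, sequence id, and size in bytes. */
    static final class Entry {
        final String region;
        final long seqId;
        final int sizeBytes;

        Entry(String region, long seqId, int sizeBytes) {
            this.region = region;
            this.seqId = seqId;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Splits entries (given in WAL order) into size-limited batches, keeping
     * the batches of each region together and in WAL order.
     */
    static Map<String, List<List<Entry>>> createSerialBatches(
            List<Entry> walOrder, int maxBatchBytes) {
        Map<String, List<List<Entry>>> lanes = new LinkedHashMap<>();
        Map<String, Integer> currentBatchBytes = new HashMap<>();
        for (Entry e : walOrder) {
            List<List<Entry>> batches =
                lanes.computeIfAbsent(e.region, r -> new ArrayList<>());
            int size = currentBatchBytes.getOrDefault(e.region, 0);
            // Start a new batch when the region has none yet or the limit would be exceeded.
            if (batches.isEmpty() || size + e.sizeBytes > maxBatchBytes) {
                batches.add(new ArrayList<>());
                size = 0;
            }
            batches.get(batches.size() - 1).add(e);
            currentBatchBytes.put(e.region, size + e.sizeBytes);
        }
        return lanes;
    }

    /**
     * Ships one task per region. Inside a task the region's batches are sent
     * sequentially, so the sink sees them in WAL order; regions run in parallel.
     * The sink must be thread-safe because several regions may call it at once.
     */
    static void ship(Map<String, List<List<Entry>>> lanes, Consumer<List<Entry>> sink) {
        int threads = Math.max(1, Math.min(lanes.size(), 4));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<List<Entry>> regionBatches : lanes.values()) {
                pending.add(pool.submit(() -> regionBatches.forEach(sink)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every region to finish and surface failures
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(ie);
        } catch (ExecutionException ee) {
            throw new RuntimeException(ee.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Entry> wal = new ArrayList<>();
        for (long seq = 1; seq <= 6; seq++) {
            wal.add(new Entry(seq % 2 == 0 ? "regionA" : "regionB", seq, 40));
        }
        ship(createSerialBatches(wal, 64), batch ->
            System.out.println(batch.get(0).region + " <- seqId " + batch.get(0).seqId));
    }
}
```

The trade-off in this sketch is that parallelism is per region rather than
per batch, so a single hot region ships no faster than one thread allows. It
also does nothing for region moves or RS failures, which is the part of
HBASE-20046 that option 2 deliberately leaves out.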



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
