[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046286#comment-15046286
 ] 

Phil Yang commented on HBASE-14004:
-----------------------------------

{quote}
I guess we need to persist the acked length to somewhere like zk, or else we 
will still replicate the non-acked data to slave cluster when recover?
{quote}
If we persist the acked length to zk but the RS crashes before saving it, what 
will happen? There is always a window in which we have done nothing yet after 
getting a timeout on hflush/hsync, or in which we crash before the ack even 
arrives. For hflush, if we hold the request, the client gets no response, which 
means that after restarting we are free to make this request either succeed or 
fail. For hsync, the RS has crashed and will replay the log after restarting, 
but we cannot be sure whether data not yet acked by hsync is on the DNs' disks 
or only in their memory, so ReplicationSource may only be able to wait until 
the RS restarts, because we cannot tell whether the visible-but-unacked data 
will ever be hsynced to disk. If ReplicationSource reads it anyway and then all 
three DNs crash, the slave will have more data than the master...
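
A minimal sketch of that constraint (not HBase's actual ReplicationSource; 
AckBoundedWalReader, onSyncAcked and safeReadLimit are made-up names for 
illustration): the reader only ships bytes up to the last offset acknowledged 
by a successful sync.
{code:java}
// Sketch only: a reader that never ships WAL bytes past the last acked sync
// offset, because visible-but-unacked bytes may still be lost if the RS and
// enough DNs fail before the data reaches disk.
public final class AckBoundedWalReader {
  // Hypothetical field: the WAL writer would publish this after each acked
  // sync (a single sync thread is assumed here).
  private volatile long lastAckedOffset = 0L;

  /** Called by the WAL writer once a sync up to {@code offset} is acked. */
  public void onSyncAcked(long offset) {
    if (offset > lastAckedOffset) {
      lastAckedOffset = offset;
    }
  }

  /** How far the replication reader may safely read in the current WAL file. */
  public long safeReadLimit(long visibleFileLength) {
    // Never read past the acked point, or the slave may end up with data the
    // master later fails to persist.
    return Math.min(visibleFileLength, lastAckedOffset);
  }
}
{code}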

And if we add the CreateFlag.SYNC_BLOCK flag when creating the WAL file, we can 
guarantee that a closed file is on disk, so if the RS does not recover, 
ReplicationSource can wait for the namenode to close the file automatically and 
then read the whole file, right?
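
For reference, passing SYNC_BLOCK would look roughly like this (the path, 
buffer size and other settings below are placeholders, not what HBase actually 
uses):
{code:java}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class SyncBlockWalExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path walPath = new Path("/hbase/WALs/example-wal"); // placeholder path

    // SYNC_BLOCK asks HDFS to fsync each block to disk as it is closed, so a
    // closed file is guaranteed to be on the DNs' disks, not only in memory.
    FSDataOutputStream out = fs.create(
        walPath,
        FsPermission.getFileDefault(),
        EnumSet.of(CreateFlag.CREATE, CreateFlag.SYNC_BLOCK),
        4096,                                 // buffer size (placeholder)
        fs.getDefaultReplication(walPath),
        fs.getDefaultBlockSize(walPath),
        null);                                // no Progressable
    try {
      out.write(new byte[] {1, 2, 3});        // stand-in for WAL edits
      out.hsync();
    } finally {
      out.close();
    }
  }
}
{code}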

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14004
>                 URL: https://issues.apache.org/jira/browse/HBASE-14004
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: He Liangliang
>            Priority: Critical
>              Labels: replication, wal
>
> Looks like the current write path can cause an inconsistency between 
> memstore/hfile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails while the data has already 
> (perhaps partially) been transported to the DNs and is eventually persisted. 
> As a result, the handler will roll back the Memstore, and the later flushed 
> HFile will also skip this record.
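
A minimal sketch of the write path quoted above (Memstore, Wal and Edit are 
stand-in interfaces, not HBase's real classes), showing where the sync failure 
and the rollback diverge from what HDFS actually persisted:
{code:java}
import java.io.IOException;

final class WritePathSketch {
  interface Edit {}
  interface Memstore { void insert(Edit e); void rollback(Edit e); }
  interface Wal {
    long append(Edit e) throws IOException;
    void sync(long offset) throws IOException;
  }

  private final Memstore memstore;
  private final Wal wal;

  WritePathSketch(Memstore memstore, Wal wal) {
    this.memstore = memstore;
    this.wal = wal;
  }

  void write(Edit edit) throws IOException {
    memstore.insert(edit);            // 1. insert record into Memstore
    long offset = wal.append(edit);   // 2. write record to WAL
    try {
      wal.sync(offset);               // 3. sync WAL
    } catch (IOException syncFailure) {
      memstore.rollback(edit);        // 4. rollback Memstore if 3 fails
      // The sync RPC can fail on the client side even though the bytes were
      // already shipped to the DNs and are later persisted, so the edit ends
      // up in the WAL (and gets replicated) while missing from the HFile.
      throw syncFailure;
    }
  }
}
{code}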



