[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044835#comment-15044835
 ] 

Phil Yang commented on HBASE-14004:
-----------------------------------


{quote}
In another word, the client needn't discriminate between timeout and sync 
error, right?
{quote}
For an HBase client with retry logic, a timeout and any other error both 
result in a retry, with no difference between them, so it does not seem 
critical to guarantee that a non-timeout error means the data will never 
exist in the database, although I think such a guarantee is necessary for a 
reliable database. Different users may have different requirements; what do 
other folks think about this question?

And I don't think there is a big difference between your approach and part of 
my design; they only differ in implementation.

If we need the guarantee above, i.e. that we must differentiate timeout errors 
from non-timeout errors, we can either save on ZK first and then write to a 
new file (your idea), or just write to a new file but skip the duplicate 
entries when replaying (my idea), before acking success to the client. My idea 
may be faster because it doesn't need to send a request to ZK.
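
A rough sketch of the skip-on-replay idea, assuming duplicates can be keyed by 
region and sequence id; the class and method names below are hypothetical, not 
existing HBase APIs:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: if a sync times out, the entry is rewritten to a new
// WAL file, so the same (region, sequence id) pair may appear in two files.
// During replay we remember the highest sequence id already applied for each
// region and skip anything at or below it.
public class DedupingWalReplayer {

  private final Map<String, Long> highestAppliedSeqId = new HashMap<>();

  /** Returns true if the entry should be applied, false if it is a repeat. */
  public boolean shouldApply(String encodedRegionName, long sequenceId) {
    Long applied = highestAppliedSeqId.get(encodedRegionName);
    if (applied != null && sequenceId <= applied) {
      return false; // already replayed from an earlier file
    }
    highestAppliedSeqId.put(encodedRegionName, sequenceId);
    return true;
  }
}
{code}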

If we don't need the guarantee, we can roll back the memstore, ack failure to 
the client, and do the rest of the work asynchronously, because at that point 
we are not afraid of the RS crashing.

And I think the major difference between our ideas is that you don't change 
the WAL sync logic. However, the current logic may not be perfect, because 
hflush only writes the data to the memory of the three DNs, which is not the 
real persistence that users expect. If the RS and the three DNs go down, or 
the whole cluster goes down because of some serious issue, the data that has 
not been synced to the DNs' disks will be lost. This issue not only causes 
inconsistency between the two clusters, but also confuses users, because we 
don't actually save their data on disk. I think that may not be good :(
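
For reference, the hflush/hsync distinction on the HDFS client side looks 
roughly like this (a minimal sketch against FSDataOutputStream; the path is 
made up and real WAL code goes through the WAL writer, not a raw stream):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FlushVsSync {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataOutputStream out = fs.create(new Path("/tmp/wal-sketch"))) {
      out.write("edit".getBytes("UTF-8"));

      // hflush(): pushes the data to the in-memory buffers of the DNs in the
      // pipeline; new readers can see it, but it can still be lost if the DNs
      // crash before writing it to disk.
      out.hflush();

      // hsync(): additionally asks each DN to flush the data to its local
      // disk, which is the durability users usually expect from "sync".
      out.hsync();
    }
  }
}
{code}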


> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14004
>                 URL: https://issues.apache.org/jira/browse/HBASE-14004
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: He Liangliang
>            Priority: Critical
>              Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL, which can cause the slave cluster to have more data 
> than the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data has already 
> (perhaps partially) been transported to the DNs and finally gets persisted. 
> As a result, the handler will roll back the Memstore, and the HFile flushed 
> later will also skip this record.
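
A minimal sketch of the sequence described above; Memstore, Wal and Edit here 
are illustrative stand-ins, not actual HBase internals:

{code:java}
import java.io.IOException;

public class WritePathSketch {

  interface Edit {}

  interface Memstore {
    void insert(Edit e);
    void rollback(Edit e);
  }

  interface Wal {
    void append(Edit e) throws IOException;
    void sync() throws IOException; // hflush to the DN pipeline
  }

  static void write(Memstore memstore, Wal wal, Edit edit) throws IOException {
    memstore.insert(edit);     // 1. insert record into Memstore
    wal.append(edit);          // 2. write record to WAL
    try {
      wal.sync();              // 3. sync WAL
    } catch (IOException e) {
      memstore.rollback(edit); // 4. rollback Memstore if 3 fails
      // The sync RPC may fail on the client side even though the DNs already
      // received the bytes, so the WAL entry can still be persisted and later
      // replicated to the slave cluster while the local Memstore/HFile skip it.
      throw e;
    }
  }
}
{code}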



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
