[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149497#comment-15149497 ] stack commented on HBASE-14004:
---
bq. We could try rejiggering the order in which the memstore gets updated, putting it off till after the sync.
Since "HBASE-15158 Change order in which we do write pipeline operations; do all under row locks", we no longer do memstore rollbacks. FYI.

Key: HBASE-14004
URL: https://issues.apache.org/jira/browse/HBASE-14004
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: He Liangliang
Priority: Critical
Labels: replication, wal

It looks like the current write path can cause inconsistency between the memstore/HFile and the WAL, which leaves the slave cluster with more data than the master cluster. The simplified write path is:
1. insert the record into the Memstore
2. write the record to the WAL
3. sync the WAL
4. roll back the Memstore if step 3 fails
It is possible for the HDFS sync RPC call to fail even though the data has already been (perhaps partially) transported to the DataNodes, where it eventually gets persisted. As a result, the handler rolls back the Memstore, and the later flushed HFile also skips this record.
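For illustration, a minimal sketch of the write path described above. The MemStore and WriteAheadLog interfaces here are hypothetical stand-ins, not the real HBase classes:

{code}
import java.io.IOException;

// Hypothetical stand-ins for the memstore and WAL, for illustration only.
interface MemStore {
    long insert(byte[] record);   // returns a handle usable for rollback
    void rollback(long handle);
}

interface WriteAheadLog {
    void append(byte[] record) throws IOException;
    void sync() throws IOException;   // backed by hflush; outcome is ambiguous on failure
}

final class SimplifiedWritePath {
    private final MemStore memstore;
    private final WriteAheadLog wal;

    SimplifiedWritePath(MemStore memstore, WriteAheadLog wal) {
        this.memstore = memstore;
        this.wal = wal;
    }

    boolean write(byte[] record) throws IOException {
        long handle = memstore.insert(record);   // 1. insert into Memstore
        wal.append(record);                      // 2. write record to WAL
        try {
            wal.sync();                          // 3. sync WAL
        } catch (IOException e) {
            // 4. roll back on sync failure. The bug discussed in this issue: the
            // sync RPC may have failed only on the client side, so the record can
            // still be persisted in the WAL (and shipped to the slave cluster)
            // even though we roll back the memstore here.
            memstore.rollback(handle);
            return false;
        }
        return true;
    }
}
{code}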
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149552#comment-15149552 ] stack commented on HBASE-14004:
---
But, as noted elsewhere, this fact does not solve this issue (what made it into the stream...).
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15625326#comment-15625326 ] Duo Zhang commented on HBASE-14004:
---
I think we need to pick this up. With AsyncFSWAL, it is not safe to use DFSInputStream to read the WAL file directly until EOF while it is still open; the data we read may disappear later. FSHLog also has this problem, but it is much safer... See this document for more details: https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#
The problem only happens while the WAL file is still open. AFAIK, if an RS is alive, its WAL is always replicated by the RS itself, so I think we can expose an API that tells the ReplicationSource the safe length to read for a WAL file that is still open. For a ReplicationSource that replicates the WALs of another RS, we can make sure that RS is dead and all of its WALs are closed (we can also guarantee this by calling recoverLease), so it is safe to read them until EOF with DFSInputStream.
Any concerns? If not, let's start working! Thanks.
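A minimal sketch of the two read boundaries proposed above. recoverLease and getFileStatus are real DistributedFileSystem APIs; the getSafeReadLength RPC on the owning regionserver is the hypothetical API being proposed:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

final class WalReadBoundary {
    /** Hypothetical client-side view of the RS that is writing the WAL. */
    interface OwningRegionServer {
        long getSafeReadLength(Path wal) throws IOException;
    }

    /** WAL of a live RS: ask the writer how far it is safe to read. */
    static long boundaryForLiveRs(OwningRegionServer rs, Path wal) throws IOException {
        return rs.getSafeReadLength(wal);
    }

    /** WAL of a dead RS: recover the lease so the file gets closed, then read to EOF. */
    static long boundaryForDeadRs(DistributedFileSystem dfs, Path wal)
            throws IOException, InterruptedException {
        while (!dfs.recoverLease(wal)) {
            Thread.sleep(1000);   // lease recovery is asynchronous; poll until the file is closed
        }
        return dfs.getFileStatus(wal).getLen();   // the file is closed, so its length is final
    }
}
{code}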
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977691#comment-14977691 ] Heng Chen commented on HBASE-14004:
---
I have a proposal. When a WAL sync fails and the master has to roll back the Memstore, we can record this action in ZK or a system table; meanwhile, all slaves should sync this action and modify their memstores accordingly. Any concerns?
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977696#comment-14977696 ] Heng Chen commented on HBASE-14004:
---
{quote}
When a WAL sync fails and the master has to roll back the Memstore, we can record this action in ZK or a system table; meanwhile, all slaves should sync this action and modify their memstores.
{quote}
Of course, before a slave rolls back its memstore, it should check the timestamps of the rollback action and of the WALs to replay.
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977769#comment-14977769 ] Duo Zhang commented on HBASE-14004:
---
The problem here affects more than replication.
{quote}
As a result, the handler rolls back the Memstore, and the later flushed HFile also skips this record.
{quote}
What if the regionserver crashes before flushing the HFile? I think the record will come back, since it has already been persisted in the WAL.
Adding a marker may be a solution, but you would need to check the marker everywhere when replaying the WAL, and you still need to deal with failures when placing the marker... I do not think it is easy to do.
The basic problem here is that we may have inconsistency between the memstore and the WAL when we fail to sync the WAL. A simple solution is to kill the regionserver when we fail to sync the WAL, which means we never roll back the memstore but instead reconstruct it from the WAL. That way we can make sure there is no difference between the memstore and the WAL.
If we want to keep the regionserver alive when a sync fails, then I think we need to find out the real result of the sync operation. Maybe we could close the WAL file and check its length? Of course, if we have lost the connection to the namenode, I think there is no simple solution other than killing the regionserver... Thanks.
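A sketch of the "kill the regionserver" option just described: on a sync failure we never roll back; we abort the server so the memstore is rebuilt from the WAL during recovery. Abortable mirrors HBase's interface of the same name; SyncableWal is a hypothetical stand-in:

{code}
import java.io.IOException;

interface Abortable {
    void abort(String why, Throwable cause);
}

interface SyncableWal {
    void sync() throws IOException;   // hypothetical stand-in for the WAL sync call
}

final class FailFastSync {
    private final Abortable server;

    FailFastSync(Abortable server) {
        this.server = server;
    }

    void sync(SyncableWal wal) {
        try {
            wal.sync();
        } catch (IOException e) {
            // The real outcome of the sync is unknown, so do not roll back the
            // memstore; aborting guarantees memstore and WAL are rebuilt together.
            server.abort("WAL sync failed; aborting to avoid memstore/WAL divergence", e);
        }
    }
}
{code}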
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977841#comment-14977841 ] Heng Chen commented on HBASE-14004:
---
{quote}
What if the regionserver crashes before flushing the HFile? I think the record will come back, since it has already been persisted in the WAL.
{quote}
Indeed. Under the current logic, when a sync fails the memstore is rolled back and the client is told the write failed. If the RS then crashes before the memstore flush, the record comes back after the WAL is replayed. So the client will find that the write did not actually fail; that is inconsistent!
{quote}
Adding a marker may be a solution, but you would need to check the marker everywhere when replaying the WAL, and you still need to deal with failures when placing the marker...
{quote}
Agreed! Thanks [~Apache9] for your reply!
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977908#comment-14977908 ] Anoop Sam John commented on HBASE-14004:
---
Also in the case where the RS is killed and the later WAL replay adds the Mutation back, the same inconsistency with the client reply arises: the client would have been told that the mutation failed!
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978004#comment-14978004 ] Yu Li commented on HBASE-14004:
---
{quote}
It's possible that the HDFS sync RPC call fails, but the data is already (may partially) transported to the DNs which finally get persisted
{quote}
Does this really happen on your cluster, or is it just an assumption, [~heliangliang] [~chenheng]?
From what I can see, we get an EOFException when trying to read partially written entries (those larger than 64k that are split into multiple packets, with some packets lost due to a network or DFS error), and these entries are abandoned during replication.
If in a real case the sync RPC call fails but the data somehow persists, it is more likely a bug in HDFS or incorrect usage in HBase (like not handling all possible exceptions, etc.). I'd suggest digging deeper into this first before any further discussion.
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978038#comment-14978038 ] Heng Chen commented on HBASE-14004:
---
The assumption is that HDFS syncs successfully and sends a response, but the RS does NOT receive the response because of a network disconnect. From the RS's point of view the sync timed out, yet the data is persisted on HDFS. [~carp84]
As [~Apache9] mentioned,
{quote}
A simple solution is to kill the regionserver when we fail to sync the WAL, which means we never roll back the memstore but instead reconstruct it from the WAL.
{quote}
It may cause two problems:
* we may apply a mutation twice (it seems there is no problem if the mutation is incr/append under the current logic?)
* all RSs will be killed if the network between the RSs and the NN disconnects briefly.
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978108#comment-14978108 ] Duo Zhang commented on HBASE-14004:
---
[~carp84] This is an inherent problem of RPC-based systems under temporary network failure. HBase uses {{hflush}} to sync the WAL. I do not know the details of whether hflush calls the namenode to update the length, but in any case, the last RPC call could fail at the client side yet succeed at the server side (a network failure while writing the return value back).
And sure, this should be considered a bug in HBase. I checked the code: if an exception is thrown from hflush, {{FSHLog.SyncRunner}} simply passes it to the upper layer. So it can happen that the hflush succeeded at HDFS, but HBase thinks it failed, causing inconsistency.
I think we need to find a way to confirm whether the WAL is actually persisted at HDFS. And if the DFSClient already retries, then I think killing the regionserver is enough? Any suggestions, [~carp84]? Thanks.
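To illustrate the ambiguity Duo describes: an exception from hflush tells us nothing about whether the data was persisted. A minimal sketch using the real FSDataOutputStream API, not the actual FSHLog.SyncRunner code:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;

final class AmbiguousHflush {
    /**
     * Returns true only if the flush definitely succeeded. An IOException means
     * "unknown": the pipeline may have received and persisted the data before
     * the RPC reply was lost on the way back to the client.
     */
    static boolean tryHflush(FSDataOutputStream out) {
        try {
            out.hflush();
            return true;
        } catch (IOException e) {
            // Do NOT treat this as "not persisted"; rolling back the memstore
            // here is exactly the inconsistency discussed in this issue.
            return false;
        }
    }
}
{code}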
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999866#comment-14999866 ] Yu Li commented on HBASE-14004:
---
[~chenheng] [~Apache9] Sorry for the late response; somehow I didn't notice the JIRA update mail. I see your point now, thanks for the explanation.
{quote}
I do not know the details of whether hflush calls the namenode to update the length
{quote}
Checking the code, SyncRunner calls ProtobufLogWriter#sync, which finally calls DataOutputStream#hflush; from there we can see that it only calls the NN to update the length when a new block is created, but not while filling an already created one.
{quote}
I think we need to find a way to confirm whether the WAL is actually persisted at HDFS
{quote}
Agreed. I just noticed you've created HBASE-14790; let's discuss more details there. :-)
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15042694#comment-15042694 ] Phil Yang commented on HBASE-14004:
---
After the discussion in HBASE-14790, we can move forward now. Let me repost my comment from HBASE-14790 first :)
{quote}
Currently there are two scenarios which may result in inconsistency between the two clusters.
The first is that the master cluster crashes (for example, a power failure), or three DNs and the RS crash at the same time, and we lose all data that was not flushed to the DNs' disks even though that data has already been synced to the slave cluster.
The second is that we roll back the memstore and respond to the client with an error when we get an error on hflush, but the entry may in fact exist in the WAL. This not only results in inconsistency between the two clusters but also gives the client a wrong response, because the data will "revive" after the WAL is replayed. This scenario has been discussed in HBASE-14004.
Compared to the second, the first scenario is easier to solve: we can tell the ReplicationSource that it may only read log entries that are already saved on three disks. We need to know the largest WAL entry id that has been synced, so HDFS's own sync logic may not be helpful for us, and we must use hsync to let HBase know the entry id. So we need a configurable periodic hsync here; even with only one cluster it also helps reduce data loss from a data-center power failure or from unluckily crashing three DNs and the RS at the same time.
The second scenario is more complex, because we cannot roll back the memstore and tell the client the operation failed unless we are very sure the data will never exist in the WAL, and mostly we are not sure... So we have to use new WAL logic that rewrites the entry to a new file rather than rolling back. To implement this we need to handle duplicate entries while replaying the WAL.
{quote}
Therefore, we may have 4 subtasks (see the sketch after this list for the first one):
1. A configurable periodic hsync to make sure our data has been saved to disk. This is also helpful in single-cluster mode.
2. The ReplicationSource should only read WAL entries that have been hsynced, to prevent the slave cluster from having data that the master loses.
3. The WAL reader should handle duplicate entries; in other words, make WAL logging idempotent.
4. Fix the HBase write path so that we retry logging the WAL in a new file rather than rolling back the MemStore.
Thoughts?
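A minimal sketch of subtask 1, a periodic hsync that also tracks the watermark subtask 2 needs. All names here are illustrative, not real HBase classes; only FSDataOutputStream.hsync is the real API:

{code}
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.fs.FSDataOutputStream;

final class PeriodicHsync implements AutoCloseable {
    private final FSDataOutputStream out;
    private final AtomicLong lastWrittenId = new AtomicLong(-1);  // bumped on every append
    private final AtomicLong hsyncedId = new AtomicLong(-1);      // safe watermark for replication
    private final ScheduledExecutorService pool =
        Executors.newSingleThreadScheduledExecutor();

    PeriodicHsync(FSDataOutputStream out, long periodMs) {
        this.out = out;
        pool.scheduleAtFixedRate(this::hsyncOnce, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    void noteAppend(long entryId) {
        lastWrittenId.set(entryId);
    }

    private void hsyncOnce() {
        // Snapshot the id BEFORE hsync: once hsync returns, everything up to
        // this id is on disk, so it is a safe lower bound.
        long id = lastWrittenId.get();
        try {
            out.hsync();
            hsyncedId.set(id);
        } catch (IOException e) {
            // Leave the watermark unchanged; these entries are not known durable.
        }
    }

    /** The ReplicationSource may ship entries up to and including this id. */
    long safeToReplicateUpTo() {
        return hsyncedId.get();
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
{code}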
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15042744#comment-15042744 ] Duo Zhang commented on HBASE-14004:
---
{quote}
Fix the HBase write path so that we retry logging the WAL in a new file rather than rolling back the MemStore.
{quote}
To be clear, this means we will hold the {{WAL.sync}} request if some entries have already been written out but not acked, and never return until we successfully write them out and get the ack back. And if {{WAL.sync}} or {{WAL.write}} fails (maybe due to a full queue), we will still roll back the MemStore, since we can confirm the WAL entries have not been written out. Right?
And I think there is another task for us. Right now DFSOutputStream does not provide a public method to get the acked length. We can open an issue in the HDFS project and use reflection in HBase for now. But there is still a problem: {{hflush}} and {{hsync}} do not return the acked length, which means getting the acked length and calling {{hsync}} are two separate operations, so it is hard to get the exact acked length after calling {{hsync}}. Maybe we could first get the current total written-out bytes (not the acked length) and then call {{hsync}}; the acked length after {{hsync}} returns must be at least this value, so it is safe to use it as the "acked length". Any thoughts? Thanks.
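A sketch of that conservative "acked length" trick. FSDataOutputStream.getPos and hsync are the real APIs; getPos reports bytes handed to the stream by the client, not bytes acked by the pipeline, which is exactly why the snapshot-then-hsync ordering makes it a safe lower bound:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;

final class ConservativeAckedLength {
    static long hsyncAndGetSafeAckedLength(FSDataOutputStream out) throws IOException {
        long writtenBefore = out.getPos();  // bytes written out so far (not yet acked)
        out.hsync();                        // blocks until the pipeline acks the data
        // The true acked length is now >= writtenBefore, so this snapshot is a
        // safe value to publish to readers such as the ReplicationSource.
        return writtenBefore;
    }
}
{code}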
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15042765#comment-15042765 ] Phil Yang commented on HBASE-14004:
---
Is it required to use the size of the serialized binary data? I don't know if there is a sequential, incrementing unique id for each WAL entry. If there is, or if we can add one, we can know the largest id that has been hsynced, right? This id can also help us with replaying duplicate entries.
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043246#comment-15043246 ] Duo Zhang commented on HBASE-14004:
---
Sounds great; an incrementing unique id per WAL entry is better, since it is managed by ourselves.
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043657#comment-15043657 ] stack commented on HBASE-14004:
---
bq. The ReplicationSource should only read WAL entries that have been hsynced, to prevent the slave cluster from having data that the master loses.
This will require a big change in how replication works, but for the better: replication will be less resource-intensive because of fewer NN ops (on a crash, we ask the NN for the file length, not ZK? If so, this is a task we have needed to do for a long time, i.e. undo keeping the replication position in ZK).
bq. The WAL reader should handle duplicate entries; in other words, make WAL logging idempotent.
Might have to add some code to the reader to skip an entry it has seen before (this may be there already; need to check).
bq. Fix the HBase write path so that we retry logging the WAL in a new file rather than rolling back the MemStore.
This is new, but has been done before. I'd be up for helping w/ the WAL changes, stuff like keeping appends around until the sync for them comes in (I've messed w/ this before), and would be interested in helping out on replication log-length accounting, changing it from relying on reopening after it gets an EOF and keeping the length in ZK. You fellas are fixing a few fundamental issues here. Sweet.
bq. we will still roll back the MemStore, since we can confirm the WAL entries have not been written out. Right?
We could try rejiggering the order in which the memstore gets updated, putting it off till after the sync. The order we have now came about a long time ago, when the WAL was very different. We might be able to change the order, simplify the write pipeline, and not lose too much perf (or perhaps gain perf, because we would be doing healthier group commits).
bq. Maybe we could first get the current total written-out bytes (not the acked length) and then call hsync; the acked length after hsync returns must be at least this value, so it is safe to use it as the "acked length".
It would be good if HBase could calculate the written length itself. We could try it. What happens if we want to compress the WAL, and what about the CRC tax? (I suppose the latter would be a constant; for the former, maybe we could still figure out the length, even with compression, whether per edit or per batch.)
bq. I don't know if there is a sequential, incrementing unique id for each WAL entry.
There is such a sequenceid, but it is per-region, not global. Could we keep per-region sequence-id accounts? (We already do this elsewhere.)
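Since the sequenceid is per-region rather than global, any "highest synced id" bookkeeping has to be kept per region. A hypothetical tracker, not the real HBase sequence-id machinery:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class PerRegionSyncedIds {
    // encoded region name -> highest WAL sequence id known to be synced
    private final Map<String, Long> highestSynced = new ConcurrentHashMap<>();

    void advance(String encodedRegionName, long syncedSeqId) {
        highestSynced.merge(encodedRegionName, syncedSeqId, Math::max);
    }

    /** True if this (region, seqId) pair was already covered, i.e. a duplicate on replay. */
    boolean isDuplicate(String encodedRegionName, long seqId) {
        return seqId <= highestSynced.getOrDefault(encodedRegionName, -1L);
    }
}
{code}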
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043664#comment-15043664 ] Duo Zhang commented on HBASE-14004:
---
{quote}
This will require a big change in how replication works, but for the better: replication will be less resource-intensive because of fewer NN ops (on a crash, we ask the NN for the file length, not ZK? If so, this is a task we have needed to do for a long time, i.e. undo keeping the replication position in ZK).
{quote}
I think we should have two branches to determine how many entries we can read: one for a closed WAL file and one for a WAL that is still being written. We can distinguish them using {{DistributedFileSystem.isFileClosed}}. If the file is already closed, we can use the length obtained from HDFS. If the file is still open for writing, we should ask the RS that is writing it for the safe length. If we cannot find that RS (maybe it has already crashed), we can wait a while, since the namenode will eventually recover the lease and close the file.
{quote}
There is such a sequenceid, but it is per-region, not global. Could we keep per-region sequence-id accounts? (We already do this elsewhere.)
{quote}
So maybe we still need to use an "acked length", not an "acked id". But I think that is enough to filter out duplicate WAL entries.
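A sketch of those two branches. DistributedFileSystem.isFileClosed and getFileStatus are real HDFS client APIs; askWriterForSafeLength stands in for the hypothetical RPC to the RS that owns the file:

{code}
import java.io.IOException;
import java.util.function.ToLongFunction;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

final class ReadableWalLength {
    static long readableLength(DistributedFileSystem dfs, Path wal,
                               ToLongFunction<Path> askWriterForSafeLength)
            throws IOException {
        if (dfs.isFileClosed(wal)) {
            // Closed file: the length HDFS reports is final and fully readable.
            return dfs.getFileStatus(wal).getLen();
        }
        // Still being written: only the owning RS knows the safe (acked) length.
        // If that RS is gone, the caller waits for the NN to recover the lease
        // and close the file, then takes the closed-file branch above.
        return askWriterForSafeLength.applyAsLong(wal);
    }
}
{code}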
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043721#comment-15043721 ] Yu Li commented on HBASE-14004:
---
Nice discussion, [~yangzhe1991] and [~Apache9]. FWIW, two questions about Phil's proposal:
1. What would the logic be like if durability is set to ASYNC in the table descriptor? Is the following case possible: 1) an entry is written into the memstore; 2) the region is reassigned to another RS, so the memstore content gets flushed into an HFile; 3) the WAL sync/write fails. In this case we might run into another kind of inconsistency, where the master cluster has the data but the slave doesn't?
2. About making WAL logging idempotent, maybe we also need to consider the cross-RS case, where a region assignment happens before the WAL sync is acked?
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044308#comment-15044308 ] Heng Chen commented on HBASE-14004:
---
{quote}
To be clear, this means we will hold the WAL.sync request if some entries have already been written out but not acked, and never return until we successfully write them out and get the ack back. And if WAL.sync or WAL.write fails (maybe due to a full queue), we will still roll back the MemStore, since we can confirm the WAL entries have not been written out. Right?
{quote}
I have a big concern about this. If we do not hsync on every write (only periodic hsync), there are always some entries that have been hflushed but not hsynced. And as our logic is designed, when an hflush fails we close the old WAL and open a new one, and the entries that were not hsynced are written into the new WAL.
If the RS crashes at this moment, what happens? Does it mean some entries that were already in place (we told the client the mutation succeeded, and the data really was in place on the DNs) may be lost?
I think that would be a regression, because one failed mutation could make more mutations inconsistent. I think this is also [~carp84]'s concern.
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044315#comment-15044315 ] Phil Yang commented on HBASE-14004:
---
{quote}
If the RS crashes at this moment, what happens?
{quote}
If only the RS crashes and the DNs do not, there is no data loss, because the entries were hflushed to the DNs' memory. Only the RS and all the DNs crashing together causes data loss. But hsync on every write increases latency, so users can configure this trade-off.
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044318#comment-15044318 ] Heng Chen commented on HBASE-14004:
---
{quote}
If only the RS crashes and the DNs do not, there is no data loss, because the entries were hflushed to the DNs' memory.
{quote}
If I understand correctly, this is my thought: if one hflush fails, we close the old WAL and write the entries that were not hsynced to the new WAL, right? We close the old WAL at the old acked length, right? The "acked length" is updated by hsync, right? If the RS crashes at this moment, then when the WAL is replayed by another RS, the entries not hsynced in the queue (whatever it is) will be lost, right?
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044319#comment-15044319 ] Duo Zhang commented on HBASE-14004:
---
{quote}
Does it mean some entries that were already in place (we told the client the mutation succeeded, and the data really was in place on the DNs) may be lost?
{quote}
If all 3 DNs crash and the RS also crashes, then yes, you have no choice but to hsync every time to prevent this. And I think this is not related to the scenario you described: we do not return SUCCESS to the client until we get a successful hflush response.
What [~liyu] raised is another problem. Flushing the WAL asynchronously (or not writing the WAL at all) is known to carry the risk of losing data. We need to determine what we can guarantee when users use those configurations.
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044321#comment-15044321 ] Duo Zhang commented on HBASE-14004:
---
{quote}
If one hflush fails, we close the old WAL and write the entries that were not hsynced to the new WAL, right? We close the old WAL at the old acked length, right? The "acked length" is updated by hsync, right? If the RS crashes at this moment, then when the WAL is replayed by another RS, the entries not hsynced in the queue (whatever it is) will be lost, right?
{quote}
No. You can close the old WAL using any length larger than the previously succeeded hflushed length, not the hsynced length.
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044327#comment-15044327 ] Heng Chen commented on HBASE-14004:
---
{quote}
No. You can close the old WAL using any length larger than the previously succeeded hflushed length, not the hsynced length.
{quote}
OK, I get it now. The "acked length" is only used for replication; we need a separate "hflushed length" to close the WAL. So data is only lost when all 3 DNs and the RS crash. Thanks [~Apache9] for the explanation.
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044333#comment-15044333 ] Phil Yang commented on HBASE-14004:
---
I have a new concern about the configuration. HBase currently only uses hflush; if we add hsync logic, will performance degrade? Should we still allow users to disable hsync, just like before? If so, what should the default configuration be? And if a user disables hsync, what should we do in the ReplicationSource?
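For concreteness, the choice being debated could look like a three-level policy. The enum values and the configuration key below are purely illustrative, not real HBase settings:

{code}
import org.apache.hadoop.conf.Configuration;

final class WalSyncConfig {
    enum WalSyncPolicy {
        HFLUSH_ONLY,      // today's behavior: durable to DN memory only
        PERIODIC_HSYNC,   // background hsync; replication ships only hsynced entries
        HSYNC_EVERY_SYNC  // strongest guarantee, highest latency
    }

    // "hbase.wal.sync.policy" is a hypothetical key used for illustration.
    static WalSyncPolicy fromConf(Configuration conf) {
        return WalSyncPolicy.valueOf(conf.get("hbase.wal.sync.policy", "PERIODIC_HSYNC"));
    }
}
{code}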
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044341#comment-15044341 ] Heng Chen commented on HBASE-14004:
---
{quote}
HBase currently only uses hflush; if we add hsync logic, will performance degrade?
{quote}
I think the default configuration should be periodic hsync. On the normal path, because the hsync is asynchronous, IMO performance is OK. But if an hflush fails, the handler will hang while processing recovery; if that takes too long, client retries may exhaust all the handlers soon. That could be another regression relative to the current logic.
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044355#comment-15044355 ] Yu Li commented on HBASE-14004:
---
bq. if we add hsync logic, will performance degrade?
IMHO, for a 100% no-data-loss guarantee we have to sacrifice performance, more or less. This is a trade-off, and users should be able to make their own choice. However, there's no real fsync support in HBase yet, although quite some effort has been paid, like HBASE-5954 ([~lhofhansl] and [~stack], please correct me if I've stated anything wrong here, thanks). Not sure whether I'm off topic, but somehow I feel all these things are related, and indeed we are trying to fix a few fundamental issues here, just like stack mentioned.
bq. Should we still allow users to disable hsync, just like before?
I think yes, we should leave an option here; users might care more about performance and be willing to take the relatively low risk of all 3 DNs crashing, I guess.
bq. If so, what should the default configuration be?
I guess this depends on the final perf numbers of the new design and implementation.
bq. And if a user disables hsync, what should we do in the ReplicationSource?
I think we should fall back to the old logic when the user chooses to.
BTW, there is quite some discussion here already; maybe a doc summarizing the basic design, the conclusions so far, and the open questions would help understanding and further discussion? :-)
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044368#comment-15044368 ] Heng Chen commented on HBASE-14004:
---
{quote}
BTW, there is quite some discussion here already; maybe a doc summarizing the basic design, the conclusions so far, and the open questions would help understanding and further discussion?
{quote}
Good suggestion. We need a design doc. [~yangzhe1991] [~Apache9], do you have time to prepare a doc for this issue? If you don't have time, I can write one.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044380#comment-15044380 ] Phil Yang commented on HBASE-14004: ---
Sure, I'll draft a document summarizing our ideas. [~Apache9] and I can have some more offline discussion so I can post an initial version that satisfies both of us, and then ask for your suggestions :)
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044563#comment-15044563 ] Heng Chen commented on HBASE-14004: ---
Looks good, except for one thing.
{quote}
If we finally get a timeout error, we should open a new file, write/hflush all buffered data to it, and use a background thread to close the old file using any length that is larger than the last successfully hflushed length.
{quote}
IIRC, we must ensure the seqIds of WAL edits are monotonically increasing; otherwise we will fail when replaying WALs. Under this logic, some entries in the new WAL may have smaller seqIds than entries in the old WAL, right?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044576#comment-15044576 ] Phil Yang commented on HBASE-14004: ---
{quote}
Under this logic, some entries in the new WAL may have smaller seqIds than entries in the old WAL, right?
{quote}
Yes, and that is why we should make sure the WAL reader can filter entries it has seen before. Two entries with the same id may not be adjacent; two consecutive WAL files may look like this:
[1, 2, 3, 4, 5, 6]
[4, 5, 6, 7]
if, while writing the first WAL, we could only confirm that entries up to 3 were hflushed.
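To make the [1..6]/[4..7] example concrete, here is a minimal, hypothetical sketch of the dedup idea (this is not the actual HBase WAL reader API, and real seqIds are tracked per region): a replayer remembers the highest sequence id applied so far and skips anything at or below it.
{code}
import java.util.List;

public class DeduplicatingWalReplayer {
  // Hypothetical WAL entry: a sequence id plus an opaque payload.
  static final class Entry {
    final long seqId;
    final byte[] payload;
    Entry(long seqId, byte[] payload) { this.seqId = seqId; this.payload = payload; }
  }

  private long lastAppliedSeqId = -1; // highest seq id applied so far

  /** Replays one WAL file's entries, skipping ids already seen in earlier files. */
  void replay(List<Entry> walFile) {
    for (Entry e : walFile) {
      if (e.seqId <= lastAppliedSeqId) {
        continue; // duplicate rewritten after an hflush timeout; apply only once
      }
      apply(e);
      lastAppliedSeqId = e.seqId;
    }
  }

  void apply(Entry e) {
    // would insert into the memstore (replay) or ship to the peer (replication)
  }
}
{code}
Feeding the two files above through this sketch applies 1..6 from the first file and only 7 from the second.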
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044603#comment-15044603 ] Heng Chen commented on HBASE-14004: ---
It seems the WAL reader is designed for one WAL at a time; it cannot filter out entries from another WAL. Maybe we should filter entries at replay time instead.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044616#comment-15044616 ] He Liangliang commented on HBASE-14004: ---
Sorry for the late reply. What about this approach:
1. Keep the currently synced sequence id in memory and make sure the replicator does not read past this id when replicating.
2. When rolling the WAL, record the final id in the ZK replication queue and also mark the file as rolled in memory, so the local replicator knows this file is finished.
3. When the replication queue fails over, the new replicator checks the recorded sequence id: if it is available, the file was successfully rolled; otherwise, read until EOF. For the latter case, we must make sure the edits after the last successful sync of that log are replayed, to ensure consistency. Recording the last successfully synced sequence id when flushing can guarantee this.
The overhead is insignificant (just a memory barrier for the volatile sequence id passed to the replicator). I guess this may have been the original design, since there are TODO comments in FSHLog.java:
TODO: replication may pick up these last edits though they have been marked as failed append (Need to keep our own file lengths, not rely on HDFS).
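A rough sketch of step 1 under the assumptions above (names are illustrative, not real FSHLog fields): the WAL sync thread publishes the highest acked sequence id through a volatile, and the local replicator never reads past it, so the only cost on the write path is the memory barrier mentioned in the comment.
{code}
public class SyncedSeqIdTracker {
  // Single writer (the WAL sync thread), many readers: volatile is sufficient.
  private volatile long highestSyncedSeqId = -1;

  /** Called by the WAL sync thread once an hflush/hsync has been acked. */
  public void syncCompleted(long seqId) {
    highestSyncedSeqId = seqId;
  }

  /** Called by the local ReplicationSource before shipping an entry. */
  public boolean isSafeToReplicate(long entrySeqId) {
    return entrySeqId <= highestSyncedSeqId;
  }
}
{code}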
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044643#comment-15044643 ] Phil Yang commented on HBASE-14004: ---
{quote}
It seems the WAL reader is designed for one WAL at a time; it cannot filter out entries from another WAL. Maybe we should filter entries at replay time instead.
{quote}
You are right; the logic that ignores edits which are already in HFiles is in HRegion.replayRecoveredEdits, so the logic for filtering already-seen edits should also go there?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044666#comment-15044666 ] Heng Chen commented on HBASE-14004: ---
{quote}
You are right; the logic that ignores edits which are already in HFiles is in HRegion.replayRecoveredEdits, so the logic for filtering already-seen edits should also go there?
{quote}
Yeah, maybe we should store the max seqId of acked-hflushed WAL entries somewhere (ZK or a system table); we could use this information to skip entries in the new WAL.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044685#comment-15044685 ] Phil Yang commented on HBASE-14004: ---
{quote}
I guess this may have been the original design, since there are TODO comments in FSHLog.java:
{quote}
If my understanding is right, these comments suggest we should truncate the WAL file at the position of the last synced edit, so we can avoid replaying or replicating edits that clients have already been told failed. However, even if the RS knows where to truncate, it may crash after reporting failure to clients and before truncating. So I think a better idea is not to truncate at all: once we make WAL logging idempotent, we simply rewrite the edits to the new file and do not report failure to clients.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044702#comment-15044702 ] He Liangliang commented on HBASE-14004: ---
We don't need to guarantee "avoid replaying or replicating edits that clients have already been told failed". In other words, the client needn't distinguish between a timeout and a sync error, right? In fact, the most probable case looks like this:
1. DNs have problems
2. HBase client times out
3. HDFS client times out and sync fails
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044835#comment-15044835 ] Phil Yang commented on HBASE-14004: ---
{quote}
In other words, the client needn't distinguish between a timeout and a sync error, right?
{quote}
For an HBase client with retry logic, timeouts and other errors both result in a retry, so perhaps it is not essential to guarantee that a non-timeout error means the data will never appear in the database, although I think such a guarantee is necessary for a reliable database. Different users may have different requirements; what do other fellas think on this question?
And I think there is not a big difference between your approach and part of my design; the difference is only in implementation. If we need the guarantee above, i.e. we must differentiate timeout errors from non-timeout errors, we can either save to ZK and then write to the new file (your idea), or just write to the new file and skip duplicates when replaying (my idea), before acking success to the client. My idea may be faster because it needn't send a request to ZK. If we don't need the guarantee, we can roll back the memstore, ack failure to the client and do the remaining work asynchronously, because we are no longer afraid of the RS crashing at that point.
And I think the major difference between our ideas is that you don't change the WAL sync logic. However, the current logic may not be perfect: hflush only writes data to the memory of three DNs, which is not the real persistence users assume. If the RS and all three DNs go down, or the whole cluster goes down because of some serious issue, data that has not been synced to the DNs' disks will be lost. This issue not only causes inconsistency between two clusters; it also misleads users, because we don't actually save their data on disk. I think that may not be good :(
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044970#comment-15044970 ] Phil Yang commented on HBASE-14004: ---
Furthermore, I found another issue: since HDFS-744 added hsync() support, it also added CreateFlag.SYNC_BLOCK to FileSystem.create(), which "force[s] closed blocks to disk"; it makes the client send a syncBlock flag in the last DFSPacket of every endBlock(). If we don't set this flag, as is currently the case in HBase, the files we save on HDFS are not synced to disk immediately. So we risk losing data right after flushing a MemStore into an HFile or compacting some HFiles, because we think those data have been persisted and may delete the WAL or the old HFiles, right?
If I am not wrong, we can open a new issue to discuss this, since it is independent.
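For reference, a minimal sketch of what setting the flag looks like with the stock Hadoop FileSystem API (the path and sizes are made up; this is not how HBase creates its WAL today):
{code}
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class SyncBlockExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/tmp/example-wal"); // made-up path
    // SYNC_BLOCK asks the DNs to fsync each block to disk when the block is
    // closed, instead of leaving it in the OS page cache.
    FSDataOutputStream out = fs.create(path,
        FsPermission.getFileDefault(),
        EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE, CreateFlag.SYNC_BLOCK),
        4096,                          // buffer size
        (short) 3,                     // replication
        fs.getDefaultBlockSize(path),  // block size
        null);                         // no progress callback
    out.write("edit".getBytes("UTF-8"));
    out.hflush(); // still only guarantees the data is in DN memory
    out.close();  // with SYNC_BLOCK, closing forces the last block to disk
  }
}
{code}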
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046226#comment-15046226 ] Yu Li commented on HBASE-14004: ---
bq. My idea may be faster because it needn't send a request to ZK
Considering the failover case (like an RS crash), I guess we need to persist the acked length somewhere like ZK, or else we will still replicate non-acked data to the slave cluster during recovery?
bq. we risk losing data right after flushing a MemStore into an HFile
I believe HBASE-5954 tried to resolve the same problem, and I would suggest paying a visit there. :-)
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046286#comment-15046286 ] Phil Yang commented on HBASE-14004: ---
{quote}
I guess we need to persist the acked length somewhere like ZK, or else we will still replicate non-acked data to the slave cluster during recovery?
{quote}
If we persist the acked length to ZK and the RS crashes before saving it, what happens then? There is always a window where we have done nothing yet after a timeout on hflush/hsync, or where we crash before even getting the ack. For hflush, if we hold the request, the client gets no response, which means after restarting we can make the request either succeed or fail. For hsync, the RS has crashed and will replay the log after restarting, but we cannot be sure whether data not acked by hsync is on the DNs' disks or only in their memory, so the ReplicationSource may have to wait until the RS restarts, because we cannot be sure the following visible data will be hsynced to disk. If the ReplicationSource reads anyway and then all three DNs crash, the slave will have more data than the master...
And if we add a CreateFlag.SYNC_BLOCK flag when creating the WAL file, we can be sure that a closed file is on disk, so the ReplicationSource can wait for the NameNode to close the file automatically if the RS doesn't recover, and then read the whole file, right?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046292#comment-15046292 ] Heng Chen commented on HBASE-14004: ---
{quote}
I guess we need to persist the acked length somewhere like ZK, or else we will still replicate non-acked data to the slave cluster during recovery?
{quote}
There is no need. The replication queue stores in ZK the current position of the WAL up to which it has been replicated. If the replicator only reads hsynced entries, it will not replicate non-hsynced data to the slave during recovery.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046371#comment-15046371 ] Yu Li commented on HBASE-14004: ---
It seems to me the replication queue only records where to *start* reading, not where to *end*, and the acked length is exactly what records where to end? If we don't persist this acked length, will we still read until EOF during failover?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046385#comment-15046385 ] Yu Li commented on HBASE-14004: ---
Assume this situation:
1. The RS receives an ack from the DFSClient and records it in memory, not on ZK
2. The ReplicationSource reads from the last synced length, but the RS crashes before it reaches the new acked length
3. Replication queue recovery starts on another RS, which has nowhere to get the acked length of the crashed RS
In this case, where does step 3 end? If it reads until EOF, the inconsistency issue we are trying to resolve here happens again, right?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046395#comment-15046395 ] Duo Zhang commented on HBASE-14004: ---
{quote}
If we don't persist this acked length, will we still read until EOF during failover?
{quote}
It depends on the file state. If the file is already closed, we can use the length obtained from the NameNode. If it is still open for writing, we should ask the writing RS for the acked length. You can always get the acked length unless the RS has crashed, and if the RS has crashed, the file will eventually be closed by the NN.
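The decision procedure described here could be summarized in a sketch like the following. Everything in it is hypothetical, in particular the RPC asking a live RS for its acked length, which does not exist yet:
{code}
public class SafeReadLength {
  /** How far a recovered or remote reader may safely read a WAL file. */
  long safeLength(WalFileStatus status) {
    if (status.isClosed()) {
      // Closed file: the length the NameNode reports is final.
      return status.lengthFromNameNode();
    }
    if (status.writerAlive()) {
      // Open file, writer alive: ask the writing RS for the acked length.
      return status.askWriterForAckedLength(); // hypothetical RPC
    }
    // Open file, writer dead: wait for lease recovery to close it, then retry.
    throw new IllegalStateException("wait for the NN to close the file");
  }

  interface WalFileStatus {
    boolean isClosed();
    boolean writerAlive();
    long lengthFromNameNode();
    long askWriterForAckedLength();
  }
}
{code}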
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046398#comment-15046398 ] Duo Zhang commented on HBASE-14004: ---
{quote}
3. Replication queue recovery starts on another RS, which has nowhere to get the acked length of the crashed RS
In this case, where does step 3 end? If it reads until EOF, the inconsistency issue we are trying to resolve here happens again, right?
{quote}
If the region is assigned to another RS, the new RS needs to replay the WAL to reconstruct the memstore, and at that point even unacked WAL entries that appear in the WAL file will be replayed, right?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046401#comment-15046401 ] Yu Li commented on HBASE-14004: ---
bq. the file will eventually be closed by the NN
I don't think *eventually be closed* is enough, since replication queue recovery starts as soon as the Master detects the RS crash and issues the recovery task. I think we need to handle the case where the file is not yet closed but recovery has already started, right?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046405#comment-15046405 ] Duo Zhang commented on HBASE-14004: ---
{quote}
I don't think *eventually be closed* is enough, since replication queue recovery starts as soon as the Master detects the RS crash and issues the recovery task.
{quote}
HBase uses the NN to do fencing, which means the HMaster only considers an RS dead once all of its WAL files are closed.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046414#comment-15046414 ] Heng Chen commented on HBASE-14004: ---
Why do we need 'where to end' when recovering replication? We just need to resume replicating entries from where replication was interrupted when the RS failed, right?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046422#comment-15046422 ] Duo Zhang commented on HBASE-14004: ---
I don't think we gain much from storing the acked length in ZooKeeper. If we store it periodically, we can still hit the problem that we flush things out and then crash before the new acked length reaches ZK. If we store it synchronously on every hflush, I think ZooKeeper will be overwhelmed in a large cluster...
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046439#comment-15046439 ] Yu Li commented on HBASE-14004: ---
bq. HBase uses the NN to do fencing, which means the HMaster only considers an RS dead once all of its WAL files are closed.
Oh yes, the recoverLease logic makes sure of this... OK, now I agree that we don't need to persist the acked length on ZK; during failover the existing logic ensures consistency. And the to-be-implemented WAL idempotence will make sure that "WAL replay of unacked entries" and "client retry after RS crash" won't conflict. Thanks for the explanation [~Apache9]!
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046475#comment-15046475 ] Yu Li commented on HBASE-14004: ---
You are right, Heng, and Duo also explained this. Thanks. :-)
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046584#comment-15046584 ] Heng Chen commented on HBASE-14004: ---
I made a patch to skip duplicate entries on replay and found something new that we hadn't considered. Let's open a new issue for it and discuss there: HBASE-14949
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046656#comment-15046656 ] stack commented on HBASE-14004: ---
Doc is great. I added a few little notes...
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047929#comment-15047929 ] Heng Chen commented on HBASE-14004: ---
{quote}
Have to be careful how we do this. We have elaborate accounting now that allows only one 'sync' type, either a hflush or a hsync, but not a mix of both.
{quote}
After reading the notes in the doc, I begin to agree with stack. Why do we need hsync? The concern with hflush is that we may lose data when three DNs and the RS crash at the same time, right? That is a really small probability. But if we introduce hsync (for example, hsync periodically), it will add latency between master and slave. Is it worth it?
The replication inconsistency in this issue can be fixed if the replicator only reads acked-hflushed entries, just as we do in the recovery process when hflush fails, right?
And per our design we would only use hsync to ensure replication consistency, while data loss could still happen because we do NOT use hsync in the write path. If so, why not just use hflush?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047976#comment-15047976 ] Phil Yang commented on HBASE-14004: ---
{quote}
And per our design we would only use hsync to ensure replication consistency, while data loss could still happen because we do NOT use hsync in the write path. If so, why not just use hflush?
{quote}
This is why I think hsync should be configurable. For a database, we should mostly guarantee data persistence, but sometimes we sacrifice it for higher performance. For example, Redis's AOF can be configured to fsync on every write, every second, or never, and users configure it according to their requirements. "Every second" can still lose data after a crash, but users are guaranteed to lose at most one second of data, which is still a valuable guarantee. Currently HBase offers no guarantee of this kind: users may think their data has already been saved on disk, while we have no idea when it will actually be. Both the WAL and the HFiles have this issue, and obviously it results in data inconsistency between two clusters, too.
I will not object if we only fix (or fix first) the issue of shipping data that has been rolled back from the MemStore. We can use the acked size from hflush, which means there will be no additional latency between the two clusters. But I think there should be follow-up work on data persistence, and we need a configurable hsync in HBase :)
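To illustrate the kind of knob being proposed (purely illustrative; HBase has no such setting today, and the names and config values are made up), a Redis-style policy might look like:
{code}
/** Illustrative durability knob in the spirit of Redis's appendfsync. */
public enum WalSyncPolicy {
  HSYNC_EVERY_EDIT, // hsync on every sync call: safest, slowest
  HSYNC_PERIODIC,   // hflush per edit, hsync on a timer: bounded loss window
  HFLUSH_ONLY;      // current behaviour: data only reaches DN memory

  public static WalSyncPolicy fromConfig(String value) {
    switch (value) {
      case "hsync":    return HSYNC_EVERY_EDIT;
      case "periodic": return HSYNC_PERIODIC;
      default:         return HFLUSH_ONLY;
    }
  }
}
{code}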
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047986#comment-15047986 ] Heng Chen commented on HBASE-14004: ---
{quote}
But I think there should be follow-up work on data persistence, and we need a configurable hsync in HBase
{quote}
It is really a problem, but I think it should be done in another issue; maybe you can send an email to the dev list. In this issue, I suggest we just ensure correctness with hflush, wdyt?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048010#comment-15048010 ] stack commented on HBASE-14004: ---
This is the issue where the client can say what durability to apply per edit. Lars did a load of work on this, and the issue is filled with good stuff. In the end, what came out was how hard it is to let the client choose hflush/hsync or no sync in the current setup.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048137#comment-15048137 ] Phil Yang commented on HBASE-14004: --- {quote} In this issue, I suppose we just ensure correctness with hflush, wdyt? {quote} Agree :), we can focus on the inconsistency between the WAL and the Memstore, which also results in inconsistency between master and slave. And if we don't use hsync, I think we need not change the logic of the replicator, which means we needn't transfer only data that has been hflushed, because all entries in the WAL should finally be in the memstore, right? So we may have only two subtasks now:
1. The WAL reader can handle duplicate entries; in other words, make WAL logging idempotent.
2. The WAL logger does not throw an exception if it cannot make sure whether an entry is saved on HDFS or not (for example, on hflush timeout); instead it retries writing the entry to a new file and closes the old file asynchronously (see the sketch below).
I changed the description of the second subtask because we needn't care about the logic of HRegion, which is now "write memstore -> write wal -> rollback if wal fails" and may be changed to "write wal -> write memstore if wal succeeds".
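A minimal sketch of subtask 2 follows. The fields and helpers used here ({{ackedLength}}, {{unackedEntries}}, {{currentWriter}}, {{closeAsync}}, {{rollWriter}}) are hypothetical stand-ins for the real WAL internals, not actual HBase code:
{code}
// Hypothetical sketch only: on an uncertain sync (e.g. hflush timeout),
// cap the old file at the last acked length and re-append the unacked
// entries to a fresh file.
void onSyncUncertain() throws IOException {
  // Everything up to the last length acked by all replicas is known to
  // be durable; bytes past it may or may not have been persisted.
  long safeLength = ackedLength;
  // Close the old file asynchronously, treating safeLength as its
  // logical end; readers (e.g. replication) must not read past it.
  closeAsync(currentWriter, safeLength);
  // Re-append every entry whose sync was never acked to a new file, so
  // each entry appears exactly once in the readable part of some WAL.
  currentWriter = rollWriter();
  for (WAL.Entry entry : unackedEntries) {
    currentWriter.append(entry);
  }
  currentWriter.sync();
}
{code}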
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048171#comment-15048171 ] Heng Chen commented on HBASE-14004: --- {quote} And if we don't use hsync, I think we need not change the logic of the replicator, which means we needn't transfer only data that has been hflushed, because all entries in the WAL should finally be in the memstore, right? {quote} Think about this situation: in the write path, some entries are hflushed but not acked, so we close the old WAL at the acked length and try to write these entries into a new WAL, then the RS crashes. The slave may have already replicated these entries, but the master RS after recovery will have lost them, right?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048552#comment-15048552 ] Phil Yang commented on HBASE-14004: --- You are right, we should change the logic of the replicator. And I am not an expert, so I have a question about the idempotency of HBase operations: what will happen if we replay an entry more than once? Consider these scenarios, where the numbers are the seq ids:
1, 2, 3, 4, 5 --- this is the normal order
1, 3, 2, 4, 5 --- the order is wrong but we read each log only once
1, 1, 2, 3, 4, 5 --- we replay one entry twice but they are contiguous
1, 2, 3, 1, 4, 5 --- we replay one entry twice and they are not contiguous
1, 2, 3, 1, 2, 3, 4, 5 --- the order is wrong but the repeated subsequence preserves the order
Are they all wrong except the first? It seems that the last one is not wrong?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049009#comment-15049009 ] stack commented on HBASE-14004: --- Not sure I understand the question. Replication reads the WAL in order and sends all it has read, in order, to the remote cluster. When it gets to the remote side, all should be 'applied' in order. How do we get your scenarios #2, #3, etc., above?
bq. In the write path, some entries are hflushed but not acked, so we close the old WAL at the acked length and try to write these entries into a new WAL, then the RS crashes. The slave may have already replicated these entries, but the master RS after recovery will have lost them, right?
Isn't this essentially the description that leads off this JIRA?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049048#comment-15049048 ] Phil Yang commented on HBASE-14004: --- The reason I asked this question is that if we don't have to make sure we replay the WAL in order and only once for each entry, it may be easier to resolve HBASE-14949, which is being fixed by Heng.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049947#comment-15049947 ] Heng Chen commented on HBASE-14004: --- {quote}
1, 2, 3, 4, 5 --- this is the normal order
1, 3, 2, 4, 5 --- the order is wrong but we read each log only once
1, 1, 2, 3, 4, 5 --- we replay one entry twice but they are contiguous
1, 2, 3, 1, 4, 5 --- we replay one entry twice and they are not contiguous
1, 2, 3, 1, 2, 3, 4, 5 --- the order is wrong but the repeated subsequence preserves the order
{quote}
[~yangzhe1991] only the first one is right in the current logic. The WAL is RS-level, but replay is region-level: in the WAL the seqIds increase one by one, but we can't ensure that at the region level on replay. That's why I mentioned I made a mistake in HBASE-14949; I will dig deeper. Would you mind updating the doc as we discussed above, Zhe?
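To make the region-level point concrete, here is an illustrative per-region duplicate filter (not actual HBase code). It keeps replay idempotent only while each region's entries arrive in non-decreasing seqId order, which is exactly why the out-of-order scenarios above are problematic:
{code}
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.hbase.util.Bytes;

public class SeqIdFilter {
  private final Map<byte[], Long> lastAppliedSeqId =
      new TreeMap<>(Bytes.BYTES_COMPARATOR);

  // Returns true if the entry should be applied. In an order like
  // 1, 3, 2 the entry with seqId 2 is wrongly dropped as a duplicate
  // even though it was never applied.
  public boolean shouldApply(byte[] encodedRegionName, long seqId) {
    Long last = lastAppliedSeqId.get(encodedRegionName);
    if (last != null && seqId <= last) {
      return false; // treated as a duplicate, skipped
    }
    lastAppliedSeqId.put(encodedRegionName, seqId);
    return true;
  }
}
{code}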
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050165#comment-15050165 ] Phil Yang commented on HBASE-14004: --- I have revised the doc (https://docs.google.com/document/d/1tTwXrip18qxxsSiPu_y4fGB-26SxqwMvZnFREswyQow/edit?usp=sharing), removing the logic about hsync. Previously the doc quoted Duo's comment: {quote} close old file using any length that larger than the previous succeeded hflushed length {quote} I changed this to closing the file at the previously acked hflushed length, which is clearer for implementation. And in the section "Some issues still need discussion" I quote [~carp84]'s comment: {quote} We also need to consider the cross-RS case when region assign happens before wal sync acked. {quote} Yu, is it still a problem now that we are not using hsync?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050308#comment-15050308 ] Phil Yang commented on HBASE-14004: --- [~chenheng] it seems that HBASE-14949 only handles replaying logs before region open and doesn't work with Distributed Log Replay? If I am right, would you please put our logic into WALSplitter so we can handle both cases?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050322#comment-15050322 ] Yu Li commented on HBASE-14004: --- bq. Yu, is it still a problem now that we are not using hsync?
My concern was mainly about whether it's possible to have duplicated entries in the WALs of *different* RSes. Think about the failover case: the replication queue will be transferred to some other RS and the entries of the failed RS's WAL will be replicated; meanwhile the same WAL will be split and replayed, and its entries written into the WAL of the new RS serving the same region. In this situation we add an {{isReplay}} flag in WALEdit to avoid duplicated replication (see the code segment below from {{FSHLog#append}}):
{code}
if (entry.getEdit().isReplay()) {
  // Set replication scope null so that this won't be replicated
  entry.getKey().setScopes(null);
}
{code}
I could see a similar situation here:
# hflush times out due to a network failure but the data is actually persisted
# the WAL logger tries to re-write the buffered entries to a new WAL, but the new WAL creation fails due to the same network failure, and failure is returned to the client
# the region gets reassigned due to balancing or an hbase shell command
# the client retries the write to the same region, now served by a new RS, and succeeds
In this case we have duplicated entries in the WALs of different RSes. Feel free to correct me if the assumption is wrong, but if it's possible, then we need to handle it in HBASE-14949 [~chenheng]
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050333#comment-15050333 ] Heng Chen commented on HBASE-14004: --- That is my plan.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050376#comment-15050376 ] Heng Chen commented on HBASE-14004: --- {quote}
2. the WAL logger tries to re-write the buffered entries to a new WAL, but the new WAL creation fails due to the same network failure, and failure is returned to the client
3. the region gets reassigned due to balancing or an hbase shell command
{quote}
We create the new WAL and write the buffered entries into it; if this procedure fails, we tell the client that the 'unacked hflushed' mutations failed. And when the region is reassigned (NOT an RS crash) at that point, the memstore will be flushed into an HFile, right? In the current logic, when we split the WAL we check lastFlushedSeqId, so duplicate entries will be skipped. See the HBASE-14949 comments. [~carp84]
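A simplified sketch of the lastFlushedSeqId check described here; the real logic lives in WALSplitter, and the types and names below are trimmed down for illustration (the map is assumed to be keyed with a byte[] comparator):
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.hbase.wal.WAL;

public class FlushedEntryFilter {
  public static List<WAL.Entry> filterFlushed(List<WAL.Entry> entries,
      Map<byte[], Long> lastFlushedSeqIds) {
    List<WAL.Entry> toReplay = new ArrayList<>();
    for (WAL.Entry entry : entries) {
      Long flushed =
          lastFlushedSeqIds.get(entry.getKey().getEncodedRegionName());
      // Entries at or below the region's last flushed sequence id are
      // already persisted in an HFile; dropping them avoids duplicates.
      if (flushed != null && entry.getKey().getSequenceId() <= flushed) {
        continue;
      }
      toReplay.add(entry);
    }
    return toReplay;
  }
}
{code}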
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050409#comment-15050409 ] Yu Li commented on HBASE-14004: --- Let me confirm: do you mean we never roll back the memstore even if the buffered-entries rewrite failed, so the data will be included in the flushed HFile? I think the current logic only makes sure already-flushed entries won't be replayed; it does not skip duplicated entries. If the memstore is rolled back and the duplicated entry is the last write, I don't think lastFlushedSeqId could skip it.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050418#comment-15050418 ] Heng Chen commented on HBASE-14004: --- {quote} Let me confirm: do you mean we never roll back the memstore even if the buffered-entries rewrite failed, so the data will be included in the flushed HFile? {quote} Maybe I am wrong; yeah, we do NOT roll back the memstore, but the mvcc will not move forward either. Let's check that too..
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050438#comment-15050438 ] Heng Chen commented on HBASE-14004: --- IMO we should handle this situation: if we fail during the recovery procedure, maybe the network or HDFS has problems. Should we close the RS directly?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050467#comment-15050467 ] Phil Yang commented on HBASE-14004: --- We cannot roll back even if the buffered-entries rewrite failed, or we may still have inconsistency between the memstore and the WAL. So we can only retry and retry... until there are too many retries and we must crash ourselves?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161489#comment-16161489 ] Hadoop QA commented on HBASE-14004: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 41s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 52s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 4s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 38s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 56s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 38m 32s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 5s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 27s{color} | {color:red} hbase-server generated 16 new + 0 unchanged - 0 fixed = 16 total (was 0) {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s{color} | {color:green} hbase-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 96m 19s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}159m 16s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:5d60123 | | JIRA Issue | HBASE-14004 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12886413/HBASE-14004.patch | | Optional Tests | asflicense shadedjars javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile | | uname | Linux 2683ddeb981c 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master /
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162335#comment-16162335 ] Duo Zhang commented on HBASE-14004: --- Good, no test failures. Let me fix the javadoc issues and upload the patch to RB.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162439#comment-16162439 ] Duo Zhang commented on HBASE-14004: --- [~stack] [~anoopsamjohn] PTAL. Thanks.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162524#comment-16162524 ] Hadoop QA commented on HBASE-14004: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 36s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 45s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 33s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 59s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 32s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 38s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 45m 17s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 34s{color} | {color:green} hbase-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 93m 41s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}169m 27s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:5d60123 | | JIRA Issue | HBASE-14004 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12886568/HBASE-14004-v1.patch | | Optional Tests | asflicense shadedjars javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile | | uname | Linux bb785c0a15ab 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh | | git revision | master / d6db4a2 | | Default Java | 1.8.0_144 | | fi
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164280#comment-16164280 ] Duo Zhang commented on HBASE-14004: --- Ping [~stack] [~anoopsamjohn] [~ram_krish], this feature is necessary if we want to set AsyncFSWAL as our default WAL. Thanks.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164323#comment-16164323 ] ramkrishna.s.vasudevan commented on HBASE-14004: Will take a look at this @duo zhang.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165819#comment-16165819 ] Duo Zhang commented on HBASE-14004: --- Thanks [~ram_krish] for reviewing. Will commit later.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165857#comment-16165857 ] stack commented on HBASE-14004: --- [~Apache9] Sorry. I reviewed it a few days ago but forgot to publish the review. I just did.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167437#comment-16167437 ] Hadoop QA commented on HBASE-14004: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 2m 47s{color} | {color:red} HBASE-14004 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.4.0/precommit-patchnames for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HBASE-14004 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12887272/HBASE-14004-v2.patch | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/8641/console | | Powered by | Apache Yetus 0.4.0 http://yetus.apache.org | This message was automatically generated. > [Replication] Inconsistency between Memstore and WAL may result in data in > remote cluster that is not in the origin > --- > > Key: HBASE-14004 > URL: https://issues.apache.org/jira/browse/HBASE-14004 > Project: HBase > Issue Type: Bug > Components: regionserver, Replication >Reporter: He Liangliang >Assignee: Duo Zhang >Priority: Critical > Labels: replication, wal > Fix For: 3.0.0, 2.0.0-alpha-4 > > Attachments: HBASE-14004.patch, HBASE-14004-v1.patch, > HBASE-14004-v2.patch, HBASE-14004-v2.patch > > > Looks like the current write path can cause inconsistency between > memstore/hfile and WAL which cause the slave cluster has more data than the > master cluster. > The simplified write path looks like: > 1. insert record into Memstore > 2. write record to WAL > 3. sync WAL > 4. rollback Memstore if 3 fails > It's possible that the HDFS sync RPC call fails, but the data is already > (may partially) transported to the DNs which finally get persisted. As a > result, the handler will rollback the Memstore and the later flushed HFile > will also skip this record. > == > This is a long lived issue. The above problem is solved by write path > reorder, as now we will sync wal first before modifying memstore. But the > problem may still exists as replication thread may read the new data before > we return from hflush. See this document for more details: > https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit# > So we need to keep a sync length in WAL and tell replication wal reader this > is limit when you read this wal file. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
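The updated description above boils down to exposing a safe read length for a WAL file that is still being written. Here is a minimal sketch of such an interface, modeled on the {{WALFileLengthProvider}} the patch adds (treat the exact signature as illustrative):
{code}
import java.util.OptionalLong;
import org.apache.hadoop.fs.Path;

@FunctionalInterface
public interface WALFileLengthProvider {
  // Returns the highest sync-acked offset if the given WAL file is
  // currently being written by this region server; empty means the file
  // is closed and can safely be read through to EOF.
  OptionalLong getLogFileSizeIfBeingWritten(Path path);
}
{code}
The replication WAL reader then caps its reads at the returned offset instead of trusting the visible file length.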
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167720#comment-16167720 ] Hadoop QA commented on HBASE-14004: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 35s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 23s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 58s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 41s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 13s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 38m 28s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s{color} | {color:green} hbase-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}113m 52s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}180m 37s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:5d60123 | | JIRA Issue | HBASE-14004 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12887282/HBASE-14004-v3.patch | | Optional Tests | asflicense shadedjars javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile | | uname | Linux dfd14dafca5b 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 5c07dba | | Default Java | 1.8.0_144
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167727#comment-16167727 ] Duo Zhang commented on HBASE-14004: --- OK, all green. Let me commit.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167998#comment-16167998 ] Hudson commented on HBASE-14004: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #3719 (See [https://builds.apache.org/job/HBase-Trunk_matrix/3719/])
HBASE-14004 [Replication] Inconsistency between Memstore and WAL may (zhangduo: rev 4341c3f554cf85e73d3bb536bdda33a83f463f16)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AsyncFSWAL.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/DisabledWALProvider.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestWALEntryStream.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReplicationService.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/ReplicationSourceDummy.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/AbstractFSWALProvider.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/util/Threads.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractFSWAL.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALFactory.java
* (add) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALFileLengthProvider.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALProvider.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/RegionGroupingProvider.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceInterface.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WAL.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/RecoveredReplicationSource.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/wal/IOTestProvider.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
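Judging from the file list, the new hook is the WALFileLengthProvider added above. A sketch of the shape such a provider could take; the method name and exact signature here are assumptions, not quoted from the patch:

{code:java}
import java.util.OptionalLong;
import org.apache.hadoop.fs.Path;

// Shape suggested by the new WALFileLengthProvider.java above; the method
// name and signature are an assumption, not quoted from the patch.
@FunctionalInterface
interface WALFileLengthProvider {

  // If the given WAL file is still being written by this region server,
  // return the length that is safe to read; empty means the file is
  // already closed and can be read to EOF.
  OptionalLong getLogFileSizeIfBeingWritten(Path path);
}
{code}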
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16168034#comment-16168034 ] Hudson commented on HBASE-14004: FAILURE: Integrated in Jenkins build HBase-2.0 #518 (See [https://builds.apache.org/job/HBase-2.0/518/])
HBASE-14004 [Replication] Inconsistency between Memstore and WAL may (zhangduo: rev d90f77ab7dc46a19893786b6a740bc73470fd779)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALProvider.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestWALEntryStream.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/wal/IOTestProvider.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WAL.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceInterface.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/DisabledWALProvider.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/AbstractFSWALProvider.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALFactory.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java
* (add) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALFileLengthProvider.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/ReplicationSourceDummy.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractFSWAL.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/util/Threads.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReplicationService.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AsyncFSWAL.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/RegionGroupingProvider.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/RecoveredReplicationSource.java
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16168299#comment-16168299 ] stack commented on HBASE-14004: --- This is a great fix.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169589#comment-16169589 ] Sean Busbey commented on HBASE-14004: - Is this needed in branch-1? The doc mentions that the non-async WAL still has the potential for the same failure.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169643#comment-16169643 ] Duo Zhang commented on HBASE-14004: --- The problem for FSHLog is more complicated. One thing is that syncing the WAL is also an IPC, so a timeout does not mean failure; the outcome is simply unknown. But when a sync fails, we just fail the write request, and the upper layer treats it as a failure, which means the memstore will not be modified. I think this is much easier to hit than the scenario described in the doc above, so this patch is nice to have for branch-1, but there is still a lot of work to do for FSHLog... Thanks.
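On the reader side, the fix amounts to capping how far the replication stream may read in a WAL file that is still open. A hypothetical sketch, reusing the provider shape sketched earlier (names are illustrative, not the actual WALEntryStream code):

{code:java}
import java.util.OptionalLong;
import org.apache.hadoop.fs.Path;

// Hypothetical reader-side check; names are illustrative.
final class SafeLengthCheck {

  // True if the entry starting at 'position' may be read now.
  static boolean canRead(WALFileLengthProvider provider, Path wal, long position) {
    OptionalLong beingWritten = provider.getLogFileSizeIfBeingWritten(wal);
    if (!beingWritten.isPresent()) {
      return true; // File is closed, so reading to EOF is safe.
    }
    // Bytes at or beyond the reported length may vanish if a sync fails;
    // the reader should stop here and retry after the length advances.
    return position < beingWritten.getAsLong();
  }
}
{code}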
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170374#comment-16170374 ] Ted Yu commented on HBASE-14004: TestReplicationSmallTests fails consistently on master. If I switch to commit f7a986cb67b55e36b58bf4b4934a2f32f29f538a, the test passes. Duo: Can you check?
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170982#comment-16170982 ] Duo Zhang commented on HBASE-14004: --- OK, will check later.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184854#comment-16184854 ] stack commented on HBASE-14004: --- [~Apache9] Any chance of a release note here please, boss? This is a great fix. It would be a shame for it to go under the radar. The release note will help folks realize what has happened in here.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185257#comment-16185257 ] Duo Zhang commented on HBASE-14004: --- Will do. Thanks for the reminder, sir.
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185726#comment-16185726 ] Hudson commented on HBASE-14004: FAILURE: Integrated in Jenkins build HBase-2.0 #597 (See [https://builds.apache.org/job/HBase-2.0/597/])
HBASE-18845 TestReplicationSmallTests fails after HBASE-14004 (zhangduo: rev 2e4c1b62884026ba8fc2d743d33a7f9d9125393e)
* (edit) hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSmallTests.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationBase.java
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185802#comment-16185802 ] Hudson commented on HBASE-14004: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #3798 (See [https://builds.apache.org/job/HBase-Trunk_matrix/3798/])
HBASE-18845 TestReplicationSmallTests fails after HBASE-14004 (zhangduo: rev 239e6872674ff122ecec2d8d6a557b269e6ae54b)
* (edit) hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSmallTests.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationBase.java
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195103#comment-16195103 ] Hudson commented on HBASE-14004: Results for branch HBASE-18467, done in 4 hr 24 min and counting [build #136 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18467/136/]: FAILURE details (if available):
-1 overall. Committer, please check your recent inclusion of a patch for this issue.
-1 general checks -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18467/136//General_Nightly_Build_Report/]
+1 jdk8 checks -- For more information [see jdk8 report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18467/136//JDK8_Nightly_Build_Report/]
-1 source release artifact -- See build output for details.