[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2016-02-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149497#comment-15149497
 ] 

stack commented on HBASE-14004:
---

bq. We could try rejiggering the order in which memstore gets updated, putting 
it off till after the sync. 

Since "HBASE-15158 Change order in which we do write pipeline operations; do 
all under row locks", we don't do memstore rollbacks. FYI.



> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data has already 
> (perhaps partially) been transported to the DNs and eventually gets persisted. 
> As a result, the handler will roll back the Memstore, and the later flushed 
> HFile will also skip this record.
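A minimal sketch of the write/rollback ordering described above (the interfaces 
and method names are hypothetical, for illustration only; this is not the actual 
HRegion/FSHLog code):

{code:java}
import java.io.IOException;

/** Hypothetical sketch of the write path in the description above. */
public class WritePathSketch {
  interface MemStore { long insert(byte[] record); void rollback(long seqId); }
  interface Wal {
    void append(byte[] record, long seqId) throws IOException;
    void sync(long seqId) throws IOException;
  }

  void write(byte[] record, MemStore memStore, Wal wal) throws IOException {
    long seqId = memStore.insert(record);   // 1. insert record into Memstore
    wal.append(record, seqId);              // 2. write record to WAL
    try {
      wal.sync(seqId);                      // 3. sync WAL (hflush to the DNs)
    } catch (IOException e) {
      // 4. rollback Memstore if the sync fails. The problem: the sync RPC may fail
      // on the client side even though the bytes reached the DNs and get persisted,
      // so the WAL (and any cluster replicating it) keeps an edit the origin dropped.
      memStore.rollback(seqId);
      throw e;
    }
  }
}
{code}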





[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2016-02-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149552#comment-15149552
 ] 

stack commented on HBASE-14004:
---

But, as noted elsewhere, this fact does not solve this issue (what made it into 
the stream...).



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2016-11-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15625326#comment-15625326
 ] 

Duo Zhang commented on HBASE-14004:
---

I think we need to pick this up.

With AsyncFSWAL, it is not safe to use DFSInputStream to read a WAL file 
directly to EOF while the file is still open; the data we read may disappear 
later. FSHLog also has this problem, but it is much safer... See this document 
for more details:

https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#

The problem only happens while the WAL file is still open. AFAIK, if an RS is 
alive, its WAL is always replicated by the RS itself. So I think we could expose 
an API that tells the ReplicationSource the safe length to read of an open WAL 
file. For a ReplicationSource that replicates the WAL of another RS, we can make 
sure that RS is dead and that all its WALs are closed (we can also ensure this 
by calling recoverLease), so it is safe to read them to EOF with DFSInputStream.

Any concerns? If not, let's start working!

Thanks.
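A rough sketch of what such a "safe length" API might look like (a hypothetical 
interface, not the actual HBase code):

{code:java}
import java.util.OptionalLong;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical sketch of the API proposed above: the regionserver that is writing
 * a WAL reports the length that is safe for a ReplicationSource to read.
 */
public interface WalSafeLengthProvider {
  /**
   * @return the safe-to-read length if the given WAL file is still being written
   *         by this regionserver, or empty if the file is already closed (in which
   *         case the ReplicationSource may read to EOF, after recoverLease if needed).
   */
  OptionalLong getSafeLengthIfBeingWritten(Path walPath);
}
{code}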



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-10-27 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977691#comment-14977691
 ] 

Heng Chen commented on HBASE-14004:
---

I have a proposal.
When a WAL sync fails and the master has to roll back the Memstore, we can 
record this action in ZK or a system table; meanwhile, all slaves should sync 
this action and modify their memstores accordingly.

Any concerns? 





[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-10-27 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977696#comment-14977696
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
When a WAL sync fails and the master has to roll back the Memstore, we can 
record this action in ZK or a system table; meanwhile, all slaves should sync 
this action and modify their memstores accordingly.
{quote}

Of course, before a slave rolls back its memstore, it should check the 
timestamps of the rollback action against the WALs it has to replay.



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-10-27 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977769#comment-14977769
 ] 

Duo Zhang commented on HBASE-14004:
---

The problem here affects more than just replication.

{quote}
As a result, the handler will roll back the Memstore, and the later flushed 
HFile will also skip this record.
{quote}

What if the regionserver crashes before flushing the HFile? I think the record 
will come back, since it has already been persisted in the WAL.

Adding a marker may be a solution, but you need to check the marker everywhere 
when replaying the WAL, and you still need to deal with failure when placing the 
marker... I do not think it is easy to do...

The basic problem here is that we may end up with an inconsistency between the 
memstore and the WAL when we fail to sync the WAL.
A simple solution is to kill the regionserver when we fail to sync the WAL, 
which means we never roll back the memstore but instead reconstruct it from the 
WAL. That way we can make sure there is no difference between the memstore and 
the WAL.
If we want to keep the regionserver alive when a sync fails, then I think we 
need to find out the real result of the sync operation. Maybe we could close the 
WAL file and check its length? Of course, if we have lost the connection to the 
namenode, I think there is no simple solution other than killing the 
regionserver...

Thanks.
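A minimal sketch of the "kill the regionserver instead of rolling back the 
memstore" idea above (hypothetical interfaces; the real HBase wiring differs):

{code:java}
import java.io.IOException;

/** Hypothetical sketch of aborting on WAL sync failure rather than rolling back. */
public class SyncFailurePolicySketch {
  interface Wal { void sync(long txid) throws IOException; }
  interface RegionServerServices { void abort(String reason, Throwable cause); }

  void syncOrDie(Wal wal, long txid, RegionServerServices rs) {
    try {
      wal.sync(txid);
    } catch (IOException e) {
      // We cannot tell whether the edits actually reached the DNs, so instead of
      // rolling back the memstore (which could then disagree with what is persisted
      // in the WAL), abort the regionserver and rebuild the memstore from WAL replay.
      rs.abort("WAL sync failed; aborting to avoid memstore/WAL divergence", e);
    }
  }
}
{code}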



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-10-27 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977841#comment-14977841
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
What if the regionserver crashes before flushing the HFile? I think the record 
will come back, since it has already been persisted in the WAL.
{quote}
Indeed, with the current logic, when a sync fails the memstore is rolled back 
and the client is told the write failed.
If the RS then crashes before the memstore is flushed, the record comes back 
after the WAL is replayed, so the client finds that the "failed" write actually 
succeeded. It is inconsistent!

{quote}
Adding a marker may be a solution, but you need to check the marker everywhere 
when replaying the WAL, and you still need to deal with failure when placing the 
marker...
{quote}
Agreed!

Thanks [~Apache9] for your reply!
 



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-10-28 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977908#comment-14977908
 ] 

Anoop Sam John commented on HBASE-14004:


In the case where the RS is killed and a later WAL replay comes in and possibly 
adds the Mutation back, the same inconsistency with the client reply appears: 
the client would have been told that the mutation failed!



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-10-28 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978004#comment-14978004
 ] 

Yu Li commented on HBASE-14004:
---

{quote}
It's possible that the HDFS sync RPC call fails, but the data has already 
(perhaps partially) been transported to the DNs and eventually gets persisted
{quote}
Does this really happen on your cluster, or is it just an assumption, 
[~heliangliang] [~chenheng]?

From what I could see, we will get an EOFException when trying to read partially 
written entries (those larger than 64k that are split into multiple packets 
where some packets got lost due to network or DFS errors), and these entries 
will be abandoned during replication.

If in a real case the sync RPC call fails but the data somehow gets persisted, 
it is more likely a bug in HDFS or incorrect usage in HBase (like not handling 
all possible exceptions, etc.). I'd suggest digging deeper into this first 
before any further discussion.



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-10-28 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978038#comment-14978038
 ] 

Heng Chen commented on HBASE-14004:
---

Consider the case where HDFS syncs successfully and sends the response, but the 
RS does NOT receive that response because of a network disconnect. From the RS's 
point of view the sync timed out, yet the data is persisted on HDFS. [~carp84]

As [~Apache9] mentioned, 
{quote}
A simple solution is to kill the regionserver when we fail to sync the WAL, 
which means we never roll back the memstore but instead reconstruct it from the 
WAL.
{quote} 
It may cause two problems:
 * we may apply a mutation twice (it seems there would be no problem if the 
mutation is incr/append in the current logic?)
 * all RSes could be killed if the network between the RSes and the NN is 
briefly disconnected.



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-10-28 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978108#comment-14978108
 ] 

Duo Zhang commented on HBASE-14004:
---

[~carp84] This is an inherent problem of RPC-based systems in the face of 
temporary network failures. HBase uses {{hflush}} to sync the WAL; I do not know 
the details of whether hflush calls the namenode to update the length, but in 
any case, the last RPC call could fail on the client side while succeeding on 
the server side (a network failure while the return value is being written 
back).

And sure, this should be treated as a bug in HBase. I checked the code: if an 
exception is thrown from hflush, {{FSHLog.SyncRunner}} simply passes it to the 
upper layer. So it can happen that the hflush succeeded at HDFS, but HBase 
thinks it failed, and that causes the inconsistency.

I think we need to find a way to make sure whether the WAL is actually persisted 
at HDFS. And if DFSClient already retries, then I think killing the regionserver 
is enough? Any suggestions, [~carp84]?

Thanks.



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-11-10 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999866#comment-14999866
 ] 

Yu Li commented on HBASE-14004:
---

[~chenheng] [~Apache9]
Sorry for the late response, somehow I didn't notice the JIRA update mail. I see 
your point now, thanks for the explanation.

{quote}
I do not know the details of whether hflush calls the namenode to update the length
{quote}
Checking the code, SyncRunner calls ProtobufLogWriter#sync, which finally calls 
DataOutputStream#hflush; from there we can see that it only calls the NN to 
update the length when a new block is created, but not while filling an 
already-created one.

{quote}
I think we need to find a way to make sure whether the WAL is actually 
persisted at HDFS
{quote}
Agreed. I just noticed you've created HBASE-14790; let's discuss more details 
there. :-)



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-04 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15042694#comment-15042694
 ] 

Phil Yang commented on HBASE-14004:
---

After the discussion in HBASE-14790, we can move forward now. Let me repost my 
comment from HBASE-14790 first :)
{quote}
Currently there are two scenarios which may result in inconsistency between the 
two clusters.

The first is that the master cluster crashes (for example, a power failure), or 
three DNs and the RS crash at the same time, so we lose all data that was not 
yet flushed to the DNs' disks even though that data has already been synced to 
the slave cluster.

The second is that we roll back the memstore and respond to the client with an 
error if we get an error on hflush, but the log entry may in fact exist in the 
WAL. This not only results in inconsistency between the two clusters but also 
gives the client a wrong response, because the data will "revive" after 
replaying the WAL. This scenario has been discussed in HBASE-14004.

Compared to the second, the first scenario is easier to solve: we can tell the 
ReplicationSource that it may only read the logs that have already been saved on 
three disks. We need to know the largest WAL entry id that has been synced, so 
HDFS's own sync logic may not be helpful for us and we must use hsync to let 
HBase know the entry id. So we need a configurable, periodic hsync here; if we 
have only one cluster it is also helpful to reduce data loss caused by a data 
center power failure or by unluckily crashing three DNs and the RS at the same 
time.

The second scenario is more complex, because we cannot roll back the memstore 
and tell the client the operation failed unless we are very sure the data will 
never exist in the WAL, and mostly we are not sure... So we have to use new WAL 
logic that rewrites the entry to a new file rather than rolling back. To 
implement this we need to handle duplicate entries while replaying the WAL.
{quote}

Therefore, we may have 4 subtasks:
1: A configurable, periodic hsync to make sure our data has been saved to disk. 
This is also helpful in single-cluster mode.
2: The ReplicationSource should only read WAL entries that have been hsynced, to 
prevent the slave cluster from having data that the master loses.
3: The WAL reader should handle duplicate entries; in other words, make WAL 
logging idempotent.
4: Fix the HBase write path so that we retry logging the WAL entry in a new file 
rather than rolling back the MemStore.

Thoughts?
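A minimal sketch of what subtask 1 could look like (the interfaces, method 
names, and wiring are hypothetical, for illustration only):

{code:java}
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a configurable, periodic hsync (subtask 1 above). */
public class PeriodicHsyncSketch {
  interface Wal {
    long getWrittenTxid();               // highest txid handed to hflush so far
    void hsync() throws IOException;     // force the written bytes to the DN disks
    void markSafeTxid(long txid);        // entries up to this txid are safe to replicate
  }

  void start(Wal wal, long intervalMs) {
    ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
    pool.scheduleAtFixedRate(() -> {
      try {
        long txid = wal.getWrittenTxid(); // snapshot before the hsync call
        wal.hsync();
        wal.markSafeTxid(txid);           // ReplicationSource may now read up to txid
      } catch (IOException e) {
        // a failed hsync leaves the safe txid unchanged; failure handling is out of scope
      }
    }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
  }
}
{code}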



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-05 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15042744#comment-15042744
 ] 

Duo Zhang commented on HBASE-14004:
---

{quote}
Fix the HBase write path so that we retry logging the WAL entry in a new file 
rather than rolling back the MemStore.
{quote}
To be clear, this means we will hold the {{WAL.sync}} request if some entries 
have already been written out but not acked, and not return until we 
successfully write them out and get the ack back. And if {{WAL.sync}} or 
{{WAL.write}} fails (maybe due to a full queue), we will still roll back the 
MemStore, since we can confirm that the WAL entries have not been written out. 
Right?

And I think there is another task for us. Right now DFSOutputStream does not 
provide a public method to get the acked length. We can open an issue against 
the HDFS project and use reflection in HBase first. But there is still a 
problem: {{hflush}} and {{hsync}} do not return the acked length, which means 
getting the acked length and calling {{hsync}} are two separate operations, so 
it is hard to get the exact acked length right after calling {{hsync}}. Maybe we 
could get the current total written-out bytes first (not the acked length) and 
then call {{hsync}}; the acked length after calling {{hsync}} must be at least 
this value, so it is safe to use this value as the "acked length". Any thoughts?

Thanks.
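A tiny sketch of the "snapshot the written length, then hsync" idea above 
(hypothetical interface; as noted, DFSOutputStream's real acked-length accessor 
is not public):

{code:java}
import java.io.IOException;

/** Hypothetical sketch: use the pre-hsync written length as a safe acked length. */
public class AckedLengthSketch {
  interface WalOutput {
    long getWrittenBytes();          // bytes handed to the stream so far (not necessarily acked)
    void hsync() throws IOException; // returns only after the DNs ack and persist the data
  }

  /** @return a length that is guaranteed to be acked once hsync() has returned. */
  long safeAckedLength(WalOutput out) throws IOException {
    long writtenBeforeSync = out.getWrittenBytes();
    out.hsync();
    // Everything written before the hsync call is covered by its ack, so the true
    // acked length is >= writtenBeforeSync; the snapshot is a safe lower bound.
    return writtenBeforeSync;
  }
}
{code}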



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-05 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15042765#comment-15042765
 ] 

Phil Yang commented on HBASE-14004:
---

Is it required to use the size of the serialized binary data? I don't know if 
there is a sequentially incrementing unique id for each WAL entry. If there is, 
or if we can add one, we can know the largest id that has been hsynced, right? 
And this id can also help us when replaying duplicate entries.



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-05 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043246#comment-15043246
 ] 

Duo Zhang commented on HBASE-14004:
---

Sounds great; an incrementing unique id per WAL entry is better, since it is 
managed by ourselves.



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043657#comment-15043657
 ] 

stack commented on HBASE-14004:
---

bq. The ReplicationSource should only read WAL entries that have been hsynced, 
to prevent the slave cluster from having data that the master loses.

This will require a big change in how replication works, but for the better, and 
replication will be less resource intensive because of fewer NN ops (if we 
crash, we ask the NN for the file length, not ZK? If so, this would be a task we 
have been needing to do for a long time; i.e. undo keeping the replication 
position in zk).

bq. The WAL reader should handle duplicate entries; in other words, make WAL 
logging idempotent.

Might have to add some code to the reader to skip an entry it has seen before 
(this may be there already -- need to check).

bq. Fix the HBase write path so that we retry logging the WAL entry in a new 
file rather than rolling back the MemStore.

This is new but has been done before.

I'd be up for helping w/ WAL changes, stuff like keeping around appends until 
the sync for them comes in (I've messed w/ this before), and would be 
interested in helping out on replication log length accounting changing it from 
relying on reopen after it gets EOF and keeping length in zk.

You fellas are fixing a few fundamental issues here. Sweet.

bq. we will still roll back the MemStore, since we can confirm that the WAL 
entries have not been written out. Right?

We could try rejiggering the order in which memstore gets updated, putting it 
off till after the sync. The order we have now came about a long time ago when 
the WAL was very different. We might be able to change the order, simplify the 
write pipeline, and not lose too much perf (or, perhaps, get more perf because 
we are doing healthier group commits).

bq. Maybe we could get the current total written-out bytes first (not the acked 
length) and then call hsync; the acked length after calling hsync must be at 
least this value, so it is safe to use this value as the "acked length".

It would be good if hbase could calculate the written length itself. We could 
try it. What happens if we want to compress the WAL, and what about the crc tax? 
(I suppose the latter would be a constant -- and for the former, maybe we could 
figure out the length... even with compression, whether per edit or per batch.)

bq. I don't know if there is a sequentially incrementing unique id for each WAL 
entry.

There is such a sequenceid, but it is by-region, not global. Could we keep 
sequenceid accounts by region? (We already do this elsewhere.)
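A small sketch of what by-region sequenceid accounting might look like 
(hypothetical class, for illustration only; not the existing HBase accounting 
code):

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of by-region sequenceid accounting for skipping duplicates. */
public class RegionSeqIdAccountingSketch {
  // encoded region name -> highest sequenceid known to be safely persisted/hsynced
  private final ConcurrentMap<String, Long> safeSeqIdByRegion = new ConcurrentHashMap<>();

  /** Record that everything up to seqId for this region is safely persisted. */
  void advance(String encodedRegionName, long seqId) {
    safeSeqIdByRegion.merge(encodedRegionName, seqId, Math::max);
  }

  /** @return true if a replayed WAL entry is already covered and can be skipped. */
  boolean isDuplicate(String encodedRegionName, long seqId) {
    Long safe = safeSeqIdByRegion.get(encodedRegionName);
    return safe != null && seqId <= safe;
  }
}
{code}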





[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-05 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043664#comment-15043664
 ] 

Duo Zhang commented on HBASE-14004:
---

{quote}
This will require a big change in how replication works, but for the better, and 
replication will be less resource intensive because of fewer NN ops (if we 
crash, we ask the NN for the file length, not ZK? If so, this would be a task we 
have been needing to do for a long time; i.e. undo keeping the replication 
position in zk).
{quote}
I think we should have two branches to determine how many entries we can read: 
one for a closed WAL file, and one for a WAL that is still being written. We can 
get this information using {{DistributedFileSystem.isFileClosed}}. If the file 
is already closed, then we can use the length we get from HDFS. If the file is 
still open for writing, then we should ask the RS that is writing it for the 
safe length. If we cannot find that RS (maybe it has already crashed), then we 
can wait a while, since the namenode will eventually recover the lease and close 
the file.
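A rough sketch of those two branches (only {{DistributedFileSystem.isFileClosed}} 
and {{getFileStatus}} are real HDFS APIs here; the "ask the writing RS" part is 
a made-up interface):

{code:java}
import java.io.IOException;
import java.util.OptionalLong;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

/** Hypothetical sketch of deciding how far a ReplicationSource may read a WAL. */
public class ReadableLengthSketch {
  interface WriterRegistry {
    /** Safe length reported by the RS writing this WAL, or empty if that RS is unknown. */
    OptionalLong askWritingRsForSafeLength(Path wal);
  }

  OptionalLong readableLength(DistributedFileSystem dfs, Path wal, WriterRegistry registry)
      throws IOException {
    if (dfs.isFileClosed(wal)) {
      // Closed file: the length reported by HDFS is final, so reading to it is safe.
      return OptionalLong.of(dfs.getFileStatus(wal).getLen());
    }
    // Still being written: only the writing RS knows the safe (acked) length. If it
    // cannot be found (maybe it crashed), callers wait for lease recovery to close
    // the file and retry.
    return registry.askWritingRsForSafeLength(wal);
  }
}
{code}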

{quote}
There is such a sequenceid, but it is by-region, not global. Could we keep 
sequenceid accounts by region? (We already do this elsewhere.)
{quote}
So maybe we still need to use an "acked length", not an "acked id". But I think 
that is enough to filter out duplicate WAL entries.



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-05 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043721#comment-15043721
 ] 

Yu Li commented on HBASE-14004:
---

Nice discussion, [~yangzhe1991] and [~Apache9].

FWIW, two questions about Phil's proposal:
1. What would the logic be like if durability is set to ASYNC in the table 
descriptor? Is the following case possible?
1) the entry is written into the memstore
2) the region is reassigned to another RS, so the content in the memstore gets 
flushed into an hfile
3) the WAL sync/write fails
In this case we might run into another kind of inconsistency, where the master 
cluster has the data but the slave doesn't?

2. About {{WAL logging idempotent}}, maybe we also need to consider the cross-RS 
case where a region reassignment happens before the WAL sync is acked?



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-06 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044308#comment-15044308
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
To be clear, this means we will hold the WAL.sync request if some entries have 
already been written out but not acked, and not return until we successfully 
write them out and get the ack back. And if WAL.sync or WAL.write fails (maybe 
due to a full queue), we will still roll back the MemStore, since we can confirm 
that the WAL entries have not been written out. Right?
{quote}
I have a big concern about this. If we do not configure hsync on every write 
(only periodic hsync), it means there are always some entries that have been 
hflushed but not hsynced. And as the logic is designed, when an hflush fails we 
close the old WAL and open a new one, and the entries that were not hsynced are 
written into the new WAL.
If the RS crashes at this moment, what happens? Does it mean some entries that 
may already be in place (the client has been told the mutation succeeded and the 
data really is on the DNs already) will be lost? I think that would be a 
regression, because one failed mutation could cause more mutations to become 
inconsistent.
I think this is also [~carp84]'s concern.







[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-06 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044315#comment-15044315
 ] 

Phil Yang commented on HBASE-14004:
---

{quote}
If the RS crashes at this moment, what happens?
{quote}
If only the RS crashes and the DNs do not, there is no data loss, because the 
entries have been hflushed into the DNs' memory. Only the RS and all the DNs 
crashing together causes data loss. But hsyncing every time will increase 
latency, so users can configure this trade-off.



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-06 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044318#comment-15044318
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
If only the RS crashes and the DNs do not, there is no data loss, because the 
entries have been hflushed into the DNs' memory.
{quote}
If I understand correctly, this is my thought: if an hflush fails, we close the 
old WAL and write the entries that were not hsynced to the new WAL, right? You 
close the old WAL at the old acked length, right? The 'acked length' is updated 
by hsync, right? If the RS crashes at this moment, then when the WAL is replayed 
by another RS, the entries that were not hsynced and are still in the queue 
(whatever it is) will be lost, right?




[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-06 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044319#comment-15044319
 ] 

Duo Zhang commented on HBASE-14004:
---

{quote}
Does it mean some entries that may already be in place (the client has been told 
the mutation succeeded and the data really is on the DNs already) will be lost?
{quote}
If all 3 DNs crash and the RS also crashes, then yes, you have no choice but to 
hsync every time to prevent this. And I think this is not related to the 
scenario you described: we do not return SUCCESS to the client until we get a 
successful hflush response.

And what [~liyu] raised is another problem. Flushing the WAL asynchronously (or 
not writing the WAL at all) is known to carry the risk of losing data. We need 
to determine what we can guarantee when users use those configurations.



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-06 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044321#comment-15044321
 ] 

Duo Zhang commented on HBASE-14004:
---

{quote}
If an hflush fails, we close the old WAL and write the entries that were not 
hsynced to the new WAL, right? You close the old WAL at the old acked length, 
right? The 'acked length' is updated by hsync, right? If the RS crashes at this 
moment, then when the WAL is replayed by another RS, the entries that were not 
hsynced and are still in the queue (whatever it is) will be lost, right?
{quote}
No. You can close the old WAL using any length that is not smaller than the last 
successfully hflushed length; it does not have to be the hsynced length.



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-06 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044327#comment-15044327
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
No. You can close the old WAL using any length that is not smaller than the last 
successfully hflushed length; it does not have to be the hsynced length.
{quote}
OK, I get it now. The 'acked length' is only used for replication; we need a 
separate 'hflushed length' to close the WAL. So data is only lost when all 3 DNs 
and the RS crash.
Thanks [~Apache9] for the explanation.



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-06 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044333#comment-15044333
 ] 

Phil Yang commented on HBASE-14004:
---

I have a new concern about the configuration. HBase currently only uses hflush; 
if we add hsync logic, will there be a performance degradation? Should we still 
allow users to disable hsync, just like before? If so, what is the default 
configuration? And if a user disables hsync, what should we do for the 
ReplicationSource?



[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-06 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044341#comment-15044341
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
HBase currently only uses hflush; if we add hsync logic, will there be a 
performance degradation?
{quote}
I think the default configuration should be periodic hsync.
On the normal path, because the hsync is asynchronous, IMO performance is OK. 
But if an hflush fails, the handler will hang while recovery is processed; if 
that takes too much time, then because of client retries all the handlers may be 
exhausted soon. That could be another regression compared with the current 
logic.





[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-06 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044355#comment-15044355
 ] 

Yu Li commented on HBASE-14004:
---

bq. if we add hsync logic, will there be a performance degradation
IMHO, for a 100% no-data-loss guarantee we have to sacrifice performance, more 
or less. This is a trade-off, and users should be able to make their own choice. 
However, there's no real fsync support in HBase yet, although quite some effort 
has been put in, e.g. HBASE-5954 ([~lhofhansl] and [~stack], please correct me 
if I've stated anything wrong here, thanks). Not sure whether I'm off topic, but 
somehow I feel all these things are related, and indeed we are trying to fix a 
few fundamental issues here, just like stack mentioned.

bq. Should we still allow users to disable hsync, just like before?
I think yes, we should leave an option here; a user might care more about 
performance and be willing to take the relatively low all-3-DNs-crash risk, I 
guess.

bq. If so, what is the default configure?
I guess this depends on the final perf number of the new design and 
implementation

bq. If user disable hsync, what should we do for ReplicationSource?
I think we should go back to old logic when user choose to.

btw, could see quite some discussion here and maybe a doc summarizing the basic 
design, existing discussion conclusion and left-over questions would be good 
for understanding and further discussion? :-)

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-06 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044368#comment-15044368
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
btw, could see quite some discussion here and maybe a doc summarizing the basic 
design, existing discussion conclusion and left-over questions would be good 
for understanding and further discussion?
{quote}

Good suggestion. We need a design doc. [~yangzhe1991] [~Apache9], do you have 
time to prepare one for this issue? If you don't have time, I can write one.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-06 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044380#comment-15044380
 ] 

Phil Yang commented on HBASE-14004:
---

Sure, I'm going to draft a document summarizing our ideas. [~Apache9] and I can 
have some more offline discussion so that I can post an initial version that 
satisfies both of us, and then ask for your suggestions :)

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044563#comment-15044563
 ] 

Heng Chen commented on HBASE-14004:
---

Looks good, except one thing.
{code}
If we get timeout error finally, we should open a new file and write/hflush all 
data in buffer to it and use a background thread to close old file using any 
length that larger than the previous succeeded hflushed length.
{code}
IIRC, we should ensure that the seqIds of WAL edits are increasing; otherwise we 
will fail when replaying WALs. 
With this logic, some entries in the new WAL may have smaller seqids than entries 
in the old WAL, right?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044576#comment-15044576
 ] 

Phil Yang commented on HBASE-14004:
---

{quote}
Some entries in new WAL may have smaller seqid than entries in old WAL as our 
logical, right?
{quote}
Yes, that is why we should make sure the WAL reader can filter entries that it 
has seen before. Two entries with the same id may not be adjacent; the two WAL 
files may look like this:
[1, 2, 3, 4, 5, 6]
[4, 5, 6, 7]
if, when writing the first WAL, we could only confirm that entries up to 3 were 
hflushed before rolling to the new file.
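
For illustration, a minimal sketch of that filtering idea (reader, apply() and 
the WAL.Entry accessors below are stand-ins that only approximate the real API):
{code}
// Replay the next WAL file, skipping entries already applied from earlier files.
long maxSeenSeqId = /* highest seqid applied from the previous WAL file */ -1L;
WAL.Entry entry;
while ((entry = reader.next()) != null) {
  long seqId = entry.getKey().getSequenceId();
  if (seqId <= maxSeenSeqId) {
    continue;                   // e.g. 4, 5, 6 rewritten into the new file above
  }
  apply(entry);                 // apply() stands in for the real replay step
  maxSeenSeqId = seqId;
}
{code}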


> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044603#comment-15044603
 ] 

Heng Chen commented on HBASE-14004:
---

It seems that the WAL reader is designed for a single WAL; we can't filter 
entries from another WAL inside the WAL reader. 
Maybe we should do the filtering at replay time instead.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread He Liangliang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044616#comment-15044616
 ] 

He Liangliang commented on HBASE-14004:
---

Sorry for the late reply. What about this approach:
1. Keep the current synced sequence id in memory and make sure the replicator 
does not read past this id when replicating.
2. When rolling the WAL, record the final id in the zk replication queue and also 
mark the file as rolled in memory, so the local replicator knows this file is 
finished.
3. When the replication queue is failed over, the new replicator checks the 
recorded sequence id: if it is available, the file was successfully rolled; 
otherwise, read until EOF. For the latter case, we need to make sure the edits 
after the last successful sync of that log are replayed, to ensure consistency. 
Recording the last successfully synced sequence id when flushing can guarantee 
this.

The overhead is insignificant (just a memory barrier for the volatile sequence 
id passed to the replicator).

I guess this may have been the original design, since there are TODO comments in 
FSHLog.java:
TODO:
 * replication may pick up these last edits though they have been marked as 
failed append (Need to
 * keep our own file lengths, not rely on HDFS).
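
For illustration, a rough sketch of point 1, assuming a single sync thread 
advances the id (WalSyncTracker and its members are hypothetical names, not the 
actual FSHLog/ReplicationSource fields):
{code}
class WalSyncTracker {
  // highest sequence id known to be acked by a successful hflush/hsync
  private volatile long highestSyncedSeqId = -1L;

  // called on the WAL sync path after a successful sync (single writer thread)
  void syncCompleted(long seqId) {
    if (seqId > highestSyncedSeqId) {
      highestSyncedSeqId = seqId;
    }
  }

  long getHighestSyncedSeqId() {
    return highestSyncedSeqId;
  }
}

// In the replication source read loop (sketch): never ship past the synced id.
//   if (entrySeqId > tracker.getHighestSyncedSeqId()) {
//     wait, or stop reading this file for now;
//   }
{code}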

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044643#comment-15044643
 ] 

Phil Yang commented on HBASE-14004:
---

{quote}
It seems that WAL reader is designed for one WAL, we can't filter entries from 
other WAL in WAL Reader. 
Maybe we should do something to filter entries when replay.
{quote}
You are right; the logic that ignores edits already persisted to HFiles is in 
HRegion.replayRecoveredEdits, so the logic for filtering already-seen edits 
should live there as well?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044666#comment-15044666
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
You are right, the logic that ignores logs which are already in HFiles is in 
HRegion.replayRecoveredEdits, so the logic of filtering seen logs should also 
be there?
{quote}
Yeah, maybe we should store the max seqid of the 'acked hflushed' WAL entries 
somewhere (ZK or a system table); we could use that information to skip 
duplicate entries in the new WAL.



> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044685#comment-15044685
 ] 

Phil Yang commented on HBASE-14004:
---

{quote}
Guess this may be the original design since there is TODO comments in 
FSHLog.java:
{quote}
If my understanding is right, these comments suggest we should truncate the WAL 
file at the position of the last synced edit, so we can avoid replaying or 
replicating edits that clients have already been told failed. However, even if 
the RS knows where to truncate, it may crash after telling clients the write 
failed but before truncating. So I think a better idea is to not truncate at 
all: once WAL logging is idempotent, we only rewrite the logs to the new file 
and do not report failure to clients.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread He Liangliang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044702#comment-15044702
 ] 

He Liangliang commented on HBASE-14004:
---

We don't need to guarantee that we "avoid replaying or replicating edits that 
have been regarded as failed by clients". In other words, the client needn't 
discriminate between a timeout and a sync error, right?

In fact, the most probable case looks like this:
1. The DNs have problems
2. The HBase client times out
3. The HDFS client times out and the sync fails

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044835#comment-15044835
 ] 

Phil Yang commented on HBASE-14004:
---


{quote}
In another word, the client needn't discriminate between timeout and sync 
error, right?
{quote}
For an HBase client with retry logic, a timeout and any other error both result 
in a retry, so it does not seem critical to guarantee that non-timeout errors 
mean the data will never exist in the database, although I think that is 
necessary for a reliable database. Different users may have different 
requirements; what do other fellas think about this question?

And I think there is not a big difference between your approach and part of my 
design; the difference is only in the implementation.

If we need the guarantee above, i.e. that we differentiate timeout errors from 
non-timeout errors, we can either save to zk and then write to the new file 
(your idea), or just write to the new file and skip already-seen entries when 
replaying (my idea), before acking success to the client. My idea may be faster 
because it needn't send a request to zk.

If we don't need the guarantee, we can roll back the memstore, ack failure to 
the client and do the other work asynchronously, because we are not afraid of 
the RS crashing at that point.

And I think the major difference between our ideas is that you don't change the 
WAL sync logic. However, I think the current logic may not be perfect, because 
hflush only writes data to the memory of three DNs, which is not the real 
persistence users assume. If the RS and all three DNs go down, or the whole 
cluster goes down because of some serious issue, any data not synced to the 
DNs' disks will be lost. This issue not only causes inconsistency between the 
two clusters, it also misleads users: we don't actually save their data on disk. 
I think that may not be good :(


> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044970#comment-15044970
 ] 

Phil Yang commented on HBASE-14004:
---

Furthermore, I found another issue: when HDFS-744 added hsync() support, it also 
added CreateFlag.SYNC_BLOCK to FileSystem.create(), which "force[s] closed 
blocks to disk", i.e. a syncBlock flag is sent in the last DFSPacket of every 
endBlock(). If we don't pass this flag, as is currently the case in HBase, the 
files we save on HDFS are not synced to disk immediately. So we risk losing data 
right after we flush a MemStore into an HFile or compact some HFiles, because we 
think that data has been persisted and may then delete the WAL or the old 
HFiles, right?

If I am not wrong, we can open a new issue to discuss this there, since it is 
independent.
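
For illustration, a minimal sketch of passing that flag (the path and the 
literal parameters below are arbitrary examples, not HBase's actual WAL-creation 
code):
{code}
import java.io.IOException;
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class SyncBlockExample {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/sync-block-example");
    // SYNC_BLOCK asks HDFS to force each block to disk when the block is closed.
    try (FSDataOutputStream out = fs.create(p,
        FsPermission.getFileDefault(),
        EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE, CreateFlag.SYNC_BLOCK),
        4096,                            // buffer size
        fs.getDefaultReplication(p),
        fs.getDefaultBlockSize(p),
        null)) {                         // no progress callback
      out.writeBytes("example");
    }
  }
}
{code}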

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046226#comment-15046226
 ] 

Yu Li commented on HBASE-14004:
---

bq. My idea may be more fast because it needn't send request to zk
Considering the failover case (like an RS crash), I guess we need to persist the 
acked length somewhere like zk, or else we will still replicate the non-acked 
data to the slave cluster during recovery?

bq. We are having the risk of losing data when we just flush a MemStore into 
HFile
I believe HBASE-5954 tried to resolve the same problem, and I would suggest 
paying it a visit. :-)

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046286#comment-15046286
 ] 

Phil Yang commented on HBASE-14004:
---

{quote}
I guess we need to persist the acked length to somewhere like zk, or else we 
will still replicate the non-acked data to slave cluster when recover?
{quote}
If we persist the acked length to zk and the RS crashes before saving it to zk, 
what will happen? There is always a window where we haven't done anything after 
getting a timeout on hflush/hsync, or where we crash before even getting the 
ack. For hflush, if we hold the request, the client will not get any response, 
which means that after restarting we can make this request either succeed or 
fail. For hsync, the RS has crashed and will replay the log after restarting, 
but we cannot be sure whether the data not yet acked by hsync is on the DNs' 
disks or only in their memory, so the ReplicationSource may only be able to wait 
until the RS restarts, because we cannot be sure the subsequently visible data 
will be hsynced to disk. If the ReplicationSource reads it anyway and then the 
three DNs crash, the slave will have more data than the master...

And if we add the CreateFlag.SYNC_BLOCK flag when creating the WAL file, we can 
be sure that a closed file is on disk, so if the RS doesn't recover the 
ReplicationSource can wait for the namenode to close the file automatically and 
then read the whole file, right?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046292#comment-15046292
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
I guess we need to persist the acked length to somewhere like zk, or else we 
will still replicate the non-acked data to slave cluster when recover?
{quote}

There is no need to do that. The replication queue stores in ZK the current 
position in the WAL up to which data has been replicated. If the replicator only 
reads 'hsynced' entries, it will not replicate non-hsynced data to the slave 
during recovery.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046371#comment-15046371
 ] 

Yu Li commented on HBASE-14004:
---

It seems to me that the replication queue only records where to *start* reading 
but not where to *end*, and the acked length is exactly what records where to 
end? If we don't persist this acked length, during failover will we still read 
until EOF?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046385#comment-15046385
 ] 

Yu Li commented on HBASE-14004:
---

Assume this situation:
1. The RS receives an ack from the DFSClient and records it in memory, not on zk
2. The ReplicationSource reads from the last synced length, but the RS crashes 
before it reaches the new acked length
3. Replication queue recovery starts on another RS, which has no place to get 
the acked length of the crashed RS
In this case, where does step 3 end? If it reads until EOF, then the 
inconsistency issue we are trying to resolve here happens again, right?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046395#comment-15046395
 ] 

Duo Zhang commented on HBASE-14004:
---

{quote}
If we don't persist this acked length, during failover will we still read until 
EOF?
{quote}
It depends on the file state. If the file is already closed, then we can use the 
file length obtained from the namenode. If it is still open for writing, then we 
should ask for the acked length. You can always get the acked length unless the 
RS has crashed, and if the RS has crashed, the file will eventually be closed by 
the NN.
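
For illustration, a sketch of that rule (AckedLengthProvider is a made-up 
stand-in for whatever exposes the writer-side acked length):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

interface AckedLengthProvider {   // hypothetical writer-side view of the open WAL
  boolean isClosed();
  long getAckedLength();
}

class ReadLimit {
  // How far a replication source may safely read in a WAL file.
  static long replicationReadLimit(FileSystem fs, Path wal, AckedLengthProvider writer)
      throws IOException {
    if (writer == null || writer.isClosed()) {   // file already closed and finalized
      return fs.getFileStatus(wal).getLen();     // NN-reported length is authoritative
    }
    return writer.getAckedLength();              // otherwise read only acked bytes
  }
}
{code}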

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046398#comment-15046398
 ] 

Duo Zhang commented on HBASE-14004:
---

{quote}
3. Replication queue recovery starts on another RS, where no place to get the 
acked length for the old-crashed RS
In this case, where will step 3 ends? If read until it reaches EOF, then the 
inconsistency issue we try to resolve here happen again, right?
{quote}
If the region is assigned to another RS, then the new RS needs to replay the WAL 
to reconstruct the memstore; at that point, even unacked WAL entries that appear 
in the WAL file will be replayed, right?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046401#comment-15046401
 ] 

Yu Li commented on HBASE-14004:
---

bq. then the file will finally be closed by NN
I don't think *finally be closed* is enough, since replication queue recovery 
will start as soon as the Master detects the RS crash and issues the recovery 
task. I think we need to handle the case where the file is not yet closed but 
recovery has already started, right?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046405#comment-15046405
 ] 

Duo Zhang commented on HBASE-14004:
---

{quote}
I don't think finally be closed is enough, since the replication queue recover 
will start as soon as Master detects RS crash and issue the recovery task.
{quote}
HBase uses the NN to do fencing, which means the HMaster will only consider an 
RS dead once all of its WAL files are closed.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046414#comment-15046414
 ] 

Heng Chen commented on HBASE-14004:
---

Why do we need 'where to end' when recovering replication? We just need to start 
replicating entries from where replication was interrupted when the RS failed, 
right?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046422#comment-15046422
 ] 

Duo Zhang commented on HBASE-14004:
---

I do not think we gain much from storing the acked length in zookeeper. If we do 
it periodically, we can still hit the problem that we flush things out and then 
crash before the new acked length is stored on zk. If we do it synchronously on 
every hflush, then I think zookeeper will be overwhelmed in a large cluster...

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046439#comment-15046439
 ] 

Yu Li commented on HBASE-14004:
---

bq. HBase uses NN to do fencing which means HMaster will only consider an RS is 
dead when all of its WAL file is closed.
Oh yes, the lease recovery logic can make sure of this... OK, now I agree that 
we don't need to persist the acked length on zk; during failover the existing 
logic can ensure consistency. And the to-be-implemented WAL idempotence feature 
will make sure that "WAL replay of an unacked entry" and "client retry after RS 
crash" won't conflict. Thanks for the explanation [~Apache9]!

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-07 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046475#comment-15046475
 ] 

Yu Li commented on HBASE-14004:
---

You are right, Heng, and Duo also explained this. Thanks. :-)

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-08 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046584#comment-15046584
 ] 

Heng Chen commented on HBASE-14004:
---

I made a patch to skip duplicate entries on replay, and found something new that 
we hadn't considered.

Let's open a new issue for it and discuss there: HBASE-14949

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046656#comment-15046656
 ] 

stack commented on HBASE-14004:
---

Doc is great. I added a few little notes...

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-08 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047929#comment-15047929
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
Have to be careful how we do this. We have elaborate accounting now that allows 
only one 'sync' type, either a hflush or a hsync, but not a mix of both.
{quote}
After reading the notes in the doc, I begin to agree with stack.
Why do we need hsync? The concern about using 'hflush' is that we may lose data 
when the 3 DNs and the RS crash at the same time, right? That is a really small 
probability. But if we introduce hsync (for example, hsync periodically), it 
will add latency between master and slave. Is it worth doing?

And the replication inconsistency problem in this issue could be fixed if the 
replicator only reads 'acked hflushed' entries, just like we do in the recovery 
process when hflush fails, right? Also, per our design, we only use hsync to 
prevent data inconsistency in replication, but data loss can still happen 
because we do NOT use 'hsync' in the write path. If so, why not just use hflush?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-08 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047976#comment-15047976
 ] 

Phil Yang commented on HBASE-14004:
---

{quote}
And as our design, we only use hsync to ensure data inconsistency in 
replication but data lost still happen because we NOT use 'hsync' in write 
path. If so, why NOT we just use hflush?
{quote}

This is why I think hsync should be configurable. For a database, we should mostly have a 
guarantee of data persistence, but sometimes we will sacrifice it for higher performance. 
For example, Redis's AOF can be configured to fsync on every write, every second, or never, 
and users can configure it according to their requirements. Although "every second" will 
still lose data after a crash, users are guaranteed to lose at most one second of data, 
which is still a valuable guarantee. Currently, however, HBase has no guarantee about this: 
users may think their data has already been saved to disk, while we have no idea when it 
will actually reach disk. Both the WAL and SSTables have this issue. And obviously this 
will result in data inconsistency between two clusters, too.

I will not oppose if we only fix (or fix first) the issue that we are transferring data 
which has been rolled back in the MemStore. And we can use the acked size of hflush, which 
means there won't be additional latency between the two clusters. But I think there should 
be follow-up work on data persistence, and we need a configurable hsync in HBase :)

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-08 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047986#comment-15047986
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
 But I think there should be a follow-up work on data persistence and we need a 
configurable hsync in HBase 
{quote}
It is really a problem, but I think it should be done in another issue. Maybe 
you can send an email to the dev list. And in this issue, I suppose we just ensure 
correctness with hflush, wdyt?


> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048010#comment-15048010
 ] 

stack commented on HBASE-14004:
---

This is the issue where the client can say what durability to apply per edit. Lars 
did a load of work on this and the issue is filled with good stuff. In the end, 
what came out was that it is hard to allow the client to choose hflush/hsync or no sync 
in the current setup.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-08 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048137#comment-15048137
 ] 

Phil Yang commented on HBASE-14004:
---

{quote}
And in this issue, i suppose we just ensure correctness with hflush, wdyt?
{quote}
Agree :), we can focus on the inconsistency between WAL and Memstore, which 
also results in inconsistency between master and slave. And if we don't use 
hsync, I think we need not change the logic of the replicator, which means we 
needn't transfer only data that is hflushed, because all entries in the WAL should 
finally be in the memstore, right?

So we may have only two subtasks now:
1: The WAL reader can handle duplicate entries; in other words, make WAL logging 
idempotent (a minimal sketch of this idea follows below).
2: The WAL logger does not throw an exception if it cannot make sure whether the 
entry is saved on HDFS or not (for example, an hflush timeout); it retries writing the 
entry to a new file and closes the old file asynchronously.

I changed the description of the second subtask because we needn't care about the 
logic of HRegion, which is now "write memstore -> write wal -> rollback if wal 
fails" and may be changed to "write wal -> write memstore if wal succeeds".



> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-08 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048171#comment-15048171
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
And if we don't use hsync, I think we need not change the logic of replicator 
which means we needn't only transfer data that is hflushed because all entries 
in WAL should finally be in memstore, right?
{quote}

Think about this situation: in the write path, some entries are hflushed but not acked, 
so we close the old WAL at the acked length and try to write these entries into the 
new WAL, then the RS crashes. The slave may have already replicated these entries, 
but the RS in the master cluster, after recovery, will have lost them, right?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-09 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048552#comment-15048552
 ] 

Phil Yang commented on HBASE-14004:
---

You are right, we should change the logic of the replicator.

And I am not an expert, so I have a question about the idempotence of HBase 
operations: what will happen if we replay an entry more than once? Consider 
these scenarios, where the numbers are the seq ids:

1, 2, 3, 4, 5---this is the normal order
1, 3, 2, 4, 5---the order is wrong but each entry is read only once
1, 1, 2, 3, 4, 5---we replay one entry twice but the duplicates are contiguous
1, 2, 3, 1, 4, 5---we replay one entry twice and the duplicates are not contiguous
1, 2, 3, 1, 2, 3, 4, 5---the whole subsequence is repeated, so the relative order is preserved

Are they all wrong except the first? It seems that the last one is not wrong?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049009#comment-15049009
 ] 

stack commented on HBASE-14004:
---

Not sure I understand the question. Replication reads the WAL in order and 
sends everything it has read to the remote cluster, in order. When it gets to the 
remote side, all of it should be 'applied' in order. How do we get your scenarios #2, 
#3, etc., above?

bq. In the write path, some entries are hflushed but not acked, so we close the old WAL 
at the acked length and try to write these entries into the new WAL, then the RS crashes. 
The slave may have already replicated these entries, but the RS in the master cluster, 
after recovery, will have lost them, right?

Isn't this essentially the description that leads off this JIRA?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-09 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049048#comment-15049048
 ] 

Phil Yang commented on HBASE-14004:
---

The reason I asked this question is that if we don't have to make sure we 
replay the WAL in order and only once for each entry, it may be easier to resolve 
HBASE-14949, which is being fixed by Heng.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-09 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049947#comment-15049947
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
1, 2, 3, 4, 5---this is the normal order
1, 3, 2, 4, 5---the order is wrong but each entry is read only once
1, 1, 2, 3, 4, 5---we replay one entry twice but the duplicates are contiguous
1, 2, 3, 1, 4, 5---we replay one entry twice and the duplicates are not contiguous
1, 2, 3, 1, 2, 3, 4, 5---the whole subsequence is repeated, so the relative order is preserved
{quote}
[~yangzhe1991] Only the first one is correct under the current logic.

The WAL is at the RS level, but replay is at the region level. So in the WAL the 
seqIds increase one by one, but we can't ensure that at the region level when replaying; 
that's why I mentioned I made a mistake in HBASE-14949. I will dig into it deeper.

Would you mind updating the doc as we mentioned above, Zhe?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-09 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050165#comment-15050165
 ] 

Phil Yang commented on HBASE-14004:
---

I have revised the doc 
https://docs.google.com/document/d/1tTwXrip18qxxsSiPu_y4fGB-26SxqwMvZnFREswyQow/edit?usp=sharing
 which removes the logic about hsync.

And previously the doc quoted Duo's comment:
{quote}
close the old file using any length that is larger than the previously succeeded 
hflushed length
{quote}
I changed this to close the file using the previously succeeded hflushed length, 
which is clearer for implementation.
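
A minimal sketch, on the reader side, of what "treat the previously acked hflushed length as the end of the file" could mean for the replication reader; ackedLength is assumed to be handed over by the WAL writer, and the names are illustrative rather than actual HBase APIs.

{code}
import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: the replication reader stops at the last successfully hflushed
// offset, even if HDFS reports more bytes in the file.
public class BoundedWalReader {
  public static void copyAckedPortion(FileSystem fs, Path wal, long ackedLength,
      OutputStream sink) throws IOException {
    try (FSDataInputStream in = fs.open(wal)) {
      byte[] buf = new byte[8192];
      long remaining = ackedLength;
      while (remaining > 0) {
        int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
        if (n < 0) {
          break; // file shorter than the acked length; stop defensively
        }
        sink.write(buf, 0, n);
        remaining -= n;
      }
    }
  }
}
{code}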

And in section "Some issues still need discussion" I quote [~carp84]'s comment:
{quote}
We also need to consider the cross-RS case when region assign happens before 
wal sync acked.
{quote}
Yu, is it still a problem now that we are not using hsync?


> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-10 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050308#comment-15050308
 ] 

Phil Yang commented on HBASE-14004:
---

[~chenheng] it seems that HBASE-14949 only handles replaying logs before the region 
opens and it doesn't work with Distributed Log Replay? If I am right, would you 
please put our logic into WALSplitter so we can handle both cases?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-10 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050322#comment-15050322
 ] 

Yu Li commented on HBASE-14004:
---

bq. Yu, is it still a problem now that we are not using hsync?
My concern was mainly about whether it's possible to have duplicated entries in 
the WALs of *different* RSs.

Think about the failover case: the replication queue will be transferred to some 
other RS and the entries of the failed RS's WAL will be replicated; meanwhile the 
same WAL will be split and replayed, and its entries written into the WAL of the new RS 
serving the same region. In this situation we add an {{isReplay}} flag in the 
WALEdit to avoid duplicated replication (see the code segment below from 
{{FSHLog#append}}):
{code}
if (entry.getEdit().isReplay()) {
  // Set replication scope null so that this won't be replicated
  entry.getKey().setScopes(null);
}
{code}

I could see a similar situation here:
# hflush times out due to a network failure but the data is actually persisted
# the WAL logger tries to re-write the buffered entries to a new WAL, but new WAL 
creation fails due to the same network failure, and we return failure to the client
# the region gets reassigned due to the balancer or an hbase shell command
# the client retries writing to the same region, now served by a new RS, and succeeds

In this case we have duplicated entries in the WALs of different RSs.

Feel free to correct me if the assumption is wrong, but if it's possible, then 
we need to handle it in HBASE-14949 [~chenheng]

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-10 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050333#comment-15050333
 ] 

Heng Chen commented on HBASE-14004:
---

That is my plan.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-10 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050332#comment-15050332
 ] 

Heng Chen commented on HBASE-14004:
---

That is my plan.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-10 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050376#comment-15050376
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
2. the WAL logger tries to re-write the buffered entries to a new WAL, but new WAL 
creation fails due to the same network failure, and we return failure to the client
3. the region gets reassigned due to the balancer or an hbase shell command
{quote}

When we create the new WAL and write the buffered entries into it, if that 
procedure fails, we will tell the client that the 'unacked hflushed' mutations 
failed. When the region is reassigned (NOT an RS crash) at this time, the memstore will 
be flushed into an HFile, right? In the current logic, when we split the WAL we check 
lastFlushedSeqId, so duplicate entries will be skipped. See the HBASE-14949 
comments. [~carp84]
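
As a minimal illustration of the lastFlushedSeqId check described above (a sketch, not the actual WALSplitter code): any edit with a sequence id at or below the region's last flushed sequence id is already persisted in an HFile, so it is skipped during split/replay.

{code}
// Sketch of the lastFlushedSeqId filter used during WAL split/replay.
public final class FlushedSeqIdFilter {
  private FlushedSeqIdFilter() {}

  /** An edit is replayed only if it is newer than the region's last flushed sequence id. */
  public static boolean shouldReplay(long editSeqId, long regionLastFlushedSeqId) {
    return editSeqId > regionLastFlushedSeqId;
  }
}
{code}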

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-10 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050409#comment-15050409
 ] 

Yu Li commented on HBASE-14004:
---

Let me confirm: do you mean we never roll back the memstore even if the 
buffered-entries rewrite fails, so the edit will be included in the flushed HFile?

I think the current logic only makes sure that already-flushed entries won't be 
replayed; it does not protect against duplicated entries. If the memstore is rolled back 
and the duplicated entry is the last write, I don't think lastFlushedSeqId could skip it.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-10 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050418#comment-15050418
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
Let me confirm: do you mean we never roll back the memstore even if the 
buffered-entries rewrite fails, so the edit will be included in the flushed HFile?
{quote}
Maybe I am wrong. Yeah, we do NOT roll back the memstore, but the mvcc will not 
move forward either. Let's check that too.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-10 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050438#comment-15050438
 ] 

Heng Chen commented on HBASE-14004:
---

IMO we should handle this situation: if we fail during the recovery procedure, 
maybe the network or HDFS has some problems. Should we just shut down the RS directly?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-12-10 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050467#comment-15050467
 ] 

Phil Yang commented on HBASE-14004:
---

We cannot roll back even if the buffered-entries rewrite fails, or we may still 
have inconsistency between the memstore and the WAL. So we can only retry and retry... 
until there are too many retries and we must crash ourselves?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161489#comment-16161489
 ] 

Hadoop QA commented on HBASE-14004:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
 4s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
56s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
38m 32s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
5s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
27s{color} | {color:red} hbase-server generated 16 new + 0 unchanged - 0 fixed 
= 16 total (was 0) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
16s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 96m 
19s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}159m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:5d60123 |
| JIRA Issue | HBASE-14004 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12886413/HBASE-14004.patch |
| Optional Tests |  asflicense  shadedjars  javac  javadoc  unit  findbugs  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 2683ddeb981c 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 

[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-11 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162335#comment-16162335
 ] 

Duo Zhang commented on HBASE-14004:
---

Good, no test failures. Let me fix the javadoc issues and upload the patch to 
RB.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 3.0.0, 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-11 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162439#comment-16162439
 ] 

Duo Zhang commented on HBASE-14004:
---

[~stack] [~anoopsamjohn] PTAL. Thanks.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 3.0.0, 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162524#comment-16162524
 ] 

Hadoop QA commented on HBASE-14004:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
36s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
59s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
38s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
45m 17s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
34s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 93m 
41s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}169m 27s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:5d60123 |
| JIRA Issue | HBASE-14004 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12886568/HBASE-14004-v1.patch |
| Optional Tests |  asflicense  shadedjars  javac  javadoc  unit  findbugs  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux bb785c0a15ab 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / d6db4a2 |
| Default Java | 1.8.0_144 |
| fi

[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-13 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164280#comment-16164280
 ] 

Duo Zhang commented on HBASE-14004:
---

Ping [~stack] [~anoopsamjohn] [~ram_krish], this feature is necessary if we 
want to set AsyncFSWAL as our default WAL.

Thanks.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 3.0.0, 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-13 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164323#comment-16164323
 ] 

ramkrishna.s.vasudevan commented on HBASE-14004:


Will take a look at this @duo zhang. 

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 3.0.0, 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-13 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165819#comment-16165819
 ] 

Duo Zhang commented on HBASE-14004:
---

Thanks [~ram_krish] for reviewing. Will commit later.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 3.0.0, 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165857#comment-16165857
 ] 

stack commented on HBASE-14004:
---

[~Apache9] Sorry. I reviewed it a few days ago but forgot to publish the 
review. I just did.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 3.0.0, 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL which cause the slave cluster has more data than the 
> master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data is already  
> (may partially) transported to the DNs which finally get persisted. As a 
> result, the handler will rollback the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167437#comment-16167437
 ] 

Hadoop QA commented on HBASE-14004:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  2m 47s{color} 
| {color:red} HBASE-14004 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.4.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-14004 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12887272/HBASE-14004-v2.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/8641/console |
| Powered by | Apache Yetus 0.4.0   http://yetus.apache.org |


This message was automatically generated.



> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 3.0.0, 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch, 
> HBASE-14004-v2.patch, HBASE-14004-v2.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/HFile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails but the data has already been 
> (perhaps partially) transported to the DataNodes, where it eventually gets 
> persisted. As a result, the handler will roll back the Memstore and the 
> later-flushed HFile will also skip this record.
> ==
> This is a long-lived issue. The above problem is solved by the write path 
> reorder, as we now sync the WAL before modifying the memstore. But the problem 
> may still exist, as the replication thread may read the new data before we 
> return from hflush. See this document for more details:
> https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#
> So we need to keep a sync length in the WAL and tell the replication WAL 
> reader that this is the limit when reading this WAL file.
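
To make the "sync length" idea concrete: the replication WAL reader should 
never read past the point the WAL reports as durably synced, otherwise it may 
ship edits whose sync later fails. A minimal sketch of that bound follows; the 
names are hypothetical, and this only approximates what the actual patch does 
with its WALFileLengthProvider.

{code:java}
import java.io.IOException;
import java.util.OptionalLong;

// Sketch: the replication reader consumes WAL entries only up to the length
// that has been reported as successfully synced. Names are illustrative.
public final class BoundedWalReaderSketch {

  /** Supplies the currently known synced length of the WAL file, if known. */
  interface SyncedLengthProvider {
    OptionalLong syncedLength();
  }

  interface EntryReader {
    long position();                          // current byte offset in the WAL file
    byte[] readNextEntry() throws IOException;
  }

  private final EntryReader reader;
  private final SyncedLengthProvider lengthProvider;

  BoundedWalReaderSketch(EntryReader reader, SyncedLengthProvider lengthProvider) {
    this.reader = reader;
    this.lengthProvider = lengthProvider;
  }

  /**
   * Returns the next entry, or null if reading further would cross the last
   * synced length (the entry might not be durable yet, so it must not be shipped).
   */
  byte[] next() throws IOException {
    OptionalLong limit = lengthProvider.syncedLength();
    if (limit.isPresent() && reader.position() >= limit.getAsLong()) {
      return null; // wait for a later sync to advance the limit
    }
    return reader.readNextEntry();
  }
}
{code}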



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167720#comment-16167720
 ] 

Hadoop QA commented on HBASE-14004:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
35s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
58s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
13s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
38m 28s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
19s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}113m 
52s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}180m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:5d60123 |
| JIRA Issue | HBASE-14004 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12887282/HBASE-14004-v3.patch |
| Optional Tests |  asflicense  shadedjars  javac  javadoc  unit  findbugs  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux dfd14dafca5b 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 5c07dba |
| Default Java | 1.8.0_144 

[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-15 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167727#comment-16167727
 ] 

Duo Zhang commented on HBASE-14004:
---

OK, all green. Let me commit.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 3.0.0, 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch, 
> HBASE-14004-v2.patch, HBASE-14004-v2.patch, HBASE-14004-v3.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/HFile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails but the data has already been 
> (perhaps partially) transported to the DataNodes, where it eventually gets 
> persisted. As a result, the handler will roll back the Memstore and the 
> later-flushed HFile will also skip this record.
> ==
> This is a long-lived issue. The above problem is solved by the write path 
> reorder, as we now sync the WAL before modifying the memstore. But the problem 
> may still exist, as the replication thread may read the new data before we 
> return from hflush. See this document for more details:
> https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#
> So we need to keep a sync length in the WAL and tell the replication WAL 
> reader that this is the limit when reading this WAL file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167998#comment-16167998
 ] 

Hudson commented on HBASE-14004:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #3719 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/3719/])
HBASE-14004 [Replication] Inconsistency between Memstore and WAL may (zhangduo: 
rev 4341c3f554cf85e73d3bb536bdda33a83f463f16)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AsyncFSWAL.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/wal/DisabledWALProvider.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestWALEntryStream.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReplicationService.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/ReplicationSourceDummy.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/wal/AbstractFSWALProvider.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/util/Threads.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractFSWAL.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALFactory.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALFileLengthProvider.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALProvider.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/wal/RegionGroupingProvider.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceInterface.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WAL.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/RecoveredReplicationSource.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/wal/IOTestProvider.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 3.0.0, 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch, 
> HBASE-14004-v2.patch, HBASE-14004-v2.patch, HBASE-14004-v3.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/HFile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails but the data has already been 
> (perhaps partially) transported to the DataNodes, where it eventually gets 
> persisted. As a result, the handler will roll back the Memstore and the 
> later-flushed HFile will also skip this record.
> ==
> This is a long-lived issue. The above problem is solved by the write path 
> reorder, as we now sync the WAL before modifying the memstore. But the problem 
> may still exist, as the replication thread may read the new data before we 
> return from hflush. See this document for more details:
> https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#
> So we need to keep a sync length in the WAL and tell the replication WAL 
> reader that this is the limit when reading this WAL file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16168034#comment-16168034
 ] 

Hudson commented on HBASE-14004:


FAILURE: Integrated in Jenkins build HBase-2.0 #518 (See 
[https://builds.apache.org/job/HBase-2.0/518/])
HBASE-14004 [Replication] Inconsistency between Memstore and WAL may (zhangduo: 
rev d90f77ab7dc46a19893786b6a740bc73470fd779)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALProvider.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestWALEntryStream.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/wal/IOTestProvider.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WAL.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceInterface.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/wal/DisabledWALProvider.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/wal/AbstractFSWALProvider.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALFactory.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALFileLengthProvider.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/ReplicationSourceDummy.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractFSWAL.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/util/Threads.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReplicationService.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AsyncFSWAL.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/wal/RegionGroupingProvider.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/RecoveredReplicationSource.java


> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 3.0.0, 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch, 
> HBASE-14004-v2.patch, HBASE-14004-v2.patch, HBASE-14004-v3.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/HFile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails but the data has already been 
> (perhaps partially) transported to the DataNodes, where it eventually gets 
> persisted. As a result, the handler will roll back the Memstore and the 
> later-flushed HFile will also skip this record.
> ==
> This is a long-lived issue. The above problem is solved by the write path 
> reorder, as we now sync the WAL before modifying the memstore. But the problem 
> may still exist, as the replication thread may read the new data before we 
> return from hflush. See this document for more details:
> https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#
> So we need to keep a sync length in the WAL and tell the replication WAL 
> reader that this is the limit when reading this WAL file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-15 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16168299#comment-16168299
 ] 

stack commented on HBASE-14004:
---

This is a great fix.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 3.0.0, 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch, 
> HBASE-14004-v2.patch, HBASE-14004-v2.patch, HBASE-14004-v3.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/HFile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails but the data has already been 
> (perhaps partially) transported to the DataNodes, where it eventually gets 
> persisted. As a result, the handler will roll back the Memstore and the 
> later-flushed HFile will also skip this record.
> ==
> This is a long-lived issue. The above problem is solved by the write path 
> reorder, as we now sync the WAL before modifying the memstore. But the problem 
> may still exist, as the replication thread may read the new data before we 
> return from hflush. See this document for more details:
> https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#
> So we need to keep a sync length in the WAL and tell the replication WAL 
> reader that this is the limit when reading this WAL file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-17 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169589#comment-16169589
 ] 

Sean Busbey commented on HBASE-14004:
-

Is this needed in branch-1? The doc mentions that the non-async WAL still has 
the potential for the same failure.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 3.0.0, 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch, 
> HBASE-14004-v2.patch, HBASE-14004-v2.patch, HBASE-14004-v3.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/HFile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails but the data has already been 
> (perhaps partially) transported to the DataNodes, where it eventually gets 
> persisted. As a result, the handler will roll back the Memstore and the 
> later-flushed HFile will also skip this record.
> ==
> This is a long-lived issue. The above problem is solved by the write path 
> reorder, as we now sync the WAL before modifying the memstore. But the problem 
> may still exist, as the replication thread may read the new data before we 
> return from hflush. See this document for more details:
> https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#
> So we need to keep a sync length in the WAL and tell the replication WAL 
> reader that this is the limit when reading this WAL file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-17 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169643#comment-16169643
 ] 

Duo Zhang commented on HBASE-14004:
---

The problem for FSHLog is complicated. One thing is that syncing the WAL is also 
an IPC, so a timeout does not mean failure; the outcome is simply unknown. But 
when the sync fails, we just fail the WAL request and the upper layer treats it 
as a failure, which means the memstore will not be modified. I think this is 
much easier to hit than the scenario described in the above doc, so this patch 
is nice to have for branch-1, but there is still a lot of work to do for 
FSHLog...

Thanks.
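
A hedged illustration of the distinction above: a timed-out sync has an unknown 
outcome (the bytes may or may not have reached the DataNodes), while the upper 
layer usually collapses anything other than success into "failure". The names 
below are illustrative only, not FSHLog's actual code.

{code:java}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch only: why a timed-out WAL sync is "unknown" rather than "failed".
public final class SyncOutcomeSketch {

  enum Outcome { SUCCESS, FAILED, UNKNOWN }

  /**
   * Waits for an in-flight sync. A timeout does not prove the data is absent on
   * the DataNodes; it only means we did not hear back in time.
   */
  static Outcome awaitSync(Future<Void> pendingSync, long timeoutMs) {
    try {
      pendingSync.get(timeoutMs, TimeUnit.MILLISECONDS);
      return Outcome.SUCCESS;
    } catch (TimeoutException e) {
      return Outcome.UNKNOWN;   // the data may still end up persisted remotely
    } catch (ExecutionException e) {
      return Outcome.FAILED;    // the sync itself reported an error
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return Outcome.UNKNOWN;
    }
  }

  /** The upper layer typically treats anything other than SUCCESS as a failed write. */
  static boolean treatAsWriteFailure(Outcome outcome) {
    return outcome != Outcome.SUCCESS;
  }
}
{code}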

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 3.0.0, 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch, 
> HBASE-14004-v2.patch, HBASE-14004-v2.patch, HBASE-14004-v3.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/HFile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails but the data has already been 
> (perhaps partially) transported to the DataNodes, where it eventually gets 
> persisted. As a result, the handler will roll back the Memstore and the 
> later-flushed HFile will also skip this record.
> ==
> This is a long-lived issue. The above problem is solved by the write path 
> reorder, as we now sync the WAL before modifying the memstore. But the problem 
> may still exist, as the replication thread may read the new data before we 
> return from hflush. See this document for more details:
> https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#
> So we need to keep a sync length in the WAL and tell the replication WAL 
> reader that this is the limit when reading this WAL file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-18 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170374#comment-16170374
 ] 

Ted Yu commented on HBASE-14004:


TestReplicationSmallTests fails consistently on master.

If I switch to commit f7a986cb67b55e36b58bf4b4934a2f32f29f538a, the test passes.

Duo:
Can you check?

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 3.0.0, 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch, 
> HBASE-14004-v2.patch, HBASE-14004-v2.patch, HBASE-14004-v3.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/HFile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails but the data has already been 
> (perhaps partially) transported to the DataNodes, where it eventually gets 
> persisted. As a result, the handler will roll back the Memstore and the 
> later-flushed HFile will also skip this record.
> ==
> This is a long-lived issue. The above problem is solved by the write path 
> reorder, as we now sync the WAL before modifying the memstore. But the problem 
> may still exist, as the replication thread may read the new data before we 
> return from hflush. See this document for more details:
> https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#
> So we need to keep a sync length in the WAL and tell the replication WAL 
> reader that this is the limit when reading this WAL file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-18 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170982#comment-16170982
 ] 

Duo Zhang commented on HBASE-14004:
---

OK, will check later.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 3.0.0, 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch, 
> HBASE-14004-v2.patch, HBASE-14004-v2.patch, HBASE-14004-v3.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/HFile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails but the data has already been 
> (perhaps partially) transported to the DataNodes, where it eventually gets 
> persisted. As a result, the handler will roll back the Memstore and the 
> later-flushed HFile will also skip this record.
> ==
> This is a long-lived issue. The above problem is solved by the write path 
> reorder, as we now sync the WAL before modifying the memstore. But the problem 
> may still exist, as the replication thread may read the new data before we 
> return from hflush. See this document for more details:
> https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#
> So we need to keep a sync length in the WAL and tell the replication WAL 
> reader that this is the limit when reading this WAL file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-28 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184854#comment-16184854
 ] 

stack commented on HBASE-14004:
---

[~Apache9] Any chance of a release note here please boss? This is a great fix. 
Would be a shame for it to go under the radar. The release note will help folks 
realize what has happened in here.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch, 
> HBASE-14004-v2.patch, HBASE-14004-v2.patch, HBASE-14004-v3.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/HFile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails but the data has already been 
> (perhaps partially) transported to the DataNodes, where it eventually gets 
> persisted. As a result, the handler will roll back the Memstore and the 
> later-flushed HFile will also skip this record.
> ==
> This is a long-lived issue. The above problem is solved by the write path 
> reorder, as we now sync the WAL before modifying the memstore. But the problem 
> may still exist, as the replication thread may read the new data before we 
> return from hflush. See this document for more details:
> https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#
> So we need to keep a sync length in the WAL and tell the replication WAL 
> reader that this is the limit when reading this WAL file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-28 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185257#comment-16185257
 ] 

Duo Zhang commented on HBASE-14004:
---

Will do. Thanks for the reminder, sir.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch, 
> HBASE-14004-v2.patch, HBASE-14004-v2.patch, HBASE-14004-v3.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/HFile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails but the data has already been 
> (perhaps partially) transported to the DataNodes, where it eventually gets 
> persisted. As a result, the handler will roll back the Memstore and the 
> later-flushed HFile will also skip this record.
> ==
> This is a long-lived issue. The above problem is solved by the write path 
> reorder, as we now sync the WAL before modifying the memstore. But the problem 
> may still exist, as the replication thread may read the new data before we 
> return from hflush. See this document for more details:
> https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#
> So we need to keep a sync length in the WAL and tell the replication WAL 
> reader that this is the limit when reading this WAL file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185726#comment-16185726
 ] 

Hudson commented on HBASE-14004:


FAILURE: Integrated in Jenkins build HBase-2.0 #597 (See 
[https://builds.apache.org/job/HBase-2.0/597/])
HBASE-18845 TestReplicationSmallTests fails after HBASE-14004 (zhangduo: rev 
2e4c1b62884026ba8fc2d743d33a7f9d9125393e)
* (edit) 
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSmallTests.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationBase.java


> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch, 
> HBASE-14004-v2.patch, HBASE-14004-v2.patch, HBASE-14004-v3.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/HFile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails but the data has already been 
> (perhaps partially) transported to the DataNodes, where it eventually gets 
> persisted. As a result, the handler will roll back the Memstore and the 
> later-flushed HFile will also skip this record.
> ==
> This is a long-lived issue. The above problem is solved by the write path 
> reorder, as we now sync the WAL before modifying the memstore. But the problem 
> may still exist, as the replication thread may read the new data before we 
> return from hflush. See this document for more details:
> https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#
> So we need to keep a sync length in the WAL and tell the replication WAL 
> reader that this is the limit when reading this WAL file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-09-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185802#comment-16185802
 ] 

Hudson commented on HBASE-14004:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #3798 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/3798/])
HBASE-18845 TestReplicationSmallTests fails after HBASE-14004 (zhangduo: rev 
239e6872674ff122ecec2d8d6a557b269e6ae54b)
* (edit) 
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSmallTests.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationBase.java


> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch, 
> HBASE-14004-v2.patch, HBASE-14004-v2.patch, HBASE-14004-v3.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/HFile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails but the data has already been 
> (perhaps partially) transported to the DataNodes, where it eventually gets 
> persisted. As a result, the handler will roll back the Memstore and the 
> later-flushed HFile will also skip this record.
> ==
> This is a long-lived issue. The above problem is solved by the write path 
> reorder, as we now sync the WAL before modifying the memstore. But the problem 
> may still exist, as the replication thread may read the new data before we 
> return from hflush. See this document for more details:
> https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#
> So we need to keep a sync length in the WAL and tell the replication WAL 
> reader that this is the limit when reading this WAL file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2017-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195103#comment-16195103
 ] 

Hudson commented on HBASE-14004:


Results for branch HBASE-18467, done in 4 hr 24 min and counting
[build #136 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18467/136/]: 
FAILURE

details (if available):

(x) *{color:red}-1 overall{color}*
Committer, please check your recent inclusion of a patch for this issue.

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18467/136//General_Nightly_Build_Report/]


(/) {color:green}+1 jdk8 checks{color}
-- For more information [see jdk8 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18467/136//JDK8_Nightly_Build_Report/]


(x) {color:red}-1 source release artifact{color}
-- See build output for details.



> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: He Liangliang
>Assignee: Duo Zhang
>Priority: Critical
>  Labels: replication, wal
> Fix For: 2.0.0-alpha-4
>
> Attachments: HBASE-14004.patch, HBASE-14004-v1.patch, 
> HBASE-14004-v2.patch, HBASE-14004-v2.patch, HBASE-14004-v3.patch
>
>
> Looks like the current write path can cause inconsistency between 
> memstore/HFile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. roll back Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails but the data has already been 
> (perhaps partially) transported to the DataNodes, where it eventually gets 
> persisted. As a result, the handler will roll back the Memstore and the 
> later-flushed HFile will also skip this record.
> ==
> This is a long-lived issue. The above problem is solved by the write path 
> reorder, as we now sync the WAL before modifying the memstore. But the problem 
> may still exist, as the replication thread may read the new data before we 
> return from hflush. See this document for more details:
> https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#
> So we need to keep a sync length in the WAL and tell the replication WAL 
> reader that this is the limit when reading this WAL file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

