[ 
https://issues.apache.org/jira/browse/HBASE-8573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662354#comment-13662354
 ] 

Jeffrey Zhong commented on HBASE-8573:
--------------------------------------

First of all, the exact scenario described in HBASE-6059 won't happen in 
distributedLogReplay, but the "make deleted data exist again" problem could 
still happen through another race condition (although very rarely). 

Below are the steps by which this could happen:
The situation is that we have two stores, hstore1 (sequence id = id1) and 
hstore2 (sequence id = id1). Assume the flush of hstore1 succeeded while the 
flush of hstore2 failed for some reason, so we now have hstore1 (id1 bumped up 
to id3) and hstore2 (still id1). Assume the flush of hstore1 contains a new put 
on row1. Currently distributed log replay uses min(flushed sequence ids of all 
stores), so we could replay the put, since we only skip edits up to id1.

However, if there was a delete on row1 before sequence id id1 (i.e. a delete 
preceding the put on row1), then replaying the put could bring the deleted 
data back, because a major compaction has erased the delete.
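
To make the difference concrete, here is a minimal sketch in Python. It is not 
HBase code: WalEdit, the two helper functions, and the numeric values for 
id1/id2/id3 are made up for illustration, and the put on row2 is an extra edit 
added only so that hstore2 has something to replay. It compares which WAL edits 
get replayed when skipping by the region-wide minimum versus skipping per store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalEdit:
    """Hypothetical stand-in for one WAL entry; not an HBase class."""
    seq_id: int
    store: str
    description: str


def edits_to_replay_region_min(edits, flushed_by_store):
    """Skip only edits at or below min(flushed sequence ids of all stores)."""
    region_flushed = min(flushed_by_store.values())
    return [e for e in edits if e.seq_id > region_flushed]


def edits_to_replay_per_store(edits, flushed_by_store):
    """Skip edits at or below the flushed sequence id of their own store."""
    return [e for e in edits if e.seq_id > flushed_by_store[e.store]]


# Illustrative values only: id1 < id2 < id3.
ID1, ID2, ID3 = 1, 2, 3

# hstore1 flushed successfully (bumped to id3); the flush of hstore2 failed.
flushed_by_store = {"hstore1": ID3, "hstore2": ID1}

wal = [
    WalEdit(ID2, "hstore1", "put row1"),  # already persisted by the hstore1 flush
    WalEdit(ID3, "hstore2", "put row2"),  # never flushed, must be replayed
]

region_min_replay = edits_to_replay_region_min(wal, flushed_by_store)
per_store_replay = edits_to_replay_per_store(wal, flushed_by_store)

print("region min:", [e.description for e in region_min_replay])
print("per store: ", [e.description for e in per_store_replay])
```

With the region-wide minimum, the put on row1 is replayed even though hstore1 
already flushed it; with a per-store flushed sequence id, only the hstore2 edit 
is replayed.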

HBASE-6059 triggered a long discussion on this, and it convinced me that 
distributedLogReplay also needs the flushed sequence id per store for replay.

[~zjushch]
{quote}
Maybe the only potential problem is the duplication of some data 
{quote}
I think that's all right in this situation, since it doesn't compromise data 
correctness.


 

                
> Store last flushed sequence id for each store of region for Distributed Log 
> Replay
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-8573
>                 URL: https://issues.apache.org/jira/browse/HBASE-8573
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ted Yu
>
> HBASE-7006 stores last flushed sequence id of the region in zookeeper.
> To prevent deleted data from appearing again, we should store last flushed 
> sequence id for each store of region in zookeeper.
> See discussion here:
> https://issues.apache.org/jira/browse/HBASE-7006?focusedCommentId=13660428&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13660428

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
