[ 
https://issues.apache.org/jira/browse/HBASE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617925#comment-13617925
 ] 

Jeffrey Zhong commented on HBASE-8208:
--------------------------------------

[~lhofhansl] The newly added sync effectively shortens the "deferredSync"
cycle (time interval). Say we deferred sync every 1 sec before the fix. With
the fix, we sync every 1 sec and also on each flush. So we have more syncs,
and each sync has less data to write than before. If the "sync storm" you're
referring to is about the amount of data, the fix will make things better.

If you're referring to the frequency of syncs, the fix won't add many extra
syncs, because the newly added sync is per flush and only syncs when there is
something to flush. The non-deferred sync case has many more syncs. IMHO, it
should not have much impact. For this change, I wish we had a good stress-test
environment so we could quantify the impact. To be safe, we can put it in
trunk first and backport it to 0.94 once we're confident.
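
To make the idea concrete, here is a minimal sketch of the sync-before-flush
ordering. It is a toy model, not the patch: ToyWal, ToyRegion and all their
methods are hypothetical names made up for illustration and are not HBase
classes or APIs. It only shows that, with replication enabled, everything that
gets flushed has already been synced to the log, and that a sync with nothing
pending returns immediately.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: every name here is hypothetical, none is an HBase API. */
public class DeferredSyncSketch {

    /** Minimal stand-in for a write-ahead log running in deferred-sync mode. */
    static class ToyWal {
        private final List<String> unsynced = new ArrayList<>();
        private final List<String> synced = new ArrayList<>();
        private int effectiveSyncs = 0; // syncs that actually wrote something

        /** Appends are buffered; nothing is durable until sync() runs. */
        void append(String edit) {
            unsynced.add(edit);
        }

        /** Returns immediately when there is nothing to sync. */
        void sync() {
            if (unsynced.isEmpty()) {
                return;
            }
            synced.addAll(unsynced);
            unsynced.clear();
            effectiveSyncs++;
        }

        List<String> syncedEdits() { return synced; }
        int pendingCount() { return unsynced.size(); }
        int effectiveSyncs() { return effectiveSyncs; }
    }

    /** Minimal stand-in for a region with an in-memory store. */
    static class ToyRegion {
        private final ToyWal wal;
        private final boolean replicationEnabled;
        private final List<String> memstore = new ArrayList<>();
        private final List<String> flushed = new ArrayList<>();

        ToyRegion(ToyWal wal, boolean replicationEnabled) {
            this.wal = wal;
            this.replicationEnabled = replicationEnabled;
        }

        /** A write goes to the log buffer and to the in-memory store. */
        void put(String edit) {
            wal.append(edit);
            memstore.add(edit);
        }

        /** Sync the log first, so every flushed edit is also in the log. */
        void flush() {
            if (replicationEnabled) {
                wal.sync();
            }
            flushed.addAll(memstore);
            memstore.clear();
        }

        List<String> flushedEdits() { return flushed; }
    }

    public static void main(String[] args) {
        ToyWal wal = new ToyWal();
        ToyRegion region = new ToyRegion(wal, true);
        region.put("row1");
        region.put("row2");
        region.flush();
        // With replication enabled, every flushed edit is already synced.
        System.out.println(wal.syncedEdits().containsAll(region.flushedEdits()));
    }
}
```

Constructing ToyRegion with replicationEnabled set to false models the
behavior before the fix: a flush can complete while edits are still pending in
the log buffer, which is the window the issue describes.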

Thanks,
-Jeffrey
                
> Data could not be replicated to slaves when deferredLogSync is enabled
> ----------------------------------------------------------------------
>
>                 Key: HBASE-8208
>                 URL: https://issues.apache.org/jira/browse/HBASE-8208
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.95.0, 0.98.0, 0.94.6
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>             Fix For: 0.95.0, 0.98.0, 0.94.7
>
>         Attachments: hbase-8208.patch, hbase-8208-v1.patch, 
> hbase-8208_v2.patch
>
>
> This is a subtle issue. When deferredLogSync is enabled, there is a chance we
> could flush data before syncing all HLog entries. Assume we have just flushed
> the internal cache and the server dies with some unsynced HLog entries.
> Data is not lost at the source cluster, but because replication is based on
> WAL files, some changes we flushed at the source won't be replicated to the
> slave clusters.
> Although enabling deferredLogSync implies a tolerance for data loss, this
> breaks the replication assumption that whatever is persisted in the source
> should be replicated to its slave clusters.
> In short, the slave cluster could end up with double losses: the data lost in
> the source, plus some data stored in the source cluster that may not be
> replicated to the slaves either.
> The fix for the issue isn't hard. Basically we can invoke sync during each
> flush when replication is enabled for a region server. Since sync returns
> immediately when there is nothing to sync, there should be no performance
> impact.
> Please let me know what you think!
> Thanks,
> -Jeffrey

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
