[ 
https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652074#comment-13652074
 ] 

Jeffrey Zhong commented on HBASE-7006:
--------------------------------------

{quote}
How can you be sure all edits in WALs from crashed server were replicated 
already?
{quote}
This is guaranteed by the replication fail over logic. Replication waits for 
log splitting finish and then resume replication on those wal files from failed 
RS. The above change just make sure we don't replicate WAL edits created by 
replay command again because those edits will be replicated from the original 
wal file.
                
> [MTTR] Study distributed log splitting to see how we can make it faster
> -----------------------------------------------------------------------
>
>                 Key: HBASE-7006
>                 URL: https://issues.apache.org/jira/browse/HBASE-7006
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>            Reporter: stack
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>             Fix For: 0.95.1
>
>         Attachments: hbase-7006-combined.patch, hbase-7006-combined-v1.patch, 
> hbase-7006-combined-v3.patch, hbase-7006-combined-v4.patch, 
> hbase-7006-combined-v5.patch, LogSplitting Comparison.pdf, 
> ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
>
>
> Just saw interesting issue where a cluster went down  hard and 30 nodes had 
> 1700 WALs to replay.  Replay took almost an hour.  It looks like it could run 
> faster that much of the time is spent zk'ing and nn'ing.
> Putting in 0.96 so it gets a look at least.  Can always punt.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to