[ https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647328#comment-13647328 ]
stack commented on HBASE-7006:
------------------------------

Thinking on it, flushing after all logs are recovered is a bad idea because it is a special case. Replayed mutations, as is, are treated like any other inbound edit. I think this is good. Turning off WALs and flushing at the end, then trying to figure out what we failed to write -- or writing hfiles directly, if you could, and I don't think you can, since edits need to be sorted in an hfile -- by-passing memstore and then telling the Region to pick up the new hfile when done, all introduce new states that we would have to manage, complicating critical recovery.

> [MTTR] Study distributed log splitting to see how we can make it faster
> -----------------------------------------------------------------------
>
>                 Key: HBASE-7006
>                 URL: https://issues.apache.org/jira/browse/HBASE-7006
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>            Reporter: stack
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>             Fix For: 0.95.1
>
>         Attachments: hbase-7006-combined.patch, hbase-7006-combined-v1.patch, hbase-7006-combined-v3.patch, hbase-7006-combined-v4.patch, LogSplitting Comparison.pdf, ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
>
>
> Just saw an interesting issue where a cluster went down hard and 30 nodes had 1700 WALs to replay. Replay took almost an hour. It looks like it could run faster; much of the time is spent zk'ing and nn'ing.
> Putting in 0.96 so it gets a look at least. Can always punt.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
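To illustrate the sorting point in the comment above: WAL entries arrive in write order, but an hfile's cells must be sorted by key, so edits cannot be streamed straight into an hfile. Replaying them through the memstore (a sorted structure) restores the ordering like any other inbound edit. A minimal, hypothetical Java sketch (not HBase code; the class and `replay` method are invented for illustration, with a TreeMap standing in for the memstore):

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class ReplaySketch {
    // WAL entries come back in the order they were written, not row order.
    // Inserting each one through the memstore path (modeled by a TreeMap)
    // yields the sorted view an hfile requires, with no special-case state.
    static SortedMap<String, String> replay(List<String[]> walEntries) {
        SortedMap<String, String> memstore = new TreeMap<>();
        for (String[] e : walEntries) {
            memstore.put(e[0], e[1]); // same path as a normal inbound Put
        }
        return memstore;
    }

    public static void main(String[] args) {
        // Edits in WAL (arrival) order: out of row order.
        List<String[]> wal = Arrays.asList(
                new String[]{"row9", "v1"},
                new String[]{"row1", "v2"},
                new String[]{"row5", "v3"});
        System.out.println(replay(wal).keySet()); // [row1, row5, row9]
    }
}
```

Writing an hfile directly from the replay stream would instead require buffering and sorting all edits first, which is exactly the extra recovery state the comment argues against.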