[ https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646864#comment-13646864 ]
stack commented on HBASE-7006:
------------------------------

Some comments on the design doc:

+ Nit: Add author, date, and the issue number so I can get back to the hosting issue should I trip over the doc w/o any other context.
+ Is your assumption about out-of-order replay of edits new to this feature? I suppose in the old/current way of log splitting, we do stuff in sequenceid order because we wrote the recovered.edits files named by sequenceid... so they were ordered when the regionserver read them in? We should highlight your assumption more. I think if we move to multiple WALs we'll want to take on this assumption during recovery too.
+ Given the assumption, we should list the problematic scenarios (or point to where we list them already -- I think the 'Current Limitations' section here http://hbase.apache.org/book.html#version.delete should have the list we currently know).
+ "...check if all WALs of a failed region server have been successfully replayed." How is this done?
+ How will a crashed regionserver "...... and appending itself into the list of...": i.e. append itself to the list of crashed servers (am I reading this wrong)?

bq. For each region per failed region server, we stores the last flushed sequence Id from the region server before it failed.

This is the mechanism that has the regionserver telling the master its current sequenceid every time it flushes to an hfile? So when a server crashes, the master writes a znode under recovering-regions with the last reported seq id? If a new regionserver hosting a recovery of regions then crashes, it gets a new znode w/ its current sequenceid? Now we have two crashed servers with (probably) two different sequenceids whose logs we are recovering. The two sequenceids are never related, right? They are only applied to the logs of the server that passed the particular sequenceid to the master?

Question: So it looks like we replay the WALs of a crashed regionserver by playing them into the new region-hosting servers.
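The last-flushed-sequence-id bookkeeping discussed above can be sketched roughly as follows. This is an illustrative model only, not the actual HBase classes or APIs: the idea is that the master records, per region, the last sequence id the crashed server reported as flushed, and replay can skip any edit at or below that id because it is already durable in an hfile.

```java
import java.util.*;

// Illustrative sketch only (names are invented, not real HBase code):
// the master keeps the last flushed sequence id per region, as reported
// by the region server before it crashed; replay skips edits at or
// below that id since they are already persisted in hfiles.
public class ReplaySketch {
    // last flushed seq id per region, as reported to the master
    static final Map<String, Long> lastFlushedSeqId = new HashMap<>();

    // An edit from the crashed server's WAL: (region, sequence id)
    record WalEdit(String region, long seqId) {}

    // Returns only the edits that still need to be replayed.
    static List<WalEdit> editsToReplay(List<WalEdit> walEdits) {
        List<WalEdit> toReplay = new ArrayList<>();
        for (WalEdit e : walEdits) {
            long flushed = lastFlushedSeqId.getOrDefault(e.region(), -1L);
            if (e.seqId() > flushed) {  // not yet flushed to an hfile
                toReplay.add(e);
            }
        }
        return toReplay;
    }

    public static void main(String[] args) {
        lastFlushedSeqId.put("region-A", 100L);
        List<WalEdit> edits = List.of(
            new WalEdit("region-A", 90),   // already flushed -> skipped
            new WalEdit("region-A", 101),  // needs replay
            new WalEdit("region-B", 5));   // no flush recorded -> replay
        System.out.println(editsToReplay(edits).size()); // prints 2
    }
}
```

Note that under this model the two crashed servers' sequenceids never interact: each filter is applied only against the WALs of the server that reported that id.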
There does not seem to be a flush when the replay of the old crashed server's WALs is done. Is your thinking that it is not needed since the old edits are now in the new server's WAL? Would there be any advantage to NOT writing the WAL on replay and only flushing when done? (I suppose not, thinking about it; in fact, it would probably make replay more complicated since we'd have to have this new operation to do: a flush-when-all-WALs-recovered.)

Good stuff.

> [MTTR] Study distributed log splitting to see how we can make it faster
> -----------------------------------------------------------------------
>
>                 Key: HBASE-7006
>                 URL: https://issues.apache.org/jira/browse/HBASE-7006
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>            Reporter: stack
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>             Fix For: 0.95.1
>
>         Attachments: hbase-7006-combined.patch, hbase-7006-combined-v1.patch, hbase-7006-combined-v2.patch, hbase-7006-combined-v3.patch, LogSplitting Comparison.pdf, ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
>
>
> Just saw an interesting issue where a cluster went down hard and 30 nodes had 1700 WALs to replay. Replay took almost an hour. It looks like it could run faster, given that much of the time is spent zk'ing and nn'ing.
> Putting in 0.96 so it gets a look at least. Can always punt.