[ https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646864#comment-13646864 ]
stack commented on HBASE-7006:
------------------------------

Some comments on the design doc:

+ Nit: Add author, date, and the issue number so I can get back to the hosting issue should I trip over the doc w/o any other context.
+ Is your assumption about out-of-order replay of edits new to this feature? I suppose in the old/current way of log splitting, we do stuff in sequenceid order because we wrote the recovered.edits files named by sequenceid... so they were ordered when the regionserver read them in? We should highlight your assumption more. I think if we move to multiple WALs we'll want to take on this assumption during recovery too.
+ Given the assumption, we should list the problematic scenarios (or point to where we list them already -- I think the 'Current Limitations' section here http://hbase.apache.org/book.html#version.delete should have the list we currently know).
+ "...check if all WALs of a failed region server have been successfully replayed." How is this done?
+ How will a crashed regionserver "...... and appending itself into the list of...": i.e. append itself to the list of crashed servers (am I reading this wrong)?

bq. For each region per failed region server, we stores the last flushed sequence Id from the region server before it failed.

This is the mechanism that has the regionserver telling the master its current sequenceid every time it flushes to an hfile? So when a server crashes, the master writes a znode under recovering-regions with the last reported seq id? If a new regionserver hosting a recovery of regions then crashes, it gets a new znode w/ its current sequenceid? Now we have two crashed servers with (probably) two different sequenceids whose logs we are recovering. The two sequenceids are never related, right? They are only applied to the logs of the server that passed the particular sequenceid to the master?

Question: So it looks like we replay the WALs of a crashed regionserver by playing them into the new region-hosting servers.
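The last-flushed-sequence-id bookkeeping discussed above can be sketched roughly as follows. This is an illustrative model only, not the actual HBase classes or APIs: the idea is that the master records, per region, the last sequence id the crashed server reported as flushed, and replay can skip any edit at or below that id because it is already durable in an hfile.

```java
import java.util.*;

// Illustrative sketch only (names are invented, not real HBase code):
// the master keeps the last flushed sequence id per region, as reported
// by the region server before it crashed; replay skips edits at or
// below that id since they are already persisted in hfiles.
public class ReplaySketch {
    // last flushed seq id per region, as reported to the master
    static final Map<String, Long> lastFlushedSeqId = new HashMap<>();

    // An edit from the crashed server's WAL: (region, sequence id)
    record WalEdit(String region, long seqId) {}

    // Returns only the edits that still need to be replayed.
    static List<WalEdit> editsToReplay(List<WalEdit> walEdits) {
        List<WalEdit> toReplay = new ArrayList<>();
        for (WalEdit e : walEdits) {
            long flushed = lastFlushedSeqId.getOrDefault(e.region(), -1L);
            if (e.seqId() > flushed) {  // not yet flushed to an hfile
                toReplay.add(e);
            }
        }
        return toReplay;
    }

    public static void main(String[] args) {
        lastFlushedSeqId.put("region-A", 100L);
        List<WalEdit> edits = List.of(
            new WalEdit("region-A", 90),   // already flushed -> skipped
            new WalEdit("region-A", 101),  // needs replay
            new WalEdit("region-B", 5));   // no flush recorded -> replay
        System.out.println(editsToReplay(edits).size()); // prints 2
    }
}
```

Note that under this model the two crashed servers' sequenceids never interact: each filter is applied only against the WALs of the server that reported that id.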
There does not seem to be a flush when the replay of the old crashed server's WALs is done. Is your thinking that it is not needed since the old edits are now in the new server's WAL? Would there be any advantage to NOT writing the WAL on replay and only flushing when done? (I suppose not, thinking about it; in fact, it would probably make replay more complicated since we'd have to have this new operation to do: a flush-when-all-WALs-recovered.)

Good stuff.

> [MTTR] Study distributed log splitting to see how we can make it faster
> -----------------------------------------------------------------------
>
>                 Key: HBASE-7006
>                 URL: https://issues.apache.org/jira/browse/HBASE-7006
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>            Reporter: stack
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>             Fix For: 0.95.1
>
>         Attachments: hbase-7006-combined.patch, hbase-7006-combined-v1.patch, hbase-7006-combined-v2.patch, hbase-7006-combined-v3.patch, LogSplitting Comparison.pdf, ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
>
>
> Just saw an interesting issue where a cluster went down hard and 30 nodes had 1700 WALs to replay. Replay took almost an hour. It looks like it could run faster, given that much of the time is spent zk'ing and nn'ing.
> Putting in 0.96 so it gets a look at least. Can always punt.