[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203921#comment-13203921 ]
stack commented on HBASE-5270:
------------------------------
I was taking a look through HBASE-5179 and HBASE-4748 again, the two issues that spawned this one (both are, in synopsis, about master failover with a concurrent ServerShutdownHandler running). I have also been looking at HBASE-5344 "[89-fb] Scan unassigned region directory on master failover".

HBASE-5179 starts out as: we can miss edits if a server is discovered to be dead AFTER master failover has started splitting logs, because we'll notice it is dead and assign out its regions before we've had a chance to split its logs. The way fb deals with this in HBASE-5344 is to not process zookeeper events that come in during master failover; they queue them instead and only start processing after the master is up. Chunhui does something like this in his original patch by adding any server currently being processed by ServerShutdownHandler to the list of regionservers whose logs we should not split. The fb way of temporarily halting callback processing seems more airtight.

HBASE-5179 is then extended to include in scope the processing of servers carrying root and meta (HBASE-4748) that crash during master failover. We need to consider the cases where a server crashes AFTER master failover distributed log splitting has started but before we run the verifications of meta and root locations. Currently we'll expire the server that is unresponsive when we go to verify root and meta locations; the notion is that the meta regions will then be assigned by the ServerShutdownHandler. The fb technique of turning off processing of zk events would mess with our existing handling code here -- and I'm not too confident the code is going to do the right thing, since it has no tests for this predicament and the scenarios look like they could be pretty varied (only root is offline, only the meta server has crashed, a server carrying both root and meta has crashed, etc.).
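The queue-then-drain approach described above could be sketched roughly as follows. This is an illustrative sketch only; the class and method names are hypothetical and do not correspond to the actual 89-fb implementation:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: defer ZooKeeper callbacks that arrive while the
// master is still failing over, then drain them once startup completes.
public class DeferredZKEventHandler {
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private boolean failoverInProgress = true;

    // Called by the ZooKeeper watcher thread for each event.
    public synchronized void onZKEvent(Runnable event) {
        if (failoverInProgress) {
            pending.add(event);   // queue instead of processing mid-failover
        } else {
            event.run();          // normal path: process immediately
        }
    }

    // Called once the master has finished joining the cluster.
    public synchronized void finishFailover() {
        failoverInProgress = false;
        Runnable event;
        while ((event = pending.poll()) != null) {
            event.run();          // replay events queued during failover
        }
    }
}
```

The point of the design is that no server-expiry or region-assignment callback can interleave with the failover's log-splitting and assignment work; everything observed during the window is replayed against the master's settled post-failover state.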
In HBASE-5344, fb will go query each regionserver for the regions it is currently hosting (and look in zk to see which regionservers are up). Maybe we need some of this from 89-fb in trunk, but I'm not clear on it just yet; it would need more study of the current state of trunk and then of what is happening over in 89-fb. One thing I think we should do to lessen the number of code paths we can take on failover is the long-talked-of purge of the root region. This should cut down on the number of states we need to deal with and make failure states on failover easier to reason about.

> Handle potential data loss due to concurrent processing of processFaileOver
> and ServerShutdownHandler
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5270
>                 URL: https://issues.apache.org/jira/browse/HBASE-5270
>             Project: HBase
>          Issue Type: Sub-task
>          Components: master
>            Reporter: Zhihong Yu
>             Fix For: 0.94.0, 0.92.1
>
>
> This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK:
> Reviewing 0.92v17
> isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere.
> Does isDeadRootServerInProgress need to be public? Ditto for the meta version.
> This method's param names are not right: 'definitiveRootServer' -- what is meant by definitive? Do they need this qualifier?
> Is there anything in place to stop us expiring a server twice if it's carrying root and meta?
> What is the difference between asking the assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta.
> I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in a comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all.
> It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here?
> Though distributed log splitting is configured, with this patch we will do single-process splitting in the master under some conditions. It is not explained in the code why we would do this. Why do we think master log splitting is 'high priority' when it could very well be slower? Should we only go this route if distributed splitting is not going on? Do we know if concurrent distributed log splitting and master splitting works?
> Why would we have dead servers in progress here in master startup? Because a ServerShutdownHandler fired?
> This patch is different to the patch for 0.90. It should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and a new issue for more work on this trunk patch?
> This patch needs to have the v18 differences applied.