[hbase] HLog generates incorrect file name when splitting a log, race  
condition also contributes
-------------------------------------------------------------------------------------------------

                 Key: HADOOP-2079
                 URL: https://issues.apache.org/jira/browse/HADOOP-2079
             Project: Hadoop
          Issue Type: Bug
          Components: contrib/hbase
    Affects Versions: 0.16.0
            Reporter: Jim Kellerman
            Assignee: Jim Kellerman
             Fix For: 0.16.0


In Hadoop-Nightly #277 TestRegionServerExit failed with a timeout.

The cause was a race in the Master: checkAssigned (run from either the root or 
meta scanner) immediately tries to split the log and then assign a region whose 
server info is invalid.

The scenario went something like this:

1. region server aborted
2. root region was written on optional cache flush
3. lease timed out on aborted server, which removes it from serversToServerInfo 
and queues a PendingServerShutdown operation
4. root scanner runs and finds server info incorrect (it is in the root region 
but the server is not in serversToServerInfo)
5. checkAssigned starts splitting the log but because the log name is incorrect 
it can't finish
6. PendingServerShutdown fires and really gums up the works.

So there are two problems:

1. HLog.splitLog needs to generate the correct log file name.
2. PendingServerShutdown and/or leaseExpired need to cooperate with 
checkAssigned so that there are not two concurrent attempts to recover the log.
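
One way the cooperation in the second problem might look is sketched below. This is only an illustration, not the actual fix: LogRecoveryGuard and recoverOnce are hypothetical names that do not exist in HBase, and the sketch assumes that both checkAssigned and PendingServerShutdown could route their log split through one shared guard keyed by server name.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of HBase): ensures that only one caller
 * recovers the log of a given dead region server.
 */
class LogRecoveryGuard {
  // Names of servers whose log recovery is in progress or already done.
  private final Set<String> claimed = ConcurrentHashMap.newKeySet();

  /**
   * Runs splitTask only if no other caller has claimed this server's log.
   *
   * @return true if this caller ran the split, false if another caller
   *         had already claimed it.
   */
  boolean recoverOnce(String serverName, Runnable splitTask) {
    // Set.add is atomic here: exactly one concurrent caller sees true.
    if (!claimed.add(serverName)) {
      return false;
    }
    try {
      splitTask.run();
      return true;
    } catch (RuntimeException e) {
      // Release the claim so a later attempt can retry the recovery.
      claimed.remove(serverName);
      throw e;
    }
  }
}
```

Under these assumptions, whichever of the two code paths calls recoverOnce first performs the split and the other sees false and skips it. The claim is kept after a successful split so a late caller does not split the same log again, and it is released on failure so the recovery can be retried.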

