[ https://issues.apache.org/jira/browse/HBASE-18261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Umesh Agashe updated HBASE-18261:
---------------------------------
    Attachment: hbase-18261.master.004.patch

Fixed findbugs error.

> [AMv2] Create new RecoverMetaProcedure and use it from ServerCrashProcedure
> and HMaster.finishActiveMasterInitialization()
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18261
>                 URL: https://issues.apache.org/jira/browse/HBASE-18261
>             Project: HBase
>          Issue Type: Improvement
>          Components: amv2
>    Affects Versions: 2.0.0-alpha-1
>            Reporter: Umesh Agashe
>            Assignee: Umesh Agashe
>             Fix For: 2.0.0-alpha-2
>
>         Attachments: hbase-18261.master.001.patch,
> HBASE-18261.master.001.patch, hbase-18261.master.002.patch,
> hbase-18261.master.003.patch, hbase-18261.master.004.patch
>
>
> When the unit test
> hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta()
> is enabled and run several times, it fails intermittently. The cause is that
> meta recovery is done in two different places:
> * ServerCrashProcedure.processMeta()
> * HMaster.finishActiveMasterInitialization()
> and the two are not coordinated.
> When HMaster.finishActiveMasterInitialization() submits splitMetaLog() first,
> a call from ServerCrashProcedure.processMeta() made while that split is still
> running fails, causing the step to be retried in a loop.
> When ServerCrashProcedure.processMeta() submits splitMetaLog after the
> splitMetaLog from HMaster.finishActiveMasterInitialization() has finished,
> success is returned without doing any work.
> But if ServerCrashProcedure.processMeta() submits the splitMetaLog request
> and HMaster.finishActiveMasterInitialization() submits it while that request
> is still in progress, the test fails with an exception.
> [~stack] and I discussed a possible solution:
> Create RecoverMetaProcedure and call it where required. The procedure
> framework provides mutual exclusion and requires idempotence, which should
> fix the problem.
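
To make the proposed fix concrete, here is a minimal sketch of the two properties the description relies on: mutual exclusion (only one recovery runs at a time) and idempotence (a repeated request succeeds without redoing the work). It is plain Java and not the HBase procedure framework. The class MetaRecoveryCoordinator, its methods, and the server name are hypothetical names invented for illustration, and the log split is a stand-in that only counts invocations.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; none of these names are HBase APIs.
class MetaRecoveryCoordinator {
    private final ReentrantLock lock = new ReentrantLock();
    // Servers whose meta log has already been split (guarded by lock).
    private final Set<String> recovered = new HashSet<>();
    // Counts how many times the stand-in split actually ran.
    private final AtomicInteger splitCount = new AtomicInteger();

    /** Safe to call from several callers, any number of times. */
    boolean recoverMeta(String crashedServer) {
        lock.lock(); // mutual exclusion: one recovery at a time
        try {
            if (recovered.contains(crashedServer)) {
                return true; // idempotence: already done, report success
            }
            splitMetaLog(crashedServer);
            recovered.add(crashedServer);
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Stand-in for the real log split; it only records that it ran.
    private void splitMetaLog(String crashedServer) {
        splitCount.incrementAndGet();
    }

    int splitCount() {
        return splitCount.get();
    }

    public static void main(String[] args) throws InterruptedException {
        MetaRecoveryCoordinator coordinator = new MetaRecoveryCoordinator();
        String server = "rs1.example.org,16020,1"; // made-up server name

        // Two independent callers, standing in for the crash-handling path
        // and the master-initialization path, request the same recovery.
        Thread crashPath = new Thread(() -> coordinator.recoverMeta(server));
        Thread initPath = new Thread(() -> coordinator.recoverMeta(server));
        crashPath.start();
        initPath.start();
        crashPath.join();
        initPath.join();

        System.out.println("splits performed: " + coordinator.splitCount());
    }
}
```

Whichever caller arrives first performs the split; the other waits on the lock and then returns success without repeating it, so neither caller sees a failure or an exception because of the overlap.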