[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861721#comment-13861721 ]
Lars Hofhansl commented on HBASE-8912:
--------------------------------------

I had been considering this, but we'd block the event thread for a long time.
The best solution IMHO is to only allow one operation per region at a time, but that is hard to fit in (I think) without rewriting most of the logic.

If you bring back running the ClosedRegionHandler asynchronously you'd have the same issues, right? (the region will be removed from failedOpenRegions in the finally block, while the ClosedRegionHandler is still running - the ClosedRegionHandler might not even have started)

Actually what *is* the issue? What race condition are you referring to [~ram_krish]?

> [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-8912
>                 URL: https://issues.apache.org/jira/browse/HBASE-8912
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Lars Hofhansl
>            Priority: Critical
>             Fix For: 0.94.16
>
>         Attachments: 8912-0.94-alt2.txt, 8912-0.94.txt, 8912-fix-race.txt, HBASE-8912.patch, HBase-0.94 #1036 test - testRetrying [Jenkins].html, log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt
>
>
> AM throws this exception which subsequently causes the master to abort:
> {code}
> java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE.
>         at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
>         at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> This exception trace is from the failing test TestMetaReaderEditor which is failing pretty frequently, but looking at the test code, I think this is not a test-only issue, but affects the main code path.
> https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
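
For illustration only: the "one operation per region at a time" idea mentioned in the comment could be sketched roughly as below. PerRegionSerializer, runExclusive, and isBusy are hypothetical names, not HBase APIs, and this is not how AssignmentManager in 0.94 works. The sketch assumes Java 8 (lambdas, computeIfAbsent) and assumes that callers key operations by the encoded region name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper (not part of HBase): serializes operations per region. */
public class PerRegionSerializer {
    // One lock per region name; entries are never removed in this sketch.
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs op while holding the lock for this region; other regions are not blocked. */
    public void runExclusive(String encodedRegionName, Runnable op) {
        ReentrantLock lock =
            locks.computeIfAbsent(encodedRegionName, k -> new ReentrantLock());
        lock.lock();
        try {
            op.run();
        } finally {
            // Always release, even if op throws, so the region is not stuck.
            lock.unlock();
        }
    }

    /** Returns true if some thread currently holds the lock for this region. */
    public boolean isBusy(String encodedRegionName) {
        ReentrantLock lock = locks.get(encodedRegionName);
        return lock != null && lock.isLocked();
    }
}
```

With something like this, an assign and a ClosedRegionHandler-style callback for the same region would each go through runExclusive with the same key and could not interleave. The catch is the one raised in the comment: a handler that waits on the lock blocks its thread, and fitting this into the existing logic would likely take a substantial rewrite.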