[ https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177339#comment-13177339 ]
Zhihong Yu commented on HBASE-5100: ----------------------------------- TestMasterObserver passed locally on my MacBook. > Rollback of split could cause closed region to be opened again > -------------------------------------------------------------- > > Key: HBASE-5100 > URL: https://issues.apache.org/jira/browse/HBASE-5100 > Project: HBase > Issue Type: Bug > Reporter: chunhui shen > Assignee: chunhui shen > Fix For: 0.92.0, 0.94.0 > > Attachments: 5100.txt, hbase-5100.patch > > > If master sending close region to rs and region's split transaction > concurrently happen, > it may cause closed region to opened. > See the detailed code in SplitTransaction#createDaughters > {code} > List<StoreFile> hstoreFilesToSplit = null; > try{ > hstoreFilesToSplit = this.parent.close(false); > if (hstoreFilesToSplit == null) { > // The region was closed by a concurrent thread. We can't continue > // with the split, instead we must just abandon the split. If we > // reopen or split this could cause problems because the region has > // probably already been moved to a different server, or is in the > // process of moving to a different server. > throw new IOException("Failed to close region: already closed by " + > "another thread"); > } > } finally { > this.journal.add(JournalEntry.CLOSED_PARENT_REGION); > } > {code} > when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes > this.parent.initialize(); > Although this region is not onlined in the regionserver, it may bring some > potential problem. > For example, in our environment, the closed parent region is rolled back > sucessfully , and then starting compaction and split again. > The parent region is f892dd6107b6b4130199582abc78e9c1 > master log > {code} > 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance > hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., > src=dw87.kgb.sqa.cm4,60020,1324827866085, > dest=dw80.kgb.sqa.cm4,60020,1324827865780 > 2011-12-26 00:24:42,693 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of > region > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > (offlining) > 2011-12-26 00:24:42,694 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to > serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, > usedHeap=0, maxHeap=0) for region > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > 2011-12-26 00:24:42,699 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned > node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 > (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., > server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING) > 2011-12-26 00:24:42,699 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, > region=f892dd6107b6b4130199582abc78e9c1 > 2011-12-26 00:24:45,348 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, > region=f892dd6107b6b4130199582abc78e9c1 > 2011-12-26 00:24:45,349 DEBUG > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED > event for f892dd6107b6b4130199582abc78e9c1 > 2011-12-26 00:24:45,349 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; > was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > state=CLOSED, ts=1324830285347 > 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:60000-0x13447f283f40e73 Creating (or updating) unassigned node for > f892dd6107b6b4130199582abc78e9c1 with OFFLINE state > 2011-12-26 00:24:45,354 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:60000, > region=f892dd6107b6b4130199582abc78e9c1 > 2011-12-26 00:24:45,354 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > destination server is + serverName=dw80.kgb.sqa.cm4,60020,1324827865780, > load=(requests=0, regions=1, usedHeap=0, maxHeap=0) > 2011-12-26 00:24:45,354 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for > region > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.; > > plan=hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., > src=dw87.kgb.sqa.cm4,60020,1324827866085, > dest=dw80.kgb.sqa.cm4,60020,1324827865780 > 2011-12-26 00:24:45,354 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > to dw80.kgb.sqa.cm4,60020,1324827865780 > 2011-12-26 00:24:46,899 DEBUG > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > on dw80.kgb.sqa.cm4,60020,1324827865780 > {code} > RE_dw87 log > {code} > 2011-12-26 00:24:42,694 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > 2011-12-26 00:24:42,694 DEBUG > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing > close of > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > regionserver:60020-0x43447f30cb31367 Creating unassigned node for > f892dd6107b6b4130199582abc78e9c1 in a CLOSING state > 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > Closing > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.: > disabling compactions & flushes > 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > waiting for compaction to complete for region > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > 2011-12-26 00:24:43,340 INFO org.apache.hadoop.hbase.regionserver.HRegion: > aborted compaction on region > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > after 49sec > 2011-12-26 00:24:43,340 INFO org.apache.hadoop.hbase.regionserver.HRegion: > Running close preflush of > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > 2011-12-26 00:24:43,340 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > Started memstore flush for > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., > current region memstore size 59.5m > 2011-12-26 00:24:43,340 INFO > org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of > region > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > 2011-12-26 00:24:45,347 INFO org.apache.hadoop.hbase.regionserver.HRegion: > Closed > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > 2011-12-26 00:24:45,347 WARN org.apache.hadoop.hbase.regionserver.HRegion: > Region > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > already closed > 2011-12-26 00:24:45,347 WARN > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running rollback of > failed split of > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.; > Failed to close region: already closed by another thread > 2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > regionserver:60020-0x43447f30cb31367 Successfully transitioned node > f892dd6107b6b4130199582abc78e9c1 from RS_ZK_REGION_CLOSING to > RS_ZK_REGION_CLOSED > 2011-12-26 00:24:45,348 DEBUG > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed > region > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > 2011-12-26 00:24:46,837 INFO org.apache.hadoop.hbase.regionserver.HRegion: > Onlined > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.; > next sequenceid=717341809 > 2011-12-26 00:24:46,841 INFO > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Successful rollback > of failed split of > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > 2011-12-26 00:24:46,841 INFO org.apache.hadoop.hbase.regionserver.HRegion: > Starting compaction on region > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > 2011-12-26 00:25:23,288 INFO org.apache.hadoop.hbase.regionserver.HRegion: > completed compaction on region > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > after 36sec > 2011-12-26 00:25:23,288 INFO > org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of > region > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > 2011-12-26 00:25:24,847 INFO org.apache.hadoop.hbase.catalog.MetaEditor: > Offlined parent region > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. > in META > 2011-12-26 00:25:26,165 INFO > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Region split, META > updated, and report to master. > Parent=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., > new regions: > writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324830323288.595f458507cecf208640cb4a1be8e293., > > writetest,DHZX0CD7A4OE1KRILMWEJL2HTBN6OJVSFOKGU0P938DJ1M44B79C068NCZPXK1Z5OD2RQJ6LMA41TC0D44H05525TO3AHLZ4BZXX,1324830323288.ba9376c83327c34c7926fccb68c3b9e3.. > Split took 2sec > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira