[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835149#comment-16835149 ]
Abhishek Singh Chouhan commented on HBASE-22330: ------------------------------------------------ Patch for branch-1 applies to branch-1.4. However 1.3 required slight modification in the test file due to wal api changes. Have added a patch for branch-1.3. > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > ------------------------------------------------------------------------------------------------------------- > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver > Affects Versions: 1.5.0, 1.4.9, 1.3.4 > Reporter: Andrew Purtell > Assignee: Abhishek Singh Chouhan > Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-22330.branch-1.001.patch, > HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. > We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place. -- This message was sent by Atlassian JIRA (v7.6.3#76005)