[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151363#comment-13151363 ]
Ted Yu commented on HBASE-4799:
-------------------------------

@Max: HadoopQA picks up the latest patch. It looks like it is running through the test suite:
https://builds.apache.org/job/PreCommit-HBASE-Build/265/console

Since your fix is marked against 0.90.4 and HBASE-4238 was integrated into 0.90.5 (it is absent from http://archive.cloudera.com/cdh/3/hbase-0.90.4+49.1.CHANGES.txt), I wonder if you could try Stack's fix out.

> Catalog Janitor logic bug causes region leakage
> -----------------------------------------------
>
>                 Key: HBASE-4799
>                 URL: https://issues.apache.org/jira/browse/HBASE-4799
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.4
>            Reporter: Max Lapan
>            Assignee: Max Lapan
>            Priority: Critical
>         Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 0002-Temporary-fix-to-remove-leaked-regions.patch
>
>
> When a region split takes a significant amount of time, CatalogJanitor can
> clean up one of the SPLIT records but leave the other in META. When the other
> split finishes, the janitor cleans up the remaining SPLIT record, but the parent
> region is not removed from the FS and its row in META is not cleared.
> The race condition is as follows:
> 1. A region split starts.
> 2. One of the daughter regions (A) finishes splitting, i.e. it has no reference
> storefiles, but the other (B) has not.
> 3. The janitor runs and, in the checkDaughter routine, removes SPLITA from META,
> but sees that SPLITB still has references and does nothing.
> 4. Region B completes its split.
> 5. The janitor wakes up and removes SPLITB, but sees that there is no record
> for A and again does nothing.
> Result: the parent region lingers forever.
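
The five steps of the race in the issue description can be replayed with a small toy model. This is only a sketch of the logic as the report describes it, not the actual CatalogJanitor code: the class JanitorRaceSketch, the ParentRow fields, and the methods checkDaughterBuggy, buggyPass and saferPass are all hypothetical names introduced for illustration. The "safer" pass is one possible way to avoid the leak (no record is removed until both daughters are reference-free) and is not claimed to be the fix in either attached patch or in HBASE-4238.

```java
/** Toy model of the race in the issue description; this is not HBase code. */
class JanitorRaceSketch {

    /** Hypothetical stand-in for the parent region's row in META. */
    static final class ParentRow {
        boolean hasSplitA = true;          // SPLITA record still present in META
        boolean hasSplitB = true;          // SPLITB record still present in META
        boolean daughterAHasRefs = false;  // daughter A still references parent storefiles
        boolean daughterBHasRefs = false;  // daughter B still references parent storefiles
        boolean parentDeleted = false;     // parent removed from FS and META
    }

    /**
     * Assumed buggy behaviour: a reference-free daughter's SPLIT record is
     * removed immediately, and a missing record counts as "not done".
     */
    static boolean checkDaughterBuggy(ParentRow row, boolean isA) {
        boolean present = isA ? row.hasSplitA : row.hasSplitB;
        if (!present) {
            return false; // no record: the janitor "does nothing"
        }
        boolean hasRefs = isA ? row.daughterAHasRefs : row.daughterBHasRefs;
        if (hasRefs) {
            return false; // daughter still depends on the parent
        }
        if (isA) {
            row.hasSplitA = false; // side effect that loses information
        } else {
            row.hasSplitB = false;
        }
        return true;
    }

    static void buggyPass(ParentRow row) {
        boolean aDone = checkDaughterBuggy(row, true);
        boolean bDone = checkDaughterBuggy(row, false);
        if (aDone && bDone) {
            row.parentDeleted = true;
        }
    }

    /** One possible alternative: touch nothing until both daughters are reference-free. */
    static void saferPass(ParentRow row) {
        if (!row.daughterAHasRefs && !row.daughterBHasRefs) {
            row.hasSplitA = false;
            row.hasSplitB = false;
            row.parentDeleted = true;
        }
    }

    /** Replays steps 1-5 and reports whether the parent region is left behind. */
    static boolean parentLeaks(boolean useBuggyPass) {
        ParentRow row = new ParentRow();          // step 1: split started
        row.daughterBHasRefs = true;              // step 2: A is done, B is not
        if (useBuggyPass) buggyPass(row); else saferPass(row);  // step 3
        row.daughterBHasRefs = false;             // step 4: B completes its split
        if (useBuggyPass) buggyPass(row); else saferPass(row);  // step 5
        return !row.parentDeleted;
    }

    public static void main(String[] args) {
        System.out.println("buggy pass leaks parent: " + parentLeaks(true));
        System.out.println("safer pass leaks parent: " + parentLeaks(false));
    }
}
```

In this model the buggy pass never sees both daughters as done in the same run, because the first run has already deleted the SPLITA record, so the parent is never deleted. The safer pass removes the parent on the second run.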