[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Max Lapan updated HBASE-4799:
-----------------------------
    Attachment: 0002-Temporary-fix-to-remove-leaked-regions.patch
                0001-Fix-of-Regions-Leaks-problem-in-janitor.patch

The first patch resolves this race condition. The second could be used to automatically remove leaked regions. I am not sure that it is safe to always return a (True,False) pair from checkDaughterInFs when there are no SPLIT{A,B} records.

> Catalog Janitor logic bug causes region leackage
> ------------------------------------------------
>
>                 Key: HBASE-4799
>                 URL: https://issues.apache.org/jira/browse/HBASE-4799
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.4
>            Reporter: Max Lapan
>            Priority: Critical
>         Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch,
>                      0002-Temporary-fix-to-remove-leaked-regions.patch
>
>
> When a region split takes a significant amount of time, CatalogJanitor can
> clean up one of the SPLIT records but leave the other in META. When the other
> split finishes, the janitor cleans up the remaining SPLIT record, but the
> parent region is not removed from the FS and META is not cleared.
> The race condition is as follows:
> 1. A region split starts.
> 2. One of the daughter regions, e.g. A, finishes splitting (it has no
> reference storefiles), but the other (B) has not.
> 3. The janitor starts and, in the checkDaughter routine, removes SPLITA from
> META, but sees that SPLITB still has references and does nothing.
> 4. Region B completes its split.
> 5. The janitor wakes up and removes SPLITB, but sees that there are no
> records for A and does nothing again.
> Result: the parent region hangs forever.
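
The five steps of the race can be replayed in a small stand-alone model. This is only a sketch under stated assumptions: `Parent`, `Daughters`, `leakyScan` and `allOrNothingScan` are hypothetical names invented for illustration, not HBase classes, and the "all-or-nothing" variant is one possible way to avoid the leak, not necessarily what the attached patches do.

```java
// Hypothetical, simplified model of the race; none of these types are HBase classes.
final class JanitorRaceSketch {

    /** Stand-in for the parent region's row in META plus its presence on the FS. */
    static final class Parent {
        boolean splitA = true;    // SPLITA record still present in META
        boolean splitB = true;    // SPLITB record still present in META
        boolean removed = false;  // parent deleted from FS and META
    }

    /** Stand-in for the filesystem state of the two daughter regions. */
    static final class Daughters {
        boolean aHasReferences = false; // A has already finished (step 2)
        boolean bHasReferences = true;  // B still references the parent
    }

    /** Behaviour as described in the report: each SPLIT record is cleaned on its own. */
    static boolean leakyCheckDaughter(Parent p, boolean isA, Daughters d) {
        boolean recordPresent = isA ? p.splitA : p.splitB;
        boolean hasReferences = isA ? d.aHasReferences : d.bHasReferences;
        if (!recordPresent || hasReferences) {
            return false;         // "does nothing"
        }
        if (isA) { p.splitA = false; } else { p.splitB = false; }
        return true;
    }

    /** The parent is removed only if both daughters are cleaned in the same scan. */
    static void leakyScan(Parent p, Daughters d) {
        boolean a = leakyCheckDaughter(p, true, d);
        boolean b = leakyCheckDaughter(p, false, d);
        if (a && b) {
            p.removed = true;
        }
    }

    /** Assumed alternative: touch nothing until neither daughter holds references. */
    static void allOrNothingScan(Parent p, Daughters d) {
        if (!p.removed && !d.aHasReferences && !d.bHasReferences) {
            p.splitA = false;
            p.splitB = false;
            p.removed = true;
        }
    }

    /** Replays steps 1-5 and reports whether the parent region was removed. */
    static boolean simulate(boolean leaky) {
        Parent p = new Parent();
        Daughters d = new Daughters();
        for (int scan = 1; scan <= 3; scan++) {
            if (scan == 2) {
                d.bHasReferences = false; // step 4: B completes its split
            }
            if (leaky) { leakyScan(p, d); } else { allOrNothingScan(p, d); }
        }
        return p.removed;
    }

    public static void main(String[] args) {
        System.out.println("leaky janitor removed parent: " + simulate(true));
        System.out.println("all-or-nothing janitor removed parent: " + simulate(false));
    }
}
```

In this model the leaky variant never removes the parent, however many further scans run, because the two SPLIT records are never seen as clean in the same pass. The all-or-nothing variant removes it on the first scan after B completes.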