[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151545#comment-13151545 ]

stack commented on HBASE-4799:
------------------------------

Thanks for digging in here Max.

This comment is now wrong?

{code}
+      // Remove daughters from the parent IFF the daughter region exists in FS.
+      // If there is no daughter region in the filesystem, must be because of
+      // a failed split.  The ServerShutdownHandler will do the fixup.  Don't
+      // do any deletes in here that could interfere with ServerShutdownHandler
+      // fixup
{code}

hasNoReferences will return true if there is no daughter dir or if there are no
references (so if there is no daughter dir, we'll delete the parent).
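To make the hasNoReferences point concrete, here is a minimal sketch of the check as described above. It assumes a toy in-memory filesystem (FakeFs) and a made-up ".ref" suffix for reference files; both are hypothetical stand-ins, not the HBase FileSystem or StoreFile APIs. The thing to notice is that a missing daughter dir takes the same "no references" path as a dir with no reference files.

```java
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the filesystem: maps a daughter
// region directory to the store files it contains. Not the HBase API.
final class FakeFs {
    private final Map<String, List<String>> dirs;

    FakeFs(Map<String, List<String>> dirs) {
        this.dirs = dirs;
    }

    boolean exists(String dir) {
        return dirs.containsKey(dir);
    }

    List<String> list(String dir) {
        return dirs.get(dir);
    }
}

final class ReferenceCheck {
    // Assumption: reference files are recognised here by a ".ref" suffix;
    // real HBase identifies reference files differently.
    static boolean isReference(String file) {
        return file.endsWith(".ref");
    }

    // Returns true when the daughter dir is missing OR holds no reference
    // files. The first case is the one that contradicts the patch comment:
    // a missing daughter dir also makes the parent eligible for deletion.
    static boolean hasNoReferences(FakeFs fs, String daughterDir) {
        if (!fs.exists(daughterDir)) {
            return true;
        }
        for (String f : fs.list(daughterDir)) {
            if (isReference(f)) {
                return false;
            }
        }
        return true;
    }
}
```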

Otherwise, I'm fine w/ tying together the removals of splitA and splitB, all
in one go; it cuts down the number of possible states, which is usually a
good thing.

One thing though: rather than removeDaughterFromParent, shouldn't we clear both
splitA and splitB in one go, since it's the same row? Otherwise we could hit the
strange case where splitA was removed but we crashed before splitB was removed.
Don't we need a removeDaughter*s*FromParent, i.e. plural?
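A rough sketch of the plural removeDaughtersFromParent, using a hypothetical in-memory MetaRow in place of a real .META. row (MetaRow and JanitorSketch are illustrative names, not HBase classes). In HBase proper, a single Delete naming both columns of one row is applied atomically, which is what rules out the splitA-removed-but-not-splitB state; the model below just applies both removals in one synchronized call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for one .META. row of a split parent; the
// qualifiers mirror the splitA/splitB columns discussed above.
final class MetaRow {
    private final Map<String, String> columns = new HashMap<>();

    MetaRow(String daughterA, String daughterB) {
        columns.put("splitA", daughterA);
        columns.put("splitB", daughterB);
    }

    synchronized boolean has(String qualifier) {
        return columns.containsKey(qualifier);
    }

    // Removes all listed columns in one call, standing in for a single
    // row-level Delete; readers never observe a partially applied removal.
    synchronized void deleteColumns(Set<String> qualifiers) {
        columns.keySet().removeAll(qualifiers);
    }
}

final class JanitorSketch {
    // Plural variant: both daughter columns go in one mutation, so a crash
    // can happen before it or after it, but never between splitA and splitB.
    static void removeDaughtersFromParent(MetaRow parent) {
        parent.deleteColumns(Set.of("splitA", "splitB"));
    }
}
```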

Good stuff Max.


                
> Catalog Janitor logic bug causes region leakage
> -----------------------------------------------
>
>                 Key: HBASE-4799
>                 URL: https://issues.apache.org/jira/browse/HBASE-4799
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.4
>            Reporter: Max Lapan
>            Assignee: Max Lapan
>            Priority: Critical
>         Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
> 0002-Temporary-fix-to-remove-leaked-regions.patch
>
>
> When a region split takes a significant amount of time, CatalogJanitor can
> clean up one of the SPLIT records but leave the other in META. When the other
> daughter finishes, the janitor cleans up the remaining SPLIT record, but the
> parent region is never removed from the FS and its META entry is never
> cleared.
> The race condition is as follows:
> 1. A region split starts.
> 2. One of the daughter regions, A, is done (it has no reference storefiles
> left), but the other, B, is not.
> 3. The janitor runs; in checkDaughter it removes SPLITA from META, but sees
> that SPLITB still has references and does nothing more.
> 4. Region B completes its split.
> 5. The janitor wakes up and removes SPLITB, but sees that there is no record
> for A and again does nothing.
> Result: the parent region hangs around forever.
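The five steps can be replayed with a small simulation. This is a hypothetical model, not CatalogJanitor code: refs tracks which daughters still hold reference files, meta tracks the parent's remaining SPLITA/SPLITB entries, and janitorOld approximates the per-daughter behaviour described in the report (a missing entry is treated as "not ready"). janitorFixed sketches the combined approach discussed in the comment: do nothing until both daughters are clean, then clear both entries and the parent together.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical replay of the race from the report; not CatalogJanitor code.
// refs: daughters that still hold reference files to the parent.
// meta: the parent's remaining SPLITA/SPLITB entries in META.
final class SplitRace {
    final Set<String> refs = new HashSet<>(Set.of("A", "B"));
    final Set<String> meta = new HashSet<>(Set.of("A", "B"));
    boolean parentDeleted = false;

    // Per-daughter check: drop a daughter's entry as soon as that daughter
    // is clean; a missing entry is treated as "not ready yet".
    private boolean checkDaughterOld(String d) {
        if (!meta.contains(d) || refs.contains(d)) {
            return false;
        }
        meta.remove(d);
        return true;
    }

    void janitorOld() {
        boolean a = checkDaughterOld("A");
        boolean b = checkDaughterOld("B");
        if (a && b) {
            parentDeleted = true;
        }
    }

    // Combined check: wait until both daughters are clean, then clear both
    // entries and remove the parent in the same pass.
    void janitorFixed() {
        if (!parentDeleted && !refs.contains("A") && !refs.contains("B")) {
            meta.clear();
            parentDeleted = true;
        }
    }

    void janitor(boolean fixed) {
        if (fixed) {
            janitorFixed();
        } else {
            janitorOld();
        }
    }

    static SplitRace run(boolean fixed) {
        SplitRace s = new SplitRace(); // 1. split starts, both daughters hold refs
        s.refs.remove("A");            // 2. A is done, B is not
        s.janitor(fixed);              // 3. janitor pass
        s.refs.remove("B");            // 4. B completes its split
        s.janitor(fixed);              // 5. janitor wakes up again
        return s;
    }
}
```

With run(false) the model ends with meta empty and parentDeleted still false, so no later janitor pass can find the parent again; with run(true) the parent is removed at step 5.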
