[jira] [Commented] (HBASE-2312) Possible data loss when RS goes into GC pause while rolling HLog

Phabricator (Commented) (JIRA) Fri, 28 Oct 2011 16:09:58 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138908#comment-13138908
 ]


Phabricator commented on HBASE-2312:
------------------------------------

khemani has commented on the revision "HBASE-2312 [jira] Possible data loss 
when RS goes into GC pause while rolling HLog".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java:223 
Basically what Nicolas said: setting the abort flag here will not have any 
immediate effect.

  From the HMaster code:
    status.setStatus("Splitting logs after master startup");
    this.fileSystemManager.
        splitLogAfterStartup(this.serverManager.getOnlineServers().keySet());
    // Make sure root and meta assigned before proceeding.
    assignRootAndMeta(status);

  The master thread will still call assignRootAndMeta() even if 
splitLogAfterStartup() returns with the abort flag set. Maybe if HMaster.abort() 
had also set the interrupt flag on at least the main master thread, it might 
have worked ...
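A minimal Java sketch of the point above. It assumes hypothetical names (MasterStartupSketch, finishInitialization, and simplified no-argument versions of splitLogAfterStartup and assignRootAndMeta), not the real HMaster API. The idea it shows: an abort only stops the startup sequence if abort() also interrupts the main master thread and that thread re-checks the flag (or its interrupt status) before assigning root and meta.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the master startup path; not the real HMaster API.
public class MasterStartupSketch {
    private volatile boolean aborted = false;
    private volatile Thread mainThread;            // thread running the startup sequence
    final List<String> steps = new ArrayList<>();  // records which steps actually ran

    // Sets the abort flag AND interrupts the main master thread, so a thread
    // blocked in sleep()/wait() wakes up; it still has to check the flag itself.
    void abort(String why) {
        aborted = true;
        Thread t = mainThread;
        if (t != null) {
            t.interrupt();
        }
    }

    boolean isAborted() {
        return aborted;
    }

    // Simulates a log split that fails and requests an abort.
    void splitLogAfterStartup() {
        steps.add("split");
        abort("log splitting failed");
    }

    void assignRootAndMeta() {
        steps.add("assign");
    }

    // Startup sequence that re-checks for an abort between the two steps.
    void finishInitialization() {
        mainThread = Thread.currentThread();
        splitLogAfterStartup();
        if (isAborted() || Thread.currentThread().isInterrupted()) {
            return; // do not assign root and meta after an aborted split
        }
        assignRootAndMeta();
    }

    public static void main(String[] args) {
        MasterStartupSketch master = new MasterStartupSketch();
        master.finishInitialization();
        Thread.interrupted(); // clear the interrupt status abort() set on this thread
        System.out.println(master.steps); // prints [split]
    }
}
```

Without the check after splitLogAfterStartup(), assignRootAndMeta() would run regardless of the flag, which is the behavior described in the comment.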

REVISION DETAIL
  https://reviews.facebook.net/D99

                
> Possible data loss when RS goes into GC pause while rolling HLog
> ----------------------------------------------------------------
>
>                 Key: HBASE-2312
>                 URL: https://issues.apache.org/jira/browse/HBASE-2312
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 0.90.0
>            Reporter: Karthik Ranganathan
>            Assignee: Nicolas Spiegelberg
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: D99.1.patch
>
>
> There is a corner case in which bad things could happen (i.e., data loss):
> 1)    RS #1 is going to roll its HLog: it has not yet created the new one, 
> and the old one will get no more writes.
> 2)    RS #1 enters a GC Pause of Death.
> 3)    The master lists the HLog files of RS #1 that it has to split, since 
> RS #1 is considered dead, and starts splitting.
> 4)    RS #1 wakes up, creates the new HLog (the previous one was rolled), and 
> appends an edit, which is lost.
> The following seems like a possible solution:
> 1)    The master detects that RS #1 is dead.
> 2)    The master renames the /hbase/.logs/<regionserver name> directory to 
> something else (say /hbase/.logs/<regionserver name>-dead).
> 3)    Add mkdir support (as opposed to mkdirs) to HDFS, so that a file 
> create fails if the directory doesn't exist. Dhruba tells me this is very 
> doable.
> 4)    RS #1 comes back up and is not able to create the new HLog. It restarts 
> itself.
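The race and the proposed fix can be sketched in Java on a local filesystem, which stands in for HDFS here purely for illustration. All class and method names are hypothetical. Files.move plays the role of the master's rename, and Files.createFile plays the role of the proposed mkdir-style create because, unlike Files.createDirectories, it never creates a missing parent directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class HLogFencingSketch {
    // Master: one-time snapshot of the log files it will split (step 3 of the race).
    static List<Path> listLogs(Path dir) throws IOException {
        List<Path> logs = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                logs.add(p);
            }
        }
        return logs;
    }

    // Master: fence a dead server by renaming its log directory to "<name>-dead".
    static Path fenceDeadServer(Path logsRoot, String serverName) throws IOException {
        return Files.move(logsRoot.resolve(serverName),
                          logsRoot.resolve(serverName + "-dead"));
    }

    // Region server: roll to a new log without creating missing parent directories.
    // Returns false if the directory is gone, i.e. the server has been fenced.
    static boolean rollLog(Path logsRoot, String serverName, String logName) throws IOException {
        Path newLog = logsRoot.resolve(serverName).resolve(logName);
        try {
            Files.createFile(newLog); // never creates parents, unlike createDirectories
        } catch (NoSuchFileException e) {
            return false;
        }
        Files.write(newLog, "edit\n".getBytes(StandardCharsets.UTF_8)); // the appended edit
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Without fencing: the master's listing misses the log created after it.
        Path root = Files.createTempDirectory("hbase-logs");
        Files.createDirectory(root.resolve("rs1"));
        Files.createFile(root.resolve("rs1").resolve("hlog.1"));
        List<Path> toSplit = listLogs(root.resolve("rs1"));
        rollLog(root, "rs1", "hlog.2");
        System.out.println("split sees hlog.2: "
                + toSplit.contains(root.resolve("rs1").resolve("hlog.2")));

        // With fencing: rename first, then list the renamed directory.
        Path root2 = Files.createTempDirectory("hbase-logs");
        Files.createDirectory(root2.resolve("rs1"));
        Files.createFile(root2.resolve("rs1").resolve("hlog.1"));
        Path dead = fenceDeadServer(root2, "rs1");
        System.out.println("logs to split after fencing: " + listLogs(dead).size());
        System.out.println("RS could roll after fencing: " + rollLog(root2, "rs1", "hlog.2"));
    }
}
```

Running main should report that the split does not see hlog.2 (the lost edit) and that the region server cannot roll its log once its directory has been renamed, which is the point at which it would restart itself. The sketch only models creating a new log, not writes to a log that was already open before the rename.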

