[jira] [Commented] (HBASE-4695) WAL logs get deleted before region server can fully flush

Ted Yu (Commented) (JIRA) Sat, 29 Oct 2011 21:23:57 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139532#comment-13139532
 ]


Ted Yu commented on HBASE-4695:
-------------------------------

Originally, closeWAL(true) was called even when this.fsOk was false. With the patch, the call is nested inside the fsOk check:
{code}
+    if (this.fsOk) {
       waitOnAllRegionsToClose(abortRequested);
+      if (!this.killed){
+        closeWAL(abortRequested ? false : true);
{code}
I think the call to closeWAL() should be placed outside the if (this.fsOk) block, so that it still runs when this.fsOk is false.
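
A minimal sketch of that placement follows. It is not the real HRegionServer code: the class ShutdownSketch and its shutdown() method are hypothetical stand-ins that only record which calls would be made, and keeping the patch's !this.killed guard around closeWAL() is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the region server shutdown path; not the real HRegionServer.
class ShutdownSketch {

    /** Returns the calls that would be made, in order, for the given flags. */
    static List<String> shutdown(boolean fsOk, boolean killed, boolean abortRequested) {
        List<String> calls = new ArrayList<>();
        if (fsOk) {
            // Waiting for regions to close stays guarded by the filesystem check, as in the patch.
            calls.add("waitOnAllRegionsToClose(" + abortRequested + ")");
        }
        // Suggested change: closeWAL() is no longer nested inside if (fsOk).
        // Assumption: the patch's !killed guard is kept as-is.
        if (!killed) {
            // Same argument as the patch's "abortRequested ? false : true".
            calls.add("closeWAL(" + !abortRequested + ")");
        }
        return calls;
    }

    public static void main(String[] args) {
        // Filesystem not OK: closeWAL() is still reached.
        System.out.println(shutdown(false, false, false));
        // Filesystem OK and abort requested: regions are waited on, then closeWAL(false).
        System.out.println(shutdown(true, false, true));
    }
}
```

With fsOk false and no abort, the sketch still reaches closeWAL(true), which matches the original behavior described above.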

The next step is to verify that the issue is really fixed.

Thanks for taking care of this, Jinchao.
                
> WAL logs get deleted before region server can fully flush
> ---------------------------------------------------------
>
>                 Key: HBASE-4695
>                 URL: https://issues.apache.org/jira/browse/HBASE-4695
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.90.4
>            Reporter: jack levin
>            Assignee: gaojinchao
>            Priority: Blocker
>             Fix For: 0.90.5
>
>         Attachments: HBASE-4695_branch90_trial.patch
>
>
> To replicate the problem do the following:
> 1. Check the /hbase/.logs/XXXX directory to see if you have WAL logs for the 
> region server you are shutting down.
> 2. Execute kill <pid> (where pid is a regionserver pid).
> 3. Watch the regionserver log as it starts flushing; you will see how many 
> regions are left to flush:
> 09:36:54,665 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting 
> on 489 regions to close
> 09:56:35,779 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting 
> on 116 regions to close
> 4. Check /hbase/.logs/XXXX -- you will notice that it has disappeared.
> 5. Check namenode logs:
> 09:26:41,607 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: 
> ugi=root ip=/10.101.1.5 cmd=delete 
> src=/hbase/.logs/rdaa5.prod.imageshack.com,60020,1319749
> Note that if you kill -9 the RS now, and it crashes on flush, you won't have 
> any WAL logs to replay.  We need to make sure that logs are deleted or moved 
> out only when the RS has fully flushed. Otherwise it is possible to lose data.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
