[ 
https://issues.apache.org/jira/browse/HBASE-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520487#comment-14520487
 ] 

Lars Hofhansl commented on HBASE-13592:
---------------------------------------

TestFlushRegionEntry looks suspicious, but it failed in only one build and 
passes consistently locally.

> RegionServer sometimes gets stuck during shutdown in case of cache flush 
> failures
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-13592
>                 URL: https://issues.apache.org/jira/browse/HBASE-13592
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.10
>            Reporter: Vikas Vishwakarma
>            Assignee: Vikas Vishwakarma
>             Fix For: 0.98.13
>
>         Attachments: HBASE-13592-0.98.patch
>
>
> Observed that the RegionServer sometimes gets stuck during shutdown after 
> cache flush failures. After adding a few debug logs and looking through the 
> stack trace, the RegionServer process appears stuck in closeWAL -> 
> hlog.close -> closeBarrier.stopAndDrainOps() during the shutdown sequence 
> in the run method.
> From the RegionServer logs we see multiple attempts to flush the cache 
> for a particular region, each of which increments the beginOp count in 
> DrainBarrier, but all the flush attempts fail somewhere in WAL sync and the 
> matching DrainBarrier endOp decrement never happens. Later, when shutdown is 
> initiated, the RegionServer process is permanently stuck here.
> In this case hbase stop also does not work, and the RegionServer process has 
> to be killed explicitly using kill -9.
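
For readers following the description, here is a minimal sketch of the 
failure mode as described: a beginOp that is not paired with an endOp when 
the operation throws. It is not the attached patch and not HBase's 
DrainBarrier. SimpleBarrier, leakyFlush, safeFlush and failingSync are 
hypothetical names, and the timeout on stopAndDrainOps is an assumption 
added only so the demo terminates instead of hanging the way the 
RegionServer does.

```java
import java.util.concurrent.TimeUnit;

public class DrainBarrierSketch {

    /** Hypothetical, simplified stand-in for a drain barrier (not HBase's class). */
    static final class SimpleBarrier {
        private int pendingOps = 0;
        private boolean draining = false;

        /** Registers an operation; refused once draining has started. */
        synchronized boolean beginOp() {
            if (draining) {
                return false;
            }
            pendingOps++;
            return true;
        }

        /** Marks one operation as finished and wakes any drainer when none remain. */
        synchronized void endOp() {
            pendingOps--;
            if (pendingOps == 0) {
                notifyAll();
            }
        }

        /** Waits up to timeoutMs for pending ops to finish; true if fully drained. */
        synchronized boolean stopAndDrainOps(long timeoutMs) throws InterruptedException {
            draining = true;
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (pendingOps > 0) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // an op never called endOp
                }
                TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
            }
            return true;
        }
    }

    /** Stands in for a WAL sync that fails. */
    static void failingSync() {
        throw new IllegalStateException("simulated WAL sync failure");
    }

    /** The exception skips endOp, so the pending count is never decremented. */
    static void leakyFlush(SimpleBarrier barrier) {
        if (!barrier.beginOp()) {
            return;
        }
        failingSync();
        barrier.endOp();
    }

    /** endOp runs in finally, so the count is decremented even on failure. */
    static void safeFlush(SimpleBarrier barrier) {
        if (!barrier.beginOp()) {
            return;
        }
        try {
            failingSync();
        } finally {
            barrier.endOp();
        }
    }

    /** Runs one failed flush, then reports whether a drain completes in 100 ms. */
    static boolean drainsAfterFailedFlush(boolean useFinally) {
        SimpleBarrier barrier = new SimpleBarrier();
        try {
            if (useFinally) {
                safeFlush(barrier);
            } else {
                leakyFlush(barrier);
            }
        } catch (IllegalStateException expected) {
            // The simulated flush failure is the point of the demo.
        }
        try {
            return barrier.stopAndDrainOps(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("leaky flush drains: " + drainsAfterFailedFlush(false));
        System.out.println("safe flush drains:  " + drainsAfterFailedFlush(true));
    }
}
```

With the leaky variant the drain times out, which corresponds to the 
permanent wait reported above when no timeout is in place. With the 
try/finally variant the drain returns immediately.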



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
