[ https://issues.apache.org/jira/browse/HBASE-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520487#comment-14520487 ]
Lars Hofhansl commented on HBASE-13592:
---------------------------------------

TestFlushRegionEntry looks suspicious, but it failed in only one build and passes consistently locally.

> RegionServer sometimes gets stuck during shutdown in case of cache flush failures
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-13592
>                 URL: https://issues.apache.org/jira/browse/HBASE-13592
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.10
>            Reporter: Vikas Vishwakarma
>            Assignee: Vikas Vishwakarma
>             Fix For: 0.98.13
>
>         Attachments: HBASE-13592-0.98.patch
>
>
> Observed that the RegionServer sometimes gets stuck during shutdown in case of
> cache flush failures. After adding a few debug logs and looking through the
> stack trace, the RegionServer process appears to be stuck in
> closeWAL -> hlog.close -> closeBarrier.stopAndDrainOps() during the shutdown
> sequence in the run method.
> From the RegionServer logs we see multiple attempts to flush the cache for a
> particular region, each of which increments the beginOp count in DrainBarrier,
> but all of the flush attempts fail somewhere in the WAL sync and the
> corresponding DrainBarrier endOp decrement never happens. Later, when shutdown
> is initiated, the RegionServer process is permanently stuck at this point.
> In this case hbase stop does not work either, and the RegionServer process has
> to be killed explicitly using kill -9.
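
The beginOp/endOp imbalance described in the report can be illustrated with a minimal Java sketch. Everything below is an assumption made for illustration: SimpleDrainBarrier is a simplified stand-in and not HBase's DrainBarrier, failingSync is a hypothetical method that simulates the WAL sync failure, and the timeout on stopAndDrainOps exists only so the demo terminates (per the report, the real shutdown waits indefinitely). The attached patch is not reproduced here; the try/finally variant only shows one common way to keep the counts balanced.

```java
import java.io.IOException;

/** Simplified stand-in for a drain barrier; NOT HBase's actual DrainBarrier. */
final class SimpleDrainBarrier {
    private int inFlight = 0;
    private boolean draining = false;

    /** Registers an operation; returns false once draining has started. */
    synchronized boolean beginOp() {
        if (draining) {
            return false;
        }
        inFlight++;
        return true;
    }

    /** Marks one operation as finished and wakes a waiting drainer at zero. */
    synchronized void endOp() {
        inFlight--;
        if (inFlight == 0) {
            notifyAll();
        }
    }

    /** Returns true if all operations ended before the timeout elapsed. */
    synchronized boolean stopAndDrainOps(long timeoutMillis) throws InterruptedException {
        draining = true;
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (inFlight > 0) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false; // still waiting on an operation that never ended
            }
            wait(remainingMs);
        }
        return true;
    }
}

final class DrainBarrierSketch {
    /** Hypothetical WAL sync that always fails, as in the reported scenario. */
    static void failingSync() throws IOException {
        throw new IOException("simulated WAL sync failure");
    }

    /** Leaky pattern: endOp is skipped whenever the sync throws. */
    static void leakyFlush(SimpleDrainBarrier barrier) {
        if (!barrier.beginOp()) {
            return;
        }
        try {
            failingSync();
            barrier.endOp(); // never reached on failure
        } catch (IOException e) {
            // Flush failed; the in-flight count stays incremented.
        }
    }

    /** Balanced pattern: endOp runs in finally, even when the sync throws. */
    static void safeFlush(SimpleDrainBarrier barrier) {
        if (!barrier.beginOp()) {
            return;
        }
        try {
            failingSync();
        } catch (IOException e) {
            // Flush failed, but the barrier is still released below.
        } finally {
            barrier.endOp();
        }
    }

    /** Runs one flush, then attempts to drain with a 100 ms timeout. */
    static boolean drainAfter(boolean leaky) {
        SimpleDrainBarrier barrier = new SimpleDrainBarrier();
        if (leaky) {
            leakyFlush(barrier);
        } else {
            safeFlush(barrier);
        }
        try {
            return barrier.stopAndDrainOps(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("drained after leaky flush: " + drainAfter(true));
        System.out.println("drained after safe flush:  " + drainAfter(false));
    }
}
```

In this sketch the leaky flush leaves the in-flight count at 1, so the drain times out and returns false, while the balanced flush lets it return true immediately. Without the timeout, the first case would block forever, which matches the stuck shutdown described above.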