[ 
https://issues.apache.org/jira/browse/HBASE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-2447:
-------------------------------

    Attachment: hbase-2447.txt

Here's a patch that I believe fixes the issue (at least I haven't seen it 
recur since). The way I triggered the problem was to pause a RS with 
kill -STOP for 62 seconds. When it came back from the stopped state, all of 
its writes to hlogs would fail, but pending writes would still try to sync. 
It correctly tried to shut down, but was left in a state where multiple threads 
were waiting on the syncer after the syncer had already exited, so the RS 
never shut down.
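
To make the failure mode concrete, here is a minimal sketch of the kind of 
check the issue title asks for. It is not the attached patch and not the real 
LogSyncer code: the class SyncerSketch, its fields, and the syncCompleted() and 
shutdown() methods are hypothetical, and only the name addToSyncQueue comes 
from the issue. The idea is that a waiter re-checks whether the syncer is 
still running both before and after sleeping, and that the syncer wakes all 
waiters when it exits.

```java
import java.io.IOException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a log syncer; NOT the actual HBase class. */
class SyncerSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition syncDone = lock.newCondition();
    private boolean running = true;   // guarded by lock
    private long completedSyncs = 0;  // guarded by lock

    /** Called by the (assumed) syncer thread after each successful sync. */
    void syncCompleted() {
        lock.lock();
        try {
            completedSyncs++;
            syncDone.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Called when the syncer exits, for example after a failed write. */
    void shutdown() {
        lock.lock();
        try {
            running = false;
            syncDone.signalAll(); // wake every waiter so none sleeps forever
        } finally {
            lock.unlock();
        }
    }

    /** Blocks until the next sync completes; fails if the syncer is gone. */
    void addToSyncQueue() throws IOException {
        lock.lock();
        try {
            long target = completedSyncs + 1;
            // Check 'running' before the first wait and after every wakeup.
            while (running && completedSyncs < target) {
                syncDone.await();
            }
            if (completedSyncs < target) {
                throw new IOException("syncer is no longer running");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for sync", e);
        } finally {
            lock.unlock();
        }
    }
}
```

With this shape, a thread that calls addToSyncQueue() after shutdown() gets an 
IOException immediately, and threads already waiting are released by the 
signalAll() in shutdown() instead of blocking the regionserver's exit.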

> LogSyncer.addToSyncQueue doesn't check if syncer is still running before 
> waiting
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-2447
>                 URL: https://issues.apache.org/jira/browse/HBASE-2447
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>         Attachments: hbase-2447.txt
>
>
> In testing GC pause scenarios with kill -STOP, I got the regionserver into a 
> situation where it was blocked forever while shutting down (also blocking 
> clients, since the RPCs were still pinging). The root issue is that, if the 
> log syncer has an error just as more edits are being done, addToSyncQueue() 
> can go to sleep waiting on a syncer which has just died.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
