[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated ZOOKEEPER-2288:
-------------------------------------
    Attachment: ZOOKEEPER-2288.001.patch

This issue began as a discussion on the dev list.  That mail thread contains 
more details.

http://mail-archives.apache.org/mod_mbox/zookeeper-dev/201510.mbox/%3cd23841d1.2e7c8%[email protected]%3E

Thank you to [~randgalt] for reporting this and showing a test case that 
demonstrates the problem.  To summarize, the shutdown sequence in 
{{ZooKeeperServerMain#shutdown}} can close client sockets before the request 
processing pipeline shutdown gets a chance to ack transactions back to those 
clients.  The client then sees only a connection loss, cannot tell that its 
transaction was already committed, and retries it erroneously, resulting in 
an unexpected {{NoNode}} or {{NodeExists}} error.

I'm attaching a patch that fixes the shutdown sequence.  The incorrect 
ordering appears to be a major contributing factor in the problem.  I adapted 
Jordan's example to a JUnit test.  Without my patch, this test fails within a 
few seconds of making repeated client calls.  After applying my patch, I can 
run the same repeated client calls for multiple minutes without failure.
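
As a rough illustration of why the ordering matters, here is a minimal 
single-threaded model.  It is not the attached patch and does not use any 
ZooKeeper classes: {{Connection}}, {{Pipeline}} and {{simulate}} are 
hypothetical names invented for this sketch.  It assumes only that committed 
transactions wait in a queue until they are acked, and that an ack to a closed 
socket is lost.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model only: hypothetical classes, not ZooKeeper code.
final class ShutdownOrderSketch {

    /** Stand-in for a client socket; acks to a closed socket are lost. */
    static final class Connection {
        private boolean open = true;

        boolean ack(long txnId) {
            return open;
        }

        void close() {
            open = false;
        }
    }

    /** Stand-in for the request pipeline: committed but not yet acked txns. */
    static final class Pipeline {
        private final Queue<Long> committed = new ArrayDeque<>();
        private final Connection conn;
        private int lostAcks = 0;

        Pipeline(Connection conn) {
            this.conn = conn;
        }

        void commit(long txnId) {
            committed.add(txnId);
        }

        /** Drains every committed transaction and tries to ack it. */
        void shutdown() {
            Long txnId;
            while ((txnId = committed.poll()) != null) {
                if (!conn.ack(txnId)) {
                    lostAcks++;
                }
            }
        }

        int lostAcks() {
            return lostAcks;
        }
    }

    /** Returns how many acks are lost for the chosen shutdown order. */
    static int simulate(boolean pipelineFirst, int inFlight) {
        Connection conn = new Connection();
        Pipeline pipeline = new Pipeline(conn);
        for (long id = 1; id <= inFlight; id++) {
            pipeline.commit(id);
        }
        if (pipelineFirst) {
            pipeline.shutdown(); // drain and ack while the socket is open
            conn.close();
        } else {
            conn.close();        // socket goes away first
            pipeline.shutdown(); // acks now have nowhere to go
        }
        return pipeline.lostAcks();
    }

    public static void main(String[] args) {
        System.out.println("sockets closed first: lost acks = " + simulate(false, 5));
        System.out.println("pipeline drained first: lost acks = " + simulate(true, 5));
    }
}
```

In this model, closing the socket first loses every in-flight ack, while 
draining the pipeline first loses none.  The real server is multi-threaded, so 
the actual failure is a race rather than a certainty.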

However, this fix is incomplete and therefore isn't ready to commit.  The test 
can still fail, though much less frequently, so there must be some other 
contributing factor.  [~fpj] has a theory that the transaction batching and 
flushing logic in {{SyncRequestProcessor}} might not fully flush transactions 
if the request of death is queued under high load.
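
To make that theory concrete, the sketch below shows the general hazard in a 
batching loop.  It is not the actual {{SyncRequestProcessor}} code and makes 
no claim about what that class does.  {{BatchFlushSketch}}, {{STOP}}, 
{{BATCH_SIZE}} and {{drain}} are hypothetical, and the theory itself is 
unconfirmed.  The assumption is simply that requests are flushed in batches 
and that a stop marker ends the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model only: hypothetical names, not the SyncRequestProcessor code.
final class BatchFlushSketch {
    static final long STOP = -1L;     // hypothetical stop marker
    static final int BATCH_SIZE = 3;  // hypothetical batch threshold

    /** Processes the queue until STOP and returns the ids that were flushed. */
    static List<Long> drain(Queue<Long> queue, boolean flushOnStop) {
        List<Long> flushed = new ArrayList<>();
        List<Long> batch = new ArrayList<>();
        Long next;
        while ((next = queue.poll()) != null) {
            if (next == STOP) {
                if (flushOnStop) {
                    // Flush the partial batch before leaving the loop.
                    flushed.addAll(batch);
                    batch.clear();
                }
                break;
            }
            batch.add(next);
            if (batch.size() >= BATCH_SIZE) {
                flushed.addAll(batch);
                batch.clear();
            }
        }
        return flushed;
    }
}
```

With ids 1 through 5 queued ahead of the stop marker, a loop that exits without 
flushing its partial batch hands back only ids 1 to 3, whereas flushing on stop 
hands back all five.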

> During shutdown, server may fail to ack completed transactions to clients.
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2288
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2288
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: ZOOKEEPER-2288.001.patch
>
>
> During shutdown, requests may still be in flight in the request processing 
> pipeline.  Some of these requests have reached a state where the transaction 
> has executed and committed, but has not yet been acknowledged back to the 
> client.  It's possible that these transactions will not be acked to the 
> client before the shutdown sequence completes.



