[
https://issues.apache.org/jira/browse/ZOOKEEPER-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Nauroth updated ZOOKEEPER-2288:
-------------------------------------
Attachment: ZOOKEEPER-2288.001.patch
This began with a discussion on the dev list. That mail thread contains more
details.
http://mail-archives.apache.org/mod_mbox/zookeeper-dev/201510.mbox/%3cd23841d1.2e7c8%[email protected]%3E
Thank you to [~randgalt] for reporting this and showing a test case that
demonstrates the problem. To summarize, the shutdown sequence in
{{ZooKeeperServerMain#shutdown}} can close out client sockets before the
request processing pipeline shutdown gets a chance to ack transactions back to
those clients. This can cause a client application to retry erroneously,
resulting in an unexpected {{NoNode}} or {{NodeExists}} error.
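To illustrate the ordering problem summarized above, here is a minimal, single-threaded sketch. It does not use ZooKeeper's classes: {{ClientConnection}}, {{Pipeline}}, and both shutdown methods are hypothetical stand-ins invented for this example, and the real server is multi-threaded. The sketch only shows why closing connections before draining the pipeline can drop acks for transactions that have already committed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-ins for illustration only; none of these are ZooKeeper classes.
public class ShutdownOrderSketch {

    /** A client connection that accepts acks only while it is open. */
    static final class ClientConnection {
        private boolean open = true;
        private final List<Long> acked = new ArrayList<>();

        /** Returns true if the ack was delivered, false if the socket was already closed. */
        boolean ack(long txnId) {
            if (!open) {
                return false;
            }
            acked.add(txnId);
            return true;
        }

        void close() {
            open = false;
        }

        List<Long> acked() {
            return new ArrayList<>(acked);
        }
    }

    /** A request pipeline holding transactions that are committed but not yet acked. */
    static final class Pipeline {
        private final Queue<Long> committed = new ArrayDeque<>();
        private final ClientConnection conn;

        Pipeline(ClientConnection conn) {
            this.conn = conn;
        }

        void commit(long txnId) {
            committed.add(txnId);
        }

        /** Tries to ack everything still queued; returns how many acks were lost. */
        int drain() {
            int lost = 0;
            Long txnId;
            while ((txnId = committed.poll()) != null) {
                if (!conn.ack(txnId)) {
                    lost++;
                }
            }
            return lost;
        }
    }

    /** Problematic order: the socket closes before the pipeline can ack. */
    static int shutdownSocketsFirst(Pipeline pipeline, ClientConnection conn) {
        conn.close();
        return pipeline.drain();
    }

    /** Safer order: drain the pipeline, then close the socket. */
    static int shutdownPipelineFirst(Pipeline pipeline, ClientConnection conn) {
        int lost = pipeline.drain();
        conn.close();
        return lost;
    }

    public static void main(String[] args) {
        ClientConnection a = new ClientConnection();
        Pipeline pa = new Pipeline(a);
        for (long i = 1; i <= 3; i++) pa.commit(i);
        System.out.println("sockets first: lost " + shutdownSocketsFirst(pa, a) + " acks");

        ClientConnection b = new ClientConnection();
        Pipeline pb = new Pipeline(b);
        for (long i = 1; i <= 3; i++) pb.commit(i);
        System.out.println("pipeline first: lost " + shutdownPipelineFirst(pb, b) + " acks");
    }
}
```

In the first ordering, the transactions are committed but the client never hears about them, which is the situation that can lead a client to retry and then see {{NoNode}} or {{NodeExists}}.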
I'm attaching a patch that fixes the shutdown sequence. The incorrect ordering
appears to be a major contributing factor in the problem. I adapted Jordan's
example into a JUnit test. Without my patch, this test fails within a few
seconds of making repeated client calls. With my patch applied, I can run the
same repeated client calls for multiple minutes without failure.
However, this fix is incomplete and therefore isn't ready to commit. The test
can still fail, though much less frequently, so there must be some other
contributing factor. [~fpj] has a theory that the transaction batching and
flushing logic in {{SyncRequestProcessor}} might not fully flush transactions
if the request of death is queued under high load.
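As a purely illustrative sketch of the general hazard behind that theory (and not a reproduction of {{SyncRequestProcessor}}, nor a confirmation that this is the actual cause), consider a batching loop that acks requests only when a batch fills up and stops when it sees a sentinel. All names here, including {{REQUEST_OF_DEATH}}, {{run}}, and the {{flushOnDeath}} flag, are assumptions made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration only; this is not ZooKeeper's SyncRequestProcessor.
public class BatchFlushSketch {

    /** Hypothetical sentinel that tells the loop to stop. */
    static final long REQUEST_OF_DEATH = -1L;

    /**
     * Consumes requests until the sentinel, acking them in batches.
     * Returns the ids that were acked. If flushOnDeath is false, a partial
     * batch that is pending when the sentinel arrives is never acked.
     */
    static List<Long> run(Queue<Long> incoming, int batchSize, boolean flushOnDeath) {
        List<Long> acked = new ArrayList<>();
        List<Long> pending = new ArrayList<>();
        Long next;
        while ((next = incoming.poll()) != null) {
            if (next == REQUEST_OF_DEATH) {
                if (flushOnDeath) {
                    // Flush the partial batch before exiting.
                    acked.addAll(pending);
                    pending.clear();
                }
                break;
            }
            pending.add(next);
            if (pending.size() >= batchSize) {
                // A full batch is flushed and acked together.
                acked.addAll(pending);
                pending.clear();
            }
        }
        return acked;
    }

    /** Builds a queue holding ids 1..count followed by the sentinel. */
    static Queue<Long> requests(int count) {
        Queue<Long> q = new ArrayDeque<>();
        for (long i = 1; i <= count; i++) {
            q.add(i);
        }
        q.add(REQUEST_OF_DEATH);
        return q;
    }

    public static void main(String[] args) {
        System.out.println("no flush on death: " + run(requests(5), 3, false));
        System.out.println("flush on death:    " + run(requests(5), 3, true));
    }
}
```

With five requests and a batch size of three, the variant that skips the final flush acks only the first three, while the variant that flushes on the sentinel acks all five.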
> During shutdown, server may fail to ack completed transactions to clients.
> --------------------------------------------------------------------------
>
> Key: ZOOKEEPER-2288
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2288
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: ZOOKEEPER-2288.001.patch
>
>
> During shutdown, requests may still be in flight in the request processing
> pipeline. Some of these requests have reached a state where the transaction
> has executed and committed, but has not yet been acknowledged back to the
> client. It's possible that these transactions will not ack to the client
> before the shutdown sequence completes.