Jake McArthur created ZOOKEEPER-3894:
----------------------------------------
Summary: Out-of-order response after session moved
Key: ZOOKEEPER-3894
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3894
Project: ZooKeeper
Issue Type: Bug
Components: server
Reporter: Jake McArthur

A bug in NIOServerCnxn can cause a client to fail with an error about out-of-order xids. What actually happens, as I understand it, is:

# The client attempts to renew its session on a slow server, S1.
# The attempt times out.
# The client attempts to renew its session on server S2.
# The attempt succeeds. S2 now owns the session.
# The client sends one or more requests. The responses are large enough that they fill the socket's buffer in S2.
# The original attempt finally succeeds. S1 now owns the session, but the client is still connected to S2.
# The client sends an asynchronous request A to S2. Because the session has moved, S2 instructs the NIOServerCnxn to close. This is implemented as an empty sentinel value added to the queue of outgoing buffers.
# The client sends some read request B to S2, and the response is enqueued behind the sentinel.
# The doIO method of NIOServerCnxn writes its enqueued buffers to the socket, and then closes the socket because one of the buffers was the sentinel.
# Before the client observes that the socket is closed, it receives the response for B and fails with an error because it expected the response for A.

I think the fix is simply to avoid writing messages that were enqueued after the sentinel.
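
To make the proposed fix concrete, here is a rough, self-contained sketch of draining an outgoing queue so that nothing enqueued after the close sentinel is written. It is a model only, not the actual NIOServerCnxn code: the names SentinelDrain, CLOSE_SENTINEL, DrainResult, drain and response are hypothetical, and "writing to the socket" is simulated by collecting the payloads as strings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class SentinelDrain {
    // Hypothetical stand-in for the empty "close connection" marker.
    // It is compared by identity, because any two empty ByteBuffers are equals().
    static final ByteBuffer CLOSE_SENTINEL = ByteBuffer.allocate(0);

    /** What a drain pass would have written, and whether a close was requested. */
    static final class DrainResult {
        final List<String> written = new ArrayList<>();
        boolean closeRequested = false;
    }

    /**
     * Drains the queue in order. On reaching the sentinel it stops and discards
     * everything queued behind it, so later responses never reach the client.
     */
    static DrainResult drain(Queue<ByteBuffer> outgoing) {
        DrainResult result = new DrainResult();
        ByteBuffer next;
        while ((next = outgoing.poll()) != null) {
            if (next == CLOSE_SENTINEL) {
                result.closeRequested = true;
                outgoing.clear(); // drop buffers enqueued after the sentinel
                break;
            }
            // Simulated socket write: record the payload instead of sending it.
            result.written.add(StandardCharsets.UTF_8.decode(next.duplicate()).toString());
        }
        return result;
    }

    /** Helper that wraps a label as a fake response buffer. */
    static ByteBuffer response(String label) {
        return ByteBuffer.wrap(label.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Queue<ByteBuffer> outgoing = new ArrayDeque<>();
        outgoing.add(response("earlier-response")); // still pending from step 5
        outgoing.add(CLOSE_SENTINEL);               // enqueued while handling request A
        outgoing.add(response("response-B"));       // enqueued behind the sentinel
        DrainResult r = drain(outgoing);
        System.out.println(r.written + " close=" + r.closeRequested);
        // prints: [earlier-response] close=true
    }
}
```

With this ordering the response for B is never written, so the client would see the connection close rather than a response with an unexpected xid.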