[ https://issues.apache.org/jira/browse/ZOOKEEPER-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560593#comment-16560593 ]
Hudson commented on ZOOKEEPER-3072: ----------------------------------- SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #125 (See [https://builds.apache.org/job/ZooKeeper-trunk/125/]) ZOOKEEPER-3072: Throttle race condition fix (breed: rev 2a372fcdce3c0142c0bb23f06098a2c1a49f807e) * (edit) src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java > Race condition in throttling > ---------------------------- > > Key: ZOOKEEPER-3072 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3072 > Project: ZooKeeper > Issue Type: Bug > Components: server > Affects Versions: 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.5.4 > Reporter: Botond Hejj > Priority: Major > Labels: pull-request-available > Fix For: 3.5.4, 3.6.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > There is a race condition in the server throttling code. It is possible that > the disableRecv is called after enableRecv. > Basically, the I/O work thread does this in processPacket: > [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L1102] > > submitRequest(si); > } > } > cnxn.incrOutstandingRequests(h); > } > > incrOutstandingRequests() checks for limit breach, and potentially turns on > throttling, > [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L384] > > submitRequest() will create a logical request and en-queue it so that > Processor thread can pick it up. After being de-queued by Processor thread, > it does necessary handling, and then calls this > [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java#L459] > : > > cnxn.sendResponse(hdr, rsp, "response"); > > and in sendResponse(), it first appends to outgoing buffer, and then checks > if un-throttle is needed: > [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L708] > > However, if there is a context switch between submitRequest() and > cnxn.incrOutstandingRequests(), so that Processor thread completes > cnxn.sendResponse() call before I/O thread switches back, then enableRecv() > will happen before disableRecv(), and enableRecv() will fail the CAS ops, > while disableRecv() will succeed, resulting in a deadlock: un-throttle is > needed for letting in requests, and sendResponse is needed to trigger > un-throttle, but sendResponse() requires an incoming message. From that point > on, ZK server will no longer select the affected client socket for read, > leading to the observed client-side failure in the subject. > If you would like to reproduce this than setting the globalOutstandingLimit > down to 1 makes this reproducible easier as throttling starts with less > requests. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)