[jira] [Updated] (ZOOKEEPER-3072) Race condition in throttling

2018-07-27 Thread Benjamin Reed (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Reed updated ZOOKEEPER-3072:
-
Fix Version/s: 3.5.4
   3.6.0

> Race condition in throttling
> 
>
> Key: ZOOKEEPER-3072
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3072
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.5.4
>Reporter: Botond Hejj
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.4, 3.6.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> There is a race condition in the server throttling code. It is possible that 
> the disableRecv is called after enableRecv.
> Basically, the I/O work thread does this in processPacket: 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L1102]
>  
>     submitRequest(si);
>     }
>     }
>     cnxn.incrOutstandingRequests(h);
>     }
>  
> incrOutstandingRequests() checks for limit breach, and potentially turns on 
> throttling, 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L384]
>  
> submitRequest() will create a logical request and en-queue it so that 
> Processor thread can pick it up. After being de-queued by Processor thread, 
> it does necessary handling, and then calls this 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java#L459]
>  :
>  
>     cnxn.sendResponse(hdr, rsp, "response");
>  
> and in sendResponse(), it first appends to outgoing buffer, and then checks 
> if un-throttle is needed:  
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L708]
>  
> However, if there is a context switch between submitRequest() and 
> cnxn.incrOutstandingRequests(), so that Processor thread completes 
> cnxn.sendResponse() call before I/O thread switches back, then enableRecv() 
> will happen before disableRecv(), and enableRecv() will fail the CAS ops, 
> while disableRecv() will succeed, resulting in a deadlock: un-throttle is 
> needed for letting in requests, and sendResponse is needed to trigger 
> un-throttle, but sendResponse() requires an incoming message. From that point 
> on, ZK server will no longer select the affected client socket for read, 
> leading to the observed client-side failure in the subject.
> If you would like to reproduce this than setting the globalOutstandingLimit 
> down to 1 makes this reproducible easier as throttling starts with less 
> requests. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3072) Race condition in throttling

2018-07-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-3072:
--
Labels: pull-request-available  (was: )

> Race condition in throttling
> 
>
> Key: ZOOKEEPER-3072
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3072
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.5.4
>Reporter: Botond Hejj
>Priority: Major
>  Labels: pull-request-available
>
> There is a race condition in the server throttling code. It is possible that 
> the disableRecv is called after enableRecv.
> Basically, the I/O work thread does this in processPacket: 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L1102]
>  
>     submitRequest(si);
>     }
>     }
>     cnxn.incrOutstandingRequests(h);
>     }
>  
> incrOutstandingRequests() checks for limit breach, and potentially turns on 
> throttling, 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L384]
>  
> submitRequest() will create a logical request and en-queue it so that 
> Processor thread can pick it up. After being de-queued by Processor thread, 
> it does necessary handling, and then calls this 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java#L459]
>  :
>  
>     cnxn.sendResponse(hdr, rsp, "response");
>  
> and in sendResponse(), it first appends to outgoing buffer, and then checks 
> if un-throttle is needed:  
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L708]
>  
> However, if there is a context switch between submitRequest() and 
> cnxn.incrOutstandingRequests(), so that Processor thread completes 
> cnxn.sendResponse() call before I/O thread switches back, then enableRecv() 
> will happen before disableRecv(), and enableRecv() will fail the CAS ops, 
> while disableRecv() will succeed, resulting in a deadlock: un-throttle is 
> needed for letting in requests, and sendResponse is needed to trigger 
> un-throttle, but sendResponse() requires an incoming message. From that point 
> on, ZK server will no longer select the affected client socket for read, 
> leading to the observed client-side failure in the subject.
> If you would like to reproduce this than setting the globalOutstandingLimit 
> down to 1 makes this reproducible easier as throttling starts with less 
> requests. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)