linmaolin created ZOOKEEPER-4378:
------------------------------------

             Summary: org.apache.zookeeper.server.NettyServerCnxn does not increase outstandingCount properly, cause outstandingCount to overflow.
                 Key: ZOOKEEPER-4378
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4378
             Project: ZooKeeper
          Issue Type: Bug
    Affects Versions: 3.5.9
         Environment: 3.5.9.
The connection status response reports a very large outstanding_requests value.
{quote}
# curl -s http://127.0.0.1:18080/commands/stats | jq .secure_connections
[
  {
    "remote_socket_address": "10.103.249.25:50284",
    "interest_ops": 1,
    "outstanding_requests": 16587,
    "packets_received": 72288,
    "packets_sent": 72288
  },
  {
    "remote_socket_address": "10.103.249.25:50232",
    "interest_ops": 1,
    "outstanding_requests": 10369,
    "packets_received": 18197,
    "packets_sent": 18198
  },
  {
    "remote_socket_address": "10.103.249.25:53490",
    "interest_ops": 1,
    "outstanding_requests": 2867,
    "packets_received": 3179,
    "packets_sent": 3180
  }
]
{quote}
            Reporter: linmaolin
             Fix For: 3.6.0


In the receiveMessage method, the count is incremented right after processPacket, without checking anything, as shown below.
{quote}
if (initialized) {
    // TODO: if zks.processPacket() is changed to take a ByteBuffer[],
    // we could implement zero-copy queueing.
    zks.processPacket(this, bb);
    if (zks.shouldThrottle(outstandingCount.incrementAndGet())) {
        disableRecvNoWait();
    }
}
{quote}
But after the request is handled, the count is decremented only when the xid is greater than 0.
{quote}
@Override
public void sendResponse(ReplyHeader h, Record r, String tag) throws IOException {
    if (closingChannel || !channel.isOpen()) {
        return;
    }
    super.sendResponse(h, r, tag);
    if (h.getXid() > 0) {
        // zks cannot be null otherwise we would not have gotten here!
        if (!zkServer.shouldThrottle(outstandingCount.decrementAndGet())) {
            enableRecv();
        }
    }
}
{quote}
So requests with built-in xids, such as "PING" and "AUTH", make outstandingCount grow larger and larger until it hits the limit. After that, all requests on that connection are refused.

I see the problem is solved in version 3.6.0. Should there be a patch for 3.5.9?

Looking forward to your reply. Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
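
The imbalance described in the report can be reproduced in isolation with a small sketch. This is not the real NettyServerCnxn: the class `OutstandingCountSketch`, the `Cnxn` stand-in, the `simulate` helper, and the `decrementOnlyPositiveXid` flag are all hypothetical names introduced for illustration. The xid constants (-2 for ping, -4 for auth) are assumed here as examples of non-positive built-in xids. The "balanced" mode only shows what symmetric accounting would look like and is not claimed to be the change made in 3.6.0.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OutstandingCountSketch {
    // Assumed example values: built-in requests carry non-positive xids.
    static final int PING_XID = -2;
    static final int AUTH_XID = -4;

    /** Hypothetical stand-in for a server connection; only the counter is modelled. */
    static class Cnxn {
        final AtomicInteger outstandingCount = new AtomicInteger();
        final boolean decrementOnlyPositiveXid;

        Cnxn(boolean decrementOnlyPositiveXid) {
            this.decrementOnlyPositiveXid = decrementOnlyPositiveXid;
        }

        // Mirrors the reported receiveMessage behaviour: every packet increments.
        void receiveMessage() {
            outstandingCount.incrementAndGet();
        }

        // Mirrors the reported sendResponse behaviour when the flag is true:
        // the decrement is skipped for xid <= 0.
        void sendResponse(int xid) {
            if (!decrementOnlyPositiveXid || xid > 0) {
                outstandingCount.decrementAndGet();
            }
        }
    }

    /** Receives and answers each xid in turn, then returns the leftover count. */
    static int simulate(boolean decrementOnlyPositiveXid, int[] xids) {
        Cnxn cnxn = new Cnxn(decrementOnlyPositiveXid);
        for (int xid : xids) {
            cnxn.receiveMessage();
            cnxn.sendResponse(xid);
        }
        return cnxn.outstandingCount.get();
    }

    public static void main(String[] args) {
        int[] xids = {1, PING_XID, 2, PING_XID, AUTH_XID, 3};
        // Every request was answered, yet the first mode leaks one count per built-in xid.
        System.out.println("xid > 0 only: " + simulate(true, xids));  // prints 3
        System.out.println("balanced:     " + simulate(false, xids)); // prints 0
    }
}
```

With a client that pings periodically, the leftover count in the first mode grows by one per ping even though every request has been answered, which is consistent with the steadily increasing outstanding_requests values shown in the stats output.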