[
https://issues.apache.org/jira/browse/ZOOKEEPER-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
mutu updated ZOOKEEPER-4817:
----------------------------
Description:
Recently, we encountered a confusing issue: the client disconnection warning
shows up in the system log only some of the time. The cluster consists of
three nodes. A client sends many create requests and then reads the node
created by the first request. The read fails because the node is missing.
When we check the system log, sometimes there is a client disconnection
warning and sometimes there is not. After investigating, we found that when
NIOServerCnxn.doIO is stuck at any IO point in the function for more than 20s,
the client disconnection warning does not appear. If it is stuck for less than
20s, the warning does appear in the system log.
We find that the root cause is
When doIO encounters the slowdown caused by the fail-slow NIC, the context is
the same in both cases.
Are there any comments that could help figure out this issue? I would very
much appreciate them.
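For readers unfamiliar with the exception named in the issue title, here is a
minimal sketch of how a CancelledKeyException arises in plain java.nio. It is
not ZooKeeper code: the class name CancelledKeyDemo and the use of a Pipe in
place of a client socket are assumptions made only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo class; not part of ZooKeeper.
public class CancelledKeyDemo {

    // Returns true if querying a key's ready set after cancel() throws
    // CancelledKeyException.
    static boolean throwsAfterCancel() {
        try {
            // A Pipe stands in for a client socket so no network is needed.
            Pipe pipe = Pipe.open();
            try (Selector selector = Selector.open();
                 Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink()) {
                source.configureBlocking(false);
                SelectionKey key = source.register(selector, SelectionKey.OP_READ);

                // Cancelling the key invalidates it immediately.
                key.cancel();

                try {
                    // isReadable() reads the ready set, which is not
                    // permitted on a cancelled key.
                    key.isReadable();
                    return false;
                } catch (CancelledKeyException e) {
                    return true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("CancelledKeyException thrown: " + throwsAfterCancel());
    }
}
```

The sketch only shows that the exception is thrown when a cancelled key is
queried. It does not model the stall in doIO or explain why the warning is
missing after a longer stall.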
was:
If the client connection to the ZooKeeper server is disconnected, a
CancelledKeyException is raised.
Here is a strange scenario.
NIOServerCnxn.doIO is blocked at line 333 by the fail-slow NIC.
If the delay lasts more than 30s, the CancelledKeyException does not appear.
If the delay lasts 25s, the CancelledKeyException appears.
When doIO encounters the slowdown caused by the fail-slow NIC, the context is
the same in both cases.
Are there any comments that could help figure out this issue? I would very
much appreciate them.
> CancelledKeyException does not work in some cases.
> --------------------------------------------------
>
> Key: ZOOKEEPER-4817
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4817
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.10.0
> Reporter: mutu
> Priority: Major
> Attachments: node1-25.log, node1-60.log, node2-25.log, node2-60.log,
> node3-25.log, node3-60.log
>
>
> Recently, we encountered a confusing issue: the client disconnection warning
> shows up in the system log only some of the time. The cluster consists of
> three nodes. A client sends many create requests and then reads the node
> created by the first request. The read fails because the node is missing.
> When we check the system log, sometimes there is a client disconnection
> warning and sometimes there is not. After investigating, we found that when
> NIOServerCnxn.doIO is stuck at any IO point in the function for more than
> 20s, the client disconnection warning does not appear. If it is stuck for
> less than 20s, the warning does appear in the system log.
> We find that the root cause is
>
> When doIO encounters the slowdown caused by the fail-slow NIC, the context
> is the same in both cases.
> Are there any comments that could help figure out this issue? I would very
> much appreciate them.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)