[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734548#comment-15734548 ]
Michael Han commented on ZOOKEEPER-2251:
----------------------------------------
Has anyone watching this issue been able to reproduce it deterministically? The
timeout solution is trivial, but it's important to try to figure out the root
cause of the packet loss; otherwise we would just be fixing the symptom and
could have an unexpected bug bite us later.
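For reference, the "trivial" timeout fix amounts to replacing an unbounded wait on a request (a loop of the form `while (!finished) wait();`) with a bounded one. The sketch below is only an illustration of that technique under stated assumptions: `BoundedWaitSketch`, its `Packet` stand-in and `awaitResponse` are hypothetical names, not the actual `ClientCnxn` code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a synchronous call that waits for its reply with a
// deadline instead of forever. Not ZooKeeper's ClientCnxn.Packet.
public class BoundedWaitSketch {

    // Hypothetical stand-in for a request packet awaiting a reply.
    static final class Packet {
        boolean finished;   // guarded by this packet's monitor
        Object response;

        // Called by the reader thread when the reply arrives.
        synchronized void complete(Object reply) {
            response = reply;
            finished = true;
            notifyAll();    // wake any thread blocked in awaitResponse
        }
    }

    // Returns true if the reply arrived within timeoutMs, false on timeout.
    static boolean awaitResponse(Packet p, long timeoutMs) throws InterruptedException {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (p) {
            while (!p.finished) {
                long remainingNs = deadline - System.nanoTime();
                if (remainingNs <= 0) {
                    return false;   // treat the reply as lost
                }
                // wait() can wake early or spuriously, so the loop re-checks.
                // Math.max avoids wait(0), which would mean "wait forever".
                p.wait(Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNs)));
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // The reply never arrives (simulated lost response packet).
        Packet lost = new Packet();
        System.out.println("lost reply arrived? " + awaitResponse(lost, 100));

        // Another thread delivers the reply shortly after the call.
        Packet ok = new Packet();
        new Thread(() -> ok.complete("reply")).start();
        System.out.println("normal reply arrived? " + awaitResponse(ok, 5_000));
    }
}
```

A timeout like this only bounds the symptom; as noted above, it does not explain why the reply went missing.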
> Add Client side packet response timeout to avoid infinite wait.
> ---------------------------------------------------------------
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.4.9, 3.5.2
> Reporter: nijel
> Assignee: Arshad Mohammad
> Priority: Critical
> Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch,
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across an issue related to the client-side packet response timeout. In
> my cluster, many packets were dropped for a period of time.
> One observation is that the ZooKeeper client hung. According to the thread
> dump, it was waiting for the response/ACK of the operation it performed (the
> synchronous API is used here).
> I am using
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only a few packets were lost, no DISCONNECTED event occurred.
> We need to add a "response timeout" for the operations or packets.
> *Comments from [~rakeshr]*
> My observations about the problem:
> * We can use tools like 'Wireshark' to simulate artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and, unfortunately,
> the server's response packet is lost. The client will then wait
> indefinitely.
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * We can discuss this problem and possible solutions (adding a packet ACK
> timeout, or another, better approach) further in the JIRA.
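The single-packet case is the one a per-packet ACK timeout is meant to catch: with only one request outstanding, no later reply ever arrives to expose the gap, so only a timer can notice it. A minimal sketch, assuming hypothetical names (`PendingTimeoutSketch`, `onSent`, `onReply`, `expire`) and that replies come back in request order; it is not ZooKeeper's `ClientCnxn`, and it takes the current time as a parameter so the behaviour is deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a per-packet ACK timeout. Not thread-safe: it assumes
// a single I/O thread calls every method.
public class PendingTimeoutSketch {

    static final class PendingPacket {
        final int xid;           // request id used to match the reply
        final long sentAtMillis; // when the request was written out

        PendingPacket(int xid, long sentAtMillis) {
            this.xid = xid;
            this.sentAtMillis = sentAtMillis;
        }
    }

    private final Deque<PendingPacket> pending = new ArrayDeque<>();
    private final long timeoutMillis;

    PendingTimeoutSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record a request as sent and awaiting its reply.
    void onSent(int xid, long nowMillis) {
        pending.addLast(new PendingPacket(xid, nowMillis));
    }

    // Replies are assumed to arrive in order, so a reply must match the head.
    // Returns false for an unexpected (out-of-order) xid.
    boolean onReply(int xid) {
        PendingPacket head = pending.peekFirst();
        if (head == null || head.xid != xid) {
            return false;
        }
        pending.removeFirst();
        return true;
    }

    // Removes and returns packets overdue at nowMillis. The head is always the
    // oldest packet, so the scan stops at the first one that is not overdue.
    List<PendingPacket> expire(long nowMillis) {
        List<PendingPacket> expired = new ArrayList<>();
        while (!pending.isEmpty()
                && nowMillis - pending.peekFirst().sentAtMillis >= timeoutMillis) {
            expired.add(pending.removeFirst());
        }
        return expired;
    }

    public static void main(String[] args) {
        PendingTimeoutSketch q = new PendingTimeoutSketch(1_000);
        q.onSent(1, 0);                             // the only outstanding request
        // Its reply is lost: nothing ever calls q.onReply(1).
        System.out.println(q.expire(500).size());   // 0: not overdue yet
        System.out.println(q.expire(1_000).size()); // 1: xid 1 timed out
    }
}
```

In a real client, `expire` would presumably be driven from the I/O loop on every wake-up, and each expired packet completed with an error so that a caller blocked on it is released; that wiring is omitted here.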
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)