[
https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228518#comment-13228518
]
Patrick Hunt commented on ZOOKEEPER-1412:
-----------------------------------------
Honestly I didn't have time to look into all of these scenarios, but I have a
concern wrt introducing this change into the 3.3 code line; I'd be more comfortable
looking at such a change for 3.4 or 3.5. For example, if we did introduce these
further changes, is there any chance of getting a ping response which updates the
lastzxid, then disconnecting from the server before getting a notification that
would then be missed on reconnect?
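
To make the question concrete, here is a toy timeline in Java. It is only a sketch under stated assumptions, not ZooKeeper code: the class and method names are hypothetical, the zxid values are made up, and the server rule it assumes (a re-registered watch fires only if the node changed after the client's lastzxid) is a simplification. It does not show whether this ordering can actually occur, only what would happen if it did.

```java
/** Toy timeline for the race asked about above; an illustration, not ZooKeeper's code. */
public class PingRaceModel {

    /** Assumed server rule: fire a re-registered watch only if the node changed after lastZxid. */
    static boolean firesOnReconnect(long nodeModifiedZxid, long clientLastZxid) {
        return nodeModifiedZxid > clientLastZxid;
    }

    /**
     * Replays: /foo changes, an optional ping reply carries the new zxid,
     * the connection drops before the notification arrives, the client reconnects.
     * Returns true if the client still learns of the change after reconnecting.
     */
    public static boolean changeSeenAfterReconnect(boolean pingReplyUpdatesLastZxid) {
        long lastZxid = 10;   // made-up client state when the watch was set
        long changeZxid = 11; // /foo is modified; its notification is still in flight
        if (pingReplyUpdatesLastZxid) {
            // Hypothetical: the ping reply reports the server's latest zxid.
            lastZxid = changeZxid;
        }
        // The connection drops here, so the in-flight notification is lost.
        return firesOnReconnect(changeZxid, lastZxid);
    }

    public static void main(String[] args) {
        System.out.println("ping updates lastzxid: " + changeSeenAfterReconnect(true));
        System.out.println("ping leaves lastzxid:  " + changeSeenAfterReconnect(false));
    }
}
```

In this model the change is missed only when the ping reply has already moved lastzxid past it, which is the ordering the question is about.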

I'd like to cut a 3.3.5 with what I have so far, which clearly fixes the issue
identified (and 3.3.5 fixes other critical issues that need to get into users'
hands). We can then look at further improvements in a separate jira. Make sense?
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
> Key: ZOOKEEPER-1412
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
> Reporter: Botond Hejj
> Assignee: Patrick Hunt
> Priority: Blocker
> Fix For: 3.3.5, 3.4.4, 3.5.0
>
> Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br33.patch,
> ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_br34.patch,
> ZOOKEEPER-1412_trunk.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The
> inconsistency relates to the behavior after the client reconnects to the
> zookeeper ensemble.
> After the client reconnects to the ensemble, only those watches should trigger
> which would also have been triggered if the connection had not been lost. This
> means that if I watch for changes in node /foo and there is no change there,
> then my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issue and I could locate the case when the watch is always
> triggered on reconnect. This consistently happens if I connect to a
> follower in the ensemble and I don't do any operation which goes through the
> leader.
> Looking at the code I see that the client stores the lastzxid and sends that
> with its requests. This is 0 on startup and is updated every time from the
> server replies. This lastzxid is also sent to the server after reconnect,
> together with the watches. The server decides which watches to trigger based on
> this lastzxid, probably because it should represent the last known state of
> the client. If this lastzxid is 0 then all the watches are triggered.
> I've checked why this lastzxid is 0. I thought it shouldn't be, since there
> was already a request to the server to set the watch and in the reply the
> server could have sent back the zxid, but it turns out that it sends just 0.
> Looking at the server code I see that for requests which don't go through
> the leader the follower server just sends back the same zxid that the client
> sent.
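
The mechanism described in the report can be sketched as a small Java model. This is a simplified illustration, not ZooKeeper's implementation: the `Server` class, its methods, and the zxid value 42 are hypothetical, and the rule "fire a watch if the node changed after the client's lastzxid" is taken from the description above rather than from the server source. The second reply strategy (returning the server's own latest zxid) is an assumption used only for contrast.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the reconnect logic described in the report; not ZooKeeper's actual code. */
public class WatchReconnectModel {

    /** Hypothetical stand-in for a server: path -> zxid of the node's last modification. */
    static final class Server {
        final Map<String, Long> lastModifiedZxid = new HashMap<>();

        /** Follower read as described in the report: echoes back the zxid the client sent. */
        long replyZxidEchoing(long clientZxid) {
            return clientZxid;
        }

        /** Assumed alternative for contrast: reply with the newest zxid this server knows. */
        long replyZxidFromServer() {
            long max = 0;
            for (long z : lastModifiedZxid.values()) {
                max = Math.max(max, z);
            }
            return max;
        }

        /** On reconnect, fire each watch whose node changed after the client's lastZxid. */
        List<String> watchesToFire(List<String> watchedPaths, long clientLastZxid) {
            List<String> fired = new ArrayList<>();
            for (String path : watchedPaths) {
                Long modified = lastModifiedZxid.get(path);
                if (modified != null && modified > clientLastZxid) {
                    fired.add(path);
                }
            }
            return fired;
        }
    }

    /** Sets a watch on /foo via a read, reconnects with no change, returns the fired watches. */
    public static List<String> firedOnReconnect(boolean followerEchoesClientZxid) {
        Server server = new Server();
        server.lastModifiedZxid.put("/foo", 42L); // /foo last changed at made-up zxid 42

        long lastZxid = 0; // the client starts with lastzxid 0
        // The read that sets the watch is answered by the follower; the client
        // takes its new lastzxid from the reply.
        lastZxid = followerEchoesClientZxid
                ? server.replyZxidEchoing(lastZxid)
                : server.replyZxidFromServer();

        // Connection drops and is re-established; /foo has not changed meanwhile.
        return server.watchesToFire(Collections.singletonList("/foo"), lastZxid);
    }

    public static void main(String[] args) {
        System.out.println("follower echoes client zxid: " + firedOnReconnect(true));  // [/foo]
        System.out.println("reply carries server zxid:   " + firedOnReconnect(false)); // []
    }
}
```

With the echoing reply the client's lastzxid stays at 0, so the unchanged /foo watch fires on reconnect; with a reply that carries a real zxid it does not.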