[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/647 Merged, close this PR. ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/647 Merged. Thanks @lvfangmin ! ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/647 @anmolnar should we get this in? ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user asfgit commented on the issue: https://github.com/apache/zookeeper/pull/647 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2518/ ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/647 retest this please ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/647 @lvfangmin Sounds acceptable. If the flakiness cannot be fixed with my suggestion (waiting for the client to disconnect), let's put the retry back in. I'll commit afterwards. Thanks. ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/647 @anmolnar what's your opinion on @hanm 's reply? ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/647 I think we can add Retry rules as long as the cause of the flakiness is clear (e.g. this case, since ConnectionLoss is a well-known cause of flaky tests); what I was worried about previously was applying it indiscriminately without analyzing the actual cause. ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user asfgit commented on the issue: https://github.com/apache/zookeeper/pull/647 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2474/ ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/647 @anmolnar I can understand your concern, let's remove the RetryRule for now, we can add it when necessary. ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/647 I still have bad feelings about introducing `RetryRule` in this patch. I haven't seen connectionLoss errors recently on the builds, nor on this patch after the fix, and I'm not sure it's a good thing to introduce it on an ad-hoc basis. Given that this 3.5. @hanm What are your thoughts? ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user asfgit commented on the issue: https://github.com/apache/zookeeper/pull/647 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2458/ ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user asfgit commented on the issue: https://github.com/apache/zookeeper/pull/647 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2454/ ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/647 retest this please ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user asfgit commented on the issue: https://github.com/apache/zookeeper/pull/647 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2389/ ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/647 I remember I commented in Jira ZOOKEEPER-3157, not sure why it didn't show up. I mentioned that we still need RetryRule, because there might be temporary quorum instability like what we found in our test environment. The quorum set up in the test might go down due to leader election when there is heavy load or limited resources in that test environment. We have seen this happen internally, so it's better to have a retry for ConnectionLoss in this case. ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/647 @lvfangmin Given that I've already provided a fix for the flakiness in #657 , do we still need this retry rule? ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user asfgit commented on the issue: https://github.com/apache/zookeeper/pull/647 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376/ ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/647
>> I added a junit retry rule class to retry with specific exception
LGTM, thanks @lvfangmin ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user asfgit commented on the issue: https://github.com/apache/zookeeper/pull/647 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2304/ ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/647 @anmolnar @hanm I added a junit retry rule class to retry on a specific exception. Currently I only use it to catch the connection loss exception in FuzzySnapshotRelatedTest; we can use it in other tests if there are similar issues. If this looks good to you I'll add it to 3.6 as well. ---
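For readers who have not seen the patch, the retry-on-a-specific-exception idea can be sketched roughly as below. This is only an illustration in plain Java, without JUnit, so that it stays self-contained; it is not the `RetryRule` class from the PR, whose code is not shown in this thread. The names `RetrySketch`, `retryOn`, and `FakeConnectionLoss` are hypothetical, and `FakeConnectionLoss` merely stands in for a connection loss exception.

```java
import java.util.concurrent.Callable;

class RetrySketch {
    // Hypothetical stand-in for a connection loss exception.
    static class FakeConnectionLoss extends Exception {}

    // Runs body up to maxAttempts times. Only exceptions of the given
    // retryable type trigger another attempt; anything else is rethrown
    // immediately so that unrelated failures are not hidden.
    static <T> T retryOn(Class<? extends Exception> retryable,
                         int maxAttempts,
                         Callable<T> body) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return body.call();
            } catch (Exception e) {
                if (!retryable.isInstance(e)) {
                    throw e; // not the exception we agreed to retry on
                }
                last = e; // remember it and try again
            }
        }
        throw last; // every attempt failed with the retryable exception
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice with the retryable exception, then succeeds.
        String result = retryOn(FakeConnectionLoss.class, 3, () -> {
            if (++calls[0] < 3) {
                throw new FakeConnectionLoss();
            }
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Restricting the retry to one exception type matches the concern raised elsewhere in this thread: a failure with any other cause still surfaces on the first attempt.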
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/647 I'll commit this once the testing part is finalized. ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/647 @lvfangmin Got it. Fine. Go ahead please. Just make sure that all patches go under the same Jira, so that they don't get lost. ---
[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/647 @anmolnar here is the scenario that shows why the previous fix on master has a problem:
1. Parent A is in its parent's serializing list.
2. Before it is serialized, child 1 is deleted in txn T1 and child 2 is created in txn T2.
3. When parent A is serialized, its cversion and pzxid have already been updated correctly by T2.
4. When reloading from disk, T1 updates the pzxid and leaves the cversion unchanged.
5. T2 checks the node and finds it already there, so it goes to the patching process, finds that the parent's cversion is already up to date, and skips patching it, which leaves the pzxid in a stale state.
---
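The steps above can be replayed against a toy model to see where the pzxid ends up. This is a simplified sketch that follows the scenario as described in the comment, not the actual ZooKeeper `DataTree` code. The class and method names, the zxid values 101 and 102, and the cversion value 2 are all hypothetical.

```java
class PzxidReplaySketch {
    // Hypothetical, simplified stand-in for the parent's stat fields.
    static final class ParentStat {
        int cversion;
        long pzxid;

        ParentStat(int cversion, long pzxid) {
            this.cversion = cversion;
            this.pzxid = pzxid;
        }
    }

    // Step 4: replaying the delete sets pzxid to the delete's zxid
    // and leaves cversion untouched.
    static void replayDelete(ParentStat parent, long zxid) {
        parent.pzxid = zxid;
    }

    // Step 5: the child already exists, so only the patch path runs.
    // The patch is skipped when the parent's cversion is already up to date.
    static void replayCreateOnExistingChild(ParentStat parent,
                                            int txnParentCversion,
                                            long zxid) {
        if (txnParentCversion > parent.cversion) {
            parent.cversion = txnParentCversion;
            parent.pzxid = zxid;
        }
    }

    static ParentStat replayScenario() {
        long t1 = 101L; // hypothetical zxid of T1 (delete child 1)
        long t2 = 102L; // hypothetical zxid of T2 (create child 2)
        // Step 3: parent A was serialized after T2 had been applied.
        ParentStat parentA = new ParentStat(2, t2);
        replayDelete(parentA, t1);
        replayCreateOnExistingChild(parentA, 2, t2);
        return parentA;
    }

    public static void main(String[] args) {
        ParentStat p = replayScenario();
        // Prints "cversion=2 pzxid=101": pzxid is stuck at T1's zxid.
        System.out.println("cversion=" + p.cversion + " pzxid=" + p.pzxid);
    }
}
```

In this model the parent was serialized with pzxid 102, but after replay it holds 101, which is the stale state described in step 5.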