[GitHub] zookeeper issue #647: [ZOOKEEPER-3125] Fixing pzxid consistent issue when re...

2018-11-11 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
Merged, close this PR.


---


2018-11-08 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
Merged. Thanks @lvfangmin !


---


2018-11-08 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
@anmolnar should we get this in?


---


2018-10-26 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/zookeeper/pull/647
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2518/



---


2018-10-26 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
retest this please


---


2018-10-26 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
@lvfangmin Sounds acceptable.
If the flaky test cannot be fixed with my suggestion (waiting for the client to
disconnect), let's put the retry back in.
I'll commit afterwards. Thanks.
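A minimal sketch of the "wait for the client to disconnect" approach, as an alternative to retrying: poll a condition until it holds or a timeout expires. `WaitUtil.waitFor` is a hypothetical helper, not an existing ZooKeeper test API, and the `AtomicBoolean` stands in for a flag that a test `Watcher` would set when it sees a `Disconnected` event (assumed wiring).

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: poll a condition until it holds or time runs out. */
final class WaitUtil {
    private WaitUtil() {}

    /**
     * Returns true as soon as {@code condition} is true, or false if it is
     * still false after {@code timeoutMs} (or the thread is interrupted).
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }
}

class WaitForDisconnectDemo {
    public static void main(String[] args) {
        // Stand-in for a flag a test Watcher would set on KeeperState.Disconnected.
        AtomicBoolean disconnected = new AtomicBoolean(false);
        new Thread(() -> {
            try {
                Thread.sleep(50); // simulate the server going away shortly
            } catch (InterruptedException ignored) {
                // demo only
            }
            disconnected.set(true);
        }).start();

        // Wait for the disconnect instead of retrying the whole test.
        boolean ok = WaitUtil.waitFor(disconnected::get, 5_000, 10);
        System.out.println("disconnected within timeout: " + ok);
    }
}
```

Waiting on an explicit condition makes the test deterministic about the state it expects, whereas a retry only masks the race.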


---


2018-10-22 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
@anmolnar what's your opinion on @hanm's reply?


---


2018-10-18 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
I think we can add retry rules as long as the cause of the flakiness is clear
(e.g. in this case, since ConnectionLoss is a well-known cause of flakiness);
what I was worried about previously was applying it indiscriminately without
analyzing the actual cause.


---


2018-10-18 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/zookeeper/pull/647
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2474/



---


2018-10-18 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
@anmolnar I understand your concern. Let's remove the RetryRule for now; we
can add it back when necessary.


---


2018-10-16 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
I still have misgivings about introducing `RetryRule` in this patch. I
haven't seen ConnectionLoss errors on the builds recently, nor on this patch
after the fix, and I'm not sure it's a good idea to introduce it on an ad-hoc
basis, given that this is 3.5.
@hanm What are your thoughts?


---


2018-10-16 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/zookeeper/pull/647
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2458/



---


2018-10-16 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/zookeeper/pull/647
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2454/



---


2018-10-16 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
retest this please


---


2018-10-08 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/zookeeper/pull/647
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2389/



---


2018-10-08 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
I remember commenting on Jira ZOOKEEPER-3157; not sure why it didn't show
up.

I mentioned that we still need the RetryRule, because there might be temporary
quorum instability like what we found in our test environment. The quorum set
up in the test might be down due to leader election if there is heavy load or
limited resources in that test environment. We have seen this happen
internally, so it's better to retry on ConnectionLoss in this case.


---


2018-10-08 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
@lvfangmin Given that I've already provided a fix for the flakiness in #657,
do we still need this retry rule?


---


2018-10-06 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/zookeeper/pull/647
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2376/



---


2018-10-04 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
>>  I added a junit retry rule class to retry with specific exception

LGTM, thanks @lvfangmin 


---


2018-10-01 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/zookeeper/pull/647
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2304/



---


2018-10-01 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
@anmolnar @hanm I added a JUnit retry rule class that retries on a specific
exception. Currently I only use it to catch the connection loss exception in
FuzzySnapshotRelatedTest; we can use it in other tests if they have a similar
issue.

If this looks good to you, I'll add it to 3.6 as well.
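The retry rule's code isn't quoted in this thread, so the following is only a sketch of the idea under stated assumptions: retry only when a specific exception type is thrown, up to a fixed number of attempts, and rethrow everything else immediately. To stay runnable without JUnit it wraps a `Callable`; in JUnit 4 the same loop would live inside the `Statement` returned by `TestRule.apply(Statement, Description)`. `RetryOnException` and `FakeConnectionLoss` (a stand-in for `KeeperException.ConnectionLossException`) are hypothetical names.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch of retry-on-a-specific-exception logic (the idea behind
 * the RetryRule). Only exceptions of type {@code retryOn} are retried.
 */
final class RetryOnException {
    private final Class<? extends Exception> retryOn;
    private final int maxAttempts;

    RetryOnException(Class<? extends Exception> retryOn, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.retryOn = retryOn;
        this.maxAttempts = maxAttempts;
    }

    <T> T run(Callable<T> body) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return body.call();
            } catch (Exception e) {
                // Exceptions of other types, and the final attempt, are rethrown
                // unchanged. Errors such as AssertionError are never caught here.
                if (!retryOn.isInstance(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // Otherwise assume a transient failure, such as a connection
                // loss during leader election, and try again.
            }
        }
    }
}

/** Stand-in for ZooKeeper's KeeperException.ConnectionLossException. */
class FakeConnectionLoss extends Exception {
    FakeConnectionLoss(String message) {
        super(message);
    }
}
```

Restricting retries to a single exception type means a genuine assertion failure still fails on the first attempt.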


---


2018-10-01 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
I'll commit this once the testing part is finalized.


---


2018-10-01 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
@lvfangmin Got it. Fine. Go ahead please. Just make sure that all patches
go under the same Jira, so that they don't get lost.


---


2018-09-28 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/647
  
@anmolnar here is a scenario that shows why the previous fix on master has a
problem:

1. parent A is in its parent's serializing list
2. before it is serialized, child 1 is deleted in txn T1, and child 2
is created in txn T2
3. when parent A is serialized, its cversion and pzxid have already been
updated correctly by T2
4. when reloading from disk, replaying T1 updates the pzxid but leaves the
cversion as it is
5. replaying T2 finds that child 2 already exists, so it goes to the
patching process; there it finds that the parent's cversion is already up to
date and skips patching it, which leaves the pzxid stale (it still reflects T1
rather than T2)
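A toy model of the five steps, to make the stale pzxid concrete. It is not ZooKeeper's `DataTree` code: `ParentStat`, `PzxidReplay`, and the zxid/cversion values (T1 = 0x11, T2 = 0x12, parent cversion 6 after T2) are made up for illustration, and the replay rules only mirror steps 4 and 5 as described. The `onlyMoveForward` flag shows one guard, assumed here rather than taken from the patch, that stops a replayed delete from moving pzxid backwards.

```java
/** Toy stand-in for the parent node's stat (hypothetical, not DataTree). */
final class ParentStat {
    int cversion;
    long pzxid;

    ParentStat(int cversion, long pzxid) {
        this.cversion = cversion;
        this.pzxid = pzxid;
    }
}

final class PzxidReplay {
    private PzxidReplay() {}

    /** Step 4: replaying T1 (delete child 1) sets pzxid and leaves cversion alone. */
    static void replayDelete(ParentStat parent, long zxid, boolean onlyMoveForward) {
        if (!onlyMoveForward || zxid > parent.pzxid) {
            parent.pzxid = zxid;
        }
    }

    /**
     * Step 5: the child already exists, so the parent is patched only if the
     * txn's recorded parent cversion is newer than the current one.
     */
    static void replayCreateOfExistingChild(ParentStat parent, int txnParentCversion, long zxid) {
        if (txnParentCversion > parent.cversion) {
            parent.cversion = txnParentCversion;
            parent.pzxid = zxid;
        }
    }

    /** Replays T1 then T2 on a fuzzy snapshot that already reflects both. */
    static ParentStat replay(boolean onlyMoveForward) {
        final long t1 = 0x11, t2 = 0x12;
        // Steps 1-3: parent A was serialized after T2, so cversion=6, pzxid=T2.
        ParentStat a = new ParentStat(6, t2);
        replayDelete(a, t1, onlyMoveForward);  // T1: delete child 1
        replayCreateOfExistingChild(a, 6, t2); // T2: create child 2 (already present)
        return a;
    }

    public static void main(String[] args) {
        System.out.printf("as described: pzxid=0x%x (stale, expected 0x12)%n", replay(false).pzxid);
        System.out.printf("with guard:   pzxid=0x%x%n", replay(true).pzxid);
    }
}
```

Running `main` prints `pzxid=0x11` for the unguarded replay and `pzxid=0x12` with the guard; the cversion is 6 in both cases, which is why the step-5 patching never fires.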


---