[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851856#comment-13851856 ]

Flavio Junqueira commented on ZOOKEEPER-1733:
---------------------------------------------

Giving proper credit: Committed revision 1551987.

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
> Assignee: Jeffrey Zhong
> Fix For: 3.4.6, 3.5.0
>
> Attachments: ZOOKEEPER-1733-3.4.patch, zookeeper-1733.patch, zookeeper-1733.patch
>
>
> FLETest#testLE fails intermittently on Windows boxes. The reason is that in
> LEThread#run() we have:
> {code}
> if(leader == i){
>     synchronized(finalObj){
>         successCount++;
>         if(successCount > (count/2))
>             finalObj.notify();
>     }
>     break;
> }
> {code}
> Basically, once we have a confirmed leader, the leader thread dies due to the
> "break" out of the while loop.
> In the verification step, we check whether the leader thread is alive as
> follows:
> {code}
> if(threads.get((int) leader).isAlive()){
>     Assert.fail("Leader hasn't joined: " + leader);
> }
> {code}
> On Windows boxes, the above verification step fails frequently because the
> leader thread has most likely already exited.
> Do we know why we have the leader-alive verification step when only the
> leader thread can bump successCount up to >= count/2?

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
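The description above turns on the gap between a worker thread calling notify() and that thread actually terminating, a gap whose length depends on the platform's scheduler. The following is a minimal sketch, not the committed patch, of one way a test could remove that timing dependence: join the thread with a timeout before consulting isAlive(). The class LeaderJoinSketch and the method leaderJoins are hypothetical names made up for this illustration; only finalObj and successCount echo names from the issue.

```java
public class LeaderJoinSketch {
    private final Object finalObj = new Object();
    private int successCount = 0;

    /** Returns true if the worker thread has terminated within timeoutMs of being joined. */
    public boolean leaderJoins(long timeoutMs) {
        Thread leader = new Thread(() -> {
            synchronized (finalObj) {
                successCount++;
                finalObj.notify();
            }
            // The run body ends here, but the thread is not necessarily
            // reported as dead the moment the waiter wakes up.
        });
        try {
            synchronized (finalObj) {
                // Start while holding the lock so the notify cannot be missed.
                leader.start();
                while (successCount < 1) {
                    finalObj.wait(timeoutMs);
                }
            }
            // At this point leader.isAlive() could be true or false.
            // Joining first makes the later check independent of scheduling.
            leader.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !leader.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("leader joined: " + new LeaderJoinSketch().leaderJoins(5000));
    }
}
```

The design point is only that a liveness check placed right after a wait()/notify() handoff observes a race, while a bounded join() turns it into a deterministic condition.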
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851565#comment-13851565 ]

Flavio Junqueira commented on ZOOKEEPER-1733:
---------------------------------------------

+1, thanks, Michi!

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
> Assignee: Jeffrey Zhong
> Fix For: 3.4.6, 3.5.0
>
> Attachments: ZOOKEEPER-1733-3.4.patch, zookeeper-1733.patch, zookeeper-1733.patch
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851080#comment-13851080 ]

Hadoop QA commented on ZOOKEEPER-1733:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12619188/ZOOKEEPER-1733-3.4.patch
against trunk revision 1551624.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1847//console

This message is automatically generated.

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
> Assignee: Jeffrey Zhong
> Fix For: 3.4.6, 3.5.0
>
> Attachments: ZOOKEEPER-1733-3.4.patch, zookeeper-1733.patch, zookeeper-1733.patch
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785001#comment-13785001 ]

Hudson commented on ZOOKEEPER-1733:
-----------------------------------

SUCCESS: Integrated in ZooKeeper-trunk #2077 (See [https://builds.apache.org/job/ZooKeeper-trunk/2077/])
ZOOKEEPER-1733. FLETest#testLE is flaky on windows boxes (Jeffrey Zhong via phunt) (phunt: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528586)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/FLETest.java

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
> Assignee: Jeffrey Zhong
> Fix For: 3.5.0
>
> Attachments: zookeeper-1733.patch, zookeeper-1733.patch
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784258#comment-13784258 ]

Hadoop QA commented on ZOOKEEPER-1733:
--------------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12594008/zookeeper-1733.patch
against trunk revision 1528271.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1624//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1624//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1624//console

This message is automatically generated.

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
> Assignee: Jeffrey Zhong
> Fix For: 3.5.0
>
> Attachments: zookeeper-1733.patch, zookeeper-1733.patch
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784240#comment-13784240 ]

Hadoop QA commented on ZOOKEEPER-1733:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12606402/zookeeper-1733.patch
against trunk revision 1528586.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1625//console

This message is automatically generated.

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
> Assignee: Jeffrey Zhong
> Fix For: 3.5.0
>
> Attachments: zookeeper-1733.patch, zookeeper-1733.patch
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770361#comment-13770361 ]

Hadoop QA commented on ZOOKEEPER-1733:
--------------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12594008/zookeeper-1733.patch
against trunk revision 1524275.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1584//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1584//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1584//console

This message is automatically generated.

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
> Assignee: Jeffrey Zhong
> Fix For: 3.5.0
>
> Attachments: zookeeper-1733.patch
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770309#comment-13770309 ]

Mahadev konar commented on ZOOKEEPER-1733:
------------------------------------------

Running this through Jenkins.

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
> Assignee: Jeffrey Zhong
> Fix For: 3.5.0
>
> Attachments: zookeeper-1733.patch
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718912#comment-13718912 ]

Flavio Junqueira commented on ZOOKEEPER-1733:
---------------------------------------------

+1, looks good, Jeffrey.

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
> Assignee: Jeffrey Zhong
>
> Attachments: zookeeper-1733.patch
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718062#comment-13718062 ]

Flavio Junqueira commented on ZOOKEEPER-1733:
---------------------------------------------

Thanks for looking into it, Jeffrey. Are we still talking about the 3.4 branch, or is this about trunk too? Here are a couple of comments:

- On top of making sure that a quorum leaves leader election, we should also check that the leader ends up thinking that it is the leader. It is a simple sanity check and I don't see a reason for removing it if we are not talking about the test failures on Windows.
- If you're still focusing on 3.4, then the best path I can see is to apply ZOOKEEPER-1292 to branch 3.4. If it still doesn't work in trunk, as we have observed, then we need to work on a patch on top of ZOOKEEPER-1292. I'm not comfortable removing checks, though, unless it is clear that the check is not verifying anything.

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717877#comment-13717877 ]

Jeffrey Zhong commented on ZOOKEEPER-1733:
------------------------------------------

Actually, after bumping up the waitCounter, the "Fewer than a a majority has joined" failure is gone, but I still get "Leader hasn't joined" randomly in test cases testSingleElection, testDoubleElection or testTripleElection. I think we should remove the check, since it doesn't verify anything more.

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717746#comment-13717746 ]

Jeffrey Zhong commented on ZOOKEEPER-1733:
------------------------------------------

[~enis] found the cause. Basically, the Windows run of testTripleElection took more than 10 seconds. After bumping up waitCounter to 200, FLETest passes consistently. I'll try to create a patch to port ZOOKEEPER-1292 to 3.4. Meanwhile, I'll check why the Windows run takes longer than the Linux one and may open another JIRA. Thanks.

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
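The comment above describes a wait that is bounded by a counter and that gave up too early on a slow box. As a rough illustration only, the sketch below shows a counter-bounded wait for a majority, where the total budget is the number of rounds times the per-round timeout. BoundedWaitSketch, reportSuccess, awaitMajority, maxRounds and roundMs are hypothetical names and parameters chosen for this example; the real FLETest loop and its per-round timeout are not shown in this thread and may well differ.

```java
public class BoundedWaitSketch {
    private final Object finalObj = new Object();
    private int successCount = 0;

    /** Called by a worker once it has finished its election round. */
    public void reportSuccess(int count) {
        synchronized (finalObj) {
            successCount++;
            if (successCount > count / 2) {
                finalObj.notify();
            }
        }
    }

    /**
     * Waits until more than count/2 workers have reported, or until
     * maxRounds waits of roundMs each have been used up. Any wakeup,
     * including a spurious one, consumes a round.
     */
    public boolean awaitMajority(int count, int maxRounds, long roundMs) {
        synchronized (finalObj) {
            int waitCounter = 0;
            try {
                while (successCount <= count / 2 && waitCounter < maxRounds) {
                    finalObj.wait(roundMs);
                    waitCounter++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return successCount > count / 2;
        }
    }

    public static void main(String[] args) {
        BoundedWaitSketch sketch = new BoundedWaitSketch();
        int count = 3;
        for (int i = 0; i < count; i++) {
            new Thread(() -> sketch.reportSuccess(count)).start();
        }
        System.out.println("majority reached: " + sketch.awaitMajority(count, 200, 50));
    }
}
```

Raising maxRounds only widens the time budget; it does not explain why one platform needs more time, which is the follow-up question raised in the comment.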
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717717#comment-13717717 ]

Flavio Junqueira commented on ZOOKEEPER-1733:
---------------------------------------------

If you have cycles to look into it, please go ahead. There are other issues I'm looking into, so some help would be welcome.

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717640#comment-13717640 ]

Jeffrey Zhong commented on ZOOKEEPER-1733:
------------------------------------------

Yes, trunk has other issues when running on Windows. Removing the check is only for the 3.4 test case. [~fpj], are you planning to attack this for trunk? Otherwise, I'll try to dig into the issue with my limited ZooKeeper knowledge.

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716463#comment-13716463 ]

Flavio Junqueira commented on ZOOKEEPER-1733:
---------------------------------------------

I ran it and checked the logs. The triple election test is just not completing. It goes through the first and the second rounds, but after the second leader dies, it doesn't elect the third. It needs some more investigation; I don't think removing the check as you suggest actually fixes it.

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715883#comment-13715883 ]

Jeffrey Zhong commented on ZOOKEEPER-1733:
------------------------------------------

{bq}
I haven't had a chance to run on a windows box yet, but the message you have above says that it didn't get a majority, so it's not that the leader hasn't joined
{bq}
In trunk, the test failed before it could reach the "Leader hasn't joined" verification, because the assertion in the "Fewer than a a majority has joined" step failed first. The reason it fails on Windows is that Windows terminates a thread as soon as the thread function exits, which does not seem to happen on a Linux box.

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715822#comment-13715822 ]

Flavio Junqueira commented on ZOOKEEPER-1733:
---------------------------------------------

I haven't had a chance to run on a windows box yet, but the message you have above says that it didn't get a majority, so it's not that the leader hasn't joined. In any case, do you understand the reason why it fails on windows and not in other environments? Perhaps looking at the logs will give us some insight.

> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Jeffrey Zhong
> Priority: Minor
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13714823#comment-13714823 ]

Jeffrey Zhong commented on ZOOKEEPER-1733:
------------------------------------------

The test case fails in trunk on Windows as well. It sometimes fails with a different error (shown below) than the "Leader hasn't joined" error. What do you think about removing the leader thread alive check from the 3.4 test case, since it should already be covered by the majority verification? Thanks.

{code}
    [junit] java.lang.AssertionError: Fewer than a a majority has joined
    [junit]     at org.junit.Assert.fail(Assert.java:93)
    [junit]     at org.apache.zookeeper.test.FLETest.runElection(FLETest.java:348)
    [junit]     at org.apache.zookeeper.test.FLETest.testTripleElection(FLETest.java:277)
    [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    [junit]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    [junit]     at java.lang.reflect.Method.invoke(Method.java:597)
    ...
{code}
[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13714386#comment-13714386 ]

Flavio Junqueira commented on ZOOKEEPER-1733:
---------------------------------------------

Could you check if you still have this problem with trunk, Jeffrey? We improved that test in ZOOKEEPER-1292, but the change only went into trunk. If necessary, we can think about porting it to the 3.4 branch.