[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-12-18 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851856#comment-13851856
 ] 

Flavio Junqueira commented on ZOOKEEPER-1733:
-

Giving proper credit: Committed revision 1551987.

> FLETest#testLE is flaky on windows boxes
> 
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.5
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Fix For: 3.4.6, 3.5.0
>
> Attachments: ZOOKEEPER-1733-3.4.patch, zookeeper-1733.patch, 
> zookeeper-1733.patch
>
>
> FLETest#testLE fails intermittently on Windows boxes. The reason is that in
> LEThread#run() we have:
> {code}
> if (leader == i) {
>     synchronized (finalObj) {
>         successCount++;
>         if (successCount > (count / 2))
>             finalObj.notify();
>     }
>     break;
> }
> {code}
> Basically, once we have a confirmed leader, the leader thread dies because of
> the "break" out of the while loop.
> In the verification step, however, we check whether the leader thread is
> alive, as follows:
> {code}
> if (threads.get((int) leader).isAlive()) {
>     Assert.fail("Leader hasn't joined: " + leader);
> }
> {code}
> On Windows boxes, this verification step fails frequently, most likely
> because the leader thread has already exited.
> Do we know why we need the leader-alive verification step, given that only
> the leader thread can push successCount to >= count/2?
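
For readers unfamiliar with the pattern, the majority notification in the quoted snippet can be sketched in isolation. This is a minimal illustration under assumptions, not the FLETest code: the class `MajorityGate` and its method names are hypothetical, and only `successCount`, `count`, and the `successCount > count/2` condition come from the description above.

```java
// Hypothetical, self-contained sketch of "notify once a majority has succeeded".
// MajorityGate and its methods are illustrative names, not part of ZooKeeper.
final class MajorityGate {
    private final int count;      // total number of participating threads
    private int successCount = 0; // guarded by the monitor of 'this'

    MajorityGate(int count) {
        this.count = count;
    }

    // Called by a worker once it has settled on a leader.
    synchronized void reportSuccess() {
        successCount++;
        if (successCount > count / 2) {
            notifyAll(); // wake every waiter; each one re-checks the condition
        }
    }

    synchronized boolean majorityReached() {
        return successCount > count / 2;
    }

    // Waits up to timeoutMillis for a majority and reports whether it was reached.
    synchronized boolean awaitMajority(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        try {
            while (successCount <= count / 2) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return false; // time budget spent without a majority
                }
                wait(remainingMillis); // the loop guards against spurious wakeups
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Re-checking the condition in a loop, rather than trusting a single `notify()`, keeps the waiter correct even if it is woken early or if the notification arrives before the wait starts.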



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-12-18 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851565#comment-13851565
 ] 

Flavio Junqueira commented on ZOOKEEPER-1733:
-

+1, thanks, Michi!



[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851080#comment-13851080
 ] 

Hadoop QA commented on ZOOKEEPER-1733:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12619188/ZOOKEEPER-1733-3.4.patch
  against trunk revision 1551624.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1847//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785001#comment-13785001
 ] 

Hudson commented on ZOOKEEPER-1733:
---

SUCCESS: Integrated in ZooKeeper-trunk #2077 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/2077/])
ZOOKEEPER-1733. FLETest#testLE is flaky on windows boxes (Jeffrey Zhong via 
phunt) (phunt: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528586)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/FLETest.java


> FLETest#testLE is flaky on windows boxes
> 
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.5
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Fix For: 3.5.0
>
> Attachments: zookeeper-1733.patch, zookeeper-1733.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784258#comment-13784258
 ] 

Hadoop QA commented on ZOOKEEPER-1733:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12594008/zookeeper-1733.patch
  against trunk revision 1528271.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1624//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1624//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1624//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784240#comment-13784240
 ] 

Hadoop QA commented on ZOOKEEPER-1733:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12606402/zookeeper-1733.patch
  against trunk revision 1528586.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1625//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770361#comment-13770361
 ] 

Hadoop QA commented on ZOOKEEPER-1733:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12594008/zookeeper-1733.patch
  against trunk revision 1524275.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1584//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1584//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1584//console

This message is automatically generated.

> FLETest#testLE is flaky on windows boxes
> 
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.5
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Fix For: 3.5.0
>
> Attachments: zookeeper-1733.patch
>
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-09-17 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770309#comment-13770309
 ] 

Mahadev konar commented on ZOOKEEPER-1733:
--

Running this through jenkins.



[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-07-24 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718912#comment-13718912
 ] 

Flavio Junqueira commented on ZOOKEEPER-1733:
-

+1, looks good, Jeffrey.

> FLETest#testLE is flaky on windows boxes
> 
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.5
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: zookeeper-1733.patch
>
>


[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-07-24 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718062#comment-13718062
 ] 

Flavio Junqueira commented on ZOOKEEPER-1733:
-

Thanks for looking into it, Jeffrey. Are we still talking about the 3.4 branch,
or is this about trunk too? Here are a couple of comments:

- On top of making sure that a quorum leaves leader election, we should also
check that the leader ends up thinking that it is the leader. It is a simple
sanity check, and I don't see a reason for removing it if we are not talking
about the test failures on Windows.
- If you're still focusing on 3.4, then the best path I can see is to apply
ZOOKEEPER-1292 to branch 3.4. If it still doesn't work in trunk, as we have
observed, then we need to work on a patch on top of ZOOKEEPER-1292. I'm not
comfortable removing checks, though, unless it is clear that they are not
verifying anything.
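
The two checks listed above can be sketched as plain predicates over the observed peer states. This is only an illustration: `ElectionChecks`, its `ServerState` enum, and the map of peer id to state are hypothetical stand-ins and are not taken from FLETest or from the ZooKeeper server classes.

```java
import java.util.Map;

// Hypothetical sketch of the two sanity checks; all names are illustrative.
final class ElectionChecks {
    // Local stand-in for a peer's election state.
    enum ServerState { LOOKING, FOLLOWING, LEADING }

    private ElectionChecks() {}

    // True if more than half of the peers are no longer LOOKING,
    // i.e. a quorum has left leader election.
    static boolean quorumLeftElection(Map<Long, ServerState> states) {
        long decided = states.values().stream()
                .filter(state -> state != ServerState.LOOKING)
                .count();
        return decided > states.size() / 2;
    }

    // True if the elected peer also considers itself the leader.
    static boolean leaderAgrees(Map<Long, ServerState> states, long leaderId) {
        return states.get(leaderId) == ServerState.LEADING;
    }
}
```

Keeping the two predicates separate mirrors the argument in the comment: a quorum can leave election while the elected peer has not yet taken the leader role, so the second check is not implied by the first.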

> FLETest#testLE is flaky on windows boxes
> 
>
> Key: ZOOKEEPER-1733
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.5
>Reporter: Jeffrey Zhong
>


[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-07-23 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717877#comment-13717877
 ] 

Jeffrey Zhong commented on ZOOKEEPER-1733:
--

Actually, after bumping up the waitCounter, the "Fewer than a a majority has
joined" failure is gone, but I still get "Leader hasn't joined" randomly in
test cases testSingleElection, testDoubleElection, or testTripleElection. I
think we should remove the check, since it doesn't verify anything more.



[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-07-23 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717746#comment-13717746
 ] 

Jeffrey Zhong commented on ZOOKEEPER-1733:
--

[~enis] found the cause. Basically, the Windows run of testTripleElection took
more than 10 secs. After bumping up waitCounter to 200, FLETest passes
consistently. I'll try to create a patch to port ZOOKEEPER-1292 to 3.4.
Meanwhile, I'll check why the Windows run takes longer than the Linux one and
may open another JIRA. Thanks.
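
The effect of a wait counter can be sketched as a bounded polling loop. This is an illustration under assumptions: the helper `BoundedWait`, its parameters, and the per-iteration sleep are hypothetical, and the actual loop in FLETest may be structured differently. The point is only that the total time budget is roughly the number of iterations times the per-iteration wait, so a slower platform may need a larger counter.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; not taken from FLETest.
final class BoundedWait {
    private BoundedWait() {}

    // Polls 'done' up to maxWaits times, sleeping waitMillis after each failed poll.
    // Returns true as soon as the condition holds, false once the budget is spent.
    static boolean await(BooleanSupplier done, int maxWaits, long waitMillis) {
        for (int waitCounter = 0; waitCounter < maxWaits; waitCounter++) {
            if (done.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return done.getAsBoolean(); // one final check after the last sleep
    }

    // Upper bound on the time spent sleeping, in milliseconds.
    static long budgetMillis(int maxWaits, long waitMillis) {
        return maxWaits * waitMillis;
    }
}
```

A call such as `BoundedWait.await(() -> electionDone, maxWaits, waitMillis)` (with `electionDone` being whatever flag the test maintains) gives up only after the whole budget is used, which makes the budget, not the platform speed, the thing to tune.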



[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-07-23 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717717#comment-13717717
 ] 

Flavio Junqueira commented on ZOOKEEPER-1733:
-

If you have cycles to look into it, please go ahead. There are other issues I'm 
looking into, so some help would be welcome.



[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-07-23 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717640#comment-13717640
 ] 

Jeffrey Zhong commented on ZOOKEEPER-1733:
--

Yes, trunk has other issues running on Windows. Removing the check is only for
the 3.4 test case. [~fpj], are you planning to tackle this for trunk? Otherwise,
I'll try to dig into the issue with my limited ZooKeeper knowledge.



[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-07-23 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716463#comment-13716463
 ] 

Flavio Junqueira commented on ZOOKEEPER-1733:
-

I ran it and checked the logs. The triple election test is just not completing.
It goes through the first and the second rounds, but after the second leader
dies, it doesn't elect the third. It needs some more investigation; I don't
think removing the check as you suggest actually fixes it.



[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-07-22 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715883#comment-13715883
 ] 

Jeffrey Zhong commented on ZOOKEEPER-1733:
--

> I haven't had a chance to run on a windows box yet, but the message you have
> above says that it didn't get a majority, so it's not that the leader hasn't
> joined

In trunk, the test failed before it could reach the "Leader hasn't joined"
verification, because the assertion failed at the "Fewer than a a majority has
joined" step.

The reason it fails on Windows is that Windows seems to terminate a thread as
soon as the thread function exits, which does not seem to happen on a Linux
box.
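
The timing sensitivity described here can be reproduced outside ZooKeeper. In the sketch below, which is hypothetical and not taken from FLETest, a worker signals completion and then returns from `run()`. Between the signal and the actual termination of the thread, `Thread.isAlive()` may return either value depending on scheduling; after `Thread.join()` returns, it is always false.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; names are illustrative only.
final class AliveCheckDemo {
    private AliveCheckDemo() {}

    // Starts a worker that signals and then returns from run().
    // Returns {aliveRightAfterSignal, aliveAfterJoin}.
    static boolean[] observe() {
        CountDownLatch signalled = new CountDownLatch(1);
        Thread worker = new Thread(signalled::countDown, "worker");
        worker.start();
        try {
            signalled.await();                    // the worker has signalled...
            boolean racy = worker.isAlive();      // ...but may or may not have terminated yet
            worker.join();                        // wait for actual termination
            boolean afterJoin = worker.isAlive(); // false once join() has returned
            return new boolean[] {racy, afterJoin};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing worker", e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = observe();
        System.out.println("alive right after signal (timing-dependent): " + result[0]);
        System.out.println("alive after join: " + result[1]);
    }
}
```

A check that depends on the first value can pass on one platform and fail on another, whereas a check made after a (possibly timed) `join()` does not depend on how quickly the runtime tears the thread down.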



[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-07-22 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715822#comment-13715822
 ] 

Flavio Junqueira commented on ZOOKEEPER-1733:
-

I haven't had a chance to run on a windows box yet, but the message you have 
above says that it didn't get a majority, so it's not that the leader hasn't 
joined. In any case, do you understand the reason why it fails on windows and 
not in other environments? Perhaps looking at the logs will give us some 
insight.




[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-07-21 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13714823#comment-13714823
 ] 

Jeffrey Zhong commented on ZOOKEEPER-1733:
--

The test case also failed in trunk on Windows. It sometimes failed with a
different error (shown below) than the "Leader hasn't joined" error. What do
you think about removing the leader thread alive check from the 3.4 test case,
since it should be covered by the majority verification? Thanks.

{code}
[junit] java.lang.AssertionError: Fewer than a a majority has joined
[junit] at org.junit.Assert.fail(Assert.java:93)
[junit] at org.apache.zookeeper.test.FLETest.runElection(FLETest.java:348)
[junit] at org.apache.zookeeper.test.FLETest.testTripleElection(FLETest.java:277)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
...
{code}
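
For reference, the majority condition behind that assertion is plain integer
arithmetic. A small sketch, assuming a hypothetical helper named
MajorityCheck#hasMajority that is not in FLETest, and reusing the
successCount > (count/2) comparison from LEThread#run():

```java
// Hypothetical helper for illustration only; FLETest does not define it.
final class MajorityCheck {

    /**
     * True when strictly more than half of the participants succeeded.
     * count / 2 is integer division, so 3 servers need 2 and 4 servers need 3.
     */
    static boolean hasMajority(int successCount, int count) {
        return successCount > (count / 2);
    }

    public static void main(String[] args) {
        System.out.println(hasMajority(1, 3)); // prints false
        System.out.println(hasMajority(2, 3)); // prints true
        System.out.println(hasMajority(2, 4)); // prints false
    }
}
```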






[jira] [Commented] (ZOOKEEPER-1733) FLETest#testLE is flaky on windows boxes

2013-07-20 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13714386#comment-13714386
 ] 

Flavio Junqueira commented on ZOOKEEPER-1733:
-

Could you check if you still have this problem with trunk, Jeffrey? We have
improved that test in ZOOKEEPER-1292, but the change only went into trunk. If
necessary, we can think about porting it to the 3.4 branch.


