[VOTE] Release ZooKeeper 3.4.0 (candidate 1)

2011-11-05 Thread Mahadev Konar
Hi all,
 I have created 3.4.0 release candidate 1. Thanks to all who helped and spent
Friday night fixing bugs :).

*** Please download, test and VOTE before the
*** vote closes at 5pm PT on Saturday, Nov 12 ***

http://people.apache.org/~mahadev/zookeeper-3.4.0-candidate-1/

Should we release this?

Please try out the release ASAP. Would appreciate early feedback.

thanks
mahadev


[jira] [Resolved] (ZOOKEEPER-1291) AcceptedEpoch not updated at leader before it proposes the epoch to followers

2011-11-05 Thread Camille Fournier (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier resolved ZOOKEEPER-1291.
-

   Resolution: Fixed
Fix Version/s: 3.5.0
 Release Note: Revision 1198053

> AcceptedEpoch not updated at leader before it proposes the epoch to followers
> -
>
> Key: ZOOKEEPER-1291
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1291
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 3.4.0
>Reporter: Alexander Shraer
>Assignee: Alexander Shraer
> Fix For: 3.4.0, 3.5.0
>
>
> It is possible that a leader proposes an epoch e and a follower adopts it by
> setting acceptedEpoch to e, but the leader itself hasn't yet done so.
> While I'm not sure this contradicts Zab (there is no description of where the
> leader actually sets its acceptedEpoch), it is very counterintuitive.
> The fix is to set acceptedEpoch in getEpochToPropose, i.e., before any
> LearnerHandler passes the getEpochToPropose barrier.
> The fix is done as part of ZOOKEEPER-1264.
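
The ordering described in this issue can be sketched roughly as follows. This is a simplified illustration, not the ZooKeeper source: the class name EpochBarrier, the plain quorumSize count, and getAcceptedEpoch are assumptions made for the example, and the real leader also uses a quorum verifier and a timeout that are omitted here. The point is only that acceptedEpoch is set before any waiting caller is released from the barrier.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for the leader's epoch barrier.
final class EpochBarrier {
    private final int quorumSize;                 // assumed: a simple count, not a QuorumVerifier
    private final Set<Long> connected = new HashSet<>();
    private long epoch = -1;
    private long acceptedEpoch = -1;
    private boolean waitingForNewEpoch = true;

    EpochBarrier(int quorumSize) {
        this.quorumSize = quorumSize;
    }

    // Each connecting server (leader included) calls this with its last accepted epoch.
    synchronized long getEpochToPropose(long sid, long lastAcceptedEpoch) {
        if (!waitingForNewEpoch) {
            return epoch;                         // epoch already decided
        }
        if (lastAcceptedEpoch >= epoch) {
            epoch = lastAcceptedEpoch + 1;        // new epoch must exceed every epoch seen
        }
        connected.add(sid);
        if (connected.size() >= quorumSize) {
            // Record the epoch locally BEFORE releasing the barrier, so no
            // caller can observe the proposed epoch ahead of the leader.
            acceptedEpoch = epoch;
            waitingForNewEpoch = false;
            notifyAll();
        } else {
            while (waitingForNewEpoch) {
                try {
                    wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for quorum", e);
                }
            }
        }
        return epoch;
    }

    synchronized long getAcceptedEpoch() {
        return acceptedEpoch;
    }
}
```

With a quorum size of 1, a single call such as getEpochToPropose(1L, 4L) returns immediately with the epoch already stored in acceptedEpoch.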

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (ZOOKEEPER-1282) Learner.java not following Zab 1.0 protocol - setCurrentEpoch should be done upon receipt of NEWLEADER (before acking it) and not upon receipt of UPTODATE

2011-11-05 Thread Camille Fournier (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier resolved ZOOKEEPER-1282.
-

   Resolution: Fixed
Fix Version/s: 3.5.0
   3.4.0
 Release Note: Revision 1198053

> Learner.java not following Zab 1.0 protocol - setCurrentEpoch should be done 
> upon receipt of NEWLEADER (before acking it) and not upon receipt of UPTODATE
> --
>
> Key: ZOOKEEPER-1282
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1282
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 3.4.0
>Reporter: Alexander Shraer
>Assignee: Benjamin Reed
> Fix For: 3.4.0, 3.5.0
>
>
> According to https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0
> phase 2 part 2, "Once it receives NEWLEADER(e) it atomically applies
> the new state and sets f.currentEpoch =e. "
> In Learner.java, self.setCurrentEpoch(newEpoch) is called after receiving
> UPTODATE and not before acking the NEWLEADER message, as it should be:
>
>             case Leader.UPTODATE:
>                 if (!snapshotTaken) {
>                     zk.takeSnapshot();
>                 }
>                 self.cnxnFactory.setZooKeeperServer(zk);
>                 break outerLoop;
>             case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
>                 zk.takeSnapshot();
>                 snapshotTaken = true;
>                 writePacket(new QuorumPacket(Leader.ACK,
>                         newLeaderZxid, null, null), true);
>                 break;
>             }
>         }
>     }
>     long newEpoch = ZxidUtils.getEpochFromZxid(newLeaderZxid);
>     self.setCurrentEpoch(newEpoch);
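
For comparison, the ordering that the Zab 1.0 text asks for might look like the sketch below. It is not the committed patch: LearnerSyncSketch, PacketType, and the event log are hypothetical names introduced only for illustration, and snapshotting and network I/O are left out. It relies on the epoch occupying the high 32 bits of a zxid, and it sets the current epoch on NEWLEADER before the ACK is recorded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the learner's sync loop.
final class LearnerSyncSketch {
    enum PacketType { NEWLEADER, UPTODATE }

    private long currentEpoch = -1;
    private final List<String> events = new ArrayList<>();

    // The epoch is carried in the high 32 bits of the zxid.
    static long epochFromZxid(long zxid) {
        return zxid >> 32;
    }

    void sync(List<PacketType> packets, long newLeaderZxid) {
        for (PacketType p : packets) {
            switch (p) {
                case NEWLEADER:
                    // Set the epoch first, then acknowledge NEWLEADER.
                    currentEpoch = epochFromZxid(newLeaderZxid);
                    events.add("setCurrentEpoch(" + currentEpoch + ")");
                    events.add("ACK");
                    break;
                case UPTODATE:
                    // Only start serving here; the epoch is already set.
                    events.add("serve");
                    return;
            }
        }
    }

    long getCurrentEpoch() {
        return currentEpoch;
    }

    List<String> getEvents() {
        return events;
    }
}
```

Feeding it NEWLEADER followed by UPTODATE with a zxid of (7L << 32) | 3L leaves the current epoch at 7, with the epoch update recorded ahead of the ACK.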

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144823#comment-13144823
 ] 

Hadoop QA commented on ZOOKEEPER-1264:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12502617/ZOOKEEPER-1264-final.patch
  against trunk revision 1197891.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/781//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/781//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/781//console

This message is automatically generated.

> FollowerResyncConcurrencyTest failing intermittently
> 
>
> Key: ZOOKEEPER-1264
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0, 3.5.0
>Reporter: Patrick Hunt
>Assignee: Camille Fournier
>Priority: Blocker
> Fix For: 3.3.4, 3.4.0, 3.5.0
>
> Attachments: ZOOKEEPER-1264-34-bad.patch, 
> ZOOKEEPER-1264-br34-final.patch, ZOOKEEPER-1264-branch34.patch, 
> ZOOKEEPER-1264-final.patch, ZOOKEEPER-1264-latest.patch, 
> ZOOKEEPER-1264-merge.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, 
> ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264_branch33.patch, 
> ZOOKEEPER-1264_branch34.patch, ZOOKEEPER-1264unittest.patch, 
> ZOOKEEPER-1264unittest.patch, ZOOKEEPER-ACCEPTEDEPOCH-trunk.patch, 
> ZOOKEEPER-ACCEPTEDEPOCH.patch, followerresyncfailure_log.txt.gz, logs.zip, 
> tmp.zip
>
>
> The FollowerResyncConcurrencyTest test is failing intermittently. 
> saw the following on 3.4:
> {noformat}
> junit.framework.AssertionFailedError: Should have same number of
> ephemerals in both followers expected:<11741> but was:<14001>
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
>at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144824#comment-13144824
 ] 

Hadoop QA commented on ZOOKEEPER-1264:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12502617/ZOOKEEPER-1264-final.patch
  against trunk revision 1197891.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/782//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/782//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/782//console

This message is automatically generated.

> FollowerResyncConcurrencyTest failing intermittently
> 
>
> Key: ZOOKEEPER-1264
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0, 3.5.0
>Reporter: Patrick Hunt
>Assignee: Camille Fournier
>Priority: Blocker
> Fix For: 3.3.4, 3.4.0, 3.5.0
>
> Attachments: ZOOKEEPER-1264-34-bad.patch, 
> ZOOKEEPER-1264-br34-final.patch, ZOOKEEPER-1264-branch34.patch, 
> ZOOKEEPER-1264-final.patch, ZOOKEEPER-1264-latest.patch, 
> ZOOKEEPER-1264-merge.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, 
> ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264_branch33.patch, 
> ZOOKEEPER-1264_branch34.patch, ZOOKEEPER-1264unittest.patch, 
> ZOOKEEPER-1264unittest.patch, ZOOKEEPER-ACCEPTEDEPOCH-trunk.patch, 
> ZOOKEEPER-ACCEPTEDEPOCH.patch, followerresyncfailure_log.txt.gz, logs.zip, 
> tmp.zip
>
>
> The FollowerResyncConcurrencyTest test is failing intermittently. 
> saw the following on 3.4:
> {noformat}
> junit.framework.AssertionFailedError: Should have same number of
> ephemerals in both followers expected:<11741> but was:<14001>
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
>at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Success: ZOOKEEPER-1264 PreCommit Build #782

2011-11-05 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/782/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 140671 lines...]
 [exec] BUILD SUCCESSFUL
 [exec] Total time: 0 seconds
 [exec] 
 [exec] 
 [exec] 
 [exec] 
 [exec] +1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12502617/ZOOKEEPER-1264-final.patch
 [exec]   against trunk revision 1197891.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/782//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/782//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/782//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] i0kclYeULQ logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD SUCCESSFUL
Total time: 23 minutes 34 seconds
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1264
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed


Success: ZOOKEEPER-1264 PreCommit Build #781

2011-11-05 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/781/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 142205 lines...]
 [exec] BUILD SUCCESSFUL
 [exec] Total time: 0 seconds
 [exec] 
 [exec] 
 [exec] 
 [exec] 
 [exec] +1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12502617/ZOOKEEPER-1264-final.patch
 [exec]   against trunk revision 1197891.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/781//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/781//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/781//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] Z6TPY4qwTw logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD SUCCESSFUL
Total time: 23 minutes 43 seconds
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1264
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed


[jira] [Updated] (ZOOKEEPER-1237) ERRORs being logged when queued responses are sent after socket has closed.

2011-11-05 Thread Mahadev konar (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1237:
-

Fix Version/s: (was: 3.4.0)
   3.4.1

Moving it out of 3.4.0, not a blocker.

> ERRORs being logged when queued responses are sent after socket has closed.
> ---
>
> Key: ZOOKEEPER-1237
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1237
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.3.4, 3.4.0, 3.5.0
>Reporter: Patrick Hunt
> Fix For: 3.3.4, 3.5.0, 3.4.1
>
>
> After applying ZOOKEEPER-1049 to 3.3.3 (I believe the same problem exists in 
> 3.4/3.5 but haven't tested this) I'm seeing the following exception more 
> frequently:
> {noformat}
> Oct 19, 1:31:53 PM ERROR
> Unexpected Exception:
> java.nio.channels.CancelledKeyException
> at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
> at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
> at 
> org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
> at 
> org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
> at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
> at 
> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
> {noformat}
> This is a long-standing problem where we try to send a response after the
> socket has been closed. Prior to ZOOKEEPER-1049 this issue happened much
> less frequently (2 sec linger), but I believe it was possible. The timing
> window is just wider now.
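
One way to tolerate this race, shown only as a sketch and not as the fix adopted for this issue, is to check the key before touching its interest set and to treat CancelledKeyException as "connection already closed" rather than an error. ResponseGuard, enableWrite, and demo are hypothetical names; the java.nio calls themselves (SelectionKey.isValid, interestOps, cancel) are standard.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical helper, not part of ZooKeeper.
final class ResponseGuard {

    // Returns true if write interest was enabled, false if the key was
    // already cancelled (for example because the socket was closed).
    static boolean enableWrite(SelectionKey key) {
        if (!key.isValid()) {
            return false;                       // closed earlier: drop the queued response
        }
        try {
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
            return true;
        } catch (CancelledKeyException e) {
            return false;                       // closed between the check and the call
        }
    }

    // Small self-check using a pipe in place of a client socket.
    static boolean demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.sink().configureBlocking(false);
                SelectionKey key = pipe.sink().register(selector, 0);
                boolean whileOpen = enableWrite(key);
                key.cancel();                   // simulate the connection going away
                boolean afterCancel = enableWrite(key);
                return whileOpen && !afterCancel;
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The explicit isValid check narrows the window, and the catch covers the case where the key is cancelled between the check and the interestOps call.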

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Failed: ZOOKEEPER-ZOOKEEPER-1264 PreCommit Build #780

2011-11-05 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-ZOOKEEPER-1264
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/780/

###
## LAST 60 LINES OF THE CONSOLE 
###
Started by user camille
Building remotely on hadoop9
Reverting /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk
Updating http://svn.apache.org/repos/asf/zookeeper/trunk
At revision 1198045
no change for http://svn.apache.org/repos/asf/zookeeper/trunk since the 
previous build
No emails were triggered.
[PreCommit-ZOOKEEPER-Build] $ /bin/bash /tmp/hudson3800632606099907024.sh
/home/jenkins/tools/java/latest/bin/java
Buildfile: build.xml

check-for-findbugs:

findbugs.check:

forrest.check:

hudson-test-patch:
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Testing patch for ZOOKEEPER-ZOOKEEPER-1264.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] At revision 1198045.
 [exec] ZOOKEEPER-ZOOKEEPER-1264 is not "Patch Available".  Exiting.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD SUCCESSFUL
Total time: 2 seconds
Archiving artifacts
ERROR: No artifacts found that match the file pattern 
"trunk/build/test/findbugs/newPatchFindbugsWarnings.html,trunk/patchprocess/*.txt,trunk/patchprocess/*Warnings.xml,trunk/build/test/test-cppunit/*.txt,trunk/build/tmp/zk.log".
 Configuration error?
ERROR: 'trunk/build/test/findbugs/newPatchFindbugsWarnings.html' doesn't match 
anything: 'trunk' exists but not 
'trunk/build/test/findbugs/newPatchFindbugsWarnings.html'
Build step 'Archive the artifacts' changed build result to FAILURE
Recording test results
Description set: ZOOKEEPER-1264
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.


[jira] [Updated] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Camille Fournier (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-1264:


Attachment: ZOOKEEPER-1264-final.patch

final trunk patch

> FollowerResyncConcurrencyTest failing intermittently
> 
>
> Key: ZOOKEEPER-1264
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0, 3.5.0
>Reporter: Patrick Hunt
>Assignee: Camille Fournier
>Priority: Blocker
> Fix For: 3.3.4, 3.4.0, 3.5.0
>
> Attachments: ZOOKEEPER-1264-34-bad.patch, 
> ZOOKEEPER-1264-br34-final.patch, ZOOKEEPER-1264-branch34.patch, 
> ZOOKEEPER-1264-final.patch, ZOOKEEPER-1264-latest.patch, 
> ZOOKEEPER-1264-merge.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, 
> ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264_branch33.patch, 
> ZOOKEEPER-1264_branch34.patch, ZOOKEEPER-1264unittest.patch, 
> ZOOKEEPER-1264unittest.patch, ZOOKEEPER-ACCEPTEDEPOCH-trunk.patch, 
> ZOOKEEPER-ACCEPTEDEPOCH.patch, followerresyncfailure_log.txt.gz, logs.zip, 
> tmp.zip
>
>
> The FollowerResyncConcurrencyTest test is failing intermittently. 
> saw the following on 3.4:
> {noformat}
> junit.framework.AssertionFailedError: Should have same number of
> ephemerals in both followers expected:<11741> but was:<14001>
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
>at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1282) Learner.java not following Zab 1.0 protocol - setCurrentEpoch should be done upon receipt of NEWLEADER (before acking it) and not upon receipt of UPTODATE

2011-11-05 Thread Alexander Shraer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Shraer updated ZOOKEEPER-1282:


Issue Type: Sub-task  (was: Bug)
Parent: ZOOKEEPER-1264

> Learner.java not following Zab 1.0 protocol - setCurrentEpoch should be done 
> upon receipt of NEWLEADER (before acking it) and not upon receipt of UPTODATE
> --
>
> Key: ZOOKEEPER-1282
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1282
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 3.4.0
>Reporter: Alexander Shraer
>Assignee: Benjamin Reed
>
> According to https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0
> phase 2 part 2, "Once it receives NEWLEADER(e) it atomically applies
> the new state and sets f.currentEpoch =e. "
> In Learner.java, self.setCurrentEpoch(newEpoch) is called after receiving
> UPTODATE and not before acking the NEWLEADER message, as it should be:
>
>             case Leader.UPTODATE:
>                 if (!snapshotTaken) {
>                     zk.takeSnapshot();
>                 }
>                 self.cnxnFactory.setZooKeeperServer(zk);
>                 break outerLoop;
>             case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
>                 zk.takeSnapshot();
>                 snapshotTaken = true;
>                 writePacket(new QuorumPacket(Leader.ACK,
>                         newLeaderZxid, null, null), true);
>                 break;
>             }
>         }
>     }
>     long newEpoch = ZxidUtils.getEpochFromZxid(newLeaderZxid);
>     self.setCurrentEpoch(newEpoch);

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1291) AcceptedEpoch not updated at leader before it proposes the epoch to followers

2011-11-05 Thread Alexander Shraer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Shraer updated ZOOKEEPER-1291:


Issue Type: Sub-task  (was: Bug)
Parent: ZOOKEEPER-1264

> AcceptedEpoch not updated at leader before it proposes the epoch to followers
> -
>
> Key: ZOOKEEPER-1291
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1291
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 3.4.0
>Reporter: Alexander Shraer
>Assignee: Alexander Shraer
> Fix For: 3.4.0
>
>
> It is possible that a leader proposes an epoch e and a follower adopts it by
> setting acceptedEpoch to e, but the leader itself hasn't yet done so.
> While I'm not sure this contradicts Zab (there is no description of where the
> leader actually sets its acceptedEpoch), it is very counterintuitive.
> The fix is to set acceptedEpoch in getEpochToPropose, i.e., before any
> LearnerHandler passes the getEpochToPropose barrier.
> The fix is done as part of ZOOKEEPER-1264.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (ZOOKEEPER-1291) AcceptedEpoch not updated at leader before it proposes the epoch to followers

2011-11-05 Thread Alexander Shraer (Created) (JIRA)
AcceptedEpoch not updated at leader before it proposes the epoch to followers
-

 Key: ZOOKEEPER-1291
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1291
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.0
Reporter: Alexander Shraer
Assignee: Alexander Shraer
 Fix For: 3.4.0


It is possible that a leader proposes an epoch e and a follower adopts it by
setting acceptedEpoch to e, but the leader itself hasn't yet done so.

While I'm not sure this contradicts Zab (there is no description of where the
leader actually sets its acceptedEpoch), it is very counterintuitive.

The fix is to set acceptedEpoch in getEpochToPropose, i.e., before any
LearnerHandler passes the getEpochToPropose barrier.

The fix is done as part of ZOOKEEPER-1264.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (ZOOKEEPER-1282) Learner.java not following Zab 1.0 protocol - setCurrentEpoch should be done upon receipt of NEWLEADER (before acking it) and not upon receipt of UPTODATE

2011-11-05 Thread Camille Fournier (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier reassigned ZOOKEEPER-1282:
---

Assignee: Benjamin Reed

> Learner.java not following Zab 1.0 protocol - setCurrentEpoch should be done 
> upon receipt of NEWLEADER (before acking it) and not upon receipt of UPTODATE
> --
>
> Key: ZOOKEEPER-1282
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1282
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.0
>Reporter: Alexander Shraer
>Assignee: Benjamin Reed
>
> According to https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0
> phase 2 part 2, "Once it receives NEWLEADER(e) it atomically applies
> the new state and sets f.currentEpoch =e. "
> In Learner.java, self.setCurrentEpoch(newEpoch) is called after receiving
> UPTODATE and not before acking the NEWLEADER message, as it should be:
>
>             case Leader.UPTODATE:
>                 if (!snapshotTaken) {
>                     zk.takeSnapshot();
>                 }
>                 self.cnxnFactory.setZooKeeperServer(zk);
>                 break outerLoop;
>             case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
>                 zk.takeSnapshot();
>                 snapshotTaken = true;
>                 writePacket(new QuorumPacket(Leader.ACK,
>                         newLeaderZxid, null, null), true);
>                 break;
>             }
>         }
>     }
>     long newEpoch = ZxidUtils.getEpochFromZxid(newLeaderZxid);
>     self.setCurrentEpoch(newEpoch);

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Update on my 1270 testing

2011-11-05 Thread Flavio Junqueira

I'm fine with your proposal. -Flavio

On Nov 5, 2011, at 8:15 PM, Camille Fournier wrote:

> 2 has been flaky for so long, not sure whether it's worth being a blocker.
>
> The AsyncHammerTests never pass for me locally. Not sure if it's a
> problem or not... I am tempted to go with Mahadev on this and get this
> 3.4 release out the door. I would be happy to help manage a 3.4.1
> release soon thereafter if we find serious issues.
>
> C
>
> On Sat, Nov 5, 2011 at 3:01 PM, Flavio Junqueira wrote:
>
>> If 2) is flakey,  we need to fix it, no?
>>
>> -Flavio
>>
>> On Nov 5, 2011, at 6:14 PM, Patrick Hunt wrote:
>>
>>> I ran the 1270-1194 patch continually overnight (trunk) in my ci env,
>>> after ~25 test runs I saw 4 failures:
>>>
>>> 1) #402 - QuorumTest.testFollowersStartAfterLeader
>>> 2) #407 - org.apache.zookeeper.test.FLETest.testLE
>>> 3) #410 - org.apache.zookeeper.test.AsyncHammerTest.testHammer
>>> 4) #415 - org.apache.zookeeper.test.AsyncHammerTest.testHammer
>>>
>>> 1) client could not connect to reestablished quorum: giving up after
>>> 30+ seconds.
>>> 2) known flakey test
>>> 3) QP failed to shutdown in 30 seconds:
>>> QuorumPeer[myid=3]0.0.0.0/0.0.0.0:11224
>>> 4) QP failed to shutdown in 30 seconds:
>>> QuorumPeer[myid=1]0.0.0.0/0.0.0.0:11222
>>>
>>> On the plus side no "testearlyleaderabandon" failures.
>>>
>>> On the minus side 3/4 are a bit worrisome. Searching back through all
>>> my previous failures I don't see this happening. Perhaps these changes
>>> have shifted some timing? My main concern is that this might be caused
>>> directly by the patch itself
>>>
>>> Patrick
>>
>> flavio
>> junqueira
>>
>> research scientist
>>
>> f...@yahoo-inc.com
>> direct +34 93-183-8828
>>
>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>> phone (408) 349 3300    fax (408) 349 3301



flavio
junqueira

research scientist

f...@yahoo-inc.com
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300fax (408) 349 3301



Re: Update on my 1270 testing

2011-11-05 Thread Camille Fournier
2 has been flaky for so long, not sure whether it's worth being a blocker.
The AsyncHammerTests never pass for me locally. Not sure if it's a
problem or not... I am tempted to go with Mahadev on this and get this
3.4 release out the door. I would be happy to help manage a 3.4.1
release soon thereafter if we find serious issues.

C

On Sat, Nov 5, 2011 at 3:01 PM, Flavio Junqueira  wrote:
> If 2) is flakey,  we need to fix it, no?
>
> -Flavio
>
> On Nov 5, 2011, at 6:14 PM, Patrick Hunt wrote:
>
>> I ran the 1270-1194 patch continually overnight (trunk) in my ci env,
>> after ~25 test runs I saw 4 failures:
>>
>> 1) #402 - QuorumTest.testFollowersStartAfterLeader
>> 2) #407 - org.apache.zookeeper.test.FLETest.testLE
>> 3) #410 - org.apache.zookeeper.test.AsyncHammerTest.testHammer
>> 4) #415 - org.apache.zookeeper.test.AsyncHammerTest.testHammer
>>
>> 1) client could not connect to reestablished quorum: giving up after
>> 30+ seconds.
>> 2) known flakey test
>> 3) QP failed to shutdown in 30 seconds:
>> QuorumPeer[myid=3]0.0.0.0/0.0.0.0:11224
>> 4) QP failed to shutdown in 30 seconds:
>> QuorumPeer[myid=1]0.0.0.0/0.0.0.0:11222
>>
>> On the plus side no "testearlyleaderabandon" failures.
>>
>> On the minus side 3/4 are a bit worrisome. Searching back through all
>> my previous failures I don't see this happening. Perhaps these changes
>> have shifted some timing? My main concern is that this might be caused
>> directly by the patch itself
>>
>> Patrick
>
> flavio
> junqueira
>
> research scientist
>
> f...@yahoo-inc.com
> direct +34 93-183-8828
>
> avinguda diagonal 177, 8th floor, barcelona, 08018, es
> phone (408) 349 3300    fax (408) 349 3301
>
>


[jira] [Commented] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144794#comment-13144794
 ] 

Hadoop QA commented on ZOOKEEPER-1264:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12502608/ZOOKEEPER-1264-br34-final.patch
  against trunk revision 1197891.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/779//console

This message is automatically generated.

> FollowerResyncConcurrencyTest failing intermittently
> 
>
> Key: ZOOKEEPER-1264
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0, 3.5.0
>Reporter: Patrick Hunt
>Assignee: Camille Fournier
>Priority: Blocker
> Fix For: 3.3.4, 3.4.0, 3.5.0
>
> Attachments: ZOOKEEPER-1264-34-bad.patch, 
> ZOOKEEPER-1264-br34-final.patch, ZOOKEEPER-1264-branch34.patch, 
> ZOOKEEPER-1264-latest.patch, ZOOKEEPER-1264-merge.patch, 
> ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, 
> ZOOKEEPER-1264.patch, ZOOKEEPER-1264_branch33.patch, 
> ZOOKEEPER-1264_branch34.patch, ZOOKEEPER-1264unittest.patch, 
> ZOOKEEPER-1264unittest.patch, ZOOKEEPER-ACCEPTEDEPOCH-trunk.patch, 
> ZOOKEEPER-ACCEPTEDEPOCH.patch, followerresyncfailure_log.txt.gz, logs.zip, 
> tmp.zip
>
>
> The FollowerResyncConcurrencyTest test is failing intermittently. 
> saw the following on 3.4:
> {noformat}
> junit.framework.AssertionFailedError: Should have same number of
> ephemerals in both followers expected:<11741> but was:<14001>
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
>at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}





Failed: ZOOKEEPER-1264 PreCommit Build #779

2011-11-05 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/779/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 87 lines...]
 [exec] Hunk #10 FAILED at 614.
 [exec] 4 out of 10 hunks FAILED -- saving rejects to file src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java.rej
 [exec] (Stripping trailing CRs from patch.)
 [exec] patching file src/java/main/org/apache/zookeeper/server/quorum/Leader.java
 [exec] Hunk #1 FAILED at 311.
 [exec] Hunk #2 FAILED at 770.
 [exec] Hunk #3 succeeded at 776 (offset -6 lines).
 [exec] 2 out of 3 hunks FAILED -- saving rejects to file src/java/main/org/apache/zookeeper/server/quorum/Leader.java.rej
 [exec] (Stripping trailing CRs from patch.)
 [exec] patching file src/java/main/org/apache/zookeeper/server/quorum/Learner.java
 [exec] PATCH APPLICATION FAILED
 [exec] 
 [exec] 
 [exec] 
 [exec] 
 [exec] -1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12502608/ZOOKEEPER-1264-br34-final.patch
 [exec]   against trunk revision 1197891.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec] 
 [exec] -1 patch.  The patch command could not apply the patch.
 [exec] 
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/779//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] Adding comment to Jira.
 [exec] Comment added.
 [exec] cUIq714nry logged out
 [exec] Finished build.

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1576:
 exec returned: 1

Total time: 39 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1264
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.


[jira] [Updated] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Camille Fournier (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-1264:


Attachment: ZOOKEEPER-1264-br34-final.patch

Final version of fix for 3.4 branch

> FollowerResyncConcurrencyTest failing intermittently
> 
>
> Key: ZOOKEEPER-1264
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0, 3.5.0
>Reporter: Patrick Hunt
>Assignee: Camille Fournier
>Priority: Blocker
> Fix For: 3.3.4, 3.4.0, 3.5.0
>
> Attachments: ZOOKEEPER-1264-34-bad.patch, 
> ZOOKEEPER-1264-br34-final.patch, ZOOKEEPER-1264-branch34.patch, 
> ZOOKEEPER-1264-latest.patch, ZOOKEEPER-1264-merge.patch, 
> ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, 
> ZOOKEEPER-1264.patch, ZOOKEEPER-1264_branch33.patch, 
> ZOOKEEPER-1264_branch34.patch, ZOOKEEPER-1264unittest.patch, 
> ZOOKEEPER-1264unittest.patch, ZOOKEEPER-ACCEPTEDEPOCH-trunk.patch, 
> ZOOKEEPER-ACCEPTEDEPOCH.patch, followerresyncfailure_log.txt.gz, logs.zip, 
> tmp.zip
>
>
> The FollowerResyncConcurrencyTest test is failing intermittently. 
> saw the following on 3.4:
> {noformat}
> junit.framework.AssertionFailedError: Should have same number of
> ephemerals in both followers expected:<11741> but was:<14001>
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
>at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}





[jira] [Updated] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Alexander Shraer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Shraer updated ZOOKEEPER-1264:


Attachment: ZOOKEEPER-ACCEPTEDEPOCH-trunk.patch

> FollowerResyncConcurrencyTest failing intermittently
> 
>
> Key: ZOOKEEPER-1264
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0, 3.5.0
>Reporter: Patrick Hunt
>Assignee: Camille Fournier
>Priority: Blocker
> Fix For: 3.3.4, 3.4.0, 3.5.0
>
> Attachments: ZOOKEEPER-1264-34-bad.patch, 
> ZOOKEEPER-1264-branch34.patch, ZOOKEEPER-1264-latest.patch, 
> ZOOKEEPER-1264-merge.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, 
> ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264_branch33.patch, 
> ZOOKEEPER-1264_branch34.patch, ZOOKEEPER-1264unittest.patch, 
> ZOOKEEPER-1264unittest.patch, ZOOKEEPER-ACCEPTEDEPOCH-trunk.patch, 
> ZOOKEEPER-ACCEPTEDEPOCH.patch, followerresyncfailure_log.txt.gz, logs.zip, 
> tmp.zip
>
>
> The FollowerResyncConcurrencyTest test is failing intermittently. 
> saw the following on 3.4:
> {noformat}
> junit.framework.AssertionFailedError: Should have same number of
> ephemerals in both followers expected:<11741> but was:<14001>
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
>at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}





Re: Update on my 1270 testing

2011-11-05 Thread Flavio Junqueira

If 2) is flaky, we need to fix it, no?

-Flavio

On Nov 5, 2011, at 6:14 PM, Patrick Hunt wrote:


I ran the 1270-1194 patch continually overnight (trunk) in my ci env,
after ~25 test runs I saw 4 failures:

1) #402 - QuorumTest.testFollowersStartAfterLeader
2) #407 - org.apache.zookeeper.test.FLETest.testLE
3) #410 - org.apache.zookeeper.test.AsyncHammerTest.testHammer
4) #415 - org.apache.zookeeper.test.AsyncHammerTest.testHammer

1) client could not connect to reestablished quorum: giving up after
30+ seconds.
2) known flaky test
3) QP failed to shutdown in 30 seconds:  
QuorumPeer[myid=3]0.0.0.0/0.0.0.0:11224
4) QP failed to shutdown in 30 seconds:  
QuorumPeer[myid=1]0.0.0.0/0.0.0.0:11222


On the plus side no "testearlyleaderabandon" failures.

On the minus side, 3) and 4) are a bit worrisome. Searching back through all
my previous failures, I don't see this happening. Perhaps these changes
have shifted some timing? My main concern is that this might be caused
directly by the patch itself.

Patrick


flavio
junqueira

research scientist

f...@yahoo-inc.com
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301



Re: Update on my 1270 testing

2011-11-05 Thread Mahadev Konar
Thanks for the stats, Pat. 3) and 4) are a little worrisome, but we can
open a jira against 3.4.1 and look at fixing them later. I don't think
they should be a blocker for the 3.4 release. What do others think?

thanks
mahadev

On Sat, Nov 5, 2011 at 10:14 AM, Patrick Hunt  wrote:
> I ran the 1270-1194 patch continually overnight (trunk) in my ci env,
> after ~25 test runs I saw 4 failures:
>
> 1) #402 - QuorumTest.testFollowersStartAfterLeader
> 2) #407 - org.apache.zookeeper.test.FLETest.testLE
> 3) #410 - org.apache.zookeeper.test.AsyncHammerTest.testHammer
> 4) #415 - org.apache.zookeeper.test.AsyncHammerTest.testHammer
>
> 1) client could not connect to reestablished quorum: giving up after
> 30+ seconds.
> 2) known flaky test
> 3) QP failed to shutdown in 30 seconds: 
> QuorumPeer[myid=3]0.0.0.0/0.0.0.0:11224
> 4) QP failed to shutdown in 30 seconds: 
> QuorumPeer[myid=1]0.0.0.0/0.0.0.0:11222
>
> On the plus side no "testearlyleaderabandon" failures.
>
> On the minus side, 3) and 4) are a bit worrisome. Searching back through all
> my previous failures, I don't see this happening. Perhaps these changes
> have shifted some timing? My main concern is that this might be caused
> directly by the patch itself.
>
> Patrick
>


[jira] [Commented] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Flavio Junqueira (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144789#comment-13144789
 ] 

Flavio Junqueira commented on ZOOKEEPER-1264:
-

+1, looks right to me.

> FollowerResyncConcurrencyTest failing intermittently
> 
>
> Key: ZOOKEEPER-1264
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0, 3.5.0
>Reporter: Patrick Hunt
>Assignee: Camille Fournier
>Priority: Blocker
> Fix For: 3.3.4, 3.4.0, 3.5.0
>
> Attachments: ZOOKEEPER-1264-34-bad.patch, 
> ZOOKEEPER-1264-branch34.patch, ZOOKEEPER-1264-latest.patch, 
> ZOOKEEPER-1264-merge.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, 
> ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264_branch33.patch, 
> ZOOKEEPER-1264_branch34.patch, ZOOKEEPER-1264unittest.patch, 
> ZOOKEEPER-1264unittest.patch, ZOOKEEPER-ACCEPTEDEPOCH.patch, 
> followerresyncfailure_log.txt.gz, logs.zip, tmp.zip
>
>
> The FollowerResyncConcurrencyTest test is failing intermittently. 
> saw the following on 3.4:
> {noformat}
> junit.framework.AssertionFailedError: Should have same number of
> ephemerals in both followers expected:<11741> but was:<14001>
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
>at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}





[jira] [Commented] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144788#comment-13144788
 ] 

Hadoop QA commented on ZOOKEEPER-1264:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12502606/ZOOKEEPER-ACCEPTEDEPOCH.patch
  against trunk revision 1197891.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/778//console

This message is automatically generated.

> FollowerResyncConcurrencyTest failing intermittently
> 
>
> Key: ZOOKEEPER-1264
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0, 3.5.0
>Reporter: Patrick Hunt
>Assignee: Camille Fournier
>Priority: Blocker
> Fix For: 3.3.4, 3.4.0, 3.5.0
>
> Attachments: ZOOKEEPER-1264-34-bad.patch, 
> ZOOKEEPER-1264-branch34.patch, ZOOKEEPER-1264-latest.patch, 
> ZOOKEEPER-1264-merge.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, 
> ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264_branch33.patch, 
> ZOOKEEPER-1264_branch34.patch, ZOOKEEPER-1264unittest.patch, 
> ZOOKEEPER-1264unittest.patch, ZOOKEEPER-ACCEPTEDEPOCH.patch, 
> followerresyncfailure_log.txt.gz, logs.zip, tmp.zip
>
>
> The FollowerResyncConcurrencyTest test is failing intermittently. 
> saw the following on 3.4:
> {noformat}
> junit.framework.AssertionFailedError: Should have same number of
> ephemerals in both followers expected:<11741> but was:<14001>
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
>at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}





Failed: ZOOKEEPER-1264 PreCommit Build #778

2011-11-05 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/778/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 66 lines...]
 [exec] patching file src/java/main/org/apache/zookeeper/server/quorum/Leader.java
 [exec] Hunk #1 FAILED at 311.
 [exec] Hunk #2 FAILED at 770.
 [exec] Hunk #3 succeeded at 777 (offset -6 lines).
 [exec] 2 out of 3 hunks FAILED -- saving rejects to file src/java/main/org/apache/zookeeper/server/quorum/Leader.java.rej
 [exec] PATCH APPLICATION FAILED
 [exec] 
 [exec] 
 [exec] 
 [exec] 
 [exec] -1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12502606/ZOOKEEPER-ACCEPTEDEPOCH.patch
 [exec]   against trunk revision 1197891.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no new tests are needed 
for this patch.
 [exec] Also please list what manual steps were 
performed to verify this patch.
 [exec] 
 [exec] -1 patch.  The patch command could not apply the patch.
 [exec] 
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/778//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] Adding comment to Jira.
 [exec] Comment added.
 [exec] 786O5XJoEs logged out
 [exec] Finished build.

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1576:
 exec returned: 1

Total time: 39 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1264
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.


[jira] [Updated] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Alexander Shraer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Shraer updated ZOOKEEPER-1264:


Attachment: ZOOKEEPER-ACCEPTEDEPOCH.patch

> FollowerResyncConcurrencyTest failing intermittently
> 
>
> Key: ZOOKEEPER-1264
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0, 3.5.0
>Reporter: Patrick Hunt
>Assignee: Camille Fournier
>Priority: Blocker
> Fix For: 3.3.4, 3.4.0, 3.5.0
>
> Attachments: ZOOKEEPER-1264-34-bad.patch, 
> ZOOKEEPER-1264-branch34.patch, ZOOKEEPER-1264-latest.patch, 
> ZOOKEEPER-1264-merge.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, 
> ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264_branch33.patch, 
> ZOOKEEPER-1264_branch34.patch, ZOOKEEPER-1264unittest.patch, 
> ZOOKEEPER-1264unittest.patch, ZOOKEEPER-ACCEPTEDEPOCH.patch, 
> followerresyncfailure_log.txt.gz, logs.zip, tmp.zip
>
>
> The FollowerResyncConcurrencyTest test is failing intermittently. 
> saw the following on 3.4:
> {noformat}
> junit.framework.AssertionFailedError: Should have same number of
> ephemerals in both followers expected:<11741> but was:<14001>
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
>at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}





[jira] [Commented] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144782#comment-13144782
 ] 

Hadoop QA commented on ZOOKEEPER-1264:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12502605/ZOOKEEPER-1264-latest.patch
  against trunk revision 1197891.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/777//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/777//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/777//console

This message is automatically generated.

> FollowerResyncConcurrencyTest failing intermittently
> 
>
> Key: ZOOKEEPER-1264
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0, 3.5.0
>Reporter: Patrick Hunt
>Assignee: Camille Fournier
>Priority: Blocker
> Fix For: 3.3.4, 3.4.0, 3.5.0
>
> Attachments: ZOOKEEPER-1264-34-bad.patch, 
> ZOOKEEPER-1264-branch34.patch, ZOOKEEPER-1264-latest.patch, 
> ZOOKEEPER-1264-merge.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, 
> ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264_branch33.patch, 
> ZOOKEEPER-1264_branch34.patch, ZOOKEEPER-1264unittest.patch, 
> ZOOKEEPER-1264unittest.patch, followerresyncfailure_log.txt.gz, logs.zip, 
> tmp.zip
>
>
> The FollowerResyncConcurrencyTest test is failing intermittently. 
> saw the following on 3.4:
> {noformat}
> junit.framework.AssertionFailedError: Should have same number of
> ephemerals in both followers expected:<11741> but was:<14001>
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
>at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}





Success: ZOOKEEPER-1264 PreCommit Build #777

2011-11-05 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/777/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 141386 lines...]
 [exec] BUILD SUCCESSFUL
 [exec] Total time: 0 seconds
 [exec] 
 [exec] 
 [exec] 
 [exec] 
 [exec] +1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12502605/ZOOKEEPER-1264-latest.patch
 [exec]   against trunk revision 1197891.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/777//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/777//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/777//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] Adding comment to Jira.
 [exec] Comment added.
 [exec] 0s9S2zOL71 logged out
 [exec] Finished build.

BUILD SUCCESSFUL
Total time: 23 minutes 36 seconds
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1264
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed


[jira] [Commented] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Alexander Shraer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144780#comment-13144780
 ] 

Alexander Shraer commented on ZOOKEEPER-1264:
-

I know where the problem is. Ben assumes that acceptedEpoch is set atomically 
before the leader replies to the client with LEADERINFO, but it isn't. The 
leader sets it only after returning from getEpochToPropose. This is a bug. I'll 
upload a small patch in a sec.
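
For readers following along, here is a rough sketch of the ordering issue described above. It is a toy model, not the actual Leader code: only getEpochToPropose and acceptedEpoch are names from this thread, while EpochBarrierSketch, quorumSize, connecting and acceptedEpochSeenByFollower are made up for illustration. It also assumes a plain head count for the quorum, whereas the real barrier may have further conditions. The point it tries to show is that if acceptedEpoch is recorded inside the barrier, under the same lock and before notifyAll, no caller can come out of getEpochToPropose and still see the old value.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Toy model of the epoch barrier; NOT the real ZooKeeper Leader class.
 * Only getEpochToPropose and acceptedEpoch are names taken from the thread.
 */
final class EpochBarrierSketch {
    private final int quorumSize;                 // hypothetical: peers needed for a quorum
    private final Set<Long> connecting = new HashSet<>();
    private long epoch = -1;
    private boolean waitingForNewEpoch = true;
    private long acceptedEpoch = 0;               // stands in for the leader's acceptedEpoch

    EpochBarrierSketch(int quorumSize) {
        this.quorumSize = quorumSize;
    }

    synchronized long getAcceptedEpoch() {
        return acceptedEpoch;
    }

    /** Blocks until a quorum of peers has called in, then returns the agreed epoch. */
    synchronized long getEpochToPropose(long sid, long lastAcceptedEpoch) {
        if (!waitingForNewEpoch) {
            return epoch;
        }
        if (lastAcceptedEpoch >= epoch) {
            epoch = lastAcceptedEpoch + 1;
        }
        connecting.add(sid);
        if (connecting.size() >= quorumSize) {
            // Record the epoch while still holding the lock, before any waiter is released.
            acceptedEpoch = epoch;
            waitingForNewEpoch = false;
            notifyAll();
        } else {
            while (waitingForNewEpoch) {
                try {
                    wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted at epoch barrier", e);
                }
            }
        }
        return epoch;
    }

    /** Runs one leader and one follower through the barrier; returns what the follower observes. */
    static long acceptedEpochSeenByFollower() {
        EpochBarrierSketch barrier = new EpochBarrierSketch(2);
        long[] seen = new long[1];
        Thread follower = new Thread(() -> {
            barrier.getEpochToPropose(2L, 0L);
            seen[0] = barrier.getAcceptedEpoch();
        });
        follower.start();
        barrier.getEpochToPropose(1L, 0L);
        try {
            follower.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining follower", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("acceptedEpoch seen by follower: " + acceptedEpochSeenByFollower());
    }
}
```

If the assignment to acceptedEpoch were moved out of the barrier and done by the leader thread after getEpochToPropose returns, the follower thread in this sketch could read the old value in between, which is the kind of window a check like the one in testNormalRun would trip over.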


> FollowerResyncConcurrencyTest failing intermittently
> 
>
> Key: ZOOKEEPER-1264
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0, 3.5.0
>Reporter: Patrick Hunt
>Assignee: Camille Fournier
>Priority: Blocker
> Fix For: 3.3.4, 3.4.0, 3.5.0
>
> Attachments: ZOOKEEPER-1264-34-bad.patch, 
> ZOOKEEPER-1264-branch34.patch, ZOOKEEPER-1264-latest.patch, 
> ZOOKEEPER-1264-merge.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, 
> ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264_branch33.patch, 
> ZOOKEEPER-1264_branch34.patch, ZOOKEEPER-1264unittest.patch, 
> ZOOKEEPER-1264unittest.patch, followerresyncfailure_log.txt.gz, logs.zip, 
> tmp.zip
>
>
> The FollowerResyncConcurrencyTest test is failing intermittently. 
> saw the following on 3.4:
> {noformat}
> junit.framework.AssertionFailedError: Should have same number of
> ephemerals in both followers expected:<11741> but was:<14001>
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
>at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}





[jira] [Commented] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Camille Fournier (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144779#comment-13144779
 ] 

Camille Fournier commented on ZOOKEEPER-1264:
-

Ben added a check to testNormalRun at line 502:

 Assert.assertEquals(1, l.self.getAcceptedEpoch());

This line consistently fails for me. I'm not sure why this would be failing in 
the 3.4 branch but not trunk.

> FollowerResyncConcurrencyTest failing intermittently
> 
>
> Key: ZOOKEEPER-1264
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0, 3.5.0
>Reporter: Patrick Hunt
>Assignee: Camille Fournier
>Priority: Blocker
> Fix For: 3.3.4, 3.4.0, 3.5.0
>
> Attachments: ZOOKEEPER-1264-34-bad.patch, 
> ZOOKEEPER-1264-branch34.patch, ZOOKEEPER-1264-latest.patch, 
> ZOOKEEPER-1264-merge.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, 
> ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264_branch33.patch, 
> ZOOKEEPER-1264_branch34.patch, ZOOKEEPER-1264unittest.patch, 
> ZOOKEEPER-1264unittest.patch, followerresyncfailure_log.txt.gz, logs.zip, 
> tmp.zip
>
>
> The FollowerResyncConcurrencyTest test is failing intermittently. 
> saw the following on 3.4:
> {noformat}
> junit.framework.AssertionFailedError: Should have same number of
> ephemerals in both followers expected:<11741> but was:<14001>
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
>at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}





[jira] [Updated] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Camille Fournier (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-1264:


Attachment: ZOOKEEPER-1264-latest.patch

This is the patch for trunk; I'm verifying via Hudson that it passes everything.

> FollowerResyncConcurrencyTest failing intermittently
> 
>
> Key: ZOOKEEPER-1264
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.3.3, 3.4.0, 3.5.0
>Reporter: Patrick Hunt
>Assignee: Camille Fournier
>Priority: Blocker
> Fix For: 3.3.4, 3.4.0, 3.5.0
>
> Attachments: ZOOKEEPER-1264-34-bad.patch, 
> ZOOKEEPER-1264-branch34.patch, ZOOKEEPER-1264-latest.patch, 
> ZOOKEEPER-1264-merge.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, 
> ZOOKEEPER-1264.patch, ZOOKEEPER-1264.patch, ZOOKEEPER-1264_branch33.patch, 
> ZOOKEEPER-1264_branch34.patch, ZOOKEEPER-1264unittest.patch, 
> ZOOKEEPER-1264unittest.patch, followerresyncfailure_log.txt.gz, logs.zip, 
> tmp.zip
>
>
> The FollowerResyncConcurrencyTest test is failing intermittently. 
> saw the following on 3.4:
> {noformat}
> junit.framework.AssertionFailedError: Should have same number of
> ephemerals in both followers expected:<11741> but was:<14001>
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
>at 
> org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
>at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}





[jira] [Commented] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Flavio Junqueira (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144758#comment-13144758
 ] 

Flavio Junqueira commented on ZOOKEEPER-1264:
-

I'll have a look.






Update on my 1270 testing

2011-11-05 Thread Patrick Hunt
I ran the 1270-1194 patch continually overnight (trunk) in my CI env;
after ~25 test runs I saw 4 failures:

1) #402 - QuorumTest.testFollowersStartAfterLeader
2) #407 - org.apache.zookeeper.test.FLETest.testLE
3) #410 - org.apache.zookeeper.test.AsyncHammerTest.testHammer
4) #415 - org.apache.zookeeper.test.AsyncHammerTest.testHammer

1) client could not connect to reestablished quorum: giving up after
30+ seconds.
2) known flaky test
3) QP failed to shutdown in 30 seconds: QuorumPeer[myid=3]0.0.0.0/0.0.0.0:11224
4) QP failed to shutdown in 30 seconds: QuorumPeer[myid=1]0.0.0.0/0.0.0.0:11222

On the plus side, no "testEarlyLeaderAbandonment" failures.

On the minus side, 3/4 are a bit worrisome. Searching back through all
my previous failures, I don't see this happening. Perhaps these changes
have shifted some timing? My main concern is that this might be caused
directly by the patch itself.

Patrick


[jira] [Commented] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144745#comment-13144745
 ] 

Hadoop QA commented on ZOOKEEPER-1264:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12502600/ZOOKEEPER-1264-34-bad.patch
  against trunk revision 1197891.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/776//console

This message is automatically generated.






Failed: ZOOKEEPER-1264 PreCommit Build #776

2011-11-05 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/776/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 85 lines...]
 [exec] Hunk #4 FAILED at 210.
 [exec] Hunk #5 succeeded at 227 (offset -11 lines).
 [exec] Hunk #6 succeeded at 271 (offset -11 lines).
 [exec] Hunk #7 succeeded at 488 (offset -11 lines).
 [exec] Hunk #8 succeeded at 520 (offset -11 lines).
 [exec] Hunk #9 succeeded at 560 (offset -11 lines).
 [exec] Hunk #10 FAILED at 614.
 [exec] 4 out of 10 hunks FAILED -- saving rejects to file src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java.rej
 [exec] (Stripping trailing CRs from patch.)
 [exec] patching file src/java/main/org/apache/zookeeper/server/quorum/Learner.java
 [exec] PATCH APPLICATION FAILED
 [exec] 
 [exec] -1 overall.  Here are the results of testing the latest attachment 
 [exec]   http://issues.apache.org/jira/secure/attachment/12502600/ZOOKEEPER-1264-34-bad.patch
 [exec]   against trunk revision 1197891.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 6 new or modified tests.
 [exec] 
 [exec] -1 patch.  The patch command could not apply the patch.
 [exec] 
 [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/776//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] Adding comment to Jira.
 [exec] Comment added.
 [exec] 4BR3I2Q187 logged out
 [exec] Finished build.

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1576: exec returned: 1

Total time: 39 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1264
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.


[jira] [Updated] (ZOOKEEPER-1264) FollowerResyncConcurrencyTest failing intermittently

2011-11-05 Thread Camille Fournier (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Fournier updated ZOOKEEPER-1264:


Attachment: ZOOKEEPER-1264-34-bad.patch

I still cannot manage a clean patch of this issue and its tests for the 3.4 branch. 
This is my latest attempt, which still fails the Zab1_0Test. Can someone else 
please take a look here?






[jira] [Resolved] (ZOOKEEPER-1270) testEarlyLeaderAbandonment failing intermittently, quorum formed, no serving.

2011-11-05 Thread Flavio Junqueira (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira resolved ZOOKEEPER-1270.
-

Resolution: Fixed

Sounds right, thanks for the clarification.

> testEarlyLeaderAbandonment failing intermittently, quorum formed, no serving.
> -
>
> Key: ZOOKEEPER-1270
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1270
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Reporter: Patrick Hunt
> Assignee: Flavio Junqueira
> Priority: Blocker
> Fix For: 3.4.0, 3.5.0
>
> Attachments: ZOOKEEPER-1270-1194.patch, 
> ZOOKEEPER-1270-and-1194-branch34.patch, ZOOKEEPER-1270-and-1194.patch, 
> ZOOKEEPER-1270-and-1194.patch, ZOOKEEPER-1270.patch, ZOOKEEPER-1270.patch, 
> ZOOKEEPER-1270_br34.patch, ZOOKEEPER-1270tests.patch, 
> ZOOKEEPER-1270tests2.patch, testEarlyLeaderAbandonment.txt.gz, 
> testEarlyLeaderAbandonment2.txt.gz, testEarlyLeaderAbandonment3.txt.gz, 
> testEarlyLeaderAbandonment4.txt.gz, zookeeper-1270-1194-34.patch
>
>
> Looks pretty serious - quorum is formed but no clients can attach. Will 
> attach logs momentarily.
> This test was introduced in the following commit (all three JIRAs committed at 
> once):
> ZOOKEEPER-335. zookeeper servers should commit the new leader txn to their logs.
> ZOOKEEPER-1081. modify leader/follower code to correctly deal with new leader
> ZOOKEEPER-1082. modify leader election to correctly take into account current

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1270) testEarlyLeaderAbandonment failing intermittently, quorum formed, no serving.

2011-11-05 Thread Alexander Shraer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144658#comment-13144658
 ] 

Alexander Shraer commented on ZOOKEEPER-1270:
-

Hi Flavio,

LearnerHandler sends NEWLEADER messages after running waitForEpochAck.
The patch makes sure that waitForEpochAck does not return until the leader has 
also run it. The leader runs waitForEpochAck after inserting the NEWLEADER 
message into outstandingProposals.
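
To make the barrier concrete, here is a minimal Java sketch of the behaviour
described above. It is not the code from Leader.java or from the patch: the
class EpochAckBarrier, its fields, and the timeout handling are assumptions
made purely for illustration. Each caller (a LearnerHandler thread or the
leader itself) records its ack and then blocks until a quorum that includes
the leader has acked.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; not the actual ZooKeeper implementation.
final class EpochAckBarrier {
    private final long leaderId;
    private final int quorumSize;
    private final Set<Long> acked = new HashSet<>();
    private boolean open = false;

    EpochAckBarrier(long leaderId, int quorumSize) {
        this.leaderId = leaderId;
        this.quorumSize = quorumSize;
    }

    /**
     * Records an ack from serverId, then blocks until a quorum of servers
     * that includes the leader has acked, or until timeoutMs elapses.
     */
    synchronized void waitForEpochAck(long serverId, long timeoutMs)
            throws InterruptedException, TimeoutException {
        acked.add(serverId);
        if (acked.contains(leaderId) && acked.size() >= quorumSize) {
            open = true;
            notifyAll(); // release every caller that is already waiting
        }
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!open) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                throw new TimeoutException("timed out waiting for epoch acks");
            }
            wait(remaining);
        }
    }

    synchronized boolean isOpen() {
        return open;
    }
}
```

In this sketch, two followers acking in a three-server ensemble do not open the
barrier by themselves; it opens only once the leader has also called
waitForEpochAck, which is the ordering property the patch relies on.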

Alex






[jira] [Commented] (ZOOKEEPER-1270) testEarlyLeaderAbandonment failing intermittently, quorum formed, no serving.

2011-11-05 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144651#comment-13144651
 ] 

Hudson commented on ZOOKEEPER-1270:
---

Integrated in ZooKeeper-trunk #1356 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/1356/])
ZOOKEEPER-1270. testEarlyLeaderAbandonment failing intermittently, quorum 
formed, no serving. (Flavio, Camille and Alexander Shraer via mahadev)

mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1197891
Files : 
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/ClientBase.java







[jira] [Reopened] (ZOOKEEPER-1270) testEarlyLeaderAbandonment failing intermittently, quorum formed, no serving.

2011-11-05 Thread Flavio Junqueira (Reopened) (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira reopened ZOOKEEPER-1270:
-


Sorry for missing part of the fun.

Alex, thanks for spotting the duplicate; that was a great catch. I'm not 
convinced that your patch completely solves the problem, though. LearnerHandler 
sends NEWLEADER concurrently with the leader adding NEWLEADER to 
outstandingProposals, so even fixing the barriers as you did does not prevent 
the race I mentioned above, I think.

I must say that I didn't find my proposed patch very elegant, but I'm not 
entirely sure that yours covers all cases, so let me know what you think.  






ZooKeeper-trunk-solaris - Build # 40 - Failure

2011-11-05 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/40/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 130491 lines...]
[junit] 2011-11-05 08:51:58,432 [myid:] - INFO  
[Thread-4:NIOServerCnxn@1000] - Closed socket connection for client 
/127.0.0.1:38730 (no session established for client)
[junit] 2011-11-05 08:51:58,432 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2011-11-05 08:51:58,433 [myid:] - INFO  [main:JMXEnv@105] - 
expect:InMemoryDataTree
[junit] 2011-11-05 08:51:58,433 [myid:] - INFO  [main:JMXEnv@108] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2011-11-05 08:51:58,433 [myid:] - INFO  [main:JMXEnv@105] - 
expect:StandaloneServer_port
[junit] 2011-11-05 08:51:58,433 [myid:] - INFO  [main:JMXEnv@108] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2011-11-05 08:51:58,433 [myid:] - INFO  [main:ClientBase@435] - 
STOPPING server
[junit] 2011-11-05 08:51:58,434 [myid:] - INFO  [main:ZooKeeperServer@391] 
- shutting down
[junit] 2011-11-05 08:51:58,434 [myid:] - INFO  
[main:SessionTrackerImpl@206] - Shutting down
[junit] 2011-11-05 08:51:58,434 [myid:] - INFO  
[main:PrepRequestProcessor@694] - Shutting down
[junit] 2011-11-05 08:51:58,434 [myid:] - INFO  
[main:SyncRequestProcessor@173] - Shutting down
[junit] 2011-11-05 08:51:58,434 [myid:] - INFO  [ProcessThread(sid:0 
cport:-1)::PrepRequestProcessor@134] - PrepRequestProcessor exited loop!
[junit] 2011-11-05 08:51:58,435 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@155] - SyncRequestProcessor exited!
[junit] 2011-11-05 08:51:58,435 [myid:] - INFO  
[main:FinalRequestProcessor@419] - shutdown of request processor complete
[junit] 2011-11-05 08:51:58,435 [myid:] - INFO  [main:ClientBase@227] - 
connecting to 127.0.0.1 11221
[junit] 2011-11-05 08:51:58,436 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[]
[junit] 2011-11-05 08:51:58,437 [myid:] - INFO  [main:ClientBase@428] - 
STARTING server
[junit] 2011-11-05 08:51:58,437 [myid:] - INFO  [main:ZooKeeperServer@143] 
- Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 
6 datadir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test4778098318111395137.junit.dir/version-2
 snapdir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test4778098318111395137.junit.dir/version-2
[junit] 2011-11-05 08:51:58,438 [myid:] - INFO  
[main:NIOServerCnxnFactory@110] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2011-11-05 08:51:58,438 [myid:] - INFO  [main:FileSnap@83] - 
Reading snapshot 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test4778098318111395137.junit.dir/version-2/snapshot.b
[junit] 2011-11-05 08:51:58,440 [myid:] - INFO  [main:FileTxnSnapLog@255] - 
Snapshotting: b
[junit] 2011-11-05 08:51:58,442 [myid:] - INFO  [main:ClientBase@227] - 
connecting to 127.0.0.1 11221
[junit] 2011-11-05 08:51:58,442 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@213] - 
Accepted socket connection from /127.0.0.1:38732
[junit] 2011-11-05 08:51:58,442 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@820] - Processing 
stat command from /127.0.0.1:38732
[junit] 2011-11-05 08:51:58,443 [myid:] - INFO  
[Thread-5:NIOServerCnxn$StatCommand@655] - Stat command output
[junit] 2011-11-05 08:51:58,443 [myid:] - INFO  
[Thread-5:NIOServerCnxn@1000] - Closed socket connection for client 
/127.0.0.1:38732 (no session established for client)
[junit] 2011-11-05 08:51:58,448 [myid:] - INFO  [main:JMXEnv@133] - 
ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2011-11-05 08:51:58,449 [myid:] - INFO  [main:JMXEnv@105] - 
expect:InMemoryDataTree
[junit] 2011-11-05 08:51:58,449 [myid:] - INFO  [main:JMXEnv@108] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2011-11-05 08:51:58,449 [myid:] - INFO  [main:JMXEnv@105] - 
expect:StandaloneServer_port
[junit] 2011-11-05 08:51:58,449 [myid:] - INFO  [main:JMXEnv@108] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2011-11-05 08:51:58,450 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2011-11-05 08:51:58,450 [myid:] - INFO  [main:ClientBase@465] - 
tearDown starting
[junit] 2011-11-05 08:51:58,528 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@511] - EventThread shut down
[j

[jira] [Commented] (ZOOKEEPER-1288) Always log sessionId and zxid as hexadecimals

2011-11-05 Thread Thomas Koch (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144621#comment-13144621
 ] 

Thomas Koch commented on ZOOKEEPER-1288:


I didn't want to suggest changing the serialized format, just the in-memory 
format. The StatPersisted object can be generated for serialization from the 
data stored in the DataNode.
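
As a rough sketch of that idea, the node below keeps its stat fields directly
in memory and builds a separate record only when something has to be
serialized. Both classes are hypothetical stand-ins written for this example;
they are not ZooKeeper's DataNode or StatPersisted, and the three fields shown
(czxid, mzxid, version) are only a subset chosen to keep the sketch short.

```java
// Hypothetical stand-in for a serialized stat record (not ZooKeeper's StatPersisted).
final class PersistedStatSketch {
    final long czxid;
    final long mzxid;
    final int version;

    PersistedStatSketch(long czxid, long mzxid, int version) {
        this.czxid = czxid;
        this.mzxid = mzxid;
        this.version = version;
    }
}

// Hypothetical in-memory node that stores its stat fields directly.
final class InMemoryNodeSketch {
    private final long czxid; // zxid of the creating transaction
    private long mzxid;       // zxid of the last modifying transaction
    private int version;      // number of data changes so far
    private byte[] data;

    InMemoryNodeSketch(long czxid, byte[] data) {
        this.czxid = czxid;
        this.mzxid = czxid;
        this.version = 0;
        this.data = data.clone();
    }

    // Updates the in-memory state only; nothing is serialized here.
    void setData(byte[] newData, long zxid) {
        this.data = newData.clone();
        this.mzxid = zxid;
        this.version++;
    }

    // Builds the record to serialize on demand, so the on-disk format is unchanged.
    PersistedStatSketch toPersistedStat() {
        return new PersistedStatSketch(czxid, mzxid, version);
    }
}
```

The point of the design is that the serialized layout can stay fixed while the
in-memory representation changes freely behind toPersistedStat().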

> Always log sessionId and zxid as hexadecimals
> -
>
> Key: ZOOKEEPER-1288
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1288
> Project: ZooKeeper
> Issue Type: Sub-task
> Reporter: Thomas Koch
> Assignee: Thomas Koch
>
> At some points, sessionIds or zxids are written to the log as decimal numbers, 
> but most of the time as hexadecimals. It's an unnecessary hassle to manually 
> convert these numbers to find additional log lines referring to the same 
> numbers. Or worse, people may not know that there may be additional 
> information available if they also search for the decimal representation of a 
> number.
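
A small Java sketch of what consistent hexadecimal logging could look like
follows. The helper class HexIds and the sample values are assumptions made
for illustration; the issue does not prescribe a particular helper or prefix.
The sketch only relies on the standard Long.toHexString call.

```java
// Hypothetical helper; the class name and the "0x" prefix are illustrative choices.
final class HexIds {
    private HexIds() {}

    /** Formats a 64-bit id (a sessionId or a zxid) as 0x-prefixed hexadecimal. */
    static String toHex(long id) {
        return "0x" + Long.toHexString(id);
    }

    public static void main(String[] args) {
        long zxid = 0x50000002aL; // sample value, not taken from a real log

        // The same number looks unrelated in the two notations, which is
        // what makes searching logs with mixed formats tedious.
        System.out.println("decimal: zxid=" + zxid);
        System.out.println("hex:     zxid=" + toHex(zxid));
    }
}
```

Routing every log statement through a single formatter like this means one
search string finds every line that mentions a given session or transaction.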
