Failed: ZOOKEEPER-1140 PreCommit Build #478

2011-08-30 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1140
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/478/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 207477 lines...]
 [exec] 
 [exec] -1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12490942/ZOOKEEPER-1140.patch
 [exec]   against trunk revision 1163015.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no new tests are needed 
for this patch.
 [exec] Also please list what manual steps were 
performed to verify this patch.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/478//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/478//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/478//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] 5D38WioebT logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1450:
 exec returned: 1

Total time: 22 minutes 31 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1140
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed


[jira] [Commented] (ZOOKEEPER-1140) server shutdown is not stopping threads

2011-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093470#comment-13093470
 ] 

Hadoop QA commented on ZOOKEEPER-1140:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12490942/ZOOKEEPER-1140.patch
  against trunk revision 1163015.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/478//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/478//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/478//console

This message is automatically generated.

 server shutdown is not stopping threads
 ---

 Key: ZOOKEEPER-1140
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1140
 Project: ZooKeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.4.0
Reporter: Patrick Hunt
Assignee: Laxman
Priority: Blocker
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-1140.patch


 Near the end of QuorumZxidSyncTest there are tons of threads running - 115 
 ProcessThread threads, similar numbers of SessionTracker.
 Also I see ~100 ReadOnlyRequestProcessor - why is this running as a separate 
 thread? (henry/flavio?)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
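
For readers unfamiliar with the pattern behind this fix, here is a minimal, self-contained
sketch of stopping a request-processing thread at shutdown so tests do not leak threads.
All names (RequestWorker, submit, shutdown) are hypothetical and are not ZooKeeper's actual
classes; the real ZOOKEEPER-1140 patch touches LearnerHandler, QuorumPeer and
ReadOnlyZooKeeperServer, as listed in the Hudson comment later in this digest.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class RequestWorker implements Runnable {
        private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        private final Thread thread = new Thread(this, "RequestWorker");

        public void start() { thread.start(); }

        public void submit(Runnable request) { queue.add(request); }

        @Override
        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    queue.take().run();              // blocks until a request arrives
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // interrupted by shutdown(): exit the loop
            }
        }

        // Stop the worker and wait for it to exit, so no thread outlives shutdown.
        public void shutdown() throws InterruptedException {
            thread.interrupt();                      // wakes the blocking take()
            thread.join();                           // returns only once the thread has stopped
        }
    }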




[jira] [Updated] (ZOOKEEPER-1165) better eclipse support in tests

2011-08-30 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-1165:
-

Fix Version/s: (was: 3.4.0)
   3.5.0

Not a blocker. Moving it out!

 better eclipse support in tests
 ---

 Key: ZOOKEEPER-1165
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1165
 Project: ZooKeeper
  Issue Type: Bug
  Components: tests
Affects Versions: 3.4.0
 Environment: Eclipse
Reporter: Warren Turkal
Assignee: Warren Turkal
Priority: Minor
  Labels: patch
 Fix For: 3.5.0

 Attachments: BaseSysTest.java.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 The Eclipse test runner tries to run tests from all classes that inherit from 
 TestCase. However, this class is inherited by at least one class 
 (org.apache.zookeeper.test.system.BaseSysTest) that has no test cases, as it 
 is used as infrastructure for other real test cases. This patch annotates 
 that class with @Ignore, which causes the class to be ignored. Also, because 
 annotations are not inherited by default, this patch will not affect 
 classes that inherit from this class.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
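
As a concrete illustration of the @Ignore approach described above, here is a minimal
sketch assuming JUnit 4 style tests; BaseSysTest is the class named in the issue, while
the subclass name (SomeSysTest) and its body are made up for the example.

    import org.junit.Ignore;
    import org.junit.Test;
    import static org.junit.Assert.assertTrue;

    // Infrastructure class with no tests of its own; @Ignore keeps runners such
    // as the Eclipse JUnit runner from trying to execute it directly.
    @Ignore
    public abstract class BaseSysTest {
        protected void startInstances() { /* bring up servers for subclasses */ }
    }

    // @Ignore is not an inherited annotation, so concrete subclasses still run normally.
    class SomeSysTest extends BaseSysTest {
        @Test
        public void testSomething() {
            startInstances();
            assertTrue(true);
        }
    }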





[jira] [Commented] (ZOOKEEPER-1136) NEW_LEADER should be queued not sent to match the Zab 1.0 protocol on the twiki

2011-08-30 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093490#comment-13093490
 ] 

Mahadev konar commented on ZOOKEEPER-1136:
--

Ben,
 Any update on this? 

 NEW_LEADER should be queued not sent to match the Zab 1.0 protocol on the 
 twiki
 ---

 Key: ZOOKEEPER-1136
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1136
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
Priority: Blocker
 Fix For: 3.3.4, 3.4.0

 Attachments: ZOOKEEPER-1136.patch, ZOOKEEPER-1136.patch


 The NEW_LEADER message was sent at the beginning of the sync phase in Zab 
 pre-1.0, but it must come at the end in Zab 1.0. If the protocol version is 1.0 
 or greater we need to queue rather than send the packet.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
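
A rough sketch of the queue-vs-send distinction the issue describes. The class, field,
constant and method names below are illustrative only and do not reflect the actual
ZooKeeper leader code or the attached patch.

    import java.util.ArrayDeque;
    import java.util.Queue;

    class NewLeaderSketch {
        // Hypothetical protocol-version threshold; not ZooKeeper's real constant.
        static final int ZAB_1_0 = 0x10000;

        private final Queue<Object> queuedPackets = new ArrayDeque<>();

        void announceNewLeader(int protocolVersion, Object newLeaderPacket) {
            if (protocolVersion >= ZAB_1_0) {
                // Zab 1.0: NEW_LEADER must go out at the end of the sync phase,
                // so it is queued behind the sync traffic instead of sent now.
                queuedPackets.add(newLeaderPacket);
            } else {
                // pre-1.0 behaviour: send immediately at the start of the sync phase.
                sendPacket(newLeaderPacket);
            }
        }

        private void sendPacket(Object packet) { /* write directly to the follower */ }
    }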




[jira] [Assigned] (ZOOKEEPER-847) Missing acl check in zookeeper create

2011-08-30 Thread Laxman (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laxman reassigned ZOOKEEPER-847:


Assignee: Laxman  (was: Thomas Koch)

 Missing acl check in zookeeper create
 -

 Key: ZOOKEEPER-847
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-847
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.3.1
Reporter: Patrick Datko
Assignee: Laxman

 I looked at the source of the ZooKeeper class and noticed that an ACL check is missing 
 in the asynchronous version of the create operation. Is there any reason the async 
 version has no check of whether the ACL is valid, or did someone forget to implement 
 it? It's interesting to us because we are working on a refactoring of the ZooKeeper 
 client and don't want to implement a bug.
 The following code is missing:
 if (acl != null && acl.size() == 0) {
 throw new KeeperException.InvalidACLException();
 }

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
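
For context, the check quoted above could be factored roughly as the helper below so that
both the synchronous and asynchronous create paths can call it. Whether the real patch is
structured this way is not shown here; KeeperException.InvalidACLException and the ACL
class are ZooKeeper's real types, the helper and its name are an assumption.

    import java.util.List;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.data.ACL;

    final class AclValidation {
        private AclValidation() {}

        // Reject an explicitly supplied but empty ACL list, as in the snippet above.
        static void validateACL(List<ACL> acl) throws KeeperException.InvalidACLException {
            if (acl != null && acl.size() == 0) {
                throw new KeeperException.InvalidACLException();
            }
        }
    }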




[jira] [Assigned] (ZOOKEEPER-851) ZK lets any node to become an observer

2011-08-30 Thread Laxman (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laxman reassigned ZOOKEEPER-851:


Assignee: Laxman

 ZK lets any node to become an observer
 --

 Key: ZOOKEEPER-851
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-851
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.3.1
Reporter: Vishal Kher
Assignee: Laxman
Priority: Critical
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-851.patch


 I had a 3 node cluster running. The zoo.cfg on each contained 3 entries as 
 shown below:
 tickTime=2000
 dataDir=/var/zookeeper
 clientPort=2181
 initLimit=5
 syncLimit=2
 server.0=10.150.27.61:2888:3888
 server.1=10.150.27.62:2888:3888
 server.2=10.150.27.63:2888:3888
 I wanted to add another node to the cluster. In fourth node's zoo.cfg, I 
 created another entry for that node and started zk server. The zoo.cfg on the 
 first 3 nodes was left unchanged. The fourth node was able to join the 
 cluster even though the 3 nodes had no idea about the fourth node.
 zoo.cfg on fourth node:
 tickTime=2000
 dataDir=/var/zookeeper
 clientPort=2181
 initLimit=5
 syncLimit=2
 server.0=10.150.27.61:2888:3888
 server.1=10.150.27.62:2888:3888
 server.2=10.150.27.63:2888:3888
 server.3=10.17.117.71:2888:3888
 It looks like 10.17.117.71 is becoming an observer in this case. I was 
 expecting that the leader will reject 10.17.117.71.
 # telnet 10.17.117.71 2181
 Trying 10.17.117.71...
 Connected to 10.17.117.71.
 Escape character is '^]'.
 stat
 Zookeeper version: 3.3.0--1, built on 04/02/2010 22:40 GMT
 Clients:
  /10.17.117.71:37297[1](queued=0,recved=1,sent=0)
 Latency min/avg/max: 0/0/0
 Received: 3
 Sent: 2
 Outstanding: 0
 Zxid: 0x20065
 Mode: follower
 Node count: 288

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
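
The behaviour being asked for, stated as code: a purely illustrative membership check on
the leader side that rejects a connecting server whose id does not appear among the
server.N entries parsed from zoo.cfg. All names here are hypothetical; the attached
ZOOKEEPER-851.patch may implement this very differently.

    import java.util.Map;
    import java.util.Set;

    class MembershipCheck {
        private final Set<Long> configuredServerIds;

        MembershipCheck(Map<Long, String> viewFromConfig) {
            this.configuredServerIds = viewFromConfig.keySet();
        }

        void validateConnectingPeer(long sid) {
            if (!configuredServerIds.contains(sid)) {
                throw new IllegalStateException("Server id " + sid
                        + " is not part of the configured ensemble; rejecting connection");
            }
        }
    }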




[jira] [Updated] (ZOOKEEPER-851) ZK lets any node to become an observer

2011-08-30 Thread Laxman (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laxman updated ZOOKEEPER-851:
-

Attachment: ZOOKEEPER-851.patch

 ZK lets any node to become an observer
 --

 Key: ZOOKEEPER-851
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-851
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.3.1
Reporter: Vishal Kher
Assignee: Laxman
Priority: Critical
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-851.patch


 I had a 3 node cluster running. The zoo.cfg on each contained 3 entries as 
 shown below:
 tickTime=2000
 dataDir=/var/zookeeper
 clientPort=2181
 initLimit=5
 syncLimit=2
 server.0=10.150.27.61:2888:3888
 server.1=10.150.27.62:2888:3888
 server.2=10.150.27.63:2888:3888
 I wanted to add another node to the cluster. In fourth node's zoo.cfg, I 
 created another entry for that node and started zk server. The zoo.cfg on the 
 first 3 nodes was left unchanged. The fourth node was able to join the 
 cluster even though the 3 nodes had no idea about the fourth node.
 zoo.cfg on fourth node:
 tickTime=2000
 dataDir=/var/zookeeper
 clientPort=2181
 initLimit=5
 syncLimit=2
 server.0=10.150.27.61:2888:3888
 server.1=10.150.27.62:2888:3888
 server.2=10.150.27.63:2888:3888
 server.3=10.17.117.71:2888:3888
 It looks like 10.17.117.71 is becoming an observer in this case. I was 
 expecting that the leader will reject 10.17.117.71.
 # telnet 10.17.117.71 2181
 Trying 10.17.117.71...
 Connected to 10.17.117.71.
 Escape character is '^]'.
 stat
 Zookeeper version: 3.3.0--1, built on 04/02/2010 22:40 GMT
 Clients:
  /10.17.117.71:37297[1](queued=0,recved=1,sent=0)
 Latency min/avg/max: 0/0/0
 Received: 3
 Sent: 2
 Outstanding: 0
 Zxid: 0x20065
 Mode: follower
 Node count: 288

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-847) Missing acl check in zookeeper create

2011-08-30 Thread Laxman (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laxman updated ZOOKEEPER-847:
-

Attachment: ZOOKEEPER-847.patch

 Missing acl check in zookeeper create
 -

 Key: ZOOKEEPER-847
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-847
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.3.1
Reporter: Patrick Datko
Assignee: Laxman
 Attachments: ZOOKEEPER-847.patch


 I looked at the source of the ZooKeeper class and noticed that an ACL check is missing 
 in the asynchronous version of the create operation. Is there any reason the async 
 version has no check of whether the ACL is valid, or did someone forget to implement 
 it? It's interesting to us because we are working on a refactoring of the ZooKeeper 
 client and don't want to implement a bug.
 The following code is missing:
 if (acl != null && acl.size() == 0) {
 throw new KeeperException.InvalidACLException();
 }

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1140) server shutdown is not stopping threads

2011-08-30 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093560#comment-13093560
 ] 

Laxman commented on ZOOKEEPER-1140:
---

Thanks for review and commit Mahadev.

 server shutdown is not stopping threads
 ---

 Key: ZOOKEEPER-1140
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1140
 Project: ZooKeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.4.0
Reporter: Patrick Hunt
Assignee: Laxman
Priority: Blocker
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-1140.patch


 Near the end of QuorumZxidSyncTest there are tons of threads running - 115 
 ProcessThread threads, similar numbers of SessionTracker.
 Also I see ~100 ReadOnlyRequestProcessor - why is this running as a separate 
 thread? (henry/flavio?)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Failed: ZOOKEEPER-851 PreCommit Build #479

2011-08-30 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-851
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/479/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 44938 lines...]
 [exec] 
 [exec] -1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12492214/ZOOKEEPER-851.patch
 [exec]   against trunk revision 1163106.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no new tests are needed 
for this patch.
 [exec] Also please list what manual steps were 
performed to verify this patch.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] -1 javac.  The applied patch generated 11 javac compiler 
warnings (more than the trunk's current 10 warnings).
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 1 new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/479//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/479//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/479//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] I65qc2Tmvb logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1450:
 exec returned: 4

Total time: 7 minutes 34 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-851
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
20 tests failed.
REGRESSION:  org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testQuorum

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not 
reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please 
note the time in the report does not reflect the time until the VM exit.


REGRESSION:  org.apache.zookeeper.test.AsyncHammerTest.testHammer

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not 
reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please 
note the time in the report does not reflect the time until the VM exit.


REGRESSION:  org.apache.zookeeper.test.AsyncTest.testAsync

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not 
reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please 
note the time in the report does not reflect the time until the VM exit.


REGRESSION:  org.apache.zookeeper.test.CnxManagerTest.testWorkerThreads

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not 
reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please 
note the time in the report does not reflect the time until the VM exit.


REGRESSION:  

[jira] [Commented] (ZOOKEEPER-851) ZK lets any node to become an observer

2011-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093564#comment-13093564
 ] 

Hadoop QA commented on ZOOKEEPER-851:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12492214/ZOOKEEPER-851.patch
  against trunk revision 1163106.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 11 javac compiler warnings (more 
than the trunk's current 10 warnings).

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/479//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/479//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/479//console

This message is automatically generated.

 ZK lets any node to become an observer
 --

 Key: ZOOKEEPER-851
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-851
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.3.1
Reporter: Vishal Kher
Assignee: Laxman
Priority: Critical
 Fix For: 3.4.0, 3.5.0

 Attachments: ZOOKEEPER-851.patch


 I had a 3 node cluster running. The zoo.cfg on each contained 3 entries as 
 shown below:
 tickTime=2000
 dataDir=/var/zookeeper
 clientPort=2181
 initLimit=5
 syncLimit=2
 server.0=10.150.27.61:2888:3888
 server.1=10.150.27.62:2888:3888
 server.2=10.150.27.63:2888:3888
 I wanted to add another node to the cluster. In fourth node's zoo.cfg, I 
 created another entry for that node and started zk server. The zoo.cfg on the 
 first 3 nodes was left unchanged. The fourth node was able to join the 
 cluster even though the 3 nodes had no idea about the fourth node.
 zoo.cfg on fourth node:
 tickTime=2000
 dataDir=/var/zookeeper
 clientPort=2181
 initLimit=5
 syncLimit=2
 server.0=10.150.27.61:2888:3888
 server.1=10.150.27.62:2888:3888
 server.2=10.150.27.63:2888:3888
 server.3=10.17.117.71:2888:3888
 It looks like 10.17.117.71 is becoming an observer in this case. I was 
 expecting that the leader will reject 10.17.117.71.
 # telnet 10.17.117.71 2181
 Trying 10.17.117.71...
 Connected to 10.17.117.71.
 Escape character is '^]'.
 stat
 Zookeeper version: 3.3.0--1, built on 04/02/2010 22:40 GMT
 Clients:
  /10.17.117.71:37297[1](queued=0,recved=1,sent=0)
 Latency min/avg/max: 0/0/0
 Received: 3
 Sent: 2
 Outstanding: 0
 Zxid: 0x20065
 Mode: follower
 Node count: 288

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Success: ZOOKEEPER-847 PreCommit Build #480

2011-08-30 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-847
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/480/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 208507 lines...]
 [exec] BUILD SUCCESSFUL
 [exec] Total time: 0 seconds
 [exec] 
 [exec] 
 [exec] 
 [exec] 
 [exec] +1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12492215/ZOOKEEPER-847.patch
 [exec]   against trunk revision 1163106.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 12 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/480//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/480//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/480//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] 5S87Zk49Dl logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD SUCCESSFUL
Total time: 24 minutes 47 seconds
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-847
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed


[jira] [Commented] (ZOOKEEPER-847) Missing acl check in zookeeper create

2011-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093568#comment-13093568
 ] 

Hadoop QA commented on ZOOKEEPER-847:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12492215/ZOOKEEPER-847.patch
  against trunk revision 1163106.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/480//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/480//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/480//console

This message is automatically generated.

 Missing acl check in zookeeper create
 -

 Key: ZOOKEEPER-847
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-847
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.3.1, 3.3.2, 3.3.3
Reporter: Patrick Datko
Assignee: Laxman
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-847.patch


 I looked at the source of the ZooKeeper class and noticed that an ACL check is missing 
 in the asynchronous version of the create operation. Is there any reason the async 
 version has no check of whether the ACL is valid, or did someone forget to implement 
 it? It's interesting to us because we are working on a refactoring of the ZooKeeper 
 client and don't want to implement a bug.
 The following code is missing:
 if (acl != null && acl.size() == 0) {
 throw new KeeperException.InvalidACLException();
 }

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1140) server shutdown is not stopping threads

2011-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093647#comment-13093647
 ] 

Hudson commented on ZOOKEEPER-1140:
---

Integrated in ZooKeeper-trunk #1288 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/1288/])
ZOOKEEPER-1140. server shutdown is not stopping threads. (laxman via 
mahadev)

mahadev : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1163102
Files : 
* /zookeeper/trunk/CHANGES.txt
* 
/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
* 
/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
* 
/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/ReadOnlyZooKeeperServer.java


 server shutdown is not stopping threads
 ---

 Key: ZOOKEEPER-1140
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1140
 Project: ZooKeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.4.0
Reporter: Patrick Hunt
Assignee: Laxman
Priority: Blocker
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-1140.patch


 Near the end of QuorumZxidSyncTest there are tons of threads running - 115 
 ProcessThread threads, similar numbers of SessionTracker.
 Also I see ~100 ReadOnlyRequestProcessor - why is this running as a separate 
 thread? (henry/flavio?)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1051) SIGPIPE in Zookeeper 0.3.* when send'ing after cluster disconnection

2011-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093644#comment-13093644
 ] 

Hudson commented on ZOOKEEPER-1051:
---

Integrated in ZooKeeper-trunk #1288 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/1288/])
ZOOKEEPER-1051. SIGPIPE in Zookeeper 0.3.* when send'ing after cluster 
disconnection (Stephen Tyree via mahadev)

mahadev : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1163106
Files : 
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/c/src/zookeeper.c


 SIGPIPE in Zookeeper 0.3.* when send'ing after cluster disconnection
 

 Key: ZOOKEEPER-1051
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1051
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.2, 3.3.3, 3.4.0
Reporter: Stephen Tyree
Assignee: Stephen Tyree
Priority: Minor
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-1051.patch, ZOOKEEPER-1051.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 In libzookeeper_mt, if your process is going rather slowly (such as when 
 running it in Valgrind's Memcheck) or you are using gdb with breakpoints, you 
 can occasionally get SIGPIPE when trying to send a message to the cluster. 
 For example:
 ==12788==
 ==12788== Process terminating with default action of signal 13 (SIGPIPE)
 ==12788==at 0x3F5180DE91: send (in /lib64/libpthread-2.5.so)
 ==12788==by 0x7F060AA: ??? (in /usr/lib64/libzookeeper_mt.so.2.0.0)
 ==12788==by 0x7F06E5B: zookeeper_process (in 
 /usr/lib64/libzookeeper_mt.so.2.0.0)
 ==12788==by 0x7F0D38E: ??? (in /usr/lib64/libzookeeper_mt.so.2.0.0)
 ==12788==by 0x3F5180673C: start_thread (in /lib64/libpthread-2.5.so)
 ==12788==by 0x3F50CD3F6C: clone (in /lib64/libc-2.5.so)
 ==12788==
 This is probably not the behavior we would like, since we handle server 
 disconnections after a failed call to send. To fix this, there are a few 
 options we could use. For BSD environments, we can tell a socket to never 
 send SIGPIPE with send using setsockopt:
 setsockopt(sd, SOL_SOCKET, SO_NOSIGPIPE, (void *)&set, sizeof(int));
 For Linux environments, we can add a MSG_NOSIGNAL flag to every send call 
 that says to not send SIGPIPE on a bad file descriptor.
 For more information, see: 
 http://stackoverflow.com/questions/108183/how-to-prevent-sigpipes-or-handle-them-properly

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-999) Create an package integration project

2011-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093645#comment-13093645
 ] 

Hudson commented on ZOOKEEPER-999:
--

Integrated in ZooKeeper-trunk #1288 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/1288/])
ZOOKEEPER-999. Create an package integration project (Eric Yang via phunt)

phunt : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1163015
Files : 
* /zookeeper/trunk/CHANGES.txt


 Create an package integration project
 -

 Key: ZOOKEEPER-999
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-999
 Project: ZooKeeper
  Issue Type: New Feature
  Components: build
 Environment: Java 6, RHEL/Ubuntu
Reporter: Eric Yang
Assignee: Eric Yang
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-999-1.patch, ZOOKEEPER-999-10.patch, 
 ZOOKEEPER-999-11.patch, ZOOKEEPER-999-12.patch, ZOOKEEPER-999-13.patch, 
 ZOOKEEPER-999-2.patch, ZOOKEEPER-999-3.patch, ZOOKEEPER-999-4.patch, 
 ZOOKEEPER-999-5.patch, ZOOKEEPER-999-6.patch, ZOOKEEPER-999-7.patch, 
 ZOOKEEPER-999-8.patch, ZOOKEEPER-999-9.patch, ZOOKEEPER-999.patch


 This goal of this ticket is to generate a set of RPM/debian package which 
 integrate well with RPM sets created by HADOOP-6255.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (ZOOKEEPER-1153) Deprecate AuthFLE and LE

2011-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093646#comment-13093646
 ] 

Hudson commented on ZOOKEEPER-1153:
---

Integrated in ZooKeeper-trunk #1288 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/1288/])
ZOOKEEPER-1153. Deprecate AuthFLE and LE. (Flavio Junqueira via mahadev)

mahadev : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1163099
Files : 
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/docs/bookkeeperConfig.pdf
* /zookeeper/trunk/docs/bookkeeperOverview.pdf
* /zookeeper/trunk/docs/bookkeeperProgrammer.pdf
* /zookeeper/trunk/docs/bookkeeperStarted.pdf
* /zookeeper/trunk/docs/bookkeeperStream.pdf
* /zookeeper/trunk/docs/index.pdf
* /zookeeper/trunk/docs/javaExample.pdf
* /zookeeper/trunk/docs/linkmap.pdf
* /zookeeper/trunk/docs/recipes.pdf
* /zookeeper/trunk/docs/releasenotes.pdf
* /zookeeper/trunk/docs/zookeeperAdmin.html
* /zookeeper/trunk/docs/zookeeperAdmin.pdf
* /zookeeper/trunk/docs/zookeeperHierarchicalQuorums.pdf
* /zookeeper/trunk/docs/zookeeperInternals.pdf
* /zookeeper/trunk/docs/zookeeperJMX.pdf
* /zookeeper/trunk/docs/zookeeperObservers.pdf
* /zookeeper/trunk/docs/zookeeperOver.pdf
* /zookeeper/trunk/docs/zookeeperProgrammers.pdf
* /zookeeper/trunk/docs/zookeeperQuotas.pdf
* /zookeeper/trunk/docs/zookeeperStarted.pdf
* /zookeeper/trunk/docs/zookeeperTutorial.pdf
* /zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml
* 
/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/AuthFastLeaderElection.java
* 
/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LeaderElection.java
* 
/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java


 Deprecate AuthFLE and LE
 

 Key: ZOOKEEPER-1153
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1153
 Project: ZooKeeper
  Issue Type: Improvement
Affects Versions: 3.3.3
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-1153.patch, ZOOKEEPER-1153.patch


 I propose we mark these as deprecated in 3.4.0 and remove them in the 
 following release.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
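
For reference, marking a class deprecated in Java is just an annotation plus a javadoc
tag; the wording and javadoc text below are an example, not the contents of the actual
ZOOKEEPER-1153 patch.

    /**
     * Legacy election implementation.
     *
     * @deprecated To be removed in the release after 3.4.0 (see ZOOKEEPER-1153);
     *             FastLeaderElection remains the supported algorithm.
     */
    @Deprecated
    public class LeaderElection {
        // existing implementation unchanged; only the annotation and javadoc are added
    }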




RE: How zab avoid split-brain problem?

2011-08-30 Thread Alexander Shraer
Hi Peter,

It's the second option. The servers don't know if the leader failed or 
was partitioned from them. So each group of 3 servers in your scenario
can't distinguish the situation from another scenario where none of the servers
failed but these 3 servers are partitioned from the other 4. To prevent a split 
brain
in an asynchronous network a leader must have the support of a quorum.

Alex

 -Original Message-
 From: cheetah [mailto:xuw...@gmail.com]
 Sent: Tuesday, August 30, 2011 12:23 AM
 To: dev@zookeeper.apache.org
 Subject: How zab avoid split-brain problem?
 
 Hi folks,
 I am reading the zab paper, but a bit confusing how zab handle
 split
 brain problem.
 Suppose there are A, B, C, D, E, F and G seven servers, now A is
 the
 leader. When A dies and at the same time, B,C,D are isolated from E, F
 and
 G.
  In this case, will Zab continue working like this: if BCD and
 EFG,
 so the two groups are both voting and electing B and E as their leaders
 separately. Thus, there is a split brain problem.
  Or Zookeeper just stop working, because there were original 7
 servers,
 after 1 failure, a new leader still expects to have a quorum of 3
 servers
 voting for it as the leader. And because the two groups are separate
 from
 each other, no leader can be elected out.
 
   If it is the first case, Zookeeper will have a split brain
 problem,
 which probably is not the case. But in the second case, a 7-node
 Zookeeper
 service can only handle a node failure and a network partition failure.
 
  Am I understanding wrongly? Looking forward to your insights.
 
 Thanks,
 Peter
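
To put numbers on the quorum argument above (a throwaway illustration, not ZooKeeper
code): with 7 configured servers a leader needs the support of a majority, so after A
fails and the remaining servers split 3-and-3, neither side can elect a leader.

    public class QuorumExample {
        static int majority(int ensembleSize) {
            return ensembleSize / 2 + 1;               // 7 servers -> 4 votes needed
        }

        public static void main(String[] args) {
            int needed = majority(7);                  // 4
            int partitionBCD = 3;                      // B, C, D
            int partitionEFG = 3;                      // E, F, G (A has failed)
            System.out.println("votes needed = " + needed);
            System.out.println("B,C,D can elect: " + (partitionBCD >= needed)); // false
            System.out.println("E,F,G can elect: " + (partitionEFG >= needed)); // false
        }
    }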


[jira] [Commented] (ZOOKEEPER-706) large numbers of watches can cause session re-establishment to fail

2011-08-30 Thread Eric Hwang (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094136#comment-13094136
 ] 

Eric Hwang commented on ZOOKEEPER-706:
--

Any idea if the jute.maxbuffer setting needs to be applied to both server and 
client? or just client?

 large numbers of watches can cause session re-establishment to fail
 ---

 Key: ZOOKEEPER-706
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-706
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client, java client
Affects Versions: 3.1.2, 3.2.2, 3.3.0
Reporter: Patrick Hunt
Priority: Critical
 Fix For: 3.5.0


 If a client sets a large number of watches the set watches operation during 
 session re-establishment can fail.
 for example:
  WARN  [NIOServerCxn.Factory:22801:NIOServerCnxn@417] - Exception causing 
 close of session 0xe727001201a4ee7c due to java.io.IOException: Len error 
 4348380
 in this case the client was a web monitoring app and had set both data and 
 child watches on >32k znodes.
 there are two issues I see here we need to fix:
 1) handle this case properly (split up the set watches into multiple calls I 
 guess...)
 2) the session should have expired after the timeout. however we seem to 
 consider any message from the client as re-setting the expiration on the 
 server side. Probably we should only consider messages from the client that 
 are sent during an established session, otherwise we can see this situation 
 where the session is not established however the session is not expired 
 either. Perhaps we should create another JIRA for this particular issue.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
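
For anyone following along, jute.maxbuffer is a JVM system property. The snippet below
only shows how such a property is raised; the connect string, session timeout and the
4 MB value are example assumptions, and it deliberately leaves open the question asked
above about whether the server, the client, or both need the setting.

    import org.apache.zookeeper.ZooKeeper;

    public class MaxBufferExample {
        public static void main(String[] args) throws Exception {
            // Equivalent to -Djute.maxbuffer=4194304 on the command line; must be
            // set before ZooKeeper's serialization code reads it.
            System.setProperty("jute.maxbuffer", "4194304");

            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
            System.out.println("client state: " + zk.getState());
            zk.close();
        }
    }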




Re: How zab avoid split-brain problem?

2011-08-30 Thread cheetah
Hi Alex,

Thanks for the explanation.

Then I have another question:

If there are 7 machines in my current zookeeper clusters, two of them are
failed. How can I reconfigure the Zookeeper to make it working with 5
machines? i.e if the master can get 3 machines' reply, it can commit the
transaction.

On the other hand, if I add 2 machines to make a 9 node Zookeeper cluster,
how can I configure it to make it taking advantages of 9 machines?

This is more related to user mailing list. So I cc to it.

Thanks,
Peter

On Tue, Aug 30, 2011 at 12:21 PM, Alexander Shraer shra...@yahoo-inc.com wrote:

 Hi Peter,

 It's the second option. The servers don't know if the leader failed or
 was partitioned from them. So each group of 3 servers in your scenario
 can't distinguish the situation from another scenario where none of the
 servers
 failed but these 3 servers are partitioned from the other 4. To prevent a
 split brain
 in an asynchronous network a leader must have the support of a quorum.

 Alex

  -Original Message-
  From: cheetah [mailto:xuw...@gmail.com]
  Sent: Tuesday, August 30, 2011 12:23 AM
  To: dev@zookeeper.apache.org
  Subject: How zab avoid split-brain problem?
 
  Hi folks,
  I am reading the zab paper, but a bit confusing how zab handle
  split
  brain problem.
  Suppose there are A, B, C, D, E, F and G seven servers, now A is
  the
  leader. When A dies and at the same time, B,C,D are isolated from E, F
  and
  G.
   In this case, will Zab continue working like this: if BCD and
  EFG,
  so the two groups are both voting and electing B and E as their leaders
  separately. Thus, there is a split brain problem.
   Or Zookeeper just stop working, because there were original 7
  servers,
  after 1 failure, a new leader still expects to have a quorum of 3
  servers
  voting for it as the leader. And because the two groups are separate
  from
  each other, no leader can be elected out.
 
If it is the first case, Zookeeper will have a split brain
  problem,
  which probably is not the case. But in the second case, a 7-node
  Zookeeper
  service can only handle a node failure and a network partition failure.
 
   Am I understanding wrongly? Looking forward to your insights.
 
  Thanks,
  Peter



RE: How zab avoid split-brain problem?

2011-08-30 Thread Alexander Shraer
Hi Peter,

We're currently working on adding dynamic reconfiguration functionality to 
Zookeeper. I hope that it will get in to the next release of ZK (after 3.4). 
With this you'll just run a new zk command to add/remove any servers, change 
ports, change roles (followers/observers), etc.

Currently, membership is determined by the config file so the only way of doing 
this is rolling restart. This means that you change configuration files and 
bounce the servers back. You should do it in a way that guarantees that at any 
time any quorum of the servers that are up intersects with any quorum of the 
old configuration (otherwise you might lose data). For example, if you're going 
from (A, B, C) to (A, B, C, D, E), it is possible that A and B have the latest 
data whereas C is falling behind (ZK stores data on a quorum), so if you just 
change the config files of A, B, C to say that they are part of the larger 
configuration then C might be elected with the support of D and E and you might 
lose data. So in this case you'll have to first add D, and later add E, this 
way the quorums intersect. Same thing when removing servers.

Alex

 -Original Message-
 From: cheetah [mailto:xuw...@gmail.com]
 Sent: Tuesday, August 30, 2011 3:36 PM
 To: dev@zookeeper.apache.org
 Cc: u...@zookeeper.apache.org
 Subject: Re: How zab avoid split-brain problem?
 
 Hi Alex,
 
 Thanks for the explanation.
 
 Then I have another question:
 
 If there are 7 machines in my current zookeeper clusters, two of them
 are
 failed. How can I reconfigure the Zookeeper to make it working with 5
 machines? i.e if the master can get 3 machines' reply, it can commit
 the
 transaction.
 
 On the other hand, if I add 2 machines to make a 9 node Zookeeper
 cluster,
 how can I configure it to make it taking advantages of 9 machines?
 
 This is more related to user mailing list. So I cc to it.
 
 Thanks,
 Peter
 
 On Tue, Aug 30, 2011 at 12:21 PM, Alexander Shraer shralex@yahoo-inc.com wrote:
 
  Hi Peter,
 
  It's the second option. The servers don't know if the leader failed
 or
  was partitioned from them. So each group of 3 servers in your
 scenario
  can't distinguish the situation from another scenario where none of
 the
  servers
  failed but these 3 servers are partitioned from the other 4. To
 prevent a
  split brain
  in an asynchronous network a leader must have the support of a
 quorum.
 
  Alex
 
   -Original Message-
   From: cheetah [mailto:xuw...@gmail.com]
   Sent: Tuesday, August 30, 2011 12:23 AM
   To: dev@zookeeper.apache.org
   Subject: How zab avoid split-brain problem?
  
   Hi folks,
   I am reading the zab paper, but a bit confusing how zab handle
   split
   brain problem.
   Suppose there are A, B, C, D, E, F and G seven servers, now A
 is
   the
   leader. When A dies and at the same time, B,C,D are isolated from
 E, F
   and
   G.
In this case, will Zab continue working like this: if BCD
 and
   EFG,
   so the two groups are both voting and electing B and E as their
 leaders
   separately. Thus, there is a split brain problem.
Or Zookeeper just stop working, because there were original 7
   servers,
   after 1 failure, a new leader still expects to have a quorum of 3
   servers
   voting for it as the leader. And because the two groups are
 separate
   from
   each other, no leader can be elected out.
  
 If it is the first case, Zookeeper will have a split brain
   problem,
   which probably is not the case. But in the second case, a 7-node
   Zookeeper
   service can only handle a node failure and a network partition
 failure.
  
Am I understanding wrongly? Looking forward to your insights.
  
   Thanks,
   Peter
 


Re: NodeExistsException when creating a znode with sequential and ephemeral mode

2011-08-30 Thread Alex
Camille

We applied the patch (ZOOKEEPER-1046-for333) to our SUT.
There was no error.

Thanks

alex

On August 30, 2011 at 11:19 AM, 박영근(Alex) alex.p...@nexr.com wrote:

 We used 3.3.3.

 We will check out the latest code.

 Thanks Camille.

 Alex


 2011/8/30 Camille Fournier cami...@apache.org

 More specifically, we fixed this for the upcoming release:
 https://issues.apache.org/jira/browse/ZOOKEEPER-1046

 You can try checking out the latest code and building it, should fix
 your error. I believe 3.3.4 will be released in a week or two.

 c

 On Mon, Aug 29, 2011 at 9:56 PM, Camille Fournier cami...@apache.org
 wrote:
  What version of ZK were you using?
 
  On Mon, Aug 29, 2011 at 9:50 PM, 박영근(Alex) alex.p...@nexr.com wrote:
  Hi, all
 
  I met a problem of NodeExistsException when creating a znode with
 sequential
  and ephemeral mode.
  the number of total test was 6442314 and 797 errors had occurred.
 
  The related log message is as in the following:
  2011-08-27 16:26:17,559 - INFO
  [ProcessThread:-1:PrepRequestProcessor@407][]
  - Got user-level KeeperException when processing
 sessionid:0x2320911802a0002
  type:create cxid:0x1246d7 zxid:0xfffe txntype:unknown
  reqpath:n/a Error
  Path:/NexR/MasteElection/__rwLock/readLock-lssm07-0005967078
  Error:KeeperErrorCode = NodeExists for
  /NexR/MasteElection/__rwLock/readLock-lssm07-0005967078
 
  The sequential number would be created by increasing parent's Cversion
 in
  the PrepRequestProcess.
  So, I guess that this problem was caused by inconsistency of parent
 znode.
 
  Our test scenario is very aggressive:
  The grinder agent sends a request of creating a znode of
  CreateMode. SEQUENTIAL_EPHEMERAL.
  three number of servers compose ensemble.
  each NIC of server is down and up repeatedly;
  NIC of server1 become down every one minute and sleeping for 9 seconds,
 then
  up
  NIC of server2 become down every 2 minute and sleeping for 9 seconds,
 then
  up
  NIC of server3 become down every 3 minute and sleeping for 9 seconds,
 then
  up
 
  while the probability of error occurrence is 0.0001 as mentioned above,
  if the ZooKeeper cannot guarantee the consistency, it is  a fatal.
 
  Is there any idea or related issue?
 
  thanks in advance.
 
  alex.
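
  For reference, the client call exercised by this kind of test looks like the
  following. CreateMode.EPHEMERAL_SEQUENTIAL is the actual constant for the mode
  discussed in this thread, while the connect string, path prefix and timeout are
  made-up example values.

      import java.util.concurrent.CountDownLatch;
      import org.apache.zookeeper.CreateMode;
      import org.apache.zookeeper.Watcher.Event.KeeperState;
      import org.apache.zookeeper.ZooDefs;
      import org.apache.zookeeper.ZooKeeper;

      public class SequentialEphemeralExample {
          public static void main(String[] args) throws Exception {
              CountDownLatch connected = new CountDownLatch(1);
              ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> {
                  if (event.getState() == KeeperState.SyncConnected) {
                      connected.countDown();
                  }
              });
              connected.await();

              // Per the thread, the sequence suffix comes from the parent's cversion;
              // the server appends it to the prefix and returns the full path created.
              String created = zk.create("/readLock-", new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
              System.out.println("created " + created);
              zk.close();
          }
      }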
 
 





[jira] [Created] (BOOKKEEPER-61) BufferedChannel read endless when the remaining bytes of file is less than the capacity of read buffer

2011-08-30 Thread Sijie Guo (JIRA)
BufferedChannel read endless when the remaining bytes of file is less than the 
capacity of read buffer
--

 Key: BOOKKEEPER-61
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-61
 Project: Bookkeeper
  Issue Type: Bug
  Components: bookkeeper-server
Affects Versions: 3.4.0
Reporter: Sijie Guo


If the last record in an entry log file is truncated (the length of the data is shorter 
than the expected length), the bookie goes into an infinite loop reading this record.

A truncated record can be caused in the following cases:
1) the bookie server is killed during bookie restart while replaying logs.
2) the bookie server is killed while the bookie is performing an add-entry operation.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (BOOKKEEPER-61) BufferedChannel read endless when the remaining bytes of file is less than the capacity of read buffer

2011-08-30 Thread Sijie Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sijie Guo updated BOOKKEEPER-61:


Attachment: bookkeeper-61.patch

Return the number of bytes that have been read when reaching the end of the file.

 BufferedChannel read endless when the remaining bytes of file is less than 
 the capacity of read buffer
 --

 Key: BOOKKEEPER-61
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-61
 Project: Bookkeeper
  Issue Type: Bug
  Components: bookkeeper-server
Affects Versions: 3.4.0
Reporter: Sijie Guo
 Attachments: bookkeeper-61.patch


 If the last record in an entry log file is truncated (the length of the data is shorter 
 than the expected length), the bookie goes into an infinite loop reading this record.
 A truncated record can be caused in the following cases:
 1) the bookie server is killed during bookie restart while replaying logs.
 2) the bookie server is killed while the bookie is performing an add-entry operation.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
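
A generic illustration of the fix described in the attachment comment above ("return the
number of bytes read when reaching the end of the file"). This is not BookKeeper's
BufferedChannel, just a stand-alone sketch of the same idea over java.nio.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    final class EofAwareReader {
        // Fill dest from the channel starting at position, but stop (instead of
        // looping forever) once the channel reports end-of-file, reporting how
        // many bytes were actually read.
        static int read(FileChannel channel, ByteBuffer dest, long position) throws IOException {
            int total = 0;
            while (dest.hasRemaining()) {
                int n = channel.read(dest, position + total);
                if (n < 0) {                           // end of file
                    return total == 0 ? -1 : total;    // partial read instead of an infinite loop
                }
                total += n;
            }
            return total;
        }
    }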




[jira] [Commented] (BOOKKEEPER-56) Race condition of message handler in connection recovery in Hedwig client

2011-08-30 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093662#comment-13093662
 ] 

Ivan Kelly commented on BOOKKEEPER-56:
--

It seems inelegant to have to look up the delivery handler every time, when the 
message has already arrived in an object which can know how to deliver it. 
Perhaps we could add a package private method on HedwigSubscriber, called 
restartDelivery, which gets the handler from the hashmap and sets it in the 
response handler. In this case, the patch wouldn't modify the response handler 
at all, just how the reconnect callback sets it.

The correct behaviour in this case is that the reconnect callback should not be 
able to overwrite the message handler. I think it is also valid to broaden this 
to say that no one should ever be able to overwrite the message handler, as this 
would indicate that startDelivery had been called twice without stopDelivery 
being called in between, which would indicate a programming error on the part 
of the client. 

There are tabs in the patch. For BK/HW the standard is 4 space indentation.

 Race condition of message handler in connection recovery in Hedwig client
 -

 Key: BOOKKEEPER-56
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-56
 Project: Bookkeeper
  Issue Type: Bug
  Components: hedwig-client
Affects Versions: 3.4.0
Reporter: Gavin Li
 Fix For: 3.4.0

 Attachments: patch_56


 There's a race condition in the connection recovery logic in Hedwig client. 
 The message handler user set might be overwritten incorrectly. 
 When handling channelDisconnected event, we try to reconnect to Hedwig 
 server. After the connection is created and subscribed, we'll call 
 StartDelivery() to recover the message handler to the original one of the 
 disconnected connection. But if during this process, user calls 
 StartDelivery() to set a new message handler, it will get overwritten to the 
 original one.
 The process can be demonstrated as below:
 [main thread]         StartDelivery(messageHandlerA)
 (connection broken here, and recovered later...)
 [netty worker thread] ResponseHandler::channelDisconnected() (connection disconnected event received)
 [netty worker thread] new SubscribeReconnectCallback(subHandler.getMessageHandler()) (store messageHandlerA in SubscribeReconnectCallback to recover later)
 [netty worker thread] client.doConnect() (try reconnect)
 [netty worker thread] doSubUnsub() (resubscribe)
 [netty worker thread] SubscriberResponseHandler::handleSubscribeResponse() (subscription succeeds)
 [main thread]         StartDelivery(messageHandlerB)
 [netty worker thread] SubscribeReconnectCallback::operationFinished() -> StartDelivery(messageHandlerA) (messageHandler gets overwritten)
 I can stably reproduce this by simulating this race condition by put some 
 sleep in ResponseHandler.
 I think essentially speaking we should not store messageHandler in 
 ResponseHandler, since the message handler is supposed to be bound to 
 connection. Instead, no matter which connection is in use, we should use the 
 same messageHandler, the one user set last time. So I think we should change 
 to store messageHandler in the HedwigSubscriber, in this way we don't need to 
 recover the handler in connection recovery and thus won't face this race 
 condition.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
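
To make the two suggestions above concrete (a hypothetical sketch; these are not Hedwig's
real classes or method signatures): keep the handler on the subscriber rather than on the
per-connection response handler, and refuse to overwrite it while delivery is active.

    class SubscriberSketch {
        interface MessageHandler { void deliver(Object message); }

        // Owned by the subscriber, not by any particular connection, so a
        // reconnect has nothing to restore and cannot overwrite it.
        private MessageHandler messageHandler;

        synchronized void startDelivery(MessageHandler handler) {
            if (messageHandler != null) {
                // startDelivery twice without stopDelivery is a programming error.
                throw new IllegalStateException("delivery already started; call stopDelivery first");
            }
            messageHandler = handler;
        }

        synchronized void stopDelivery() {
            messageHandler = null;
        }

        // Reconnect path: simply resubscribe; the current handler stays in place.
        synchronized MessageHandler currentHandler() {
            return messageHandler;
        }
    }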