[jira] [Comment Edited] (ZOOKEEPER-1549) Data inconsistency when follower is receiving a DIFF with a dirty snapshot

2012-10-12 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474893#comment-13474893
 ] 

Flavio Junqueira edited comment on ZOOKEEPER-1549 at 10/12/12 8:44 AM:
---

I don't think major changes are needed, at least for the leader case. We simply 
shouldn't be taking snapshots over uncommitted state. Check ZOOKEEPER-1558 and 
ZOOKEEPER-1559, subtasks of this jira.

  was (Author: fpj):
I don't think major changes are needed, at least for the leader case. We 
simply shouldn't be taking snapshots over uncommitted state. Check 
ZOOKEEPER-1558 and ZOOKEEPER-1559, a subtask of this jira.
  
 Data inconsistency when follower is receiving a DIFF with a dirty snapshot
 --

 Key: ZOOKEEPER-1549
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.4.3
Reporter: Jacky007
Priority: Critical
 Attachments: case.patch


 the trunc code (from ZOOKEEPER-1154?) cannot work correctly if the snapshot is 
 not correct.
 Here is the scenario (similar to 1154):
 Initial Condition
 1. Let's say there are three nodes in the ensemble, A, B, C, with A being the 
 leader.
 2. The current epoch is 7.
 3. For simplicity of the example, let's say the zxid is a two-digit number, 
 with the epoch being the first digit.
 4. The zxid is 73.
 5. All the nodes have seen the change 73 and have persistently logged it.
 Step 1
 A request with zxid 74 is issued. The leader A writes it to the log, but the 
 entire ensemble crashes and B, C never write the change 74 to their logs.
 Step 2
 A, B restart; A is elected as the new leader. A loads its data and takes a 
 clean snapshot (change 74 is in it), then sends a diff to B, but B dies before 
 syncing with A. A dies later.
 Step 3
 B, C restart; A is still down.
 B, C form the quorum.
 B is the new leader. Let's say B's minCommitLog is 71 and maxCommitLog is 73.
 The epoch is now 8, the zxid is 80.
 A request with zxid 81 is successful. On B, minCommitLog is now 71 and 
 maxCommitLog is 81.
 Step 4
 A starts up. It applies the change in the request with zxid 74 to its 
 in-memory data tree.
 A contacts B to registerAsFollower and provides 74 as its zxid.
 Since 71 <= 74 <= 81, B decides to send A the diff.
 Problem:
 The problem with the above sequence is that after truncating the log, A will 
 load the snapshot again, which is not correct.
 In the 3.3 branch, FileTxnSnapLog.restore does not call the listener 
 (ZOOKEEPER-874), so the leader will send a snapshot to the follower and it 
 will not be a problem.
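The sync decision B makes in Step 4 can be sketched roughly as follows. This is a hypothetical simplification for illustration only, not the actual LearnerHandler code; the names minCommitLog/maxCommitLog follow the scenario above.

```java
// Sketch of the leader's sync decision from the scenario above (hypothetical,
// not the real ZooKeeper code). With B's committed log spanning 71..81 and A
// reporting zxid 74, B picks DIFF even though 74 was never committed.
final class SyncDecision {
    enum Mode { DIFF, TRUNC, SNAP }

    static Mode choose(long followerZxid, long minCommitLog, long maxCommitLog) {
        if (followerZxid >= minCommitLog && followerZxid <= maxCommitLog) {
            return Mode.DIFF;   // follower is inside the committed log window
        } else if (followerZxid > maxCommitLog) {
            return Mode.TRUNC;  // follower is ahead: truncate its log
        } else {
            return Mode.SNAP;   // follower is too far behind: full snapshot
        }
    }

    public static void main(String[] args) {
        // The scenario from the report: B's log spans 71..81, A reports 74.
        System.out.println(choose(74, 71, 81)); // DIFF, even though 74 is dirty on A
    }
}
```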

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1549) Data inconsistency when follower is receiving a DIFF with a dirty snapshot

2012-10-12 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474893#comment-13474893
 ] 

Flavio Junqueira commented on ZOOKEEPER-1549:
-

I don't think major changes are needed, at least for the leader case. We simply 
shouldn't be taking snapshots over uncommitted state. Check ZOOKEEPER-1558 and 
ZOOKEEPER-1559, a subtask of this jira.



[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-12 Thread Jacky007 (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474981#comment-13474981
 ] 

Jacky007 commented on ZOOKEEPER-1560:
-

I think this would work for both 1560 and 1561.
{noformat}
if (p != null) {
    updateLastSend();
    if ((p.requestHeader != null)
            && (p.requestHeader.getType() != OpCode.ping)
            && (p.requestHeader.getType() != OpCode.auth)) {
        p.requestHeader.setXid(cnxn.getXid());
    }
    p.createBB();
    ByteBuffer pbb = p.bb;
--->    while (pbb.hasRemaining()) sock.write(pbb);
--->    outgoingQueue.removeFirstOccurrence(p);
    sentCount++;
    if (p.requestHeader != null
            && p.requestHeader.getType() != OpCode.ping
            && p.requestHeader.getType() != OpCode.auth) {
        pending.add(p);
    }
}
{noformat}
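One reason the p.createBB() call has to stay outside the write loop in the snippet above: rebuilding the buffer resets its position to 0, so bytes already written to the socket go out again. A standalone illustration (not ZooKeeper code; the fake 4-byte-per-call channel is invented for the demo):

```java
import java.nio.ByteBuffer;

// Standalone demo (not ZooKeeper code): if the send buffer is rebuilt on every
// iteration, its position resets to 0 and bytes that were already written to
// the socket are written again, corrupting the stream.
public class PartialWriteDemo {
    // Pretend the socket accepts at most 4 bytes per write() call.
    static int fakeWrite(ByteBuffer bb, StringBuilder wire) {
        int n = Math.min(4, bb.remaining());
        for (int i = 0; i < n; i++) wire.append((char) bb.get());
        return n;
    }

    // Buggy pattern: the buffer is recreated (createBB-style) each iteration.
    static String sendRebuildingBuffer(String payload) {
        StringBuilder wire = new StringBuilder();
        while (wire.length() < payload.length()) {
            ByteBuffer bb = ByteBuffer.wrap(payload.getBytes()); // position = 0 again
            fakeWrite(bb, wire);
        }
        return wire.toString();
    }

    // Correct pattern: one buffer, whose position tracks what was already sent.
    static String sendKeepingBuffer(String payload) {
        StringBuilder wire = new StringBuilder();
        ByteBuffer bb = ByteBuffer.wrap(payload.getBytes()); // created once
        while (bb.hasRemaining()) fakeWrite(bb, wire);
        return wire.toString();
    }
}
```

With an 8-byte payload and a 4-byte channel, the buggy variant sends the first 4 bytes twice instead of the whole payload.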

 Zookeeper client hangs on creation of large nodes
 -

 Key: ZOOKEEPER-1560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, 
 zookeeper-1560-v2.txt, zookeeper-1560-v3.txt


 To reproduce, try creating a node with 0.5M of data using the Java client. The 
 test will hang waiting for a response from the server. See the attached patch 
 for the test that reproduces the issue.
 It seems that ZOOKEEPER-1437 introduced a few issues to 
 {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from 
 sending large packets that require several invocations of 
 {{SocketChannel.write}} to complete. The first issue is that the call to 
 {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue 
 even if the packet wasn't completely sent yet. It looks to me that this call 
 should be moved under {{if (!pbb.hasRemaining())}}. The second issue is that 
 {{p.createBB()}} reinitializes the {{ByteBuffer}} on every iteration, which 
 confuses {{SocketChannel.write}}. The third issue is caused by extra 
 calls to {{cnxn.getXid()}} that increment the xid on every iteration and 
 confuse the server.
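The first issue described above can be sketched like this. This is an assumption about the intended fix, not the actual ClientCnxnSocketNIO code; the fixed-capacity fake channel is invented for the example. The packet leaves the queue only once its buffer is fully drained:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch (not the real client code) of dequeuing a packet only after its
// buffer has been completely written, so a partial write leaves the packet
// at the head of the queue for the next pass.
public class SendQueueSketch {
    // Fake non-blocking channel: accepts at most 'cap' bytes per call.
    static int write(ByteBuffer bb, int cap) {
        int n = Math.min(cap, bb.remaining());
        bb.position(bb.position() + n);
        return n;
    }

    /** One doIO-style pass: write what we can, dequeue only on completion. */
    static boolean sendOnce(Deque<ByteBuffer> outgoing, int cap) {
        ByteBuffer pbb = outgoing.peekFirst(); // do NOT remove yet
        if (pbb == null) return false;
        write(pbb, cap);
        if (!pbb.hasRemaining()) {             // fully flushed...
            outgoing.removeFirst();            // ...only now leave the queue
            return true;
        }
        return false;                          // partial write: retry later
    }
}
```

A 10-byte packet over a channel that takes 4 bytes per call needs three passes before it is removed from the queue.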



Failed: ZOOKEEPER-1560 PreCommit Build #1215

2012-10-12 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1215/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 172814 lines...]
 [exec] 
 [exec] 
 [exec] 
 [exec] -1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12548889/zookeeper-1560-v4.txt
 [exec]   against trunk revision 1391526.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1215//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1215//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1215//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] b22x2sL361 logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1568:
 exec returned: 1

Total time: 24 minutes 12 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1560
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
2 tests failed.
FAILED:  org.apache.zookeeper.test.ChrootClientTest.testLargeNodeData

Error Message:
KeeperErrorCode = ConnectionLoss for /large

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /large
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at 
org.apache.zookeeper.test.ClientTest.testLargeNodeData(ClientTest.java:61)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)


FAILED:  org.apache.zookeeper.test.ClientTest.testLargeNodeData

Error Message:
KeeperErrorCode = ConnectionLoss for /large

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /large
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at 
org.apache.zookeeper.test.ClientTest.testLargeNodeData(ClientTest.java:61)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)




[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475014#comment-13475014
 ] 

Hadoop QA commented on ZOOKEEPER-1560:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12548889/zookeeper-1560-v4.txt
  against trunk revision 1391526.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1215//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1215//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1215//console

This message is automatically generated.

 Zookeeper client hangs on creation of large nodes
 -

 Key: ZOOKEEPER-1560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, 
 zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt




Jenkins build is back to stable : bookkeeper-trunk » bookkeeper-server #750

2012-10-12 Thread Apache Jenkins Server
See 
https://builds.apache.org/job/bookkeeper-trunk/org.apache.bookkeeper$bookkeeper-server/750/



Jenkins build is still unstable: bookkeeper-trunk » hedwig-server #750

2012-10-12 Thread Apache Jenkins Server
See 
https://builds.apache.org/job/bookkeeper-trunk/org.apache.bookkeeper$hedwig-server/750/



[jira] [Updated] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1560:
--

Attachment: zookeeper-1560-v5.txt

From 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1215//testReport/org.apache.zookeeper.test/ClientTest/testLargeNodeData/
 :
{code}
2012-10-12 14:10:50,042 [myid:] - WARN  
[main-SendThread(localhost:11221):ClientCnxn$SendThread@1089] - Session 
0x13a555031cf for server localhost/127.0.0.1:11221, unexpected error, 
closing socket connection and attempting reconnect
java.io.IOException: Couldn't write 2000 bytes, 1152 bytes written
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:142)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2012-10-12 14:10:50,044 [myid:] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@349] - caught end of 
stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x13a555031cf, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:662)
{code}
Patch v5 adds more information to the exception message.

 Zookeeper client hangs on creation of large nodes
 -

 Key: ZOOKEEPER-1560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, 
 zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, 
 zookeeper-1560-v5.txt




[jira] [Updated] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1560:
--

Attachment: zookeeper-1560-v6.txt

Patch v6 changes the condition for raising an IOException: it is now thrown only 
if there is no progress between successive sock.write() calls.

I guess the socket's output buffer might be a limiting factor as to the number 
of bytes written in a particular sock.write() call.
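A small sketch of that v6 idea as I understand it (the Writer interface is invented here to keep the example self-contained; the real code writes to a SocketChannel): a short write is tolerated and retried, but zero progress raises an IOException.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Sketch (assumption, not the actual patch): keep writing as long as each
// call moves at least one byte; fail only when a call makes no progress.
public class ProgressCheck {
    interface Writer { int write(ByteBuffer bb) throws IOException; }

    static int flush(ByteBuffer bb, Writer w) throws IOException {
        int total = 0;
        while (bb.hasRemaining()) {
            int n = w.write(bb);
            if (n <= 0) {
                throw new IOException("No progress: " + bb.remaining() + " bytes left");
            }
            total += n;
        }
        return total;
    }
}
```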

 Zookeeper client hangs on creation of large nodes
 -

 Key: ZOOKEEPER-1560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, 
 zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, 
 zookeeper-1560-v5.txt, zookeeper-1560-v6.txt




Failed: ZOOKEEPER-1560 PreCommit Build #1216

2012-10-12 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1216/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 169973 lines...]
 [exec] 
 [exec] 
 [exec] 
 [exec] -1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12548893/zookeeper-1560-v5.txt
 [exec]   against trunk revision 1391526.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1216//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1216//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1216//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] vuJG8poe1s logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1568:
 exec returned: 1

Total time: 23 minutes 57 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1560
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
2 tests failed.
FAILED:  org.apache.zookeeper.test.ChrootClientTest.testLargeNodeData

Error Message:
KeeperErrorCode = ConnectionLoss for /large

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /large
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at 
org.apache.zookeeper.test.ClientTest.testLargeNodeData(ClientTest.java:61)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)


FAILED:  org.apache.zookeeper.test.ClientTest.testLargeNodeData

Error Message:
KeeperErrorCode = ConnectionLoss for /large

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /large
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at 
org.apache.zookeeper.test.ClientTest.testLargeNodeData(ClientTest.java:61)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)




[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475062#comment-13475062
 ] 

Hadoop QA commented on ZOOKEEPER-1560:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12548893/zookeeper-1560-v5.txt
  against trunk revision 1391526.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1216//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1216//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1216//console

This message is automatically generated.

 Zookeeper client hangs on creation of large nodes
 -

 Key: ZOOKEEPER-1560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, 
 zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, 
 zookeeper-1560-v5.txt, zookeeper-1560-v6.txt




[jira] [Commented] (BOOKKEEPER-422) Simplify AbstractSubscriptionManager

2012-10-12 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475068#comment-13475068
 ] 

Flavio Junqueira commented on BOOKKEEPER-422:
-

It sounds like a good idea to me to use a SortedMap, Sijie. Do you see a 
problem with doing it, Stu?

It also sounds like a good idea to validate the subscriber id as you point out, 
Sijie. It should be a separate jira as you suggest.

 Simplify AbstractSubscriptionManager
 

 Key: BOOKKEEPER-422
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-422
 Project: Bookkeeper
  Issue Type: Improvement
  Components: hedwig-server
Reporter: Stu Hood
Assignee: Stu Hood
Priority: Minor
 Attachments: bk-422.diff, bk-422.diff, bk-422.diff


 It's difficult to maintain a duplicated/cached count of local subscribers, 
 and we've experienced a few issues due to it getting out of sync with the 
 actual set of subscribers. Since a count of local subscribers can be 
 calculated from the top2sub2seq map, let's do that instead.
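The idea of deriving the count instead of caching it can be sketched generically (the map shape and names here are hypothetical, not the actual Hedwig top2sub2seq structure):

```java
import java.util.Map;
import java.util.Set;

// Generic illustration (hypothetical names, not the Hedwig API): instead of
// maintaining a separate cached subscriber count that can drift out of sync,
// derive the count on demand from the topic -> subscribers map.
public class DerivedCount {
    static int localSubscribers(Map<String, Set<String>> top2sub) {
        // Sum the per-topic subscriber sets; no second counter to keep in sync.
        return top2sub.values().stream().mapToInt(Set::size).sum();
    }
}
```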



[jira] [Created] (BOOKKEEPER-430) Remove manual bookie registration from overview

2012-10-12 Thread Flavio Junqueira (JIRA)
Flavio Junqueira created BOOKKEEPER-430:
---

 Summary: Remove manual bookie registration from overview
 Key: BOOKKEEPER-430
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-430
 Project: Bookkeeper
  Issue Type: Improvement
Affects Versions: 4.1.0
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira


The documentation suggests that a user needs to manually register a bookie, 
which is not right.



Failed: ZOOKEEPER-1560 PreCommit Build #1217

2012-10-12 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1217/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 169234 lines...]
 [exec] 
 [exec] 
 [exec] 
 [exec] -1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12548898/zookeeper-1560-v6.txt
 [exec]   against trunk revision 1391526.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1217//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1217//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1217//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] l38K6LEVny logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1568:
 exec returned: 1

Total time: 24 minutes 48 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1560
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
3 tests failed.
REGRESSION:  org.apache.zookeeper.test.LETest.testLE

Error Message:
Thread 3 got 27 expected 28

Stack Trace:
junit.framework.AssertionFailedError: Thread 3 got 27 expected 28
at org.apache.zookeeper.test.LETest.testLE(LETest.java:135)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)


FAILED:  org.apache.zookeeper.test.ChrootClientTest.testLargeNodeData

Error Message:
KeeperErrorCode = ConnectionLoss for /large

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /large
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at 
org.apache.zookeeper.test.ClientTest.testLargeNodeData(ClientTest.java:61)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)


FAILED:  org.apache.zookeeper.test.ClientTest.testLargeNodeData

Error Message:
KeeperErrorCode = ConnectionLoss for /large

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /large
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at 
org.apache.zookeeper.test.ClientTest.testLargeNodeData(ClientTest.java:61)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)




[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475079#comment-13475079
 ] 

Hadoop QA commented on ZOOKEEPER-1560:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12548898/zookeeper-1560-v6.txt
  against trunk revision 1391526.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1217//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1217//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1217//console

This message is automatically generated.

 Zookeeper client hangs on creation of large nodes
 -

 Key: ZOOKEEPER-1560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, 
 zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, 
 zookeeper-1560-v5.txt, zookeeper-1560-v6.txt


 To reproduce, try creating a node with 0.5 MB of data using the Java client. The 
 test will hang waiting for a response from the server. See the attached patch 
 for the test that reproduces the issue.
 It seems that ZOOKEEPER-1437 introduced a few issues in 
 {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from 
 sending large packets that require several invocations of 
 {{SocketChannel.write}} to complete. The first issue is that the call to 
 {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue 
 even if the packet hasn't been completely sent yet. It looks to me that this call 
 should be moved under {{if (!pbb.hasRemaining())}}. The second issue is that 
 {{p.createBB()}} reinitializes the {{ByteBuffer}} on every iteration, which 
 confuses {{SocketChannel.write}}. The third issue is caused by extra 
 calls to {{cnxn.getXid()}} that increment the xid on every iteration and confuse 
 the server.
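
 The first two fixes the reporter describes can be sketched as follows. This is a hedged illustration, not the actual ZooKeeper code: the class and method names ({{PartialWriteSketch}}, {{doWrite}}) are invented for the example; only the rule matters -- create the packet's buffer once, and dequeue the packet only when the buffer is fully drained.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of the described fix: the ByteBuffer is created exactly
// once per packet, and the packet leaves the queue only after a full send.
class PartialWriteSketch {
    static final class Packet {
        private final byte[] payload;
        private ByteBuffer bb; // created once, reused across write attempts

        Packet(byte[] payload) { this.payload = payload; }

        ByteBuffer createBB() {
            if (bb == null) {
                bb = ByteBuffer.wrap(payload); // do NOT rebuild on every iteration
            }
            return bb;
        }
    }

    // One write attempt, mirroring a single doIO() invocation per selector wakeup.
    static void doWrite(Deque<Packet> outgoingQueue, WritableByteChannel sock)
            throws IOException {
        Packet p = outgoingQueue.peekFirst();
        if (p == null) {
            return;
        }
        ByteBuffer pbb = p.createBB();
        sock.write(pbb); // may send only part of the buffer, or nothing at all
        if (!pbb.hasRemaining()) {
            outgoingQueue.removeFirst(); // remove only once the packet is fully sent
        }
    }
}
```

 With a channel that accepts a few bytes per call, repeated {{doWrite}} invocations drain one packet across several selector wakeups without losing it from the queue.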

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-12 Thread Eugene Koontz (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475085#comment-13475085
 ] 

Eugene Koontz commented on ZOOKEEPER-1560:
--

It seems that in a particular iteration, 0 bytes are written:

{code}
localhost/127.0.0.1:11222, unexpected error, closing socket connection and 
attempting reconnect
 [exec] [junit] java.io.IOException: Couldn't write 2000 bytes, 0 bytes 
written in this iteration and 77152 bytes written in total. Original limit: 
500074
 [exec] [junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:145)
 [exec] [junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:375)
 [exec] [junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
 [exec] [junit] 2012-10-12 15:20:42,629 [myid:] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11222:NIOServerCnxn@349] - caught end of 
stream exception
 [exec] [junit] EndOfStreamException: Unable to read additional data 
from client sessionid 0x13a55902b650001, likely client has closed socket
 [exec] [junit] at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
 [exec] [junit] at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
 [exec] [junit] at java.lang.Thread.run(Thread.java:662)
 [exec] [junit] 2012-10-12 15:20:42,630 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11222:NIOServerCnxn@1001] - Closed socket 
connection for client /127.0.0.1:57126 which had sessionid 0x13a55902b650001
{code}

There's a strange resemblance among all the test failures so far: each one 
fails after exactly 77152 bytes written.



[jira] [Created] (BOOKKEEPER-431) Duplicate definition of COOKIES_NODE

2012-10-12 Thread Flavio Junqueira (JIRA)
Flavio Junqueira created BOOKKEEPER-431:
---

 Summary: Duplicate definition of COOKIES_NODE
 Key: BOOKKEEPER-431
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-431
 Project: Bookkeeper
  Issue Type: Improvement
Affects Versions: 4.1.0
Reporter: Flavio Junqueira
Priority: Minor
 Fix For: 4.2.0


Are two definitions of COOKIES_NODE necessary, one in Cookie.java and one in 
AbstractZkLedgerManager?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1560:
--

Attachment: zookeeper-1560-v7.txt

Patch v7 changes the IOE to a warning.
Let's see if the test is able to make further progress.

I wonder whether 77152 bytes would be big enough for most use cases.
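
Roughly, the v7 idea can be sketched like this. This is a hedged illustration, not the actual patch: the names ({{ZeroWriteTolerance}}, {{writeOnce}}) are invented; the point is that a zero-byte write is treated as transient backpressure to be retried on the next selector wakeup, rather than as a fatal IOException.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Illustrative sketch: log a warning on a stalled write instead of throwing.
class ZeroWriteTolerance {
    static int writeOnce(WritableByteChannel sock, ByteBuffer bb) throws IOException {
        int sent = sock.write(bb);
        if (sent == 0 && bb.hasRemaining()) {
            // previously this condition raised an IOException and killed the connection
            System.err.println("WARN: wrote 0 bytes, " + bb.remaining()
                    + " bytes still pending; will retry on next wakeup");
        }
        return sent;
    }
}
```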

 Zookeeper client hangs on creation of large nodes
 -

 Key: ZOOKEEPER-1560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, 
 zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, 
 zookeeper-1560-v5.txt, zookeeper-1560-v6.txt, zookeeper-1560-v7.txt


 To reproduce, try creating a node with 0.5 MB of data using the Java client. The 
 test will hang waiting for a response from the server. See the attached patch 
 for the test that reproduces the issue.
 It seems that ZOOKEEPER-1437 introduced a few issues in 
 {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from 
 sending large packets that require several invocations of 
 {{SocketChannel.write}} to complete. The first issue is that the call to 
 {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue 
 even if the packet hasn't been completely sent yet. It looks to me that this call 
 should be moved under {{if (!pbb.hasRemaining())}}. The second issue is that 
 {{p.createBB()}} reinitializes the {{ByteBuffer}} on every iteration, which 
 confuses {{SocketChannel.write}}. The third issue is caused by extra 
 calls to {{cnxn.getXid()}} that increment the xid on every iteration and confuse 
 the server.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (BOOKKEEPER-431) Duplicate definition of COOKIES_NODE

2012-10-12 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475099#comment-13475099
 ] 

Flavio Junqueira commented on BOOKKEEPER-431:
-

Actually, Cookie.java defines COOKIE_NODE while AbstractZkLedgerManager defines 
COOKIES_NODE. I also noticed that AVAILABLE_NODE is duplicated. Is it for 
readability reasons? Shouldn't we have that in a single place?



[jira] [Assigned] (BOOKKEEPER-431) Duplicate definition of COOKIES_NODE

2012-10-12 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G reassigned BOOKKEEPER-431:
--

Assignee: Uma Maheswara Rao G



[jira] [Commented] (BOOKKEEPER-431) Duplicate definition of COOKIES_NODE

2012-10-12 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475101#comment-13475101
 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-431:


How about having a constants file and maintaining all such constants in one 
place? If we maintain the constants inside specific files, it is very easy to 
duplicate them.
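
One possible shape of that suggestion, as a hedged sketch (the class name and the string values here are illustrative, not BookKeeper's actual layout): a single non-instantiable holder for the shared znode names, so definitions cannot drift apart across files.

```java
// Illustrative constants holder; values shown are placeholders, not
// necessarily the znode names BookKeeper actually uses.
final class ZkNodeConstants {
    private ZkNodeConstants() {} // holder class; never instantiated

    static final String COOKIE_NODE = "cookies";
    static final String AVAILABLE_NODE = "available";
}
```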



Success: ZOOKEEPER-1560 PreCommit Build #1218

2012-10-12 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1218/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 262995 lines...]
 [exec] BUILD SUCCESSFUL
 [exec] Total time: 0 seconds
 [exec] 
 [exec] 
 [exec] 
 [exec] 
 [exec] +1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12548908/zookeeper-1560-v7.txt
 [exec]   against trunk revision 1391526.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1218//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1218//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1218//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] b2727V26Mo logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD SUCCESSFUL
Total time: 27 minutes 20 seconds
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1560
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475134#comment-13475134
 ] 

Hadoop QA commented on ZOOKEEPER-1560:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12548908/zookeeper-1560-v7.txt
  against trunk revision 1391526.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1218//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1218//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1218//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475146#comment-13475146
 ] 

Ted Yu commented on ZOOKEEPER-1560:
---

The good news is that patch v7 passed.
The not-so-good news is that I didn't find any occurrence of the warning 
message I added in v7.

Essentially patch v7 is the same as patch v2 - we shouldn't bail if a single 
sock.write() call didn't make progress.



[jira] [Updated] (ZOOKEEPER-1504) Multi-thread NIOServerCnxn

2012-10-12 Thread Jay Shrauner (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Shrauner updated ZOOKEEPER-1504:


Attachment: ZOOKEEPER-1504.patch

Rebase

 Multi-thread NIOServerCnxn
 --

 Key: ZOOKEEPER-1504
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.3, 3.4.4, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
  Labels: performance, scaling
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch, 
 ZOOKEEPER-1504.patch, ZOOKEEPER-1504.patch


 NIOServerCnxnFactory is single threaded, which doesn't scale well to large 
 numbers of clients. This is particularly noticeable when thousands of clients 
 connect. I propose multi-threading this code as follows:
 - 1   acceptor thread, for accepting new connections
 - 1-N selector threads
 - 0-M I/O worker threads
 Numbers of threads are configurable, with defaults scaling according to 
 number of cores. Communication with the selector threads is handled via 
 LinkedBlockingQueues, and connections are permanently assigned to a 
 particular selector thread so that all potentially blocking SelectionKey 
 operations can be performed solely by the selector thread. An ExecutorService 
 is used for the worker threads.
 On a 32 core machine running Linux 2.6.38, achieved best performance with 4 
 selector threads and 64 worker threads for a 70% +/- 5% improvement in 
 throughput.
 This patch incorporates and supersedes the patches for
 https://issues.apache.org/jira/browse/ZOOKEEPER-517
 https://issues.apache.org/jira/browse/ZOOKEEPER-1444
 New classes introduced in this patch are:
   - ExpiryQueue (from ZOOKEEPER-1444): factor out the logic from 
 SessionTrackerImpl used to expire sessions so that the same logic can be used 
 to expire connections
   - RateLogger (from ZOOKEEPER-517): rate limit error message logging, 
 currently only used to throttle rate of logging out of file descriptors 
 errors
   - WorkerService (also in ZOOKEEPER-1505): ExecutorService wrapper that 
 makes worker threads daemon threads and names them in an easily debuggable 
 manner. Supports assignable threads (as used by CommitProcessor) and 
 non-assignable threads (as used here).
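
 The permanent connection-to-selector assignment described above can be sketched as follows. This is a hedged illustration, not the actual patch: the class and method names are invented; the property that matters is that the mapping is a stable function of the connection, so all SelectionKey operations for one connection stay on one selector thread.

```java
// Illustrative sketch of stable connection-to-selector assignment.
class SelectorDispatch {
    private final int numSelectors;

    SelectorDispatch(int numSelectors) { this.numSelectors = numSelectors; }

    // Permanent assignment: the same connection always maps to the same
    // selector thread, regardless of when or how often this is called.
    int selectorFor(long connectionId) {
        return (int) Long.remainderUnsigned(connectionId, numSelectors);
    }
}
```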

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 0)

2012-10-12 Thread Ted Yu
Patch v7 for ZOOKEEPER-1560 passes test suite.

Please take a look.

On Thu, Oct 11, 2012 at 2:45 PM, Mahadev Konar maha...@hortonworks.com wrote:

 Thanks Alex for bringing it up. Ill hold the release for now. I see a
 patch on 1560. Ill take a look and we'll see how to roll this into
 3.4.5.

 thanks
 mahadev

 On Thu, Oct 11, 2012 at 2:42 PM, Alexander Shraer shra...@gmail.com
 wrote:
  Hi Mahadev,
 
  ZOOKEEPER-1560 and ZOOKEEPER-1561 indicate a potentially serious issue,
  introduced recently in ZOOKEEPER-1437. Please consider this w.r.t. the
  3.4.5 release.
 
  Best Regards,
  Alex
 
  On Wed, Oct 10, 2012 at 10:38 PM, Mahadev Konar maha...@hortonworks.com
 wrote:
  I think we have waited enough. Closing the vote now.
 
  With 5 +1's (3 binding) the vote passes. I will do the needful for
  getting the release out.
 
  Thanks for voting folks.
 
  mahadev
 
  On Wed, Oct 10, 2012 at 9:04 AM, Flavio Junqueira f...@yahoo-inc.com
 wrote:
  +1
 
  -Flavio
 
  On Oct 8, 2012, at 7:05 AM, Mahadev Konar wrote:
 
  Given Eugene's findings on ZOOKEEPER-1557, I think we can continue
  rolling the current RC out. Others please vote on the thread if you
  see any issues with that. Folks who have already voted, please re vote
  in case you have a change of opinion.
 
  As for myself, I ran a couple of tests with the RC using open jdk 7
  and things seem to work.
 
  +1 from my side. Pat/Ben/Flavio/others what do you guys think?
 
  thanks
  mahadev
 
  On Sun, Oct 7, 2012 at 8:34 AM, Ted Yu yuzhih...@gmail.com wrote:
  Currently ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7
 are using
  lock ZooKeeper-solaris.
  I think ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7
 should use
  a separate lock since they wouldn't run on a Solaris machine.
  I didn't seem to find how a new lock name can be added.
 
  Recent builds for ZooKeeper_branch34_openjdk7 and
 ZooKeeper_branch34_jdk7
  have been green.
 
  Cheers
 
  On Sun, Oct 7, 2012 at 6:56 AM, Patrick Hunt ph...@apache.org
 wrote:
 
  I've seen that before, it's a flakey test that's unrelated to the
 sasl
  stuff.
 
  Patrick
 
  On Sat, Oct 6, 2012 at 2:25 PM, Ted Yu yuzhih...@gmail.com wrote:
  I saw one test failure:
 
 
 
 https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_openjdk7/9/testReport/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testHighestZxidJoinLate/
 
  FYI
 
  On Sat, Oct 6, 2012 at 7:16 AM, Ted Yu yuzhih...@gmail.com
 wrote:
 
  Up in ZOOKEEPER-1557, Eugene separated one test out and test
 failure
  seems
  to be gone.
 
  For ZooKeeper_branch34_jdk7, the two failed builds:
  #10 corresponded to ZooKeeper_branch34_openjdk7 build #7,
  #8 corresponded to ZooKeeper_branch34_openjdk7 build #5
  where tests failed due to BindException
 
  Cheers
 
 
  On Sat, Oct 6, 2012 at 7:06 AM, Patrick Hunt ph...@apache.org
 wrote:
 
  Yes. Those ubuntu machines have two slots each. If both tests
 run at
  the same time... bam.
 
  I just added exclusion locks to the configuration of these two
 jobs,
  that should help.
 
  Patrick
 
  On Fri, Oct 5, 2012 at 8:58 PM, Ted Yu yuzhih...@gmail.com
 wrote:
  I think that was due to the following running on the same
 machine at
  the
  same time:
 
  Building remotely on ubuntu4
  https://builds.apache.org/computer/ubuntu4 in workspace
 
 /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7
 
  We should introduce randomized port so that test suite can
 execute in
  parallel.
 
  Cheers
 
  On Fri, Oct 5, 2012 at 8:55 PM, Ted Yu yuzhih...@gmail.com
 wrote:
 
  Some tests failed in build 8 due to (See
 
 
 
 
 https://builds.apache.org//view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_jdk7/8/testReport/org.apache.zookeeper.server/ZxidRolloverTest/testRolloverThenRestart/
  ):
 
  java.lang.RuntimeException: java.net.BindException: Address
 already
  in
  use
   at
  org.apache.zookeeper.test.QuorumUtil.init(QuorumUtil.java:118)
   at
 
 
 org.apache.zookeeper.server.ZxidRolloverTest.setUp(ZxidRolloverTest.java:63)
  Caused by: java.net.BindException: Address already in use
   at sun.nio.ch.Net.bind0(Native Method)
   at sun.nio.ch.Net.bind(Net.java:344)
   at sun.nio.ch.Net.bind(Net.java:336)
   at
 
 
 sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:199)
   at
  sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
   at
  sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
   at
 
 
 org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
   at
 
 
 org.apache.zookeeper.server.ServerCnxnFactory.createFactory(ServerCnxnFactory.java:125)
   at
 
 
 org.apache.zookeeper.server.quorum.QuorumPeer.init(QuorumPeer.java:517)
   at
  org.apache.zookeeper.test.QuorumUtil.init(QuorumUtil.java:113)
 
 
 
  On Fri, Oct 5, 2012 at 9:56 AM, Patrick Hunt ph...@apache.org
 
  wrote:
 
  fwiw: I setup jdk7 and openjdk7 jobs last night for branch34
 on
 

[jira] [Updated] (ZOOKEEPER-1505) Multi-thread CommitProcessor

2012-10-12 Thread Jay Shrauner (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Shrauner updated ZOOKEEPER-1505:


Attachment: ZOOKEEPER-1505.patch

Address feedback from review--shutdown CommitProcessor if downstream processor 
throws an exception (preserves previous behavior)

 Multi-thread CommitProcessor
 

 Key: ZOOKEEPER-1505
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1505
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.3, 3.4.4, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
  Labels: performance, scaling
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1505.patch, ZOOKEEPER-1505.patch, 
 ZOOKEEPER-1505.patch


 CommitProcessor has a single thread that both pulls requests off its queues 
 and runs all downstream processors. This is noticeably inefficient for 
 read-intensive workloads, which could be run concurrently. The trick is 
 handling write transactions. I propose multi-threading this code according to 
 the following two constraints
   - each session must see its requests responded to in order
   - all committed transactions must be handled in zxid order, across all 
 sessions
 I believe these cover the only constraints we need to honor. In particular, I 
 believe we can relax the following:
   - it does not matter if the read request in one session happens before or 
 after the write request in another session
 With these constraints, I propose the following threads
   - 1primary queue servicing/work dispatching thread
   - 0-N  assignable worker threads, where a given session is always assigned 
 to the same worker thread
 By assigning sessions always to the same worker thread (using a simple 
 sessionId mod number of worker threads), we guarantee the first constraint-- 
 requests we push onto the thread queue are processed in order. The way we 
 guarantee the second constraint is we only allow a single commit transaction 
 to be in flight at a time--the queue servicing thread blocks while a commit 
 transaction is in flight, and when the transaction completes it clears the 
 flag.
 On a 32 core machine running Linux 2.6.38, achieved best performance with 32 
 worker threads for a 56% +/- 5% improvement in throughput (this improvement 
 was measured on top of that for ZOOKEEPER-1504, not in isolation).
 New classes introduced in this patch are:
 WorkerService (also in ZOOKEEPER-1504): ExecutorService wrapper that 
 makes worker threads daemon threads and names them in an easily debuggable 
 manner. Supports assignable threads (as used here) and non-assignable threads 
 (as used by NIOServerCnxnFactory).
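
 The first ordering constraint above (per-session request order) can be sketched with a pool of single-threaded workers indexed by {{sessionId mod N}}. This is a hedged illustration, not the real CommitProcessor or WorkerService: the names are invented, and the single-in-flight-commit rule is omitted; it only shows why session affinity preserves per-session order.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: each session maps to one single-threaded worker, so a
// session's requests are handled in the order they were submitted.
class SessionAffinityWorkers {
    private final ExecutorService[] workers;

    SessionAffinityWorkers(int n) {
        workers = new ExecutorService[n];
        for (int i = 0; i < n; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    void submit(long sessionId, Runnable task) {
        // same session -> same worker -> per-session ordering preserved
        workers[(int) Long.remainderUnsigned(sessionId, workers.length)].execute(task);
    }

    void shutdownAndWait() throws InterruptedException {
        for (ExecutorService w : workers) {
            w.shutdown();
            w.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}
```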

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (ZOOKEEPER-1147) Add support for local sessions

2012-10-12 Thread Jay Shrauner (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Shrauner reassigned ZOOKEEPER-1147:
---

Assignee: Thawan Kooburat  (was: Jay Shrauner)

 Add support for local sessions
 --

 Key: ZOOKEEPER-1147
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.3
Reporter: Vishal Kathuria
Assignee: Thawan Kooburat
  Labels: api-change, scaling
 Fix For: 3.5.0

   Original Estimate: 840h
  Remaining Estimate: 840h

 This improvement is in the bucket of making ZooKeeper work at a large scale. 
 We are planning on having about 1 million clients connect to a ZooKeeper 
 ensemble through a set of 50-100 observers. The majority of these clients are 
 read-only, i.e. they do not perform any updates or create ephemeral nodes.
 In ZooKeeper today, the client creates a session and the session creation is 
 handled like any other update. In the above use case, the session create/drop 
 workload can easily overwhelm an ensemble. The following is a proposal for a 
 local session, to support a larger number of connections.
 1.   The idea is to introduce a new type of session - a local session. A 
 local session doesn't have the full functionality of a normal session.
 2.   Local sessions cannot create ephemeral nodes.
 3.   Once a local session is lost, you cannot re-establish it using the 
 session-id/password. The session and its watches are gone for good.
 4.   When a local session connects, the session info is only maintained 
 on the zookeeper server (in this case, an observer) that it is connected to. 
 The leader is not aware of the creation of such a session and there is no 
 state written to disk.
 5.   Pings and expiration are handled by the server that the session 
 is connected to.
 With the above changes, we can make ZooKeeper scale to a much larger number 
 of clients without making the core ensemble a bottleneck.
 In terms of API, there are two options being considered:
 1. Let the client specify at connect time which kind of session they want.
 2. All sessions connect as local sessions and automatically get promoted to 
 global sessions when they do an operation that requires a global session 
 (e.g. creating an ephemeral node).
 Chubby took the approach of lazily promoting all sessions to global, but I 
 don't think that would work in our case, where we want to keep sessions which 
 never create ephemeral nodes as always local. Option 2 would make it more 
 broadly usable but option 1 would be easier to implement.
 We are thinking of implementing option 1 as the first cut. There would be a 
 client flag, IsLocalSession (much like the current readOnly flag) that would 
 be used to determine whether to create a local session or a global session.
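A minimal sketch of option 1's control flow. All names here are hypothetical stand-ins, not the eventual implementation:

```python
class Server:
    """Toy stand-in for one ZooKeeper server/observer."""

    def __init__(self):
        self.local_sessions = {}   # tracked only on this server
        self._next_local_id = 0
        self.global_creates = 0    # counts simulated quorum round-trips

    def next_local_session_id(self):
        self._next_local_id += 1
        return self._next_local_id

    def propose_global_session(self):
        # A global session create is replicated like any other update.
        self.global_creates += 1
        return -self.global_creates

def create_session(server, is_local_session):
    if is_local_session:
        # Only this server knows the session: no leader involvement,
        # no state written to disk; pings/expiry stay local.
        sid = server.next_local_session_id()
        server.local_sessions[sid] = {"watches": set()}
        return sid
    return server.propose_global_session()

s = Server()
sid = create_session(s, is_local_session=True)
# The local session never reached the (simulated) leader.
```

The point of the sketch is the asymmetry: the local path touches only per-server state, so a million mostly-read-only clients never load the core ensemble with session creates.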



Re: [VOTE] Release ZooKeeper 3.4.5 (candidate 0)

2012-10-12 Thread Mahadev Konar
Thanks Ted. Will review the changes over the weekend.

Thanks again
mahadev

On Fri, Oct 12, 2012 at 1:12 PM, Ted Yu yuzhih...@gmail.com wrote:
 Patch v7 for ZOOKEEPER-1560 passes the test suite.

 Please take a look.

 On Thu, Oct 11, 2012 at 2:45 PM, Mahadev Konar maha...@hortonworks.comwrote:

 Thanks Alex for bringing it up. I'll hold the release for now. I see a
 patch on 1560. I'll take a look and we'll see how to roll this into
 3.4.5.

 thanks
 mahadev

 On Thu, Oct 11, 2012 at 2:42 PM, Alexander Shraer shra...@gmail.com
 wrote:
  Hi Mahadev,
 
  ZOOKEEPER-1560 and ZOOKEEPER-1561 indicate a potentially serious issue,
  introduced recently in ZOOKEEPER-1437. Please consider this w.r.t. the
  3.4.5 release.
 
  Best Regards,
  Alex
 
  On Wed, Oct 10, 2012 at 10:38 PM, Mahadev Konar maha...@hortonworks.com
 wrote:
  I think we have waited enough. Closing the vote now.
 
  With 5 +1's (3 binding) the vote passes. I will do the needful for
  getting the release out.
 
  Thanks for voting folks.
 
  mahadev
 
  On Wed, Oct 10, 2012 at 9:04 AM, Flavio Junqueira f...@yahoo-inc.com
 wrote:
  +1
 
  -Flavio
 
  On Oct 8, 2012, at 7:05 AM, Mahadev Konar wrote:
 
  Given Eugene's findings on ZOOKEEPER-1557, I think we can continue
  rolling the current RC out. Others please vote on the thread if you
   see any issues with that. Folks who have already voted, please re-vote
  in case you have a change of opinion.
 
  As for myself, I ran a couple of tests with the RC using open jdk 7
  and things seem to work.
 
  +1 from my side. Pat/Ben/Flavio/others what do you guys think?
 
  thanks
  mahadev
 
  On Sun, Oct 7, 2012 at 8:34 AM, Ted Yu yuzhih...@gmail.com wrote:
  Currently ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7
 are using
  lock ZooKeeper-solaris.
  I think ZooKeeper_branch34_openjdk7 and ZooKeeper_branch34_jdk7
 should use
  a separate lock since they wouldn't run on a Solaris machine.
  I didn't seem to find how a new lock name can be added.
 
  Recent builds for ZooKeeper_branch34_openjdk7 and
 ZooKeeper_branch34_jdk7
  have been green.
 
  Cheers
 
  On Sun, Oct 7, 2012 at 6:56 AM, Patrick Hunt ph...@apache.org
 wrote:
 
  I've seen that before, it's a flakey test that's unrelated to the
 sasl
  stuff.
 
  Patrick
 
  On Sat, Oct 6, 2012 at 2:25 PM, Ted Yu yuzhih...@gmail.com wrote:
  I saw one test failure:
 
 
 
 https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_openjdk7/9/testReport/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testHighestZxidJoinLate/
 
  FYI
 
  On Sat, Oct 6, 2012 at 7:16 AM, Ted Yu yuzhih...@gmail.com
 wrote:
 
  Up in ZOOKEEPER-1557, Eugene separated one test out and test
 failure
  seems
  to be gone.
 
  For ZooKeeper_branch34_jdk7, the two failed builds:
  #10 corresponded to ZooKeeper_branch34_openjdk7 build #7,
  #8 corresponded to ZooKeeper_branch34_openjdk7 build #5
  where tests failed due to BindException
 
  Cheers
 
 
  On Sat, Oct 6, 2012 at 7:06 AM, Patrick Hunt ph...@apache.org
 wrote:
 
  Yes. Those ubuntu machines have two slots each. If both tests
 run at
  the same time... bam.
 
  I just added exclusion locks to the configuration of these two
 jobs,
  that should help.
 
  Patrick
 
  On Fri, Oct 5, 2012 at 8:58 PM, Ted Yu yuzhih...@gmail.com
 wrote:
  I think that was due to the following running on the same
 machine at
  the
  same time:
 
  Building remotely on ubuntu4
  https://builds.apache.org/computer/ubuntu4 in workspace
 
 /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7
 
   We should introduce randomized ports so that the test suites can
 execute in
  parallel.
 
  Cheers
 
  On Fri, Oct 5, 2012 at 8:55 PM, Ted Yu yuzhih...@gmail.com
 wrote:
 
  Some tests failed in build 8 due to (See
 
 
 
 
 https://builds.apache.org//view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_jdk7/8/testReport/org.apache.zookeeper.server/ZxidRolloverTest/testRolloverThenRestart/
  ):
 
  java.lang.RuntimeException: java.net.BindException: Address
 already
  in
  use
   at
  org.apache.zookeeper.test.QuorumUtil.init(QuorumUtil.java:118)
   at
 
 
 org.apache.zookeeper.server.ZxidRolloverTest.setUp(ZxidRolloverTest.java:63)
  Caused by: java.net.BindException: Address already in use
   at sun.nio.ch.Net.bind0(Native Method)
   at sun.nio.ch.Net.bind(Net.java:344)
   at sun.nio.ch.Net.bind(Net.java:336)
   at
 
 
 sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:199)
   at
  sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
   at
  sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
   at
 
 
 org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
   at
 
 
 org.apache.zookeeper.server.ServerCnxnFactory.createFactory(ServerCnxnFactory.java:125)
   at
 
 
 org.apache.zookeeper.server.quorum.QuorumPeer.init(QuorumPeer.java:517)
   at
  org.apache.zookeeper.test.QuorumUtil.init(QuorumUtil.java:113)
 
 

Failed: ZOOKEEPER-1505 PreCommit Build #1219

2012-10-12 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1505
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1219/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 262152 lines...]
 [exec] 
 [exec] 
 [exec] 
 [exec] -1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12548952/ZOOKEEPER-1505.patch
 [exec]   against trunk revision 1391526.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 1 new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1219//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1219//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1219//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] vrs878qBhU logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1568:
 exec returned: 1

Total time: 27 minutes 44 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1505
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

Success: ZOOKEEPER-1504 PreCommit Build #1220

2012-10-12 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1504
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1220/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 266158 lines...]
 [exec] BUILD SUCCESSFUL
 [exec] Total time: 0 seconds
 [exec] 
 [exec] 
 [exec] 
 [exec] 
 [exec] +1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12548950/ZOOKEEPER-1504.patch
 [exec]   against trunk revision 1391526.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1220//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1220//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1220//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] 61jfuJgRdC logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD SUCCESSFUL
Total time: 27 minutes 38 seconds
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1504
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-1505) Multi-thread CommitProcessor

2012-10-12 Thread Jay Shrauner (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475444#comment-13475444
 ] 

Jay Shrauner commented on ZOOKEEPER-1505:
-

The Findbugs warning (naked notify) is bogus; this is a helper routine to wake up 
the main thread, with the state change happening in the routines that call it.

From the Findbugs blurb: This bug does not necessarily indicate an error, 
since the change to mutable object state may have taken place in a method 
which then called the method containing the notification.

 Multi-thread CommitProcessor
 

 Key: ZOOKEEPER-1505
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1505
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.3, 3.4.4, 3.5.0
Reporter: Jay Shrauner
Assignee: Jay Shrauner
  Labels: performance, scaling
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1505.patch, ZOOKEEPER-1505.patch, 
 ZOOKEEPER-1505.patch


 CommitProcessor has a single thread that both pulls requests off its queues 
 and runs all downstream processors. This is noticeably inefficient for 
 read-intensive workloads, whose reads could be served concurrently. The trick is 
 handling write transactions. I propose multi-threading this code according to 
 the following two constraints:
   - each session must see its requests responded to in order
   - all committed transactions must be handled in zxid order, across all 
 sessions
 I believe these cover the only constraints we need to honor. In particular, I 
 believe we can relax the following:
   - it does not matter if the read request in one session happens before or 
 after the write request in another session
 With these constraints, I propose the following threads:
   - 1 primary queue servicing/work dispatching thread
   - 0-N  assignable worker threads, where a given session is always assigned 
 to the same worker thread
 By assigning sessions always to the same worker thread (using a simple 
 sessionId mod number of worker threads), we guarantee the first constraint-- 
 requests we push onto the thread queue are processed in order. The way we 
 guarantee the second constraint is we only allow a single commit transaction 
 to be in flight at a time--the queue servicing thread blocks while a commit 
 transaction is in flight, and when the transaction completes it clears the 
 flag.
 On a 32-core machine running Linux 2.6.38, we achieved the best performance with 
 32 worker threads, for a 56% +/- 5% improvement in throughput (this improvement 
 was measured on top of that for ZOOKEEPER-1504, not in isolation).
 New classes introduced in this patch are:
 WorkerService (also in ZOOKEEPER-1504): ExecutorService wrapper that 
 makes worker threads daemon threads and names them in an easily debuggable 
 manner. Supports assignable threads (as used here) and non-assignable threads 
 (as used by NIOServerCnxnFactory).



Re: Review Request: Multi-thread NIOServerCnxn

2012-10-12 Thread Jay Shrauner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6256/
---

(Updated Oct. 12, 2012, 11:45 p.m.)


Review request for zookeeper and Patrick Hunt.


Changes
---

Rebase


Description
---

See https://issues.apache.org/jira/browse/ZOOKEEPER-1504


This addresses bug ZOOKEEPER-1504.
https://issues.apache.org/jira/browse/ZOOKEEPER-1504


Diffs (updated)
-

  /src/java/main/org/apache/zookeeper/server/ExpiryQueue.java PRE-CREATION 
  /src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java 1391526 
  /src/java/main/org/apache/zookeeper/server/NIOServerCnxnFactory.java 1391526 
  /src/java/main/org/apache/zookeeper/server/RateLogger.java PRE-CREATION 
  /src/java/main/org/apache/zookeeper/server/ServerCnxn.java 1391526 
  /src/java/main/org/apache/zookeeper/server/ServerCnxnFactory.java 1391526 
  /src/java/main/org/apache/zookeeper/server/SessionTrackerImpl.java 1391526 
  /src/java/main/org/apache/zookeeper/server/WorkerService.java PRE-CREATION 
  /src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java 1391526 
  /src/java/test/org/apache/zookeeper/test/ServerCnxnTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/6256/diff/


Testing
---


Thanks,

Jay Shrauner



Re: Review Request: Multi-thread CommitProcessor

2012-10-12 Thread Jay Shrauner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6260/
---

(Updated Oct. 12, 2012, 11:47 p.m.)


Review request for zookeeper and Patrick Hunt.


Changes
---

Address feedback from review--shutdown CommitProcessor if downstream processor 
throws an exception (preserves previous behavior)


Description
---

See https://issues.apache.org/jira/browse/ZOOKEEPER-1505


This addresses bug ZOOKEEPER-1505.
https://issues.apache.org/jira/browse/ZOOKEEPER-1505


Diffs (updated)
-

  /src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java 1391526 
  /src/java/main/org/apache/zookeeper/server/ServerCnxnFactory.java 1391526 
  /src/java/main/org/apache/zookeeper/server/WorkerService.java PRE-CREATION 
  /src/java/main/org/apache/zookeeper/server/quorum/CommitProcessor.java 
1391526 
  /src/java/main/org/apache/zookeeper/server/quorum/Leader.java 1391526 
  /src/java/test/org/apache/zookeeper/server/quorum/CommitProcessorTest.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/6260/diff/


Testing
---


Thanks,

Jay Shrauner



[jira] [Created] (ZOOKEEPER-1562) Memory leaks in zoo_multi API

2012-10-12 Thread Deepak Jagtap (JIRA)
Deepak Jagtap created ZOOKEEPER-1562:


 Summary: Memory leaks in zoo_multi API
 Key: ZOOKEEPER-1562
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1562
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.4.3, 3.4.4
 Environment: Zookeeper client and server both are running on CentOS 6.3
Reporter: Deepak Jagtap
Priority: Trivial


Valgrind is reporting memory leak for zoo_multi operations.

==4056== 2,240 (160 direct, 2,080 indirect) bytes in 1 blocks are definitely 
lost in loss record 18 of 24
==4056==at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==4056==by 0x504D822: create_completion_entry (zookeeper.c:2322)
==4056==by 0x5052833: zoo_amulti (zookeeper.c:3141)
==4056==by 0x5052A8B: zoo_multi (zookeeper.c:3240)

It looks like completion entries for individual operations in a multi-update 
transaction are not getting freed. My observation is that the memory leak size 
depends on the number of operations in a single multi-update transaction.



[jira] [Updated] (ZOOKEEPER-1355) Add zk.updateServerList(newServerList)

2012-10-12 Thread Marshall McMullen (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marshall McMullen updated ZOOKEEPER-1355:
-

Attachment: ZOOKEEPER-1355-12-Oct.patch

This is an updated version of the patch that applies cleanly to the latest tip 
of trunk.

 Add zk.updateServerList(newServerList) 
 ---

 Key: ZOOKEEPER-1355
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1355
 Project: ZooKeeper
  Issue Type: New Feature
  Components: c client, java client
Reporter: Alexander Shraer
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: loadbalancing-more-details.pdf, loadbalancing.pdf, 
 ZOOKEEPER-1355-10-Oct.patch, ZOOKEEPER-1355-12-Oct.patch, 
 ZOOKEEPER-1355-ver10-1.patch, ZOOKEEPER-1355-ver10-2.patch, 
 ZOOKEEPER-1355-ver10-3.patch, ZOOKEEPER-1355-ver10-4.patch, 
 ZOOKEEPER-1355-ver10-4.patch, ZOOKEEPER-1355-ver10.patch, 
 ZOOKEEPER-1355-ver11-1.patch, ZOOKEEPER-1355-ver11.patch, 
 ZOOKEEPER-1355-ver12-1.patch, ZOOKEEPER-1355-ver12-2.patch, 
 ZOOKEEPER-1355-ver12-4.patch, ZOOKEEPER-1355-ver12.patch, 
 ZOOKEEPER-1355-ver13.patch, ZOOKEEPER-1355-ver14.patch, 
 ZOOKEEPER-1355-ver2.patch, ZOOKEEPER=1355-ver3.patch, 
 ZOOKEEPER-1355-ver4.patch, ZOOKEEPER-1355-ver5.patch, 
 ZOOKEEPER-1355-ver6.patch, ZOOKEEPER-1355-ver7.patch, 
 ZOOKEEPER-1355-ver8.patch, ZOOKEEPER-1355-ver9-1.patch, 
 ZOOKEEPER-1355-ver9.patch, ZOOOKEEPER-1355.patch, ZOOOKEEPER-1355-test.patch, 
 ZOOOKEEPER-1355-ver1.patch


 When the set of servers changes, we would like to update the server list 
 stored by clients without restarting the clients.
 Moreover, assuming that the number of clients per server is the same (in 
 expectation) in the old configuration (as guaranteed by the current list 
 shuffling for example), we would like to re-balance client connections across 
 the new set of servers in a way that a) the number of clients per server is 
 the same for all servers (in expectation) and b) there is no 
 excessive/unnecessary client migration.
 It is simple to achieve (a) without (b) - just re-shuffle the new list of 
 servers at every client. But this would create unnecessary migration, which 
 we'd like to avoid.
 We propose a simple probabilistic migration scheme that achieves (a) and (b) 
 - each client locally decides whether and where to migrate when the list of 
 servers changes. The attached document describes the scheme and shows an 
 evaluation of it in Zookeeper. We also implemented re-balancing through a 
 consistent-hashing scheme and show a comparison. We derived the probabilistic 
 migration rules from a simple formula that we can also provide, if someone's 
 interested in the proof.
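The attached documents carry the actual migration rules; purely as an illustration of how a per-client local decision can achieve (a) and (b), here is one plausible shape of the expansion case (servers added, none removed), under the assumption that a client stays with probability |old|/|new| and otherwise moves to a freshly added server:

```python
import random

def rebalance(current, old_servers, new_servers, rng=random):
    """Illustrative expansion-only rule: keep the current connection with
    probability len(old)/len(new); otherwise migrate to an added server.
    In expectation this equalizes load without unnecessary migration."""
    added = [s for s in new_servers if s not in old_servers]
    if not added:
        return current                      # nothing new to move to
    if rng.random() < len(old_servers) / len(new_servers):
        return current                      # stay put
    return rng.choice(added)                # migrate to a new server

# Going from 2 to 4 servers, roughly half the clients should move,
# and movers spread evenly over the added servers.
rng = random.Random(0)
moves = sum(
    rebalance("a", ["a", "b"], ["a", "b", "c", "d"], rng) != "a"
    for _ in range(10_000)
)
```

The removal case (the client's server disappears from the list) and the exact probabilities are in the PDF; this sketch only shows why an independent per-client coin flip avoids both global re-shuffling and excess migration.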



Failed: ZOOKEEPER-1355 PreCommit Build #1221

2012-10-12 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1355
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1221/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 607 lines...]
 [exec] 
 [exec] 
 [exec] 
 [exec] -1 overall.  Here are the results of testing the latest attachment 
 [exec]   
http://issues.apache.org/jira/secure/attachment/12548996/ZOOKEEPER-1355-12-Oct.patch
 [exec]   against trunk revision 1391526.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 34 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] -1 javac.  The patch appears to cause tar ant target to fail.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1221//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1221//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1221//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] Wzr34544w1 logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1568:
 exec returned: 2

Total time: 2 minutes 1 second
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1355
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (ZOOKEEPER-1355) Add zk.updateServerList(newServerList)

2012-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475510#comment-13475510
 ] 

Hadoop QA commented on ZOOKEEPER-1355:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12548996/ZOOKEEPER-1355-12-Oct.patch
  against trunk revision 1391526.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 34 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1221//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1221//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1221//console

This message is automatically generated.

 Add zk.updateServerList(newServerList) 
 ---

 Key: ZOOKEEPER-1355
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1355
 Project: ZooKeeper
  Issue Type: New Feature
  Components: c client, java client
Reporter: Alexander Shraer
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: loadbalancing-more-details.pdf, loadbalancing.pdf, 
 ZOOKEEPER-1355-10-Oct.patch, ZOOKEEPER-1355-12-Oct.patch, 
 ZOOKEEPER-1355-ver10-1.patch, ZOOKEEPER-1355-ver10-2.patch, 
 ZOOKEEPER-1355-ver10-3.patch, ZOOKEEPER-1355-ver10-4.patch, 
 ZOOKEEPER-1355-ver10-4.patch, ZOOKEEPER-1355-ver10.patch, 
 ZOOKEEPER-1355-ver11-1.patch, ZOOKEEPER-1355-ver11.patch, 
 ZOOKEEPER-1355-ver12-1.patch, ZOOKEEPER-1355-ver12-2.patch, 
 ZOOKEEPER-1355-ver12-4.patch, ZOOKEEPER-1355-ver12.patch, 
 ZOOKEEPER-1355-ver13.patch, ZOOKEEPER-1355-ver14.patch, 
 ZOOKEEPER-1355-ver2.patch, ZOOKEEPER=1355-ver3.patch, 
 ZOOKEEPER-1355-ver4.patch, ZOOKEEPER-1355-ver5.patch, 
 ZOOKEEPER-1355-ver6.patch, ZOOKEEPER-1355-ver7.patch, 
 ZOOKEEPER-1355-ver8.patch, ZOOKEEPER-1355-ver9-1.patch, 
 ZOOKEEPER-1355-ver9.patch, ZOOOKEEPER-1355.patch, ZOOOKEEPER-1355-test.patch, 
 ZOOOKEEPER-1355-ver1.patch


 When the set of servers changes, we would like to update the server list 
 stored by clients without restarting the clients.
 Moreover, assuming that the number of clients per server is the same (in 
 expectation) in the old configuration (as guaranteed by the current list 
 shuffling for example), we would like to re-balance client connections across 
 the new set of servers in a way that a) the number of clients per server is 
 the same for all servers (in expectation) and b) there is no 
 excessive/unnecessary client migration.
 It is simple to achieve (a) without (b) - just re-shuffle the new list of 
 servers at every client. But this would create unnecessary migration, which 
 we'd like to avoid.
 We propose a simple probabilistic migration scheme that achieves (a) and (b) 
 - each client locally decides whether and where to migrate when the list of 
 servers changes. The attached document describes the scheme and shows an 
 evaluation of it in Zookeeper. We also implemented re-balancing through a 
 consistent-hashing scheme and show a comparison. We derived the probabilistic 
 migration rules from a simple formula that we can also provide, if someone's 
 interested in the proof.



[jira] [Updated] (ZOOKEEPER-1355) Add zk.updateServerList(newServerList)

2012-10-12 Thread Marshall McMullen (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marshall McMullen updated ZOOKEEPER-1355:
-

Attachment: ZOOKEEPER-1355-13-Oct.patch

I had meant to remove the Zab test part from this patch, as Alex tells me it was 
already committed to trunk under another JIRA.

 Add zk.updateServerList(newServerList) 
 ---

 Key: ZOOKEEPER-1355
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1355
 Project: ZooKeeper
  Issue Type: New Feature
  Components: c client, java client
Reporter: Alexander Shraer
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: loadbalancing-more-details.pdf, loadbalancing.pdf, 
 ZOOKEEPER-1355-10-Oct.patch, ZOOKEEPER-1355-12-Oct.patch, 
 ZOOKEEPER-1355-13-Oct.patch, ZOOKEEPER-1355-ver10-1.patch, 
 ZOOKEEPER-1355-ver10-2.patch, ZOOKEEPER-1355-ver10-3.patch, 
 ZOOKEEPER-1355-ver10-4.patch, ZOOKEEPER-1355-ver10-4.patch, 
 ZOOKEEPER-1355-ver10.patch, ZOOKEEPER-1355-ver11-1.patch, 
 ZOOKEEPER-1355-ver11.patch, ZOOKEEPER-1355-ver12-1.patch, 
 ZOOKEEPER-1355-ver12-2.patch, ZOOKEEPER-1355-ver12-4.patch, 
 ZOOKEEPER-1355-ver12.patch, ZOOKEEPER-1355-ver13.patch, 
 ZOOKEEPER-1355-ver14.patch, ZOOKEEPER-1355-ver2.patch, 
 ZOOKEEPER=1355-ver3.patch, ZOOKEEPER-1355-ver4.patch, 
 ZOOKEEPER-1355-ver5.patch, ZOOKEEPER-1355-ver6.patch, 
 ZOOKEEPER-1355-ver7.patch, ZOOKEEPER-1355-ver8.patch, 
 ZOOKEEPER-1355-ver9-1.patch, ZOOKEEPER-1355-ver9.patch, 
 ZOOOKEEPER-1355.patch, ZOOOKEEPER-1355-test.patch, ZOOOKEEPER-1355-ver1.patch


 When the set of servers changes, we would like to update the server list 
 stored by clients without restarting the clients.
 Moreover, assuming that the number of clients per server is the same (in 
 expectation) in the old configuration (as guaranteed by the current list 
 shuffling for example), we would like to re-balance client connections across 
 the new set of servers in a way that a) the number of clients per server is 
 the same for all servers (in expectation) and b) there is no 
 excessive/unnecessary client migration.
 It is simple to achieve (a) without (b) - just re-shuffle the new list of 
 servers at every client. But this would create unnecessary migration, which 
 we'd like to avoid.
 We propose a simple probabilistic migration scheme that achieves (a) and (b) 
 - each client locally decides whether and where to migrate when the list of 
 servers changes. The attached document describes the scheme and shows an 
 evaluation of it in Zookeeper. We also implemented re-balancing through a 
 consistent-hashing scheme and show a comparison. We derived the probabilistic 
 migration rules from a simple formula that we can also provide, if someone's 
 interested in the proof.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Jenkins build is still unstable: bookkeeper-trunk #750

2012-10-12 Thread Apache Jenkins Server
See https://builds.apache.org/job/bookkeeper-trunk/750/



[jira] [Updated] (BOOKKEEPER-430) Remove manual bookie registration from overview

2012-10-12 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated BOOKKEEPER-430:


Attachment: BOOKKEEPER-430.patch

 Remove manual bookie registration from overview
 ---

 Key: BOOKKEEPER-430
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-430
 Project: Bookkeeper
  Issue Type: Improvement
Affects Versions: 4.1.0
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
 Attachments: BOOKKEEPER-430.patch


 The documentation suggests that a user needs to manually register a bookie, 
 which is not right.



[jira] [Updated] (BOOKKEEPER-430) Remove manual bookie registration from overview

2012-10-12 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated BOOKKEEPER-430:


Component/s: Documentation

 Remove manual bookie registration from overview
 ---

 Key: BOOKKEEPER-430
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-430
 Project: Bookkeeper
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 4.1.0
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
 Attachments: BOOKKEEPER-430.patch


 The documentation suggests that a user needs to manually register a bookie, 
 which is not right.



[jira] [Commented] (BOOKKEEPER-431) Duplicate definition of COOKIES_NODE

2012-10-12 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475113#comment-13475113
 ] 

Flavio Junqueira commented on BOOKKEEPER-431:
-

Yeah, I'd rather have the constants in one single place. COOKIE_NODE was 
introduced in BOOKKEEPER-263, but before that we had BOOKIE_COOKIE_PATH, so I'm 
not entirely sure what the history of duplication is. Ivan, Sijie, do you guys 
have any other insight to add here?

 Duplicate definition of COOKIES_NODE
 

 Key: BOOKKEEPER-431
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-431
 Project: Bookkeeper
  Issue Type: Improvement
Affects Versions: 4.1.0
Reporter: Flavio Junqueira
Assignee: Uma Maheswara Rao G
Priority: Minor
 Fix For: 4.2.0


 Is it necessary to have two definitions of COOKIES_NODE, one in Cookie.java 
 and one in AbstractZkLedgerManager?



[jira] [Created] (BOOKKEEPER-432) Improve performance of entry log range read per ledger entries

2012-10-12 Thread Yixue (Andrew) Zhu (JIRA)
Yixue (Andrew) Zhu created BOOKKEEPER-432:
-

 Summary: Improve performance of entry log range read per ledger 
entries 
 Key: BOOKKEEPER-432
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-432
 Project: Bookkeeper
  Issue Type: Improvement
  Components: bookkeeper-server
Affects Versions: 4.1.0
 Environment: Linux
Reporter: Yixue (Andrew) Zhu


We observed random I/O reads when some subscribers fall behind (on some 
topics), as delivery needs to scan the entry logs (through the ledger index), 
which are interleaved with ledger entries across all ledgers being served.

Essentially, the ledger index is a non-clustered index. It is not effective 
when a large number of ledger entries need to be served, since they tend to 
be scattered around due to interleaving.

Some possible improvements:
1. Change the ledger entries buffer to use a SkipList (or another suitable 
structure), sorted on (ledger, entry sequence). When the buffer is flushed, 
the entry log is written out in the already-sorted order. 

The active ledger index can point to the entries buffer (SkipList) and be 
fixed up with the entry-log position once the latter is persisted.

Alternatively, the ledger index can be rebuilt on demand. The entry log file 
tail can have an index attached (a light-weight B-tree, similar to 
Bigtable's). We need to track, per ledger, which log files contribute entries 
to it, so that the in-memory index can be rebuilt from the tails of the 
corresponding log files.

2. Use an affinity concept to make the ensembles of ledgers belonging to the 
same topic as identical as possible. This will help item 1 above be more 
effective.
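As a rough illustration of item 1, the sketch below keeps buffered entries ordered by (ledger, entry) so a flush appends them to the entry log in ledger order, making per-ledger reads sequential. A sorted list stands in for the SkipList, and `EntryBuffer`, `add`, and `flush` are hypothetical names, not BookKeeper APIs.

```python
import bisect

class EntryBuffer:
    """Write buffer keyed by (ledger_id, entry_id), kept sorted on insert."""

    def __init__(self):
        self._keys = []   # sorted list of (ledger_id, entry_id) keys
        self._data = {}   # key -> payload

    def add(self, ledger_id, entry_id, payload):
        key = (ledger_id, entry_id)
        if key not in self._data:
            bisect.insort(self._keys, key)  # keep keys sorted, O(log n) search
        self._data[key] = payload

    def flush(self, log):
        """Append all buffered entries to `log` (a list standing in for the
        entry log file) in sorted order, so entries of the same ledger land
        contiguously. Returns an index mapping key -> position in the log,
        i.e. the fix-up of the active ledger index mentioned in the issue."""
        index = {}
        for key in self._keys:
            index[key] = len(log)
            log.append((key, self._data[key]))
        self._keys.clear()
        self._data.clear()
        return index
```

With interleaved writes to many ledgers, each flush still produces runs of contiguous entries per ledger, which is what turns the lagging-subscriber scan from random I/O into mostly sequential reads.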
 



[jira] [Updated] (BOOKKEEPER-432) Improve performance of entry log range read per ledger entries

2012-10-12 Thread Yixue (Andrew) Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yixue (Andrew) Zhu updated BOOKKEEPER-432:
--

Affects Version/s: (was: 4.1.0)
   4.2.0

 Improve performance of entry log range read per ledger entries 
 ---

 Key: BOOKKEEPER-432
 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-432
 Project: Bookkeeper
  Issue Type: Improvement
  Components: bookkeeper-server
Affects Versions: 4.2.0
 Environment: Linux
Reporter: Yixue (Andrew) Zhu
  Labels: patch

 We observed random I/O reads when some subscribers fall behind (on some 
 topics), as delivery needs to scan the entry logs (through the ledger 
 index), which are interleaved with ledger entries across all ledgers being 
 served.
 Essentially, the ledger index is a non-clustered index. It is not effective 
 when a large number of ledger entries need to be served, since they tend to 
 be scattered around due to interleaving.
 Some possible improvements:
 1. Change the ledger entries buffer to use a SkipList (or another suitable 
 structure), sorted on (ledger, entry sequence). When the buffer is flushed, 
 the entry log is written out in the already-sorted order. 
 The active ledger index can point to the entries buffer (SkipList) and be 
 fixed up with the entry-log position once the latter is persisted.
 Alternatively, the ledger index can be rebuilt on demand. The entry log 
 file tail can have an index attached (a light-weight B-tree, similar to 
 Bigtable's). We need to track, per ledger, which log files contribute 
 entries to it, so that the in-memory index can be rebuilt from the tails 
 of the corresponding log files.
 2. Use an affinity concept to make the ensembles of ledgers belonging to 
 the same topic as identical as possible. This will help item 1 above be 
 more effective.
  
