ZooKeeper_branch34 - Build # 784 - Failure
See https://builds.apache.org/job/ZooKeeper_branch34/784/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 234528 lines...] [junit] 2013-11-04 08:52:47,630 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port] [junit] 2013-11-04 08:52:47,631 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree [junit] 2013-11-04 08:52:47,632 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] 2013-11-04 08:52:47,632 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port [junit] 2013-11-04 08:52:47,632 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2013-11-04 08:52:47,632 [myid:] - INFO [main:ClientBase@421] - STOPPING server [junit] 2013-11-04 08:52:47,632 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@224] - NIOServerCnxn factory exited run method [junit] 2013-11-04 08:52:47,633 [myid:] - INFO [main:ZooKeeperServer@441] - shutting down [junit] 2013-11-04 08:52:47,633 [myid:] - INFO [main:SessionTrackerImpl@225] - Shutting down [junit] 2013-11-04 08:52:47,633 [myid:] - INFO [main:PrepRequestProcessor@761] - Shutting down [junit] 2013-11-04 08:52:47,633 [myid:] - INFO [main:SyncRequestProcessor@209] - Shutting down [junit] 2013-11-04 08:52:47,633 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop! [junit] 2013-11-04 08:52:47,633 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@187] - SyncRequestProcessor exited! 
[junit] 2013-11-04 08:52:47,634 [myid:] - INFO [main:FinalRequestProcessor@415] - shutdown of request processor complete [junit] 2013-11-04 08:52:47,634 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-11-04 08:52:47,635 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[] [junit] 2013-11-04 08:52:47,636 [myid:] - INFO [main:ClientBase@414] - STARTING server [junit] 2013-11-04 08:52:47,636 [myid:] - INFO [main:ZooKeeperServer@162] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34/branch-3.4/build/test/tmp/test4289329406240412390.junit.dir/version-2 snapdir /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34/branch-3.4/build/test/tmp/test4289329406240412390.junit.dir/version-2 [junit] 2013-11-04 08:52:47,637 [myid:] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2013-11-04 08:52:47,640 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-11-04 08:52:47,641 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:55748 [junit] 2013-11-04 08:52:47,641 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@817] - Processing stat command from /127.0.0.1:55748 [junit] 2013-11-04 08:52:47,641 [myid:] - INFO [Thread-5:NIOServerCnxn$StatCommand@653] - Stat command output [junit] 2013-11-04 08:52:47,642 [myid:] - INFO [Thread-5:NIOServerCnxn@997] - Closed socket connection for client /127.0.0.1:55748 (no session established for client) [junit] 2013-11-04 08:52:47,642 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port] [junit] 2013-11-04 08:52:47,643 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree [junit] 2013-11-04 08:52:47,643 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] 2013-11-04 08:52:47,643 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port [junit] 2013-11-04 08:52:47,644 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2013-11-04 08:52:47,644 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota [junit] 2013-11-04 08:52:47,644 [myid:] - INFO [main:ClientBase@451] - tearDown starting [junit] 2013-11-04 08:52:47,719 [myid:] - INFO [main:ZooKeeper@684] - Session: 0x1422250748a closed [junit] 2013-11-04 08:52:47,719 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@509] - EventThread shut down [junit] 2013-11-04 08:52:47,719 [myid:] - INFO [main:ClientBase@421] - STOPPING server [junit] 2013-11-04 08:52:47,720 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@224] - NIOServerCnxn factory exited run method [junit] 2013-11-04 08:52:47,720
[jira] [Commented] (ZOOKEEPER-1805) Don't care value in ZooKeeper election breaks rolling upgrades
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812694#comment-13812694 ]

Germán Blanco commented on ZOOKEEPER-1805:
------------------------------------------

Thank you for considering my comments. Here are a few more ...

Personally, I would check that both Votes are out of the election before ignoring the fields in Vote.java. And I believe peerEpoch should be taken into account for normal votes:

{noformat}
+if ((state != ServerState.LOOKING) && (other.state != ServerState.LOOKING)) {
+    return (id == other.id);
+} else {
+    return (id == other.id &&
+            (zxid == other.zxid) &&
+            (electionEpoch == other.electionEpoch) &&
+            (peerEpoch == other.peerEpoch));
+}
{noformat}

I think that the previous test case (testJoinInconsistentEnsemble in FLETest.java) would look better if we now also change the peerEpoch:

{noformat}
+Vote newVote = new Vote(leaderSid, zxid+100, electionEpoch+100, peerEpoch+100, state);
{noformat}

In this way, this test case also verifies the new changes.

As indicated before, I would also remove the method updateElectionVote in QuorumPeer.java and the line under the comment for ZOOKEEPER-1732 in Leader.java and Learner.java. The value of peerEpoch will be ignored, so updating it looks like a waste of time to me.

Don't care value in ZooKeeper election breaks rolling upgrades

Key: ZOOKEEPER-1805
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1805
Project: ZooKeeper
Issue Type: Bug
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Blocker
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1805-b3.4.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch

This issue was originally reported in ZOOKEEPER-1732.

--
This message was sent by Atlassian JIRA (v6.1#6144)
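The comparison Germán Blanco proposes for Vote.java can be tried out in isolation. The following is only a minimal sketch under stated assumptions: `VoteSketch` and its local `ServerState` enum are hypothetical stand-ins written for this illustration, not the real ZooKeeper classes, and the `hashCode` choice is an assumption that the comment does not discuss.

```java
// Hypothetical stand-in for the server states referenced in the comment.
enum ServerState { LOOKING, FOLLOWING, LEADING, OBSERVING }

// Hypothetical, simplified stand-in for Vote.java; not the real class.
final class VoteSketch {
    final long id;
    final long zxid;
    final long electionEpoch;
    final long peerEpoch;
    final ServerState state;

    VoteSketch(long id, long zxid, long electionEpoch, long peerEpoch, ServerState state) {
        this.id = id;
        this.zxid = zxid;
        this.electionEpoch = electionEpoch;
        this.peerEpoch = peerEpoch;
        this.state = state;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof VoteSketch)) {
            return false;
        }
        VoteSketch other = (VoteSketch) o;
        // Both votes come from servers that are out of the election:
        // only the elected leader id is compared.
        if (state != ServerState.LOOKING && other.state != ServerState.LOOKING) {
            return id == other.id;
        }
        // At least one side is still electing: compare every field,
        // including peerEpoch.
        return id == other.id
                && zxid == other.zxid
                && electionEpoch == other.electionEpoch
                && peerEpoch == other.peerEpoch;
    }

    @Override
    public int hashCode() {
        // Assumption: hash on id only, since equal votes always share the id.
        return Long.hashCode(id);
    }
}
```

With this sketch, two votes for the same leader from servers in FOLLOWING and LEADING state compare equal even when zxid and both epochs differ, which is the situation the modified testJoinInconsistentEnsemble case is meant to exercise.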
ZooKeeper-trunk-solaris - Build # 721 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/721/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 217092 lines...] [junit] 2013-11-04 10:50:33,130 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method [junit] 2013-11-04 10:50:33,130 [myid:] - INFO [main:ZooKeeperServer@428] - shutting down [junit] 2013-11-04 10:50:33,131 [myid:] - INFO [main:SessionTrackerImpl@183] - Shutting down [junit] 2013-11-04 10:50:33,131 [myid:] - INFO [main:PrepRequestProcessor@972] - Shutting down [junit] 2013-11-04 10:50:33,131 [myid:] - INFO [main:SyncRequestProcessor@190] - Shutting down [junit] 2013-11-04 10:50:33,131 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@156] - PrepRequestProcessor exited loop! [junit] 2013-11-04 10:50:33,131 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@168] - SyncRequestProcessor exited! [junit] 2013-11-04 10:50:33,131 [myid:] - INFO [main:FinalRequestProcessor@442] - shutdown of request processor complete [junit] 2013-11-04 10:50:33,132 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-11-04 10:50:33,132 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[] [junit] 2013-11-04 10:50:33,133 [myid:] - INFO [main:ClientBase@414] - STARTING server [junit] 2013-11-04 10:50:33,134 [myid:] - INFO [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test4149150081156729558.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test4149150081156729558.junit.dir/version-2 [junit] 2013-11-04 10:50:33,134 [myid:] - INFO [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 16 worker threads, 
and 64 kB direct buffers. [junit] 2013-11-04 10:50:33,135 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2013-11-04 10:50:33,136 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test4149150081156729558.junit.dir/version-2/snapshot.b [junit] 2013-11-04 10:50:33,138 [myid:] - INFO [main:FileTxnSnapLog@297] - Snapshotting: 0xb to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test4149150081156729558.junit.dir/version-2/snapshot.b [junit] 2013-11-04 10:50:33,139 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-11-04 10:50:33,140 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:57739 [junit] 2013-11-04 10:50:33,141 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@828] - Processing stat command from /127.0.0.1:57739 [junit] 2013-11-04 10:50:33,141 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@677] - Stat command output [junit] 2013-11-04 10:50:33,141 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@999] - Closed socket connection for client /127.0.0.1:57739 (no session established for client) [junit] 2013-11-04 10:50:33,141 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port] [junit] 2013-11-04 10:50:33,142 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree [junit] 2013-11-04 10:50:33,143 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] 2013-11-04 10:50:33,143 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port [junit] 2013-11-04 10:50:33,143 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2013-11-04 10:50:33,143 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota [junit] 2013-11-04 10:50:33,143 [myid:] - INFO [main:ClientBase@451] - tearDown starting [junit] 2013-11-04 10:50:33,217 [myid:] - INFO [main:ZooKeeper@777] - Session: 0x14222bc4408 closed [junit] 2013-11-04 10:50:33,217 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down [junit] 2013-11-04 10:50:33,218 [myid:] - INFO [main:ClientBase@421] - STOPPING server [junit] 2013-11-04 10:50:33,218 [myid:] - INFO
[jira] [Commented] (ZOOKEEPER-1805) Don't care value in ZooKeeper election breaks rolling upgrades
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812776#comment-13812776 ]

Flavio Junqueira commented on ZOOKEEPER-1805:
---------------------------------------------

It still bothers me that we can't distinguish between old and new notification messages. I was thinking about introducing a format version field so that we can get around this problem and make the check in the way proposed instead of working around it. I have a patch mostly ready, but I'd like to know if this is a direction that is OK to pursue. If it is, I can add a sub-task here so that we can work this out separately, before fixing this issue.
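One way to picture the format version field Flavio Junqueira describes is a trailing integer that only new-format messages carry, so a receiver can tell the two formats apart by length. The sketch below is purely illustrative: the field order, the byte lengths, the version numbers, and the class name `NotificationFormatSketch` are all assumptions made for this example and are not taken from the patch mentioned in the comment.

```java
import java.nio.ByteBuffer;

// Hypothetical wire layout, assumed for illustration only:
// state (int), leader (long), zxid (long), electionEpoch (long),
// peerEpoch (long), and, in the new format, a trailing version (int).
final class NotificationFormatSketch {
    static final int LEGACY_LENGTH = Integer.BYTES + 4 * Long.BYTES;
    static final int VERSIONED_LENGTH = LEGACY_LENGTH + Integer.BYTES;
    static final int LEGACY_VERSION = 0;
    static final int CURRENT_VERSION = 1;

    // Builds a message in either the old or the new (versioned) format.
    static ByteBuffer encode(int state, long leader, long zxid,
                             long electionEpoch, long peerEpoch,
                             boolean withVersion) {
        ByteBuffer buf = ByteBuffer.allocate(withVersion ? VERSIONED_LENGTH : LEGACY_LENGTH);
        buf.putInt(state).putLong(leader).putLong(zxid)
           .putLong(electionEpoch).putLong(peerEpoch);
        if (withVersion) {
            buf.putInt(CURRENT_VERSION);
        }
        buf.flip();
        return buf;
    }

    // Reads the version without consuming the buffer; messages too short
    // to carry the field are treated as the legacy format.
    static int formatVersion(ByteBuffer message) {
        if (message.remaining() < VERSIONED_LENGTH) {
            return LEGACY_VERSION;
        }
        return message.getInt(message.position() + LEGACY_LENGTH);
    }
}
```

A receiver written this way could apply the stricter vote comparison only to messages whose version is at least `CURRENT_VERSION` and fall back to the tolerant comparison for legacy senders during a rolling upgrade.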
ZooKeeper-trunk - Build # 2110 - Failure
See https://builds.apache.org/job/ZooKeeper-trunk/2110/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 310215 lines...] [junit] 2013-11-04 12:14:39,783 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@156] - PrepRequestProcessor exited loop! [junit] 2013-11-04 12:14:39,784 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@168] - SyncRequestProcessor exited! [junit] 2013-11-04 12:14:39,784 [myid:] - INFO [main:FinalRequestProcessor@442] - shutdown of request processor complete [junit] 2013-11-04 12:14:39,784 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-11-04 12:14:39,785 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[] [junit] 2013-11-04 12:14:39,786 [myid:] - INFO [main:ClientBase@414] - STARTING server [junit] 2013-11-04 12:14:39,786 [myid:] - INFO [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test7233425283476512048.junit.dir/version-2 snapdir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test7233425283476512048.junit.dir/version-2 [junit] 2013-11-04 12:14:39,787 [myid:] - INFO [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 kB direct buffers. 
[junit] 2013-11-04 12:14:39,787 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2013-11-04 12:14:39,788 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test7233425283476512048.junit.dir/version-2/snapshot.b [junit] 2013-11-04 12:14:39,791 [myid:] - INFO [main:FileTxnSnapLog@297] - Snapshotting: 0xb to /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test7233425283476512048.junit.dir/version-2/snapshot.b [junit] 2013-11-04 12:14:39,793 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-11-04 12:14:39,793 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:37685 [junit] 2013-11-04 12:14:39,794 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@828] - Processing stat command from /127.0.0.1:37685 [junit] 2013-11-04 12:14:39,794 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@677] - Stat command output [junit] 2013-11-04 12:14:39,795 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@999] - Closed socket connection for client /127.0.0.1:37685 (no session established for client) [junit] 2013-11-04 12:14:39,795 [myid:] - INFO [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port] [junit] 2013-11-04 12:14:39,803 [myid:] - INFO [main:JMXEnv@105] - expect:InMemoryDataTree [junit] 2013-11-04 12:14:39,803 [myid:] - INFO [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] 2013-11-04 12:14:39,803 [myid:] - INFO [main:JMXEnv@105] - expect:StandaloneServer_port [junit] 2013-11-04 12:14:39,803 [myid:] - INFO [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2013-11-04 12:14:39,804 [myid:] - INFO 
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota [junit] 2013-11-04 12:14:39,804 [myid:] - INFO [main:ClientBase@451] - tearDown starting [junit] 2013-11-04 12:14:39,864 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down [junit] 2013-11-04 12:14:39,864 [myid:] - INFO [main:ZooKeeper@777] - Session: 0x14223094596 closed [junit] 2013-11-04 12:14:39,864 [myid:] - INFO [main:ClientBase@421] - STOPPING server [junit] 2013-11-04 12:14:39,864 [myid:] - INFO [ConnnectionExpirer:NIOServerCnxnFactory$ConnectionExpirerThread@583] - ConnnectionExpirerThread interrupted [junit] 2013-11-04 12:14:39,864 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-1:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method [junit] 2013-11-04 12:14:39,864 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@219] - accept thread exitted run method [junit] 2013-11-04 12:14:39,864 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method [junit] 2013-11-04 12:14:39,865 [myid:] - INFO [main:ZooKeeperServer@428] - shutting down [junit] 2013-11-04 12:14:39,865 [myid:] - INFO
[jira] [Updated] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raul Gutierrez Segales updated ZOOKEEPER-1807:
----------------------------------------------

Issue Type: Bug (was: New Feature)

Observers spam each other creating connections to the election addr

Key: ZOOKEEPER-1807
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
Project: ZooKeeper
Issue Type: Bug
Reporter: Raul Gutierrez Segales
Assignee: Raul Gutierrez Segales
Fix For: 3.5.0
Attachments: ZOOKEEPER-1807.patch

Hey [~shralex],

I noticed today that my Observers are spamming each other trying to open connections to the election port. I've got tons of these:

{noformat}
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 9
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 10
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 6
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 12
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 14
{noformat}

and so on, ad nauseam.

Now, looking around, I found this inside FastLeaderElection.java from when you committed ZOOKEEPER-107:

{noformat}
 private void sendNotifications() {
-for (QuorumServer server : self.getVotingView().values()) {
-long sid = server.id;
-
+for (long sid : self.getAllKnownServerIds()) {
+QuorumVerifier qv = self.getQuorumVerifier();
{noformat}

Is that really desired? I suspect that is what's causing Observers to try to connect to each other (as opposed to just connecting to participants). I'll give it a try now and let you know.

(Also, we use observer ids that are 0, and I saw some parts of the code that might not deal with that assumption - so it could be that too..).

--
This message was sent by Atlassian JIRA (v6.1#6144)
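The effect of the quoted diff can be illustrated without any ZooKeeper classes. The sketch below is hypothetical: `NotificationTargetsSketch` and its `targets` method are names invented for this example, and the two input sets merely stand in for what `getVotingView()` and `getAllKnownServerIds()` are described as covering in the report.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper showing how the set of notification targets grows
// when observers are included alongside voting participants.
final class NotificationTargetsSketch {
    static Set<Long> targets(Set<Long> participants, Set<Long> observers,
                             boolean includeObservers) {
        // Start from the voting members, as the loop did before the change.
        Set<Long> result = new TreeSet<>(participants);
        if (includeObservers) {
            // After the change every known server id is contacted,
            // so each observer also dials every other observer.
            result.addAll(observers);
        }
        return result;
    }
}
```

With three participants and two observers, the first variant yields three targets per notification round and the second yields five, which would be consistent with the "There is a connection already for server N" messages piling up on an observer.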
[jira] [Updated] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raul Gutierrez Segales updated ZOOKEEPER-1807:
----------------------------------------------

Attachment: (was: ZOOKEEPER-1807.patch)
Failed: ZOOKEEPER-9863 PreCommit Build #1739
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-9863
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1739/

### ## LAST 60 LINES OF THE CONSOLE ###
Started by remote host 127.0.0.1
Building remotely on hadoop9 in workspace /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build
Reverting /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk to depth infinity with ignoreExternals: false
Updating http://svn.apache.org/repos/asf/zookeeper/trunk at revision '2013-11-04T17:42:47.747 +'
At revision 1538690
no change for http://svn.apache.org/repos/asf/zookeeper/trunk since the previous build
No emails were triggered.
[PreCommit-ZOOKEEPER-Build] $ /bin/bash /tmp/hudson2500245945366574607.sh
/home/jenkins/tools/java/latest/bin/java
Buildfile: build.xml

check-for-findbugs:

findbugs.check:

forrest.check:

hudson-test-patch:
[exec]
[exec]
[exec] ==
[exec] ==
[exec] Testing patch for ZOOKEEPER-9863.
[exec] ==
[exec] ==
[exec]
[exec]
[exec] At revision 1538690.
[exec] ZOOKEEPER-9863 is not Patch Available. Exiting.
[exec]
[exec]
[exec] ==
[exec] ==
[exec] Finished build.
[exec] ==
[exec] ==
[exec]
[exec]

BUILD SUCCESSFUL
Total time: 2 seconds
Archiving artifacts
ERROR: No artifacts found that match the file pattern trunk/build/test/findbugs/newPatchFindbugsWarnings.html,trunk/patchprocess/*.txt,trunk/patchprocess/*Warnings.xml,trunk/build/test/test-cppunit/*.txt,trunk/build/tmp/zk.log. Configuration error?
ERROR: 'trunk/build/test/findbugs/newPatchFindbugsWarnings.html' doesn't match anything: 'trunk' exists but not 'trunk/build/test/findbugs/newPatchFindbugsWarnings.html'
Build step 'Archive the artifacts' changed build result to FAILURE
Recording test results
Description set: ZOOKEEPER-9863
Email was triggered for: Failure
Sending email for trigger: Failure

### ## FAILED TESTS (if any) ##
No tests ran.
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813071#comment-13813071 ]

Alexander Shraer commented on ZOOKEEPER-1807:
---------------------------------------------

There is probably not going to be any more of a loop than there is for participants. If you think this is not acceptable for observers, it would be sufficient to reply only when the sending server has a bigger config version (the one in QuorumVerifier) than the potential receiver. Otherwise there is no benefit for the receiver in terms of learning about new configs.
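Alexander Shraer's suggestion amounts to a single comparison before replying. The sketch below is a hypothetical illustration only: `ConfigReplySketch`, `shouldReply`, and the plain `long` version parameters are assumptions made for this example and say nothing about how the QuorumVerifier version is actually exposed.

```java
// Hypothetical predicate for the rule "reply only when the replying
// server knows a newer configuration than the server it would reply to".
final class ConfigReplySketch {
    static boolean shouldReply(long localConfigVersion, long peerConfigVersion) {
        // A reply with an equal or older config version teaches the
        // receiver nothing, so it is skipped.
        return localConfigVersion > peerConfigVersion;
    }
}
```

Under this rule two observers that already share the same config version would stay silent towards each other, while an observer that missed a reconfiguration would still hear about it from any peer that has the newer version.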
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813077#comment-13813077 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1807:
---------------------------------------------------

Thanks for the quick comment, Alex. Yeah, that sounds like it might be acceptable. Again, for huge deployments it might be a bit of a concern, since you'll be putting extra pressure on the cluster after, say, a big network partition. Thoughts?

Cc: [~thawan], [~fpj].
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813085#comment-13813085 ]

Hadoop QA commented on ZOOKEEPER-1807:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12611988/ZOOKEEPER-1807.patch
against trunk revision 1535491.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1740//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1740//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1740//console

This message is automatically generated.
[jira] [Assigned] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Germán Blanco reassigned ZOOKEEPER-1807:
---------------------------------------

Assignee: Germán Blanco (was: Raul Gutierrez Segales)
Failed: ZOOKEEPER-1807 PreCommit Build #1740
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1807 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1740/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 237685 lines...] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12611988/ZOOKEEPER-1807.patch [exec] against trunk revision 1535491. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] -1 core tests. The patch failed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1740//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1740//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1740//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] e450abe5dff16d08a430c5fe301fe1d6d2f1a583 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 2 Total time: 36 minutes 29 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-1807 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Updated] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raul Gutierrez Segales updated ZOOKEEPER-1807:
    Attachment: notifications-loop.png

Here's how notification traffic (on election port 3888 in my case) goes down with the patch (i.e.: without the notifications loop). It's pretty dramatic so I'd say this is definitely a blocker for 3.5.0.
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813108#comment-13813108 ]

Hadoop QA commented on ZOOKEEPER-1807:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12611999/notifications-loop.png against trunk revision 1535491.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1741//console
This message is automatically generated.
Failed: ZOOKEEPER-1807 PreCommit Build #1741
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1807 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1741/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 63 lines...] [exec] == [exec] Applying patch. [exec] == [exec] == [exec] [exec] [exec] /usr/bin/patch: Only garbage was found in the patch input. [exec] patch unexpectedly ends in middle of line [exec] PATCH APPLICATION FAILED [exec] [exec] [exec] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12611999/notifications-loop.png [exec] against trunk revision 1535491. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] -1 patch. The patch command could not apply the patch. [exec] [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1741//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 740c8b4185b4d78f429ca9b61a33f873119c071a logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 1 Total time: 1 minute 11 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-1807 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813111#comment-13813111 ]

Thawan Kooburat commented on ZOOKEEPER-1807:

I believe we have a quite different concern when using a large number of observers. In our internal deployment, we made a few hacks that essentially kill all observer-to-observer communication. Observers only observe the result of the election algorithm. We also add a random delay when an observer tries to reconnect, so that participants have a chance to synchronize with the leader and form the quorum before the observers take away the leader's bandwidth.

My understanding is that with our leader election algorithm you need to broadcast your vote whenever your current vote changes, so this will generate a lot of messages during the initial phase of the algorithm. Also, the N x N communication needed by leader election (LE) is not going to scale for large deployments.

I don't think promoting an observer to a participant is going to be a common case (it is only needed for DR purposes), so it would be acceptable to have an optional flag that disables that feature in order to reduce LE overhead with a large number of observers.
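The two points in Thawan Kooburat's comment (N x N election traffic and a random reconnect delay for observers) can be illustrated with a rough sketch. Nothing here comes from the internal deployment he describes: the class ObserverReconnectJitter, its method names and the maxJitterMs knob are hypothetical, and the link counts simply assume one directed connection per ordered pair of servers.

```java
import java.util.Random;

/** Hypothetical illustration; not taken from any ZooKeeper patch. */
public class ObserverReconnectJitter {

    /** Random delay in [0, maxJitterMs) milliseconds before an observer reconnects. */
    public static long reconnectDelayMs(Random rng, long maxJitterMs) {
        if (maxJitterMs <= 0) {
            return 0L;
        }
        return (long) (rng.nextDouble() * maxJitterMs);
    }

    /** Directed links if every server contacts every other server. */
    public static long fullMeshLinks(int participants, int observers) {
        long n = (long) participants + observers;
        return n * (n - 1);
    }

    /** Directed links if servers only ever contact participants. */
    public static long participantOnlyLinks(int participants, int observers) {
        long p = participants;
        // Participants contact the other participants; each observer contacts every participant.
        return p * (p - 1) + (long) observers * p;
    }

    public static void main(String[] args) {
        System.out.println("full mesh:         " + fullMeshLinks(5, 50));
        System.out.println("participants only: " + participantOnlyLinks(5, 50));
        System.out.println("sample delay (ms): " + reconnectDelayMs(new Random(), 5000L));
    }
}
```

With 5 participants and 50 observers the full mesh needs 2970 directed links against 270 when observers never contact each other, which is the scaling gap the comment points at.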
[jira] [Updated] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Shraer updated ZOOKEEPER-1807:
    Attachment: ZOOKEEPER-1807-alex.patch

Sorry for the confusion, everyone, but it seems that for reconfiguration purposes it's only important to send a notification (containing the new config) to a server if it's a participant in either the current or the next configuration. Only in that case may we need to convince it to adopt its new role as a participant and help form a quorum. So perhaps the attached patch could work. What do you think?
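The rule Alexander Shraer proposes in this message (notify a server only if it is a participant in the current or in the next configuration) amounts to a set union. The sketch below shows just that rule; it is not the contents of ZOOKEEPER-1807-alex.patch, and the class ReconfigNotificationTargets and the sample server ids are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of the proposed rule; not the attached patch. */
public class ReconfigNotificationTargets {

    /**
     * Servers to notify: participants of the current configuration plus
     * participants of the next configuration, if there is one.
     */
    public static Set<Long> targets(Set<Long> currentParticipants, Set<Long> nextParticipants) {
        Set<Long> out = new TreeSet<>(currentParticipants);
        if (nextParticipants != null) {
            out.addAll(nextParticipants);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up example: server 9 is an observer now and a participant in the next config.
        Set<Long> current = new TreeSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> next = new TreeSet<>(Arrays.asList(2L, 3L, 9L));
        System.out.println("notify: " + targets(current, next));
    }
}
```

Observers that stay observers in both configurations (say ids 10 and 13) never appear in the result, so they would receive no notifications under this rule, while server 9 is still reached so that it can take up its new role.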
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813287#comment-13813287 ]

Alexander Shraer commented on ZOOKEEPER-1807:

This part is described in Section 3.2 of the paper: https://www.usenix.org/system/files/conference/atc12/atc12-final74.pdf Of course the paper doesn't talk about FastLeaderElection and things like that. So the actual implementation needs to have comments, and it does have them in many places, here we should probably explain some more.
Failed: ZOOKEEPER-1807 PreCommit Build #1742
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1807 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1742/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 270048 lines...] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12612023/ZOOKEEPER-1807-alex.patch [exec] against trunk revision 1535491. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1742//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1742//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1742//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 703a0aaef1d8bc57a09a2890b34bef39bdde99b1 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 1 Total time: 33 minutes 40 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-1807 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813294#comment-13813294 ]

Hadoop QA commented on ZOOKEEPER-1807:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612023/ZOOKEEPER-1807-alex.patch against trunk revision 1535491.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1742//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1742//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1742//console
This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1804) Stat the realtime tps of zookeepr server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813386#comment-13813386 ]

Patrick Hunt commented on ZOOKEEPER-1804:

Hi [~nileader], in order for the patchbot to do its work you'll need to attach a patch generated with the --no-prefix option in git; see the guide: https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute

Stat the realtime tps of zookeepr server
Key: ZOOKEEPER-1804
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1804
Project: ZooKeeper
Issue Type: Improvement
Components: server
Reporter: Leader Ni
Assignee: Leader Ni
Attachments: ZOOKEEPER-1804-2.patch, ZOOKEEPER-1804.patch

At present, when we assess whether ZooKeeper can support a given business scenario, we always go by the number of subscribers or the number of clients. But sometimes many clients are connected to ZooKeeper and do nothing, while others run complex business logic. So we need statistics on the real-time TPS of the ZooKeeper server.

[-Solution---]
Solution1: If you only want to know the number of transactions processed in real time, you can use the patch ZOOKEEPER-1804.patch.
Solution2: If you also want to know how clients use ZooKeeper, and the real-time reads/writes per second of each ZooKeeper client, you can use the patch ZOOKEEPER-1804-2.patch. Set the Java property -Dserver_process_stats=true to enable the function.

Sample:
$echo rwps|nc localhost 2181
RealTime R/W Statistics:
getChildren2: 0.5994005994005994
createSession: 1.6983016983016983
closeSession: 0.999000999000999
setData: 110.18981018981019
setWatches: 129.17082917082917
getChildren: 68.83116883116884
delete: 19.980019980019982
create: 22.27772227772228
exists: 1806.2937062937062
getDate: 729.5704295704296

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1666) Avoid Reverse DNS lookup if the hostname in connection string is literal IP address.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813548#comment-13813548 ]

Camille Fournier commented on ZOOKEEPER-1666:

I checked the C code. I'm not a C expert, but it looks like we rely on getaddrinfo, which takes an IP address or a hostname, so I think we're good there. I will check this in.

Avoid Reverse DNS lookup if the hostname in connection string is literal IP address.
Key: ZOOKEEPER-1666
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1666
Project: ZooKeeper
Issue Type: Improvement
Components: java client
Reporter: George Cao
Assignee: George Cao
Labels: patch, test
Attachments: ZOOKEEPER-1666.patch, ZOOKEEPER-1666.patch

In our environment, if InetSocketAddress.getHostName() is called and the host name in the connection string is a literal IP address, the call triggers a reverse DNS lookup, which is very slow. In this situation the host name can simply be set to the IP address without causing any problem.

--
This message was sent by Atlassian JIRA (v6.1#6144)
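As an illustration of the behaviour described in this issue, the sketch below uses InetSocketAddress.getHostString(), which is available from Java 7 and returns the literal address text without attempting a reverse lookup. This is only one way to avoid the lookup and is not claimed to be what the attached ZOOKEEPER-1666 patches do; the class HostStringDemo and the address 10.0.0.1 are made up for the example.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

/** Hypothetical demo; not the ZOOKEEPER-1666 patch. Requires Java 7 or later. */
public class HostStringDemo {

    /** Host text for logging or naming purposes, without a reverse DNS lookup. */
    public static String hostWithoutReverseLookup(InetSocketAddress addr) {
        return addr.getHostString();
    }

    /** Builds a socket address from raw IPv4 octets, so no name service is consulted. */
    public static InetSocketAddress literal(int a, int b, int c, int d, int port) {
        try {
            InetAddress ip = InetAddress.getByAddress(
                    new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
            return new InetSocketAddress(ip, port);
        } catch (UnknownHostException e) {
            // Only thrown for an illegal address length, which cannot happen with four octets.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        InetSocketAddress addr = literal(10, 0, 0, 1, 2181);
        // addr.getHostName() may block on a reverse lookup here; getHostString() does not.
        System.out.println(hostWithoutReverseLookup(addr) + ":" + addr.getPort());
    }
}
```

The trade-off is that getHostString() returns the IP text rather than a resolved name, which matches the report's point that the host name can simply be the IP in this situation.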
[jira] [Commented] (ZOOKEEPER-1666) Avoid Reverse DNS lookup if the hostname in connection string is literal IP address.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813559#comment-13813559 ]

Camille Fournier commented on ZOOKEEPER-1666:

I got this into 3.5, but it requires a bit of a rewrite to work for 3.4.6. If we want to put it there, I need you to write it to fit, [~georgecao]. LMK, otherwise I will resolve this for just 3.5.
[jira] [Commented] (ZOOKEEPER-1652) zookeeper java client does a reverse dns lookup when connecting
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813565#comment-13813565 ]

Camille Fournier commented on ZOOKEEPER-1652:

I believe this addresses the same issue as ZOOKEEPER-1666, but will work for 3.4.6. So I'm going to use this for 3.4.6, but not for trunk, which was resolved with the other patch.

zookeeper java client does a reverse dns lookup when connecting
---
Key: ZOOKEEPER-1652
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1652
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.4.5
Reporter: Sean Bridges
Assignee: Sean Bridges
Priority: Critical
Attachments: ZOOKEEPER-1652.patch

When connecting to ZooKeeper, the client does a reverse DNS lookup on the hostname. In our environment, the reverse DNS lookup takes 5 seconds to fail, causing ZooKeeper clients to connect slowly. The reverse DNS lookup occurs in ClientCnxn, in the calls to addr.getHostName():
{code}
setName(getName().replaceAll("\\(.*\\)",
        "(" + addr.getHostName() + ":" + addr.getPort() + ")"));
try {
    zooKeeperSaslClient = new ZooKeeperSaslClient("zookeeper/" + addr.getHostName());
} catch (LoginException e) {
{code}

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (ZOOKEEPER-1652) zookeeper java client does a reverse dns lookup when connecting
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813570#comment-13813570 ]

Camille Fournier commented on ZOOKEEPER-1652:

Actually, when I apply this change with the test for ZOOKEEPER-1666, that test fails. [~georgecao], [~sbridges], want to take a look?
[jira] [Updated] (ZOOKEEPER-1666) Avoid Reverse DNS lookup if the hostname in connection string is literal IP address.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Camille Fournier updated ZOOKEEPER-1666:
    Attachment: ZOOKEEPER-1666-34.patch

3.4.6 patch
[jira] [Commented] (ZOOKEEPER-1652) zookeeper java client does a reverse dns lookup when connecting
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813578#comment-13813578 ]

Camille Fournier commented on ZOOKEEPER-1652:

Elected to make a patch for ZOOKEEPER-1666 that makes that work on 3.4.6. Please look there for that, and leave comments.
[jira] [Commented] (ZOOKEEPER-1666) Avoid Reverse DNS lookup if the hostname in connection string is literal IP address.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813637#comment-13813637 ] Hadoop QA commented on ZOOKEEPER-1666: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612093/ZOOKEEPER-1666-34.patch against trunk revision 1538853. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1743//console This message is automatically generated. Avoid Reverse DNS lookup if the hostname in connection string is literal IP address. Key: ZOOKEEPER-1666 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1666 Project: ZooKeeper Issue Type: Improvement Components: java client Reporter: George Cao Assignee: George Cao Labels: patch, test Attachments: ZOOKEEPER-1666-34.patch, ZOOKEEPER-1666.patch, ZOOKEEPER-1666.patch In our environment, if InetSocketAddress.getHostName() is called and the host name in the connection string is a literal IP address, the call triggers a reverse DNS lookup, which is very slow. In this situation, the host name can simply be set to the IP address without causing any problem.
Failed: ZOOKEEPER-1666 PreCommit Build #1743
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1666 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1743/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 68 lines...] [exec] == [exec] [exec] [exec] patching file src/java/test/org/apache/zookeeper/test/StaticHostProviderTest.java [exec] Hunk #1 FAILED at 35. [exec] Hunk #2 succeeded at 301 with fuzz 1 (offset 209 lines). [exec] 1 out of 2 hunks FAILED -- saving rejects to file src/java/test/org/apache/zookeeper/test/StaticHostProviderTest.java.rej [exec] patching file src/java/main/org/apache/zookeeper/client/StaticHostProvider.java [exec] Hunk #1 FAILED at 56. [exec] 1 out of 1 hunk FAILED -- saving rejects to file src/java/main/org/apache/zookeeper/client/StaticHostProvider.java.rej [exec] PATCH APPLICATION FAILED [exec] [exec] [exec] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12612093/ZOOKEEPER-1666-34.patch [exec] against trunk revision 1538853. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] -1 patch. The patch command could not apply the patch. [exec] [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1743//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] feef8621728acce1af4f17c9cd65d22bf710ec7c logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1623: exec returned: 1 Total time: 1 minute 26 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-1666 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## No tests ran.
ZooKeeper_branch33_solaris - Build # 698 - Still Failing
See https://builds.apache.org/job/ZooKeeper_branch33_solaris/698/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 98610 lines...] [junit] 2013-11-05 07:09:01,635 - INFO [main:ZooKeeperServer@154] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test5859429153162977043.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test5859429153162977043.junit.dir/version-2 [junit] 2013-11-05 07:09:01,636 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2013-11-05 07:09:01,638 - INFO [main:FileSnap@82] - Reading snapshot /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test5859429153162977043.junit.dir/version-2/snapshot.0 [junit] 2013-11-05 07:09:01,641 - INFO [main:FileTxnSnapLog@256] - Snapshotting: b [junit] 2013-11-05 07:09:01,644 - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-11-05 07:09:01,645 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:64395 [junit] 2013-11-05 07:09:01,645 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@1237] - Processing stat command from /127.0.0.1:64395 [junit] 2013-11-05 07:09:01,646 - INFO [Thread-4:NIOServerCnxn$StatCommand@1153] - Stat command output [junit] ensureOnly:[InMemoryDataTree, StandaloneServer_port] [junit] 2013-11-05 07:09:01,647 - INFO [Thread-4:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:64395 (no session established for client) [junit] expect:InMemoryDataTree [junit] found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] expect:StandaloneServer_port [junit] found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2013-11-05 07:09:01,649 - INFO [main:ClientBase@408] - STOPPING server [junit] 2013-11-05 07:09:01,650 - INFO [ProcessThread:-1:PrepRequestProcessor@128] - PrepRequestProcessor exited loop! [junit] 2013-11-05 07:09:01,650 - INFO [SyncThread:0:SyncRequestProcessor@151] - SyncRequestProcessor exited! [junit] 2013-11-05 07:09:01,651 - INFO [main:FinalRequestProcessor@370] - shutdown of request processor complete [junit] 2013-11-05 07:09:01,652 - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] ensureOnly:[] [junit] 2013-11-05 07:09:01,654 - INFO [main:ClientBase@401] - STARTING server [junit] 2013-11-05 07:09:01,654 - INFO [main:ZooKeeperServer@154] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test5859429153162977043.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test5859429153162977043.junit.dir/version-2 [junit] 2013-11-05 07:09:01,655 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2013-11-05 07:09:01,656 - INFO [main:FileSnap@82] - Reading snapshot /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test5859429153162977043.junit.dir/version-2/snapshot.b [junit] 2013-11-05 07:09:01,659 - INFO [main:FileTxnSnapLog@256] - Snapshotting: b [junit] 2013-11-05 07:09:01,661 - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2013-11-05 07:09:01,662 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:64397 [junit] 2013-11-05 07:09:01,662 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@1237] - Processing stat command from 
/127.0.0.1:64397 [junit] 2013-11-05 07:09:01,663 - INFO [Thread-5:NIOServerCnxn$StatCommand@1153] - Stat command output [junit] 2013-11-05 07:09:01,663 - INFO [Thread-5:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:64397 (no session established for client) [junit] ensureOnly:[InMemoryDataTree, StandaloneServer_port] [junit] expect:InMemoryDataTree [junit] found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree [junit] expect:StandaloneServer_port [junit] found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1 [junit] 2013-11-05 07:09:01,665 - INFO
[jira] [Commented] (ZOOKEEPER-1805) Don't care value in ZooKeeper election breaks rolling upgrades
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813720#comment-13813720 ] Germán Blanco commented on ZOOKEEPER-1805: -- I understand that you want to add a version field to notifications in order to know which come from a server that ignores zxid and electionEpoch for an established ensemble and which come from a server without this change, correct? Once that is done, it would be possible to make the correct comparison for the epoch when joining an ensemble with a mixture of updated and not-updated servers. That sounds good to me. Having a version field will help in the future if any other change is required in notifications for fast leader election. For this problem, it means that the comparison between votes only needs to be different for the special case in which there is a mixture of servers, and it doesn't need to be modified at all for the rest of the cases, which seems to be a safer approach. Don't care value in ZooKeeper election breaks rolling upgrades Key: ZOOKEEPER-1805 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1805 Project: ZooKeeper Issue Type: Bug Reporter: Flavio Junqueira Assignee: Flavio Junqueira Priority: Blocker Fix For: 3.4.6, 3.5.0 Attachments: ZOOKEEPER-1805-b3.4.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch This is an issue that was originally reported in ZOOKEEPER-1732.
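To make the "only the mixed case differs" point in the comment above concrete, here is a rough Java sketch. It is not the code from any of the attached patches. The class name (VoteSketch), the field names, the convention that version 0 means "server without the change", and the choice to fall back to comparing only the proposed leader id in the mixed case are all assumptions made for illustration.

```java
final class VoteSketch {
    final int version;     // assumption: 0 = sender predates the change
    final long leaderId;   // server id this vote proposes as leader
    final long peerEpoch;  // epoch carried in the notification

    VoteSketch(int version, long leaderId, long peerEpoch) {
        this.version = version;
        this.leaderId = leaderId;
        this.peerEpoch = peerEpoch;
    }

    // Comparison for votes from servers already in an established ensemble.
    boolean sameChoiceAs(VoteSketch other) {
        boolean mixed = (version > 0) != (other.version > 0);
        if (mixed) {
            // Special case: one side is updated and the other is not, so
            // (by assumption) the epochs are not comparable.
            return leaderId == other.leaderId;
        }
        // All other cases keep the usual comparison unchanged.
        return leaderId == other.leaderId && peerEpoch == other.peerEpoch;
    }
}
```

The design point is the one made in the comment: the relaxed comparison is confined to the branch that detects a mixture of versions, so ensembles that are entirely old or entirely new behave exactly as before.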
[jira] [Assigned] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Shraer reassigned ZOOKEEPER-1807: --- Assignee: Alexander Shraer (was: Germán Blanco) Observers spam each other creating connections to the election addr --- Key: ZOOKEEPER-1807 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807 Project: ZooKeeper Issue Type: Bug Reporter: Raul Gutierrez Segales Assignee: Alexander Shraer Fix For: 3.5.0 Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807.patch, notifications-loop.png Hey [~shralex], I noticed today that my Observers are spamming each other trying to open connections to the election port. I've got tons of these:
{noformat}
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 9
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 10
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 6
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 12
2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 14
{noformat}
and so on and so on, ad nauseam. Now, looking around, I found this inside FastLeaderElection.java from when you committed ZOOKEEPER-107:
{noformat}
 private void sendNotifications() {
-for (QuorumServer server : self.getVotingView().values()) {
-long sid = server.id;
-
+for (long sid : self.getAllKnownServerIds()) {
+QuorumVerifier qv = self.getQuorumVerifier();
{noformat}
Is that really desired? I suspect that is what's causing Observers to try to connect to each other (as opposed to just connecting to participants). I'll give it a try now and let you know. (Also, we use observer ids that are 0, and I saw some parts of the code that might not deal with that assumption - so it could be that too..).
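A small, self-contained Java sketch of the distinction raised above: iterating over every known server id versus only the voting participants. It does not use the real QuorumPeer or FastLeaderElection classes. The names (NotificationTargetsSketch, Role, votingTargets) are hypothetical, and the server ids in the usage note are only borrowed from the log excerpt for flavour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class NotificationTargetsSketch {
    enum Role { PARTICIPANT, OBSERVER }

    // Returns, in ascending id order, only the servers that vote. Sending
    // election notifications to this list instead of to all known ids would
    // keep observers from being asked to connect to one another.
    static List<Long> votingTargets(Map<Long, Role> allKnownServers) {
        List<Long> targets = new ArrayList<Long>();
        for (Map.Entry<Long, Role> e : new TreeMap<Long, Role>(allKnownServers).entrySet()) {
            if (e.getValue() == Role.PARTICIPANT) {
                targets.add(e.getKey());
            }
        }
        return targets;
    }
}
```

With servers 6 and 9 as participants and 0 and 13 as observers, votingTargets returns [6, 9]; an observer with id 0 is filtered by role rather than by its id value.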