ZooKeeper_branch33_solaris - Build # 632 - Failure

2013-08-29 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch33_solaris/632/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 106127 lines...]
[junit] 2013-08-29 07:02:34,305 - INFO  [main:ZooKeeperServer@154] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test1251651383782493561.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test1251651383782493561.junit.dir/version-2
[junit] 2013-08-29 07:02:34,306 - INFO  [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-08-29 07:02:34,307 - INFO  [main:FileSnap@82] - Reading snapshot /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test1251651383782493561.junit.dir/version-2/snapshot.0
[junit] 2013-08-29 07:02:34,311 - INFO  [main:FileTxnSnapLog@256] - Snapshotting: b
[junit] 2013-08-29 07:02:34,314 - INFO  [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-08-29 07:02:34,315 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:56802
[junit] 2013-08-29 07:02:34,316 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@1237] - Processing stat command from /127.0.0.1:56802
[junit] 2013-08-29 07:02:34,317 - INFO  [Thread-4:NIOServerCnxn$StatCommand@1153] - Stat command output
[junit] 2013-08-29 07:02:34,317 - INFO  [Thread-4:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:56802 (no session established for client)
[junit] ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] expect:InMemoryDataTree
[junit] found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] expect:StandaloneServer_port
[junit] found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-08-29 07:02:34,320 - INFO  [main:ClientBase@408] - STOPPING server
[junit] 2013-08-29 07:02:34,321 - INFO  [ProcessThread:-1:PrepRequestProcessor@128] - PrepRequestProcessor exited loop!
[junit] 2013-08-29 07:02:34,321 - INFO  [SyncThread:0:SyncRequestProcessor@151] - SyncRequestProcessor exited!
[junit] 2013-08-29 07:02:34,322 - INFO  [main:FinalRequestProcessor@370] - shutdown of request processor complete
[junit] 2013-08-29 07:02:34,323 - INFO  [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] ensureOnly:[]
[junit] 2013-08-29 07:02:34,325 - INFO  [main:ClientBase@401] - STARTING server
[junit] 2013-08-29 07:02:34,326 - INFO  [main:ZooKeeperServer@154] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test1251651383782493561.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test1251651383782493561.junit.dir/version-2
[junit] 2013-08-29 07:02:34,327 - INFO  [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-08-29 07:02:34,328 - INFO  [main:FileSnap@82] - Reading snapshot /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build/test/tmp/test1251651383782493561.junit.dir/version-2/snapshot.b
[junit] 2013-08-29 07:02:34,331 - INFO  [main:FileTxnSnapLog@256] - Snapshotting: b
[junit] 2013-08-29 07:02:34,333 - INFO  [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-08-29 07:02:34,334 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:56804
[junit] 2013-08-29 07:02:34,334 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@1237] - Processing stat command from /127.0.0.1:56804
[junit] 2013-08-29 07:02:34,335 - INFO  [Thread-5:NIOServerCnxn$StatCommand@1153] - Stat command output
[junit] 2013-08-29 07:02:34,336 - INFO  [Thread-5:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:56804 (no session established for client)
[junit] ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] expect:InMemoryDataTree
[junit] found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] expect:StandaloneServer_port
[junit] found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-08-29 07:02:34,338 - INFO  

ZooKeeper-trunk-solaris - Build # 656 - Still Failing

2013-08-29 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/656/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 192443 lines...]
[junit] 2013-08-29 09:05:04,995 [myid:] - INFO  [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method
[junit] 2013-08-29 09:05:04,995 [myid:] - INFO  [main:ZooKeeperServer@422] - shutting down
[junit] 2013-08-29 09:05:04,996 [myid:] - INFO  [main:SessionTrackerImpl@180] - Shutting down
[junit] 2013-08-29 09:05:04,996 [myid:] - INFO  [main:PrepRequestProcessor@929] - Shutting down
[junit] 2013-08-29 09:05:04,996 [myid:] - INFO  [main:SyncRequestProcessor@175] - Shutting down
[junit] 2013-08-29 09:05:04,996 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@156] - PrepRequestProcessor exited loop!
[junit] 2013-08-29 09:05:04,996 [myid:] - INFO  [SyncThread:0:SyncRequestProcessor@155] - SyncRequestProcessor exited!
[junit] 2013-08-29 09:05:04,996 [myid:] - INFO  [main:FinalRequestProcessor@427] - shutdown of request processor complete
[junit] 2013-08-29 09:05:04,997 [myid:] - INFO  [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-08-29 09:05:04,998 [myid:] - INFO  [main:JMXEnv@133] - ensureOnly:[]
[junit] 2013-08-29 09:05:04,999 [myid:] - INFO  [main:ClientBase@414] - STARTING server
[junit] 2013-08-29 09:05:04,999 [myid:] - INFO  [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test7376087128879398642.junit.dir/version-2 snapdir /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test7376087128879398642.junit.dir/version-2
[junit] 2013-08-29 09:05:05,000 [myid:] - INFO  [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 kB direct buffers.
[junit] 2013-08-29 09:05:05,000 [myid:] - INFO  [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2013-08-29 09:05:05,001 [myid:] - INFO  [main:FileSnap@83] - Reading snapshot /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test7376087128879398642.junit.dir/version-2/snapshot.b
[junit] 2013-08-29 09:05:05,003 [myid:] - INFO  [main:FileTxnSnapLog@297] - Snapshotting: 0xb to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris/trunk/build/test/tmp/test7376087128879398642.junit.dir/version-2/snapshot.b
[junit] 2013-08-29 09:05:05,005 [myid:] - INFO  [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2013-08-29 09:05:05,005 [myid:] - INFO  [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:42319
[junit] 2013-08-29 09:05:05,006 [myid:] - INFO  [NIOWorkerThread-1:NIOServerCnxn@829] - Processing stat command from /127.0.0.1:42319
[junit] 2013-08-29 09:05:05,006 [myid:] - INFO  [NIOWorkerThread-1:NIOServerCnxn$StatCommand@678] - Stat command output
[junit] 2013-08-29 09:05:05,007 [myid:] - INFO  [NIOWorkerThread-1:NIOServerCnxn@1000] - Closed socket connection for client /127.0.0.1:42319 (no session established for client)
[junit] 2013-08-29 09:05:05,007 [myid:] - INFO  [main:JMXEnv@133] - ensureOnly:[InMemoryDataTree, StandaloneServer_port]
[junit] 2013-08-29 09:05:05,008 [myid:] - INFO  [main:JMXEnv@105] - expect:InMemoryDataTree
[junit] 2013-08-29 09:05:05,008 [myid:] - INFO  [main:JMXEnv@108] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
[junit] 2013-08-29 09:05:05,009 [myid:] - INFO  [main:JMXEnv@105] - expect:StandaloneServer_port
[junit] 2013-08-29 09:05:05,009 [myid:] - INFO  [main:JMXEnv@108] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
[junit] 2013-08-29 09:05:05,009 [myid:] - INFO  [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD testQuota
[junit] 2013-08-29 09:05:05,009 [myid:] - INFO  [main:ClientBase@451] - tearDown starting
[junit] 2013-08-29 09:05:05,087 [myid:] - INFO  [main:ZooKeeper@777] - Session: 0x140c951a04c closed
[junit] 2013-08-29 09:05:05,088 [myid:] - INFO  [main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down
[junit] 2013-08-29 09:05:05,089 [myid:] - INFO  [main:ClientBase@421] - STOPPING server
[junit] 2013-08-29 09:05:05,090 [myid:] - INFO  

[jira] [Commented] (ZOOKEEPER-1670) zookeeper should set a default value for SERVER_JVMFLAGS and CLIENT_JVMFLAGS so that memory usage is controlled

2013-08-29 Thread Flavio Junqueira (JIRA)

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753559#comment-13753559 ]

Flavio Junqueira commented on ZOOKEEPER-1670:
-

+1, looks good to me. I'll create a JIRA for the JVM flags in the Windows start 
scripts.

 zookeeper should set a default value for SERVER_JVMFLAGS and CLIENT_JVMFLAGS 
 so that memory usage is controlled
 ---

 Key: ZOOKEEPER-1670
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1670
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.5
Reporter: Arpit Gupta
Assignee: Arpit Gupta
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1670.patch, ZOOKEEPER-1670.patch, 
 ZOOKEEPER-1670.patch, ZOOKEEPER-1670.patch, ZOOKEEPER-1670.patch


 We noticed this with JDK 1.6: if no heap size is set, the process takes up to 
 1/4 of the memory available on the machine.
 More info: 
 http://stackoverflow.com/questions/3428251/is-there-a-default-xmx-setting-for-java-1-5
 You can run the following command to see the defaults for your 
 machine:
 {code}
 java -XX:+PrintFlagsFinal -version 2>&1 | grep -i -E 'heapsize|permsize|version'
 {code}
 On two different classes of machines, we noticed that this was 1/4 of the 
 total memory on the machine.
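To see from inside a running JVM which heap limits it actually picked (for example, to compare a server started without SERVER_JVMFLAGS against one started with an explicit -Xmx), a minimal Java sketch like the following can be used. The class name JvmHeapInfo is hypothetical; Runtime.maxMemory() and the MemoryMXBean calls are standard JDK APIs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports the heap limits the running JVM actually chose.
class JvmHeapInfo {
    /** Maximum heap the JVM will attempt to use, in bytes (Long.MAX_VALUE if unlimited). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Initial heap size requested at startup, in bytes (-1 if undefined). */
    static long initialHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getInit();
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        System.out.println("max heap (MB):     " + maxHeapBytes() / mb);
        System.out.println("initial heap (MB): " + initialHeapBytes() / mb);
    }
}
```

Running it once with no heap flags and once with, say, -Xmx512m should show the default ceiling versus roughly the configured one; the exact numbers depend on the JVM version and the machine.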

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1655) Make jline dependency optional in maven pom

2013-08-29 Thread Flavio Junqueira (JIRA)

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753565#comment-13753565 ]

Flavio Junqueira commented on ZOOKEEPER-1655:
-

I'm trying to understand the implications of this patch. One of the concerns in 
HADOOP-9342 was the version of jline. Should we bump up the version here? 
Otherwise it sounds good to make it optional. 

 Make jline dependency optional in maven pom
 ---

 Key: ZOOKEEPER-1655
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1655
 Project: ZooKeeper
  Issue Type: Bug
  Components: build
Affects Versions: 3.4.2
Reporter: Thomas Weise
Assignee: Thomas Weise
 Fix For: 3.5.0, 3.4.6

 Attachments: ZOOKEEPER-1655.patch


 Old JLine version used in ZK CLI should not be pulled into downstream 
 projects. 
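Marking JLine as optional in the pom only helps if the code that uses it tolerates its absence at runtime. A common pattern is to probe for the class reflectively and fall back when it is missing. The sketch below shows that pattern, assuming jline.ConsoleReader (the JLine 1.x console class) is the class to look for; the class and method names are hypothetical and this is not ZooKeeper's CLI code.

```java
// Sketch: detect an optional library at runtime and degrade gracefully without it.
class OptionalDependencyProbe {
    /** Returns true if the named class can be loaded, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OptionalDependencyProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or present but unusable (e.g. a broken or incompatible jar).
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: "jline.ConsoleReader" is the JLine 1.x console class.
        if (isOnClasspath("jline.ConsoleReader")) {
            System.out.println("JLine found: line editing and history could be enabled");
        } else {
            System.out.println("JLine not found: falling back to plain standard input");
        }
    }
}
```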



[jira] [Updated] (ZOOKEEPER-1240) Compiler issue with redhat linux

2013-08-29 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-1240:


Fix Version/s: (was: 3.4.6)
   3.5.0

 Compiler issue with redhat linux
 

 Key: ZOOKEEPER-1240
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1240
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.3
 Environment: Linux phy 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:19 EDT 
 2007 x86_64 x86_64 x86_64 GNU/Linux
 gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)
Reporter: Peng Futian
Assignee: Peng Futian
Priority: Minor
  Labels: patch
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1240.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 When I compile the ZooKeeper C client in my project, I get the following errors:
 ../../../include/zookeeper/recordio.h:70: error: expected unqualified-id before '__extension__'
 ../../../include/zookeeper/recordio.h:70: error: expected `)' before '__extension__'
 ../../../include/zookeeper/recordio.h:70: error: expected unqualified-id before ')' token



[jira] [Commented] (ZOOKEEPER-1240) Compiler issue with redhat linux

2013-08-29 Thread Flavio Junqueira (JIRA)

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753588#comment-13753588 ]

Flavio Junqueira commented on ZOOKEEPER-1240:
-

Since I got no feedback, I'm moving this one to 3.5.0.

 Compiler issue with redhat linux
 

 Key: ZOOKEEPER-1240
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1240
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.3
 Environment: Linux phy 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:19 EDT 
 2007 x86_64 x86_64 x86_64 GNU/Linux
 gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)
Reporter: Peng Futian
Assignee: Peng Futian
Priority: Minor
  Labels: patch
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1240.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 When I compile the ZooKeeper C client in my project, I get the following errors:
 ../../../include/zookeeper/recordio.h:70: error: expected unqualified-id before '__extension__'
 ../../../include/zookeeper/recordio.h:70: error: expected `)' before '__extension__'
 ../../../include/zookeeper/recordio.h:70: error: expected unqualified-id before ')' token



[jira] [Commented] (ZOOKEEPER-1711) ZooKeeper server binds to all ip addresses for leader election and broadcast

2013-08-29 Thread Flavio Junqueira (JIRA)

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753597#comment-13753597 ]

Flavio Junqueira commented on ZOOKEEPER-1711:
-

I'd like to get a resolution on this JIRA. If this is a duplicate, then we should 
resolve it as such and mark ZOOKEEPER-1096 for 3.4.6.

 ZooKeeper server binds to all ip addresses for leader election and broadcast
 

 Key: ZOOKEEPER-1711
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1711
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
 Environment: Any
Reporter: Germán Blanco
Priority: Minor
 Fix For: 3.4.6

   Original Estimate: 72h
  Remaining Estimate: 72h

 Unlike the current ZooKeeper version in trunk (intended for release as 3.5.0), 
 ZooKeeper server version 3.4.5 binds to all IP addresses on the port 
 specified for election. It only makes sense to bind to the IP address given 
 in the configuration file, since that is where the other servers will 
 connect. Listening on other IP addresses could have bad security implications.
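The difference between listening on every interface and listening only on the configured address can be shown with plain java.net sockets. This is a minimal sketch, not ZooKeeper's election code; the class name ElectionListener and its methods are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class ElectionListener {
    /** Binds to every local interface (the wildcard behavior described for 3.4.5). */
    static ServerSocket bindWildcard(int port) throws IOException {
        ServerSocket ss = new ServerSocket();
        ss.setReuseAddress(true);
        ss.bind(new InetSocketAddress(port)); // wildcard address: 0.0.0.0 or ::
        return ss;
    }

    /** Binds only to the address this server was configured with. */
    static ServerSocket bindConfigured(String host, int port) throws IOException {
        ServerSocket ss = new ServerSocket();
        ss.setReuseAddress(true);
        ss.bind(new InetSocketAddress(InetAddress.getByName(host), port));
        return ss;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 lets the OS pick a free port for the demonstration.
        try (ServerSocket ss = bindConfigured("127.0.0.1", 0)) {
            System.out.println("listening on " + ss.getLocalSocketAddress());
        }
    }
}
```

With the wildcard bind the socket reports 0.0.0.0 (or ::) as its local address and accepts connections arriving on any interface; with the configured bind, only connections to that one address are accepted.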



[jira] [Commented] (ZOOKEEPER-1167) C api lacks synchronous version of sync() call.

2013-08-29 Thread Flavio Junqueira (JIRA)

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753598#comment-13753598 ]

Flavio Junqueira commented on ZOOKEEPER-1167:
-

This is not exactly a bug fix, so I'm moving it to 3.5.0.

 C api lacks synchronous version of sync() call.
 ---

 Key: ZOOKEEPER-1167
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1167
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.3, 3.4.3, 3.5.0
Reporter: Nicholas Harteau
Assignee: Marshall McMullen
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1167.patch


 Reading through the source, the C API implements zoo_async(), which is the 
 ZooKeeper sync() method in the multithreaded/asynchronous C API. Nothing 
 equivalent is implemented in the non-multithreaded API.
 I'm not sure whether this was an oversight or intentional, but it means that the 
 non-multithreaded API can't guarantee consistent client views on critical 
 reads.
 The zkperl bindings depend on the synchronous, non-multithreaded API, so they 
 also can't call sync() currently.
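A blocking sync() can in principle be layered on top of an asynchronous one by issuing the async call and waiting for its completion callback. The Java sketch below shows that pattern with a CountDownLatch; the VoidCallback and AsyncSync interfaces are hypothetical stand-ins and do not reproduce the C API's zoo_async() signature.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

class SyncOverAsync {
    /** Hypothetical completion callback: rc is a result code, 0 meaning success. */
    interface VoidCallback {
        void processResult(int rc, String path);
    }

    /** Hypothetical asynchronous operation standing in for an async sync(). */
    interface AsyncSync {
        void sync(String path, VoidCallback cb);
    }

    /** Issues the async call, then blocks until its callback fires or the timeout expires. */
    static int syncBlocking(AsyncSync client, String path, long timeoutMs)
            throws InterruptedException, TimeoutException {
        final CountDownLatch done = new CountDownLatch(1);
        final AtomicInteger result = new AtomicInteger();
        client.sync(path, (rc, p) -> {
            result.set(rc);
            done.countDown();
        });
        if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("sync(" + path + ") did not complete");
        }
        return result.get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Fake async client: delivers the callback on a different thread.
        AsyncSync fake = (path, cb) -> pool.execute(() -> cb.processResult(0, path));
        System.out.println("rc = " + syncBlocking(fake, "/app", 1000));
        pool.shutdown();
    }
}
```

This only works when some thread other than the waiting one delivers the callback, as in a multithreaded client. In a single-threaded client that must itself drive the I/O to receive the completion, waiting like this would block forever, so a synchronous sync() there would presumably have to be built into the client's own processing loop instead.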



[jira] [Updated] (ZOOKEEPER-1167) C api lacks synchronous version of sync() call.

2013-08-29 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-1167:


Fix Version/s: (was: 3.4.6)
   3.5.0

 C api lacks synchronous version of sync() call.
 ---

 Key: ZOOKEEPER-1167
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1167
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.3.3, 3.4.3, 3.5.0
Reporter: Nicholas Harteau
Assignee: Marshall McMullen
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1167.patch


 Reading through the source, the C API implements zoo_async(), which is the 
 ZooKeeper sync() method in the multithreaded/asynchronous C API. Nothing 
 equivalent is implemented in the non-multithreaded API.
 I'm not sure whether this was an oversight or intentional, but it means that the 
 non-multithreaded API can't guarantee consistent client views on critical 
 reads.
 The zkperl bindings depend on the synchronous, non-multithreaded API, so they 
 also can't call sync() currently.



[jira] [Commented] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X

2013-08-29 Thread Flavio Junqueira (JIRA)

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753601#comment-13753601 ]

Flavio Junqueira commented on ZOOKEEPER-1477:
-

I haven't heard anything back, so I'm moving it to 3.5.0. I'm not sure this is 
still an issue, though; everything seems to work for me on Mac OS.

 Test failures with Java 7 on Mac OS X
 -

 Key: ZOOKEEPER-1477
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477
 Project: ZooKeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.4.3
 Environment: Mac OS X Lion (10.7.4)
 Java version:
 java version 1.7.0_04
 Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
 Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
Reporter: Diwaker Gupta
 Fix For: 3.4.6

 Attachments: with-ZK-1550.txt


 I downloaded ZK 3.4.3 sources and ran {{ant test}}. Many of the tests failed, 
 including ZooKeeperTest. A common symptom was spurious 
 {{ConnectionLossException}}:
 {code}
 2012-06-01 12:01:23,420 [myid:] - INFO  [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED testDeleteRecursiveAsync
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
     at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
     at org.apache.zookeeper.ZooKeeperTest.testDeleteRecursiveAsync(ZooKeeperTest.java:77)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     ... (snipped)
 {code}
 As background, I was actually investigating some non-deterministic failures 
 when using Netflix's Curator with Java 7 (see 
 https://github.com/Netflix/curator/issues/79). After a while, I figured I 
 should establish a clean ZK baseline first, and realized it is actually a ZK 
 issue, not a Curator issue.
 We are trying to migrate to Java 7, but this is a blocking issue for us right 
 now.



[jira] [Updated] (ZOOKEEPER-1477) Test failures with Java 7 on Mac OS X

2013-08-29 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-1477:


Fix Version/s: (was: 3.4.6)
   3.5.0

 Test failures with Java 7 on Mac OS X
 -

 Key: ZOOKEEPER-1477
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1477
 Project: ZooKeeper
  Issue Type: Bug
  Components: server, tests
Affects Versions: 3.4.3
 Environment: Mac OS X Lion (10.7.4)
 Java version:
 java version 1.7.0_04
 Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
 Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
Reporter: Diwaker Gupta
 Fix For: 3.5.0

 Attachments: with-ZK-1550.txt


 I downloaded ZK 3.4.3 sources and ran {{ant test}}. Many of the tests failed, 
 including ZooKeeperTest. A common symptom was spurious 
 {{ConnectionLossException}}:
 {code}
 2012-06-01 12:01:23,420 [myid:] - INFO  [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED testDeleteRecursiveAsync
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
     at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
     at org.apache.zookeeper.ZooKeeperTest.testDeleteRecursiveAsync(ZooKeeperTest.java:77)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     ... (snipped)
 {code}
 As background, I was actually investigating some non-deterministic failures 
 when using Netflix's Curator with Java 7 (see 
 https://github.com/Netflix/curator/issues/79). After a while, I figured I 
 should establish a clean ZK baseline first, and realized it is actually a ZK 
 issue, not a Curator issue.
 We are trying to migrate to Java 7, but this is a blocking issue for us right 
 now.



[jira] [Resolved] (ZOOKEEPER-1711) ZooKeeper server binds to all ip addresses for leader election and broadcast

2013-08-29 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Germán Blanco resolved ZOOKEEPER-1711.
--

Resolution: Duplicate

This is a duplicate of ZOOKEEPER-1096.

 ZooKeeper server binds to all ip addresses for leader election and broadcast
 

 Key: ZOOKEEPER-1711
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1711
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
 Environment: Any
Reporter: Germán Blanco
Priority: Minor
 Fix For: 3.4.6

   Original Estimate: 72h
  Remaining Estimate: 72h

 Unlike the current ZooKeeper version in trunk (intended for release as 3.5.0), 
 ZooKeeper server version 3.4.5 binds to all IP addresses on the port 
 specified for election. It only makes sense to bind to the IP address given 
 in the configuration file, since that is where the other servers will 
 connect. Listening on other IP addresses could have bad security implications.



[jira] [Updated] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address

2013-08-29 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Germán Blanco updated ZOOKEEPER-1096:
-

  Description: 
Server should specify the local address that is used for leader communication 
and leader election (and not use the default of listening on all interfaces).  
This is similar to the clientPortAddress parameter that was added a year ago.  
After reviewing the code, we can't think of a reason why only the port would be 
used with the wildcard interface, when servers are already connecting 
specifically to that interface anyway.

I have submitted a patch, but it does not account for all leader election 
algorithms.

There should probably be an option to toggle this for backwards compatibility, 
although it seems like it would be a bug if this change broke things.

There is some more information about making it an option here:
http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201008.mbox/%3CAANLkTikkT97Djqt3CU=h2+7gnj_4p28hgcxjh345h...@mail.gmail.com%3E

  was:
Server should specify the local address that is used for leader communication 
(and not use the default of listening on all interfaces).  This is similar to 
the clientPortAddress parameter that was added a year ago.  After reviewing the 
code, we can't think of a reason why only the port would be used with the 
wildcard interface, when servers are already connecting specifically to that 
interface anyway.

I have submitted a patch, but it does not account for all leader election 
algorithms.

Probably should have an option to toggle this, for backwards compatibility, 
although it seems like it would be a bug if this change broke things.

There is some more information about making it an option here:
http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201008.mbox/%3CAANLkTikkT97Djqt3CU=h2+7gnj_4p28hgcxjh345h...@mail.gmail.com%3E

Fix Version/s: 3.4.6

 Leader communication should listen on specified IP, not wildcard address
 

 Key: ZOOKEEPER-1096
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1096
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.3, 3.4.0
Reporter: Jared Cantwell
Assignee: Jared Cantwell
Priority: Minor
 Fix For: 3.5.0, 3.4.6

 Attachments: ZOOKEEPER-1096.patch, ZOOKEEPER-1096.patch


 Server should specify the local address that is used for leader communication 
 and leader election (and not use the default of listening on all interfaces). 
  This is similar to the clientPortAddress parameter that was added a year 
 ago.  After reviewing the code, we can't think of a reason why only the port 
 would be used with the wildcard interface, when servers are already 
 connecting specifically to that interface anyway.
 I have submitted a patch, but it does not account for all leader election 
 algorithms.
 There should probably be an option to toggle this for backwards compatibility, 
 although it seems like it would be a bug if this change broke things.
 There is some more information about making it an option here:
 http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201008.mbox/%3CAANLkTikkT97Djqt3CU=h2+7gnj_4p28hgcxjh345h...@mail.gmail.com%3E
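Binding to a specific address assumes each server knows which configured address is its own. In a quorum configuration, servers are listed as server.N=host:quorumPort:electionPort; a simplified sketch of turning such a line into the two listen addresses follows. The class name ServerLine is hypothetical, IPv6 literals are not supported, and any fields after the election port are ignored.

```java
import java.net.InetSocketAddress;

// Hypothetical, simplified parser for "server.<id>=<host>:<quorumPort>:<electionPort>".
class ServerLine {
    final long sid;
    final InetSocketAddress quorumAddr;   // port followers use to connect to the leader
    final InetSocketAddress electionAddr; // port used for leader election

    private ServerLine(long sid, InetSocketAddress quorumAddr, InetSocketAddress electionAddr) {
        this.sid = sid;
        this.quorumAddr = quorumAddr;
        this.electionAddr = electionAddr;
    }

    static ServerLine parse(String line) {
        String[] kv = line.split("=", 2);
        String key = kv[0].trim();
        if (kv.length != 2 || !key.startsWith("server.")) {
            throw new IllegalArgumentException("not a server.N entry: " + line);
        }
        long sid = Long.parseLong(key.substring("server.".length()));
        // Simplification: IPv6 literals break this split; extra trailing fields are ignored.
        String[] parts = kv[1].trim().split(":");
        if (parts.length < 3) {
            throw new IllegalArgumentException("expected host:port:port in: " + line);
        }
        String host = parts[0];
        int quorumPort = Integer.parseInt(parts[1]);
        int electionPort = Integer.parseInt(parts[2]);
        // createUnresolved avoids a DNS lookup here; resolve before binding.
        return new ServerLine(sid,
                InetSocketAddress.createUnresolved(host, quorumPort),
                InetSocketAddress.createUnresolved(host, electionPort));
    }

    public static void main(String[] args) {
        ServerLine s = parse("server.1=zoo1:2888:3888");
        System.out.println("sid " + s.sid + " election on " + s.electionAddr);
    }
}
```

A server could then bind its election listener to the resolved electionAddr of its own entry, rather than to the wildcard address on that port.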



[jira] [Created] (ZOOKEEPER-1747) Zookeeper server fails to start if transaction log file is corrupted

2013-08-29 Thread Sergey Maslyakov (JIRA)
Sergey Maslyakov created ZOOKEEPER-1747:
---

 Summary: Zookeeper server fails to start if transaction log file is corrupted
 Key: ZOOKEEPER-1747
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1747
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
 Environment: Solaris10/x86, Java 1.6
Reporter: Sergey Maslyakov


On multiple occasions when ZK was not able to write out a transaction log or a 
snapshot file, the subsequent attempt to restart the server failed. This usually 
happens when the underlying file system fills up, preventing the ZK server from 
writing out a consistent data file.

Upon start-up, the server reads in the snapshot and the transaction log. If the 
deserializer fails and throws an exception, the server terminates. Please see the 
stack trace below.

A server that does not come up, for whatever reason, is usually an undesirable 
condition. It would be nice to have an option to force-ignore parsing errors, 
especially in the transaction log. A checksum on the data could be one way to 
ensure integrity and parsability.

Another robustness enhancement would be proper handling of the case where a 
snapshot or transaction log cannot be completely written to disk; basically, 
better handling of write errors.


{noformat}
2013-08-28 12:05:30,732 ERROR [ZooKeeperServerMain] Unexpected exception, exiting abnormally
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:160)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
    at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:250)
    at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:383)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
    at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
    at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:129)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
{noformat}
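The trace above fails while reading a transaction log file header, which suggests a truncated or empty log file. A minimal Java sketch of the checksum and tolerant-read ideas follows, assuming a hypothetical record framing of [length][Adler32 checksum][payload]; this is not ZooKeeper's actual on-disk format, and the class and method names are illustrative only. Instead of letting an EOFException abort startup, the reader returns the longest prefix of records that are complete and whose checksums match. Whether it is safe to discard a damaged tail is a separate design question that this sketch does not settle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.Adler32;

// Hypothetical record framing: [int length][long Adler32 checksum][payload bytes].
// Illustrates the proposal only; it is not ZooKeeper's real log format.
class ChecksummedLogSketch {
    // Upper bound on one record so a corrupt length field cannot trigger a
    // huge allocation (value chosen arbitrarily for the sketch).
    static final int MAX_RECORD_BYTES = 1 << 20;

    static long checksum(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return adler.getValue();
    }

    /** Serializes records back to back using the framing above. */
    static byte[] encode(List<byte[]> records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (byte[] payload : records) {
                out.writeInt(payload.length);
                out.writeLong(checksum(payload));
                out.write(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    /**
     * Returns the longest valid prefix of records. EOF in mid-record, an
     * implausible length, or a checksum mismatch ends the replay instead of
     * propagating an exception.
     */
    static List<byte[]> decodeValidPrefix(byte[] log) {
        List<byte[]> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(log))) {
            while (true) {
                int length = in.readInt();
                long expected = in.readLong();
                if (length < 0 || length > MAX_RECORD_BYTES) {
                    break; // corrupt length field
                }
                byte[] payload = new byte[length];
                in.readFully(payload);
                if (checksum(payload) != expected) {
                    break; // corrupt payload
                }
                records.add(payload);
            }
        } catch (EOFException e) {
            // Clean end of log or a partially written last record: keep what we have.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        List<byte[]> written = new ArrayList<>();
        for (String s : new String[] {"a", "bb", "ccc"}) {
            written.add(s.getBytes(StandardCharsets.UTF_8));
        }
        byte[] log = encode(written);
        // Simulate a write that was cut off two bytes before the end.
        byte[] truncated = Arrays.copyOf(log, log.length - 2);
        System.out.println(decodeValidPrefix(truncated).size() + " of "
                + written.size() + " records recovered");
    }
}
```

Running `main` prints `2 of 3 records recovered`: the log is cut two bytes short, so the third record's payload is incomplete and is dropped rather than crashing the reader.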

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1681) ZooKeeper 3.4.x can optionally use netty for nio but the pom does not declare the dep as optional

2013-08-29 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-1681:


Issue Type: Improvement  (was: Bug)

 ZooKeeper 3.4.x can optionally use netty for nio but the pom does not declare 
 the dep as optional
 -

 Key: ZOOKEEPER-1681
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1681
 Project: ZooKeeper
  Issue Type: Improvement
Affects Versions: 3.4.0, 3.4.1, 3.4.2, 3.4.4, 3.4.5
Reporter: John Sirois
 Fix For: 3.4.6


 For example, in [3.4.5|http://search.maven.org/remotecontent?filepath=org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.pom] we see:
 {code}
 $ curl -sS http://search.maven.org/remotecontent?filepath=org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.pom | grep -B1 -A4 org.jboss.netty
 <dependency>
   <groupId>org.jboss.netty</groupId>
   <artifactId>netty</artifactId>
   <version>3.2.2.Final</version>
   <scope>compile</scope>
 </dependency>
 {code}
 As a consumer I can depend on zookeeper with an exclude for org.jboss.netty#netty, or I can let my transitive dependency resolver pick a winner. This might be fine, except for those who might be using a more modern netty published under the newish io.netty groupId. With this twist you get both org.jboss.netty#netty;foo and io.netty#netty;bar on your classpath, and runtime errors ensue from incompatibilities unless you add an exclude against zookeeper (and, of course, don't enable the zk netty nio handling).
 I propose that this is a pom bug, although this is debatable. As currently packaged, zookeeper clearly needs netty to compile, but I'd argue that since it does not need netty to run, either the scope should be provided or optional, or a zookeeper-netty lib should be broken out as an optional dependency, and this new dep published by zookeeper can have a proper compile dependency on netty.
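The argument that zookeeper "does not need netty to run" rests on the netty-based server factory being loaded only when it is selected. A minimal Java sketch of that load-on-demand pattern follows. The system property name and default class name reflect our understanding of how ZooKeeper 3.4 chooses its server connection factory and should be checked against the source; the helper methods are hypothetical.

```java
// Sketch: a jar only has to be on the runtime classpath if one of its classes
// is actually loaded. Naming the factory class at runtime and loading it
// reflectively keeps netty optional for users of the default NIO factory.
class OptionalDependencySketch {
    // Assumed to match ZooKeeper 3.4's factory-selection property and default.
    static final String FACTORY_PROPERTY = "zookeeper.serverCnxnFactory";
    static final String DEFAULT_FACTORY = "org.apache.zookeeper.server.NIOServerCnxnFactory";

    /** Returns the factory class name that would be loaded reflectively. */
    static String selectedFactoryClassName() {
        return System.getProperty(FACTORY_PROPERTY, DEFAULT_FACTORY);
    }

    /** Checks whether a class can be found, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OptionalDependencySketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("selected factory: " + selectedFactoryClassName());
        // Netty 3.x classes live under org.jboss.netty; they are only needed
        // if the netty-based factory was selected.
        System.out.println("netty 3 present: "
                + isOnClasspath("org.jboss.netty.channel.ChannelFactory"));
    }
}
```

Because nothing references netty classes until the netty-based factory is named at runtime, declaring the Maven dependency as optional (or provided) should not break users who stay on the default NIO factory.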



[jira] [Commented] (ZOOKEEPER-1681) ZooKeeper 3.4.x can optionally use netty for nio but the pom does not declare the dep as optional

2013-08-29 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753649#comment-13753649
 ] 

Flavio Junqueira commented on ZOOKEEPER-1681:
-

This one looks more like an improvement, so I'll move it to 3.5.0 and will 
reclassify it. Please feel free to disagree!

 ZooKeeper 3.4.x can optionally use netty for nio but the pom does not declare 
 the dep as optional
 -

 Key: ZOOKEEPER-1681
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1681
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.0, 3.4.1, 3.4.2, 3.4.4, 3.4.5
Reporter: John Sirois
 Fix For: 3.4.6





[jira] [Updated] (ZOOKEEPER-1681) ZooKeeper 3.4.x can optionally use netty for nio but the pom does not declare the dep as optional

2013-08-29 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-1681:


Fix Version/s: (was: 3.4.6)
   3.5.0

 ZooKeeper 3.4.x can optionally use netty for nio but the pom does not declare 
 the dep as optional
 -

 Key: ZOOKEEPER-1681
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1681
 Project: ZooKeeper
  Issue Type: Improvement
Affects Versions: 3.4.0, 3.4.1, 3.4.2, 3.4.4, 3.4.5
Reporter: John Sirois
 Fix For: 3.5.0





[jira] [Resolved] (ZOOKEEPER-1548) Cluster fails election loop in new and interesting way

2013-08-29 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira resolved ZOOKEEPER-1548.
-

Resolution: Duplicate

 Cluster fails election loop in new and interesting way
 --

 Key: ZOOKEEPER-1548
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1548
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.4.3
Reporter: Alan Horn
 Fix For: 3.4.6

 Attachments: 1-follower, 2-follower, 3-leader


 Hi,
 We have a five-node cluster, recently upgraded from 3.3.5 to 3.4.3. It was running fine for a few weeks after the upgrade, then the following sequence of events occurred:
 1. All servers stopped responding to 'ruok' at the same time.
 2. Our local supervisor process restarted all of them at the same time (yes, this is bad; we didn't expect it to fail this way :)
 3. The cluster would not serve requests after this. It appeared to be unable to complete an election.
 We tried various things at this point, none of which worked:
 * Changed the restart order of the nodes (e.g. 4 through 0, instead of 0 through 4)
 * Reduced the number of running nodes from 5 to 3 to simplify the quorum, by only starting up 0, 1 and 2 in one test, and 0, 2 and 4 in the other
 * Removed the *Epoch files from the version-2/ snapshot directory
 * Put the same version-2/snapshot.x file on each server in the cluster
 * Added the (same on all nodes) last txlog onto each server
 * Kept only the last snapshot plus txlog unique on each server
 * Changed leaderServes=no to leaderServes=yes
 * Removed all files and started up with empty data as a control. This worked, but of course isn't terribly useful :)
 Finally, I brought the data up on a single node running in standalone mode, and this worked (yay!). So at this point we brought the single node back into service and have kept the other four available to debug why the election is failing.
 We downgraded the four nodes to 3.3.5, and then they completed the election and started serving as expected.
 We did a rolling upgrade to 3.4.3, and everything was fine until we restarted the leader, whereupon we encountered the same re-election loop as before.
 We're a bit out of ideas at this point, so I was hoping someone on this list might have some useful input.
 Output from two followers and a leader during this condition is attached.
 Cheers,
 Al



[jira] [Commented] (ZOOKEEPER-1548) Cluster fails election loop in new and interesting way

2013-08-29 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753655#comment-13753655
 ] 

Flavio Junqueira commented on ZOOKEEPER-1548:
-

I'm resolving this one as a duplicate of ZOOKEEPER-1115. 

 Cluster fails election loop in new and interesting way
 --

 Key: ZOOKEEPER-1548
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1548
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.4.3
Reporter: Alan Horn
 Fix For: 3.4.6

 Attachments: 1-follower, 2-follower, 3-leader





[jira] [Assigned] (ZOOKEEPER-1448) Node+Quota creation in transaction log can crash leader startup

2013-08-29 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira reassigned ZOOKEEPER-1448:
---

Assignee: Flavio Junqueira  (was: Botond Hejj)

 Node+Quota creation in transaction log can crash leader startup
 ---

 Key: ZOOKEEPER-1448
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1448
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.5
Reporter: Botond Hejj
Assignee: Flavio Junqueira
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1448_branch3.3.patch, ZOOKEEPER-1448.patch, 
 ZOOKEEPER-1448.patch, ZOOKEEPER-1448.patch, ZOOKEEPER-1448.patch


 Hi,
 I've found a bug in zookeeper related to quota creation which can shut down the zookeeper leader on startup.
 Steps to reproduce:
 1. create /quota_bug
 2. setquota -n 1 /quota_bug
 3. stop the whole ensemble (the previous operations should be in the transaction log)
 4. start all the servers
 5. the elected leader will shut down with an exception (Missing stat node for count /zookeeper/quota/quota_bug/zookeeper_stats)
 I've debugged a bit what is happening, and I found the following problem:
 On startup each server loads the last snapshot and replays the last transaction log. While doing this it fills up the pTrie variable of the DataTree with the paths of the nodes which have quotas.
 After the leader is elected, the leader server loads the snapshot and last transaction log again, but it doesn't clean up the pTrie variable. This means it still contains the /quota_bug path. Now, when the create /quota_bug is processed from the transaction log, the DataTree already thinks that the quota nodes (/zookeeper/quota/quota_bug/zookeeper_limits and /zookeeper/quota/quota_bug/zookeeper_stats) are created, but those node creations actually come later in the transaction log. This leads to the missing stat node exception.
 I think clearing the pTrie should solve this problem.
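The failure mode described above can be reproduced with a small toy model in Java. Everything here is hypothetical and greatly simplified, not DataTree's real API: one set of paths stands in for the tree's znodes, a second set stands in for pTrie, and clearing the node set stands in for reloading the snapshot. It shows why a stale quota entry makes the replayed `create /quota_bug` look for a stats node that the log has not created yet, and why clearing the trie before replay, as proposed, avoids it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the reported bug; names are hypothetical, not ZooKeeper's API.
class QuotaReplaySketch {
    private static final String QUOTA_ROOT = "/zookeeper/quota";
    private static final String LIMITS_SUFFIX = "/zookeeper_limits";

    final Set<String> nodes = new HashSet<>();      // znodes currently in the tree
    final Set<String> quotaPaths = new HashSet<>(); // plays the role of pTrie

    /** Applies one "create" from the transaction log. */
    void create(String path) {
        nodes.add(path);
        if (quotaPaths.contains(path)) {
            // A quota is believed to exist, so its stats node must be updated.
            String statsNode = QUOTA_ROOT + path + "/zookeeper_stats";
            if (!nodes.contains(statsNode)) {
                throw new IllegalStateException("Missing stat node for count " + statsNode);
            }
        }
        if (path.startsWith(QUOTA_ROOT + "/") && path.endsWith(LIMITS_SUFFIX)) {
            // Creating a limits node registers a quota on the target path.
            quotaPaths.add(path.substring(QUOTA_ROOT.length(),
                    path.length() - LIMITS_SUFFIX.length()));
        }
    }

    /** Reloads state and replays the log; clearTrie=false reproduces the bug. */
    void reload(List<String> log, boolean clearTrie) {
        nodes.clear(); // stands in for reloading the snapshot
        if (clearTrie) {
            quotaPaths.clear(); // the fix proposed in the report
        }
        for (String path : log) {
            create(path);
        }
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "/quota_bug",
                "/zookeeper/quota/quota_bug/zookeeper_limits",
                "/zookeeper/quota/quota_bug/zookeeper_stats");
        QuotaReplaySketch tree = new QuotaReplaySketch();
        tree.reload(log, true); // first startup: trie starts empty
        tree.reload(log, true); // second load with the proposed fix: succeeds
        try {
            tree.reload(log, false); // second load without clearing the trie
        } catch (IllegalStateException e) {
            System.out.println("reproduced: " + e.getMessage());
        }
    }
}
```

Replaying the three creates with `clearTrie=true` succeeds every time; a reload with `clearTrie=false` after a successful load throws, mirroring the reported leader crash.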



[jira] [Commented] (ZOOKEEPER-1448) Node+Quota creation in transaction log can crash leader startup

2013-08-29 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753665#comment-13753665
 ] 

Flavio Junqueira commented on ZOOKEEPER-1448:
-

If I can get a patch that applies to the 3.4 branch cleanly and removes the 
references to log4j, then I can include it in 3.4.6. I suppose this is also an 
issue for 3.5.0, no? 

 Node+Quota creation in transaction log can crash leader startup
 ---

 Key: ZOOKEEPER-1448
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1448
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.5
Reporter: Botond Hejj
Assignee: Flavio Junqueira
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1448_branch3.3.patch, ZOOKEEPER-1448.patch, 
 ZOOKEEPER-1448.patch, ZOOKEEPER-1448.patch, ZOOKEEPER-1448.patch





[jira] [Created] (ZOOKEEPER-1748) TCP keepalive for leader election connections

2013-08-29 Thread JIRA
Antal Sasvári created ZOOKEEPER-1748:


 Summary: TCP keepalive for leader election connections
 Key: ZOOKEEPER-1748
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1748
 Project: ZooKeeper
  Issue Type: Improvement
  Components: leaderElection
Affects Versions: 3.4.5, 3.5.0
 Environment: Linux, Java 1.7
Reporter: Antal Sasvári
Priority: Minor
 Fix For: 3.5.0, 3.4.6


In our system we encountered the following problem:

If the system is stable and there is no leader election, the leader election 
port connections stay open for a very long time without any packets being sent 
on them.
Some network elements silently drop an established TCP connection after a 
timeout if no packets are sent on it. In this case the ZK servers will not 
notice the connection loss. This causes additional delay later, when the next 
leader election is started, as the TCP connections are no longer alive.

We would like to be able to enable TCP keepalive on the leader election sockets 
in order to prevent the connection timeout in some network elements due to 
connection inactivity.
This could be controlled by adding a new config parameter called tcpKeepAlive 
to the ZooKeeper configuration file. It would only be applicable to algorithm 3 
(TCP-based fast leader election) and would default to false.

If tcpKeepAlive is set to true, the TCP keepalive flag should be enabled for 
the leader election sockets in QuorumCnxManager.setSockOpts() by calling 
sock.setKeepAlive(true).
We have tested this change successfully in our environment.
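A minimal Java sketch of the proposed option follows. The `tcpKeepAlive` key is the name proposed in this issue, not an existing option in the releases discussed here, and `applyKeepAlive` is a hypothetical stand-in for the relevant part of `QuorumCnxManager.setSockOpts()`. The socket call itself is the standard `java.net.Socket.setKeepAlive`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.Socket;
import java.net.SocketException;
import java.util.Properties;

class KeepAliveSketch {
    // Config key proposed in this issue; hypothetical, not an existing option.
    static final String TCP_KEEP_ALIVE_KEY = "tcpKeepAlive";

    /** Reads the proposed flag from zoo.cfg-style properties; defaults to false. */
    static boolean parseTcpKeepAlive(Properties cfg) {
        return Boolean.parseBoolean(cfg.getProperty(TCP_KEEP_ALIVE_KEY, "false").trim());
    }

    /** Hypothetical stand-in for the keepalive part of setSockOpts(). */
    static void applyKeepAlive(Socket sock, boolean tcpKeepAlive) throws SocketException {
        if (tcpKeepAlive) {
            // Lets the OS send keepalive probes on an otherwise idle connection,
            // so idle-timeout middleboxes see traffic and dead peers are detected.
            sock.setKeepAlive(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Properties cfg = new Properties();
        cfg.load(new StringReader("tickTime=2000\ntcpKeepAlive=true\n"));
        try (Socket sock = new Socket()) { // unconnected; options may be set before connect()
            applyKeepAlive(sock, parseTcpKeepAlive(cfg));
            System.out.println("SO_KEEPALIVE enabled: " + sock.getKeepAlive());
        }
    }
}
```

One caveat: TCP keepalive only starts sending probes after the connection has been idle for the operating system's keepalive time (on Linux, `net.ipv4.tcp_keepalive_time`, 7200 seconds by default), so that setting may also need lowering to stay below a network element's idle timeout.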

Please comment whether you see any problem with this. If not, I am going to 
submit a patch.

I've been told that, e.g., Apache ActiveMQ also has a config option for a 
similar purpose called transport.keepalive.



[jira] [Commented] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address

2013-08-29 Thread Jared Cantwell (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753686#comment-13753686
 ] 

Jared Cantwell commented on ZOOKEEPER-1096:
---

3.5.0 solves part of this issue by correctly binding to the full address in 
QuorumCnxManager.java. Other places still bind to the wildcard address, though, 
and I don't think that applies to 3.4.6.

 Leader communication should listen on specified IP, not wildcard address
 

 Key: ZOOKEEPER-1096
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1096
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.3, 3.4.0
Reporter: Jared Cantwell
Assignee: Jared Cantwell
Priority: Minor
 Fix For: 3.5.0, 3.4.6

 Attachments: ZOOKEEPER-1096.patch, ZOOKEEPER-1096.patch


 Server should specify the local address that is used for leader communication and leader election (and not use the default of listening on all interfaces). This is similar to the clientPortAddress parameter that was added a year ago. After reviewing the code, we can't think of a reason why only the port would be used with the wildcard interface, when servers are already connecting specifically to that interface anyway.
 I have submitted a patch, but it does not account for all leader election algorithms.
 There should probably be an option to toggle this, for backwards compatibility, although it seems like it would be a bug if this change broke things.
 There is some more information about making it an option here:
 http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201008.mbox/%3CAANLkTikkT97Djqt3CU=h2+7gnj_4p28hgcxjh345h...@mail.gmail.com%3E
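A minimal Java sketch of binding the election listener to the configured address rather than the wildcard address follows. The `listenOnAllIPs` flag is a hypothetical backwards-compatibility toggle of the kind suggested above, and `bindElectionSocket` is not a ZooKeeper method.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class BindAddressSketch {
    /**
     * Opens the election listener either on the configured address only (as
     * proposed) or on the wildcard address with the same port (old behavior).
     * listenOnAllIPs is a hypothetical compatibility toggle.
     */
    static ServerSocket bindElectionSocket(InetSocketAddress electionAddr, boolean listenOnAllIPs)
            throws IOException {
        ServerSocket ss = new ServerSocket(); // unbound, so options can be set first
        ss.setReuseAddress(true);
        InetSocketAddress bindAddr = listenOnAllIPs
                ? new InetSocketAddress(electionAddr.getPort()) // wildcard, all interfaces
                : electionAddr;                                  // the configured interface only
        try {
            ss.bind(bindAddr);
        } catch (IOException e) {
            ss.close();
            throw e;
        }
        return ss;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port, which keeps the demo self-contained.
        InetSocketAddress addr = new InetSocketAddress(InetAddress.getLoopbackAddress(), 0);
        try (ServerSocket specific = bindElectionSocket(addr, false);
             ServerSocket wildcard = bindElectionSocket(addr, true)) {
            System.out.println("specific: " + specific.getLocalSocketAddress());
            System.out.println("wildcard: " + wildcard.getLocalSocketAddress());
        }
    }
}
```

With a real configuration, the address and port would come from the server's election address entry rather than the loopback address and port 0 used here.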



[jira] [Commented] (ZOOKEEPER-1657) Increased CPU usage by unnecessary SASL checks

2013-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753670#comment-13753670
 ] 

Hadoop QA commented on ZOOKEEPER-1657:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12598620/ZOOKEEPER-1657.patch
  against trunk revision 1516126.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1547//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1547//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1547//console

This message is automatically generated.

 Increased CPU usage by unnecessary SASL checks
 --

 Key: ZOOKEEPER-1657
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1657
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.5
Reporter: Gunnar Wagenknecht
Assignee: Eugene Koontz
  Labels: performance
 Fix For: 3.5.0, 3.4.6

 Attachments: ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, 
 ZOOKEEPER-1657.patch, ZOOKEEPER-1657.patch, zookeeper-hotspot-gone.png, 
 zookeeper-hotspot.png


 I did some profiling in one of our Java environments and found an interesting footprint in ZooKeeper. The SASL support seems to be triggered very often on the client, although it's not even in use.
 Is there a switch to disable SASL completely?
 The attached screenshot shows a 10-minute profiling session on one of our production Jetty servers. The Jetty server handles ~1k web requests per minute. The average response time per web request is a few milliseconds. The profiling was performed on a machine that had been running for 24h.
 We noticed a significant CPU increase on our servers when deploying an update from ZooKeeper 3.3.2 to ZooKeeper 3.4.5. Thus, we started investigating. The screenshot shows that only 32% of CPU time is spent in Jetty. In contrast, 65% is spent in ZooKeeper.
 A few notes/thoughts:
 * {{ClientCnxn$SendThread.clientTunneledAuthenticationInProgress}} seems to be the culprit
 * {{javax.security.auth.login.Configuration.getConfiguration}} seems to be called very often?
 * There is quite a bit of reflection involved in {{java.security.AccessController.doPrivileged}}
 * No security manager is active in the JVM: I tend to place an if-check in the code before calling {{AccessController.doPrivileged}}. When no SM is installed, the runnable can be called directly, which saves cycles.
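The last suggestion, skipping `AccessController.doPrivileged` when no security manager is installed, can be sketched in Java as below, together with a simple memoizing wrapper for an expensive lookup such as the frequent `Configuration.getConfiguration` calls noted above. Both helpers are hypothetical, not ZooKeeper code. Caching assumes the answer cannot change during the client's lifetime, which may not hold if an application replaces the JAAS configuration at runtime.

```java
import java.security.AccessController;
import java.security.PrivilegedAction;
import java.util.function.Supplier;

class PrivilegedShortcutSketch {
    /** Runs the action directly when no SecurityManager is installed. */
    @SuppressWarnings("removal") // SecurityManager APIs are deprecated for removal in recent JDKs
    static <T> T runPrivileged(PrivilegedAction<T> action) {
        if (System.getSecurityManager() == null) {
            return action.run(); // no permission checks apply, so skip doPrivileged
        }
        return AccessController.doPrivileged(action);
    }

    /** Wraps an expensive lookup so it is computed at most once (thread-safe). */
    static <T> Supplier<T> memoize(Supplier<T> expensive) {
        return new Supplier<T>() {
            private boolean computed;
            private T value;

            @Override
            public synchronized T get() {
                if (!computed) {
                    value = expensive.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        int[] lookups = {0};
        Supplier<Boolean> saslConfigured = memoize(() -> {
            lookups[0]++;
            // Hypothetical stand-in for an expensive JAAS configuration check.
            return runPrivileged(() -> System.getProperty("java.security.auth.login.config") != null);
        });
        for (int i = 0; i < 1000; i++) {
            saslConfigured.get();
        }
        System.out.println("lookups performed: " + lookups[0]);
    }
}
```

Running `main` prints `lookups performed: 1`: the 1000 calls share a single evaluation of the underlying check.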



[jira] [Commented] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address

2013-08-29 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753816#comment-13753816
 ] 

Flavio Junqueira commented on ZOOKEEPER-1096:
-

[~jaredc], I'm not sure I get your comment. Are you saying that we don't need 
to consider this issue for 3.4.6?

 Leader communication should listen on specified IP, not wildcard address
 

 Key: ZOOKEEPER-1096
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1096
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.3, 3.4.0
Reporter: Jared Cantwell
Assignee: Jared Cantwell
Priority: Minor
 Fix For: 3.5.0, 3.4.6

 Attachments: ZOOKEEPER-1096.patch, ZOOKEEPER-1096.patch





[jira] [Commented] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address

2013-08-29 Thread Jared Cantwell (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753820#comment-13753820
 ] 

Jared Cantwell commented on ZOOKEEPER-1096:
---

I am simply saying that QuorumCnxManager in 3.5.0 has this issue resolved.

 Leader communication should listen on specified IP, not wildcard address
 

 Key: ZOOKEEPER-1096
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1096
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.3, 3.4.0
Reporter: Jared Cantwell
Assignee: Jared Cantwell
Priority: Minor
 Fix For: 3.5.0, 3.4.6

 Attachments: ZOOKEEPER-1096.patch, ZOOKEEPER-1096.patch





[jira] [Commented] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address

2013-08-29 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753827#comment-13753827
 ] 

Flavio Junqueira commented on ZOOKEEPER-1096:
-

[~jaredc], do you remember by any chance the jira that fixed it?

german, if it is fixed in QCM, then there is nothing else to fix for FLE, yes? 
We possibly need to port the QCM changes to 3.4.6, though. 

 Leader communication should listen on specified IP, not wildcard address
 

 Key: ZOOKEEPER-1096
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1096
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.3, 3.4.0
Reporter: Jared Cantwell
Assignee: Jared Cantwell
Priority: Minor
 Fix For: 3.5.0, 3.4.6

 Attachments: ZOOKEEPER-1096.patch, ZOOKEEPER-1096.patch





[jira] [Commented] (ZOOKEEPER-1096) Leader communication should listen on specified IP, not wildcard address

2013-08-29 Thread Jared Cantwell (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753879#comment-13753879
 ] 

Jared Cantwell commented on ZOOKEEPER-1096:
---

It was changed as part of ZOOKEEPER-1411:

http://svn.apache.org/viewvc?view=revision&revision=1328991 
http://svn.apache.org/viewvc/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java?r1=1328991&r2=1328990&pathrev=1328991

 Leader communication should listen on specified IP, not wildcard address
 

 Key: ZOOKEEPER-1096
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1096
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.3, 3.4.0
Reporter: Jared Cantwell
Assignee: Jared Cantwell
Priority: Minor
 Fix For: 3.5.0, 3.4.6

 Attachments: ZOOKEEPER-1096.patch, ZOOKEEPER-1096.patch


