ZooKeeper-trunk-solaris - Build # 1251 - Still Failing

2016-07-31 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/1251/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 27050 lines...]
[junit] 2016-07-31 07:37:04,680 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@77] - RUNNING TEST METHOD 
committedAndUncommittedOfTheSameSessionRaceTest
[junit] 2016-07-31 07:37:04,682 [myid:] - INFO  [main:CommitProcessor@299] 
- CommitProcessor exited loop!
[junit] 2016-07-31 07:37:04,708 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 12237
[junit] 2016-07-31 07:37:04,710 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 5
[junit] 2016-07-31 07:37:04,710 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD 
committedAndUncommittedOfTheSameSessionRaceTest
[junit] 2016-07-31 07:37:04,710 [myid:] - INFO  [main:CommitProcessor@414] 
- Shutting down
[junit] 2016-07-31 07:37:04,711 [myid:] - INFO  [main:ZKTestCase$1@65] - 
SUCCEEDED committedAndUncommittedOfTheSameSessionRaceTest
[junit] 2016-07-31 07:37:04,711 [myid:] - INFO  [main:ZKTestCase$1@60] - 
FINISHED committedAndUncommittedOfTheSameSessionRaceTest
[junit] 2016-07-31 07:37:04,734 [myid:] - INFO  [main:ZKTestCase$1@55] - 
STARTING noStarvationOfNonLocalCommittedRequestsTest
[junit] 2016-07-31 07:37:04,738 [myid:] - INFO  [Time-limited 
test:JUnit4ZKTestRunner$LoggedInvokeMethod@77] - RUNNING TEST METHOD 
noStarvationOfNonLocalCommittedRequestsTest
[junit] 2016-07-31 07:37:04,743 [myid:] - INFO  [Time-limited 
test:CommitProcessor@299] - CommitProcessor exited loop!
[junit] 2016-07-31 07:37:04,747 [myid:] - INFO  [Time-limited 
test:CommitProcessor@299] - CommitProcessor exited loop!
[junit] 2016-07-31 07:37:04,751 [myid:] - INFO  [Time-limited 
test:CommitProcessor@299] - CommitProcessor exited loop!
[junit] 2016-07-31 07:37:04,755 [myid:] - INFO  [Time-limited 
test:CommitProcessor@299] - CommitProcessor exited loop!
[junit] 2016-07-31 07:37:04,760 [myid:] - INFO  [Time-limited 
test:CommitProcessor@299] - CommitProcessor exited loop!
[junit] 2016-07-31 07:37:04,764 [myid:] - INFO  [Time-limited 
test:CommitProcessor@299] - CommitProcessor exited loop!
[junit] 2016-07-31 07:37:04,768 [myid:] - INFO  [Time-limited 
test:CommitProcessor@299] - CommitProcessor exited loop!
[junit] 2016-07-31 07:37:04,772 [myid:] - INFO  [Time-limited 
test:CommitProcessor@299] - CommitProcessor exited loop!
[junit] 2016-07-31 07:37:04,776 [myid:] - INFO  [Time-limited 
test:CommitProcessor@299] - CommitProcessor exited loop!
[junit] 2016-07-31 07:37:04,780 [myid:] - INFO  [Time-limited 
test:CommitProcessor@299] - CommitProcessor exited loop!
[junit] 2016-07-31 07:37:04,781 [myid:] - INFO  [Time-limited 
test:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 14285
[junit] 2016-07-31 07:37:04,781 [myid:] - INFO  [Time-limited 
test:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 165
[junit] 2016-07-31 07:37:04,781 [myid:] - INFO  [Time-limited 
test:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD 
noStarvationOfNonLocalCommittedRequestsTest
[junit] 2016-07-31 07:37:04,781 [myid:] - INFO  [main:CommitProcessor@414] 
- Shutting down
[junit] 2016-07-31 07:37:04,810 [myid:] - INFO  [main:ZKTestCase$1@65] - 
SUCCEEDED noStarvationOfNonLocalCommittedRequestsTest
[junit] 2016-07-31 07:37:04,811 [myid:] - INFO  [main:ZKTestCase$1@60] - 
FINISHED noStarvationOfNonLocalCommittedRequestsTest
[junit] 2016-07-31 07:37:04,812 [myid:] - INFO  [main:ZKTestCase$1@55] - 
STARTING processAsMuchUncommittedRequestsAsPossibleTest
[junit] 2016-07-31 07:37:04,812 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@77] - RUNNING TEST METHOD 
processAsMuchUncommittedRequestsAsPossibleTest
[junit] 2016-07-31 07:37:04,817 [myid:] - INFO  [main:CommitProcessor@299] 
- CommitProcessor exited loop!
[junit] 2016-07-31 07:37:05,821 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 5687
[junit] 2016-07-31 07:37:05,821 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 153
[junit] 2016-07-31 07:37:05,821 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD 
processAsMuchUncommittedRequestsAsPossibleTest
[junit] 2016-07-31 07:37:05,822 [myid:] - INFO  [main:CommitProcessor@414] 
- Shutting down
[junit] 2016-07-31 07:37:05,822 [myid:] - INFO  [main:ZKTestCase$1@65] - 
SUCCEEDED processAsMuchUncommittedRequestsAsPossibleTest
[junit] 2016-07-31 07:37:05,822 [myid:] - INFO  [main:ZKTestCase$1@60] - 
FINISHED processAsMuchUncommittedRequestsAsPossibleTest
[junit] 2016-07-31 07:37:05,823 [myid:] - INFO  [main:ZKTestCase$1@55] - 
STARTING processAllFollo

ZooKeeper-trunk-jdk8 - Build # 689 - Failure

2016-07-31 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-jdk8/689/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 453878 lines...]
[junit] 2016-07-31 13:00:00,393 [myid:] - INFO  [main:JMXEnv@146] - 
ensureOnly:[]
[junit] 2016-07-31 13:00:00,395 [myid:] - INFO  [main:ClientBase@466] - 
STARTING server
[junit] 2016-07-31 13:00:00,395 [myid:] - INFO  [main:ClientBase@386] - 
CREATING server instance 127.0.0.1:11222
[junit] 2016-07-31 13:00:00,395 [myid:] - INFO  
[main:NIOServerCnxnFactory@673] - Configuring NIO connection handler with 10s 
sessionless connection timeout, 3 selector thread(s), 48 worker threads, and 64 
kB direct buffers.
[junit] 2016-07-31 13:00:00,396 [myid:] - INFO  
[main:NIOServerCnxnFactory@686] - binding to port 0.0.0.0/0.0.0.0:11222
[junit] 2016-07-31 13:00:00,397 [myid:] - INFO  [main:ClientBase@361] - 
STARTING server instance 127.0.0.1:11222
[junit] 2016-07-31 13:00:00,397 [myid:] - INFO  [main:ZooKeeperServer@858] 
- minSessionTimeout set to 6000
[junit] 2016-07-31 13:00:00,397 [myid:] - INFO  [main:ZooKeeperServer@867] 
- maxSessionTimeout set to 6
[junit] 2016-07-31 13:00:00,398 [myid:] - INFO  [main:ZooKeeperServer@156] 
- Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 
6 datadir 
/x1/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk8/trunk/build/test/tmp/test8683382206792705829.junit.dir/version-2
 snapdir 
/x1/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk8/trunk/build/test/tmp/test8683382206792705829.junit.dir/version-2
[junit] 2016-07-31 13:00:00,399 [myid:] - INFO  [main:FileSnap@83] - 
Reading snapshot 
/x1/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk8/trunk/build/test/tmp/test8683382206792705829.junit.dir/version-2/snapshot.b
[junit] 2016-07-31 13:00:00,403 [myid:] - INFO  [main:FileTxnSnapLog@298] - 
Snapshotting: 0xb to 
/x1/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-jdk8/trunk/build/test/tmp/test8683382206792705829.junit.dir/version-2/snapshot.b
[junit] 2016-07-31 13:00:00,423 [myid:] - INFO  
[main:FourLetterWordMain@85] - connecting to 127.0.0.1 11222
[junit] 2016-07-31 13:00:00,424 [myid:] - INFO  
[NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11222:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:44992
[junit] 2016-07-31 13:00:00,425 [myid:] - INFO  
[NIOWorkerThread-1:NIOServerCnxn@485] - Processing stat command from 
/127.0.0.1:44992
[junit] 2016-07-31 13:00:00,425 [myid:] - INFO  
[NIOWorkerThread-1:StatCommand@49] - Stat command output
[junit] 2016-07-31 13:00:00,425 [myid:] - INFO  
[NIOWorkerThread-1:NIOServerCnxn@607] - Closed socket connection for client 
/127.0.0.1:44992 (no session established for client)
[junit] 2016-07-31 13:00:00,425 [myid:] - INFO  [main:JMXEnv@228] - 
ensureParent:[InMemoryDataTree, StandaloneServer_port]
[junit] 2016-07-31 13:00:00,427 [myid:] - INFO  [main:JMXEnv@245] - 
expect:InMemoryDataTree
[junit] 2016-07-31 13:00:00,428 [myid:] - INFO  [main:JMXEnv@249] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port11222,name1=InMemoryDataTree
[junit] 2016-07-31 13:00:00,428 [myid:] - INFO  [main:JMXEnv@245] - 
expect:StandaloneServer_port
[junit] 2016-07-31 13:00:00,428 [myid:] - INFO  [main:JMXEnv@249] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port11222
[junit] 2016-07-31 13:00:00,429 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 5800
[junit] 2016-07-31 13:00:00,429 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 25
[junit] 2016-07-31 13:00:00,429 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD 
testQuota
[junit] 2016-07-31 13:00:00,429 [myid:] - INFO  [main:ClientBase@543] - 
tearDown starting
[junit] 2016-07-31 13:00:00,471 [myid:] - INFO  [main:ZooKeeper@1313] - 
Session: 0x100b008c946 closed
[junit] 2016-07-31 13:00:00,471 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down for 
session: 0x100b008c946
[junit] 2016-07-31 13:00:00,471 [myid:] - INFO  [main:ClientBase@513] - 
STOPPING server
[junit] 2016-07-31 13:00:00,471 [myid:] - INFO  
[ConnnectionExpirer:NIOServerCnxnFactory$ConnectionExpirerThread@583] - 
ConnnectionExpirerThread interrupted
[junit] 2016-07-31 13:00:00,471 [myid:] - INFO  
[NIOServerCxnFactory.SelectorThread-1:NIOServerCnxnFactory$SelectorThread@420] 
- selector thread exitted run method
[junit] 2016-07-31 13:00:00,471 [myid:] - INFO  
[NIOServerCxnFactory.SelectorThread-2:NIOServerCnxnFactory$SelectorThread@420] 
- selector thread exitted run method
[junit] 2016-07-31 13:00:00,471 [myid:] - INFO  
[NIOServerCxnFactory.SelectorThread-0:NIOSer

ZooKeeper_branch35_jdk8 - Build # 173 - Still Failing

2016-07-31 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk8/173/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 447779 lines...]
[junit] 2016-07-31 13:12:41,136 [myid:] - WARN  [New I/O worker 
#6635:NettyServerCnxnFactory$CnxnChannelHandler@142] - Exception caught [id: 
0xdafcb055, /127.0.0.1:59251 :> /127.0.0.1:30317] EXCEPTION: 
java.nio.channels.ClosedChannelException
[junit] java.nio.channels.ClosedChannelException
[junit] at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
[junit] at 
sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
[junit] at 
org.jboss.netty.channel.socket.nio.SocketSendBufferPool$UnpooledSendBuffer.transferTo(SocketSendBufferPool.java:203)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:201)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:151)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:315)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
[junit] at 
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
[junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[junit] at java.lang.Thread.run(Thread.java:745)
[junit] 2016-07-31 13:12:41,138 [myid:] - INFO  
[SyncThread:0:MBeanRegistry@128] - Unregister MBean 
[org.apache.ZooKeeperService:name0=StandaloneServer_port30317,name1=Connections,name2=127.0.0.1,name3=0x1045071ffca]
[junit] 2016-07-31 13:12:41,237 [myid:] - INFO  [main:ZooKeeper@1313] - 
Session: 0x1045071ffca closed
[junit] 2016-07-31 13:12:41,237 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down for 
session: 0x1045071ffca
[junit] 2016-07-31 13:12:41,237 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 140096
[junit] 2016-07-31 13:12:41,238 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 1640
[junit] 2016-07-31 13:12:41,238 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD 
testWatcherAutoResetWithLocal
[junit] 2016-07-31 13:12:41,238 [myid:] - INFO  [main:ClientBase@543] - 
tearDown starting
[junit] 2016-07-31 13:12:41,238 [myid:] - INFO  [main:ClientBase@513] - 
STOPPING server
[junit] 2016-07-31 13:12:41,238 [myid:] - INFO  
[main:NettyServerCnxnFactory@464] - shutdown called 0.0.0.0/0.0.0.0:30317
[junit] 2016-07-31 13:12:41,242 [myid:] - INFO  [main:ZooKeeperServer@498] 
- shutting down
[junit] 2016-07-31 13:12:41,243 [myid:] - INFO  
[main:SessionTrackerImpl@232] - Shutting down
[junit] 2016-07-31 13:12:41,243 [myid:] - INFO  
[main:PrepRequestProcessor@965] - Shutting down
[junit] 2016-07-31 13:12:41,243 [myid:] - INFO  
[main:SyncRequestProcessor@191] - Shutting down
[junit] 2016-07-31 13:12:41,243 [myid:] - INFO  [ProcessThread(sid:0 
cport:30317)::PrepRequestProcessor@154] - PrepRequestProcessor exited loop!
[junit] 2016-07-31 13:12:41,243 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@169] - SyncRequestProcessor exited!
[junit] 2016-07-31 13:12:41,243 [myid:] - INFO  
[main:FinalRequestProcessor@479] - shutdown of request processor complete
[junit] 2016-07-31 13:12:41,244 [myid:] - INFO  [main:MBeanRegistry@128] - 
Unregister MBean 
[org.apache.ZooKeeperService:name0=StandaloneServer_port30317,name1=InMemoryDataTree]
[junit] 2016-07-31 13:12:41,244 [myid:] - INFO  [main:MBeanRegistry@128] - 
Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port30317]
[junit] 2016-07-31 13:12:41,244 [myid:] - INFO  
[main:FourLetterWordMain@85] - connecting to 127.0.0.1 30317
[junit] 2016-07-31 13:12:41,245 [myid:] - INFO  [main:JMXEnv@146] - 
ensureOnly:[]
[junit] 2016-07-31 13:12:41,249 [myid:] - INFO  [main:ClientBase@568] - 
fdcount after test is: 4815 at start it was 4815
[junit] 2016-07-31 13:12:41,249 [myid:] - INFO  [main:ZKTestCase$1@65] - 
SUCCEEDED testWatcherAutoResetWithLocal
[junit] 2016-07-31 13:12:41,249 [myid:] - INFO  [main:ZKTestCase$1@60] - 
FINISHED te

ZooKeeper_branch34_solaris - Build # 1235 - Failure

2016-07-31 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_solaris/1235/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 176204 lines...]
[junit] 2016-07-31 13:54:25,138 [myid:] - INFO  [main:JMXEnv@246] - 
expect:StandaloneServer_port
[junit] 2016-07-31 13:54:25,138 [myid:] - INFO  [main:JMXEnv@250] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port11221
[junit] 2016-07-31 13:54:25,139 [myid:] - INFO  [main:ClientBase@490] - 
STOPPING server
[junit] 2016-07-31 13:54:25,140 [myid:] - INFO  [main:ZooKeeperServer@469] 
- shutting down
[junit] 2016-07-31 13:54:25,140 [myid:] - INFO  
[main:SessionTrackerImpl@225] - Shutting down
[junit] 2016-07-31 13:54:25,140 [myid:] - INFO  
[main:PrepRequestProcessor@765] - Shutting down
[junit] 2016-07-31 13:54:25,140 [myid:] - INFO  
[main:SyncRequestProcessor@209] - Shutting down
[junit] 2016-07-31 13:54:25,140 [myid:] - INFO  [ProcessThread(sid:0 
cport:11221)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop!
[junit] 2016-07-31 13:54:25,140 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@187] - SyncRequestProcessor exited!
[junit] 2016-07-31 13:54:25,140 [myid:] - INFO  
[main:FinalRequestProcessor@402] - shutdown of request processor complete
[junit] 2016-07-31 13:54:25,141 [myid:] - INFO  
[main:FourLetterWordMain@62] - connecting to 127.0.0.1 11221
[junit] 2016-07-31 13:54:25,141 [myid:] - INFO  [main:JMXEnv@146] - 
ensureOnly:[]
[junit] 2016-07-31 13:54:25,142 [myid:] - INFO  [main:ClientBase@443] - 
STARTING server
[junit] 2016-07-31 13:54:25,142 [myid:] - INFO  [main:ClientBase@364] - 
CREATING server instance 127.0.0.1:11221
[junit] 2016-07-31 13:54:25,143 [myid:] - INFO  
[main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2016-07-31 13:54:25,143 [myid:] - INFO  [main:ClientBase@339] - 
STARTING server instance 127.0.0.1:11221
[junit] 2016-07-31 13:54:25,143 [myid:] - INFO  [main:ZooKeeperServer@170] 
- Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 
6 datadir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch34_solaris/trunk/build/test/tmp/test1678098410937737466.junit.dir/version-2
 snapdir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch34_solaris/trunk/build/test/tmp/test1678098410937737466.junit.dir/version-2
[junit] 2016-07-31 13:54:25,146 [myid:] - INFO  
[main:FourLetterWordMain@62] - connecting to 127.0.0.1 11221
[junit] 2016-07-31 13:54:25,146 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@192] - 
Accepted socket connection from /127.0.0.1:38433
[junit] 2016-07-31 13:54:25,146 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@827] - Processing 
stat command from /127.0.0.1:38433
[junit] 2016-07-31 13:54:25,147 [myid:] - INFO  
[Thread-5:NIOServerCnxn$StatCommand@663] - Stat command output
[junit] 2016-07-31 13:54:25,147 [myid:] - INFO  
[Thread-5:NIOServerCnxn@1008] - Closed socket connection for client 
/127.0.0.1:38433 (no session established for client)
[junit] 2016-07-31 13:54:25,148 [myid:] - INFO  [main:JMXEnv@229] - 
ensureParent:[InMemoryDataTree, StandaloneServer_port]
[junit] 2016-07-31 13:54:25,149 [myid:] - INFO  [main:JMXEnv@246] - 
expect:InMemoryDataTree
[junit] 2016-07-31 13:54:25,149 [myid:] - INFO  [main:JMXEnv@250] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port11221,name1=InMemoryDataTree
[junit] 2016-07-31 13:54:25,149 [myid:] - INFO  [main:JMXEnv@246] - 
expect:StandaloneServer_port
[junit] 2016-07-31 13:54:25,149 [myid:] - INFO  [main:JMXEnv@250] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port11221
[junit] 2016-07-31 13:54:25,149 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@58] - Memory used 9197
[junit] 2016-07-31 13:54:25,150 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@63] - Number of threads 20
[junit] 2016-07-31 13:54:25,150 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@78] - FINISHED TEST METHOD testQuota
[junit] 2016-07-31 13:54:25,150 [myid:] - INFO  [main:ClientBase@520] - 
tearDown starting
[junit] 2016-07-31 13:54:26,131 [myid:] - INFO  [main:ZooKeeper@684] - 
Session: 0x156413bf0c7 closed
[junit] 2016-07-31 13:54:26,131 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@519] - EventThread shut down for 
session: 0x156413bf0c7
[junit] 2016-07-31 13:54:26,131 [myid:] - INFO  [main:ClientBase@490] - 
STOPPING server
[junit] 2016-07-31 13:54:26,133 [myid:] - INFO  [main:ZooKeeperServer@469] 
- shutting down
[junit] 2016-07-31 13:54:26,133 [myid:] - INFO  
[main:SessionTrackerImpl@225] - 

[jira] [Commented] (ZOOKEEPER-2479) Add 'fleTimeTaken' value in LeaderMXBean and FollowerMXBean

2016-07-31 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401240#comment-15401240
 ] 

Flavio Junqueira commented on ZOOKEEPER-2479:
-

Thanks, [~rakesh_r]. Why do we need to make {{fleTimeTaken = -1}} while 
election is ongoing? Isn't it better if we serve the value of the last 
election and change it when the election completes? That sounds better than 
serving a special value, and I see no reason why that special value would be 
useful.
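
For illustration, a minimal standalone sketch of the behavior being suggested 
(names are hypothetical, not the actual ZOOKEEPER-2479 patch): the getter 
keeps serving the duration of the last completed election instead of a -1 
sentinel.

{code}
// Toy sketch, not the real patch: overwrite only when an election completes,
// so JMX readers always see the last finished election's duration.
public class ElectionTimer {
    private volatile long fleTimeTakenMs = 0; // last completed election

    public void electionFinished(long startMs, long endMs) {
        fleTimeTakenMs = endMs - startMs; // updated only on completion
    }

    // While a new election is running, callers still see the previous
    // election's value rather than a special -1.
    public long getFleTimeTaken() {
        return fleTimeTakenMs;
    }
}
{code}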

> Add 'fleTimeTaken' value in LeaderMXBean and FollowerMXBean
> ---
>
> Key: ZOOKEEPER-2479
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2479
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: 3.4.9, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2479.patch, ZOOKEEPER-2479.patch, 
> ZOOKEEPER-2479.patch
>
>
> The idea of this jira is to expose {{time taken}} for the leader election via 
> jmx Leader, Follower beans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


ZooKeeper_branch35_solaris - Build # 192 - Still Failing

2016-07-31 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_solaris/192/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 423570 lines...]
[junit] 2016-07-31 17:24:22,606 [myid:] - INFO  [main:JMXEnv@146] - 
ensureOnly:[]
[junit] 2016-07-31 17:24:22,607 [myid:] - INFO  [main:ClientBase@466] - 
STARTING server
[junit] 2016-07-31 17:24:22,607 [myid:] - INFO  [main:ClientBase@386] - 
CREATING server instance 127.0.0.1:11222
[junit] 2016-07-31 17:24:22,607 [myid:] - INFO  
[main:NIOServerCnxnFactory@673] - Configuring NIO connection handler with 10s 
sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 
kB direct buffers.
[junit] 2016-07-31 17:24:22,608 [myid:] - INFO  
[main:NIOServerCnxnFactory@686] - binding to port 0.0.0.0/0.0.0.0:11222
[junit] 2016-07-31 17:24:22,608 [myid:] - INFO  [main:ClientBase@361] - 
STARTING server instance 127.0.0.1:11222
[junit] 2016-07-31 17:24:22,609 [myid:] - INFO  [main:ZooKeeperServer@858] 
- minSessionTimeout set to 6000
[junit] 2016-07-31 17:24:22,609 [myid:] - INFO  [main:ZooKeeperServer@867] 
- maxSessionTimeout set to 6
[junit] 2016-07-31 17:24:22,609 [myid:] - INFO  [main:ZooKeeperServer@156] 
- Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 
6 datadir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch35_solaris/branch-3.5/build/test/tmp/test2110867843338043533.junit.dir/version-2
 snapdir 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch35_solaris/branch-3.5/build/test/tmp/test2110867843338043533.junit.dir/version-2
[junit] 2016-07-31 17:24:22,610 [myid:] - INFO  [main:FileSnap@83] - 
Reading snapshot 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch35_solaris/branch-3.5/build/test/tmp/test2110867843338043533.junit.dir/version-2/snapshot.b
[junit] 2016-07-31 17:24:22,612 [myid:] - INFO  [main:FileTxnSnapLog@298] - 
Snapshotting: 0xb to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch35_solaris/branch-3.5/build/test/tmp/test2110867843338043533.junit.dir/version-2/snapshot.b
[junit] 2016-07-31 17:24:22,613 [myid:] - INFO  
[main:FourLetterWordMain@85] - connecting to 127.0.0.1 11222
[junit] 2016-07-31 17:24:22,613 [myid:] - INFO  
[NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11222:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:55461
[junit] 2016-07-31 17:24:22,614 [myid:] - INFO  
[NIOWorkerThread-1:NIOServerCnxn@485] - Processing stat command from 
/127.0.0.1:55461
[junit] 2016-07-31 17:24:22,614 [myid:] - INFO  
[NIOWorkerThread-1:StatCommand@49] - Stat command output
[junit] 2016-07-31 17:24:22,615 [myid:] - INFO  
[NIOWorkerThread-1:NIOServerCnxn@607] - Closed socket connection for client 
/127.0.0.1:55461 (no session established for client)
[junit] 2016-07-31 17:24:22,615 [myid:] - INFO  [main:JMXEnv@228] - 
ensureParent:[InMemoryDataTree, StandaloneServer_port]
[junit] 2016-07-31 17:24:22,616 [myid:] - INFO  [main:JMXEnv@245] - 
expect:InMemoryDataTree
[junit] 2016-07-31 17:24:22,616 [myid:] - INFO  [main:JMXEnv@249] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port11222,name1=InMemoryDataTree
[junit] 2016-07-31 17:24:22,616 [myid:] - INFO  [main:JMXEnv@245] - 
expect:StandaloneServer_port
[junit] 2016-07-31 17:24:22,616 [myid:] - INFO  [main:JMXEnv@249] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port11222
[junit] 2016-07-31 17:24:22,617 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 17843
[junit] 2016-07-31 17:24:22,617 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 24
[junit] 2016-07-31 17:24:22,617 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD 
testQuota
[junit] 2016-07-31 17:24:22,617 [myid:] - INFO  [main:ClientBase@543] - 
tearDown starting
[junit] 2016-07-31 17:24:22,692 [myid:] - INFO  [main:ZooKeeper@1313] - 
Session: 0x122bef83ed4 closed
[junit] 2016-07-31 17:24:22,692 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down for 
session: 0x122bef83ed4
[junit] 2016-07-31 17:24:22,692 [myid:] - INFO  [main:ClientBase@513] - 
STOPPING server
[junit] 2016-07-31 17:24:22,692 [myid:] - INFO  
[NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11222:NIOServerCnxnFactory$AcceptThread@219]
 - accept thread exitted run method
[junit] 2016-07-31 17:24:22,693 [myid:] - INFO  
[NIOServerCxnFactory.SelectorThread-1:NIOServerCnxnFactory$SelectorThread@420] 
- selector thread exitted run method
[junit] 2016-07-31 17:24:22,693 [myid:] - INFO  
[ConnnectionExpirer:NIOServerCnxnF

[jira] [Commented] (ZOOKEEPER-2466) Client skips servers when trying to connect

2016-07-31 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401261#comment-15401261
 ] 

Flavio Junqueira commented on ZOOKEEPER-2466:
-

[~hanm] from your latest message, I understand that this patch still needs some 
more work wrt:
# Test case
# Java client

I'm canceling the patch for now.

> Client skips servers when trying to connect
> ---
>
> Key: ZOOKEEPER-2466
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2466
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Reporter: Flavio Junqueira
>Assignee: Michael Han
>Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2466.patch
>
>
> I've been looking at {{Zookeeper_simpleSystem::testFirstServerDown}} and I 
> observed the following behavior. The list of servers to connect contains two 
> servers, let's call them S1 and S2. The client never connects, but the odd 
> bit is the sequence of servers that the client tries to connect to:
> {noformat}
> S1
> S2
> S1
> S1
> S1
> 
> {noformat}
> It intrigued me that S2 is only tried once and never again. Checking the 
> code, here is what happens. Initially, {{zh->reconfig}} is 1, so in 
> {{zoo_cycle_next_server}} we return an address from 
> {{get_next_server_in_reconfig}}, which is taken from {{zh->addrs_new}} in 
> this test case. The attempt to connect fails, and {{handle_error}} is invoked 
> in the error handling path. {{handle_error}} actually invokes 
> {{addrvec_next}} which changes the address pointer to the next server on the 
> list.
> After two attempts, it decides that it has tried all servers in 
> {{zoo_cycle_next_server}} and sets {{zh->reconfig}} to zero. Once 
> {{zh->reconfig == 0}}, we have that each call to {{zoo_cycle_next_server}} 
> moves the address pointer to the next server in {{zh->addrs}}. But, given 
> that {{handle_error}} also moves the pointer to the next server, we end up 
> moving the pointer ahead twice upon every failed attempt to connect, which is 
> wrong.
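
A standalone model may make the double advance easier to see. This is 
illustrative Java standing in for the C client's address cursor, not the 
actual {{zookeeper.c}} code; it models only the {{zh->reconfig == 0}} phase, 
after the first two attempts (S1, S2) have already consumed the reconfig list:

{code}
import java.util.List;

// Toy model of the ZOOKEEPER-2466 cursor bug (illustrative names only).
public class CursorModel {
    public static void main(String[] args) {
        List<String> addrs = List.of("S1", "S2");
        int next = 1; // cursor position after the two reconfig attempts

        for (int attempt = 0; attempt < 5; attempt++) {
            // zoo_cycle_next_server with reconfig == 0: advance, then pick.
            next = (next + 1) % addrs.size();
            System.out.println("connecting to " + addrs.get(next));

            // The connect fails; handle_error() also advances the cursor
            // (via addrvec_next), a second advance for the same attempt.
            next = (next + 1) % addrs.size();
        }
        // With two servers, +2 per attempt is a full cycle: every attempt
        // lands on S1 and S2 is never tried again -- the reported sequence.
    }
}
{code}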



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2466) Client skips servers when trying to connect

2016-07-31 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401274#comment-15401274
 ] 

Michael Han commented on ZOOKEEPER-2466:


Yes those are on my list - I'll submit a new one next week. 

> Client skips servers when trying to connect
> ---
>
> Key: ZOOKEEPER-2466
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2466
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Reporter: Flavio Junqueira
>Assignee: Michael Han
>Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2466.patch
>
>
> I've been looking at {{Zookeeper_simpleSystem::testFirstServerDown}} and I 
> observed the following behavior. The list of servers to connect contains two 
> servers, let's call them S1 and S2. The client never connects, but the odd 
> bit is the sequence of servers that the client tries to connect to:
> {noformat}
> S1
> S2
> S1
> S1
> S1
> 
> {noformat}
> It intrigued me that S2 is only tried once and never again. Checking the 
> code, here is what happens. Initially, {{zh->reconfig}} is 1, so in 
> {{zoo_cycle_next_server}} we return an address from 
> {{get_next_server_in_reconfig}}, which is taken from {{zh->addrs_new}} in 
> this test case. The attempt to connect fails, and {{handle_error}} is invoked 
> in the error handling path. {{handle_error}} actually invokes 
> {{addrvec_next}} which changes the address pointer to the next server on the 
> list.
> After two attempts, it decides that it has tried all servers in 
> {{zoo_cycle_next_server}} and sets {{zh->reconfig}} to zero. Once 
> {{zh->reconfig == 0}}, we have that each call to {{zoo_cycle_next_server}} 
> moves the address pointer to the next server in {{zh->addrs}}. But, given 
> that {{handle_error}} also moves the pointer to the next server, we end up 
> moving the pointer ahead twice upon every failed attempt to connect, which is 
> wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2454) Limit Connection Count based on User

2016-07-31 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401280#comment-15401280
 ] 

Flavio Junqueira commented on ZOOKEEPER-2454:
-

[~arshad.mohammad] [~botond.hejj] [~eribeiro] thanks everyone for the patch 
and the reviews. As I'm reading it, there are two main concerns:

# Absence of support for Netty
# Definition of user (or use id directly?)

What's the current plan? My take is that we need Netty support, because going 
forward folks will be using the Netty option more because of, for example, SSL 
support. I also think that we need a crisp story around users to avoid 
problems with future auth providers. It affects the semantics that the service 
exposes, so we need to be extra careful.

It is a great feature, though. We should get it in once we sort out these 
issues. 

> Limit Connection Count based on User
> 
>
> Key: ZOOKEEPER-2454
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2454
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: server
>Reporter: Botond Hejj
>Assignee: Botond Hejj
>Priority: Minor
> Attachments: ZOOKEEPER-2454-br-3-4.patch, ZOOKEEPER-2454.patch, 
> ZOOKEEPER-2454.patch
>
>
> ZooKeeper currently can limit the connection count from clients coming from 
> the same ip. It is a great feature for protecting the server from 
> malfunctioning clients DOS-ing it with many requests.
> I propose additional safeguards for ZooKeeper. 
> It would be great if, optionally, the connection count could be limited for a 
> specific user or a specific user on an ip.
> This is great in cases where a ZooKeeper ensemble is shared by multiple users 
> and these users share the same client ips. This can be common in container 
> based cloud deployments, where the external ip of multiple clients can be the 
> same.
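
For a sense of what the proposed safeguard involves, here is a minimal 
standalone sketch of per-user connection counting. The class and method names 
are hypothetical; the actual approach is in the attached ZOOKEEPER-2454 
patches and differs in detail:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a per-user connection limit (not the patch).
public class UserConnectionLimiter {
    private final int maxPerUser;
    private final Map<String, AtomicInteger> counts = new ConcurrentHashMap<>();

    public UserConnectionLimiter(int maxPerUser) {
        this.maxPerUser = maxPerUser;
    }

    // Called once a connection has authenticated and the user is known.
    // The key could equally be user + client ip for the per-user-per-ip case.
    public boolean tryAcquire(String user) {
        AtomicInteger n = counts.computeIfAbsent(user, u -> new AtomicInteger());
        if (n.incrementAndGet() > maxPerUser) {
            n.decrementAndGet(); // over the limit: roll back and reject
            return false;
        }
        return true;
    }

    public void release(String user) {
        AtomicInteger n = counts.get(user);
        if (n != null) {
            n.decrementAndGet();
        }
    }
}
{code}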



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2495) Cluster unavailable on disk full(ENOSPC), disk quota(EDQUOT), disk write error(EIO) errors

2016-07-31 Thread Ramnatthan Alagappan (JIRA)
Ramnatthan Alagappan created ZOOKEEPER-2495:
---

 Summary: Cluster unavailable on disk full(ENOSPC), disk 
quota(EDQUOT), disk write error(EIO) errors
 Key: ZOOKEEPER-2495
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2495
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, server
Affects Versions: 3.4.8
 Environment: Normal ZooKeeper cluster with 3 Linux nodes.
Reporter: Ramnatthan Alagappan


ZooKeeper cluster completely stalls with *no* transactions making progress when 
a storage related error (such as *ENOSPC, EDQUOT, EIO*) is encountered by the 
current *leader*. 

Surprisingly, the same errors in some circumstances cause the node to 
completely crash, thereby allowing other nodes in the cluster to become the 
leader and make progress with transactions. Interestingly, the same errors, if 
encountered while initializing a new log file, cause the current leader to go 
into a weird state (but not crash) in which it thinks it is the leader (and so 
does not allow others to become the leader). *This causes the entire cluster 
to freeze.*

Here is the stacktrace of the leader:



2016-07-11 15:42:27,502 [myid:3] - INFO  [SyncThread:3:FileTxnLog@199] - 
Creating new log file: log.20001
2016-07-11 15:42:27,505 [myid:3] - ERROR 
[SyncThread:3:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from 
thread : SyncThread:3
java.io.IOException: Disk quota exceeded
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at 
org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:211)
at 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:314)
at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:476)
at 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:140)



From the trace and the code, it looks like the problem happens only when a new 
log file is initialized, and only when there are errors in two cases:

1. Error during the append of the *log header*.
2. Error while *padding zero bytes to the end of the log*.

If similar errors happen when writing other blocks of data, the node just 
crashes completely, allowing others to be elected as the new leader. These two 
blocks of the newly created log file are special because they take a different 
error recovery code path: the node does not completely crash, but rather 
certain threads are killed while, supposedly, the quorum-holding thread stays 
up, thereby preventing others from becoming the new leader. This causes the 
other nodes to think that there is no problem with the leader, while the 
cluster becomes unavailable for any subsequent operations such as read/write. 
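
A toy, runnable model of the asymmetry described above (hypothetical code, not 
ZooKeeper's): an error handler that lets only the failing thread die leaves 
the rest of the process, including whatever holds the quorum, running:

{code}
// Toy model: thread-level failure handling that stalls instead of crashing.
public class StallDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread syncThread = new Thread(() -> {
            // Stand-in for EDQUOT/ENOSPC/EIO during header append or padding.
            throw new RuntimeException("Disk quota exceeded");
        }, "SyncThread");

        // Handler that logs and lets only the failing thread die --
        // analogous to the "does not crash" path described above.
        syncThread.setUncaughtExceptionHandler((t, e) ->
                System.err.println("Severe unrecoverable error, from thread : "
                        + t.getName()));

        syncThread.start();
        syncThread.join();

        // The JVM is still alive here: a leader in this state keeps holding
        // leadership while no transaction can ever be synced again. A handler
        // that instead called System.exit(1) would let the other nodes elect
        // a new leader.
        System.out.println("process still up; cluster would stall");
    }
}
{code}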



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2495) Cluster unavailable on disk full(ENOSPC), disk quota(EDQUOT), disk write error(EIO) errors

2016-07-31 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401406#comment-15401406
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2495:
---

[~ramanala]: out of curiosity, using what filesystem did this happen with? 

> Cluster unavailable on disk full(ENOSPC), disk quota(EDQUOT), disk write 
> error(EIO) errors
> --
>
> Key: ZOOKEEPER-2495
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2495
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, server
>Affects Versions: 3.4.8
> Environment: Normal ZooKeeper cluster with 3 Linux nodes.
>Reporter: Ramnatthan Alagappan
>
> ZooKeeper cluster completely stalls with *no* transactions making progress 
> when a storage related error (such as *ENOSPC, EDQUOT, EIO*) is encountered 
> by the current *leader*. 
> Surprisingly, the same errors in some circumstances cause the node to 
> completely crash, thereby allowing other nodes in the cluster to become the 
> leader and make progress with transactions. Interestingly, the same errors, 
> if encountered while initializing a new log file, cause the current leader 
> to go into a weird state (but not crash) in which it thinks it is the leader 
> (and so does not allow others to become the leader). *This causes the entire 
> cluster to freeze.*
> Here is the stacktrace of the leader:
> 
> 2016-07-11 15:42:27,502 [myid:3] - INFO  [SyncThread:3:FileTxnLog@199] - 
> Creating new log file: log.20001
> 2016-07-11 15:42:27,505 [myid:3] - ERROR 
> [SyncThread:3:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from 
> thread : SyncThread:3
> java.io.IOException: Disk quota exceeded
>   at java.io.FileOutputStream.writeBytes(Native Method)
>   at java.io.FileOutputStream.write(FileOutputStream.java:345)
>   at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>   at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:211)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:314)
>   at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:476)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:140)
> 
> From the trace and the code, it looks like the problem happens only when a 
> new log file is initialized, and only when there are errors in two cases:
> 1. Error during the append of the *log header*.
> 2. Error while *padding zero bytes to the end of the log*.
>  
> If similar errors happen when writing other blocks of data, the node just 
> crashes completely, allowing others to be elected as the new leader. These 
> two blocks of the newly created log file are special because they take a 
> different error recovery code path: the node does not completely crash, but 
> rather certain threads are killed while, supposedly, the quorum-holding 
> thread stays up, thereby preventing others from becoming the new leader. 
> This causes the other nodes to think that there is no problem with the 
> leader, while the cluster becomes unavailable for any subsequent operations 
> such as read/write. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2495) Cluster unavailable on disk full(ENOSPC), disk quota(EDQUOT), disk write error(EIO) errors

2016-07-31 Thread Ramnatthan Alagappan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401423#comment-15401423
 ] 

Ramnatthan Alagappan commented on ZOOKEEPER-2495:
-

[~rgs] : I did not reproduce this issue on any particular file system. I have a 
small testing tool that drives applications into such corner cases by 
simulating possible error conditions. In reality, ENOSPC and EDQUOT can be 
encountered in all modern file systems such as ext4. EIO can be thrown to 
applications if the file system is mounted in a synchronous mode. 


> Cluster unavailable on disk full(ENOSPC), disk quota(EDQUOT), disk write 
> error(EIO) errors
> --
>
> Key: ZOOKEEPER-2495
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2495
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, server
>Affects Versions: 3.4.8
> Environment: Normal ZooKeeper cluster with 3 Linux nodes.
>Reporter: Ramnatthan Alagappan
>
> ZooKeeper cluster completely stalls with *no* transactions making progress 
> when a storage related error (such as *ENOSPC, EDQUOT, EIO*) is encountered 
> by the current *leader*. 
> Surprisingly, the same errors in some circumstances cause the node to 
> completely crash, thereby allowing other nodes in the cluster to become the 
> leader and make progress with transactions. Interestingly, the same errors, 
> if encountered while initializing a new log file, cause the current leader 
> to go into a weird state (but not crash) in which it thinks it is the leader 
> (and so does not allow others to become the leader). *This causes the entire 
> cluster to freeze.*
> Here is the stacktrace of the leader:
> 
> 2016-07-11 15:42:27,502 [myid:3] - INFO  [SyncThread:3:FileTxnLog@199] - 
> Creating new log file: log.20001
> 2016-07-11 15:42:27,505 [myid:3] - ERROR 
> [SyncThread:3:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from 
> thread : SyncThread:3
> java.io.IOException: Disk quota exceeded
>   at java.io.FileOutputStream.writeBytes(Native Method)
>   at java.io.FileOutputStream.write(FileOutputStream.java:345)
>   at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>   at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:211)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:314)
>   at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:476)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:140)
> 
> From the trace and the code, it looks like the problem happens only when a 
> new log file is initialized, and only when there are errors in two cases:
> 1. Error during the append of the *log header*.
> 2. Error while *padding zero bytes to the end of the log*.
>  
> If similar errors happen when writing other blocks of data, the node just 
> crashes completely, allowing others to be elected as the new leader. These 
> two blocks of the newly created log file are special because they take a 
> different error recovery code path: the node does not completely crash, but 
> rather certain threads are killed while, supposedly, the quorum-holding 
> thread stays up, thereby preventing others from becoming the new leader. 
> This causes the other nodes to think that there is no problem with the 
> leader, while the cluster becomes unavailable for any subsequent operations 
> such as read/write. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently

2016-07-31 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401425#comment-15401425
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2080:
---

[~hanm]: thanks for tracking this down and for the patch! A few questions/asks, 
looking at the code:

{code}
Election election = null;
synchronized (self) {
    try {
        rqv = self.configFromString(new String(b));
        QuorumVerifier curQV = self.getQuorumVerifier();
        if (rqv.getVersion() > curQV.getVersion()) {
            LOG.info("{} Received version: {} my version: {}", self.getId(),
                    Long.toHexString(rqv.getVersion()),
                    Long.toHexString(self.getQuorumVerifier().getVersion()));
            if (self.getPeerState() == ServerState.LOOKING) {
                LOG.debug("Invoking processReconfig(), state: {}",
                        self.getServerState());
                self.processReconfig(rqv, null, null, false);
                if (!rqv.equals(curQV)) {
                    LOG.info("restarting leader election");
                    // Signaling quorum peer to restart leader election.
                    self.shuttingDownLE = true;
                    // Get a hold of current leader election object of quorum peer,
                    // so we can clean it up later without holding the lock of quorum
                    // peer. If we shutdown current leader election we will run into
                    // potential deadlock. See ZOOKEEPER-2080 for more details.
                    election = self.getElectionAlg();
                }
            } else {
                LOG.debug("Skip processReconfig(), state: {}",
                        self.getServerState());
            }
        }
    } catch (IOException e) {
        LOG.error("Something went wrong while processing config received from {}",
                response.sid);
    } catch (ConfigException e) {
        LOG.error("Something went wrong while processing config received from {}",
                response.sid);
    }
}
{code}

Do we really need to synchronize around self for the first part?

{code}
rqv = self.configFromString(new String(b));
QuorumVerifier curQV = self.getQuorumVerifier();
if (rqv.getVersion() > curQV.getVersion()) {

{code}

Sounds like that can be done without synchronizing... no? 

Also, given you've spent a good amount of cycles untangling the dependencies 
around locking QuorumPeer, could you maybe add a comment before the 
synchronized(self) block noting why it is needed and who else might be 
contending for this lock? Thanks so much!

I think unit testing these things is a bit tricky; we might get a better 
return by just keeping better comments around synchronized regions and 
generally keeping them well maintained (imho). So I am happy to +1 without 
tests. 
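
For reference, the capture-under-the-lock, clean-up-outside-it pattern that 
the inline comment describes looks roughly like this sketch (illustrative; the 
exact calls live in QuorumPeer and the election implementations):

{code}
Election election = null;
synchronized (self) {
    // Decide, while holding the QuorumPeer lock, that leader election must
    // be restarted; capture the election object but do not shut it down here.
    self.shuttingDownLE = true;
    election = self.getElectionAlg();
}
// Outside the lock: shutting down here cannot deadlock against a thread
// that acquires the QuorumPeer lock from inside the election code.
if (election != null) {
    election.shutdown();
}
{code}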

> ReconfigRecoveryTest fails intermittently
> -
>
> Key: ZOOKEEPER-2080
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Michael Han
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, 
> jacoco-ZOOKEEPER-2080.unzip-grows-to-70MB.7z, repro-20150816.log, 
> threaddump.log
>
>
> I got the following test failure on MacBook with trunk code:
> {code}
> Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
>   FAILED
> waiting for server 2 being up
> junit.framework.AssertionFailedError: waiting for server 2 being up
>   at 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-07-31 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401429#comment-15401429
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-2169:
---

[~fpj]: it's here.

cc: [~randgalt]

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?
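
To make the thin-client side of the proposal concrete, here is a minimal 
sketch of the refresh loop such a client would run. Both {{createWithTtl}} and 
{{refreshTtl}} are assumed names for discussion, not API from the attached 
patches:

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical thin-client refresh loop for a TTL node.
public class TtlRefresher {
    interface ThinClient {
        void createWithTtl(String path, byte[] data, long ttlMs) throws Exception;
        void refreshTtl(String path) throws Exception;
    }

    public static void keepAlive(ThinClient client, String path, long ttlMs)
            throws Exception {
        client.createWithTtl(path, new byte[0], ttlMs);
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        // Refresh well inside the TTL window so a single missed beat survives.
        ses.scheduleAtFixedRate(() -> {
            try {
                client.refreshTtl(path);
            } catch (Exception e) {
                // Stop refreshing: the node then expires after ttlMs, which is
                // exactly the ephemeral-like behavior the proposal wants.
                throw new RuntimeException(e);
            }
        }, ttlMs / 3, ttlMs / 3, TimeUnit.MILLISECONDS);
    }
}
{code}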



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (ZOOKEEPER-2169) Enable creation of nodes with TTLs

2016-07-31 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401429#comment-15401429
 ] 

Raul Gutierrez Segales edited comment on ZOOKEEPER-2169 at 8/1/16 1:31 AM:
---

[~fpj]: it's here: https://reviews.apache.org/r/46983/.

cc: [~randgalt]


was (Author: rgs):
[~fpj]: it's here.

cc: [~randgalt]

> Enable creation of nodes with TTLs
> --
>
> Key: ZOOKEEPER-2169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2169
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: c client, java client, jute, server
>Affects Versions: 3.6.0
>Reporter: Camille Fournier
>Assignee: Jordan Zimmerman
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2169-2.patch, ZOOKEEPER-2169-3.patch, 
> ZOOKEEPER-2169-4.patch, ZOOKEEPER-2169-5.patch, ZOOKEEPER-2169.patch
>
>
> As a user, I would like to be able to create a node that is NOT tied to a 
> session but that WILL expire automatically if action is not taken by some 
> client within a time window.
> I propose this to enable clients interacting with ZK via http or other "thin 
> clients" to create ephemeral-like nodes.
> Some ideas for the design, up for discussion:
> The node should support all normal ZK node operations including ACLs, 
> sequential key generation, etc, however, it should not support the ephemeral 
> flag. The node will be created with a TTL that is updated via a refresh 
> operation. 
> The ZK quorum will watch this node similarly to the way that it watches for 
> session liveness; if the node is not refreshed within the TTL, it will expire.
> QUESTIONS:
> 1) Should we let the refresh operation set the TTL to a different base value?
> 2) If so, should the setting of the TTL to a new base value cause a watch to 
> fire?
> 3) Do we want to allow these nodes to have children or prevent this similar 
> to ephemeral nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2479) Add 'fleTimeTaken' value in LeaderMXBean and FollowerMXBean

2016-07-31 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401440#comment-15401440
 ] 

Rakesh R commented on ZOOKEEPER-2479:
-

During FLE, the server will go to the {{LOOKING}} state after unregistering 
the LeaderBean or FollowerBean, so the jmx bean won't be available to users, 
and I think resetting the value has no impact here. But I reset the 
{{QuorumPeer#fleTimeTaken}} variable to convey that during FLE the value will 
be -1, showing that LE is in progress. I don't have a strong reason for this; 
it's a new variable and we have a chance to define the semantics. I will 
remove the resetting and upload a new patch.
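
A toy model (hypothetical names) of the lifecycle described above: the 
Leader/Follower bean only exists outside of {{LOOKING}}, so a -1 sentinel set 
during FLE is never actually observable through JMX:

{code}
// Toy model of the bean lifecycle around FLE (not ZooKeeper code).
enum PeerState { LOOKING, LEADING, FOLLOWING }

class PeerModel {
    PeerState state = PeerState.LOOKING;
    long fleTimeTaken = -1;         // the reset being discussed
    boolean beanRegistered = false; // Leader/Follower bean absent while LOOKING

    void finishElection(long elapsedMs) {
        fleTimeTaken = elapsedMs;
        state = PeerState.LEADING;
        beanRegistered = true;      // the value only becomes visible now
    }
}
{code}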

> Add 'fleTimeTaken' value in LeaderMXBean and FollowerMXBean
> ---
>
> Key: ZOOKEEPER-2479
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2479
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: 3.4.9, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2479.patch, ZOOKEEPER-2479.patch, 
> ZOOKEEPER-2479.patch
>
>
> The idea of this jira is to expose {{time taken}} for the leader election via 
> jmx Leader, Follower beans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2479) Add 'fleTimeTaken' value in LeaderMXBean and FollowerMXBean

2016-07-31 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated ZOOKEEPER-2479:

Attachment: ZOOKEEPER-2479.patch

> Add 'fleTimeTaken' value in LeaderMXBean and FollowerMXBean
> ---
>
> Key: ZOOKEEPER-2479
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2479
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: 3.4.9, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2479.patch, ZOOKEEPER-2479.patch, 
> ZOOKEEPER-2479.patch, ZOOKEEPER-2479.patch
>
>
> The idea of this jira is to expose {{time taken}} for the leader election via 
> jmx Leader, Follower beans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2479) Add 'fleTimeTaken' value in LeaderMXBean and FollowerMXBean

2016-07-31 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401466#comment-15401466
 ] 

Rakesh R commented on ZOOKEEPER-2479:
-

Attached a new patch addressing [~fpj]'s comment. It seems we need a separate 
patch for branch-3.4; I will prepare it once I get a +1 for the trunk patch. 
Thanks!

> Add 'fleTimeTaken' value in LeaderMXBean and FollowerMXBean
> ---
>
> Key: ZOOKEEPER-2479
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2479
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: 3.4.9, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2479.patch, ZOOKEEPER-2479.patch, 
> ZOOKEEPER-2479.patch, ZOOKEEPER-2479.patch
>
>
> The idea of this jira is to expose {{time taken}} for the leader election via 
> jmx Leader, Follower beans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Success: ZOOKEEPER-2479 PreCommit Build #3312

2016-07-31 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2479
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3312/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 441428 lines...]
 [exec]   
http://issues.apache.org/jira/secure/attachment/12821253/ZOOKEEPER-2479.patch
 [exec]   against trunk revision 1754582.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 2.0.3) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3312//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3312//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3312//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] b8520af1e6733fc01fb321c45faf534de7b07282 logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 

BUILD SUCCESSFUL
Total time: 19 minutes 42 seconds
Archiving artifacts
Setting 
LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7
Recording test results
Setting 
LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7
Setting 
LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7
[description-setter] Description set: ZOOKEEPER-2479
Email was triggered for: Success
Sending email for trigger: Success
Setting 
LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7
Setting 
LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7
Setting 
LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7
Setting 
LATEST1_7_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2479) Add 'fleTimeTaken' value in LeaderMXBean and FollowerMXBean

2016-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401475#comment-15401475
 ] 

Hadoop QA commented on ZOOKEEPER-2479:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12821253/ZOOKEEPER-2479.patch
  against trunk revision 1754582.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3312//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3312//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3312//console

This message is automatically generated.

> Add 'fleTimeTaken' value in LeaderMXBean and FollowerMXBean
> ---
>
> Key: ZOOKEEPER-2479
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2479
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: 3.4.9, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2479.patch, ZOOKEEPER-2479.patch, 
> ZOOKEEPER-2479.patch, ZOOKEEPER-2479.patch
>
>
> The idea of this jira is to expose {{time taken}} for the leader election via 
> jmx Leader, Follower beans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2491) C client build error in vs 2015

2016-07-31 Thread spooky000 (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401539#comment-15401539
 ] 

spooky000 commented on ZOOKEEPER-2491:
--

I think "-1 core tests" is not releated my patch.


> C client build error in vs 2015 
> 
>
> Key: ZOOKEEPER-2491
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2491
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.5.2
> Environment: windows vs 2015
>Reporter: spooky000
>Assignee: spooky000
>Priority: Minor
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2491.patch, ZOOKEEPER-2491.patch, 
> ZOOKEEPER-2491.patch
>
>
> Visual Studio 2015 supports snprintf, so
> #define snprintf _snprintf throws an error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)