[jira] [Commented] (ZOOKEEPER-2091) Possible logic error in ClientCnxnSocketNIO
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243864#comment-14243864 ] Rakesh R commented on ZOOKEEPER-2091: - bq.I am confused how it reads the returning result without finishing sending the entire packet? As I remember the situation happened in one of the cluster due to the heavy system resource usage like cpu/memory was high(due to map reduce job execution). It was continuously hitting the below exception and all the operations was failing. {code} org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1127) java.io.IOException: Nothing in the queue, but got 425467 at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:788) {code} Now with the patch, the cluster is running and no issue reported till now. Possible logic error in ClientCnxnSocketNIO --- Key: ZOOKEEPER-2091 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2091 Project: ZooKeeper Issue Type: Bug Components: java client Affects Versions: 3.4.6 Reporter: Cheng Assignee: Rakesh R Fix For: 3.5.1 Attachments: ZOOKEEPER-2091.patch When SASL authentication is enabled, the ZooKeeper client will finally call ClientCnxnSocketNIO#sendPacket(Packet p) to send a packet to server: @Override void sendPacket(Packet p) throws IOException { SocketChannel sock = (SocketChannel) sockKey.channel(); if (sock == null) { throw new IOException(Socket is null!); } p.createBB(); ByteBuffer pbb = p.bb; sock.write(pbb); } One problem I can see is that the sock is non-blocking, so when the sock's output buffer is full(theoretically), only part of the Packet is sent out and the communication will break. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
ZooKeeper-trunk-ibm6 - Build # 704 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-ibm6/704/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 361384 lines...] [junit] 2014-12-12 09:32:18,574 [myid:] - INFO [main:MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port11221] [junit] 2014-12-12 09:32:18,575 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2014-12-12 09:32:18,576 [myid:] - INFO [main:JMXEnv@142] - ensureOnly:[] [junit] 2014-12-12 09:32:18,578 [myid:] - INFO [main:ClientBase@443] - STARTING server [junit] 2014-12-12 09:32:18,578 [myid:] - INFO [main:ClientBase@364] - CREATING server instance 127.0.0.1:11221 [junit] 2014-12-12 09:32:18,579 [myid:] - INFO [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 32 worker threads, and 64 kB direct buffers. [junit] 2014-12-12 09:32:18,579 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2014-12-12 09:32:18,580 [myid:] - INFO [main:ClientBase@339] - STARTING server instance 127.0.0.1:11221 [junit] 2014-12-12 09:32:18,580 [myid:] - INFO [main:ZooKeeperServer@781] - minSessionTimeout set to 6000 [junit] 2014-12-12 09:32:18,580 [myid:] - INFO [main:ZooKeeperServer@790] - maxSessionTimeout set to 6 [junit] 2014-12-12 09:32:18,581 [myid:] - INFO [main:ZooKeeperServer@152] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-ibm6/trunk/build/test/tmp/test5240311933835984256.junit.dir/version-2 snapdir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-ibm6/trunk/build/test/tmp/test5240311933835984256.junit.dir/version-2 [junit] 2014-12-12 09:32:18,582 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-ibm6/trunk/build/test/tmp/test5240311933835984256.junit.dir/version-2/snapshot.b [junit] 2014-12-12 09:32:18,585 [myid:] - INFO [main:FileTxnSnapLog@298] - Snapshotting: 0xb to /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-ibm6/trunk/build/test/tmp/test5240311933835984256.junit.dir/version-2/snapshot.b [junit] 2014-12-12 09:32:18,587 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2014-12-12 09:32:18,588 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:57199 [junit] 2014-12-12 09:32:18,589 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@835] - Processing stat command from /127.0.0.1:57199 [junit] 2014-12-12 09:32:18,589 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@684] - Stat command output [junit] 2014-12-12 09:32:18,590 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@1006] - Closed socket connection for client /127.0.0.1:57199 (no session established for client) [junit] 2014-12-12 09:32:18,590 [myid:] - INFO [main:JMXEnv@224] - ensureParent:[InMemoryDataTree, StandaloneServer_port] [junit] 2014-12-12 09:32:18,593 [myid:] - INFO [main:JMXEnv@241] - expect:InMemoryDataTree [junit] 2014-12-12 09:32:18,594 [myid:] - INFO [main:JMXEnv@245] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port11221,name1=InMemoryDataTree [junit] 2014-12-12 09:32:18,594 [myid:] - INFO [main:JMXEnv@241] - expect:StandaloneServer_port [junit] 2014-12-12 09:32:18,594 [myid:] - INFO [main:JMXEnv@245] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port11221 [junit] 2014-12-12 09:32:18,594 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@55] - Memory used 5433 [junit] 2014-12-12 09:32:18,595 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@60] - Number of threads 40 [junit] 2014-12-12 09:32:18,595 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@65] - FINISHED TEST METHOD testQuota [junit] 2014-12-12 09:32:18,595 [myid:] - INFO [main:ClientBase@520] - tearDown starting [junit] 2014-12-12 09:32:18,628 [myid:] - INFO [main:ZooKeeper@968] - Session: 0x14a3dd7760f closed [junit] 2014-12-12 09:32:18,628 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@529] - EventThread shut down [junit] 2014-12-12 09:32:18,628 [myid:] - INFO [main:ClientBase@490] - STOPPING server [junit] 2014-12-12 09:32:18,629 [myid:] - INFO [ConnnectionExpirer:NIOServerCnxnFactory$ConnectionExpirerThread@583] - ConnnectionExpirerThread interrupted [junit] 2014-12-12 09:32:18,629 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method
ZooKeeper-trunk - Build # 2529 - Failure
See https://builds.apache.org/job/ZooKeeper-trunk/2529/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 330090 lines...] [junit] 2014-12-12 11:10:53,928 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@835] - Processing stat command from /127.0.0.1:40400 [junit] 2014-12-12 11:10:53,929 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@684] - Stat command output [junit] 2014-12-12 11:10:53,929 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@1006] - Closed socket connection for client /127.0.0.1:40400 (no session established for client) [junit] 2014-12-12 11:10:53,930 [myid:] - INFO [main:JMXEnv@224] - ensureParent:[InMemoryDataTree, StandaloneServer_port] [junit] 2014-12-12 11:10:53,932 [myid:] - INFO [main:JMXEnv@241] - expect:InMemoryDataTree [junit] 2014-12-12 11:10:53,932 [myid:] - INFO [main:JMXEnv@245] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port11221,name1=InMemoryDataTree [junit] 2014-12-12 11:10:53,932 [myid:] - INFO [main:JMXEnv@241] - expect:StandaloneServer_port [junit] 2014-12-12 11:10:53,932 [myid:] - INFO [main:JMXEnv@245] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port11221 [junit] 2014-12-12 11:10:53,933 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@55] - Memory used 83951 [junit] 2014-12-12 11:10:53,933 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@60] - Number of threads 24 [junit] 2014-12-12 11:10:53,933 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@65] - FINISHED TEST METHOD testQuota [junit] 2014-12-12 11:10:53,933 [myid:] - INFO [main:ClientBase@520] - tearDown starting [junit] 2014-12-12 11:10:53,973 [myid:] - INFO [main:ZooKeeper@968] - Session: 0x14a3e31b8ba closed [junit] 2014-12-12 11:10:53,974 [myid:] - INFO [main:ClientBase@490] - STOPPING server [junit] 2014-12-12 11:10:53,974 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@529] - EventThread shut down [junit] 2014-12-12 11:10:53,974 [myid:] - INFO [ConnnectionExpirer:NIOServerCnxnFactory$ConnectionExpirerThread@583] - ConnnectionExpirerThread interrupted [junit] 2014-12-12 11:10:53,974 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method [junit] 2014-12-12 11:10:53,974 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-1:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method [junit] 2014-12-12 11:10:53,974 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@219] - accept thread exitted run method [junit] 2014-12-12 11:10:53,975 [myid:] - INFO [main:ZooKeeperServer@443] - shutting down [junit] 2014-12-12 11:10:53,976 [myid:] - INFO [main:SessionTrackerImpl@231] - Shutting down [junit] 2014-12-12 11:10:53,976 [myid:] - INFO [main:PrepRequestProcessor@971] - Shutting down [junit] 2014-12-12 11:10:53,976 [myid:] - INFO [main:SyncRequestProcessor@191] - Shutting down [junit] 2014-12-12 11:10:53,976 [myid:] - INFO [ProcessThread(sid:0 cport:11221)::PrepRequestProcessor@155] - PrepRequestProcessor exited loop! [junit] 2014-12-12 11:10:53,977 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@169] - SyncRequestProcessor exited! [junit] 2014-12-12 11:10:53,977 [myid:] - INFO [main:FinalRequestProcessor@476] - shutdown of request processor complete [junit] 2014-12-12 11:10:53,978 [myid:] - INFO [main:MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port11221,name1=InMemoryDataTree] [junit] 2014-12-12 11:10:53,978 [myid:] - INFO [main:MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port11221] [junit] 2014-12-12 11:10:53,979 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2014-12-12 11:10:53,980 [myid:] - INFO [main:JMXEnv@142] - ensureOnly:[] [junit] 2014-12-12 11:10:53,984 [myid:] - INFO [main:ClientBase@545] - fdcount after test is: 46 at start it was 33 [junit] 2014-12-12 11:10:53,985 [myid:] - INFO [main:ClientBase@547] - sleeping for 20 secs [junit] 2014-12-12 11:10:53,986 [myid:] - INFO [main:ZKTestCase$1@65] - SUCCEEDED testQuota [junit] 2014-12-12 11:10:53,986 [myid:] - INFO [main:ZKTestCase$1@60] - FINISHED testQuota [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.702 sec [junit] 2014-12-12 11:10:54,000 [myid:] - INFO [SessionTracker:SessionTrackerImpl@157] - SessionTrackerImpl exited loop! [junit] 2014-12-12 11:10:54,000 [myid:] - INFO [SessionTracker:SessionTrackerImpl@157] - SessionTrackerImpl exited loop! [junit] 2014-12-12 11:10:54,001
Re: Review Request 27244: ZOOKEEPER-2069
On Dec. 11, 2014, 7:37 p.m., Rakesh R wrote: src/java/main/org/apache/zookeeper/ClientCnxnSocketNetty.java, line 108 https://reviews.apache.org/r/27244/diff/28/?file=789550#file789550line108 bootstrap.setOption(connectTimeoutMillis, System.getProperty(connectTimeoutMillis)); What do you think about providing 'clientConnectTimeoutMillis' as a configurable item. By default its 10secs. Hongchao Deng wrote: Sounds good. But currently the timeout thing is handled outside in ClientCnxn sendThread. This SSL thing will gradually drive transformation to Netty style connection :) ok. agreed. On Dec. 11, 2014, 7:37 p.m., Rakesh R wrote: src/java/main/org/apache/zookeeper/ClientCnxnSocketNetty.java, line 99 https://reviews.apache.org/r/27244/diff/28/?file=789550#file789550line99 Could you tell me when this situation can come? Is this added for safer side? Hongchao Deng wrote: I will remove it. I had this code for safe check when I didn't know ClientCnxn so well. ok. fine On Dec. 11, 2014, 7:37 p.m., Rakesh R wrote: src/java/main/org/apache/zookeeper/ClientCnxnSocketNetty.java, line 63 https://reviews.apache.org/r/27244/diff/28/?file=789550#file789550line63 Can you give thread name for this thread pool like, Executors.newCachedThreadPool(new ThreadFactoryBuilder() .setNameFormat(ZooKeeper-NIOBoss-%d).build()), Executors.newCachedThreadPool(new ThreadFactoryBuilder() .setNameFormat(ZooKeeper-NIOWorker-%d).build()) Hongchao Deng wrote: ThreadFactoryBuilder is from Guava... We didn't have it yet. oh yes. Good if we can name the thread, otw leave it:) - Rakesh --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27244/#review64767 --- On Dec. 12, 2014, 12:14 a.m., Hongchao Deng wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27244/ --- (Updated Dec. 12, 2014, 12:14 a.m.) Review request for zookeeper. Repository: zookeeper-git Description --- ZOOKEEPER-2069 Diffs - build.xml bb5ff4f src/java/main/org/apache/zookeeper/ClientCnxn.java b4ece07 src/java/main/org/apache/zookeeper/ClientCnxnSocket.java 5ca0ba7 src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java adb27ee src/java/main/org/apache/zookeeper/ClientCnxnSocketNetty.java PRE-CREATION src/java/main/org/apache/zookeeper/ZooKeeperTestable.java 775d1a2 src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java dbc1080 src/java/test/org/apache/zookeeper/test/ClientTest.java dbe595c src/java/test/org/apache/zookeeper/test/NettyNettySuiteBase.java PRE-CREATION src/java/test/org/apache/zookeeper/test/NettyNettySuiteHammerTest.java PRE-CREATION src/java/test/org/apache/zookeeper/test/NettyNettySuiteTest.java PRE-CREATION Diff: https://reviews.apache.org/r/27244/diff/ Testing --- 1. use LinkedBlockingDeque. Thanks, Hongchao Deng
ZooKeeper_branch35_jdk7 - Build # 139 - Still Failing
See https://builds.apache.org/job/ZooKeeper_branch35_jdk7/139/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 336860 lines...] [exec] mv -f .deps/zktest_mt-TestWatchers.Tpo .deps/zktest_mt-TestWatchers.Po [exec] g++ -DHAVE_CONFIG_H -I. -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/include -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/tests -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/generated -DUSE_STATIC_LIB -DTHREADED -DZKSERVER_CMD=\/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/tests/zkServer.sh\ -DZOO_IPV6_ENABLED -g -O2 -MT zktest_mt-TestClient.o -MD -MP -MF .deps/zktest_mt-TestClient.Tpo -c -o zktest_mt-TestClient.o `test -f 'tests/TestClient.cc' || echo '/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/'`tests/TestClient.cc [exec] mv -f .deps/zktest_mt-TestClient.Tpo .deps/zktest_mt-TestClient.Po [exec] g++ -DHAVE_CONFIG_H -I. -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/include -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/tests -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/generated -DUSE_STATIC_LIB -DTHREADED -DZKSERVER_CMD=\/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/tests/zkServer.sh\ -DZOO_IPV6_ENABLED -g -O2 -MT zktest_mt-ZooKeeperQuorumServer.o -MD -MP -MF .deps/zktest_mt-ZooKeeperQuorumServer.Tpo -c -o zktest_mt-ZooKeeperQuorumServer.o `test -f 'tests/ZooKeeperQuorumServer.cc' || echo '/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/'`tests/ZooKeeperQuorumServer.cc [exec] mv -f .deps/zktest_mt-ZooKeeperQuorumServer.Tpo .deps/zktest_mt-ZooKeeperQuorumServer.Po [exec] g++ -DHAVE_CONFIG_H -I. -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/include -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/tests -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/generated -DUSE_STATIC_LIB -DTHREADED -DZKSERVER_CMD=\/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/tests/zkServer.sh\ -DZOO_IPV6_ENABLED -g -O2 -MT zktest_mt-TestReadOnlyClient.o -MD -MP -MF .deps/zktest_mt-TestReadOnlyClient.Tpo -c -o zktest_mt-TestReadOnlyClient.o `test -f 'tests/TestReadOnlyClient.cc' || echo '/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/'`tests/TestReadOnlyClient.cc [exec] mv -f .deps/zktest_mt-TestReadOnlyClient.Tpo .deps/zktest_mt-TestReadOnlyClient.Po [exec] g++ -DHAVE_CONFIG_H -I. -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/include -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/tests -I/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/generated -DUSE_STATIC_LIB -DTHREADED -DZKSERVER_CMD=\/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/tests/zkServer.sh\ -DZOO_IPV6_ENABLED -g -O2 -MT zktest_mt-PthreadMocks.o -MD -MP -MF .deps/zktest_mt-PthreadMocks.Tpo -c -o zktest_mt-PthreadMocks.o `test -f 'tests/PthreadMocks.cc' || echo '/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/'`tests/PthreadMocks.cc [exec] mv -f .deps/zktest_mt-PthreadMocks.Tpo .deps/zktest_mt-PthreadMocks.Po [exec] /bin/bash ./libtool --tag=CXX --mode=link g++ -DUSE_STATIC_LIB -DTHREADED -DZKSERVER_CMD=\/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/tests/zkServer.sh\ -DZOO_IPV6_ENABLED -g -O2 -static-libtool-libs -Wl,--wrap -Wl,calloc -Wl,--wrap -Wl,free -Wl,--wrap -Wl,flush_send_queue -Wl,--wrap -Wl,get_xid -Wl,--wrap -Wl,deliverWatchers -Wl,--wrap -Wl,activateWatcher -Wl,--wrap -Wl,pthread_mutex_lock -Wl,--wrap -Wl,pthread_mutex_trylock -Wl,--wrap -Wl,pthread_mutex_unlock -o zktest-mt zktest_mt-TestDriver.o zktest_mt-LibCMocks.o zktest_mt-LibCSymTable.o zktest_mt-MocksBase.o zktest_mt-ZKMocks.o zktest_mt-Util.o zktest_mt-ThreadingUtil.o zktest_mt-TestZookeeperInit.o zktest_mt-TestZookeeperClose.o zktest_mt-TestReconfig.o zktest_mt-TestReconfigServer.o zktest_mt-TestClientRetry.o zktest_mt-TestOperations.o zktest_mt-TestMulti.o zktest_mt-TestWatchers.o zktest_mt-TestClient.o zktest_mt-ZooKeeperQuorumServer.o zktest_mt-TestReadOnlyClient.o zktest_mt-PthreadMocks.o libzkmt.la libhashtable.la -lpthread -L/usr/lib/x86_64-linux-gnu -lcppunit -ldl [exec] libtool: link: g++ -DUSE_STATIC_LIB -DTHREADED -DZKSERVER_CMD=\/jenkins/workspace/ZooKeeper_branch35_jdk7/branch-3.5/src/c/tests/zkServer.sh\ -DZOO_IPV6_ENABLED -g -O2 -Wl,--wrap -Wl,calloc -Wl,--wrap -Wl,free -Wl,--wrap -Wl,flush_send_queue -Wl,--wrap -Wl,get_xid -Wl,--wrap -Wl,deliverWatchers -Wl,--wrap
[jira] [Commented] (ZOOKEEPER-2091) Possible logic error in ClientCnxnSocketNIO
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244534#comment-14244534 ] Hongchao Deng commented on ZOOKEEPER-2091: -- Did you try only loop writing in ClientCnxnSocketNIO#SendPacket() and see if it works? I can tell that's a problem. Possible logic error in ClientCnxnSocketNIO --- Key: ZOOKEEPER-2091 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2091 Project: ZooKeeper Issue Type: Bug Components: java client Affects Versions: 3.4.6 Reporter: Cheng Assignee: Rakesh R Fix For: 3.5.1 Attachments: ZOOKEEPER-2091.patch When SASL authentication is enabled, the ZooKeeper client will finally call ClientCnxnSocketNIO#sendPacket(Packet p) to send a packet to server: @Override void sendPacket(Packet p) throws IOException { SocketChannel sock = (SocketChannel) sockKey.channel(); if (sock == null) { throw new IOException(Socket is null!); } p.createBB(); ByteBuffer pbb = p.bb; sock.write(pbb); } One problem I can see is that the sock is non-blocking, so when the sock's output buffer is full(theoretically), only part of the Packet is sent out and the communication will break. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 27244: ZOOKEEPER-2069
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27244/ --- (Updated Dec. 12, 2014, 7:16 p.m.) Review request for zookeeper. Repository: zookeeper-git Description --- ZOOKEEPER-2069 Diffs (updated) - build.xml bb5ff4f src/java/main/org/apache/zookeeper/ClientCnxn.java b4ece07 src/java/main/org/apache/zookeeper/ClientCnxnSocket.java 5ca0ba7 src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java adb27ee src/java/main/org/apache/zookeeper/ClientCnxnSocketNetty.java PRE-CREATION src/java/main/org/apache/zookeeper/ZooKeeperTestable.java 775d1a2 src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java dbc1080 src/java/test/org/apache/zookeeper/test/ClientTest.java dbe595c src/java/test/org/apache/zookeeper/test/NettyNettySuiteBase.java PRE-CREATION src/java/test/org/apache/zookeeper/test/NettyNettySuiteHammerTest.java PRE-CREATION src/java/test/org/apache/zookeeper/test/NettyNettySuiteTest.java PRE-CREATION Diff: https://reviews.apache.org/r/27244/diff/ Testing --- 1. use LinkedBlockingDeque. Thanks, Hongchao Deng
[jira] [Commented] (ZOOKEEPER-2091) Possible logic error in ClientCnxnSocketNIO
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244659#comment-14244659 ] Camille Fournier commented on ZOOKEEPER-2091: - I agree that it seems like only sending the full packet in ClientCnxnSocketNIO#SendPacket() would fix this issue. We don't do this in doIO presumably because we might only send part of the buffer, then allow another read to come in before we send the rest of the packet so as not to block on the complete send of the pending outgoing packet? Is that correct for implementing the nonblocking socket, can someone verify? Possible logic error in ClientCnxnSocketNIO --- Key: ZOOKEEPER-2091 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2091 Project: ZooKeeper Issue Type: Bug Components: java client Affects Versions: 3.4.6 Reporter: Cheng Assignee: Rakesh R Fix For: 3.5.1 Attachments: ZOOKEEPER-2091.patch When SASL authentication is enabled, the ZooKeeper client will finally call ClientCnxnSocketNIO#sendPacket(Packet p) to send a packet to server: @Override void sendPacket(Packet p) throws IOException { SocketChannel sock = (SocketChannel) sockKey.channel(); if (sock == null) { throw new IOException(Socket is null!); } p.createBB(); ByteBuffer pbb = p.bb; sock.write(pbb); } One problem I can see is that the sock is non-blocking, so when the sock's output buffer is full(theoretically), only part of the Packet is sent out and the communication will break. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2091) Possible logic error in ClientCnxnSocketNIO
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244672#comment-14244672 ] Hongchao Deng commented on ZOOKEEPER-2091: -- bq. then allow another read to come in From what I understand in the code, reading the returning result for the packet sent needs to be complete, a.k.a. after the packet sent completely. {code} if (!incomingBuffer.hasRemaining()) { ... ... sendThread.readResponse(incomingBuffer); ... } {code} incomingBuffer needs to be filled. Possible logic error in ClientCnxnSocketNIO --- Key: ZOOKEEPER-2091 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2091 Project: ZooKeeper Issue Type: Bug Components: java client Affects Versions: 3.4.6 Reporter: Cheng Assignee: Rakesh R Fix For: 3.5.1 Attachments: ZOOKEEPER-2091.patch When SASL authentication is enabled, the ZooKeeper client will finally call ClientCnxnSocketNIO#sendPacket(Packet p) to send a packet to server: @Override void sendPacket(Packet p) throws IOException { SocketChannel sock = (SocketChannel) sockKey.channel(); if (sock == null) { throw new IOException(Socket is null!); } p.createBB(); ByteBuffer pbb = p.bb; sock.write(pbb); } One problem I can see is that the sock is non-blocking, so when the sock's output buffer is full(theoretically), only part of the Packet is sent out and the communication will break. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
ZooKeeper-trunk-openjdk7 - Build # 658 - Failure
See https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/658/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 329349 lines...] [junit] 2014-12-12 20:25:45,846 [myid:] - INFO [main:MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port11221] [junit] 2014-12-12 20:25:45,846 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2014-12-12 20:25:45,847 [myid:] - INFO [main:JMXEnv@142] - ensureOnly:[] [junit] 2014-12-12 20:25:45,848 [myid:] - INFO [main:ClientBase@443] - STARTING server [junit] 2014-12-12 20:25:45,848 [myid:] - INFO [main:ClientBase@364] - CREATING server instance 127.0.0.1:11221 [junit] 2014-12-12 20:25:45,848 [myid:] - INFO [main:NIOServerCnxnFactory@670] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 32 worker threads, and 64 kB direct buffers. [junit] 2014-12-12 20:25:45,849 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2014-12-12 20:25:45,849 [myid:] - INFO [main:ClientBase@339] - STARTING server instance 127.0.0.1:11221 [junit] 2014-12-12 20:25:45,849 [myid:] - INFO [main:ZooKeeperServer@781] - minSessionTimeout set to 6000 [junit] 2014-12-12 20:25:45,849 [myid:] - INFO [main:ZooKeeperServer@790] - maxSessionTimeout set to 6 [junit] 2014-12-12 20:25:45,849 [myid:] - INFO [main:ZooKeeperServer@152] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-openjdk7/trunk/build/test/tmp/test188742289405531704.junit.dir/version-2 snapdir /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-openjdk7/trunk/build/test/tmp/test188742289405531704.junit.dir/version-2 [junit] 2014-12-12 20:25:45,851 [myid:] - INFO [main:FileSnap@83] - Reading snapshot /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-openjdk7/trunk/build/test/tmp/test188742289405531704.junit.dir/version-2/snapshot.b [junit] 2014-12-12 20:25:45,853 [myid:] - INFO [main:FileTxnSnapLog@298] - Snapshotting: 0xb to /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-openjdk7/trunk/build/test/tmp/test188742289405531704.junit.dir/version-2/snapshot.b [junit] 2014-12-12 20:25:45,855 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2014-12-12 20:25:45,855 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:33893 [junit] 2014-12-12 20:25:45,857 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@835] - Processing stat command from /127.0.0.1:33893 [junit] 2014-12-12 20:25:45,857 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@684] - Stat command output [junit] 2014-12-12 20:25:45,857 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@1006] - Closed socket connection for client /127.0.0.1:33893 (no session established for client) [junit] 2014-12-12 20:25:45,857 [myid:] - INFO [main:JMXEnv@224] - ensureParent:[InMemoryDataTree, StandaloneServer_port] [junit] 2014-12-12 20:25:45,859 [myid:] - INFO [main:JMXEnv@241] - expect:InMemoryDataTree [junit] 2014-12-12 20:25:45,859 [myid:] - INFO [main:JMXEnv@245] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port11221,name1=InMemoryDataTree [junit] 2014-12-12 20:25:45,860 [myid:] - INFO [main:JMXEnv@241] - expect:StandaloneServer_port [junit] 2014-12-12 20:25:45,860 [myid:] - INFO [main:JMXEnv@245] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port11221 [junit] 2014-12-12 20:25:45,860 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@55] - Memory used 85675 [junit] 2014-12-12 20:25:45,860 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@60] - Number of threads 24 [junit] 2014-12-12 20:25:45,860 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@65] - FINISHED TEST METHOD testQuota [junit] 2014-12-12 20:25:45,861 [myid:] - INFO [main:ClientBase@520] - tearDown starting [junit] 2014-12-12 20:25:45,925 [myid:] - INFO [main:ZooKeeper@968] - Session: 0x14a402db7bc closed [junit] 2014-12-12 20:25:45,925 [myid:] - INFO [main:ClientBase@490] - STOPPING server [junit] 2014-12-12 20:25:45,925 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@529] - EventThread shut down [junit] 2014-12-12 20:25:45,925 [myid:] - INFO [ConnnectionExpirer:NIOServerCnxnFactory$ConnectionExpirerThread@583] - ConnnectionExpirerThread interrupted [junit] 2014-12-12 20:25:45,925 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] - selector thread
Make JDK 7 the minimum support version
Hi all, I have been working on Netty + SSL and one of the changes I do is change outgoingQueue to be LinkedBlockingDeque, which requires JDK 6. As Flavio (fpj) and Patrick (phunt) suggested, there are JIRAs discussing JDK7: ZOOKEEPER-2046, ZOOKEEPER-1963. It seems many people are interested in an upgrade to JDK 7. As we know, JDK 6 stops public updates. Making JDK 7 the minimum of ZooKeeper going forward will drive us up-to-date and less work on supporting older versions of JDK. I am raising this topic here to gather public responses. The question is simple: should we make JDK 7 the minimum support version? Your advice, ideas, suggestions are highly welcome and appreciated! -- *- Hongchao Deng* *Software Engineer*
[jira] [Commented] (ZOOKEEPER-2091) Possible logic error in ClientCnxnSocketNIO
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244861#comment-14244861 ] Camille Fournier commented on ZOOKEEPER-2091: - I'm all the way in the bottom of this code and don't even remember the invariants. Why don't we wait till all the bytes are flushed to the buffer if we don't read anything until that's done? FWIW, I looked into the depth's of Netty's NIO socket handling. In particular to verify that Netty sockets will sometimes switch to reading before all of the outgoing bytes are written, which they will. I thought it was interesting though that they also configure a writeSpinCount that lets them try to flush all bytes a configurable number of times before going back to the selector to do other IO. Might be something worth considering. Or, you know, not doing this socket stuff ourselves at all and just using netty :) https://github.com/netty/netty/blob/0eb059bf58642f3c06144e1ea4a9d6c7632eb4d5/transport/src/main/java/io/netty/channel/nio/AbstractNioByteChannel.java#L198 In conclusion, I think that it may not matter much either way but probably we should only add this to the SendPacket method. Possible logic error in ClientCnxnSocketNIO --- Key: ZOOKEEPER-2091 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2091 Project: ZooKeeper Issue Type: Bug Components: java client Affects Versions: 3.4.6 Reporter: Cheng Assignee: Rakesh R Fix For: 3.5.1 Attachments: ZOOKEEPER-2091.patch When SASL authentication is enabled, the ZooKeeper client will finally call ClientCnxnSocketNIO#sendPacket(Packet p) to send a packet to server: @Override void sendPacket(Packet p) throws IOException { SocketChannel sock = (SocketChannel) sockKey.channel(); if (sock == null) { throw new IOException(Socket is null!); } p.createBB(); ByteBuffer pbb = p.bb; sock.write(pbb); } One problem I can see is that the sock is non-blocking, so when the sock's output buffer is full(theoretically), only part of the Packet is sent out and the communication will break. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2091) Possible logic error in ClientCnxnSocketNIO
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244878#comment-14244878 ] Hongchao Deng commented on ZOOKEEPER-2091: -- Thanks [~fournc] for the in-depth thinking. I do have another JIRA, ZOOKEEPER-2069 to drive Netty things. Back to this issue. The problem I can find here is that SendPacket() assumes a successful write. The other problem is that Sasl process implementation is too hackish. It stops/puts aside the nonblocking write logic. My point is that this is a good chance to fix it. Possible logic error in ClientCnxnSocketNIO --- Key: ZOOKEEPER-2091 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2091 Project: ZooKeeper Issue Type: Bug Components: java client Affects Versions: 3.4.6 Reporter: Cheng Assignee: Rakesh R Fix For: 3.5.1 Attachments: ZOOKEEPER-2091.patch When SASL authentication is enabled, the ZooKeeper client will finally call ClientCnxnSocketNIO#sendPacket(Packet p) to send a packet to server: @Override void sendPacket(Packet p) throws IOException { SocketChannel sock = (SocketChannel) sockKey.channel(); if (sock == null) { throw new IOException(Socket is null!); } p.createBB(); ByteBuffer pbb = p.bb; sock.write(pbb); } One problem I can see is that the sock is non-blocking, so when the sock's output buffer is full(theoretically), only part of the Packet is sent out and the communication will break. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2091) Possible logic error in ClientCnxnSocketNIO
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244892#comment-14244892 ] Camille Fournier commented on ZOOKEEPER-2091: - I agree [~hdeng] but I'm not sure what conclusion you're driving to. From my observation, we can do one of two obvious things: 1) Only change SendPacket, not doIO, which I *think* I agree should solve the observed problem with less impact than the current patch 2) Something more drastic to actually fix the hack that is our current Sasl hacks, which I don't have a patch to do and I'm not sure if we have any other open tickets that will resolve this. If we think solution 1 solves the issue, I think it is a simple enough fix to go ahead and use while we look into Netty etc. Possible logic error in ClientCnxnSocketNIO --- Key: ZOOKEEPER-2091 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2091 Project: ZooKeeper Issue Type: Bug Components: java client Affects Versions: 3.4.6 Reporter: Cheng Assignee: Rakesh R Fix For: 3.5.1 Attachments: ZOOKEEPER-2091.patch When SASL authentication is enabled, the ZooKeeper client will finally call ClientCnxnSocketNIO#sendPacket(Packet p) to send a packet to server: @Override void sendPacket(Packet p) throws IOException { SocketChannel sock = (SocketChannel) sockKey.channel(); if (sock == null) { throw new IOException(Socket is null!); } p.createBB(); ByteBuffer pbb = p.bb; sock.write(pbb); } One problem I can see is that the sock is non-blocking, so when the sock's output buffer is full(theoretically), only part of the Packet is sent out and the communication will break. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 27244: ZOOKEEPER-2069
On Dec. 11, 2014, 11:16 p.m., fpj wrote: src/java/main/org/apache/zookeeper/ClientCnxn.java, line 1472 https://reviews.apache.org/r/27244/diff/28/?file=789547#file789547line1472 I'm now confused by this change. Are you getting rid of enableWrite? saslCompleted is fairly different from enableWrite. Hongchao Deng wrote: So the guy who make the SASL change do it in a hackish way. It simply blocks the entire NIO thread and send it directly through socket. But when it finishes SASL, it calls enableWrite(). I can definitely use the old name enableWrite(). But do you think it is better to drive the change here? Since Netty client doesn't need to enable write here. Hongchao Deng wrote: Hi Flavio. Maybe you mean enableWrite() is also used by CilentCnxnNIO and worried that I change the name as well. I only change the interface to outside. Inside ClientCnxnNIO there is still a enableWrite function which is used inside the class's other places. Yes, you're right, I did confuse the calls. On Dec. 11, 2014, 11:16 p.m., fpj wrote: src/java/main/org/apache/zookeeper/ClientCnxn.java, line 134 https://reviews.apache.org/r/27244/diff/28/?file=789547#file789547line134 It sounds like making pendingQueue and outgoingQueue are optimizations that are independent from this issue. Shouldn't we make this change in a different jira? Hongchao Deng wrote: outgoingQueue isn't optimization... My previous patch used a semaphore which will be waken up by WakeupCnxn(). Your suggestion is use a blocking outgoingQueue. What do you think? I am fine since I know both changes well and I can do it in separate JIRAs. My concern here is that it ended up implying more changes than just removing the semaphore. For example, the patch removes this synchronized block (not sure it is going to format it right): for (Packet p : outgoingQueue) { conLossPacket(p); } outgoingQueue.clear(); and checking the other synchronized blocks removed, it isn't entirely straightforward that you can remove all of them. - fpj --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27244/#review64807 --- On Dec. 12, 2014, 7:16 p.m., Hongchao Deng wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27244/ --- (Updated Dec. 12, 2014, 7:16 p.m.) Review request for zookeeper. Repository: zookeeper-git Description --- ZOOKEEPER-2069 Diffs - build.xml bb5ff4f src/java/main/org/apache/zookeeper/ClientCnxn.java b4ece07 src/java/main/org/apache/zookeeper/ClientCnxnSocket.java 5ca0ba7 src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java adb27ee src/java/main/org/apache/zookeeper/ClientCnxnSocketNetty.java PRE-CREATION src/java/main/org/apache/zookeeper/ZooKeeperTestable.java 775d1a2 src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java dbc1080 src/java/test/org/apache/zookeeper/test/ClientTest.java dbe595c src/java/test/org/apache/zookeeper/test/NettyNettySuiteBase.java PRE-CREATION src/java/test/org/apache/zookeeper/test/NettyNettySuiteHammerTest.java PRE-CREATION src/java/test/org/apache/zookeeper/test/NettyNettySuiteTest.java PRE-CREATION Diff: https://reviews.apache.org/r/27244/diff/ Testing --- 1. use LinkedBlockingDeque. Thanks, Hongchao Deng
Re: Review Request 27244: ZOOKEEPER-2069
On Dec. 11, 2014, 11:16 p.m., fpj wrote: src/java/main/org/apache/zookeeper/ClientCnxn.java, line 134 https://reviews.apache.org/r/27244/diff/28/?file=789547#file789547line134 It sounds like making pendingQueue and outgoingQueue are optimizations that are independent from this issue. Shouldn't we make this change in a different jira? Hongchao Deng wrote: outgoingQueue isn't optimization... My previous patch used a semaphore which will be waken up by WakeupCnxn(). Your suggestion is use a blocking outgoingQueue. What do you think? I am fine since I know both changes well and I can do it in separate JIRAs. fpj wrote: My concern here is that it ended up implying more changes than just removing the semaphore. For example, the patch removes this synchronized block (not sure it is going to format it right): for (Packet p : outgoingQueue) { conLossPacket(p); } outgoingQueue.clear(); and checking the other synchronized blocks removed, it isn't entirely straightforward that you can remove all of them. I can add those synchronized back. However, findbugs will complain that so I will add rules too. On Netty side I can assure your concern: 1. In ClientCnxn, primeConnection(), cleanup(), queuePacket(). primeConnection(), cleanup() is guaranteed to be start and end of life cycles. queuePacket() is appending packets to outgoingQueue and waking up the blocking queue. What do you think? - Hongchao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27244/#review64807 --- On Dec. 12, 2014, 7:16 p.m., Hongchao Deng wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27244/ --- (Updated Dec. 12, 2014, 7:16 p.m.) Review request for zookeeper. Repository: zookeeper-git Description --- ZOOKEEPER-2069 Diffs - build.xml bb5ff4f src/java/main/org/apache/zookeeper/ClientCnxn.java b4ece07 src/java/main/org/apache/zookeeper/ClientCnxnSocket.java 5ca0ba7 src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java adb27ee src/java/main/org/apache/zookeeper/ClientCnxnSocketNetty.java PRE-CREATION src/java/main/org/apache/zookeeper/ZooKeeperTestable.java 775d1a2 src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java dbc1080 src/java/test/org/apache/zookeeper/test/ClientTest.java dbe595c src/java/test/org/apache/zookeeper/test/NettyNettySuiteBase.java PRE-CREATION src/java/test/org/apache/zookeeper/test/NettyNettySuiteHammerTest.java PRE-CREATION src/java/test/org/apache/zookeeper/test/NettyNettySuiteTest.java PRE-CREATION Diff: https://reviews.apache.org/r/27244/diff/ Testing --- 1. use LinkedBlockingDeque. Thanks, Hongchao Deng
Re: Make JDK 7 the minimum support version
+1 for supporting only 7 or later. On 12 Dec 2014, at 21:28, Hongchao Deng hd...@cloudera.com wrote: Hi all, I have been working on Netty + SSL and one of the changes I do is change outgoingQueue to be LinkedBlockingDeque, which requires JDK 6. As Flavio (fpj) and Patrick (phunt) suggested, there are JIRAs discussing JDK7: ZOOKEEPER-2046, ZOOKEEPER-1963. It seems many people are interested in an upgrade to JDK 7. As we know, JDK 6 stops public updates. Making JDK 7 the minimum of ZooKeeper going forward will drive us up-to-date and less work on supporting older versions of JDK. I am raising this topic here to gather public responses. The question is simple: should we make JDK 7 the minimum support version? Your advice, ideas, suggestions are highly welcome and appreciated! -- *- Hongchao Deng* *Software Engineer*
Re: Review Request 27244: ZOOKEEPER-2069
On Dec. 11, 2014, 11:16 p.m., fpj wrote: src/java/main/org/apache/zookeeper/ClientCnxn.java, line 134 https://reviews.apache.org/r/27244/diff/28/?file=789547#file789547line134 It sounds like making pendingQueue and outgoingQueue are optimizations that are independent from this issue. Shouldn't we make this change in a different jira? Hongchao Deng wrote: outgoingQueue isn't optimization... My previous patch used a semaphore which will be waken up by WakeupCnxn(). Your suggestion is use a blocking outgoingQueue. What do you think? I am fine since I know both changes well and I can do it in separate JIRAs. fpj wrote: My concern here is that it ended up implying more changes than just removing the semaphore. For example, the patch removes this synchronized block (not sure it is going to format it right): for (Packet p : outgoingQueue) { conLossPacket(p); } outgoingQueue.clear(); and checking the other synchronized blocks removed, it isn't entirely straightforward that you can remove all of them. Hongchao Deng wrote: I can add those synchronized back. However, findbugs will complain that so I will add rules too. On Netty side I can assure your concern: 1. In ClientCnxn, primeConnection(), cleanup(), queuePacket(). primeConnection(), cleanup() is guaranteed to be start and end of life cycles. queuePacket() is appending packets to outgoingQueue and waking up the blocking queue. What do you think? I'm not suggesting that you bring back the synchronized blocks. I'm saying that I haven't checked carefully and it isn't straightforward that you can remove them. For example, the one in ClientCnxnSocketNIO line 163 has multiple accesses to outgoingQueue, so removing the synchronized block doesn't necessarily gives us the same behavior. Can you actually justify the removal of each one of those synchronized blocks? If you aren't sure, then I'll check it carefully. - fpj --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27244/#review64807 --- On Dec. 12, 2014, 7:16 p.m., Hongchao Deng wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27244/ --- (Updated Dec. 12, 2014, 7:16 p.m.) Review request for zookeeper. Repository: zookeeper-git Description --- ZOOKEEPER-2069 Diffs - build.xml bb5ff4f src/java/main/org/apache/zookeeper/ClientCnxn.java b4ece07 src/java/main/org/apache/zookeeper/ClientCnxnSocket.java 5ca0ba7 src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java adb27ee src/java/main/org/apache/zookeeper/ClientCnxnSocketNetty.java PRE-CREATION src/java/main/org/apache/zookeeper/ZooKeeperTestable.java 775d1a2 src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java dbc1080 src/java/test/org/apache/zookeeper/test/ClientTest.java dbe595c src/java/test/org/apache/zookeeper/test/NettyNettySuiteBase.java PRE-CREATION src/java/test/org/apache/zookeeper/test/NettyNettySuiteHammerTest.java PRE-CREATION src/java/test/org/apache/zookeeper/test/NettyNettySuiteTest.java PRE-CREATION Diff: https://reviews.apache.org/r/27244/diff/ Testing --- 1. use LinkedBlockingDeque. Thanks, Hongchao Deng
Re: Make JDK 7 the minimum support version
+1 Makes sense to me. I'm assuming we're talking 3.5+ here, right? We would continue to support jdk6 in 3.4? Patrick On Fri, Dec 12, 2014 at 3:42 PM, Flavio Junqueira fpjunque...@yahoo.com.invalid wrote: +1 for supporting only 7 or later. On 12 Dec 2014, at 21:28, Hongchao Deng hd...@cloudera.com wrote: Hi all, I have been working on Netty + SSL and one of the changes I do is change outgoingQueue to be LinkedBlockingDeque, which requires JDK 6. As Flavio (fpj) and Patrick (phunt) suggested, there are JIRAs discussing JDK7: ZOOKEEPER-2046, ZOOKEEPER-1963. It seems many people are interested in an upgrade to JDK 7. As we know, JDK 6 stops public updates. Making JDK 7 the minimum of ZooKeeper going forward will drive us up-to-date and less work on supporting older versions of JDK. I am raising this topic here to gather public responses. The question is simple: should we make JDK 7 the minimum support version? Your advice, ideas, suggestions are highly welcome and appreciated! -- *- Hongchao Deng* *Software Engineer*
Re: Make JDK 7 the minimum support version
+1 for 3.5+ On Fri, Dec 12, 2014 at 6:53 PM, Patrick Hunt ph...@apache.org wrote: +1 Makes sense to me. I'm assuming we're talking 3.5+ here, right? We would continue to support jdk6 in 3.4? Patrick On Fri, Dec 12, 2014 at 3:42 PM, Flavio Junqueira fpjunque...@yahoo.com.invalid wrote: +1 for supporting only 7 or later. On 12 Dec 2014, at 21:28, Hongchao Deng hd...@cloudera.com wrote: Hi all, I have been working on Netty + SSL and one of the changes I do is change outgoingQueue to be LinkedBlockingDeque, which requires JDK 6. As Flavio (fpj) and Patrick (phunt) suggested, there are JIRAs discussing JDK7: ZOOKEEPER-2046, ZOOKEEPER-1963. It seems many people are interested in an upgrade to JDK 7. As we know, JDK 6 stops public updates. Making JDK 7 the minimum of ZooKeeper going forward will drive us up-to-date and less work on supporting older versions of JDK. I am raising this topic here to gather public responses. The question is simple: should we make JDK 7 the minimum support version? Your advice, ideas, suggestions are highly welcome and appreciated! -- *- Hongchao Deng* *Software Engineer*
Re: Review Request 27244: ZOOKEEPER-2069
On Dec. 11, 2014, 11:16 p.m., fpj wrote: src/java/main/org/apache/zookeeper/ClientCnxn.java, line 134 https://reviews.apache.org/r/27244/diff/28/?file=789547#file789547line134 It sounds like making pendingQueue and outgoingQueue are optimizations that are independent from this issue. Shouldn't we make this change in a different jira? Hongchao Deng wrote: outgoingQueue isn't optimization... My previous patch used a semaphore which will be waken up by WakeupCnxn(). Your suggestion is use a blocking outgoingQueue. What do you think? I am fine since I know both changes well and I can do it in separate JIRAs. fpj wrote: My concern here is that it ended up implying more changes than just removing the semaphore. For example, the patch removes this synchronized block (not sure it is going to format it right): for (Packet p : outgoingQueue) { conLossPacket(p); } outgoingQueue.clear(); and checking the other synchronized blocks removed, it isn't entirely straightforward that you can remove all of them. Hongchao Deng wrote: I can add those synchronized back. However, findbugs will complain that so I will add rules too. On Netty side I can assure your concern: 1. In ClientCnxn, primeConnection(), cleanup(), queuePacket(). primeConnection(), cleanup() is guaranteed to be start and end of life cycles. queuePacket() is appending packets to outgoingQueue and waking up the blocking queue. What do you think? fpj wrote: I'm not suggesting that you bring back the synchronized blocks. I'm saying that I haven't checked carefully and it isn't straightforward that you can remove them. For example, the one in ClientCnxnSocketNIO line 163 has multiple accesses to outgoingQueue, so removing the synchronized block doesn't necessarily gives us the same behavior. Can you actually justify the removal of each one of those synchronized blocks? If you aren't sure, then I'll check it carefully. That would be great if you can check it. Actually I just found a possible race. Here's my version of analysis: 1. primeConnection: The start of life cycle. No one else can touch outgoingQueue except queuePacket(). primeConn prepends packets while queuePacket appends. 2. !!cleanup: The end of life cycle. No one else except queuePacket(). A possible race with queuePacket() in check closing and add packets. I will fix it shortly. 3. queuePacket: ... 4. findSendablePacket and doIO: this is the only place that remove items from outgoing Queue - the check in findSendablePacket() guarantees not empty. Because no where will remove items - enumeration will give a consistent current view. Adding new packets doesn't count in. - found non-priming packet, remove and addFirst wouldn't have race because here only queuePacket() will be called. And queuePacket() appends only. 5. last check in doTransport: this is duplicate.. - Hongchao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27244/#review64807 --- On Dec. 12, 2014, 7:16 p.m., Hongchao Deng wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27244/ --- (Updated Dec. 12, 2014, 7:16 p.m.) Review request for zookeeper. Repository: zookeeper-git Description --- ZOOKEEPER-2069 Diffs - build.xml bb5ff4f src/java/main/org/apache/zookeeper/ClientCnxn.java b4ece07 src/java/main/org/apache/zookeeper/ClientCnxnSocket.java 5ca0ba7 src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java adb27ee src/java/main/org/apache/zookeeper/ClientCnxnSocketNetty.java PRE-CREATION src/java/main/org/apache/zookeeper/ZooKeeperTestable.java 775d1a2 src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java dbc1080 src/java/test/org/apache/zookeeper/test/ClientTest.java dbe595c src/java/test/org/apache/zookeeper/test/NettyNettySuiteBase.java PRE-CREATION src/java/test/org/apache/zookeeper/test/NettyNettySuiteHammerTest.java PRE-CREATION src/java/test/org/apache/zookeeper/test/NettyNettySuiteTest.java PRE-CREATION Diff: https://reviews.apache.org/r/27244/diff/ Testing --- 1. use LinkedBlockingDeque. Thanks, Hongchao Deng
RE: Make JDK 7 the minimum support version
Right. We are discussing on ZK 3.5+ - Hongchao Deng From: ph...@apache.org Date: Fri, 12 Dec 2014 15:53:17 -0800 Subject: Re: Make JDK 7 the minimum support version To: dev@zookeeper.apache.org CC: u...@zookeeper.apache.org +1 Makes sense to me. I'm assuming we're talking 3.5+ here, right? We would continue to support jdk6 in 3.4? Patrick On Fri, Dec 12, 2014 at 3:42 PM, Flavio Junqueira fpjunque...@yahoo.com.invalid wrote: +1 for supporting only 7 or later. On 12 Dec 2014, at 21:28, Hongchao Deng hd...@cloudera.com wrote: Hi all, I have been working on Netty + SSL and one of the changes I do is change outgoingQueue to be LinkedBlockingDeque, which requires JDK 6. As Flavio (fpj) and Patrick (phunt) suggested, there are JIRAs discussing JDK7: ZOOKEEPER-2046, ZOOKEEPER-1963. It seems many people are interested in an upgrade to JDK 7. As we know, JDK 6 stops public updates. Making JDK 7 the minimum of ZooKeeper going forward will drive us up-to-date and less work on supporting older versions of JDK. I am raising this topic here to gather public responses. The question is simple: should we make JDK 7 the minimum support version? Your advice, ideas, suggestions are highly welcome and appreciated! -- *- Hongchao Deng* *Software Engineer*
Re: Review Request 27244: ZOOKEEPER-2069
On Dec. 11, 2014, 11:16 p.m., fpj wrote: src/java/main/org/apache/zookeeper/ClientCnxn.java, line 134 https://reviews.apache.org/r/27244/diff/28/?file=789547#file789547line134 It sounds like making pendingQueue and outgoingQueue are optimizations that are independent from this issue. Shouldn't we make this change in a different jira? Hongchao Deng wrote: outgoingQueue isn't optimization... My previous patch used a semaphore which will be waken up by WakeupCnxn(). Your suggestion is use a blocking outgoingQueue. What do you think? I am fine since I know both changes well and I can do it in separate JIRAs. fpj wrote: My concern here is that it ended up implying more changes than just removing the semaphore. For example, the patch removes this synchronized block (not sure it is going to format it right): for (Packet p : outgoingQueue) { conLossPacket(p); } outgoingQueue.clear(); and checking the other synchronized blocks removed, it isn't entirely straightforward that you can remove all of them. Hongchao Deng wrote: I can add those synchronized back. However, findbugs will complain that so I will add rules too. On Netty side I can assure your concern: 1. In ClientCnxn, primeConnection(), cleanup(), queuePacket(). primeConnection(), cleanup() is guaranteed to be start and end of life cycles. queuePacket() is appending packets to outgoingQueue and waking up the blocking queue. What do you think? fpj wrote: I'm not suggesting that you bring back the synchronized blocks. I'm saying that I haven't checked carefully and it isn't straightforward that you can remove them. For example, the one in ClientCnxnSocketNIO line 163 has multiple accesses to outgoingQueue, so removing the synchronized block doesn't necessarily gives us the same behavior. Can you actually justify the removal of each one of those synchronized blocks? If you aren't sure, then I'll check it carefully. Hongchao Deng wrote: That would be great if you can check it. Actually I just found a possible race. Here's my version of analysis: 1. primeConnection: The start of life cycle. No one else can touch outgoingQueue except queuePacket(). primeConn prepends packets while queuePacket appends. 2. !!cleanup: The end of life cycle. No one else except queuePacket(). A possible race with queuePacket() in check closing and add packets. I will fix it shortly. 3. queuePacket: ... 4. findSendablePacket and doIO: this is the only place that remove items from outgoing Queue - the check in findSendablePacket() guarantees not empty. Because no where will remove items - enumeration will give a consistent current view. Adding new packets doesn't count in. - found non-priming packet, remove and addFirst wouldn't have race because here only queuePacket() will be called. And queuePacket() appends only. 5. last check in doTransport: this is duplicate.. I just realize between cleanup() and queuePacket() there is race regardless of the synchronized.. Even though cleanup() finishes, it doesn't guarantee that state is CLOSED or closeSession packet was sent. Thus queuePacket() doesn't call conLossPacket(). - Hongchao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27244/#review64807 --- On Dec. 12, 2014, 7:16 p.m., Hongchao Deng wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27244/ --- (Updated Dec. 12, 2014, 7:16 p.m.) Review request for zookeeper. Repository: zookeeper-git Description --- ZOOKEEPER-2069 Diffs - build.xml bb5ff4f src/java/main/org/apache/zookeeper/ClientCnxn.java b4ece07 src/java/main/org/apache/zookeeper/ClientCnxnSocket.java 5ca0ba7 src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java adb27ee src/java/main/org/apache/zookeeper/ClientCnxnSocketNetty.java PRE-CREATION src/java/main/org/apache/zookeeper/ZooKeeperTestable.java 775d1a2 src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java dbc1080 src/java/test/org/apache/zookeeper/test/ClientTest.java dbe595c src/java/test/org/apache/zookeeper/test/NettyNettySuiteBase.java PRE-CREATION src/java/test/org/apache/zookeeper/test/NettyNettySuiteHammerTest.java PRE-CREATION src/java/test/org/apache/zookeeper/test/NettyNettySuiteTest.java PRE-CREATION Diff: https://reviews.apache.org/r/27244/diff/ Testing --- 1. use LinkedBlockingDeque. Thanks, Hongchao Deng
[jira] [Commented] (ZOOKEEPER-2091) Possible logic error in ClientCnxnSocketNIO
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14245244#comment-14245244 ] Rakesh R commented on ZOOKEEPER-2091: - bq.Did you try only loop writing in ClientCnxnSocketNIO#SendPacket() and see if it works? I can tell that's a problem. What could be the case [of missing pendingQueue.add(p)| https://issues.apache.org/jira/browse/ZOOKEEPER-2091?focusedCommentId=14227396page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14227396] ?. Unfortunately now I didn't get chance to put one more patch into the cluster where it hits this problem. Possible logic error in ClientCnxnSocketNIO --- Key: ZOOKEEPER-2091 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2091 Project: ZooKeeper Issue Type: Bug Components: java client Affects Versions: 3.4.6 Reporter: Cheng Assignee: Rakesh R Fix For: 3.5.1 Attachments: ZOOKEEPER-2091.patch When SASL authentication is enabled, the ZooKeeper client will finally call ClientCnxnSocketNIO#sendPacket(Packet p) to send a packet to server: @Override void sendPacket(Packet p) throws IOException { SocketChannel sock = (SocketChannel) sockKey.channel(); if (sock == null) { throw new IOException(Socket is null!); } p.createBB(); ByteBuffer pbb = p.bb; sock.write(pbb); } One problem I can see is that the sock is non-blocking, so when the sock's output buffer is full(theoretically), only part of the Packet is sent out and the communication will break. -- This message was sent by Atlassian JIRA (v6.3.4#6332)