[GitHub] zookeeper pull request #672: ZOOKEEPER-3032 - MAVEN MIGRATION - zookeeper-se...

2018-10-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/672


---


[GitHub] zookeeper issue #672: ZOOKEEPER-3032 - MAVEN MIGRATION - zookeeper-server

2018-10-19 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/672
  
Merged to master branch. Thanks @nkalmar !


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226681741
  
--- Diff: 
zookeeper-common/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java 
---
@@ -68,18 +70,21 @@
 public class ClientCnxnSocketNetty extends ClientCnxnSocket {
 private static final Logger LOG = 
LoggerFactory.getLogger(ClientCnxnSocketNetty.class);
 
-ChannelFactory channelFactory = new NioClientSocketChannelFactory(
-Executors.newCachedThreadPool(), 
Executors.newCachedThreadPool());
-Channel channel;
-CountDownLatch firstConnect;
-ChannelFuture connectFuture;
-Lock connectLock = new ReentrantLock();
-AtomicBoolean disconnected = new AtomicBoolean();
-AtomicBoolean needSasl = new AtomicBoolean();
-Semaphore waitSasl = new Semaphore(0);
+private final EventLoopGroup eventLoopGroup;
+private Channel channel;
+private CountDownLatch firstConnect;
+private ChannelFuture connectFuture;
+private final Lock connectLock = new ReentrantLock();
+private final AtomicBoolean disconnected = new AtomicBoolean();
+private final AtomicBoolean needSasl = new AtomicBoolean();
+private final Semaphore waitSasl = new Semaphore(0);
+
+private static final AtomicReference TEST_ALLOCATOR =
+new AtomicReference<>(null);
 
 ClientCnxnSocketNetty(ZKClientConfig clientConfig) throws IOException {
 this.clientConfig = clientConfig;
+eventLoopGroup = new NioEventLoopGroup(0, 
Executors.newCachedThreadPool());
--- End diff --

Let's move to Epoll.
It can be a follow-up change (I can send one if you don't already have it on 
your stack of changes).


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226690548
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxnFactory.java
 ---
@@ -335,29 +260,34 @@ public void operationComplete(ChannelFuture future)
 CnxnChannelHandler channelHandler = new CnxnChannelHandler();
 
 NettyServerCnxnFactory() {
-bootstrap = new ServerBootstrap(
-new NioServerSocketChannelFactory(
-Executors.newCachedThreadPool(),
-Executors.newCachedThreadPool()));
-// parent channel
-bootstrap.setOption("reuseAddress", true);
-// child channels
-bootstrap.setOption("child.tcpNoDelay", true);
-/* set socket linger to off, so that socket close does not block */
-bootstrap.setOption("child.soLinger", -1);
-bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
-@Override
-public ChannelPipeline getPipeline() throws Exception {
-ChannelPipeline p = Channels.pipeline();
-if (secure) {
-initSSL(p);
-}
-p.addLast("servercnxnfactory", channelHandler);
-
-return p;
-}
-});
 x509Util = new ClientX509Util();
+
+EventLoopGroup bossGroup = new NioEventLoopGroup(0, 
Executors.newCachedThreadPool());
--- End diff --

Consider EPoll


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226682998
  
--- Diff: 
zookeeper-common/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java 
---
@@ -103,71 +108,95 @@
 boolean isConnected() {
 // Assuming that isConnected() is only used to initiate connection,
 // not used by some other connection status judgement.
-return channel != null;
+connectLock.lock();
+try {
+return connectFuture != null || channel != null;
+} finally {
+connectLock.unlock();
+}
 }
 
 @Override
 void connect(InetSocketAddress addr) throws IOException {
 firstConnect = new CountDownLatch(1);
 
-ClientBootstrap bootstrap = new ClientBootstrap(channelFactory);
-
-bootstrap.setPipelineFactory(new 
ZKClientPipelineFactory(addr.getHostString(), addr.getPort()));
-bootstrap.setOption("soLinger", -1);
-bootstrap.setOption("tcpNoDelay", true);
-
-connectFuture = bootstrap.connect(addr);
-connectFuture.addListener(new ChannelFutureListener() {
-@Override
-public void operationComplete(ChannelFuture channelFuture) 
throws Exception {
-// this lock guarantees that channel won't be assgined 
after cleanup().
-connectLock.lock();
-try {
-if (!channelFuture.isSuccess() || connectFuture == 
null) {
-LOG.info("future isn't success, cause: {}", 
channelFuture.getCause());
-return;
-}
-// setup channel, variables, connection, etc.
-channel = channelFuture.getChannel();
-
-disconnected.set(false);
-initialized = false;
-lenBuffer.clear();
-incomingBuffer = lenBuffer;
-
-sendThread.primeConnection();
-updateNow();
-updateLastSendAndHeard();
-
-if (sendThread.tunnelAuthInProgress()) {
-waitSasl.drainPermits();
-needSasl.set(true);
-sendPrimePacket();
-} else {
-needSasl.set(false);
-}
+Bootstrap bootstrap = new Bootstrap();
+bootstrap.group(Objects.requireNonNull(eventLoopGroup))
+.channel(NioSocketChannel.class)
+.handler(new ZKClientPipelineFactory(addr.getHostString(), 
addr.getPort()))
+.option(ChannelOption.SO_LINGER, -1)
+.option(ChannelOption.TCP_NODELAY, true);
+ByteBufAllocator testAllocator = TEST_ALLOCATOR.get();
+if (testAllocator != null) {
+bootstrap.option(ChannelOption.ALLOCATOR, testAllocator);
+}
+bootstrap.validate();
+
+connectLock.lock();
+try {
+connectFuture = bootstrap.connect(addr);
+connectFuture.addListener(new ChannelFutureListener() {
+@Override
+public void operationComplete(ChannelFuture channelFuture) 
throws Exception {
+// this lock guarantees that channel won't be assigned 
after cleanup().
+connectLock.lock();
+try {
+if (!channelFuture.isSuccess()) {
+LOG.info("future isn't success, cause:", 
channelFuture.cause());
+return;
+} else if (connectFuture == null) {
+LOG.info("connect attempt cancelled");
+// If the connect attempt was cancelled but 
succeeded
+// anyway, make sure to close the channel, 
otherwise
+// we may leak a file descriptor.
+channelFuture.channel().close();
--- End diff --

Can this turn into an NPE, since channel() may return null?


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226689302
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxnFactory.java
 ---
@@ -116,170 +115,94 @@ public void channelConnected(ChannelHandlerContext 
ctx,
 
 NettyServerCnxn cnxn = new NettyServerCnxn(channel,
 zkServer, NettyServerCnxnFactory.this);
-ctx.setAttachment(cnxn);
+ctx.channel().attr(CONNECTION_ATTRIBUTE).set(cnxn);
 
 if (secure) {
-SslHandler sslHandler = 
ctx.getPipeline().get(SslHandler.class);
-ChannelFuture handshakeFuture = sslHandler.handshake();
+SslHandler sslHandler = 
ctx.pipeline().get(SslHandler.class);
+Future handshakeFuture = 
sslHandler.handshakeFuture();
 handshakeFuture.addListener(new 
CertificateVerifier(sslHandler, cnxn));
 } else {
-allChannels.add(ctx.getChannel());
+allChannels.add(ctx.channel());
 addCnxn(cnxn);
 }
 }
 
 @Override
-public void channelDisconnected(ChannelHandlerContext ctx,
-ChannelStateEvent e) throws Exception
-{
-if (LOG.isTraceEnabled()) {
-LOG.trace("Channel disconnected " + e);
-}
-NettyServerCnxn cnxn = (NettyServerCnxn) ctx.getAttachment();
+public void channelInactive(ChannelHandlerContext ctx) throws 
Exception {
+LOG.trace("Channel inactive {}", ctx.channel());
--- End diff --

Is isTraceEnabled missing here?


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226687650
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java 
---
@@ -200,24 +186,13 @@ public void setSessionId(long sessionId) {
 this.sessionId = sessionId;
 }
 
-@Override
-public void enableRecv() {
-if (throttled) {
-throttled = false;
-if (LOG.isDebugEnabled()) {
-LOG.debug("Sending unthrottle event " + this);
-}
-channel.getPipeline().sendUpstream(new 
ResumeMessageEvent(channel));
-}
-}
-
 @Override
 public void sendBuffer(ByteBuffer sendBuffer) {
 if (sendBuffer == ServerCnxnFactory.closeConn) {
 close();
 return;
 }
-channel.write(wrappedBuffer(sendBuffer));
+channel.writeAndFlush(Unpooled.wrappedBuffer(sendBuffer));
--- End diff --

Consider using voidPromise()


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226684843
  
--- Diff: 
zookeeper-common/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java 
---
@@ -439,13 +466,34 @@ public void messageReceived(ChannelHandlerContext ctx,
 }
 }
 wakeupCnxn();
+// Note: SimpleChannelInboundHandler releases the ByteBuf for 
us
+// so we don't need to do it.
 }
 
 @Override
-public void exceptionCaught(ChannelHandlerContext ctx,
-ExceptionEvent e) throws Exception {
-LOG.warn("Exception caught: {}", e, e.getCause());
+public void exceptionCaught(ChannelHandlerContext ctx, Throwable 
cause) {
+LOG.warn("Exception caught", cause);
 cleanup();
 }
 }
+
+/**
+ * Sets the test ByteBufAllocator. This allocator will be used by all
+ * future instances of this class.
+ * It is not recommended to use this method outside of testing.
+ * @param allocator the ByteBufAllocator to use for all netty buffer
+ *  allocations.
+ */
+public static void setTestAllocator(ByteBufAllocator allocator) {
+TEST_ALLOCATOR.set(allocator);
--- End diff --

I think this is a security hole. We are in the client, so untrusted code 
may call this public method.
We should use mockito/powermock for this kind of thing.
Is there another way?


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226685450
  
--- Diff: 
zookeeper-common/src/test/java/org/apache/zookeeper/common/TestByteBufAllocator.java
 ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.common;
+
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.CompositeByteBuf;
+import io.netty.buffer.PooledByteBufAllocator;
+import io.netty.util.ResourceLeakDetector;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import java.util.concurrent.atomic.AtomicReference;
+
+/**
+ * This is a custom ByteBufAllocator that tracks outstanding allocations 
and
+ * crashes the program if any of them are leaked.
+ *
+ * Never use this class in production, it will cause your server to run out
+ * of memory! This is because it holds strong references to all allocated
+ * buffers and doesn't release them until checkForLeaks() is called at the
+ * end of a unit test.
+ *
+ * Note: the original code was copied from 
https://github.com/airlift/drift,
+ * with the permission and encouragement of airlift's author (dain). 
Airlift
+ * uses the same apache 2.0 license as Zookeeper so this should be ok.
+ *
+ * However, the code was modified to take advantage of Netty's built-in
+ * leak tracking and make a best effort to print details about buffer 
leaks.
+ */
+public class TestByteBufAllocator extends PooledByteBufAllocator {
--- End diff --

This is interesting.

Netty already has built-in support for this kind of thing. I see that this 
class is smarter.
Isn't running the tests with paranoid leak detection enough?


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226690017
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxnFactory.java
 ---
@@ -116,170 +115,94 @@ public void channelConnected(ChannelHandlerContext 
ctx,
 
 NettyServerCnxn cnxn = new NettyServerCnxn(channel,
 zkServer, NettyServerCnxnFactory.this);
-ctx.setAttachment(cnxn);
+ctx.channel().attr(CONNECTION_ATTRIBUTE).set(cnxn);
 
 if (secure) {
-SslHandler sslHandler = 
ctx.getPipeline().get(SslHandler.class);
-ChannelFuture handshakeFuture = sslHandler.handshake();
+SslHandler sslHandler = 
ctx.pipeline().get(SslHandler.class);
+Future handshakeFuture = 
sslHandler.handshakeFuture();
 handshakeFuture.addListener(new 
CertificateVerifier(sslHandler, cnxn));
 } else {
-allChannels.add(ctx.getChannel());
+allChannels.add(ctx.channel());
 addCnxn(cnxn);
 }
 }
 
 @Override
-public void channelDisconnected(ChannelHandlerContext ctx,
-ChannelStateEvent e) throws Exception
-{
-if (LOG.isTraceEnabled()) {
-LOG.trace("Channel disconnected " + e);
-}
-NettyServerCnxn cnxn = (NettyServerCnxn) ctx.getAttachment();
+public void channelInactive(ChannelHandlerContext ctx) throws 
Exception {
+LOG.trace("Channel inactive {}", ctx.channel());
+allChannels.remove(ctx.channel());
+NettyServerCnxn cnxn = 
ctx.channel().attr(CONNECTION_ATTRIBUTE).getAndSet(null);
 if (cnxn != null) {
-if (LOG.isTraceEnabled()) {
-LOG.trace("Channel disconnect caused close " + e);
-}
+LOG.trace("Channel inactive caused close {}", cnxn);
 cnxn.close();
 }
 }
 
 @Override
-public void exceptionCaught(ChannelHandlerContext ctx, 
ExceptionEvent e)
-throws Exception
-{
-LOG.warn("Exception caught " + e, e.getCause());
-NettyServerCnxn cnxn = (NettyServerCnxn) ctx.getAttachment();
+public void exceptionCaught(ChannelHandlerContext ctx, Throwable 
cause) throws Exception {
+LOG.warn("Exception caught", cause);
+NettyServerCnxn cnxn = 
ctx.channel().attr(CONNECTION_ATTRIBUTE).getAndSet(null);
 if (cnxn != null) {
-if (LOG.isDebugEnabled()) {
-LOG.debug("Closing " + cnxn);
-}
+LOG.debug("Closing {}", cnxn);
 cnxn.close();
 }
 }
 
 @Override
-public void messageReceived(ChannelHandlerContext ctx, 
MessageEvent e)
-throws Exception
-{
-if (LOG.isTraceEnabled()) {
-LOG.trace("message received called " + e.getMessage());
-}
+public void userEventTriggered(ChannelHandlerContext ctx, Object 
evt) throws Exception {
 try {
-if (LOG.isDebugEnabled()) {
-LOG.debug("New message " + e.toString()
-+ " from " + ctx.getChannel());
-}
-NettyServerCnxn cnxn = 
(NettyServerCnxn)ctx.getAttachment();
-synchronized(cnxn) {
-processMessage(e, cnxn);
+if (evt == NettyServerCnxn.AutoReadEvent.ENABLE) {
+LOG.debug("Received AutoReadEvent.ENABLE");
+NettyServerCnxn cnxn = 
ctx.channel().attr(CONNECTION_ATTRIBUTE).get();
+// TODO(ilyam): Not sure if cnxn can be null here. It 
becomes null if channelInactive()
+// or exceptionCaught() trigger, but it's unclear to 
me if userEventTriggered() can run
+// after either of those. Check for null just to be 
safe ...
+if (cnxn != null) {
+cnxn.processQueuedBuffer();
+}
+ctx.channel().config().setAutoRead(true);
+} else if (evt == NettyServerCnxn.AutoReadEvent.DISABLE) {
+LOG.debug("Received AutoReadEvent.DISABLE");
+ctx.channel().config().setAutoRead(false);
 }
-} catch(Exception ex) {
-LOG.error("Unexpected exception in

[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226683922
  
--- Diff: 
zookeeper-common/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java 
---
@@ -184,7 +213,9 @@ void cleanup() {
 
 @Override
 void close() {
-channelFactory.releaseExternalResources();
+if (!eventLoopGroup.isShuttingDown()) {
--- End diff --

Is this really needed?


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226684209
  
--- Diff: 
zookeeper-common/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java 
---
@@ -267,7 +298,7 @@ private void sendPkt(Packet p) {
 p.createBB();
 updateLastSend();
 sentCount++;
-channel.write(ChannelBuffers.wrappedBuffer(p.bb));
+channel.writeAndFlush(Unpooled.wrappedBuffer(p.bb));
--- End diff --

What about adding ', channel.voidPromise()' ?


---


[jira] [Commented] (ZOOKEEPER-3032) Step 1.6 - Create zk-server maven structure

2018-10-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656984#comment-16656984
 ] 

Hudson commented on ZOOKEEPER-3032:
---

FAILURE: Integrated in Jenkins build Zookeeper-trunk-single-thread #31 (See 
[https://builds.apache.org/job/Zookeeper-trunk-single-thread/31/])
ZOOKEEPER-3032: MAVEN MIGRATION - zookeeper-server (andor: rev 
cb9f303bda9137d1aebe8eff3eab85c8a59f3cdd)
* (add) 
zookeeper-server/src/main/java/org/apache/zookeeper/cli/CliWrapperException.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/ObserverHierarchicalQuorumTest.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/ClientSSLTest.java
* (delete) 
zookeeper-common/src/main/java/org/apache/zookeeper/ClientCnxnSocket.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/ZooKeeperTest.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/RestoreCommittedLogTest.java
* (edit) bin/zkEnv.sh
* (delete) zookeeper-common/src/main/java/org/apache/zookeeper/StatsTrack.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/ObserverLETest.java
* (add) zookeeper-server/src/main/java/org/apache/zookeeper/common/Time.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/LogChopperTest.java
* (delete) 
zookeeper-common/src/main/java/org/apache/zookeeper/ClientWatchManager.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/FLEPredicateTest.java
* (add) zookeeper-server/src/main/java/org/apache/zookeeper/ZKUtil.java
* (add) zookeeper-server/src/main/java/org/apache/zookeeper/cli/StatPrinter.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/AsyncOpsTest.java
* (add) zookeeper-server/src/main/java/org/apache/zookeeper/ZooDefs.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/RestoreCommittedLogTest.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/QuorumUtilTest.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/StringUtilTest.java
* (add) zookeeper-server/src/main/java/org/apache/zookeeper/cli/LsCommand.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/InvalidSnapshotTest.java
* (add) 
zookeeper-server/src/main/java/org/apache/zookeeper/cli/CloseCommand.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/ReconfigExceptionTest.java
* (add) zookeeper-server/src/test/java/org/apache/zookeeper/test/QuorumUtil.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/SaslAuthDesignatedClientTest.java
* (delete) 
zookeeper-common/src/test/resources/data/invalidsnap/version-2/log.63b
* (add) zookeeper-server/src/test/resources/test-patch.properties
* (edit) zookeeper-contrib/zookeeper-contrib-rest/src/test/zkServer.sh
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/RemoveWatchesTest.java
* (delete) zookeeper-common/src/test/resources/data/invalidsnap/version-2/log.42
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/StaticHostProviderTest.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/FLERestartTest.java
* (add) zookeeper-server/src/main/resources/lib/cobertura/README.txt
* (delete) zookeeper-common/src/test/resources/test-patch.properties
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/EventTypeTest.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/MultiTransactionRecordTest.java
* (add) zookeeper-server/src/test/java/org/apache/zookeeper/test/AsyncTest.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/ClientReconnectTest.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/NonRecoverableErrorTest.java
* (add) zookeeper-server/src/test/resources/data/invalidsnap/version-2/log.63b
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/MultiResponseTest.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/MultiTransactionRecordTest.java
* (delete) 
zookeeper-common/src/main/java/org/apache/zookeeper/cli/SetCommand.java
* (add) 
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNIO.java
* (add) 
zookeeper-server/src/main/java/org/apache/zookeeper/cli/CliParseException.java
* (add) zookeeper-server/src/main/resources/lib/log4j-1.2.17.LICENSE.txt
* (delete) 
zookeeper-common/src/test/resources/data/buffersize/snapshot/version-2/snapshot.2
* (add) 
zookeeper-server/src/test/resources/data/invalidsnap/version-2/snapshot.273
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/OSMXBeanTest.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/ClientSkipACLTest.java
* (add) zookeeper-server/src/main/java/org/apache/zookeeper/cli/AclParser.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/CnxManagerTest.java
* (add) zookeeper-server/src/main/java/org/apa

[jira] [Commented] (ZOOKEEPER-3169) Reduce session revalidation time after zxid roll over

2018-10-19 Thread Fangmin Lv (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657033#comment-16657033
 ] 

Fangmin Lv commented on ZOOKEEPER-3169:
---

[~TyqITstudent] is this a duplicate of ZOOKEEPER-3168? Can we close this one 
if it was opened by mistake?

> Reduce session revalidation time after zxid roll over
> -
>
> Key: ZOOKEEPER-3169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3169
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.5, 3.5.0
>Reporter: 田毅群
>Priority: Major
> Fix For: 3.4.5
>
>
> 1. Sometimes a ZooKeeper cluster receives a lot of client connections; the 
> connection count can even exceed 1W (10,000). When the zxid rolls over, the 
> clients will reconnect and revalidate their sessions.
> 2. In ZooKeeper's design, when a follower server receives session 
> revalidation requests, it forwards them to the leader server, which is 
> responsible for session revalidation.
> 3. In a short time the leader has to handle lots of requests. Using a tool to 
> gather statistics, I found that some clients need to wait over 20s. That is 
> too long for some special clients, like ResourceManager.
> 4. My proposed design: when a zxid rollover happens, the leader records the 
> exact time. When the re-election finishes, all servers get the rollover time. 
> When clients reconnect and revalidate their sessions, every server can check 
> it itself. This reduces a lot of pressure on the cluster, and all clients 
> wait less time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3169) Reduce session revalidation time after zxid roll over

2018-10-19 Thread Fangmin Lv (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657035#comment-16657035
 ] 

Fangmin Lv commented on ZOOKEEPER-3169:
---

Btw, can you point to the code where the reconnect and revalidation happen when 
the zxid rolls over? I wasn't aware of this before.

> Reduce session revalidation time after zxid roll over
> -
>
> Key: ZOOKEEPER-3169
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3169
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.5, 3.5.0
>Reporter: 田毅群
>Priority: Major
> Fix For: 3.4.5
>
>
> 1. Sometimes a ZooKeeper cluster receives a lot of client connections; the 
> connection count can even exceed 1W (10,000). When the zxid rolls over, the 
> clients will reconnect and revalidate their sessions.
> 2. In ZooKeeper's design, when a follower server receives session 
> revalidation requests, it forwards them to the leader server, which is 
> responsible for session revalidation.
> 3. In a short time the leader has to handle lots of requests. Using a tool to 
> gather statistics, I found that some clients need to wait over 20s. That is 
> too long for some special clients, like ResourceManager.
> 4. My proposed design: when a zxid rollover happens, the leader records the 
> exact time. When the re-election finishes, all servers get the rollover time. 
> When clients reconnect and revalidate their sessions, every server can check 
> it itself. This reduces a lot of pressure on the cluster, and all clients 
> wait less time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3170) Umbrella for eliminating ZooKeeper flaky tests

2018-10-19 Thread Fangmin Lv (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657037#comment-16657037
 ] 

Fangmin Lv commented on ZOOKEEPER-3170:
---

Thanks [~andorm] for tracking all those flaky tests; I'll pick up some of them 
to work on.

> Umbrella for eliminating ZooKeeper flaky tests
> --
>
> Key: ZOOKEEPER-3170
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3170
> Project: ZooKeeper
>  Issue Type: Test
>  Components: tests
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
>
> Umbrella ticket for joint community efforts to reduce number of flaky tests 
> and improve the stability of our Jenkins builds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #674: ZOOKEEPER-3032: MAVEN MIGRATION - branch-3.5 - ...

2018-10-19 Thread nkalmar
GitHub user nkalmar opened a pull request:

https://github.com/apache/zookeeper/pull/674

ZOOKEEPER-3032: MAVEN MIGRATION - branch-3.5 - zookeeper-server

Separating the java code is not feasible. Moving common and client back to 
server.

Author: Norbert Kalmar 

Reviewers: an...@apache.org

Closes #672 from nkalmar/ZOOKEEPER-3032r

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nkalmar/zookeeper ZOOKEEPER-3032r-3.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/674.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #674


commit 70e5c926ea524a2e8e486cee116398fe12a19be1
Author: Norbert Kalmar 
Date:   2018-10-19T12:39:50Z

ZOOKEEPER-3032: MAVEN MIGRATION - zookeeper-server

Separating the java code is not feasible. Moving common and client back to 
server.

Author: Norbert Kalmar 

Reviewers: an...@apache.org

Closes #672 from nkalmar/ZOOKEEPER-3032r




---


[GitHub] zookeeper issue #674: ZOOKEEPER-3032: MAVEN MIGRATION - branch-3.5 - zookeep...

2018-10-19 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/zookeeper/pull/674
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2476/



---


Success: ZOOKEEPER- PreCommit Build #2476

2018-10-19 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2476/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 82.05 MB...]
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] 
 [exec] Error: No value specified for option "issue"
 [exec] Session logged out. Session was 
JSESSIONID=54B35CBECE3BA718FA7AC0F616B082EA.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 and 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 are the same file

BUILD SUCCESSFUL
Total time: 21 minutes 25 seconds
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
[description-setter] Description set: ZOOKEEPER-3032
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Adding one-line test results to commit status...
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting status of 70e5c926ea524a2e8e486cee116398fe12a19be1 to SUCCESS with url 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2476/ and 
message: 'SUCCESS 
 1724 tests run, 3 skipped, 0 failed.'
Using context: Jenkins

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2476/

Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-3032) Step 1.6 - Create zk-server maven structure

2018-10-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657133#comment-16657133
 ] 

Hudson commented on ZOOKEEPER-3032:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #240 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/240/])
ZOOKEEPER-3032: MAVEN MIGRATION - zookeeper-server (andor: rev 
cb9f303bda9137d1aebe8eff3eab85c8a59f3cdd)
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/AsyncOpsTest.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/AuthTest.java
* (delete) 
zookeeper-common/src/main/java/org/apache/zookeeper/cli/LsCommand.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/ObserverLETest.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/ClientReconnectTest.java
* (add) 
zookeeper-server/src/main/java/org/apache/zookeeper/version/util/VerGen.java
* (delete) 
zookeeper-client/zookeeper-client-java/src/main/java/org/apache/zookeeper/client/ZooKeeperSaslClient.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/common/AtomicFileWritingIdiomTest.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java
* (add) zookeeper-server/src/main/java/org/apache/zookeeper/common/NetUtils.java
* (add) zookeeper-server/src/main/java/org/apache/zookeeper/Testable.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/SaslAuthMissingClientConfigTest.java
* (add) zookeeper-server/src/test/resources/data/kerberos/minikdc.ldiff
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/SSLTest.java.orig
* (add) 
zookeeper-server/src/main/java/org/apache/zookeeper/common/ZKTrustManager.java
* (delete) zookeeper-common/src/test/resources/data/upgrade/log.11bf0
* (delete) zookeeper-common/src/main/java/org/apache/zookeeper/Quotas.java
* (add) 
zookeeper-server/src/main/java/org/apache/zookeeper/cli/MalformedPathException.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/SessionUpgradeTest.java
* (delete) 
zookeeper-common/src/main/java/org/apache/zookeeper/cli/GetAclCommand.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/FourLetterWordsTest.java
* (add) zookeeper-server/src/main/java/org/apache/zookeeper/common/X509Util.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/CreateTest.java
* (add) 
zookeeper-server/src/test/resources/data/invalidsnap/version-2/snapshot.272
* (delete) zookeeper-common/src/main/java/org/apache/zookeeper/OpResult.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/ChrootAsyncTest.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/ReconfigExceptionTest.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/StandaloneTest.java
* (add) 
zookeeper-server/src/main/java/org/apache/zookeeper/MultiTransactionRecord.java
* (add) zookeeper-server/src/main/java/org/apache/zookeeper/cli/LsCommand.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/common/PathUtilsTest.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/WatchedEventTest.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/FourLetterWordsTest.java
* (delete) 
zookeeper-client/zookeeper-client-java/src/main/java/org/apache/zookeeper/client/StaticHostProvider.java
* (delete) zookeeper-common/src/test/resources/check_compatibility.py
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/JaasConfiguration.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/DuplicateLocalSessionUpgradeTest.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/NettyNettySuiteBase.java
* (add) zookeeper-server/src/main/java/org/apache/zookeeper/ZKUtil.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/NullDataTest.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/common/ZKTrustManagerTest.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/test/CnxManagerTest.java
* (delete) zookeeper-common/src/test/resources/data/invalidsnap/version-2/log.42
* (add) 
zookeeper-server/src/main/java/org/apache/zookeeper/WatchDeregistration.java
* (add) zookeeper-server/src/test/java/org/apache/zookeeper/test/ClientTest.java
* (edit) zookeeper-contrib/build-contrib.xml
* (delete) zookeeper-common/src/main/java/org/apache/zookeeper/StatsTrack.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/JUnit4ZKTestRunner.java
* (delete) 
zookeeper-common/src/test/java/org/apache/zookeeper/test/ChrootClientTest.java
* (add) 
zookeeper-server/src/main/java/org/apache/zookeeper/cli/CreateCommand.java
* (delete) 
zookeeper-common/src/main/java/org/apache/zookeeper/common/NetUtils.java
* (add) zookeeper-server/src/test/resources/data/upgrade/log.11bf0
* (delete) 
zookeeper-common/src/main/java/org/ap

[GitHub] zookeeper issue #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread enixon
Github user enixon commented on the issue:

https://github.com/apache/zookeeper/pull/669
  
@eolivelli , good find with EPoll.  :)

When @ivmaykov first mentioned using EPoll to me as a potential 
optimization, I recommended leaving it for later so we would do the reviewers a 
favor and keep the complexity of this pull request relatively low. We do think 
it's the right implementation to use here; the only question is where and when 
to make that contribution.



---


[GitHub] zookeeper pull request #675: ZOOKEEPER-3032: MAVEN MIGRATION - branch-3.4 - ...

2018-10-19 Thread nkalmar
GitHub user nkalmar opened a pull request:

https://github.com/apache/zookeeper/pull/675

ZOOKEEPER-3032: MAVEN MIGRATION - branch-3.4 - zookeeper-server



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nkalmar/zookeeper ZOOKEEPER-3032r-3.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/675.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #675


commit 98a30ac3f284cc82d88759e59a46ad0d7b27b229
Author: Norbert Kalmar 
Date:   2018-10-19T19:04:22Z

ZOOKEEPER-3032: MAVEN MIGRATION - zookeeper-server




---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread ivmaykov
Github user ivmaykov commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226755285
  
--- Diff: 
zookeeper-common/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java 
---
@@ -68,18 +70,21 @@
 public class ClientCnxnSocketNetty extends ClientCnxnSocket {
 private static final Logger LOG = 
LoggerFactory.getLogger(ClientCnxnSocketNetty.class);
 
-ChannelFactory channelFactory = new NioClientSocketChannelFactory(
-Executors.newCachedThreadPool(), 
Executors.newCachedThreadPool());
-Channel channel;
-CountDownLatch firstConnect;
-ChannelFuture connectFuture;
-Lock connectLock = new ReentrantLock();
-AtomicBoolean disconnected = new AtomicBoolean();
-AtomicBoolean needSasl = new AtomicBoolean();
-Semaphore waitSasl = new Semaphore(0);
+private final EventLoopGroup eventLoopGroup;
+private Channel channel;
+private CountDownLatch firstConnect;
+private ChannelFuture connectFuture;
+private final Lock connectLock = new ReentrantLock();
+private final AtomicBoolean disconnected = new AtomicBoolean();
+private final AtomicBoolean needSasl = new AtomicBoolean();
+private final Semaphore waitSasl = new Semaphore(0);
+
+private static final AtomicReference TEST_ALLOCATOR =
+new AtomicReference<>(null);
 
 ClientCnxnSocketNetty(ZKClientConfig clientConfig) throws IOException {
 this.clientConfig = clientConfig;
+eventLoopGroup = new NioEventLoopGroup(0, 
Executors.newCachedThreadPool());
--- End diff --

I'd like to do it in a follow-up diff. I was thinking we default to NIO 
(since it works on all OSes), and have a config option to use Epoll instead.
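
A minimal sketch of what such a switch could look like on the client side, 
assuming a hypothetical boolean config flag for preferring the native 
transport (the actual key name and wiring would be decided in the follow-up; 
the Epoll classes come from netty-transport-native-epoll and only work on 
Linux):

    import io.netty.channel.EventLoopGroup;
    import io.netty.channel.epoll.Epoll;
    import io.netty.channel.epoll.EpollEventLoopGroup;
    import io.netty.channel.epoll.EpollSocketChannel;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioSocketChannel;

    // Hypothetical helper, not part of this patch: pick the transport based on
    // a config flag and on whether the native epoll transport is available.
    final class ClientTransport {
        static EventLoopGroup newEventLoopGroup(boolean preferEpoll) {
            if (preferEpoll && Epoll.isAvailable()) {
                return new EpollEventLoopGroup();
            }
            return new NioEventLoopGroup();
        }

        static Class<? extends SocketChannel> channelClass(EventLoopGroup group) {
            return (group instanceof EpollEventLoopGroup)
                    ? EpollSocketChannel.class
                    : NioSocketChannel.class;
        }
    }

The Bootstrap would then use channelClass(eventLoopGroup) instead of 
hard-coding NioSocketChannel.class.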


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread ivmaykov
Github user ivmaykov commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226755419
  
--- Diff: 
zookeeper-common/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java 
---
@@ -103,71 +108,95 @@
 boolean isConnected() {
 // Assuming that isConnected() is only used to initiate connection,
 // not used by some other connection status judgement.
-return channel != null;
+connectLock.lock();
+try {
+return connectFuture != null || channel != null;
+} finally {
+connectLock.unlock();
+}
 }
 
 @Override
 void connect(InetSocketAddress addr) throws IOException {
 firstConnect = new CountDownLatch(1);
 
-ClientBootstrap bootstrap = new ClientBootstrap(channelFactory);
-
-bootstrap.setPipelineFactory(new 
ZKClientPipelineFactory(addr.getHostString(), addr.getPort()));
-bootstrap.setOption("soLinger", -1);
-bootstrap.setOption("tcpNoDelay", true);
-
-connectFuture = bootstrap.connect(addr);
-connectFuture.addListener(new ChannelFutureListener() {
-@Override
-public void operationComplete(ChannelFuture channelFuture) 
throws Exception {
-// this lock guarantees that channel won't be assgined 
after cleanup().
-connectLock.lock();
-try {
-if (!channelFuture.isSuccess() || connectFuture == 
null) {
-LOG.info("future isn't success, cause: {}", 
channelFuture.getCause());
-return;
-}
-// setup channel, variables, connection, etc.
-channel = channelFuture.getChannel();
-
-disconnected.set(false);
-initialized = false;
-lenBuffer.clear();
-incomingBuffer = lenBuffer;
-
-sendThread.primeConnection();
-updateNow();
-updateLastSendAndHeard();
-
-if (sendThread.tunnelAuthInProgress()) {
-waitSasl.drainPermits();
-needSasl.set(true);
-sendPrimePacket();
-} else {
-needSasl.set(false);
-}
+Bootstrap bootstrap = new Bootstrap();
+bootstrap.group(Objects.requireNonNull(eventLoopGroup))
+.channel(NioSocketChannel.class)
+.handler(new ZKClientPipelineFactory(addr.getHostString(), 
addr.getPort()))
+.option(ChannelOption.SO_LINGER, -1)
+.option(ChannelOption.TCP_NODELAY, true);
+ByteBufAllocator testAllocator = TEST_ALLOCATOR.get();
+if (testAllocator != null) {
+bootstrap.option(ChannelOption.ALLOCATOR, testAllocator);
+}
+bootstrap.validate();
+
+connectLock.lock();
+try {
+connectFuture = bootstrap.connect(addr);
+connectFuture.addListener(new ChannelFutureListener() {
+@Override
+public void operationComplete(ChannelFuture channelFuture) 
throws Exception {
+// this lock guarantees that channel won't be assigned 
after cleanup().
+connectLock.lock();
+try {
+if (!channelFuture.isSuccess()) {
+LOG.info("future isn't success, cause:", 
channelFuture.cause());
+return;
+} else if (connectFuture == null) {
+LOG.info("connect attempt cancelled");
+// If the connect attempt was cancelled but 
succeeded
+// anyway, make sure to close the channel, 
otherwise
+// we may leak a file descriptor.
+channelFuture.channel().close();
--- End diff --

I don't think so, since this code can only trigger if the connect future is 
successful. If the future is not successful, the previous if branch will be 
taken.


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread ivmaykov
Github user ivmaykov commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226755668
  
--- Diff: 
zookeeper-common/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java 
---
@@ -184,7 +213,9 @@ void cleanup() {
 
 @Override
 void close() {
-channelFactory.releaseExternalResources();
+if (!eventLoopGroup.isShuttingDown()) {
--- End diff --

I'm not sure if calling `shutdownGracefully` more than once is allowed, 
which is why I added the check. It might not be necessary.


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread ivmaykov
Github user ivmaykov commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226756052
  
--- Diff: 
zookeeper-common/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java 
---
@@ -267,7 +298,7 @@ private void sendPkt(Packet p) {
 p.createBB();
 updateLastSend();
 sentCount++;
-channel.write(ChannelBuffers.wrappedBuffer(p.bb));
+channel.writeAndFlush(Unpooled.wrappedBuffer(p.bb));
--- End diff --

Can you explain what the purpose of that would be? According to the 
documentation, voidPromise() returns a promise that will never be notified of 
success or failure.
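
For reference, the main benefit of the suggestion would be avoiding the 
allocation of a fresh ChannelPromise on every write when nobody ever attaches 
a listener; write failures are still expected to surface through the 
pipeline's exceptionCaught(). A sketch of the suggested call, assuming 
Netty 4's writeAndFlush(Object, ChannelPromise) overload:

    // Sketch only: reuse the channel's shared void promise instead of creating
    // a new promise for every outgoing packet on the hot send path.
    channel.writeAndFlush(Unpooled.wrappedBuffer(p.bb), channel.voidPromise());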


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread ivmaykov
Github user ivmaykov commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226756565
  
--- Diff: 
zookeeper-common/src/test/java/org/apache/zookeeper/common/TestByteBufAllocator.java
 ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.common;
+
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.CompositeByteBuf;
+import io.netty.buffer.PooledByteBufAllocator;
+import io.netty.util.ResourceLeakDetector;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import java.util.concurrent.atomic.AtomicReference;
+
+/**
+ * This is a custom ByteBufAllocator that tracks outstanding allocations 
and
+ * crashes the program if any of them are leaked.
+ *
+ * Never use this class in production, it will cause your server to run out
+ * of memory! This is because it holds strong references to all allocated
+ * buffers and doesn't release them until checkForLeaks() is called at the
+ * end of a unit test.
+ *
+ * Note: the original code was copied from 
https://github.com/airlift/drift,
+ * with the permission and encouragement of airlift's author (dain). 
Airlift
+ * uses the same apache 2.0 license as Zookeeper so this should be ok.
+ *
+ * However, the code was modified to take advantage of Netty's built-in
+ * leak tracking and make a best effort to print details about buffer 
leaks.
+ */
+public class TestByteBufAllocator extends PooledByteBufAllocator {
--- End diff --

Paranoid leak detection will just print details about leaks, but the test 
will still pass and it takes a human to examine the test's stderr output to see 
that there was a problem. Using this allocator will make the test fail if there 
are buffer leaks.
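
For comparison, Netty's built-in detector is enabled like this (a standard 
Netty 4 call; it only changes how aggressively leaks are sampled and logged):

    import io.netty.util.ResourceLeakDetector;

    // With PARANOID, every allocated buffer is tracked and leaks are reported
    // in the log -- but the JUnit run itself still passes, so a human has to
    // notice the leak report in the output.
    ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);

The tracking allocator in this patch goes one step further: it keeps a 
reference to every outstanding buffer and lets checkForLeaks() fail the test 
directly at the end.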


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread ivmaykov
Github user ivmaykov commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226756353
  
--- Diff: 
zookeeper-common/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java 
---
@@ -439,13 +466,34 @@ public void messageReceived(ChannelHandlerContext ctx,
 }
 }
 wakeupCnxn();
+// Note: SimpleChannelInboundHandler releases the ByteBuf for 
us
+// so we don't need to do it.
 }
 
 @Override
-public void exceptionCaught(ChannelHandlerContext ctx,
-ExceptionEvent e) throws Exception {
-LOG.warn("Exception caught: {}", e, e.getCause());
+public void exceptionCaught(ChannelHandlerContext ctx, Throwable 
cause) {
+LOG.warn("Exception caught", cause);
 cleanup();
 }
 }
+
+/**
+ * Sets the test ByteBufAllocator. This allocator will be used by all
+ * future instances of this class.
+ * It is not recommended to use this method outside of testing.
+ * @param allocator the ByteBufAllocator to use for all netty buffer
+ *  allocations.
+ */
+public static void setTestAllocator(ByteBufAllocator allocator) {
+TEST_ALLOCATOR.set(allocator);
--- End diff --

Sure, but that would only affect that client and would have no effect on 
the server. All clients are untrusted by default since they run on a computer 
you don't own :)

I can look into using a mocking framework instead if you feel strongly 
about it.


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread ivmaykov
Github user ivmaykov commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226756739
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java 
---
@@ -200,24 +186,13 @@ public void setSessionId(long sessionId) {
 this.sessionId = sessionId;
 }
 
-@Override
-public void enableRecv() {
-if (throttled) {
-throttled = false;
-if (LOG.isDebugEnabled()) {
-LOG.debug("Sending unthrottle event " + this);
-}
-channel.getPipeline().sendUpstream(new 
ResumeMessageEvent(channel));
-}
-}
-
 @Override
 public void sendBuffer(ByteBuffer sendBuffer) {
 if (sendBuffer == ServerCnxnFactory.closeConn) {
 close();
 return;
 }
-channel.write(wrappedBuffer(sendBuffer));
+channel.writeAndFlush(Unpooled.wrappedBuffer(sendBuffer));
--- End diff --

As above, I'm not sure what that provides. I am still learning about netty 
so please excuse my ignorance :)


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread ivmaykov
Github user ivmaykov commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226757057
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxnFactory.java
 ---
@@ -116,170 +115,94 @@ public void channelConnected(ChannelHandlerContext 
ctx,
 
 NettyServerCnxn cnxn = new NettyServerCnxn(channel,
 zkServer, NettyServerCnxnFactory.this);
-ctx.setAttachment(cnxn);
+ctx.channel().attr(CONNECTION_ATTRIBUTE).set(cnxn);
 
 if (secure) {
-SslHandler sslHandler = 
ctx.getPipeline().get(SslHandler.class);
-ChannelFuture handshakeFuture = sslHandler.handshake();
+SslHandler sslHandler = 
ctx.pipeline().get(SslHandler.class);
+Future handshakeFuture = 
sslHandler.handshakeFuture();
 handshakeFuture.addListener(new 
CertificateVerifier(sslHandler, cnxn));
 } else {
-allChannels.add(ctx.getChannel());
+allChannels.add(ctx.channel());
 addCnxn(cnxn);
 }
 }
 
 @Override
-public void channelDisconnected(ChannelHandlerContext ctx,
-ChannelStateEvent e) throws Exception
-{
-if (LOG.isTraceEnabled()) {
-LOG.trace("Channel disconnected " + e);
-}
-NettyServerCnxn cnxn = (NettyServerCnxn) ctx.getAttachment();
+public void channelInactive(ChannelHandlerContext ctx) throws 
Exception {
+LOG.trace("Channel inactive {}", ctx.channel());
--- End diff --

LOG.trace() does an isTraceEnabled check internally. If the additional 
parameters passed to the log method don't do any work (such as converting the 
contents of a buffer to a hex string), then the enclosing isTraceEnabled check 
is redundant.
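
A small illustration of the distinction, as a standalone sketch (slf4j and 
Netty APIs; not code from the patch):

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.ByteBufUtil;
    import io.netty.channel.ChannelHandlerContext;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    final class TraceGuardExample {
        private static final Logger LOG = LoggerFactory.getLogger(TraceGuardExample.class);

        static void log(ChannelHandlerContext ctx, ByteBuf buf) {
            // No guard needed: the {} placeholder is only rendered when TRACE
            // is enabled, and ctx.channel() is cheap to evaluate.
            LOG.trace("Channel inactive {}", ctx.channel());

            // Guard still pays off: hexDump() walks the whole buffer, so skip
            // that work entirely unless TRACE is actually on.
            if (LOG.isTraceEnabled()) {
                LOG.trace("Payload: {}", ByteBufUtil.hexDump(buf));
            }
        }
    }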


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread ivmaykov
Github user ivmaykov commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226757190
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxnFactory.java
 ---
@@ -116,170 +115,94 @@ public void channelConnected(ChannelHandlerContext 
ctx,
 
 NettyServerCnxn cnxn = new NettyServerCnxn(channel,
 zkServer, NettyServerCnxnFactory.this);
-ctx.setAttachment(cnxn);
+ctx.channel().attr(CONNECTION_ATTRIBUTE).set(cnxn);
 
 if (secure) {
-SslHandler sslHandler = 
ctx.getPipeline().get(SslHandler.class);
-ChannelFuture handshakeFuture = sslHandler.handshake();
+SslHandler sslHandler = 
ctx.pipeline().get(SslHandler.class);
+Future handshakeFuture = 
sslHandler.handshakeFuture();
 handshakeFuture.addListener(new 
CertificateVerifier(sslHandler, cnxn));
 } else {
-allChannels.add(ctx.getChannel());
+allChannels.add(ctx.channel());
 addCnxn(cnxn);
 }
 }
 
 @Override
-public void channelDisconnected(ChannelHandlerContext ctx,
-ChannelStateEvent e) throws Exception
-{
-if (LOG.isTraceEnabled()) {
-LOG.trace("Channel disconnected " + e);
-}
-NettyServerCnxn cnxn = (NettyServerCnxn) ctx.getAttachment();
+public void channelInactive(ChannelHandlerContext ctx) throws 
Exception {
+LOG.trace("Channel inactive {}", ctx.channel());
+allChannels.remove(ctx.channel());
+NettyServerCnxn cnxn = 
ctx.channel().attr(CONNECTION_ATTRIBUTE).getAndSet(null);
 if (cnxn != null) {
-if (LOG.isTraceEnabled()) {
-LOG.trace("Channel disconnect caused close " + e);
-}
+LOG.trace("Channel inactive caused close {}", cnxn);
 cnxn.close();
 }
 }
 
 @Override
-public void exceptionCaught(ChannelHandlerContext ctx, 
ExceptionEvent e)
-throws Exception
-{
-LOG.warn("Exception caught " + e, e.getCause());
-NettyServerCnxn cnxn = (NettyServerCnxn) ctx.getAttachment();
+public void exceptionCaught(ChannelHandlerContext ctx, Throwable 
cause) throws Exception {
+LOG.warn("Exception caught", cause);
+NettyServerCnxn cnxn = 
ctx.channel().attr(CONNECTION_ATTRIBUTE).getAndSet(null);
 if (cnxn != null) {
-if (LOG.isDebugEnabled()) {
-LOG.debug("Closing " + cnxn);
-}
+LOG.debug("Closing {}", cnxn);
 cnxn.close();
 }
 }
 
 @Override
-public void messageReceived(ChannelHandlerContext ctx, 
MessageEvent e)
-throws Exception
-{
-if (LOG.isTraceEnabled()) {
-LOG.trace("message received called " + e.getMessage());
-}
+public void userEventTriggered(ChannelHandlerContext ctx, Object 
evt) throws Exception {
 try {
-if (LOG.isDebugEnabled()) {
-LOG.debug("New message " + e.toString()
-+ " from " + ctx.getChannel());
-}
-NettyServerCnxn cnxn = 
(NettyServerCnxn)ctx.getAttachment();
-synchronized(cnxn) {
-processMessage(e, cnxn);
+if (evt == NettyServerCnxn.AutoReadEvent.ENABLE) {
+LOG.debug("Received AutoReadEvent.ENABLE");
+NettyServerCnxn cnxn = 
ctx.channel().attr(CONNECTION_ATTRIBUTE).get();
+// TODO(ilyam): Not sure if cnxn can be null here. It 
becomes null if channelInactive()
+// or exceptionCaught() trigger, but it's unclear to 
me if userEventTriggered() can run
+// after either of those. Check for null just to be 
safe ...
+if (cnxn != null) {
+cnxn.processQueuedBuffer();
+}
+ctx.channel().config().setAutoRead(true);
+} else if (evt == NettyServerCnxn.AutoReadEvent.DISABLE) {
+LOG.debug("Received AutoReadEvent.DISABLE");
+ctx.channel().config().setAutoRead(false);
 }
-} catch(Exception ex) {
-LOG.error("Unexpected exception in 

[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread ivmaykov
Github user ivmaykov commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r226757240
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxnFactory.java
 ---
@@ -335,29 +260,34 @@ public void operationComplete(ChannelFuture future)
 CnxnChannelHandler channelHandler = new CnxnChannelHandler();
 
 NettyServerCnxnFactory() {
-bootstrap = new ServerBootstrap(
-new NioServerSocketChannelFactory(
-Executors.newCachedThreadPool(),
-Executors.newCachedThreadPool()));
-// parent channel
-bootstrap.setOption("reuseAddress", true);
-// child channels
-bootstrap.setOption("child.tcpNoDelay", true);
-/* set socket linger to off, so that socket close does not block */
-bootstrap.setOption("child.soLinger", -1);
-bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
-@Override
-public ChannelPipeline getPipeline() throws Exception {
-ChannelPipeline p = Channels.pipeline();
-if (secure) {
-initSSL(p);
-}
-p.addLast("servercnxnfactory", channelHandler);
-
-return p;
-}
-});
 x509Util = new ClientX509Util();
+
+EventLoopGroup bossGroup = new NioEventLoopGroup(0, 
Executors.newCachedThreadPool());
--- End diff --

See comment above about making epoll optional based on a config option.


---


[GitHub] zookeeper issue #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-10-19 Thread ivmaykov
Github user ivmaykov commented on the issue:

https://github.com/apache/zookeeper/pull/669
  
@eolivelli thanks so much for the review! See my responses inline.


---


Success: ZOOKEEPER- PreCommit Build #2477

2018-10-19 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2477/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 41.06 MB...]
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] 
 [exec] Error: No value specified for option "issue"
 [exec] Session logged out. Session was 
JSESSIONID=005F1596A20A63FC101CDA77DDD2DF03.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 45 minutes 35 seconds
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
[description-setter] Description set: ZOOKEEPER-3032
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Adding one-line test results to commit status...
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting status of 98a30ac3f284cc82d88759e59a46ad0d7b27b229 to SUCCESS with url 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2477/ and 
message: 'SUCCESS 
 657 tests run, 3 skipped, 0 failed.'
Using context: Jenkins

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2477/

Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/



###
## FAILED TESTS (if any) 
##
All tests passed

[GitHub] zookeeper issue #675: ZOOKEEPER-3032: MAVEN MIGRATION - branch-3.4 - zookeep...

2018-10-19 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/zookeeper/pull/675
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2477/



---


[jira] [Assigned] (ZOOKEEPER-3179) Add snapshot compression to reduce the disk IO

2018-10-19 Thread Fangmin Lv (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fangmin Lv reassigned ZOOKEEPER-3179:
-

Assignee: (was: Suyog Mapara)

> Add snapshot compression to reduce the disk IO
> --
>
> Key: ZOOKEEPER-3179
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3179
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Fangmin Lv
>Priority: Major
> Fix For: 3.6.0
>
>
> As the snapshot grows larger, the periodic snapshot taken after a certain 
> number of txns becomes more expensive. This in turn affects the maximum 
> throughput we can support within SLA, because of the disk contention between 
> snapshot and txn writes when they're on the same drive.
>  
> With compression such as zstd/snappy/gzip, the actual snapshot size could be 
> much smaller; the compression ratio depends on the actual data. Compression 
> might make recovery (loading from disk) faster in some cases, but sometimes 
> slower because of the extra time spent compressing/decompressing.
>  
> Based on production traffic, performance also varies with the compression 
> method, which is why we provide different implementations; a different method 
> can be selected for different use cases.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3179) Add snapshot compression to reduce the disk IO

2018-10-19 Thread Fangmin Lv (JIRA)
Fangmin Lv created ZOOKEEPER-3179:
-

 Summary: Add snapshot compression to reduce the disk IO
 Key: ZOOKEEPER-3179
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3179
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Fangmin Lv
Assignee: Suyog Mapara
 Fix For: 3.6.0


As the snapshot grows larger, the periodic snapshot taken after a certain 
number of txns becomes more expensive. This in turn affects the maximum 
throughput we can support within SLA, because of the disk contention between 
snapshot and txn writes when they're on the same drive.
 
With compression such as zstd/snappy/gzip, the actual snapshot size could be much 
smaller; the compression ratio depends on the actual data. Compression might make 
recovery (loading from disk) faster in some cases, but sometimes slower because 
of the extra time spent compressing/decompressing.
 
Based on production traffic, performance also varies with the compression 
method, which is why we provide different implementations; a different method 
can be selected for different use cases.
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
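As a rough illustration of the ZOOKEEPER-3179 idea above, and not the eventual implementation, the snapshot stream could be wrapped in a compressor chosen by configuration. Only GZIP is shown because it ships with the JDK; zstd/snappy would need their own libraries, and the class and enum names here are made up for the sketch:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    final class SnapCompressionSketch {
        // Hypothetical config values; a real implementation would also cover zstd/snappy.
        enum Method { NONE, GZIP }

        // Wrap the stream the snapshot is serialized to when it is written.
        static OutputStream wrapOutput(OutputStream raw, Method method) throws IOException {
            return method == Method.GZIP ? new GZIPOutputStream(raw) : raw;
        }

        // Wrap the stream the snapshot is read from on recovery.
        static InputStream wrapInput(InputStream raw, Method method) throws IOException {
            return method == Method.GZIP ? new GZIPInputStream(raw) : raw;
        }
    }

Whether this is a net win depends on the data and the CPU vs. disk trade-off, which is exactly why the description suggests selecting the method per use case.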


[jira] [Updated] (ZOOKEEPER-3180) Add response cache to improve the throughput of read heavy traffic

2018-10-19 Thread Fangmin Lv (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fangmin Lv updated ZOOKEEPER-3180:
--
Description: 
In read-heavy use cases with large response data, serializing the response 
takes time and adds overhead to the GC.

Adding a response cache helps improve the throughput we can support and also 
reduces latency in general.

This Jira will implement an LRU cache for responses, which has shown some 
performance gain on some of our production ensembles.

  was:
In read-heavy use cases with large response data, serializing the response 
takes time and adds overhead to the GC.

Adding a response cache helps improve the throughput we can support and also 
reduces latency in general. This Jira will implement an LRU cache for 
responses, which has shown some performance gain on some of our production 
ensembles.


> Add response cache to improve the throughput of read heavy traffic 
> ---
>
> Key: ZOOKEEPER-3180
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3180
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Priority: Minor
> Fix For: 3.6.0
>
>
> In read-heavy use cases with large response data, serializing the response 
> takes time and adds overhead to the GC.
> Adding a response cache helps improve the throughput we can support and also 
> reduces latency in general.
> This Jira will implement an LRU cache for responses, which has shown some 
> performance gain on some of our production ensembles.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3180) Add response cache to improve the throughput of read heavy traffic

2018-10-19 Thread Fangmin Lv (JIRA)
Fangmin Lv created ZOOKEEPER-3180:
-

 Summary: Add response cache to improve the throughput of read 
heavy traffic 
 Key: ZOOKEEPER-3180
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3180
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Fangmin Lv
 Fix For: 3.6.0


In read-heavy use cases with large response data, serializing the response 
takes time and adds overhead to the GC.

Adding a response cache helps improve the throughput we can support and also 
reduces latency in general. This Jira will implement an LRU cache for 
responses, which has shown some performance gain on some of our production 
ensembles.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
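To make the ZOOKEEPER-3180 idea above concrete, here is a minimal sketch of an LRU cache for serialized read responses. The key shape (path plus the zxid of the last change), the size limit, and the synchronization are assumptions for illustration, not the proposed design:

    import java.util.LinkedHashMap;
    import java.util.Map;

    final class ResponseCacheSketch {
        private final int maxEntries;
        private final Map<String, byte[]> cache;

        ResponseCacheSketch(int maxEntries) {
            this.maxEntries = maxEntries;
            // An access-ordered LinkedHashMap gives LRU eviction via removeEldestEntry.
            this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                    return size() > ResponseCacheSketch.this.maxEntries;
                }
            };
        }

        synchronized byte[] get(String path, long mzxid) {
            return cache.get(path + '@' + mzxid);
        }

        synchronized void put(String path, long mzxid, byte[] serialized) {
            cache.put(path + '@' + mzxid, serialized);
        }
    }

A read response could then be served from the cache while the node's mzxid is unchanged, skipping re-serialization for repeated reads of the same data.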


[GitHub] zookeeper issue #567: ZOOKEEPER-3071: Add a config parameter to control tran...

2018-10-19 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/zookeeper/pull/567
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2478/



---


Failed: ZOOKEEPER- PreCommit Build #2478

2018-10-19 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2478/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 77.84 MB...]
 [exec] 
==
 [exec] 
 [exec] 
 [exec] 
 [exec] Error: No value specified for option "issue"
 [exec] Session logged out. Session was 
JSESSIONID=BC48083E532E722D8D94FE251D797749.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1859:
 exec returned: 1

Total time: 21 minutes 36 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
[description-setter] Description set: ZOOKEEPER-3071
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Adding one-line test results to commit status...
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting status of 0a284cdd1b2456596cf2771e178520dc564c to FAILURE with url 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2478/ and 
message: 'FAILURE
 1793 tests run, 2 skipped, 1 failed.'
Using context: Jenkins

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2478/

Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting MAVEN_3_LATEST__HOME=/home/jenkins/tools/maven/latest3/



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalObserverRun

Error Message:
Timeout occurred. Please note the time in the report does not reflect the time 
until the timeout.

Stack Trace:
junit.framework.AssertionFailedError: Timeout occurred. Please note the time in 
the report does not reflect the time until the timeout.
at java.lang.Thread.run(Thread.java:748)

ZooKeeper-trunk - Build # 241 - Failure

2018-10-19 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk/241/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 198.43 KB...]
[junit] Running org.apache.zookeeper.test.ServerCnxnTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.194 sec, Thread: 1, Class: org.apache.zookeeper.test.SaslSuperUserTest
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest in thread 
1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.316 sec, Thread: 1, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
47.358 sec, Thread: 2, Class: org.apache.zookeeper.test.RestoreCommittedLogTest
[junit] Running org.apache.zookeeper.test.SessionTest in thread 2
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.422 sec, Thread: 3, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
3
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.099 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.136 sec, Thread: 1, Class: org.apache.zookeeper.test.SessionTimeoutTest
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 1
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.719 sec, Thread: 1, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 1
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.492 sec, Thread: 1, Class: org.apache.zookeeper.test.StatTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 1
[junit] Tests run: 26, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.731 sec, Thread: 1, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.066 sec, Thread: 1, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.195 sec, Thread: 1, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 1
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
16.11 sec, Thread: 2, Class: org.apache.zookeeper.test.SessionTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 2
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
19.779 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.126 sec, Thread: 3, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 3
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
12.204 sec, Thread: 1, Class: org.apache.zookeeper.test.TruncateTest
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 1
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.542 sec, Thread: 3, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 3
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.078 sec, Thread: 3, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
20.097 sec, Thread: 2, Class: 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.93 sec, Thread: 2, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Running org.apache.jute.BinaryInputArchiveTest in thread 2
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
11.03 sec, Thread: 3, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.081 sec, Thread: 2, Class: org.apache.jute.BinaryInputArchiveTest
[junit] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
31.341 sec

Re: Decrease number of threads in Jenkins builds to reduce flakyness

2018-10-19 Thread Bogdan Kanivets
I think the argument for keeping concurrency is that it may surface some
otherwise unknown problems in the code.

Maybe a middle ground: move the largest offenders into a separate junit tag and
run them after the rest of the tests with threads=1 (a sketch follows below).
Hopefully this will make life better for PRs.

On the note of largest offenders, I've done 44 runs on aws r3.large with
various thread settings (1, 2, 4, 8).
Failure counts:
  1 testNextConfigAlreadyActive
  1 testNonExistingOpCode
  1 testRaceConditionBetweenLeaderAndAckRequestProcessor
  1 testWatcherDisconnectOnClose
  2 testDoubleElection
  5 testCurrentServersAreObserversInNextConfig
  5 testNormalFollowerRunWithDiff
  7 startSingleServerTest
 18 testNodeDataChanged

Haven't seen testPurgeWhenLogRollingInProgress
or testManyChildWatchersAutoReset failing yet.
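For what the "separate junit tag" above could look like with JUnit 4 categories; the FlakyTest marker name and the build wiring that runs the tagged tests in a second, single-threaded pass are assumptions, not project decisions:

    import org.junit.Test;
    import org.junit.experimental.categories.Category;

    // Marker interface acting as the tag.
    interface FlakyTest {}

    public class NodeDataChangedLikeTest {
        @Test
        @Category(FlakyTest.class)
        public void testNodeDataChanged() {
            // body of a known flaky test; excluded from the parallel run and
            // executed afterwards with threads=1
        }
    }

    // A suite that selects only the tagged tests could then be run separately, e.g.:
    // @RunWith(Categories.class)
    // @Categories.IncludeCategory(FlakyTest.class)
    // @Suite.SuiteClasses({ NodeDataChangedLikeTest.class })
    // public class FlakyTestSuite {}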



On Thu, Oct 18, 2018 at 10:03 PM Michael Han  wrote:

> It's a good idea to reduce the concurrency to eliminate flakiness. Looks
> like the single threaded unit tests on trunk are pretty stable
> https://builds.apache.org/job/zookeeper-trunk-single-thread/ (some failures
> are due to C tests). The build time is longer, but not too bad (for the
> pre-commit build; for the nightly build, build time should not be a concern at
> all).
>
>
> On Mon, Oct 15, 2018 at 5:50 AM Andor Molnar 
> wrote:
>
> > +1
> >
> >
> >
> > On Mon, Oct 15, 2018 at 1:55 PM, Enrico Olivelli 
> > wrote:
> >
> > > Il giorno lun 15 ott 2018 alle ore 12:46 Andor Molnar
> > >  ha scritto:
> > > >
> > > > Thank you guys. This is great help.
> > > >
> > > > I remember your efforts Bogdan; as far as I remember you observed
> > > > thread starvation in multiple runs on Apache Jenkins. Correct me if I'm wrong.
> > > >
> > > > I’ve created an umbrella Jira to capture all flaky test fixing
> efforts
> > > here:
> > > > https://issues.apache.org/jira/browse/ZOOKEEPER-3170 <
> > > https://issues.apache.org/jira/browse/ZOOKEEPER-3170>
> > > >
> > > > All previous flaky-related tickets have been converted to sub-tasks.
> > > Some of them might not be up-to-date, please consider reviewing them
> and
> > > close if possible. Additionally feel free to create new sub-tasks to
> > > capture your actual work.
> > > >
> > > > I’ve already modified Trunk and branch-3.5 builds to run on 4 threads
> > > for testing initially. It resulted in slightly more stable tests:
> > >
> > > +1
> > >
> > > I have assigned the umbrella issue to you Andor, as you are driving
> > > this important task. Is that ok?
> > >
> > > thank you
> > >
> > > Enrico
> > >
> > >
> > > >
> > > > Trunk (java 8) - failing 1/4 (since #229) - build time increased by
> > > 40-45%
> > > > Trunk (java 9) - failing 0/2 (since #993) - ~40%
> > > > Trunk (java 10) - failing 1/2 (since #280) -
> > > > branch-3.5 (java 8) - failing 0/4 (since #1153) - ~35-45%
> > > >
> > > > However the sample is not big enough and the results are inaccurate, so
> > > > I need more builds. I also need to fix a bug in SSL to get the java9/10
> > > > builds working on 3.5.
> > > >
> > > > Please let me know if I should revert the changes. Precommit build is
> > > still running on 8 threads, but I’d like to change that one too.
> > > >
> > > > Regards,
> > > > Andor
> > > >
> > > >
> > > >
> > > > > On 2018. Oct 15., at 9:31, Bogdan Kanivets 
> > > wrote:
> > > > >
> > > > > Fangmin,
> > > > >
> > > > > Those are good ideas.
> > > > >
> > > > > FYI, I've started running tests continuously on aws m1.xlarge.
> > > > > https://github.com/lavacat/zookeeper-tests-lab
> > > > >
> > > > > So far, I've done ~ 12 runs of trunk. Same common offenders as in the
> > > > > Flaky dash: testManyChildWatchersAutoReset, testPurgeWhenLogRollingInProgress
> > > > > I'll do some more runs, then try to come up with a report.
> > > > >
> > > > > I'm using aws and not Apache Jenkins env because of better
> > > > > control/observability.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Oct 14, 2018 at 4:58 PM Fangmin Lv 
> > > wrote:
> > > > >
> > > > >> Internally, we also did some work to reduce the flakiness; here are
> > > > >> the main things we've done:
> > > > >>
> > > > >> * using a retry rule to retry in case the zk client lost its
> > > > >> connection; this can happen if the quorum tests are running in an
> > > > >> unstable environment and a leader election happened.
> > > > >> * using random ports instead of sequential ones to avoid port
> > > > >> races when running tests concurrently
> > > > >> * changing tests to avoid using the same test path when
> > > > >> creating/deleting nodes
> > > > >>
> > > > >> These greatly reduced the flakiness internally; we should try them if
> > > > >> we're seeing similar issues on Jenkins.
> > > > >>
> > > > >> Fangmin
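As a rough sketch of the first two bullets quoted above (a retry rule and random ports), assuming JUnit 4 and with names invented for illustration:

    import java.io.IOException;
    import java.net.ServerSocket;
    import org.junit.rules.TestRule;
    import org.junit.runners.model.Statement;

    final class FlakyTestUtils {
        // Retries a failing test body up to maxAttempts times (maxAttempts >= 1).
        static TestRule retry(final int maxAttempts) {
            return (base, description) -> new Statement() {
                @Override
                public void evaluate() throws Throwable {
                    Throwable last = null;
                    for (int i = 0; i < maxAttempts; i++) {
                        try {
                            base.evaluate();
                            return;
                        } catch (Throwable t) {
                            last = t;
                        }
                    }
                    throw last;
                }
            };
        }

        // Asks the OS for a free ephemeral port instead of allocating sequentially.
        static int randomFreePort() throws IOException {
            try (ServerSocket s = new ServerSocket(0)) {
                return s.getLocalPort();
            }
        }
    }

A test would attach the rule with a public @Rule field, for example @Rule public TestRule retry = FlakyTestUtils.retry(3); note that the ephemeral-port helper is only best-effort, since the port can be reclaimed between the probe and the later bind.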
> > > > >>
> > > > >> On Sat, Oct 13, 2018 at 10:48 AM Bogdan Kanivets <
> > bkaniv...@gmail.com
> > > >
> > > > >> wrote:
> > > > >>
> > > > >>> I've looked into flakiness couple months ago (special attention
> on
>