Li, I wonder if we have some new throttling/back-pressure mechanism that is enabled by default.
Does anyone have a pointer to the relevant implementations?
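
IIRC 3.6 added a RequestThrottler on the server side; as far as I remember it ships
disabled unless the corresponding system properties are set, but it is worth ruling out.
Below is a small diagnostic sketch, assuming the property names are the ones documented
for 3.6 (please double-check them against the admin guide for your exact version), that
simply prints the effective values on a server JVM:

// Diagnostic sketch only; the property names and the "0 = disabled" defaults are
// assumptions to verify against the admin guide of the exact version in use.
public class ThrottleSettingsCheck {
    public static void main(String[] args) {
        String[] props = {
            "zookeeper.request_throttle_max_requests", // assumed: 0 means request throttling is off
            "zookeeper.request_throttle_stall_time",
            "zookeeper.request_throttle_drop_stale",
            "zookeeper.connection_throttle_tokens"     // assumed: 0 means connection throttling is off
        };
        for (String name : props) {
            System.out.println(name + " = " + System.getProperty(name, "<unset, server default applies>"));
        }
    }
}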

Enrico

On Fri, Feb 19, 2021 at 19:46 Li Wang <li4w...@gmail.com> wrote:

> Hi,
>
> We switched to Netty on both the client side and the server side and the
> performance issue is still there. Does anyone have any insight into what
> could be the cause of the higher latency?
>
> Thanks,
>
> Li
>
> On Mon, Feb 15, 2021 at 2:17 PM Li Wang <li4w...@gmail.com> wrote:
>
> > Hi Enrico,
> >
> > Thanks for the reply.
> >
> > 1. We are using the NIO-based stack, not the Netty-based one yet.
> >
> > 2. Yes, here are some metrics on the client side.
> >
> > 3.6: throughput: 7K, failures: 81215228, avg latency: 57ms, max latency: 31s
> > 3.4: throughput: 15K, failures: 0, avg latency: 30ms, max latency: 1.6s
> >
> > 3. Yes, the JVM and zoo.cfg configurations are exactly the same:
> >
> > 10G of heap
> > 13G of memory
> > 5 participants
> > 5 observers
> > Client session timeout: 3000ms
> > Server min session timeout: 4000ms
> >
> > 4. Yes, there are two types of WARN logs and many "Expiring session"
> > INFO logs:
> >
> > 2021-02-15 22:04:36,506 [myid:4] - WARN [NIOWorkerThread-7:NIOServerCnxn@365] - Unexpected exception
> > EndOfStreamException: Unable to read additional data from client, it
> > probably closed the socket: address = /100.108.63.116:43366, session = 0x400189fee9a000b
> >   at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:164)
> >   at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:327)
> >   at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
> >   at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
> >   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> >   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> >   at java.base/java.lang.Thread.run(Thread.java:834)
> >
> > 2021-02-15 22:05:14,428 [myid:4] - WARN [SyncThread:4:SyncRequestProcessor@188] - Too busy to snap, skipping
> >
> > 2021-02-15 22:01:51,427 [myid:4] - INFO [SessionTracker:ZooKeeperServer@610] - Expiring session
> > 0x400189fee9a001e, timeout of 4000ms exceeded
> >
> > 5. Yes, we upgraded both the client and the server to 3.6. Actually, the
> > issue happened with both of these combinations:
> >
> > 3.4 client and 3.6 server
> > 3.6 client and 3.6 server
> >
> > Please let me know if you need any additional info.
> >
> > Thanks,
> >
> > Li
> >
> > On Mon, Feb 15, 2021 at 1:44 PM Li Wang <li4w...@gmail.com> wrote:
> >
> >> Hi Enrico,
> >>
> >> Thanks for the reply.
> >>
> >> 1. We are using the direct NIO-based stack, not the Netty-based one yet.
> >> 2. Yes, on the client side, here are the metrics
> >>
> >> 3.6:
> >>
> >> On Mon, Feb 15, 2021 at 10:44 AM Enrico Olivelli <eolive...@gmail.com>
> >> wrote:
> >>
> >>> IIRC the main difference is the switch to Netty 4 and the use of more
> >>> DirectMemory. Are you using the Netty-based stack?
> >>>
> >>> Apart from that macro difference, there have been many, many changes
> >>> since 3.4.
> >>>
> >>> Do you have some metrics to share?
> >>> Are the JVM configuration and the zoo.cfg configuration equal to each
> >>> other?
> >>>
> >>> Do you see warnings in the server logs?
> >>>
> >>> Did you upgrade both the client and the server or only the server?
> >>>
> >>> Enrico
> >>>
> >>> On Mon, Feb 15, 2021 at 18:30 Li Wang <li4w...@gmail.com> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > We want to upgrade from 3.4.14 to 3.6.2. During the performance/load
> >>> > comparison test, we found that the performance of 3.6 is significantly
> >>> > degraded compared to 3.4 for write operations. Under the same load,
> >>> > there were a huge number of SessionExpired and ConnectionLoss errors
> >>> > in 3.6, while there were no such errors in 3.4.
> >>> >
> >>> > The load test uses 500 concurrent users against a cluster of 5
> >>> > participants and 5 observers. The min session timeout on the server
> >>> > side is 4000ms.
> >>> >
> >>> > I wonder if anyone has seen the same issue and has any insight into
> >>> > what could be the cause of the performance degradation.
> >>> >
> >>> > Thanks
> >>> >
> >>> > Li
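
One more thing I would double-check given the numbers above: the clients ask for a
3000ms session while the servers enforce minSessionTimeout=4000ms, so the negotiated
session timeout should come out at 4000ms (which matches the "timeout of 4000ms
exceeded" in the expiry log). Once a busy 3.6 quorum stalls for roughly that long, and
the "Too busy to snap" warning suggests it does, sessions will expire in bulk. Here is
a minimal client sketch, with a placeholder connect string and a crude sleep instead
of a proper connection-ready wait, that logs the negotiated timeout and the
Disconnected/Expired transitions:

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Sketch only: request a 3000ms session and log what the server actually granted.
public class SessionTimeoutCheck {
    public static void main(String[] args) throws Exception {
        Watcher watcher = (WatchedEvent event) -> {
            switch (event.getState()) {
                case SyncConnected:
                    System.out.println("connected");
                    break;
                case Disconnected:
                    // ConnectionLoss territory: the handle may still recover on reconnect.
                    System.out.println("disconnected");
                    break;
                case Expired:
                    // The session is gone; the only recovery is a brand-new handle.
                    System.out.println("session expired");
                    break;
                default:
                    break;
            }
        };
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 3000, watcher);
        Thread.sleep(1000); // placeholder wait for the session to be established
        System.out.println("negotiated session timeout: " + zk.getSessionTimeout() + " ms");
        zk.close();
    }
}

If the negotiated value is indeed 4000ms, raising the client-side timeout (or the
server-side minSessionTimeout) should at least tell apart "3.6 is slower" from
"sessions expire because the timeout is very tight for this load".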