Li,
I wonder if we have some new throttling/back pressure mechanisms that are
enabled by default.

Does anyone have some pointers to the relevant implementations?
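
For what it's worth, and only as a sketch from memory (please double-check the
property names against the 3.6 source), the server-side knobs I would look at
first are roughly:

  # request-level throttling (I believe org.apache.zookeeper.server.RequestThrottler)
  -Dzookeeper.request_throttle_max_requests=0   # max in-flight requests; 0 should mean off
  -Dzookeeper.request_throttle_stall_time=100   # ms to stall intake when over the limit
  -Dzookeeper.request_throttle_drop_stale=true  # drop requests from already-dead connections

  # connection-level throttling (I believe org.apache.zookeeper.server.BlueThrottle)
  -Dzookeeper.connection_throttle_tokens=0      # 0 should mean disabled

If the defaults really are 0/off, this is probably not the culprit, but it
would be good to rule it out.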


Enrico

On Fri, 19 Feb 2021 at 19:46, Li Wang <li4w...@gmail.com> wrote:

> Hi,
>
> We switched to Netty on both the client side and the server side, and the
> performance issue is still there. Does anyone have any insights on what
> could be the cause of the higher latency?
>
> Thanks,
>
> Li
>
>
>
> On Mon, Feb 15, 2021 at 2:17 PM Li Wang <li4w...@gmail.com> wrote:
>
> > Hi Enrico,
> >
> >
> > Thanks for the reply.
> >
> >
> > 1. We are using the NIO-based stack, not the Netty-based one yet.
> >
> > 2. Yes, here are some metrics on the client side.
> >
> >
> > 3.6: throughput: 7K, failure: 81215228, Avg Latency: 57ms, Max Latency: 31s
> >
> > 3.4: throughput: 15K, failure: 0, Avg Latency: 30ms, Max Latency: 1.6s
> >
> >
> > 3. Yes, the JVM and zoo.cfg configs are exactly the same:
> >
> > 10G of Heap
> >
> > 13G of Memory
> >
> > 5 Participants
> >
> > 5 Observers
> >
> > Client session timeout: 3000ms
> >
> > Server min session timeout: 4000ms
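> >
> > For reference, a minimal zoo.cfg sketch reflecting the above (only
> > minSessionTimeout and the participant/observer split come from our real
> > config; hostnames and the remaining values are illustrative):
> >
> >   tickTime=2000
> >   initLimit=10
> >   syncLimit=5
> >   minSessionTimeout=4000
> >   dataDir=/var/lib/zookeeper
> >   clientPort=2181
> >   # 5 voting participants
> >   server.1=zk1:2888:3888
> >   server.2=zk2:2888:3888
> >   server.3=zk3:2888:3888
> >   server.4=zk4:2888:3888
> >   server.5=zk5:2888:3888
> >   # 5 observers (peerType=observer is also set on the observer nodes)
> >   server.6=zk6:2888:3888:observer
> >   server.7=zk7:2888:3888:observer
> >   server.8=zk8:2888:3888:observer
> >   server.9=zk9:2888:3888:observer
> >   server.10=zk10:2888:3888:observer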
> >
> >
> >
> > 4. Yes, there are two types of WARN logs and many “Expiring session”
> > INFO logs:
> >
> >
> > 2021-02-15 22:04:36,506 [myid:4] - WARN
> > [NIOWorkerThread-7:NIOServerCnxn@365] - Unexpected exception
> >
> > EndOfStreamException: Unable to read additional data from client, it
> > probably closed the socket: address = /100.108.63.116:43366, session =
> > 0x400189fee9a000b
> >     at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:164)
> >     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:327)
> >     at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
> >     at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
> >     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> >     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> >     at java.base/java.lang.Thread.run(Thread.java:834)
> >
> >
> > 2021-02-15 22:05:14,428 [myid:4] - WARN
> > [SyncThread:4:SyncRequestProcessor@188] - Too busy to snap, skipping
> >
> >
> > 2021-02-15 22:01:51,427 [myid:4] - INFO
> > [SessionTracker:ZooKeeperServer@610] - Expiring session
> > 0x400189fee9a001e, timeout of 4000ms exceeded
> >
> >
> >
> > 5. Yes, we upgraded both the client and the server to 3.6. Actually, the
> > issue happened with both of the following combinations:
> >
> >
> > 3.4 client and 3.6 server
> >
> > 3.6 client and 3.6 server
> >
> > Please let me know if you need any additional info.
> >
> > Thanks,
> >
> > Li
> >
> >
> >
> >
> > On Mon, Feb 15, 2021 at 1:44 PM Li Wang <li4w...@gmail.com> wrote:
> >
> >> Hi Enrico,
> >>
> >> Thanks for the reply.
> >>
> >> 1. We are using the direct NIO-based stack, not the Netty-based one yet.
> >> 2. Yes, on the client side, here are the metrics
> >>
> >> 3.6:
> >>
> >>
> >>
> >>
> >> On Mon, Feb 15, 2021 at 10:44 AM Enrico Olivelli <eolive...@gmail.com>
> >> wrote:
> >>
> >>> IIRC the main difference is the switch to Netty 4 and the heavier use of
> >>> direct memory. Are you using the Netty-based stack?
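> >>> (For context, and assuming I remember the property names correctly: the
> >>> Netty stack is enabled on the server with
> >>> -Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> >>> and on the client with
> >>> -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty;
> >>> the NIO stack is still the default.)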
> >>>
> >>> Apart from that macro difference, there have been many, many changes
> >>> since 3.4.
> >>>
> >>> Do you have some metrics to share?
> >>> Are the JVM configuration and the zoo.cfg configuration equal to each
> >>> other?
> >>>
> >>> Do you see warnings on the server logs?
> >>>
> >>> Did you upgrade both the client and the server or only the server?
> >>>
> >>> Enrico
> >>>
> >>>
> >>> On Mon, 15 Feb 2021 at 18:30, Li Wang <li4w...@gmail.com> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > We want to upgrade from 3.4.14 to 3.6.2. During the performance/load
> >>> > comparison test, we found that the performance of 3.6 has degraded
> >>> > significantly compared to 3.4 for write operations. Under the same
> >>> > load, there were a huge number of SessionExpired and ConnectionLoss
> >>> > errors in 3.6, while there were no such errors in 3.4.
> >>> >
> >>> > The load test uses 500 concurrent users against a cluster of 5
> >>> > participants and 5 observers. The min session timeout on the server
> >>> > side is 4000ms.
> >>> >
> >>> > I wonder if anyone has seen the same issue and has any insights on
> >>> > what could be the cause of the performance degradation.
> >>> >
> >>> > Thanks
> >>> >
> >>> > Li
> >>> >
> >>>
> >>
>
