On Sun, Feb 21, 2021 at 3:28 PM Li Wang <li4w...@gmail.com> wrote:

> Hi Enrico, Sushant,
>
> I re-ran the perf test with the data consistency check feature disabled
> (i.e. -Dzookeeper.digest.enabled=false), and the write performance issue of
> 3.6 is still there.
>
> With everything else exactly the same, the throughput of 3.6 was only half
> that of 3.4 and the max latency was more than 8 times higher.
>
> Any other points or thoughts?
>
>
In the past I've noticed a big impact of GC when doing certain performance
measurements. I assume you are using the same JVM version and GC when
running the two tests? Perhaps our memory footprint has expanded over time.
You should rule out GC by running both versions with GC logging turned on
and comparing the impact.
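
As a rough illustration of that kind of comparison, here is a minimal Java
sketch that reads the cumulative GC counts and times the JVM exposes through
the standard java.lang.management beans. The class name GcSummary and its
methods are hypothetical and not part of ZooKeeper; the same values can also
be read remotely over JMX. It complements GC logging, which is enabled through
JVM flags that differ between JVM versions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of ZooKeeper: sums the cumulative GC
// statistics that the running JVM reports through its standard MXBeans.
final class GcSummary {

    private GcSummary() {
    }

    // Total number of collections across all collectors since JVM start.
    static long totalCollectionCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds since JVM start.
    static long totalCollectionTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // One-line summary suitable for logging before and after a test run.
    static String describe() {
        return "collections=" + totalCollectionCount()
                + ", collectionTimeMs=" + totalCollectionTimeMillis();
    }
}
```

Sampling these totals before and after the same load on 3.4 and on 3.6, and
diffing them, gives a first estimate of how much time each run spent in GC.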

Regards,

Patrick


> Cheers,
>
> Li
>
> On Sat, Feb 20, 2021 at 9:04 PM Li Wang <li4w...@gmail.com> wrote:
>
> > Thanks Sushant and Enrico!
> >
> > This is a really good point. According to the 3.6 documentation, the
> > feature is disabled by default:
> >
> > https://zookeeper.apache.org/doc/r3.6.2/zookeeperAdmin.html#ch_administration
> >
> > However, checking the code, the default is enabled.
> >
> > Let me set the zookeeper.digest.enabled to false and see how the write
> > operation performs.
> >
> > Best,
> >
> > Li
> >
> >
> >
> >
> > On Fri, Feb 19, 2021 at 1:32 PM Sushant Mane <sushantma...@gmail.com>
> > wrote:
> >
> >> Hi Li,
> >>
> >> On 3.6.2 the consistency checker (AdHash based) is enabled by default:
> >>
> >> https://github.com/apache/zookeeper/blob/803c7f1a12f85978cb049af5e4ef23bd8b688715/zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java#L136
> >>
> >> It is not present in ZK 3.4.14.
> >>
> >> This feature does have some impact on write performance.
> >>
> >> Thanks,
> >> Sushant
> >>
> >>
> >> On Fri, Feb 19, 2021 at 12:50 PM Enrico Olivelli <eolive...@gmail.com>
> >> wrote:
> >>
> >> > Li,
> >> > I wonder if we have some new throttling/back pressure mechanisms that
> >> > are enabled by default.
> >> >
> >> > Does anyone have pointers to the relevant implementations?
> >> >
> >> >
> >> > Enrico
> >> >
> >> > On Fri, Feb 19, 2021, 19:46 Li Wang <li4w...@gmail.com> wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > We switched to Netty on both the client side and the server side, and
> >> > > the performance issue is still there. Does anyone have any insights on
> >> > > what could be the cause of the higher latency?
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Li
> >> > >
> >> > >
> >> > >
> >> > > On Mon, Feb 15, 2021 at 2:17 PM Li Wang <li4w...@gmail.com> wrote:
> >> > >
> >> > > > Hi Enrico,
> >> > > >
> >> > > >
> >> > > > Thanks for the reply.
> >> > > >
> >> > > >
> >> > > > 1. We are using NIO based stack, not Netty based yet.
> >> > > >
> >> > > > 2. Yes, here are some metrics on the client side.
> >> > > >
> >> > > >
> >> > > > 3.6: throughput: 7K, failure: 81215228, Avg Latency: 57ms, Max Latency: 31s
> >> > > >
> >> > > > 3.4: throughput: 15K, failure: 0, Avg Latency: 30ms, Max Latency: 1.6s
> >> > > >
> >> > > >
> >> > > > 3. Yes, the JVM and zoo.cfg configs are exactly the same:
> >> > > >
> >> > > > 10G of Heap
> >> > > >
> >> > > > 13G of Memory
> >> > > >
> >> > > > 5 Participants
> >> > > >
> >> > > > 5 Observers
> >> > > >
> >> > > > Client session timeout: 3000ms
> >> > > >
> >> > > > Server min session time: 4000ms
> >> > > >
> >> > > >
> >> > > >
> >> > > > 4. Yes, there are two types of WARN logs and many “Expiring session”
> >> > > > INFO logs:
> >> > > >
> >> > > >
> >> > > > 2021-02-15 22:04:36,506 [myid:4] - WARN [NIOWorkerThread-7:NIOServerCnxn@365] - Unexpected exception
> >> > > > EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /100.108.63.116:43366, session = 0x400189fee9a000b
> >> > > >     at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:164)
> >> > > >     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:327)
> >> > > >     at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
> >> > > >     at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
> >> > > >     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> >> > > >     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> >> > > >     at java.base/java.lang.Thread.run(Thread.java:834)
> >> > > >
> >> > > > 2021-02-15 22:05:14,428 [myid:4] - WARN [SyncThread:4:SyncRequestProcessor@188] - Too busy to snap, skipping
> >> > > >
> >> > > > 2021-02-15 22:01:51,427 [myid:4] - INFO [SessionTracker:ZooKeeperServer@610] - Expiring session 0x400189fee9a001e, timeout of 4000ms exceeded
> >> > > >
> >> > > >
> >> > > >
> >> > > > 5. Yes, we upgraded both the client and the server to 3.6. Actually,
> >> > > > the issue happened with both of the following combinations:
> >> > > >
> >> > > >
> >> > > > 3.4 client and 3.6 server
> >> > > >
> >> > > > 3.6 client and 3.6 server
> >> > > >
> >> > > > Please let me know if you need any additional info.
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > Li
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Mon, Feb 15, 2021 at 1:44 PM Li Wang <li4w...@gmail.com> wrote:
> >> > > >
> >> > > >> Hi Enrico,
> >> > > >>
> >> > > >> Thanks for the reply.
> >> > > >>
> >> > > >> 1. We are using direct NIO based stack, not Netty based yet.
> >> > > >> 2. Yes, on the client side, here are the metrics
> >> > > >>
> >> > > >> 3.6:
> >> > > >>
> >> > > >>
> >> > > >>
> >> > > >>
> >> > > >> On Mon, Feb 15, 2021 at 10:44 AM Enrico Olivelli <eolive...@gmail.com>
> >> > > >> wrote:
> >> > > >>
> >> > > >>> IIRC the main difference is the switch to Netty 4 and the use of
> >> > > >>> more DirectMemory. Are you using the Netty based stack?
> >> > > >>>
> >> > > >>> Apart from that macro difference, there have been many changes
> >> > > >>> since 3.4.
> >> > > >>>
> >> > > >>> Do you have some metrics to share?
> >> > > >>> Are the JVM configurations and zoo.cfg configurations the same for
> >> > > >>> both versions?
> >> > > >>>
> >> > > >>> Do you see warnings in the server logs?
> >> > > >>>
> >> > > >>> Did you upgrade both the client and the server, or only the server?
> >> > > >>>
> >> > > >>> Enrico
> >> > > >>>
> >> > > >>>
> >> > > >>> On Mon, Feb 15, 2021, 18:30 Li Wang <li4w...@gmail.com> wrote:
> >> > > >>>
> >> > > >>> > Hi,
> >> > > >>> >
> >> > > >>> > We want to upgrade from 3.4.14 to 3.6.2. During the
> >> > > >>> > performance/load comparison test, we found that the write
> >> > > >>> > performance of 3.6 is significantly degraded compared to 3.4.
> >> > > >>> > Under the same load, there was a huge number of SessionExpired
> >> > > >>> > and ConnectionLoss errors in 3.6, while there were no such
> >> > > >>> > errors in 3.4.
> >> > > >>> >
> >> > > >>> > The load test uses 500 concurrent users against a cluster of 5
> >> > > >>> > participants and 5 observers. The min session timeout on the
> >> > > >>> > server side is 4000ms.
> >> > > >>> >
> >> > > >>> > I wonder if anyone has seen the same issue and has any insights
> >> > > >>> > on what could be the cause of the performance degradation.
> >> > > >>> >
> >> > > >>> > Thanks
> >> > > >>> >
> >> > > >>> > Li
> >> > > >>> >
> >> > > >>>
> >> > > >>
> >> > >
> >> >
> >>
> >
>
