Re: ZK Transaction API multi()

Huizhi Lu Tue, 28 Jul 2020 22:44:18 -0700

Hi Ted,

Again, I greatly appreciate the insightful GC tuning tips! Though we've
already done some GC tuning, I believe there is more we can do based on
profiling.


I am so glad to have received so many valuable suggestions from the ZK
community!

-Huizhi

On Tue, Jul 28, 2020 at 6:27 PM Ted Dunning <ted.dunn...@gmail.com> wrote:

> Michael's suggestions are excellent, particularly the use of observers and
> the general warning to measure first.
>
> To expand his point about GC, consider moving to a more recent JVM if GC is
> demonstrated to be the problem. To find out whether it is a problem, turn on
> GC logging and see if GC delays correspond with long latency. If you are
> using an ancient JVM and you have GC delays, try moving to a more modern JVM
> and make sure that there is plenty of memory for ZK.
>
> Also, check to see if the machine that is running ZK might be
> oversubscribed. Noisy neighbors doing something intense can be nearly as
> bad as GC.
>
> To repeat and slightly rephrase the advice of Michael (and of old
> carpenters everywhere), however: measure twice, cut once.
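
One way to check whether GC delays "correspond with long latency" is to take the pause intervals from the GC log (enabled with `-Xlog:gc*` on JDK 9 and later, or `-XX:+PrintGCDetails -Xloggc:<file>` on JDK 8) and count how many slow requests fall inside a pause. The sketch below is only an illustration: the class, the `Pause` type, and the sample numbers are hypothetical, and parsing the log and collecting slow-request timestamps is assumed to be done elsewhere.

```java
import java.util.List;

/** Sketch: how many slow requests coincide with a GC pause? All names are hypothetical. */
final class GcLatencySketch {
    /** One GC pause taken from a GC log: start time and length in milliseconds. */
    static final class Pause {
        final long startMs;
        final long durationMs;

        Pause(long startMs, long durationMs) {
            this.startMs = startMs;
            this.durationMs = durationMs;
        }

        boolean covers(long timeMs) {
            return timeMs >= startMs && timeMs <= startMs + durationMs;
        }
    }

    /** Fraction (0.0 to 1.0) of slow-request timestamps that fall inside some pause. */
    static double fractionInsidePauses(List<Pause> pauses, List<Long> slowRequestTimesMs) {
        if (slowRequestTimesMs.isEmpty()) {
            return 0.0;
        }
        long hits = 0;
        for (long t : slowRequestTimesMs) {
            for (Pause p : pauses) {
                if (p.covers(t)) {
                    hits++;
                    break; // count each slow request at most once
                }
            }
        }
        return (double) hits / slowRequestTimesMs.size();
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Pause> pauses = List.of(new Pause(1_000, 400), new Pause(9_000, 2_500));
        List<Long> slow = List.of(1_200L, 5_000L, 9_100L, 11_400L);
        System.out.println(fractionInsidePauses(pauses, slow)); // prints 0.75
    }
}
```

A fraction near 1.0 would point at GC; a fraction near 0.0 would suggest looking elsewhere first, for example at an oversubscribed host.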
>
> On Tue, Jul 28, 2020 at 4:42 PM Michael Han <h...@apache.org> wrote:
>
> > I agree with Ted's comments on the philosophy of scaling and the need to
> > recheck your use case to justify whether ZooKeeper is the long-term
> > solution or not.
> >
> > That said, I was in a similar position and went through similar
> > scaling challenges with ZooKeeper, so I can probably offer some
> > suggestions that might serve as a short-term solution.
> >
> > * Obvious ones - more powerful hardware with better IOPS and more memory.
> > * Run a modern version of ZooKeeper (3.5.5+ or 3.6.0+).
> > * Don't use participants to serve traffic. Use observers only.
> > * Tune SyncRequestProcessor to allow more throughput at a cost of higher
> > latency - specifically max batch size and flush delay.
> > * Tune CommitProcessor to favor more writes instead of reads (depends on
> > your actual workload).
> > * Consider using response cache to reduce pressure on JVM Eden space.
> > * JVM tuning - hard to provide concrete advice, but the session expiration
> > is likely caused by JVM GC. Try different options based on profiling and
> > workload characteristics.
> > * Client auditing - make sure all traffic from your clients is
> > legitimate. This is often overlooked, but illegitimate traffic is a
> > surprisingly prevalent root cause of ZK meltdowns in practice.
> >
> > These are some general guidelines that might help. As with any
> > performance tuning, the general approach should be to scope your
> > workload, do some profiling, identify the bottleneck(s), and apply
> > tunings accordingly. Good luck.
> >
> > On Mon, Jul 27, 2020 at 10:53 PM Enrico Olivelli <eolive...@gmail.com>
> > wrote:
> >
> >> Huizhi,
> >> If you want to achieve total atomic broadcast and greater throughput,
> >> you can consider using ZooKeeper's sibling, Apache BookKeeper, which is
> >> built on top of ZK. It is very lightweight and scalable (no central
> >> coordination servers).
> >>
> >> https://bookkeeper.apache.org
> >>
> >> Hope that helps
> >> Enrico
> >>
> >> On Tue, Jul 28, 2020, 01:43 Huizhi Lu <ihuizh...@gmail.com> wrote:
> >>
> >> > Hi Ted,
> >> >
> >> > Thank you so much for the reply. Your suggestion is very valuable. I do
> >> > agree that we should migrate from ZK to a distributed DB for this high
> >> > number of writes. Due to our legacy codebase and usage, it may not be
> >> > that easy for us to do that, so we are considering multi() as a
> >> > short/mid-term solution. Eventually we will move the excessive number of
> >> > writes out of ZK to achieve higher scalability.
> >> >
> >> > Lastly, I greatly appreciate your insightful explanation! FYI, I am
> >> > very happy to receive prompt replies from you, Ted!
> >> >
> >> > Best,
> >> > -Huizhi
> >> >
> >> > On Mon, Jul 27, 2020 at 4:30 PM Ted Dunning <ted.dunn...@gmail.com>
> >> wrote:
> >> >
> >> > >
> >> > > This sounds like you are using ZK outside of the intended design. The
> >> > > idea is that ZK is a coordination engine. If you have such high write
> >> > > rates that ZK is dropping connections, you probably want a distributed
> >> > > database of some kind, perhaps one that uses ZK to coordinate itself.
> >> > > ZK is a form of replicated database, not a distributed one and, as
> >> > > such, the write rate doesn't scale and that is intentional.
> >> > >
> >> > > Even if multi() solves your immediate problem, it leaves the same
> >> > > problem in place at just a slightly higher scale. My own philosophy of
> >> > > scaling is that when you hit a problem, you should increase your scale
> >> > > by a large enough factor to give you time to solve some other problems
> >> > > or build new stuff before you have to fix your scaling problem again.
> >> > > Increasing scale by a factor of 2 rarely does this. I prefer to
> >> > > increase my scaling bounds by a factor of 10 or more so that I have
> >> > > some breathing space. I remember a time in one startup where our
> >> > > system was on the edge of breaking and our traffic was doubling
> >> > > roughly every week. We had to improve our situation by at least a
> >> > > factor of 10 each time we upgraded our systems just to stay in the
> >> > > same place. I can only hope you will have similar problems.
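
The arithmetic behind the factor-of-10 preference can be made explicit: if traffic doubles once per period, a capacity increase by a factor f buys log2(f) periods before the limit is hit again. The sketch below assumes steady exponential growth, which is a simplification; the class and method names are made up for illustration.

```java
import java.util.Locale;

/** Sketch: breathing space bought by a capacity increase under steady doubling. */
final class ScalingHeadroomSketch {
    /** Number of doubling periods until growth uses up a capacityFactor-fold increase. */
    static double doublingsOfHeadroom(double capacityFactor) {
        return Math.log(capacityFactor) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // With traffic doubling weekly, one doubling period is one week.
        System.out.println(String.format(Locale.ROOT, "x2:  %.1f weeks", doublingsOfHeadroom(2)));
        // prints "x2:  1.0 weeks"
        System.out.println(String.format(Locale.ROOT, "x10: %.1f weeks", doublingsOfHeadroom(10)));
        // prints "x10: 3.3 weeks"
    }
}
```

Under weekly doubling, a factor of 2 lasts one week and a factor of 10 lasts a little over three.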
> >> > >
> >> > >
> >> > >
> >> > > On Mon, Jul 27, 2020 at 3:47 PM Huizhi Lu <ihuizh...@gmail.com>
> >> wrote:
> >> > >
> >> > >> Hi Ted,
> >> > >>
> >> > >> Thank you very much for the reply! I didn't receive the reply in my
> >> > >> email, but I found it in the ZK dev mail thread, so I could not reply
> >> > >> directly to the thread.
> >> > >>
> >> > >> I really appreciate a reply from the original author of multi()! And
> >> > >> your blog (A Tour of the Multi-update For Zookeeper) is very helpful
> >> > >> for understanding multi(). Your reply helps convince my team that it
> >> > >> is a real transaction.
> >> > >>
> >> > >> Regarding my 2nd question, maybe I should have described a bit of our
> >> > >> challenge. When we have a large number of ZK write requests that cause
> >> > >> high ZK write QPS, ZK sessions are expired by ZK, and this affects the
> >> > >> application's connection to ZK. We wonder if we could apply multi() to
> >> > >> batch the ZK write requests to reduce ZK write QPS so that ZK wouldn't
> >> > >> expire sessions. In this case, do you still think multi() would not
> >> > >> serve the purpose?
> >> > >>
> >> > >> Thank you, Ted!!
> >> > >>
> >> > >> On Mon, Jul 27, 2020 at 1:40 PM Huizhi Lu <ihuizh...@gmail.com>
> >> wrote:
> >> > >>
> >> > >> > Hi Zookeeper Devs,
> >> > >> >
> >> > >> > Hope this email finds you well!
> >> > >> >
> >> > >> > I am working on some stuff that needs ZK multi(). I would like to
> >> > >> > confirm a few things about this API.
> >> > >> >
> >> > >> > 1. Is this a real transaction operation in ZK? My understanding is
> >> > >> > that it is a real transaction. If I put 3 write operations in this
> >> > >> > transaction request, these 3 write operations are committed in 1
> >> > >> > transaction with the same zxid and 1 proposal. Observers should
> >> > >> > either see all the updates or none of the updates. Observers should
> >> > >> > not see partial updates, e.g. only 1 of the 3 updates.
> >> > >> >
> >> > >>
> >> > >> Yes. The multi() is atomic. It will happen or not, and the program
> >> > >> invoking the operation will be told why or why not.
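
The all-or-nothing behaviour described here can be pictured with a toy in-memory model. This is not ZooKeeper code and does not use the ZooKeeper client (where the real call is `ZooKeeper.multi(Iterable<Op>)` with operations built by `Op.create`, `Op.setData`, and similar); the class, the `Create` type, and the return convention are assumptions made purely for illustration, and the sketch is single-threaded.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy single-threaded model of all-or-nothing application; NOT the ZooKeeper API. */
final class AtomicBatchSketch {
    /** Hypothetical operation: create a path that must not exist yet. */
    static final class Create {
        final String path;
        final String data;

        Create(String path, String data) {
            this.path = path;
            this.data = data;
        }
    }

    /**
     * Applies every create or none of them. Returns null on success, or the
     * path of the first operation that would fail (the "why not").
     */
    static String applyAll(Map<String, String> store, List<Create> ops) {
        Map<String, String> staged = new HashMap<>(store); // work on a copy
        for (Create op : ops) {
            if (staged.containsKey(op.path)) {
                return op.path; // reject the batch: the real store is untouched
            }
            staged.put(op.path, op.data);
        }
        store.clear();
        store.putAll(staged); // publish all changes together
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("/b", "old");
        String failed = applyAll(store, List.of(new Create("/a", "1"), new Create("/b", "2")));
        System.out.println("failed at: " + failed + ", store: " + store);
        // prints "failed at: /b, store: {/b=old}"
    }
}
```

Because the second create fails, the first one is not applied either: `/a` never becomes visible.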
> >> > >>
> >> > >> > 2. We have a case to write multiple znodes. Currently it is sending
> >> > >> > the requests one by one. With a transaction, I believe we could
> >> > >> > batch writes in 1 transaction request. This is intended to reduce ZK
> >> > >> > write pressure. E.g. we are writing 100 znodes; putting them in one
> >> > >> > transaction request would reduce write requests from 100 (single
> >> > >> > write requests using create() or set()) to 1 write request
> >> > >> > (transaction), right? And does this reduce ZK server write request
> >> > >> > pressure?
> >> > >> >
> >> > >>
> >> > >> Not really. The way that multi() works is that it does a group commit.
> >> > >> There may be some economies in terms of the number of network
> >> > >> exchanges, but the internal work of testing whether the operations
> >> > >> will succeed is the same. The point of multi() is to make use of and
> >> > >> provide nuanced control over the normal group commit that ZooKeeper is
> >> > >> doing anyway. It should not generally be viewed as an efficiency
> >> > >> improvement.
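
The distinction drawn here, fewer network exchanges but the same per-operation work, can be put into numbers with a deliberately simple counting model. The model is an assumption made for illustration (one exchange per request, one check per write), not a measurement of ZooKeeper, and the names are hypothetical.

```java
/** Toy counting model: batching changes the number of requests, not the per-write work. */
final class MultiCostSketch {
    /** Network requests needed to send the writes, batching writesPerRequest at a time. */
    static long requests(long writes, long writesPerRequest) {
        return (writes + writesPerRequest - 1) / writesPerRequest; // ceiling division
    }

    /** Per-operation checks the server still performs; independent of batching. */
    static long operationChecks(long writes, long writesPerRequest) {
        return writes;
    }

    public static void main(String[] args) {
        System.out.println(requests(100, 1) + " requests, " + operationChecks(100, 1) + " checks");
        // prints "100 requests, 100 checks"
        System.out.println(requests(100, 100) + " requests, " + operationChecks(100, 100) + " checks");
        // prints "1 requests, 100 checks"
    }
}
```

Whether the saved exchanges matter for session expiration is something only profiling of the actual workload can show.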
> >> > >>
> >> > >> Hopefully this helps.
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Mon, Jul 27, 2020 at 1:23 PM Huizhi Lu <ihuizh...@gmail.com>
> >> wrote:
> >> > >>
> >> > >>> Hi Zookeeper Devs,
> >> > >>>
> >> > >>> Hope this email finds you well!
> >> > >>>
> >> > >>> I am working on some stuff that needs ZK multi(). I would like to
> >> > >>> confirm a few things about this API.
> >> > >>>
> >> > >>> 1. Is this a real transaction operation in ZK? My understanding is
> >> > >>> that it is a real transaction. If I put 3 write operations in this
> >> > >>> transaction request, these 3 write operations are committed in 1
> >> > >>> transaction with the same zxid and 1 proposal. Observers should
> >> > >>> either see all the updates or none of the updates. Observers should
> >> > >>> not see partial updates, e.g. only 1 of the 3 updates.
> >> > >>>
> >> > >>> 2. We have a case to write multiple znodes. Currently it is sending
> >> > >>> the requests one by one. With a transaction, I believe we could
> >> > >>> batch writes in 1 transaction request. This is intended to reduce ZK
> >> > >>> write pressure. E.g. we are writing 100 znodes; putting them in one
> >> > >>> transaction request would reduce write requests from 100 (single
> >> > >>> write requests using create() or set()) to 1 write request
> >> > >>> (transaction), right? And does this reduce ZK server write request
> >> > >>> pressure?
> >> > >>>
> >> > >>> Could you help explain? I am looking forward to your reply. Thank
> >> > >>> you very much!
> >> > >>>
> >> > >>> Best,
> >> > >>> -Huizhi
> >> > >>>
> >> > >>
> >> >
> >>
> >
>
