Hi Michael,

Thanks very much for the awesome suggestions! These are brilliant, and
I've personally learned a lot from your experience.
We've already done some tuning, e.g. GC tuning and adding observers, but
we haven't touched the Processor internals yet.

We are now in the process of upgrading ZK from 3.4+ to 3.6.0+, which
has already shown significant performance and stability gains for us.
As Ted also mentioned, although we are relieving the pain in the short term,
we should target a long-term solution that scales further.
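
As a rough illustration of Ted's factor-of-10 argument quoted below, here is
a minimal sketch. It assumes load keeps doubling at a fixed interval (one
week, as in his startup anecdote); the function name weeks_of_headroom is
hypothetical and not part of any library.

```python
import math


def weeks_of_headroom(capacity_factor, doubling_period_weeks=1.0):
    """Weeks until load that doubles every period uses up a capacity increase.

    Assumption: load grows exponentially, doubling once per period.
    """
    if capacity_factor <= 0 or doubling_period_weeks <= 0:
        raise ValueError("both arguments must be positive")
    # Load after t weeks is 2 ** (t / period); solve for load == capacity_factor.
    return doubling_period_weeks * math.log2(capacity_factor)


for factor in (2, 10):
    print(f"{factor}x capacity -> {weeks_of_headroom(factor):.1f} weeks of headroom")
```

Under that assumption, a 2x improvement buys about one week, while 10x buys
a bit over three.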


-Huizhi

On Tue, Jul 28, 2020 at 4:42 PM Michael Han <h...@apache.org> wrote:

> I agree with Ted's comments on the philosophy of scaling and the need to
> recheck your use case to justify if ZooKeeper is the long term solution or
> not.
>
> That said, I was in a similar position and had gone through similar scaling
> challenges for ZooKeeper so I could probably provide some suggestions which
> might serve as a short term solution.
>
> * Obvious ones - more powerful hardware with better IOPS and bigger memory.
> * Run a modern version of ZooKeeper (3.5.5+ or 3.6.0+).
> * Don't use participants to serve traffic. Use observers only.
> * Tune SyncRequestProcessor to allow more throughput at a cost of higher
> latency - specifically max batch size and flush delay.
> * Tune CommitProcessor to favor more writes instead of reads (depends on
> your actual workload).
> * Consider using response cache to reduce pressure on JVM Eden space.
> * JVM tuning - hard to provide concrete advice, but the session expiration
> is likely caused by JVM GC pauses. Try different options based on profiling
> and workload characteristics.
> * Client auditing - make sure all traffic from your clients is legitimate.
> This is often overlooked, but illegitimate traffic is a surprisingly common
> root cause of ZK meltdowns in practice.
>
> These are some general guidelines that might help. As with any performance
> tuning, the general approach should be to scope your workload, do some
> profiling, identify the bottleneck(s), and apply tunings accordingly. Good luck.
>
> On Mon, Jul 27, 2020 at 10:53 PM Enrico Olivelli <eolive...@gmail.com> wrote:
>
> > Huizhi,
> > If you want to achieve total atomic broadcast with greater throughput,
> > you can consider ZooKeeper's sibling project, Apache BookKeeper, which is
> > built on top of ZK. It is very lightweight and scalable (no central
> > coordination servers).
> >
> > https://bookkeeper.apache.org
> >
> > Hope that helps
> > Enrico
> >
> > On Tue, Jul 28, 2020, 01:43 Huizhi Lu <ihuizh...@gmail.com> wrote:
> >
> > > Hi Ted,
> > >
> > > Thank you so much for the reply. Your suggestion is very valuable. I do
> > > agree that we should migrate from ZK to a distributed DB for this high
> > > number of writes. Due to our legacy codebase and usage, it may not be that
> > > easy for us to do that, so we are considering multi() as a short/mid-term
> > > solution. Eventually we will move the excessive number of writes out of ZK
> > > to achieve higher scalability.
> > >
> > > Lastly, I greatly appreciate your insightful explanation! FYI, I am very
> > > happy to receive prompt replies from you, Ted!
> > >
> > > Best,
> > > -Huizhi
> > >
> > > On Mon, Jul 27, 2020 at 4:30 PM Ted Dunning <ted.dunn...@gmail.com> wrote:
> > >
> > > >
> > > > > This sounds like you are using ZK outside of the intended design. The
> > > > > idea is that ZK is a coordination engine. If you have such high write
> > > > > rates that ZK is dropping connections, you probably want a distributed
> > > > > database of some kind, perhaps one that uses ZK to coordinate itself. ZK
> > > > > is a form of replicated database, not a distributed one and, as such,
> > > > > the write rate doesn't scale and that is intentional.
> > > >
> > > > > Even if multi() solves your immediate problem, it leaves the same problem
> > > > > in place at just a slightly higher scale. My own philosophy of scaling is
> > > > > that when you hit a problem, you should increase your scale by a large
> > > > > enough factor to give you time to solve some other problems or build new
> > > > > stuff before you have to fix your scaling problem again. Increasing scale
> > > > > by a factor of 2 rarely does this. I prefer to increase my scaling bounds
> > > > > by a factor of 10 or more so that I have some breathing space. I remember a
> > > > > time in one startup where our system was on the edge of breaking and our
> > > > > traffic was doubling roughly every week. We had to improve our situation by
> > > > > at least a factor of 10 each time we upgraded our systems just to stay in
> > > > > the same place. I can only hope you will have similar problems.
> > > >
> > > >
> > > >
> > > > > On Mon, Jul 27, 2020 at 3:47 PM Huizhi Lu <ihuizh...@gmail.com> wrote:
> > > >
> > > >> Hi Ted,
> > > >>
> > > > >> Thank you very much for the reply! I didn't receive the reply in my email
> > > > >> but I found it in ZK dev mail thread. So I could not reply directly to the
> > > > >> thread.
> > > >>
> > > > >> I really appreciate a reply from the original author of multi()! And your
> > > > >> blog (A Tour of the Multi-update For Zookeeper) is very helpful with
> > > > >> understanding of multi(). Your reply helps convince my team that it is a
> > > > >> real transaction.
> > > >>
> > > > >> Regarding my 2nd question, maybe I should have described a bit of our
> > > > >> challenge. When we have a large number of ZK write requests that cause high
> > > > >> ZK write QPS, ZK sessions are expired by ZK. And this affects the
> > > > >> application's connection to ZK. We wonder if we could apply multi() to
> > > > >> batch the ZK write requests to reduce ZK write QPS so ZK wouldn't expire
> > > > >> sessions. So in this case, do you think we could still not apply multi() to
> > > > >> achieve the purpose?
> > > >>
> > > >> Thank you, Ted!!
> > > >>
> > > > >> On Mon, Jul 27, 2020 at 1:40 PM Huizhi Lu <ihuizh...@gmail.com> wrote:
> > > >>
> > > >> > Hi Zookeeper Devs,
> > > >> >
> > > >> > Hope this email finds you well!
> > > >> >
> > > > >> > I am working on some stuff that needs ZK multi(). I would like to
> > > > >> > confirm a few things about this API.
> > > >> >
> > > > >> > 1. Is this a real transaction operation in ZK? My understanding is, it is a
> > > > >> > real transaction. If I put 3 write operations in this transaction request,
> > > > >> > these 3 write operations are committed in 1 transaction with the same zxid
> > > > >> > and 1 proposal. Observers should either see all the updates or none of the
> > > > >> > updates. Observers should not see partial updates, eg. only 1 of the 3
> > > > >> > updates.
> > > >> >
> > > >>
> > > > >> Yes. The multi() is atomic. It will happen or not and the program invoking
> > > > >> the operation will be told why or why not.
> > > >>
> > > > >> > 2. We have a case to write multiple znodes. Currently it is sending the
> > > > >> > requests one by one. With transaction, I believe we could batch writes in 1
> > > > >> > transaction request. This is intended to reduce ZK write pressure. Eg. we
> > > > >> > are writing 100 znodes, putting it in one transaction request would reduce
> > > > >> > write requests from 100 (single write request using create() or set()) to 1
> > > > >> > write request (transaction), right? And does this reduce ZK server write
> > > > >> > request pressure?
> > > >> >
> > > >>
> > > > >> Not really. The way that multi() works is that it does a group commit.
> > > > >> There may be some economies in terms of number of network exchanges, but
> > > > >> the internal work of testing whether the operations will succeed is the
> > > > >> same. The point of multi() is to make use of and provide nuanced control
> > > > >> over the normal group commit that Zookeeper is doing anyway. It should not
> > > > >> generally be viewed as an efficiency improvement.
> > > >>
> > > >> Hopefully this helps.
> > > >>
> > > >>
> > > >>
> > > >>
> > > > >> On Mon, Jul 27, 2020 at 1:23 PM Huizhi Lu <ihuizh...@gmail.com> wrote:
> > > >>
> > > >>> Hi Zookeeper Devs,
> > > >>>
> > > >>> Hope this email finds you well!
> > > >>>
> > > >>> I am working on some stuff that needs ZK multi(). I would like to
> > > >>> confirm a few things about this API.
> > > >>>
> > > > >>> 1. Is this a real transaction operation in ZK? My understanding is, it
> > > > >>> is a real transaction. If I put 3 write operations in this transaction
> > > > >>> request, these 3 write operations are committed in 1 transaction with the
> > > > >>> same zxid and 1 proposal. Observers should either see all the updates or
> > > > >>> none of the updates. Observers should not see partial updates, eg. only 1
> > > > >>> of the 3 updates.
> > > >>>
> > > > >>> 2. We have a case to write multiple znodes. Currently it is sending the
> > > > >>> requests one by one. With transaction, I believe we could batch writes in 1
> > > > >>> transaction request. This is intended to reduce ZK write pressure. Eg. we
> > > > >>> are writing 100 znodes, putting it in one transaction request would reduce
> > > > >>> write requests from 100 (single write request using create() or set()) to 1
> > > > >>> write request (transaction), right? And does this reduce ZK server write
> > > > >>> request pressure?
> > > >>>
> > > > >>> Could you help explain? I am looking forward to your reply. Thank you
> > > > >>> very much!
> > > >>>
> > > >>> Best,
> > > >>> -Huizhi
> > > >>>
> > > >>
> > >
> >
>
