Hi Ted,

Again, I greatly appreciate the insightful GC tuning tips! Although we have already done some GC tuning, I believe there is more to do based on profiling.
I am so glad I've got so many valuable suggestions from the ZK community!

-Huizhi

On Tue, Jul 28, 2020 at 6:27 PM Ted Dunning <ted.dunn...@gmail.com> wrote:

> Michael's suggestions are excellent, particularly the use of observers and the general warning to measure first.
>
> To expand his point about GC, consider moving to more recent JVM if GC is demonstrated to be the problem. To find out if it is a problem, turn on GC logging and see if GC delays correspond with long latency. If you are using an ancient JVM and you have GC delays, try moving to a more modern JVM and make sure that there is plenty of memory for ZK.
>
> Also, check to see if the machine that is running ZK might be oversubscribed. Noisy neighbors doing something intense can be nearly as bad as GC.
>
> To repeat and slightly rephrase Michael's (and old carpenters everywhere) advice, however, measure twice, cut once.
>
> On Tue, Jul 28, 2020 at 4:42 PM Michael Han <h...@apache.org> wrote:
>
> > I agree with Ted's comments on the philosophy of scaling and the need to recheck your use case to justify if ZooKeeper is the long term solution or not.
> >
> > That said, I was in a similar position and had gone through similar scaling challenges for ZooKeeper so I could probably provide some suggestions which might serve as a short term solution.
> >
> > * Obvious ones - more powerful hardware with better IOPS and bigger memory.
> > * Run a modern version of ZooKeeper (3.5.5+ or 3.6.0+).
> > * Don't use participants to serve traffic. Use observers only.
> > * Tune SyncRequestProcessor to allow more throughput at a cost of higher latency - specifically max batch size and flush delay.
> > * Tune CommitProcessor to favor more writes instead of reads (depends on your actual workload).
> > * Consider using response cache to reduce pressure on JVM Eden space.
> > * JVM tuning - hard to provide concrete advice but the session expiration is likely caused by JVM GC.
> >   Try different options based on profiling and workload characteristics.
> > * Client auditing - making sure all traffic from your client is legitimate. This is often overlooked, but it is a surprisingly prevalent root cause of ZK meltdowns in practice.
> >
> > These are some general guidelines that might help. As with any performance tuning, the general approach should be to scope your workload, do some profiling, identify bottleneck(s), and apply tunings accordingly. Good luck.
> >
> > On Mon, Jul 27, 2020 at 10:53 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> >
> >> Huizhi,
> >> If you want to achieve total atomic broadcast and have a greater throughput, you can consider ZooKeeper's sibling, Apache BookKeeper, which is built on top of ZK. It is very lightweight and scalable (no central coordination servers).
> >>
> >> https://bookkeeper.apache.org
> >>
> >> Hope that helps
> >> Enrico
> >>
> >> On Tue, Jul 28, 2020, 01:43 Huizhi Lu <ihuizh...@gmail.com> wrote:
> >>
> >> > Hi Ted,
> >> >
> >> > Thank you so much for the reply. Your suggestion is very valuable. I do agree that we should migrate from ZK to a distributed DB for this high number of writes. Due to our legacy codebase and usage, it may not be that easy for us to do that. So we are considering multi() as a short/mid term solution. Eventually we will move the excessive number of writes out of ZK to achieve higher scalability.
> >> >
> >> > Lastly, I greatly appreciate your insightful explanation! FYI, I am very happy to receive prompt replies from you, Ted!
> >> >
> >> > Best,
> >> > -Huizhi
> >> >
> >> > On Mon, Jul 27, 2020 at 4:30 PM Ted Dunning <ted.dunn...@gmail.com> wrote:
> >> >
> >> > > This sounds like you are using ZK outside of the intended design. The idea is that ZK is a coordination engine.
> >> > > If you have such high write rates that ZK is dropping connections, you probably want a distributed database of some kind, perhaps one that uses ZK to coordinate itself. ZK is a form of replicated database, not a distributed one and, as such, the write rate doesn't scale and that is intentional.
> >> > >
> >> > > Even if multi() solves your immediate problem, it leaves the same problem in place at just a slightly higher scale. My own philosophy of scaling is that when you hit a problem, you should increase your scale by a large enough factor to give you time to solve some other problems or build new stuff before you have to fix your scaling problem again. Increasing scale by a factor of 2 rarely does this. I prefer to increase my scaling bounds by a factor of 10 or more so that I have some breathing space. I remember a time in one startup where our system was on the edge of breaking and our traffic was doubling roughly every week. We had to improve our situation by at least a factor of 10 each time we upgraded our systems just to stay in the same place. I can only hope you will have similar problems.
> >> > >
> >> > > On Mon, Jul 27, 2020 at 3:47 PM Huizhi Lu <ihuizh...@gmail.com> wrote:
> >> > >
> >> > >> Hi Ted,
> >> > >>
> >> > >> Thank you very much for the reply! I didn't receive the reply in my email but I found it in ZK dev mail thread. So I could not reply directly to the thread.
> >> > >>
> >> > >> I really appreciate a reply from the original author of multi()! And your blog (A Tour of the Multi-update For Zookeeper) is very helpful with understanding of multi(). Your reply helps convince my team that it is a real transaction.
> >> > >>
> >> > >> Regarding my 2nd question, maybe I should have described a bit of our challenge. When we have a large number of ZK write requests that cause high ZK write QPS, ZK sessions are expired by ZK. And this affects the application's connection to ZK. We wonder if we could apply multi() to batch the ZK write requests to reduce ZK write QPS so ZK wouldn't expire sessions. So in this case, do you think we could still not apply multi() to achieve the purpose?
> >> > >>
> >> > >> Thank you, Ted!!
> >> > >>
> >> > >> On Mon, Jul 27, 2020 at 1:40 PM Huizhi Lu <ihuizh...@gmail.com> wrote:
> >> > >>
> >> > >> > Hi Zookeeper Devs,
> >> > >> >
> >> > >> > Hope this email finds you well!
> >> > >> >
> >> > >> > I am working on some stuff that needs ZK multi(). I would like to confirm a few things about this API.
> >> > >> >
> >> > >> > 1. Is this a real transaction operation in ZK? My understanding is, it is a real transaction. If I put 3 write operations in this transaction request, these 3 write operations are committed in 1 transaction with the same zxid and 1 proposal. Observers should either see all the updates or none of the updates. Observers should not see partial updates, eg. only 1 of the 3 updates.
> >> > >> >
> >> > >>
> >> > >> Yes. The multi() is atomic. It will happen or not and the program invoking the operation will be told why or why not.
> >> > >>
> >> > >> > 2. We have a case to write multiple znodes. Currently it is sending the requests one by one. With transaction, I believe we could batch writes in 1 transaction request. This is intended to reduce ZK write pressure. Eg.
> >> > >> > we are writing 100 znodes, putting it in one transaction request would reduce write requests from 100 (single write request using create() or set()) to 1 write request (transaction), right? And does this reduce ZK server write request pressure?
> >> > >> >
> >> > >>
> >> > >> Not really. The way that multi() works is that it does a group commit. There may be some economies in terms of number of network exchanges, but the internal work of testing whether the operations will succeed is the same. The point of multi() is to make use of and provide nuanced control over the normal group commit that Zookeeper is doing anyway. It should not generally be viewed as an efficiency improvement.
> >> > >>
> >> > >> Hopefully this helps.
> >> > >>
> >> > >> On Mon, Jul 27, 2020 at 1:23 PM Huizhi Lu <ihuizh...@gmail.com> wrote:
> >> > >>
> >> > >>> Hi Zookeeper Devs,
> >> > >>>
> >> > >>> Hope this email finds you well!
> >> > >>>
> >> > >>> I am working on some stuff that needs ZK multi(). I would like to confirm a few things about this API.
> >> > >>>
> >> > >>> 1. Is this a real transaction operation in ZK? My understanding is, it is a real transaction. If I put 3 write operations in this transaction request, these 3 write operations are committed in 1 transaction with the same zxid and 1 proposal. Observers should either see all the updates or none of the updates. Observers should not see partial updates, eg. only 1 of the 3 updates.
> >> > >>>
> >> > >>> 2. We have a case to write multiple znodes. Currently it is sending the requests one by one. With transaction, I believe we could batch writes in 1 transaction request.
> >> > >>> This is intended to reduce ZK write pressure. Eg. we are writing 100 znodes, putting it in one transaction request would reduce write requests from 100 (single write request using create() or set()) to 1 write request (transaction), right? And does this reduce ZK server write request pressure?
> >> > >>>
> >> > >>> Could you help explain? I am looking forward to your reply. Thank you very much!
> >> > >>>
> >> > >>> Best,
> >> > >>> -Huizhi
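
For readers following the thread, the all-or-nothing behaviour Ted describes for multi() can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: it is not ZooKeeper code and does not use the ZooKeeper client API. The names ToyStore, multi, nodes and zxid are hypothetical and exist only for this illustration. The one borrowed convention is that an expected version of -1 means "match any version". Note that every operation is still validated individually, which mirrors Ted's point that batching does not remove the per-operation work.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for a znode tree: path -> (data, version)."""

    def __init__(self):
        self.nodes = {}
        self.zxid = 0  # bumped once per committed transaction

    def multi(self, ops):
        """Apply every op or none. Returns (ok, index_of_failed_op_or_None)."""
        staged = dict(self.nodes)  # work on a copy so a failure leaves no trace
        for i, (kind, path, data, expected_version) in enumerate(ops):
            # Each op is checked on its own; batching does not skip this work.
            if kind == "create":
                if path in staged:
                    return False, i
                staged[path] = (data, 0)
            elif kind == "set":
                if path not in staged:
                    return False, i
                _, version = staged[path]
                if expected_version not in (-1, version):
                    return False, i
                staged[path] = (data, version + 1)
            else:
                raise ValueError(f"unknown op kind: {kind}")
        # All checks passed: publish the whole batch under a single zxid.
        self.nodes = staged
        self.zxid += 1
        return True, None


store = ToyStore()

# Three creates commit together as one transaction.
ok, failed = store.multi([
    ("create", "/a", b"1", None),
    ("create", "/b", b"2", None),
    ("create", "/c", b"3", None),
])

# The second op fails, so the first op's update to /a is not applied either.
ok2, failed2 = store.multi([
    ("set", "/a", b"x", 0),
    ("set", "/missing", b"y", -1),
])
```

In this model the caller learns which operation caused the rejection, and a reader of the store sees either the whole batch or none of it.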