Huizhi,

If you want totally ordered atomic broadcast with higher throughput, you can consider Apache BookKeeper, ZooKeeper's sibling project. It is built on top of ZK and is very lightweight and scalable (no central coordination servers).
https://bookkeeper.apache.org

Hope that helps
Enrico

On Tue, Jul 28, 2020, 01:43 Huizhi Lu <ihuizh...@gmail.com> wrote:

> Hi Ted,
>
> Thank you so much for the reply. Your suggestion is very valuable. I do
> agree that we should migrate from ZK to a distributed DB for this high
> number of writes. Due to legacy codebase and usage, it may not be that
> easy for us to do that. So we are considering multi() as a short/mid term
> solution. Finally we will move the excessive number of writes out of ZK
> to achieve higher scalability.
>
> Lastly, I greatly appreciate your insightful explanation! FYI, I am very
> happy to receive prompt replies from you, Ted!
>
> Best,
> -Huizhi
>
> On Mon, Jul 27, 2020 at 4:30 PM Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> > This sounds like you are using ZK outside of the intended design. The
> > idea is that ZK is a coordination engine. If you have such high write
> > rates that ZK is dropping connections, you probably want a distributed
> > database of some kind, perhaps one that uses ZK to coordinate itself.
> > ZK is a form of replicated database, not a distributed one and, as
> > such, the write rate doesn't scale and that is intentional.
> >
> > Even if multi() solves your immediate problem, it leaves the same
> > problem in place at just a slightly higher scale. My own philosophy of
> > scaling is that when you hit a problem, you should increase your scale
> > by a large enough factor to give you time to solve some other problems
> > or build new stuff before you have to fix your scaling problem again.
> > Increasing scale by a factor of 2 rarely does this. I prefer to
> > increase my scaling bounds by a factor of 10 or more so that I have
> > some breathing space. I remember a time in one startup where our system
> > was on the edge of breaking and our traffic was doubling roughly every
> > week.
> > We had to improve our situation by at least a factor of 10 each time
> > we upgraded our systems just to stay in the same place. I can only hope
> > you will have similar problems.
> >
> > On Mon, Jul 27, 2020 at 3:47 PM Huizhi Lu <ihuizh...@gmail.com> wrote:
> >
> >> Hi Ted,
> >>
> >> Thank you very much for the reply! I didn't receive the reply in my
> >> email but I found it in ZK dev mail thread. So I could not reply
> >> directly to the thread.
> >>
> >> I really appreciate a reply from the original author of multi()! And
> >> your blog (A Tour of the Multi-update For Zookeeper) is very helpful
> >> with understanding of multi(). Your reply helps convince my team that
> >> it is a real transaction.
> >>
> >> Regarding my 2nd question, maybe I should have described a bit of our
> >> challenge. When we have a large number of ZK write requests that cause
> >> high ZK write QPS, ZK sessions are expired by ZK. And this affects the
> >> application's connection to ZK. We wonder if we could apply multi() to
> >> batch the ZK write requests to reduce ZK write QPS so ZK wouldn't
> >> expire sessions. So in this case, do you think we could still not
> >> apply multi() to achieve the purpose?
> >>
> >> Thank you, Ted!!
> >>
> >> On Mon, Jul 27, 2020 at 1:40 PM Huizhi Lu <ihuizh...@gmail.com> wrote:
> >>
> >> > Hi Zookeeper Devs,
> >> >
> >> > Hope this email finds you well!
> >> >
> >> > I am working on some stuff that needs ZK multi(). I would like to
> >> > confirm a few things about this API.
> >> >
> >> > 1. Is this a real transaction operation in ZK? My understanding is,
> >> > it is a real transaction. If I put 3 write operations in this
> >> > transaction request, these 3 write operations are committed in 1
> >> > transaction with the same zxid and 1 proposal. Observers should
> >> > either see all the updates or none of the updates. Observers should
> >> > not see partial updates, eg. only 1 of the 3 updates.
> >>
> >> Yes.
> >> The multi() is atomic. It will happen or not and the program invoking
> >> the operation will be told why or why not.
> >>
> >> > 2. We have a case to write multiple znodes. Currently it is sending
> >> > the requests one by one. With transaction, I believe we could batch
> >> > writes in 1 transaction request. This is intended to reduce ZK write
> >> > pressure. Eg. we are writing 100 znodes, putting it in one
> >> > transaction request would reduce write requests from 100 (single
> >> > write request using create() or set()) to 1 write request
> >> > (transaction), right? And does this reduce ZK server write request
> >> > pressure?
> >>
> >> Not really. The way that multi() works is that it does a group commit.
> >> There may be some economies in terms of number of network exchanges,
> >> but the internal work of testing whether the operations will succeed
> >> is the same. The point of multi() is to make use of and provide
> >> nuanced control over the normal group commit that Zookeeper is doing
> >> anyway. It should not generally be viewed as an efficiency
> >> improvement.
> >>
> >> Hopefully this helps.
> >>
> >> On Mon, Jul 27, 2020 at 1:23 PM Huizhi Lu <ihuizh...@gmail.com> wrote:
> >>
> >>> Hi Zookeeper Devs,
> >>>
> >>> Hope this email finds you well!
> >>>
> >>> I am working on some stuff that needs ZK multi(). I would like to
> >>> confirm a few things about this API.
> >>>
> >>> 1. Is this a real transaction operation in ZK? My understanding is,
> >>> it is a real transaction. If I put 3 write operations in this
> >>> transaction request, these 3 write operations are committed in 1
> >>> transaction with the same zxid and 1 proposal. Observers should
> >>> either see all the updates or none of the updates. Observers should
> >>> not see partial updates, eg. only 1 of the 3 updates.
> >>>
> >>> 2. We have a case to write multiple znodes. Currently it is sending
> >>> the requests one by one.
> >>> With transaction, I believe we could batch writes in 1 transaction
> >>> request. This is intended to reduce ZK write pressure. Eg. we are
> >>> writing 100 znodes, putting it in one transaction request would
> >>> reduce write requests from 100 (single write request using create()
> >>> or set()) to 1 write request (transaction), right? And does this
> >>> reduce ZK server write request pressure?
> >>>
> >>> Could you help explain? I am looking forward to your reply. Thank you
> >>> very much!
> >>>
> >>> Best,
> >>> -Huizhi
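
For anyone reading this thread later, here is a small sketch of the all-or-nothing behaviour discussed above. It is a toy in-memory model in Python, not the ZooKeeper client API: the names TinyStore, multi, and the ("create" | "set", path, data) tuples are hypothetical and exist only for illustration. It assumes a single counter standing in for the zxid. Note that every operation is still checked one by one before anything is committed, which loosely mirrors Ted's remark that the work of testing whether the operations will succeed is the same.

```python
# Toy model of all-or-nothing batch semantics. NOT the ZooKeeper API;
# TinyStore and its op tuples are hypothetical, for illustration only.

class TinyStore:
    def __init__(self):
        self.nodes = {}  # path -> data
        self.zxid = 0    # bumped once per committed transaction

    @staticmethod
    def _check(op, view):
        """Return an error string if op cannot apply to view, else None."""
        kind, path, _data = op
        if kind == "create":
            return "node exists" if path in view else None
        if kind == "set":
            return None if path in view else "no node"
        return "unknown op"

    def multi(self, ops):
        """Apply all ops or none. Returns (True, zxid) or (False, reason)."""
        staged = dict(self.nodes)  # work on a copy so failures leave no trace
        for index, op in enumerate(ops):
            error = self._check(op, staged)  # each op is still validated
            if error is not None:
                return False, f"op {index} failed: {error}"
            staged[op[1]] = op[2]
        self.nodes = staged  # publish every update at once
        self.zxid += 1       # one transaction id for the whole batch
        return True, self.zxid


store = TinyStore()
# Three creates commit together under a single transaction id.
print(store.multi([("create", "/a", b"1"),
                   ("create", "/b", b"2"),
                   ("create", "/c", b"3")]))
# The second op fails, so the first op's update to /a is not applied either.
print(store.multi([("set", "/a", b"9"), ("set", "/missing", b"0")]))
print(store.nodes["/a"], store.zxid)
```

In this model a reader of store.nodes sees either all updates of a batch or none, and batching changes how many requests are sent but not how many operations are checked.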