Hi Enrico,

Thanks for pointing me to BookKeeper. I've heard about it but haven't had a chance to really try it yet. It looks very promising. We will definitely evaluate this direction.
-Huizhi

On Mon, Jul 27, 2020 at 10:53 PM Enrico Olivelli <eolive...@gmail.com> wrote:

> Huizhi,
> If you want to achieve total atomic broadcast and higher throughput,
> you can consider ZooKeeper's sibling project, Apache BookKeeper. It is
> built on top of ZK, and it is very lightweight and scalable (no central
> coordination servers).
>
> https://bookkeeper.apache.org
>
> Hope that helps
> Enrico
>
> On Tue, Jul 28, 2020, 01:43 Huizhi Lu <ihuizh...@gmail.com> wrote:
>
> > Hi Ted,
> >
> > Thank you so much for the reply. Your suggestion is very valuable. I
> > agree that we should migrate from ZK to a distributed DB for this high
> > number of writes. Because of our legacy codebase and usage, that may
> > not be easy for us to do. So we are considering multi() as a short- to
> > mid-term solution. Eventually we will move the excessive writes out of
> > ZK to achieve higher scalability.
> >
> > Lastly, I greatly appreciate your insightful explanation! FYI, I am
> > very happy to receive prompt replies from you, Ted!
> >
> > Best,
> > -Huizhi
> >
> > On Mon, Jul 27, 2020 at 4:30 PM Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> > > This sounds like you are using ZK outside of the intended design. The
> > > idea is that ZK is a coordination engine. If you have such high write
> > > rates that ZK is dropping connections, you probably want a distributed
> > > database of some kind, perhaps one that uses ZK to coordinate itself.
> > > ZK is a form of replicated database, not a distributed one and, as
> > > such, the write rate doesn't scale and that is intentional.
> > >
> > > Even if multi() solves your immediate problem, it leaves the same
> > > problem in place at just a slightly higher scale.
> > > My own philosophy of scaling is that when you hit a problem, you
> > > should increase your scale by a large enough factor to give you time
> > > to solve some other problems or build new stuff before you have to fix
> > > your scaling problem again. Increasing scale by a factor of 2 rarely
> > > does this. I prefer to increase my scaling bounds by a factor of 10 or
> > > more so that I have some breathing space. I remember a time in one
> > > startup where our system was on the edge of breaking and our traffic
> > > was doubling roughly every week. We had to improve our situation by at
> > > least a factor of 10 each time we upgraded our systems just to stay in
> > > the same place. I can only hope you will have similar problems.
> > >
> > > On Mon, Jul 27, 2020 at 3:47 PM Huizhi Lu <ihuizh...@gmail.com> wrote:
> > >
> > >> Hi Ted,
> > >>
> > >> Thank you very much for the reply! I didn't receive the reply in my
> > >> email, but I found it in the ZK dev mail thread, so I could not reply
> > >> directly to the thread.
> > >>
> > >> I really appreciate a reply from the original author of multi()! And
> > >> your blog post (A Tour of the Multi-update For Zookeeper) is very
> > >> helpful for understanding multi(). Your reply helps convince my team
> > >> that it is a real transaction.
> > >>
> > >> Regarding my 2nd question, maybe I should have described our challenge
> > >> a bit. When we have a large number of ZK write requests that cause
> > >> high ZK write QPS, ZK expires sessions, and this affects the
> > >> application's connection to ZK. We wonder if we could apply multi() to
> > >> batch the ZK write requests and reduce ZK write QPS so that ZK
> > >> wouldn't expire sessions. So in this case, do you think we still could
> > >> not apply multi() to achieve this purpose?
> > >>
> > >> Thank you, Ted!!
> > >>
> > >> On Mon, Jul 27, 2020 at 1:40 PM Huizhi Lu <ihuizh...@gmail.com> wrote:
> > >>
> > >> > Hi Zookeeper Devs,
> > >> >
> > >> > Hope this email finds you well!
> > >> >
> > >> > I am working on some stuff that needs ZK multi(). I would like to
> > >> > confirm a few things about this API.
> > >> >
> > >> > 1. Is this a real transaction operation in ZK? My understanding is,
> > >> > it is a real transaction. If I put 3 write operations in this
> > >> > transaction request, these 3 write operations are committed in 1
> > >> > transaction with the same zxid and 1 proposal. Observers should
> > >> > either see all the updates or none of the updates. Observers should
> > >> > not see partial updates, eg. only 1 of the 3 updates.
> > >> >
> > >>
> > >> Yes. The multi() is atomic. It will happen or not and the program
> > >> invoking the operation will be told why or why not.
> > >>
> > >> > 2. We have a case to write multiple znodes. Currently it is sending
> > >> > the requests one by one. With transaction, I believe we could batch
> > >> > writes in 1 transaction request. This is intended to reduce ZK write
> > >> > pressure. Eg. we are writing 100 znodes, putting it in one
> > >> > transaction request would reduce write requests from 100 (single
> > >> > write request using create() or set()) to 1 write request
> > >> > (transaction), right? And does this reduce ZK server write request
> > >> > pressure?
> > >> >
> > >>
> > >> Not really. The way that multi() works is that it does a group commit.
> > >> There may be some economies in terms of number of network exchanges,
> > >> but the internal work of testing whether the operations will succeed
> > >> is the same. The point of multi() is to make use of and provide
> > >> nuanced control over the normal group commit that Zookeeper is doing
> > >> anyway. It should not generally be viewed as an efficiency
> > >> improvement.
> > >>
> > >> Hopefully this helps.
> > >>
> > >> On Mon, Jul 27, 2020 at 1:23 PM Huizhi Lu <ihuizh...@gmail.com> wrote:
> > >>
> > >>> Hi Zookeeper Devs,
> > >>>
> > >>> Hope this email finds you well!
> > >>>
> > >>> I am working on some stuff that needs ZK multi(). I would like to
> > >>> confirm a few things about this API.
> > >>>
> > >>> 1. Is this a real transaction operation in ZK? My understanding is,
> > >>> it is a real transaction. If I put 3 write operations in this
> > >>> transaction request, these 3 write operations are committed in 1
> > >>> transaction with the same zxid and 1 proposal. Observers should
> > >>> either see all the updates or none of the updates. Observers should
> > >>> not see partial updates, eg. only 1 of the 3 updates.
> > >>>
> > >>> 2. We have a case to write multiple znodes. Currently it is sending
> > >>> the requests one by one. With transaction, I believe we could batch
> > >>> writes in 1 transaction request. This is intended to reduce ZK write
> > >>> pressure. Eg. we are writing 100 znodes, putting it in one
> > >>> transaction request would reduce write requests from 100 (single
> > >>> write request using create() or set()) to 1 write request
> > >>> (transaction), right? And does this reduce ZK server write request
> > >>> pressure?
> > >>>
> > >>> Could you help explain? I am looking forward to your reply. Thank you
> > >>> very much!
> > >>>
> > >>> Best,
> > >>> -Huizhi
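
The two answers in this thread (multi() is all-or-nothing, and batching saves network exchanges but not the per-operation checking) can be illustrated with a toy in-memory model. This is only a sketch and not the ZooKeeper client API: `ToyStore`, its counters, and the tuple format for operations are all hypothetical names invented for this example, and the model ignores replication, sessions, and versions entirely.

```python
class ToyStore:
    """In-memory stand-in for a znode tree. NOT the ZooKeeper API."""

    def __init__(self):
        self.nodes = {}     # path -> data
        self.requests = 0   # client-to-server requests received
        self.checks = 0     # per-operation validations performed
        self.commits = 0    # committed transactions (hypothetical counter)

    def _valid(self, staged, op):
        """Check one operation against the staged state."""
        kind, path, _data = op
        self.checks += 1
        if kind == "create":
            return path not in staged
        if kind == "set":
            return path in staged
        return False

    def multi(self, ops):
        """Apply every op or none. Returns True on commit, False on abort."""
        self.requests += 1
        staged = dict(self.nodes)   # work on a copy so an abort leaves no trace
        for op in ops:
            if not self._valid(staged, op):
                return False        # abort: self.nodes is untouched
            staged[op[1]] = op[2]
        self.nodes = staged         # publish all updates at once
        self.commits += 1
        return True

    def write(self, op):
        """A single write is modelled as a one-operation transaction."""
        return self.multi([op])


# 100 znodes written one request at a time.
single = ToyStore()
for i in range(100):
    single.write(("create", f"/n{i}", b"x"))
single_stats = (single.requests, single.checks, single.commits)

# The same 100 znodes written in one batched request.
batched = ToyStore()
batched.multi([("create", f"/n{i}", b"x") for i in range(100)])
batched_stats = (batched.requests, batched.checks, batched.commits)

print(single_stats)    # (100, 100, 100)
print(batched_stats)   # (1, 100, 1)

# A batch with one invalid op aborts as a whole: "/a" is never visible.
ok = batched.multi([
    ("create", "/a", b"1"),
    ("set", "/missing", b"2"),   # fails: the node does not exist
    ("create", "/b", b"3"),
])
print(ok, "/a" in batched.nodes)   # False False
```

In this model the batched run cuts requests from 100 to 1, but the number of per-operation checks stays at 100, which matches the "not really" answer above: the saving is in exchanges, not in the validation work.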