Hello, I would like to follow up and call for inputs on this. Damien, as the author of the PR, do you have any inputs/thoughts?
Please let me know if there is anything I can help with to move this forward.

Cheers,
Li

On Wed, Feb 8, 2023 at 1:08 PM Li Wang <li4w...@gmail.com> wrote:
> Thanks for the inputs, Enrico.
>
> On Wed, Feb 8, 2023 at 12:26 AM Enrico Olivelli <eolive...@gmail.com> wrote:
>>
>> Li,
>>
>> Il giorno mer 8 feb 2023 alle ore 03:49 Li Wang <li4w...@gmail.com> ha scritto:
>> >
>> > Hello,
>> >
>> > We had a production outage due to the issue reported in
>> > https://issues.apache.org/jira/browse/ZOOKEEPER-4306, and some other users
>> > also ran into the same issue. I wonder if we can use this thread to discuss
>> > and come to a consensus on how to fix it. :-)
>> >
>> > Thanks, Damien Diederen
>> > <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ztzg>, for the
>> > contribution and patch. Limiting the number of ephemeral nodes that can be
>> > created in a session looks like a simple and reasonable solution to me.
>> > Having a way to enforce it will protect the system from potential OOM
>> > issues.
>>
>> How does the client recover from having created too many ephemeral nodes?
>> This seems non-trivial to do. Let me share some ideas:
>
> A new KeeperException/error code
> (i.e. TooManyEphemeralsException/TOOMANYEPHEMERALS) is introduced in the
> patch. Do you mean how the old clients handle the new error code?
>
>> Solution one: fail the creation of the node.
>> If we fail the creation of the node, then the application will probably
>> enter a loop and keep trying to create it.
>> There is no way to say that some znode is "more important" than other
>> znodes, so the application will keep failing in the creation
>> of random znodes.
>
> How about having a property to control whether to throw
> TooManyEphemeralsException in this case? An admin can enable the property
> after all client applications have upgraded to the new version and can
> handle the new error code.
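To make the discussion concrete, here is a minimal, self-contained sketch of what a per-session ephemeral limit on the server side could look like. This is not ZooKeeper's actual code; the limit field, the error-code value, and the class name are all illustrative assumptions, and real enforcement would live in the request processor pipeline:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (NOT ZooKeeper's implementation) of limiting the
// number of ephemeral nodes a single session may create, as proposed in
// ZOOKEEPER-4306. The TOOMANYEPHEMERALS code and maxEphemeralNodes knob
// are stand-ins for whatever the patch actually defines.
public class EphemeralLimitSketch {
    static final int OK = 0;
    static final int TOOMANYEPHEMERALS = -999; // hypothetical error code

    private final int maxEphemeralNodes;       // 0 or negative = unlimited
    private final Map<Long, Integer> ephemeralsPerSession = new HashMap<>();

    EphemeralLimitSketch(int maxEphemeralNodes) {
        this.maxEphemeralNodes = maxEphemeralNodes;
    }

    // Called when a session asks to create an ephemeral znode: reject the
    // create up front rather than letting the session accumulate so many
    // ephemerals that its eventual CloseSessionTxn becomes unloadable.
    int createEphemeral(long sessionId) {
        int count = ephemeralsPerSession.getOrDefault(sessionId, 0);
        if (maxEphemeralNodes > 0 && count >= maxEphemeralNodes) {
            return TOOMANYEPHEMERALS;
        }
        ephemeralsPerSession.put(sessionId, count + 1);
        return OK;
    }
}
```

The point of the sketch is the shape of "solution one": the limit is cheap to check at create time, but the burden of reacting to the new error code falls on the client application.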
>> Solution two: force-expire the session (and reset the ephemeral nodes).
>> In this case some applications would probably recover in a better way
>> (ZK client applications are supposed to deal with session expiration
>> somehow),
>> and some applications will auto-restart (because session expiration is a
>> symptom of a network partition and suicide is the best thing to do).
>> In any case the application will try to create the znodes, work for
>> some time, and then die again (or recreate the session).
>
> Great idea! Forcing session expiration seems promising, as it addresses
> both of the following:
>
> 1. Protecting the server from the txn size overflowing
> 2. No need to worry about backward compatibility, as we reuse an
> existing error code and client applications are supposed to handle the
> session expiration error
>
>> I agree that a short-term solution is server-side protection, but it
>> is better to think about a better plan.
>
> Totally agree. We need to think it through and have a plan for how client
> apps handle the changes.
> Solution two seems better, as it is less intrusive and doesn't require
> any client-side change. WDYT?
>
> Does anyone else have any inputs?
>
>> > I've also looked into the possibility of splitting CloseSessionTxn into
>> > smaller ones. Unfortunately, it didn't work, as currently in ZooKeeper one
>> > request can only have one txn. Even though we can split the paths to be
>> > deleted into multiple batches and define a sub-txn for each batch, we have to
>> > wrap all the sub-txns into a single wrapper txn and associate it with the
>> > request. In the end, when loading the ZK database, we still have to deserialize
>> > the large wrapper txn, which can fail the length check (jute.maxBuffer +
>> > zookeeper.jute.maxbuffer.extrasize).
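For reference, the reason "solution two" needs no protocol change is that well-behaved ZK clients already treat session expiration as "rebuild everything". A rough sketch of that contract, with a stand-in enum and counters instead of the real org.apache.zookeeper Watcher API (these names are illustrative, not ZooKeeper's API):

```java
// Hedged sketch of how a client application typically reacts when the
// server force-expires its session. KeeperState here mimics the shape of
// org.apache.zookeeper.Watcher.Event.KeeperState but is a local stand-in;
// the counters are observable stand-ins for the real recovery actions.
public class SessionExpiryHandlerSketch {
    enum KeeperState { SyncConnected, Disconnected, Expired }

    int reconnects = 0;          // times we rebuilt the client handle
    int ephemeralsRecreated = 0; // ephemerals re-registered after expiry

    // Contract: an expired handle is dead. The app must open a brand-new
    // session and recreate its ephemeral znodes under it. On a plain
    // disconnect, the library reconnects by itself and the session (and
    // its ephemerals) may still be alive, so the app should just wait.
    void onStateChange(KeeperState state, int myEphemeralCount) {
        if (state == KeeperState.Expired) {
            reconnects++;
            ephemeralsRecreated += myEphemeralCount;
        }
    }
}
```

Because expiry-then-recreate is already the required client behavior, force-expiring an over-limit session reuses an error path every application is expected to survive, which is what makes it attractive from a compatibility standpoint.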
>> Unfortunately there are a few users who say that ZooKeeper doesn't
>> scale, and probably here we are hitting one of those cases;
>> most of these cases are due to the write protocol (jute), which
>> puts unneeded constraints on ZooKeeper.
>
> Yes, in this case we hit the constraint that jute doesn't serialize the
> individual sub-txns separately.
>
> Best,
>
> Li
>
>> Enrico
>>
>> >
>> > Changing ZK to allow multiple txns for a single request looks quite
>> > involved, and it may have other implications.
>> >
>> > I wonder if anyone has any input or better ideas?
>> >
>> > Thanks,
>> >
>> > Li
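A back-of-the-envelope sketch of why the single wrapper txn trips the length check: jute.maxbuffer defaults to roughly 1 MB, so a CloseSessionTxn that carries hundreds of thousands of ephemeral paths easily exceeds it at deserialization time. The size model below (a 4-byte length prefix per path string) is a simplification of jute serialization, not the exact wire format:

```java
// Rough estimate (NOT the exact jute format) of a CloseSessionTxn's
// serialized size versus the read-side length check. The ~1 MB default
// for jute.maxbuffer is real; the per-path cost is an approximation.
public class TxnSizeSketch {
    static final long JUTE_MAX_BUFFER = 1024L * 1024L; // default ~1 MB

    // Approximate cost: 4-byte length prefix plus the path bytes.
    static long approxCloseSessionTxnSize(long numPaths, int avgPathLen) {
        return numPaths * (4L + avgPathLen);
    }

    // Mirrors the check mentioned in the thread: the readable limit is
    // jute.maxBuffer plus zookeeper.jute.maxbuffer.extrasize.
    static boolean passesLengthCheck(long txnSize, long extraSize) {
        return txnSize <= JUTE_MAX_BUFFER + extraSize;
    }
}
```

With, say, two million ephemeral paths averaging 60 characters, the wrapper txn is on the order of 128 MB, far beyond the limit even with a generous extrasize, which is why splitting into sub-txns doesn't help as long as they must be wrapped into one serialized txn.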