Li,

On Wed, Feb 8, 2023 at 03:49, Li Wang <li4w...@gmail.com> wrote:
>
> Hello,
>
> We had a production outage due to the issue reported in
> https://issues.apache.org/jira/browse/ZOOKEEPER-4306 and some other users
> also ran into the same issue. I wonder if we can use this thread to discuss
> and come to a consensus on how to fix it. :-)
>
> Thanks Damien Diederen
> <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ztzg> for the
> contribution and patch. Limiting the number of ephemeral nodes that can be
> created in a session looks like a simple and reasonable solution to me.
> Having a way to enforce it will protect the system from potential OOM
> issues.

How does the client recover from having created too many ephemeral nodes?
This does not seem trivial to do.

Let me share some ideas:

Solution one: fail the creation of the node

If we fail the creation of the node, the application will probably enter
a loop and keep trying to create it. There is no way to say that one znode
is "more important" than another, so the application will keep failing to
create arbitrary znodes.

Solution two: force-expire the session (and reset the ephemeral nodes)

In this case some applications would probably recover better (ZK client
applications are supposed to deal with session expiration somehow), and
some applications will auto-restart (because session expiration is a
symptom of a network partition, and suicide is the best thing to do).

In either case the application will try to create the znodes again, work
for some time, and then die again (or recreate the session).

I agree that server-side protection is a reasonable short-term solution,
but we should also think about a better long-term plan.

>
> I've also looked into the possibility of splitting CloseSessionTxn into
> smaller ones. Unfortunately, it didn't work, as currently in Zookeeper, one
> request can only have one txn. Even though we can split the paths to be
> deleted into multiple batches and define sub-txn for each batch, we have to
> wrap all sub-txn(s) into a single wrapper txn and associate it to the
> request. At the end, when loading zk database, we still have to deserialize
> the large wrapper txn, which can fail the length check (jute.maxBuffer +
> zookeeper.jute.maxbuffer.extrasize).

Unfortunately, there are a few users who say that ZooKeeper doesn't scale,
and we are probably hitting one such case here. Most of these cases are due
to the wire protocol (JUTE), which puts unneeded constraints on ZooKeeper.

Enrico

>
> Changing ZK to allow multiple txns for a single request looks quite
> involved and it may have other implications.
>
> I wonder if anyone has any input or any better ideas?
>
> Thanks,
>
> Li
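
To make the two options discussed in this thread concrete, here is a minimal sketch of a per-session ephemeral node limit with both behaviours (fail the create, or expire the session). It is only an illustration: the class `EphemeralLimiter`, its `Policy` and `Result` enums, and the `tryCreate` method are hypothetical names, not part of ZooKeeper or of the ZOOKEEPER-4306 patch, and a real server-side check would have to live in the request processing path.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a per-session ephemeral node limit; not ZooKeeper code. */
final class EphemeralLimiter {
    /** What the server does when a session reaches the limit. */
    enum Policy { REJECT_CREATE, EXPIRE_SESSION }

    /** Outcome of one create attempt. */
    enum Result { CREATED, REJECTED, SESSION_EXPIRED }

    private final int maxPerSession;
    private final Policy policy;
    // Number of live ephemeral nodes owned by each session id.
    private final Map<Long, Integer> countBySession = new HashMap<>();

    EphemeralLimiter(int maxPerSession, Policy policy) {
        this.maxPerSession = maxPerSession;
        this.policy = policy;
    }

    /** Accounts for one ephemeral create request from the given session. */
    Result tryCreate(long sessionId) {
        int count = countBySession.getOrDefault(sessionId, 0);
        if (count < maxPerSession) {
            countBySession.put(sessionId, count + 1);
            return Result.CREATED;
        }
        if (policy == Policy.EXPIRE_SESSION) {
            // Solution two: drop the session and all of its ephemeral nodes.
            countBySession.remove(sessionId);
            return Result.SESSION_EXPIRED;
        }
        // Solution one: refuse this create and keep the existing nodes.
        return Result.REJECTED;
    }

    /** Current number of ephemeral nodes tracked for the session. */
    int count(long sessionId) {
        return countBySession.getOrDefault(sessionId, 0);
    }
}
```

With `REJECT_CREATE`, a client that retries in a loop keeps getting `REJECTED` while its count stays at the limit, which is the failure loop described under solution one. With `EXPIRE_SESSION`, the count is reset and the client starts over, which is the "work for some time, then die again" cycle described under solution two. In both cases the number of paths that a later session close has to delete is bounded by the limit.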