Hello,

We had a production outage caused by the issue reported in
https://issues.apache.org/jira/browse/ZOOKEEPER-4306, and other users have
run into the same issue. I would like to use this thread to discuss it and
reach a consensus on how to fix it. :-)



Thanks to Damien Diederen
<https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ztzg> for the
contribution and patch. Limiting the number of ephemeral nodes that can be
created in a session looks like a simple and reasonable solution to me.
Having a way to enforce such a limit would protect the system from
potential OOM issues.
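
To make the idea concrete, here is a minimal sketch of a per-session cap on
ephemeral nodes. It is not the patch attached to the JIRA and not
ZooKeeper's actual code: the class EphemeralLimiter, its methods, and the
maxPerSession setting are all hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not ZooKeeper code or the ZOOKEEPER-4306 patch. */
class EphemeralLimiter {
    // Hypothetical configurable cap on ephemeral nodes owned by one session.
    private final int maxPerSession;
    // Ephemeral paths currently tracked for each session id.
    private final Map<Long, Set<String>> ephemeralsBySession = new HashMap<>();

    EphemeralLimiter(int maxPerSession) {
        this.maxPerSession = maxPerSession;
    }

    /** Returns true if the path is tracked, false if the session is already at its cap. */
    synchronized boolean tryCreate(long sessionId, String path) {
        Set<String> paths =
                ephemeralsBySession.computeIfAbsent(sessionId, id -> new HashSet<>());
        if (!paths.contains(path) && paths.size() >= maxPerSession) {
            return false; // reject up front instead of building an oversized close-session txn later
        }
        paths.add(path);
        return true;
    }

    /** Removes and returns every path tracked for the session (empty set if none). */
    synchronized Set<String> closeSession(long sessionId) {
        Set<String> removed = ephemeralsBySession.remove(sessionId);
        return removed == null ? new HashSet<>() : removed;
    }
}
```

With a cap in place, the number of paths that a single session close has to
delete is bounded by maxPerSession, which in turn bounds the size of the
resulting txn.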


I've also looked into the possibility of splitting CloseSessionTxn into
smaller ones. Unfortunately, that didn't work, because currently in
ZooKeeper one request can have only one txn. Even if we split the paths to
be deleted into multiple batches and define a sub-txn for each batch, we
still have to wrap all the sub-txns in a single wrapper txn and associate it
with the request. In the end, when loading the ZK database, we still have to
deserialize the large wrapper txn, which can fail the length check
(jute.maxbuffer + zookeeper.jute.maxbuffer.extrasize).
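
As a rough illustration of why batching does not help, the sketch below
estimates the serialized size of a list of paths and compares it against
the combined limit. The class and method names are hypothetical, and the
size formula (a 4-byte count plus a 4-byte length prefix and the UTF-8 bytes
of each path) is an assumption about the encoding, not a reading of the
actual serialization code.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical illustration only; not ZooKeeper's actual length check. */
class CloseSessionSizeEstimate {
    /**
     * Rough size of a serialized path list, assuming a 4-byte element count
     * followed by a 4-byte length prefix plus UTF-8 bytes for each path.
     */
    static long estimateBytes(List<String> paths) {
        long total = 4; // assumed element-count header
        for (String path : paths) {
            total += 4 + path.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    /** True if a record of len bytes fits within maxBuffer + extraSize. */
    static boolean passesLengthCheck(long len, long maxBuffer, long extraSize) {
        return len >= 0 && len <= maxBuffer + extraSize;
    }
}
```

Under these assumptions, the wrapper txn is at least as large as the sum of
its sub-txns, so wrapping the batches cannot bring the total under the
limit.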


Changing ZK to allow multiple txns for a single request looks quite
involved and may have other implications.


Does anyone have any input or better ideas?



Thanks,


Li
