chevaris opened a new issue, #1258:
URL: https://github.com/apache/curator/issues/1258
PersistentTtlNode (Curator 5.8.0 and probably earlier versions) has a
corner case that prevents the ZNode from being deleted when the program
running the recipe is stopped in certain situations.
This is the sequence:
- Start the recipe with a TTL of 30 secs -> the Container Node is created
- Stop the program that runs the recipe (or the program crashes in
production) before the Touch TTL node is created. The timing is NOT
deterministic: a background thread is scheduled to create that node after
TTL/2 (by default). In the worst case, the TTL node could take up to
15 secs to be created in this example.
When this happens, the CONTAINER node is never deleted. One option is to
increase the touchScheduleFactor, BUT that workaround still does not look
correct to me.
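
To make the size of that window concrete, here is a small sketch of the
arithmetic. It assumes the first touch is scheduled one cycle
(TTL / touchScheduleFactor) after the recipe starts, as described above;
the class and method names are hypothetical and are not part of Curator.

```java
public class TouchWindow {
    // Delay before the first touch node is written, under the assumption
    // that the recipe waits one full cycle (ttl / touchScheduleFactor).
    public static long firstTouchDelayMs(long ttlMs, int touchScheduleFactor) {
        return ttlMs / touchScheduleFactor;
    }

    public static void main(String[] args) {
        // Default factor of 2 with a 30 s TTL: exposed for up to 15 s.
        System.out.println(firstTouchDelayMs(30_000L, 2));   // prints 15000
        // A larger factor shrinks the window but does not remove it.
        System.out.println(firstTouchDelayMs(30_000L, 10));  // prints 3000
    }
}
```

A larger factor only narrows the window; a stop or crash inside it still
leaves the container node without a touch child.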
In my view the recipe should watch the Container Node itself and, as soon
as that node is created, trigger creation of the TOUCH node, to minimize
the window in which the problem can happen.
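
The idea can be pictured with a plain scheduler, independent of Curator.
This is only an illustrative sketch, not the code from the linked commit:
the class, the onContainerCreated hook, and the touch callback are all
hypothetical stand-ins. The only change from the behaviour described
above is that the first touch runs with an initial delay of 0 instead of
one full cycle.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ImmediateTouchSketch {
    // Hypothetical stand-in for "create or update the touch child node".
    private final Runnable touch;
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    public ImmediateTouchSketch(Runnable touch) {
        this.touch = touch;
    }

    // Hypothetical hook, invoked once the container node is known to exist.
    public void onContainerCreated(long ttlMs, int touchScheduleFactor) {
        long cycleMs = ttlMs / touchScheduleFactor;
        // Initial delay 0: touch right away, then once per cycle.
        executor.scheduleAtFixedRate(touch, 0, cycleMs, TimeUnit.MILLISECONDS);
    }

    public void close() {
        executor.shutdownNow();
    }
}
```

Even with a zero initial delay, a crash between container creation and
the first touch is still possible, so this narrows the window rather
than closing it.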
I attach a test case that shows the problem, and I include a fixed recipe
that solves it.
https://github.com/chevaris/curator/commit/6da77252f24841d8f8e85572cad9ac6d86cac5e7
Anyhow, no matter how fast the touch ZNode is added, the race condition
will always be there. In my view this is a limitation of the strategy used
by this recipe, and it should be documented.
Regards,
Cheva