I don't feel too strongly about making this enabled by default if we have good reasons to keep it disabled, which it seems like we do. I do feel for the new operators coming to Cassandra who will need to discover that they need to set this before going to production. Maybe that's more of a documentation thing than a default configuration thing though.
On Mon, Apr 27, 2026 at 8:18 AM Štefan Miklošovič <[email protected]> wrote:

> You know what ... what about just leaving it how it is now.
> _latest.yaml might be used for people evaluating this tech, they might
> just want to run "the latest stuff fully optimized" out of the box. I
> get that. But when we set a threshold like that and we start to run
> that check by default, it might actually bring more harm than good
> because then people will be spending time on the realisation of what
> went wrong, they just started the database after a week and it doesn't
> boot? Heh? It is something different for prod usage when people are
> intentionally buying into this feature but I do not think that having
> it really on default is actually a good idea.
>
> It is great that this discussion resulted in CASSANDRA-21290 and
> CASSANDRA-21246 delivered so it is both more robust and configurable
> but having that enabled on defaults ... I don't know.
>
> On Tue, Apr 21, 2026 at 6:59 PM Isaac Reath <[email protected]> wrote:
> >
> > Thank you to everyone who contributed to CASSANDRA-21293 and CASSANDRA-21290 incorporate Chris's feedback and improve the reliability of this feature!
> >
> > Coming back to defaults, Chris mentions using a default minimum value of 3 hours as the amount of time a node is allowed to be down if any tables in the cluster have a very small or 0 gc_grace_seconds. I think this makes sense as a default value for the cassandra_latest.yaml implementation. Are there any concerns with using 3 hours or suggestions for an alternative value?
> >
> > On Wed, Mar 25, 2026 at 12:42 PM Isaac Reath <[email protected]> wrote:
> >>
> >> Happy to add in the docs into the PR for CASSANDRA-21247 if there's nothing already available.
> >>
> >> On Wed, Mar 25, 2026 at 12:36 PM Štefan Miklošovič <[email protected]> wrote:
> >>>
> >>> Hi Chris,
> >>>
> >>> If you have some time to put a patch together with these improvements
> >>> that would be great. I can definitely review.
> >>>
> >>> Regards
> >>>
> >>> On Wed, Mar 25, 2026 at 5:24 PM Chris Lohfink <[email protected]> wrote:
> >>> >
> >>> > We enabled this across our fleet. We did make a couple small tweaks we might wanna consider
> >>> > 1. (important one) if the process shuts down mid write you can end up with a corrupt json hint file then the process refuses to start up. We added fallback to the timestamp of the file and an atomic write.
> >>> > 2 is we made it a minimum of 3 hours which was because we do have a lot of things that are set to 0 (or very short) gc_grace in the fleet and that we don't care about. There should probably be a setting for minimum threshold otherwise they can't really do anything other than delete heartbeat after every restart
> >>> > 3. add some documentation to evaluate and delete heartbeat if its blocking startup
> >>> >
> >>> > On Wed, Mar 25, 2026 at 10:17 AM Štefan Miklošovič <[email protected]> wrote:
> >>> >>
> >>> >> Hi Isaac,
> >>> >>
> >>> >> I am fine with having that property set to true in cassandra_latest.yaml only.
> >>> >>
> >>> >> Regards
> >>> >>
> >>> >> On Tue, Mar 24, 2026 at 10:05 PM Isaac Reath <[email protected]> wrote:
> >>> >> >
> >>> >> > Hi all,
> >>> >> >
> >>> >> > There’s ongoing interest in preventing nodes from starting after being offline longer than gc_grace_seconds, to avoid data resurrection issues.
> >>> >> >
> >>> >> > This is already supported via `check_data_resurrection.enabled` (added in 4.1 via CASSANDRA-17180), but it remains disabled by default. Recent discussion in CASSANDRA-21221 suggests that operators may be unaware of this setting and end up reimplementing similar safeguards themselves.
> >>> >> >
> >>> >> > Given that this feature has now been available in 4.1 and 5.0, I'd like to propose enabling it by default in cassandra_latest.yaml for 6.0. Are there any concerns with making this change?
> >>> >> >
> >>> >> > Isaac
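
For readers following the thread, here is a minimal sketch of the first two tweaks Chris describes: an atomic heartbeat write with a fallback to the file's modification time when the JSON is corrupt, and a minimum threshold applied on top of gc_grace_seconds. It is written in Python for brevity rather than Cassandra's Java, and it is not Cassandra's actual implementation. The `last_heartbeat` key, the function names, and the rule "smallest gc_grace_seconds across tables, floored at the minimum" are assumptions made for illustration.

```python
import json
import os
import tempfile
import time

# The 3-hour floor discussed in the thread; the constant name is hypothetical.
MIN_THRESHOLD_SECONDS = 3 * 60 * 60


def write_heartbeat(path, now=None):
    """Write the heartbeat atomically: temp file in the same directory, fsync, rename."""
    now = time.time() if now is None else now
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".heartbeat-")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump({"last_heartbeat": now}, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # A crash before this point leaves the previous heartbeat file untouched.
        os.replace(tmp_path, path)
    except Exception:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_heartbeat(path):
    """Return the last heartbeat time, falling back to the file's mtime if the JSON is unusable."""
    try:
        with open(path) as f:
            return float(json.load(f)["last_heartbeat"])
    except (ValueError, KeyError, TypeError):
        return os.path.getmtime(path)


def may_start(path, gc_grace_seconds, now=None, minimum=MIN_THRESHOLD_SECONDS):
    """Allow startup unless the node was down longer than the effective threshold."""
    now = time.time() if now is None else now
    if not os.path.exists(path):
        return True  # no heartbeat yet, e.g. first start
    # Assumed rule: smallest gc_grace_seconds across tables, but never below the minimum.
    smallest = min(gc_grace_seconds) if gc_grace_seconds else minimum
    threshold = max(smallest, minimum)
    return (now - read_heartbeat(path)) <= threshold
```

With the floor in place, a table with gc_grace_seconds set to 0 no longer blocks every restart: the node only refuses to start after more than three hours of downtime.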
