You know what ... what about just leaving it as it is now?
cassandra_latest.yaml might be used by people evaluating this tech; they
might just want to run "the latest stuff fully optimized" out of the box.
I get that. But if we set a threshold like that and run that check by
default, it might do more harm than good, because people will spend time
figuring out what went wrong: they start the database after a week and it
doesn't boot? Huh? It is different for prod usage, where people are
intentionally buying into this feature, but I do not think that having
it on by default is actually a good idea.
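
To make the scenario concrete, here is a minimal sketch of the decision the check makes, assuming the effective threshold is the smallest gc_grace_seconds across tables, raised to a configurable minimum (the 3 hours mentioned further down the thread). The function and parameter names are hypothetical; this is not the actual Cassandra implementation.

```python
from datetime import timedelta

# Hypothetical default, taken from the 3-hour minimum proposed in this thread.
DEFAULT_MINIMUM = timedelta(hours=3)


def effective_threshold(gc_grace_seconds_by_table, minimum=DEFAULT_MINIMUM):
    """Smallest gc_grace_seconds across tables, but never below `minimum`."""
    smallest = min(gc_grace_seconds_by_table.values())
    return max(timedelta(seconds=smallest), minimum)


def may_start(downtime, gc_grace_seconds_by_table, minimum=DEFAULT_MINIMUM):
    """True if the node was down no longer than the effective threshold."""
    if not gc_grace_seconds_by_table:
        # Assumption: with no tables to protect there is nothing to resurrect.
        return True
    return downtime <= effective_threshold(gc_grace_seconds_by_table, minimum)
```

With a single table at gc_grace_seconds of 0, a node that was stopped for a week is refused under this sketch, while the same node with only 10-day tables would start. That is the "it doesn't boot after a week" surprise for someone who never opted in.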

It is great that this discussion resulted in CASSANDRA-21290 and
CASSANDRA-21246 being delivered, so the check is now both more robust and
configurable, but having it enabled by default ... I don't know.
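
For readers who have not followed the tickets, the robustness part refers to Chris's point quoted below: an atomic write of the heartbeat file plus a fallback to the file's timestamp when its content is corrupt. A rough sketch of that idea, assuming a JSON file with a single hypothetical field `last_heartbeat_ms` (the real file layout and code may differ):

```python
import json
import os
import tempfile


def write_heartbeat(path, now_ms):
    """Write to a temp file in the same directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".heartbeat-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"last_heartbeat_ms": now_ms}, f)
            f.flush()
            os.fsync(f.fileno())
        # os.replace swaps the file in one step, so a crash mid-write
        # leaves the previous heartbeat file untouched.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def read_heartbeat(path):
    """Return the last heartbeat in ms, falling back to the file's mtime."""
    try:
        with open(path) as f:
            return int(json.load(f)["last_heartbeat_ms"])
    except (ValueError, KeyError, TypeError):
        # Corrupt or truncated content: use the modification time instead
        # of refusing to start.
        return int(os.path.getmtime(path) * 1000)
```

A missing file still raises here; how to treat a node with no heartbeat at all is a separate decision.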

On Tue, Apr 21, 2026 at 6:59 PM Isaac Reath <[email protected]> wrote:
>
> Thank you to everyone who contributed to CASSANDRA-21293 and CASSANDRA-21290 
> to incorporate Chris's feedback and improve the reliability of this feature!
>
> Coming back to defaults, Chris mentions using a default minimum value of 3 
> hours as the amount of time a node is allowed to be down if any tables in the 
> cluster have a very small or 0 gc_grace_seconds. I think this makes sense as 
> a default value for the cassandra_latest.yaml implementation. Are there any 
> concerns with using 3 hours or suggestions for an alternative value?
>
> On Wed, Mar 25, 2026 at 12:42 PM Isaac Reath <[email protected]> wrote:
>>
>> Happy to add in the docs into the PR for CASSANDRA-21247 if there's nothing 
>> already available.
>>
>> On Wed, Mar 25, 2026 at 12:36 PM Štefan Miklošovič <[email protected]> 
>> wrote:
>>>
>>> Hi Chris,
>>>
>>> If you have some time to put a patch together with these improvements
>>> that would be great. I can definitely review.
>>>
>>> Regards
>>>
>>> On Wed, Mar 25, 2026 at 5:24 PM Chris Lohfink <[email protected]> wrote:
>>> >
>>> > We enabled this across our fleet. We did make a couple of small tweaks we 
>>> > might want to consider:
>>> > 1. (important one) If the process shuts down mid-write, you can end up 
>>> > with a corrupt JSON hint file, and then the process refuses to start up. 
>>> > We added a fallback to the timestamp of the file and an atomic write.
>>> > 2. We made it a minimum of 3 hours, because we have a lot of things in 
>>> > the fleet that are set to 0 (or very short) gc_grace and that we don't 
>>> > care about. There should probably be a setting for a minimum threshold; 
>>> > otherwise operators can't really do anything other than delete the 
>>> > heartbeat after every restart.
>>> > 3. Add some documentation on how to evaluate and delete the heartbeat if 
>>> > it's blocking startup.
>>> >
>>> > On Wed, Mar 25, 2026 at 10:17 AM Štefan Miklošovič 
>>> > <[email protected]> wrote:
>>> >>
>>> >> Hi Isaac,
>>> >>
>>> >> I am fine with having that property set to true in cassandra_latest.yaml 
>>> >> only.
>>> >>
>>> >> Regards
>>> >>
>>> >> On Tue, Mar 24, 2026 at 10:05 PM Isaac Reath <[email protected]> 
>>> >> wrote:
>>> >> >
>>> >> > Hi all,
>>> >> >
>>> >> > There’s ongoing interest in preventing nodes from starting after being 
>>> >> > offline longer than gc_grace_seconds, to avoid data resurrection 
>>> >> > issues.
>>> >> >
>>> >> > This is already supported via `check_data_resurrection.enabled` (added 
>>> >> > in 4.1 via CASSANDRA-17180), but it remains disabled by default. 
>>> >> > Recent discussion in CASSANDRA-21221 suggests that operators may be 
>>> >> > unaware of this setting and end up reimplementing similar safeguards 
>>> >> > themselves.
>>> >> >
>>> >> > Given that this feature has now been available in 4.1 and 5.0, I'd 
>>> >> > like to propose enabling it by default in cassandra_latest.yaml for 
>>> >> > 6.0. Are there any concerns with making this change?
>>> >> >
>>> >> > Isaac
