The hypothetical concern described is around potential data resurrection -
would you still use resumable bootstrap if you knew that data deleted
during those STW pauses was improperly resurrected?

On Wed, Aug 3, 2022 at 2:40 PM Bowen Song via dev <dev@cassandra.apache.org>
wrote:

> I have benefited from the resumable bootstrap before, and I'm in favour of
> keeping the feature around.
>
> I've had streaming failures due to long STW GC pauses on some
> bootstrapping nodes, and I had to resume the bootstrap once or twice in
> order to get these nodes to finish joining the cluster. They have not
> experienced any more long STW GC pauses since they joined the cluster. I
> would imagine I will spend a lot of time tuning the GC parameters in order
> to get these nodes to join if the resumable bootstrapping feature is removed.
> Also, I'm not concerned about race conditions involving repairs, because
> we don't run repairs while we are adding new nodes (to minimize the
> additional load on the cluster).
>
>
> On 03/08/2022 19:46, Josh McKenzie wrote:
>
> Context: https://issues.apache.org/jira/browse/CASSANDRA-17679
>
> From the .yaml comment on the param I was working on adding:
>
> In certain environments, operators may want to disable resumable bootstrap in 
> order to avoid potential correctness violations or data loss scenarios. 
> Largely this centers around nodes going down during bootstrap, tombstones 
> being written, and potential races with repair. By default we leave this on 
> as it's been enabled for quite some time, however the option to disable it is 
> more palatable now that we have zero copy streaming as that greatly 
> accelerates
>
>
> Given zero copy streaming in the system and the general unexplored
> correctness concerns of
> https://issues.apache.org/jira/browse/CASSANDRA-8838, specifically
> pointed out by Jeff here:
> https://issues.apache.org/jira/browse/CASSANDRA-8838?focusedCommentId=16900234&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16900234,
> I've been chatting w/Paulo about this and we've both concluded we think the
> functionality should be made configurable, default off (?), deprecated in
> 4.2 and then completely removed next.
>
> - First: anyone have any concerns with the general arc of "remove
> resumable bootstrap and decommission"?
> - Second: Should we leave them enabled by default in 4.2 or disabled?
> - Third: Should we consider revisiting older branches with this
> functionality and making it toggle-able?
>
> ~Josh
>
>
