[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793789#comment-17793789 ]

Brandon Williams commented on CASSANDRA-16418:
----------------------------------------------

bq. Feel free to create a new ticket to add it back or piggyback in some other ticket, I'd be glad to review.

That would be CASSANDRA-18824.

> Unsafe to run nodetool cleanup during bootstrap or decommission
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-16418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Bootstrap and Decommission
>            Reporter: James Baker
>            Assignee: Lindsey Zurovchak
>            Priority: Normal
>             Fix For: 4.0.8, 4.1.1, 5.0-alpha1, 5.0
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> What we expected: Running a cleanup is a safe operation; the result of
> running a query after a cleanup should be the same as the result of running a
> query before a cleanup.
> What actually happened: We ran a cleanup during a decommission. All the
> streamed data was silently deleted, the bootstrap did not fail, the cluster's
> data after the decommission was very different to the state before.
> Why: Cleanups do not take into account pending ranges and so the cleanup
> thought that all the data that had just been streamed was redundant and so
> deleted it. We think that this is symmetric with bootstraps, though have not
> verified.
> Not sure if this is technically a bug but it was very surprising (and
> seemingly undocumented) behaviour.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
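The failure mode in the issue description (cleanup ignores pending ranges, so freshly streamed data looks redundant) can be modeled with a toy example. This is a minimal sketch, not Cassandra code: tokens are plain longs, ranges are (left, right] without wrap-around, and the names `CleanupSketch`, `Range`, `coveredBy` and `cleanup` are hypothetical. It only shows that a cleanup which keeps the currently owned ranges discards a row that lies in a range still being streamed to the node.

```java
import java.util.ArrayList;
import java.util.List;

public final class CleanupSketch {

    /** Hypothetical token range (left, right]; wrap-around is ignored for simplicity. */
    record Range(long left, long right) {
        boolean contains(long token) {
            return token > left && token <= right;
        }
    }

    /** True if any of the given ranges covers the token. */
    static boolean coveredBy(long token, List<Range> ranges) {
        for (Range r : ranges) {
            if (r.contains(token)) return true;
        }
        return false;
    }

    /** Keeps only tokens covered by rangesToKeep; everything else is "cleaned up". */
    static List<Long> cleanup(List<Long> localTokens, List<Range> rangesToKeep) {
        List<Long> kept = new ArrayList<>();
        for (long token : localTokens) {
            if (coveredBy(token, rangesToKeep)) kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Range> owned = List.of(new Range(0, 100));
        List<Range> pending = List.of(new Range(100, 200)); // range being streamed to this node
        List<Long> local = List.of(50L, 150L);              // token 150 arrived via streaming

        // Cleanup that only considers owned ranges, i.e. ignores pending ones.
        System.out.println(cleanup(local, owned));          // prints [50]
        // The dropped token belonged to a range this node is about to own.
        System.out.println(coveredBy(150L, pending));       // prints true
    }
}
```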
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793739#comment-17793739 ]

Paulo Motta commented on CASSANDRA-16418:
-----------------------------------------

bq. However, from the API POV, CompactionManager.performCleanup can now be called at any time. I think that was an important precondition for that method. Wouldn't it be good to keep it there, just changing the condition to check pending ranges rather than joining status?

Good point, this was overlooked during review. I suggested removing it just as a cleanup, but looking back I think there is value in keeping it for safety in case this API is used elsewhere. Feel free to create a new ticket to add it back or piggyback in some other ticket, I'd be glad to review.

To me, it would be nice if the CompactionManager API were a dumb local API, unaware of token ranges and membership status, since it's just a local operation. In practice, though, these concerns are mixed across the codebase, so developers expect any local API to be safe from a distributed standpoint.
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793600#comment-17793600 ]

Jacek Lewandowski commented on CASSANDRA-16418:
-----------------------------------------------

bq. I think that check was deemed unnecessary after a new check was added to StorageService.forceKeyspaceCleanup to prevent starting cleanup when there are pending ranges (i.e. when a node is joining).

[~paulo], it looks OK from the user's point of view. However, from the API POV, {{CompactionManager.performCleanup}} can now be called at any time. I think that was an important precondition for that method. Wouldn't it be good to keep it there, just changing the condition to check pending ranges rather than joining status?
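Jacek's suggestion amounts to checking for pending ranges at two layers: at the user-facing entry point, and again as a precondition of the lower-level cleanup call, so that code which bypasses the entry point is still protected. The following is a hedged sketch of that idea only. `CleanupGuardSketch`, `startCleanup`, `runCleanup` and the `hasPendingRanges` predicate are hypothetical stand-ins, not the real `StorageService` or `CompactionManager` signatures, and the entry point reports a refusal as a string purely for illustration.

```java
import java.util.Map;
import java.util.function.Predicate;

public final class CleanupGuardSketch {

    /** Hypothetical stand-in for "does this node have pending ranges for the keyspace?". */
    private final Predicate<String> hasPendingRanges;

    CleanupGuardSketch(Predicate<String> hasPendingRanges) {
        this.hasPendingRanges = hasPendingRanges;
    }

    /** Entry-point check, similar in spirit to the one described for forceKeyspaceCleanup. */
    String startCleanup(String keyspace) {
        if (hasPendingRanges.test(keyspace)) {
            return "refused: node has pending ranges for " + keyspace;
        }
        return runCleanup(keyspace);
    }

    /**
     * Lower-level precondition, in the spirit of keeping a check inside performCleanup
     * itself, keyed on pending ranges rather than on joining status.
     */
    String runCleanup(String keyspace) {
        if (hasPendingRanges.test(keyspace)) {
            throw new IllegalStateException("Cannot run cleanup with pending ranges for " + keyspace);
        }
        return "cleaned " + keyspace;
    }

    public static void main(String[] args) {
        Map<String, Boolean> pending = Map.of("ks1", true, "ks2", false);
        CleanupGuardSketch guard = new CleanupGuardSketch(ks -> pending.getOrDefault(ks, false));

        System.out.println(guard.startCleanup("ks1")); // refused at the entry point
        System.out.println(guard.startCleanup("ks2")); // cleaned ks2
        try {
            guard.runCleanup("ks1");                    // a direct call is still guarded
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the check in both places costs one extra lookup per cleanup, in exchange for not relying on every caller to go through the entry point.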
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792967#comment-17792967 ]

Jacek Lewandowski commented on CASSANDRA-16418:
-----------------------------------------------

Thanks [~samt]
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792966#comment-17792966 ]

Sam Tunnicliffe commented on CASSANDRA-16418:
---------------------------------------------

The TCM patch removed this check from {{forceKeyspaceCleanup}} and replaced it with more granular and consistent checks as cleanup is run for each CFS.

With TCM there isn't the same concept of pending ranges. Instead, replica sets for reads and writes are independent and modified separately during range movements. If a node will acquire a range as part of a range movement, it is added as a write replica at the start of the operation, before any streaming takes place. There's a [walkthrough of this|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-MappingClusterOperationstoMetadataTransitions(Events)] in the CEP doc.

Owned ranges for cleanup of a CFS are computed when the cleanup task is submitted, so if any range movement has started by that point, the ranges involved would already be known to cleanup.

What we don't have, and which the original check in {{forceKeyspaceCleanup}} also did not guard against, is a range movement which starts in the window between grabbing the owned ranges [here|https://github.com/apache/cassandra/blob/ae0842372ff6dd1437d026f82968a3749f555ff4/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L634-L642] and selecting the sstables for the cleanup task [here|https://github.com/apache/cassandra/blob/ae0842372ff6dd1437d026f82968a3749f555ff4/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L656]. The safest way to protect against this is probably to simply cancel any running cleanup tasks when the set of local ranges is modified, in {{CFS::invalidateLocalRanges}}.
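Sam's proposal, cancelling in-flight cleanup when the local ranges change, can be sketched with a generation counter that a cleanup task captures when it is planned and re-checks before acting. The generation counter is our own choice of mechanism, not necessarily how Cassandra would implement it, and `CleanupCancellationSketch`, `submitCleanup`, `stillValid` and `invalidateLocalRanges(List)` are hypothetical names; only `CFS::invalidateLocalRanges` comes from the comment. A single check still leaves a small window before the sstables are rewritten, so a real change would presumably also need the running compaction to observe the cancellation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

public final class CleanupCancellationSketch {

    private final AtomicReference<List<String>> localRanges;
    private final AtomicLong generation = new AtomicLong();

    CleanupCancellationSketch(List<String> initialRanges) {
        this.localRanges = new AtomicReference<>(initialRanges);
    }

    /** Hypothetical cleanup task: remembers the ranges and generation it was planned against. */
    final class CleanupTask {
        final long plannedGeneration;
        final List<String> ownedRanges;

        private CleanupTask(long plannedGeneration, List<String> ownedRanges) {
            this.plannedGeneration = plannedGeneration;
            this.ownedRanges = ownedRanges;
        }

        /** True while no range change has been published since this task was planned. */
        boolean stillValid() {
            return generation.get() == plannedGeneration;
        }
    }

    /**
     * Reads the generation before the ranges: because invalidateLocalRanges publishes the
     * new ranges before bumping the generation, a task never pairs a newer generation
     * with stale ranges.
     */
    CleanupTask submitCleanup() {
        long gen = generation.get();
        return new CleanupTask(gen, localRanges.get());
    }

    /** Hypothetical analogue of CFS::invalidateLocalRanges: publish new ranges, then bump. */
    void invalidateLocalRanges(List<String> newRanges) {
        localRanges.set(newRanges);
        generation.incrementAndGet();
    }

    public static void main(String[] args) {
        CleanupCancellationSketch cfs = new CleanupCancellationSketch(List.of("(0,100]"));
        CleanupTask task = cfs.submitCleanup();                     // owned ranges captured here
        System.out.println(task.stillValid());                      // prints true
        cfs.invalidateLocalRanges(List.of("(0,100]", "(100,200]")); // a range movement starts
        System.out.println(task.stillValid());                      // prints false: abandon cleanup
    }
}
```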
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792862#comment-17792862 ]

Brandon Williams commented on CASSANDRA-16418:
----------------------------------------------

Trunk does have [this|https://github.com/apache/cassandra/blame/trunk/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L624-L628] check though, which was added by TCM. It's not clear to me either why this disparity exists.
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792847#comment-17792847 ]

Paulo Motta commented on CASSANDRA-16418:
-----------------------------------------

{quote}Why was that check in CompactionManager removed? Was it needed to make the tests run? I'm afraid the check could have been legitimate for production use.
{quote}
I think that check was deemed unnecessary after a new check was added to [StorageService.forceKeyspaceCleanup|https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/service/StorageService.java#L3907] to prevent starting cleanup when there are pending ranges (i.e. when a node is joining).

It's not clear to me why this latter check is not present in [trunk|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L2524], while it is present in 4.0 and 4.1.
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792799#comment-17792799 ]

Jacek Lewandowski commented on CASSANDRA-16418:
-----------------------------------------------

Why was that check in {{CompactionManager}} removed? Was it needed to make the tests run? I'm afraid the check could have been legitimate for production use.
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762444#comment-17762444 ]

Szymon Miezal commented on CASSANDRA-16418:
-------------------------------------------

[~brandon.williams] that's reasonable, thank you for the confirmation.
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762442#comment-17762442 ]

Brandon Williams commented on CASSANDRA-16418:
----------------------------------------------

Let's use a new ticket so the lineage can be tracked more clearly.
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762441#comment-17762441 ]

Szymon Miezal commented on CASSANDRA-16418:
-------------------------------------------

Thank you [~smiklosovic]. Now that I know the decision wasn't deliberate, I think it makes sense to prepare those patches. Should it be a separate ticket for the backporting work, or should we still use this one?
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762438#comment-17762438 ]

Stefan Miklosovic commented on CASSANDRA-16418:
-----------------------------------------------

Hey [~szymon.miezal], if you prepare a patch for that (and while you are at it, why not for 3.0 too, if the problem exists there as well?), I can commit it for you. We probably just thought that branches older than 4.0 were not worth the effort. I do not think there is any special reason behind that.
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762436#comment-17762436 ]

Szymon Miezal commented on CASSANDRA-16418:
-------------------------------------------

Given it appears to be a genuine problem on 3.11, is there any reason why it hasn't been merged/ported to that version?
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680021#comment-17680021 ]

Stefan Miklosovic commented on CASSANDRA-16418:
-----------------------------------------------

+1, thanks!
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679268#comment-17679268 ]

Paulo Motta commented on CASSANDRA-16418:
-----------------------------------------

Test failures look unrelated. For instance, [test_decommissioned_node_cant_rejoin|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2205/testReport/junit/dtest.topology_test/TestTopology/test_decommissioned_node_cant_rejoin/] failed on 4.0 and [test_simultaneous_bootstrap|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2207/testReport/junit/dtest-novnode.bootstrap_test/TestBootstrap/test_simultaneous_bootstrap/] failed on trunk, but neither seems related to this ticket. I ran these tests locally on the respective branches and they pass.

OK to merge, [~smiklosovic]?
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678437#comment-17678437 ]

Paulo Motta commented on CASSANDRA-16418:
-----------------------------------------

Resubmitted CI after [test fix|https://github.com/linzuro/cassandra/commit/8de9c73d28291d2df67727ffcb7292f8c21b3442]:

||branch||CI||
|[CASSANDRA-16418-4.0|https://github.com/pauloricardomg/cassandra/tree/CASSANDRA-16418-4.0]|[#2205|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2205/] (running)|
|[CASSANDRA-16418-4.1|https://github.com/pauloricardomg/cassandra/tree/CASSANDRA-16418-4.1]|[#2206|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2206/] (running)|
|[CASSANDRA-16418-trunk|https://github.com/pauloricardomg/cassandra/tree/CASSANDRA-16418-trunk]|[#2207|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2207/] (queued)|
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678335#comment-17678335 ] Paulo Motta commented on CASSANDRA-16418: -
There seems to be a legit failure on [CleanupTest|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2202/testReport/org.apache.cassandra.db/CleanupTest/testCleanup/]
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677987#comment-17677987 ] Paulo Motta commented on CASSANDRA-16418: -
Prepared [~linzuro]'s patch for commit on 4.0/4.1/trunk and submitted CI:
||branch||CI||
|[CASSANDRA-16418-4.0|https://github.com/pauloricardomg/cassandra/tree/CASSANDRA-16418-4.0]|[#2202|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2202/] (running)|
|[CASSANDRA-16418-4.1|https://github.com/pauloricardomg/cassandra/tree/CASSANDRA-16418-4.1]|[#2203|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2203/] (running)|
|[CASSANDRA-16418-trunk|https://github.com/pauloricardomg/cassandra/tree/CASSANDRA-16418-trunk]|[#2204|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2204/] (queued)|
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677879#comment-17677879 ] Stefan Miklosovic commented on CASSANDRA-16418: ---
Thanks [~linzuro] for taking care of that. There is one unused import that fails the build. Overall I am +1 once this builds successfully. [~paulo], would you mind taking the lead here and eventually merging it, please?
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17676706#comment-17676706 ] Stefan Miklosovic commented on CASSANDRA-16418: ---
I've commented on https://github.com/apache/cassandra/pull/2061/files. I am +1 on a successful build. The comments in the PR are just nits; I leave them to the author's discretion.
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675941#comment-17675941 ] Stefan Miklosovic commented on CASSANDRA-16418: ---
Could you please create a PR from your branch so we can comment on it?
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675740#comment-17675740 ] Paulo Motta commented on CASSANDRA-16418: -
I rebased and squashed Lindsey's commit [on this branch|https://github.com/pauloricardomg/cassandra/tree/CASSANDRA-16418], updated the tests [in this commit|https://github.com/pauloricardomg/cassandra/commit/702f77d247893a51461823268ad6a20cd6c1a021], and submitted CI on https://github.com/pauloricardomg/cassandra/tree/CASSANDRA-16418 (still queued). I think this is ready for a second round of review. [~JoshuaMcKenzie] [~stefan.miklosovic], would you have cycles to take a look?
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675737#comment-17675737 ] Paulo Motta commented on CASSANDRA-16418: -
To check whether the tests reliably reproduced the issue on Lindsey's [branch|https://github.com/apache/cassandra/pull/2061], I commented out the following excerpt:
{noformat}
InetAddressAndPort localAddress = FBUtilities.getBroadcastAddressAndPort();
Integer pendingRangesCount = tokenMetadata.getPendingRanges(keyspaceName, localAddress).size();
if (pendingRangesCount > 0)
{
    throw new RuntimeException("Node is involved in cluster membership changes. Not safe to run cleanup.");
}
{noformat}
I then expected both [testCleanupFailsDuringOngoingDecommission|https://github.com/apache/cassandra/pull/2061/files#diff-68d2cd75caa0e4091c7206717116594bdcb0aab38f72f6d6afa44eac60466e13R41] and [testCleanupFailsDuringOngoingBootstrap|https://github.com/apache/cassandra/pull/2061/files#diff-68d2cd75caa0e4091c7206717116594bdcb0aab38f72f6d6afa44eac60466e13R85] to fail. The tests failed most of the time, but they sometimes passed. In those runs, the data was not wrongly cleaned up even though the safeguard was disabled. This happens because these tests require cleanup to run after the SSTables have been transferred by streaming but before the ring membership operation finishes. There is a small chance that cleanup does not run within this window, in which case the issue does not reproduce. This is more likely on faster hardware.
I took a slightly different testing approach in [this commit|https://github.com/pauloricardomg/cassandra/blob/702f77d247893a51461823268ad6a20cd6c1a021/test/distributed/org/apache/cassandra/distributed/test/ring/CleanupFailureTest.java#L40]. It inserts data while a node is bootstrapping or decommissioning and checks that the data is still present after a cleanup is run. This reliably reproduced the issue when the excerpt above was commented out.
The updated test is more deterministic because it depends on neither streaming nor timing. It is also faster, since it does not need the large number of rows that the streaming approach requires to reproduce the issue. A nice benefit of this approach is that we only run cleanup once while the node is bootstrapping/decommissioning. This lets us [verify that the cleanup fails with the expected error message|https://github.com/pauloricardomg/cassandra/blob/702f77d247893a51461823268ad6a20cd6c1a021/test/distributed/org/apache/cassandra/distributed/test/ring/CleanupFailureTest.java#L105].
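A toy model may make the failure mode and the deterministic test idea above more concrete. This is not Cassandra code. `Range`, `Node` and `run()` are hypothetical names. Tokens are plain `long`s without wraparound, and "cleanup" simply drops rows outside the node's owned ranges. The scenario writes a row into a range the node is still gaining and runs cleanup, with and without a pending-ranges guard. The guard reuses the error message from the excerpt. Then it checks whether the row survived.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model for illustration only: Range, Node and run() are hypothetical and do
// not correspond to Cassandra's TokenMetadata or CompactionManager APIs.
final class PendingRangeCleanupSketch {
    // Half-open token range (left, right]; wraparound is ignored for simplicity.
    static final class Range {
        final long left, right;

        Range(long left, long right) { this.left = left; this.right = right; }

        boolean contains(long token) { return token > left && token <= right; }
    }

    static final class Node {
        final List<Range> owned = new ArrayList<>();
        final List<Range> pending = new ArrayList<>();      // ranges this node is gaining
        final TreeMap<Long, String> rows = new TreeMap<>(); // token -> value

        // While a range is pending, writes for it are also sent to this node.
        void write(long token, String value) { rows.put(token, value); }

        // Cleanup keeps only rows whose token falls in an owned range.
        // With guardPending, it refuses to run while any range is pending.
        void cleanup(boolean guardPending) {
            if (guardPending && !pending.isEmpty())
                throw new RuntimeException("Node is involved in cluster membership changes. Not safe to run cleanup.");
            rows.keySet().removeIf(token -> owned.stream().noneMatch(r -> r.contains(token)));
        }
    }

    // Write into a pending range, run cleanup, then report what happened to the row.
    static String run(boolean guardPending) {
        Node node = new Node();
        node.owned.add(new Range(0, 100));
        node.pending.add(new Range(100, 200)); // e.g. bootstrap, or a neighbour decommissioning
        node.write(50, "owned row");
        node.write(150, "row written while (100, 200] is pending");
        try {
            node.cleanup(guardPending);
        } catch (RuntimeException e) {
            return "refused: " + e.getMessage();
        }
        return node.rows.containsKey(150L) ? "kept" : "lost";
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // lost: cleanup dropped the pending-range row
        System.out.println(run(true));  // refused: Node is involved in cluster membership changes. ...
    }
}
```

Without the guard, the row in (100, 200] is lost even though the node is about to own that range. With the guard, cleanup is refused until the membership change completes and the range moves from `pending` to `owned`.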
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652521#comment-17652521 ] Paulo Motta commented on CASSANDRA-16418: -
Nice work [~linzuro]! The approach and test look mostly good to me; I added a few comments to the PR. Can you add a similar regression test for bootstrap? The test should fail when the bootstrap safeguard is removed. You can probably find some bootstrap dtest examples in {{org.apache.cassandra.distributed.test.ring.BootstrapTest}}. I have submitted a preliminary CI run for your branch on:
* https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2151/
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651377#comment-17651377 ] Lindsey Zurovchak commented on CASSANDRA-16418: ---
Created a [PR|https://github.com/apache/cassandra/pull/2061] for this and made the following changes:
* Added a check during cleanup to ensure the node has no pending ranges before proceeding.
* The bug did not exist for bootstrap because of an existing safety check. However, that check sat one level below the other safeguard checks, so I moved it to the same location.
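The change described above keeps the bootstrap safeguard next to the new pending-ranges check. One way to picture this is a single precondition method that runs before any cleanup work. This is a minimal sketch under assumptions: `NodeState`, `checkCleanupAllowed`, `refusal` and both messages are hypothetical. They are not taken from the patch or from Cassandra's `CompactionManager`.

```java
// Hypothetical sketch: NodeState, checkCleanupAllowed and the messages below are
// illustrative names, not Cassandra APIs. Every membership safeguard is evaluated
// together, before any table is cleaned.
final class CleanupPreconditionsSketch {
    static final class NodeState {
        final boolean joinedRing;    // false while the node is still bootstrapping
        final int pendingRangeCount; // ranges this node is about to gain

        NodeState(boolean joinedRing, int pendingRangeCount) {
            this.joinedRing = joinedRing;
            this.pendingRangeCount = pendingRangeCount;
        }
    }

    // Single entry-point check: throws before any cleanup work starts.
    static void checkCleanupAllowed(NodeState state) {
        if (!state.joinedRing)
            throw new IllegalStateException("Cleanup refused: node has not finished joining the ring.");
        if (state.pendingRangeCount > 0)
            throw new IllegalStateException("Cleanup refused: node has pending ranges.");
    }

    // Returns null when cleanup may proceed, otherwise the refusal message.
    static String refusal(NodeState state) {
        try {
            checkCleanupAllowed(state);
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(refusal(new NodeState(false, 3))); // bootstrapping node
        System.out.println(refusal(new NodeState(true, 2)));  // gaining ranges from a decommissioning neighbour
        System.out.println(refusal(new NodeState(true, 0)));  // stable node: prints null
    }
}
```

Grouping the checks keeps the safety decision in one place. A caller that reaches the cleanup entry point cannot skip the bootstrap check by entering one level lower.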