[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526105#comment-17526105 ]
Paulo Motta commented on CASSANDRA-17180:
-----------------------------------------

Thanks for addressing the initial comments. I finally found some time to look into this more deeply. Please find some follow-up comments below:

* I think safety checks should be enabled by default, as long as people can disable them easily. Should we make this startup check enabled by default? We could improve the error message shown when the check fails so that it mentions the properties to disable the check ({{startup_checks.check_data_resurrection.enabled=false}}) or to ignore specific keyspaces/tables ({{excluded_tables}}/{{excluded_keyspaces}}).
* I didn't like having check-specific logic in CassandraDaemon to schedule the heartbeat. I implemented this [suggestion|https://github.com/apache/cassandra/commit/0b3557dd43255538942a86f63dec4c36272f25e9] to move the check's post-action into the StartupCheck class - what do you think?
* Can we rename the {{GcGraceSecondsOnStartupCheck}} class to {{CheckDataResurrection}} to be consistent with the check name?
* Can we make the default heartbeat file be stored in the storage directory (i.e. {{DD.getLocalSystemKeyspacesDataFileLocations()}})? In some deployments the cassandra directory is non-writable.
* I don't like adding [custom logic|https://github.com/apache/cassandra/pull/1351/files#diff-f375982492d2426d26da68e105a44d397568be76361e8156fe299e875b8041ffR214] to read/write the heartbeat file, since this is error-prone and we're only interested in the timestamp value, not the file format. Can we just use [File.setLastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#setLastModified(long)] and [File.lastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#lastModified()] to read/write the heartbeat instead?
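
To illustrate the last point, here is a rough sketch of what a modification-time-based heartbeat could look like. This is not the code from the PR: the class name {{HeartbeatFileSketch}}, its method names and the table-to-gc-grace map are all hypothetical, and only {{File.createNewFile}}, {{File.setLastModified}} and {{File.lastModified}} are real JDK calls. Note that some filesystems only keep modification times at one-second granularity, which should be fine when comparing against gc_grace_seconds.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the heartbeat is stored as the file's modification time,
// so there is no file format to parse or write.
final class HeartbeatFileSketch
{
    // Record "the node was alive at nowMillis" by touching the heartbeat file.
    static void writeHeartbeat(File heartbeat, long nowMillis) throws IOException
    {
        heartbeat.createNewFile(); // returns false and does nothing if the file already exists
        if (!heartbeat.setLastModified(nowMillis))
            throw new IOException("Could not update heartbeat file " + heartbeat);
    }

    // Returns the names of tables whose gc_grace_seconds is shorter than the time
    // elapsed since the last heartbeat. The map of table name to gc_grace_seconds
    // is an assumed input; a real check would read it from the schema.
    static List<String> tablesPastGcGrace(File heartbeat, long nowMillis, Map<String, Integer> gcGraceSecondsByTable)
    {
        List<String> violations = new ArrayList<>();
        long lastHeartbeatMillis = heartbeat.lastModified(); // 0L if the file does not exist
        if (lastHeartbeatMillis == 0L)
            return violations; // no previous heartbeat, nothing to compare against

        long downSeconds = (nowMillis - lastHeartbeatMillis) / 1000;
        for (Map.Entry<String, Integer> entry : gcGraceSecondsByTable.entrySet())
        {
            if (downSeconds > entry.getValue())
                violations.add(entry.getKey());
        }
        return violations;
    }
}
```

A startup check built this way would fail the start when the returned list is non-empty (after filtering out anything listed in {{excluded_keyspaces}}/{{excluded_tables}}), and the periodic task would only need to call {{writeHeartbeat}}.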

> Implement startup check to prevent Cassandra start to spread zombie data
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17180
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17180
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Legacy/Observability
>            Reporter: Stefan Miklosovic
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>          Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> As already discussed on ML, it would be nice to have a service which would
> periodically write timestamp to a file signalling it is up / running.
> Then, on the startup, we would read this file and we would determine if there
> is some table which gc grace is behind this time and we would fail the start
> so we would prevent zombie data to be likely spread around a cluster.
> https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org