[jira] [Commented] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data

Stefan Miklosovic (Jira) Fri, 22 Apr 2022 07:10:05 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526452#comment-17526452
 ]


Stefan Miklosovic commented on CASSANDRA-17180:
-----------------------------------------------

[~paulo] thanks for looking into it, I'll deal with it over the weekend to
finally move this over the line.

I had implemented something similar to your postActions idea, but Brandon's 
opinion was that we are just inventing something else here. But I see you moved 
that "execute post actions" loop to run after all checks are verified in 
CassandraDaemon instead of having it in StartupChecks.verify directly. I am 
fine with your take on that, is Brandon too?

Good to know this is going to check system_distributed and system_auth too.

As for the default place of the heartbeat file, that's a good point. Maybe we 
should go a little bit wild here and save it to /tmp/? I think that is the 
location most likely to be writable. I do not like the fact that there is 
suddenly a file in the area for sstables / tables. Other existing software 
might have a problem with this. For example, when you are taking backups, would 
you need to exclude or include that file? It depends on how people look at 
these backups. For that reason I would place it somewhere else. But if we place 
it in /tmp and you have more than one node running on the same machine, there 
will be a clash, as two nodes would write to the same file _by default_. In 
that case we would have to make the file name unique, e.g. by including the 
node's ID. What is your take on this?
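
To illustrate the unique-name idea, here is a minimal sketch, not the patch 
itself. The class name HeartbeatPaths, the "cassandra-heartbeat-" prefix and 
the choice of the node's host ID as the unique part are all assumptions made 
for illustration only.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

// Hypothetical helper, not existing Cassandra code.
public class HeartbeatPaths {

    // Builds a per-node file name so that two nodes sharing a directory
    // never write to the same heartbeat file.
    public static Path forNode(Path baseDir, UUID hostId) {
        return baseDir.resolve("cassandra-heartbeat-" + hostId + ".txt");
    }

    // Default location under the JVM's temporary directory (assumption:
    // this is what "save it to /tmp" would map to on most systems).
    public static Path inTmp(UUID hostId) {
        return forNode(Paths.get(System.getProperty("java.io.tmpdir")), hostId);
    }

    public static void main(String[] args) {
        // Two nodes on the same machine get two distinct files.
        System.out.println(inTmp(UUID.randomUUID()));
        System.out.println(inTmp(UUID.randomUUID()));
    }
}
```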

Yes, we can rename that class.

I do not mind starting to write JSON into that file, but how do you want to 
parse it? I still need to read it, check it and so on. What would you like to 
replace all that logic with?
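
To make the question concrete, here is a rough sketch of what reading and 
writing a JSON heartbeat could look like. The class name HeartbeatFile and the 
field name "last_heartbeat" are hypothetical. A regular expression stands in 
for a real JSON parser only to keep the sketch dependency-free; an actual 
implementation would presumably use a proper JSON library instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not existing Cassandra code.
public class HeartbeatFile {

    // Matches a single numeric field, e.g. {"last_heartbeat": 1650636605000}.
    // The field name is an assumption made for this example.
    private static final Pattern LAST_HEARTBEAT =
        Pattern.compile("\"last_heartbeat\"\\s*:\\s*(\\d{1,18})");

    public static String toJson(Instant heartbeat) {
        return "{\"last_heartbeat\": " + heartbeat.toEpochMilli() + "}";
    }

    // Returns empty when the field is missing, so the caller can decide
    // whether a missing heartbeat should fail the startup check or not.
    public static Optional<Instant> parse(String json) {
        Matcher m = LAST_HEARTBEAT.matcher(json);
        if (!m.find())
            return Optional.empty();
        return Optional.of(Instant.ofEpochMilli(Long.parseLong(m.group(1))));
    }

    public static void write(Path file, Instant heartbeat) throws IOException {
        Files.write(file, toJson(heartbeat).getBytes(StandardCharsets.UTF_8));
    }

    public static Optional<Instant> read(Path file) throws IOException {
        if (!Files.exists(file))
            return Optional.empty();
        return parse(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("heartbeat", ".json");
        write(file, Instant.now());
        System.out.println(read(file));
        Files.delete(file);
    }
}
```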

 

> Implement startup check to prevent Cassandra start to spread zombie data
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17180
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17180
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Legacy/Observability
>            Reporter: Stefan Miklosovic
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>          Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> As already discussed on the ML, it would be nice to have a service which 
> would periodically write a timestamp to a file, signalling it is up / running.
> Then, on startup, we would read this file and determine whether there is any 
> table whose gc grace period has already elapsed since that time; if so, we 
> would fail the start to prevent zombie data from being spread around the 
> cluster.
> https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw
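
The check described in the issue summary above can be sketched roughly as 
follows. This is an illustration under stated assumptions, not the actual 
patch: the class GcGraceCheck, its method name and the map of table names to 
gc_grace_seconds are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not existing Cassandra code.
public class GcGraceCheck {

    // Returns the tables whose gc grace period is shorter than the time
    // elapsed since the last recorded heartbeat. A non-empty result means
    // the node may hold data whose tombstones were already purged elsewhere.
    public static List<String> tablesPastGcGrace(Instant lastHeartbeat,
                                                 Instant now,
                                                 Map<String, Integer> gcGraceSecondsByTable) {
        Duration downtime = Duration.between(lastHeartbeat, now);
        List<String> offending = new ArrayList<>();
        for (Map.Entry<String, Integer> e : gcGraceSecondsByTable.entrySet()) {
            if (downtime.compareTo(Duration.ofSeconds(e.getValue())) > 0)
                offending.add(e.getKey());
        }
        return offending;
    }

    public static void main(String[] args) {
        Map<String, Integer> tables = new LinkedHashMap<>();
        tables.put("ks.events", 864000); // 10 days, in seconds

        Instant now = Instant.now();
        Instant lastHeartbeat = now.minus(Duration.ofDays(11));

        List<String> offending = tablesPastGcGrace(lastHeartbeat, now, tables);
        if (!offending.isEmpty())
            System.out.println("Refusing to start, gc grace exceeded for: " + offending);
    }
}
```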



