[jira] [Commented] (CASSANDRA-18111) Centralize all snapshot operations to SnapshotManager and cache snapshots

Stefan Miklosovic (Jira) Thu, 01 Aug 2024 02:35:04 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870143#comment-17870143
 ]


Stefan Miklosovic commented on CASSANDRA-18111:
-----------------------------------------------

[~paulo] Thanks for the review!

I know there are some bits and pieces which are still there in 
ColumnFamilyStore but I left them there because of backward compatibility. 
These methods are exposed in Mbeans so I just left them there. I think that the 
idea around the introduction of SnapshotManagerMbean makes sense and I dont 
have any problem with it but it would mean that the methods in StorageService 
etc would need to say and it would just call SnapshshotManager where all the 
stuff would be put with additional logic left in StorageService etc ... We can 
then deprecate the methods in StorageService and similar, sure.

for B) yes, there is a dependency here but I don't find it problematic too 
much. I moved the core snapshotting method into SnapshotManager too. That 
method is thread safe (or, at least, I have not identified any place where it 
is not) so moving it to SnapshotManager should be safe and we unload some logic 
from CFS as well. 

Additional comment on Keyspace-level snapshot atomicity

Not sure about this, maybe example would be helpful here? We can indeed iterate 
on this work in the future.

Snapshots with multiple data directories

I think that what the patch does makes a lot of sense where. I consider not 
having a manifest in each data dir quite an issue, border-line equaling that 
with a bug. I do not know why we have just one manifest per whole logical 
snapshot and it is not done per data dir already. Then fixing the "files" field 
in this regard is just a logical conclusion. I would keep this in place.

To elaborate on this "watching" feature more ... You know what ... what if we 
just didn't do anything at all? I consider periodic scanning of snapshots by 
some thread, e.g. every 5 minutes or so, inferior to just not doing anything at 
all. The reason is that manual removal of a snapshot is not something an 
operator would do regularly. This stuff is most probably just done by mistake 
or similar. I also do not think that listing of snapshots is so frequently-done 
operation that it justifies do run a background thread. When we do that, then 
if snapshots are listed every 3 days (for example), then the amount of 
unnecessary scanning of the disk if snapshots still exist is actually worse 
than us doing nothing. We would cumulatively end up with more disk IO which 
beats the whole purpose of doing this in-memory caching. 

On the other hand, I consider it important that we are able to "reload" all 
snapshots / refresh them. That indeed goes to the disk but it is an operation 
an operator consciously chose to do instead of checking that regularly by us.

Your last sentence ... if we do that, then if a file is removed from one of the 
directories, how would we detect that it is missing if it can be in a different 
directory? We could not validate "snapshot per data dir", we would need to 
validate the whole snapshot with all its data dirs at once. If this is 
acceptable then we can include all files in one manifest PLUS the manifest 
would be in each data dir.  

> Centralize all snapshot operations to SnapshotManager and cache snapshots
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18111
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18111
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Snapshots
>            Reporter: Paulo Motta
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Everytime {{nodetool listsnapshots}} is called, all data directories are 
> scanned to find snapshots, what is inefficient.
> For example, fetching the 
> {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric 
> can take half a second (CASSANDRA-13338).
> This improvement will also allow snapshots to be efficiently queried via 
> virtual tables (CASSANDRA-18102).
> In order to do this, we should:
> a) load all snapshots from disk during initialization
> b) keep a collection of snapshots on {{SnapshotManager}}
> c) update the snapshots collection anytime a new snapshot is taken or cleared
> d) detect when a snapshot is manually removed from disk.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-18111) Centralize all snapshot operations to SnapshotManager and cache snapshots

Reply via email to