[jira] [Updated] (CASSANDRA-18111) Centralize all snapshot operations to SnapshotManager and cache snapshots

Stefan Miklosovic (Jira) Thu, 20 Jun 2024 01:41:06 -0700


     [ 
https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Stefan Miklosovic updated CASSANDRA-18111:
------------------------------------------
    Status: Patch Available  (was: In Progress)

https://github.com/apache/cassandra/pull/3374 passes Java 17 pre-commit CI, so I 
think we can consider it reviewable.

I tried to take quite a holistic approach, so it may seem as if I restructured 
a lot of stuff. One not-so-obvious change concerns ephemeral snapshots. We were 
clearing them during the startup sequence, as part of the startup checks, and 
that logic did not use SnapshotManager: SnapshotManager existed only as a 
singleton in StorageService, and the startup checks run far too early for it 
to be available. That meant we were identifying snapshots "manually", by means 
other than SnapshotManager. I rewrote this so that SnapshotManager is a 
standalone "service" and the only source of truth for snapshot management 
globally. That also removed a lot of code from, e.g., Directories, which 
contained methods dealing with snapshots; those methods are gone now.

I will try to do some perf testing on this, but I can't commit to a timeline 
for that.

SnapshotManager is also integrated into every place where we deal with 
snapshots.

> Centralize all snapshot operations to SnapshotManager and cache snapshots
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18111
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18111
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Snapshots
>            Reporter: Paulo Motta
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Every time {{nodetool listsnapshots}} is called, all data directories are 
> scanned to find snapshots, which is inefficient.
> For example, fetching the 
> {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric 
> can take half a second (CASSANDRA-13338).
> This improvement will also allow snapshots to be efficiently queried via 
> virtual tables (CASSANDRA-18102).
> In order to do this, we should:
> a) load all snapshots from disk during initialization
> b) keep a collection of snapshots on {{SnapshotManager}}
> c) update the snapshots collection anytime a new snapshot is taken or cleared
> d) detect when a snapshot is manually removed from disk.
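Steps a) to d) could be sketched roughly as follows. This is a hypothetical Java illustration, not Cassandra's SnapshotManager: the class CachingSnapshotManager, its method names and the <dataDir>/<keyspace>/<table>/snapshots/<tag> layout are assumptions. Only construction scans the disk; listing is served from memory, and a manually deleted snapshot is detected lazily with a cheap per-entry existence check rather than a full rescan.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of steps a) to d); not Cassandra's real SnapshotManager.
// Assumed layout: <dataDir>/<keyspace>/<table>/snapshots/<tag>/
final class CachingSnapshotManager {
    // b) in-memory collection: "<keyspace>/<table>/<tag>" -> snapshot directory
    private final Map<String, Path> snapshots = new ConcurrentHashMap<>();

    // a) the only full scan of the data directories happens at initialization
    CachingSnapshotManager(List<Path> dataDirs) throws IOException {
        for (Path dataDir : dataDirs) {
            if (!Files.isDirectory(dataDir)) continue;
            try (Stream<Path> paths = Files.walk(dataDir, 4)) {
                paths.filter(Files::isDirectory)
                     .filter(p -> dataDir.relativize(p).getNameCount() == 4
                                  && p.getParent().getFileName().toString().equals("snapshots"))
                     .forEach(p -> snapshots.put(keyOf(p), p));
            }
        }
    }

    private static String key(String keyspace, String table, String tag) {
        return keyspace + "/" + table + "/" + tag;
    }

    // Derives the key from .../<keyspace>/<table>/snapshots/<tag>
    private static String keyOf(Path snapshotDir) {
        Path tableDir = snapshotDir.getParent().getParent();
        return key(tableDir.getParent().getFileName().toString(),
                   tableDir.getFileName().toString(),
                   snapshotDir.getFileName().toString());
    }

    // c) taking a snapshot updates the collection directly; creating the
    //    directory stands in for hard-linking SSTables into it
    void takeSnapshot(Path dataDir, String keyspace, String table, String tag) throws IOException {
        Path dir = Files.createDirectories(
                dataDir.resolve(keyspace).resolve(table).resolve("snapshots").resolve(tag));
        snapshots.put(key(keyspace, table, tag), dir);
    }

    // c) clearing removes the cache entry and the files together
    void clearSnapshot(String keyspace, String table, String tag) throws IOException {
        Path dir = snapshots.remove(key(keyspace, table, tag));
        if (dir != null && Files.exists(dir)) deleteRecursively(dir);
    }

    // Listing is served from memory. d) entries whose directory was removed by
    // hand are pruned with one existence check each, not a full rescan.
    SortedSet<String> listSnapshots() {
        snapshots.entrySet().removeIf(e -> !Files.isDirectory(e.getValue()));
        return new TreeSet<>(snapshots.keySet());
    }

    private static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // reverse order visits children before their parent directory
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) Files.delete(p);
    }
}
```

An alternative for d) would be a filesystem watcher (java.nio.file.WatchService); the lazy check keeps the sketch simple, at the cost of noticing a manual removal only the next time snapshots are listed.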



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
