[jira] [Commented] (CASSANDRA-18111) Cache snapshots in memory

Stefan Miklosovic (Jira) Tue, 18 Jun 2024 12:01:35 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856038#comment-17856038
 ]


Stefan Miklosovic commented on CASSANDRA-18111:
-----------------------------------------------

[~frankgh]

Well ... I think the name of this ticket is a little bit misleading. If you 
look at the goals [~paulo] outlined, all of them are already done and merged, 
except d), which I have just addressed here. So it is not going to be worse 
than it is now; I am just closing the gaps. The gap is that if somebody removes 
a snapshot by hand from disk (rm -rf data/ks/tb/snapshots), then, as of now, it 
would still be hanging around in the listsnapshots output. I have fixed this 
under d).
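
To illustrate the gap described above, here is a minimal sketch of pruning 
cached entries whose directories were removed by hand. It is not the code in 
the patch: the class SnapshotCacheSketch, its methods, and the tag-to-directory 
map are hypothetical names chosen for illustration, and the check simply asks 
the filesystem whether each directory still exists.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; this is NOT Cassandra's SnapshotManager.
class SnapshotCacheSketch {
    // snapshot tag -> directory the snapshot was loaded from (assumed layout)
    private final Map<String, Path> snapshots = new ConcurrentHashMap<>();

    // Called when a snapshot is taken or loaded at startup.
    void add(String tag, Path dir) {
        snapshots.put(tag, dir);
    }

    // Drop every cached entry whose directory no longer exists on disk,
    // e.g. because an operator removed it with rm -rf.
    void pruneMissing() {
        snapshots.values().removeIf(dir -> !Files.isDirectory(dir));
    }

    // List the tags, reflecting manual removals first.
    Set<String> list() {
        pruneMissing();
        return new TreeSet<>(snapshots.keySet());
    }
}
```

Note that the existence check still touches the disk once per cached snapshot 
on every listing, so this sketch trades a full directory scan for a cheaper 
per-entry check rather than avoiding IO entirely.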

Then I dug deeper and realized that even if it is cached, we still go to the 
disk anyway because of the sizes, which we might pre-compute upon loading too. 

If you do not want to hold this in memory, that is something which is not 
implemented regardless of whether this patch is committed. As of now, we do 
hold it all in memory anyway (we just happen not to reflect manually removed 
snapshots in the nodetool output, and we still go to disk every time to get 
the sizes).

I am trying to figure out how to achieve what you want - you basically want to 
list the snapshots on demand every time, not holding them in memory but 
listing them all over again; do I understand that correctly? We could probably 
have strategies for listing the snapshots - either keep it all in memory or 
list it on demand - and you would just choose what you want. But the tradeoff 
would be that it would be more IO-intensive ...
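
The two strategies mentioned above could be sketched roughly as follows. This 
is only an illustration of the tradeoff, not a proposed design: the 
SnapshotLister interface, the ListingStrategiesSketch class, and the assumption 
that each snapshot is a direct subdirectory of a single snapshots directory are 
all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; names and layout are assumptions.
class ListingStrategiesSketch {
    interface SnapshotLister {
        List<String> list();
    }

    // Scan the directory and return the names of its subdirectories, sorted.
    static List<String> scan(Path snapshotsDir) {
        try (Stream<Path> children = Files.list(snapshotsDir)) {
            return children.filter(Files::isDirectory)
                           .map(p -> p.getFileName().toString())
                           .sorted()
                           .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // On-demand strategy: nothing is held in memory, every call hits the disk.
    static SnapshotLister onDemand(Path snapshotsDir) {
        return () -> scan(snapshotsDir);
    }

    // Cached strategy: scan once up front, then serve the in-memory copy.
    // Changes made on disk afterwards are not seen unless the cache is updated.
    static SnapshotLister cached(Path snapshotsDir) {
        List<String> loaded = scan(snapshotsDir);
        return () -> loaded;
    }
}
```

With the on-demand lister, a snapshot directory created or removed by hand 
shows up on the next call at the cost of a scan; with the cached lister, calls 
are cheap but the result goes stale until something refreshes it.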

> Cache snapshots in memory
> -------------------------
>
>                 Key: CASSANDRA-18111
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18111
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Snapshots
>            Reporter: Paulo Motta
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Every time {{nodetool listsnapshots}} is called, all data directories are 
> scanned to find snapshots, which is inefficient.
> For example, fetching the 
> {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric 
> can take half a second (CASSANDRA-13338).
> This improvement will also allow snapshots to be efficiently queried via 
> virtual tables (CASSANDRA-18102).
> In order to do this, we should:
> a) load all snapshots from disk during initialization
> b) keep a collection of snapshots on {{SnapshotManager}}
> c) update the snapshots collection anytime a new snapshot is taken or cleared
> d) detect when a snapshot is manually removed from disk.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
