[ https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856038#comment-17856038 ]
Stefan Miklosovic commented on CASSANDRA-18111:
-----------------------------------------------

[~frankgh] Well ... I think the name of this ticket is a little bit misleading. If you look at the goals [~paulo] outlined, all of them are already done and in, except d), which I just addressed here. So it is not going to be worse than it is. I am just closing the gaps.

The gap is that if somebody removes a snapshot by hand from disk (rm -rf data/ks/tb/snapshots), then, as of now, it still hangs around in the listsnapshots output. I have fixed this under d).

Then I dug deeper and realized that even if it is cached, we still go to the disk anyway because of the sizes, which we might pre-compute upon loading too.

If you do not want to hold this in memory, that is something which is not implemented regardless of whether this patch is committed or not. As of now, we do hold it all in memory anyway (we just happen not to reflect manually removed snapshots in the nodetool output, and we still go to disk every time to get the sizes).

I am trying to figure out how to achieve what you want. You basically want to list the snapshots on demand every time, not holding them in memory but listing them all over again; do I understand that correctly?

We could probably have strategies for how to list the snapshots: either keep it all in memory or list it on demand, and you would just choose what you want. But the tradeoff is that the on-demand strategy would be more io-intensive ...

> Cache snapshots in memory
> -------------------------
>
> Key: CASSANDRA-18111
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18111
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Snapshots
> Reporter: Paulo Motta
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 5.x
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Every time {{nodetool listsnapshots}} is called, all data directories are
> scanned to find snapshots, which is inefficient.
> For example, fetching the
> {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric
> can take half a second (CASSANDRA-13338).
> This improvement will also allow snapshots to be efficiently queried via
> virtual tables (CASSANDRA-18102).
> In order to do this, we should:
> a) load all snapshots from disk during initialization
> b) keep a collection of snapshots on {{SnapshotManager}}
> c) update the snapshots collection anytime a new snapshot is taken or cleared
> d) detect when a snapshot is manually removed from disk.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
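
For illustration only, goal d) from the ticket description (not reporting a snapshot that was removed from disk by hand) could be sketched roughly as below. This is not the code from the patch and not Cassandra's SnapshotManager: the class SnapshotCacheSketch and its methods add, clear and list are hypothetical names, and a snapshot is reduced to a tag plus one directory. The sketch assumes that checking whether a directory still exists is an acceptable cost at listing time.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Cassandra code: an in-memory snapshot index
// that forgets entries whose directory was removed from disk by hand.
public class SnapshotCacheSketch {
    // tag -> snapshot directory; filled at startup and when a snapshot is taken
    private final Map<String, Path> snapshots = new ConcurrentHashMap<>();

    // Called when a snapshot is loaded at startup or newly taken.
    public void add(String tag, Path dir) {
        snapshots.put(tag, dir);
    }

    // Called when a snapshot is cleared through the normal code path.
    public void clear(String tag) {
        snapshots.remove(tag);
    }

    // Returns the cached tags in sorted order, after dropping every entry
    // whose directory no longer exists (e.g. removed with rm -rf).
    public List<String> list() {
        snapshots.values().removeIf(dir -> !Files.isDirectory(dir));
        List<String> tags = new ArrayList<>(snapshots.keySet());
        Collections.sort(tags);
        return tags;
    }
}
```

Note that list() still touches the disk once per cached snapshot for the existence check. That is much less than scanning all data directories, but it is not free, and it does not address the separate point about snapshot sizes being read from disk on every call.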