[ https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856206#comment-17856206 ]
Stefan Miklosovic edited comment on CASSANDRA-18111 at 6/19/24 8:03 AM:
------------------------------------------------------------------------

I don't think that is necessary. 100 MiB for snapshot metadata? Do you have any idea how many snapshots that would be? I still need to measure it, but if we are going to cache TableSnapshot with three Strings, one UUID, one boolean and two Instants (at most, as expiresAt is almost always null), then one entry cannot occupy more than a few KiB, if even that.

If we say that this object takes 2 KiB in memory (hard to believe, I really need to measure it), then 100 MiB would be roughly 50k snapshots on the node (100 MiB / 2 KiB = 51,200). I think that if somebody holds 50k non-expiring snapshots on a node and lists them all the time, then their problems are way bigger than us trying to just hold them in memory.

Not to mention the actual implementation of that. How would you even cap it? Does that mean that if you were about to list _all snapshots_, you would not actually get them all? Which one would be evicted first if they are all loaded at practically the same time, so that in practice we cannot differentiate them by insertion time into the cache (oldest out, newest in)?

I think the course of action is that I will finish what I have "the normal way", and then we will measure the impact. [~frankgh] is also invited to test this. I think that once he tries the patch, it will improve the performance too. I still do not know what "issues" they had with it.

> Cache snapshots in memory
> -------------------------
>
>                 Key: CASSANDRA-18111
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18111
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Snapshots
>            Reporter: Paulo Motta
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Every time {{nodetool listsnapshots}} is called, all data directories are
> scanned to find snapshots, which is inefficient.
> For example, fetching the
> {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric
> can take half a second (CASSANDRA-13338).
> This improvement will also allow snapshots to be efficiently queried via
> virtual tables (CASSANDRA-18102).
> In order to do this, we should:
> a) load all snapshots from disk during initialization
> b) keep a collection of snapshots on {{SnapshotManager}}
> c) update the snapshots collection anytime a new snapshot is taken or cleared
> d) detect when a snapshot is manually removed from disk.
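
Steps a) to d) from the issue description could be sketched roughly as follows. This is only an illustration under stated assumptions: `SnapshotInfo` and `SnapshotRegistry` are hypothetical names, not Cassandra's actual `TableSnapshot` or `SnapshotManager`, and the on-disk existence check is passed in as a predicate rather than implemented. The sketch is deliberately uncapped, in line with the comment above that every snapshot should remain listable.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical stand-in for snapshot metadata; not Cassandra's TableSnapshot. */
final class SnapshotInfo {
    final String keyspace;
    final String table;
    final String tag;
    final Instant createdAt;

    SnapshotInfo(String keyspace, String table, String tag, Instant createdAt) {
        this.keyspace = keyspace;
        this.table = table;
        this.tag = tag;
        this.createdAt = createdAt;
    }

    /** Key used by the registry below, e.g. "ks/t1/backup". */
    String key() {
        return keyspace + '/' + table + '/' + tag;
    }
}

/** Hypothetical in-memory registry illustrating steps a) to d). */
final class SnapshotRegistry {
    // b) the collection of snapshots kept in memory, with no size cap
    private final Map<String, SnapshotInfo> snapshots = new ConcurrentHashMap<>();

    // a) populate the registry once from whatever the startup disk scan found
    void loadAll(Collection<SnapshotInfo> foundOnDisk) {
        for (SnapshotInfo s : foundOnDisk) {
            snapshots.put(s.key(), s);
        }
    }

    // c) keep the registry in sync when a snapshot is taken ...
    void onSnapshotTaken(SnapshotInfo s) {
        snapshots.put(s.key(), s);
    }

    // c) ... or cleared
    void onSnapshotCleared(String key) {
        snapshots.remove(key);
    }

    // d) drop entries whose files were removed manually; returns how many were dropped
    int reconcile(Predicate<SnapshotInfo> existsOnDisk) {
        int removed = 0;
        for (Map.Entry<String, SnapshotInfo> e : snapshots.entrySet()) {
            if (!existsOnDisk.test(e.getValue()) && snapshots.remove(e.getKey(), e.getValue())) {
                removed++;
            }
        }
        return removed;
    }

    // Listing is served from memory, without scanning data directories
    List<SnapshotInfo> list() {
        return new ArrayList<>(snapshots.values());
    }
}
```

In this sketch a listing never touches the disk; only `reconcile` does, and how often it should run (periodically, or lazily on listing) is left open here.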