[ https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856211#comment-17856211 ]
Stefan Miklosovic commented on CASSANDRA-18111:
-----------------------------------------------

{code}
// 'meter' is a field of the test class (presumably a jamm MemoryMeter); it is not shown here.
@Test
public void testTableMetadataSize()
{
    // Snapshot spanning two data directories
    TableSnapshot tableSnapshot = new TableSnapshot("my_keyspace-name",
                                                    "my_table_name_a_litt_bit_longer",
                                                    UUID.randomUUID(),
                                                    UUID.randomUUID().toString(), // tag
                                                    Instant.now(),
                                                    null,
                                                    Set.of(new File("/a/b/d/c/d/d/sdsd/sdsdsdsd/sdsd/sdsdsdsd/sds/ds/ds/dsd"),
                                                           new File("/a/b/d/c/d/d/sdsd/sdsdsdsd/sdsd/lklklks /sds/ds/ds/dsd")),
                                                    false);

    long size = meter.measure(tableSnapshot);
    long deepSize = meter.measureDeep(tableSnapshot);

    System.out.println(deepSize); // 768

    // Snapshot with a single data directory
    TableSnapshot tableSnapshot2 = new TableSnapshot("my_keyspace-name",
                                                     "my_table_name_a_litt_bit_longer",
                                                     UUID.randomUUID(),
                                                     UUID.randomUUID().toString(), // tag
                                                     Instant.now(),
                                                     null,
                                                     Set.of(new File("/a/b/d/c/d/d/sdsd/sdsdsdsd/sdsd/sdsdsdsd/sds/ds/ds/dsd")),
                                                     false);

    long deepSize2 = meter.measureDeep(tableSnapshot2);

    System.out.println(deepSize2); // 648

    // 328: a Set holding one File
    System.out.println(meter.measureDeep(Set.of(new File("/a/b/d/c/d/d/sdsd/sdsdsdsd/sdsd/sdsdsdsd/sds/ds/ds/dsd"))));

    // 120: a Set holding the same path as a String
    System.out.println(meter.measureDeep(Set.of("/a/b/d/c/d/d/sdsd/sdsdsdsd/sdsd/sdsdsdsd/sds/ds/ds/dsd")));
}
{code}

So with two data directories a snapshot entry takes around 768 bytes, and with one data directory 648 bytes. If we manage to store a Set of Strings instead of a Set of Files, that set shrinks to less than half (328 vs. 120 bytes above), so we can say we save about 200 bytes per data directory. Averaging things out, that would be around 500 bytes per snapshot entry.

If I count correctly, around 200,000 snapshots would fit into 100 MiB; 20k snapshots would take about 10 MiB and 10k snapshots about 5 MiB. I do not think we need to deal with this ...
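
To make the back-of-the-envelope numbers in the comment easy to re-check, here is a minimal sketch of the arithmetic. It assumes the rounded figure of 500 bytes per cached snapshot entry from the estimate above; the class and method names are hypothetical and not part of Cassandra.

```java
public class SnapshotCacheEstimate {
    // Assumption: average deep size of one cached snapshot entry,
    // taken from the rough estimate in the comment above.
    static final long BYTES_PER_SNAPSHOT = 500;
    static final long MIB = 1024L * 1024L;

    // How many snapshot entries fit into a heap budget given in MiB.
    static long snapshotsThatFit(long budgetMiB) {
        return budgetMiB * MIB / BYTES_PER_SNAPSHOT;
    }

    // How many MiB a given number of snapshot entries would occupy.
    static double mibNeeded(long snapshots) {
        return (double) (snapshots * BYTES_PER_SNAPSHOT) / MIB;
    }

    public static void main(String[] args) {
        System.out.println(snapshotsThatFit(100)); // 209715
        System.out.println(mibNeeded(20_000));
        System.out.println(mibNeeded(10_000));
    }
}
```

With these assumptions, 20k and 10k snapshots come out at roughly 9.5 MiB and 4.8 MiB, consistent with the rounded 10 MiB and 5 MiB quoted above.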
> Cache snapshots in memory
> -------------------------
>
>                 Key: CASSANDRA-18111
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18111
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Snapshots
>            Reporter: Paulo Motta
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Every time {{nodetool listsnapshots}} is called, all data directories are
> scanned to find snapshots, which is inefficient.
> For example, fetching the
> {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric
> can take half a second (CASSANDRA-13338).
> This improvement will also allow snapshots to be efficiently queried via
> virtual tables (CASSANDRA-18102).
> In order to do this, we should:
> a) load all snapshots from disk during initialization
> b) keep a collection of snapshots on {{SnapshotManager}}
> c) update the snapshots collection anytime a new snapshot is taken or cleared
> d) detect when a snapshot is manually removed from disk.
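
Steps (a) to (d) in the issue description could be sketched roughly as below. This is an illustration only, not the actual {{SnapshotManager}} code: the class name, the method names, and the simplified layout (one root directory whose subdirectories are snapshots named by tag) are all assumptions made for the example. Paths are kept as Strings rather than File objects, in line with the size measurements in the comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

public class SnapshotCacheSketch {
    // (b) in-memory collection: snapshot tag -> snapshot directory path (hypothetical model)
    private final ConcurrentHashMap<String, String> snapshots = new ConcurrentHashMap<>();

    // (a) scan the disk once at initialization instead of on every listing
    public void loadFromDisk(Path snapshotsRoot) throws IOException {
        if (!Files.isDirectory(snapshotsRoot)) {
            return;
        }
        try (Stream<Path> dirs = Files.list(snapshotsRoot)) {
            dirs.filter(Files::isDirectory)
                .forEach(d -> snapshots.put(d.getFileName().toString(), d.toString()));
        }
    }

    // (c) keep the collection in sync when a snapshot is taken or cleared
    public void onSnapshotTaken(String tag, Path dir) {
        snapshots.put(tag, dir.toString());
    }

    public void onSnapshotCleared(String tag) {
        snapshots.remove(tag);
    }

    // (d) drop entries whose directory was removed manually; returns how many were dropped
    public int evictMissing() {
        int before = snapshots.size();
        snapshots.values().removeIf(dir -> !Files.exists(Paths.get(dir)));
        return before - snapshots.size();
    }

    // Listing is served from memory, with no directory scan
    public Set<String> listTags() {
        return new TreeSet<>(snapshots.keySet());
    }
}
```

In this sketch {{evictMissing()}} would have to be called periodically or before listing; how the real implementation detects manual removal is not stated in the issue.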