[ 
https://issues.apache.org/jira/browse/CASSANDRA-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478310#comment-17478310
 ] 

Paulo Motta edited comment on CASSANDRA-17267 at 1/19/22, 2:11 AM:
-------------------------------------------------------------------

The snapshot true size is calculated by 
[Directories.getTrueAllocatedSizeIn|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/Directories.java#L960].

This method creates a 
[SSTableSizeSummer|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/Directories.java#L1054]
 using the snapshot folder as the list of files to be iterated/counted and the 
list of live sstables as the list of files to be skipped (toSkip set).

The 
[isAcceptable|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/Directories.java#L1064]
 method decides whether a snapshot file's size must be counted by checking that 
it is an sstable component and that it is not present in the "toSkip" set.

However, the snapshot files will never be present in the "toSkip" set, because 
that set holds the live sstable files rather than the snapshot files. As a 
result the snapshot file sizes are always counted, whether or not a 
"corresponding" live sstable is found.

I believe the original implementer's intent was to verify that the 
"corresponding" sstable file is present in the "toSkip" set, but the code 
doesn't reconstruct the original sstable file from the snapshot file before 
checking whether it is present in the set.
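
To illustrate the mismatch, here is a minimal sketch. It is not the actual 
Cassandra code: the class, the method names and the file name 
"md-1-big-Data.db" are hypothetical, and the "toSkip" set is modelled as a 
plain set of paths.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Set;

public class ToSkipMismatch {
    // Hypothetical stand-in for the membership check described above.
    static boolean isSkipped(Path file, Set<Path> toSkip) {
        return toSkip.contains(file);
    }

    // Maps <table_dir>/snapshots/<snapshot_name>/<file> back to <table_dir>/<file>.
    static Path liveCounterpart(Path snapshotFile) {
        Path tableDir = snapshotFile.getParent().getParent().getParent();
        return tableDir.resolve(snapshotFile.getFileName());
    }

    public static void main(String[] args) {
        Path live = Paths.get("data", "ks1", "tbl1", "md-1-big-Data.db");
        Path snap = Paths.get("data", "ks1", "tbl1", "snapshots", "test", "md-1-big-Data.db");
        Set<Path> toSkip = Collections.singleton(live);

        // The snapshot path itself is never in the set, so it is always counted.
        System.out.println(isSkipped(snap, toSkip));                  // false
        // Only the reconstructed live path matches an entry of the set.
        System.out.println(isSkipped(liveCounterpart(snap), toSkip)); // true
    }
}
```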

I created a [PR|https://github.com/apache/cassandra/pull/1408] with a 
reproduction and preliminary fix.

The reproduction can be found in 
[this test|https://github.com/apache/cassandra/pull/1408/files#diff-ef5be0b69d0440b76021282c4b24bad69770ef9419be260df2169f49921db377R346].

[The 
fix|https://github.com/apache/cassandra/pull/1408/files#diff-bb20d0c655884c2211213190ae4787ace619cdff4c0235f147db7dfbf1e7a869R1067]
 only counts the snapshot file size if the file is an sstable component *AND* 
if a corresponding live sstable component can *not* be found at 
"snapshot_dir/../../file_name" (since the snapshot file is located at 
<table_dir>/snapshots/<snapshot_name>/file).
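
The rule above can be sketched roughly as follows. This is a simplified 
illustration and not the code from the PR: the class and method names are 
hypothetical, the ".db" suffix test is only a stand-in for the real sstable 
component check, and the snapshot file is written as a regular copy rather 
than the hard link a real snapshot would use.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SnapshotTrueSize {
    // Hypothetical stand-in for the sstable component check.
    static boolean looksLikeSSTableComponent(Path file) {
        return file.getFileName().toString().endsWith(".db");
    }

    // Resolves snapshot_dir/../../file_name, i.e. <table_dir>/file_name.
    static Path liveCounterpart(Path snapshotFile) {
        Path snapshotDir = snapshotFile.getParent();
        Path tableDir = snapshotDir.getParent().getParent();
        return tableDir.resolve(snapshotFile.getFileName());
    }

    // Sums only the snapshot files whose live counterpart no longer exists.
    static long trueSize(Path snapshotDir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(snapshotDir)) {
            for (Path file : files) {
                if (looksLikeSSTableComponent(file) && !Files.exists(liveCounterpart(file))) {
                    total += Files.size(file);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path tableDir = Files.createTempDirectory("tbl1");
        Path snapshotDir = Files.createDirectories(tableDir.resolve("snapshots").resolve("test"));
        Path live = Files.write(tableDir.resolve("md-1-big-Data.db"), new byte[100]);
        Files.write(snapshotDir.resolve("md-1-big-Data.db"), new byte[100]);

        System.out.println(trueSize(snapshotDir)); // 0: live sstable still present
        Files.delete(live);
        System.out.println(trueSize(snapshotDir)); // 100: live sstable is gone
    }
}
```

Checking the file system directly avoids having to keep the "toSkip" set in 
sync with the snapshot paths, at the cost of one existence check per file.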

After the fix, the same snapshot from the ticket description is displayed as 
follows:
{noformat}
$ nodetool listsnapshots
Snapshot Details:
Snapshot name Keyspace name Column family name True size Size on disk
test          ks1           tbl1               0 bytes   5.69 KiB

Total TrueDiskSpaceUsed: 0 bytes
{noformat}


> Snapshot true size is miscalculated
> -----------------------------------
>
>                 Key: CASSANDRA-17267
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17267
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Snapshots
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>            Priority: Normal
>
> As far as I understand, the snapshot "size on disk" is the total size of the 
> snapshot, while the "true size" is the (size_on_disk - size_of_live_sstables).
> I created a snapshot on a 3.11 node without traffic and I expected the "true 
> size" to be 0KB since the original sstables were still present, but this 
> didn't seem to be the case:
> {noformat}
> $ nodetool listsnapshots
> Snapshot Details:
> Snapshot name Keyspace name Column family name True size Size on disk
> test          ks1           tbl1               4.86 KiB  5.69 KiB
> Total TrueDiskSpaceUsed: 4.86 KiB
> {noformat}


