[ 
https://issues.apache.org/jira/browse/CASSANDRA-15284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764383#comment-17764383
 ] 

Sebastian Marsching commented on CASSANDRA-15284:
-------------------------------------------------

I did a bit more digging and I think that this bug was originally introduced in 
Cassandra 2.1.0: before that release, {{uncompressedSize}} (which was called 
{{originalSize}} at the time) was only used for calculating compression 
statistics, but since 2.1.0 it has also been used when calling methods of the 
{{{}MetadataWriter{}}}, where it matters whether this number is correct.

As this bug is only triggered when using {{resetAndTruncate}} and this method 
is only used by {{SSTableRewriter}} (which is only used by {{{}Scrubber{}}}), 
it only appears when scrubbing SSTables. In addition, the rewind caused by 
{{resetAndTruncate}} has to be large enough for the wrong data size to extend 
beyond the last chunk.

Interestingly, I do not see a good reason why {{uncompressedSize}} is even 
needed: when this bug is not hit, it should always be the same as 
{{{}bufferOffset{}}}, the only exception being inside {{flushData}} between 
updating {{uncompressedSize}} and returning, because {{bufferOffset}} is only 
updated after {{flushData}} returns.

Therefore, the simplest and least invasive fix is to use {{bufferOffset}} 
instead of {{uncompressedSize}} in {{{}CompressedSequentialWriter.open{}}}.
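
For illustration, a minimal sketch of what that fix could look like (the exact 
body of {{CompressedSequentialWriter.open}} varies between versions, so treat 
this as an assumption about its shape rather than a patch):

{code:java}
// Sketch only: assumes open(...) currently falls back to uncompressedSize,
// which is not rolled back by resetAndTruncate. Using bufferOffset avoids
// reading the stale value.
public synchronized CompressionMetadata open(long overrideLength)
{
    if (overrideLength <= 0)
        overrideLength = bufferOffset; // was: uncompressedSize
    return metadataWriter.open(overrideLength, chunkOffset);
}
{code}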

However, {{uncompressedSize}} is also used when updating {{lastFlushOffset}} 
and for compression statistics. Luckily, it seems like {{lastFlushOffset}} is 
only used by {{{}BigTableWriter.IndexWriter{}}}, which never uses the 
{{{}CompressedSequentialWriter{}}}, so this shouldn’t cause any actual problems.

BTW, {{compressedSize}} has the same problem (not rolled back in 
{{{}resetAndTruncate{}}}), but as it is only used for compression statistics, 
it does not cause any issues.

The nicer, but more invasive fix would be removing {{compressedSize}} and 
{{uncompressedSize}} completely. As written earlier, uncompressedSize can 
easily be replaced with {{{}bufferOffset{}}}. {{compressedSize}} is very 
similar to {{{}chunkOffset{}}}, the only difference being that {{chunkOffset}} 
also counts the size occupied by the checksum for each chunk, while 
{{compressedSize}} does not. Therefore, I think that it would be okay to 
replace {{compressedSize}} with {{{}chunkOffset{}}}. It would mean that the 
compressed size used for statistics would be a bit larger, but it would also 
ensure that the statistics remain correct after {{resetAndTruncate}} has been 
called. If the old method of counting should be retained, we could 
{{chunkOffset - chunkCount * 4L}} for the compression statistics.
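
As a rough sketch of that alternative (a hypothetical helper, not the actual 
Cassandra code, using the field names discussed above as parameters):

{code:java}
// Hypothetical helper: illustrates how the old counting could be recovered
// from chunkOffset and chunkCount alone.
static double compressionRatio(long chunkOffset, long chunkCount, long bufferOffset)
{
    // chunkOffset also counts the 4-byte CRC32 checksum written after every
    // chunk, so subtracting chunkCount * 4L recovers the pure compressed
    // payload size that compressedSize used to track.
    long compressedPayloadSize = chunkOffset - chunkCount * 4L;
    // bufferOffset plays the role of uncompressedSize, as argued above.
    return (double) compressedPayloadSize / bufferOffset;
}
{code}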

Reproducing the bug is quite easy: Simply go to 
{{CompressedSequentialWriterTest.resetAndTruncate}} and insert the following 
two lines anywhere after {{{}writer.resetAndTruncate(pos){}}}:

{code:java}
CompressionMetadata metadata = ((CompressedSequentialWriter) writer).open(0);
metadata.chunkFor(metadata.dataLength - 1);
{code}

This causes the {{AssertionError}} when running the test. When the two lines 
are inserted before {{{}resetAndTruncate{}}} instead, there is no exception, 
which shows that the problem is caused by this method.

Unfortunately, I think that I found another bug while investigating this one: 
The {{CompressedSequentialWriter.crcMetadata}} object gets updated with the 
uncompressed data as it is written. This has the consequence that the checksum 
written to the “-Digest.crc32” file will be wrong when {{resetAndTruncate}} has 
been called. In order for the checksum to be correct, the state of the 
{{ChecksumWriter}} would have to be reset in {{{}resetAndTruncate{}}}, but this 
is not easy because the underlying {{CRC32}} class does not provide any 
constructor or method for initializing the internal state, so there is no way 
to reset it to a previous state.
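
To make that limitation concrete, here is a small, self-contained JDK-only 
example (nothing Cassandra-specific):

{code:java}
import java.util.zip.CRC32;

public class Crc32RewindLimitation
{
    public static void main(String[] args)
    {
        CRC32 crc = new CRC32();
        crc.update(new byte[] { 1, 2, 3 });
        long checkpoint = crc.getValue(); // we can read the running value...
        crc.update(new byte[] { 4, 5, 6 });
        // ...but reset() only returns to the initial state; there is no
        // setValue(checkpoint) or equivalent to rewind to the checkpoint,
        // which is what resetAndTruncate would need.
        crc.reset();
        System.out.println(Long.toHexString(checkpoint));
    }
}
{code}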

So, in order to fix this problem, we would either have to use our own 
implementation of the CRC-32 algorithm, or we would have to calculate the byte 
sequence that gets us back to a certain state (which is possible, but not a 
nice solution; see 
https://stackoverflow.com/questions/38981523/crc32-change-initial-value for a 
discussion).

Until this problem is fixed, scrubbing has to be considered broken.

 

> AssertionError while scrubbing sstable
> --------------------------------------
>
>                 Key: CASSANDRA-15284
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15284
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/Compression
>            Reporter: Gianluigi Tiesi
>            Priority: Normal
>             Fix For: 3.11.x, 4.0.x, 4.1.x
>
>         Attachments: assert-comp-meta.diff
>
>
> I've got a damaged data file but while trying to run scrub (online or 
> offline) I always get this
> error:
>  
> {code:java}
> -- StackTrace --
> java.lang.AssertionError
>         at org.apache.cassandra.io.compress.CompressionMetadata$Chunk.<init>(CompressionMetadata.java:474)
>         at org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:239)
>         at org.apache.cassandra.io.util.MmappedRegions.updateState(MmappedRegions.java:163)
>         at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:73)
>         at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:61)
>         at org.apache.cassandra.io.util.MmappedRegions.map(MmappedRegions.java:104)
>         at org.apache.cassandra.io.util.FileHandle$Builder.complete(FileHandle.java:362)
>         at org.apache.cassandra.io.util.FileHandle$Builder.complete(FileHandle.java:331)
>         at org.apache.cassandra.io.sstable.format.big.BigTableWriter.openFinal(BigTableWriter.java:336)
>         at org.apache.cassandra.io.sstable.format.big.BigTableWriter.openFinalEarly(BigTableWriter.java:318)
>         at org.apache.cassandra.io.sstable.SSTableRewriter.switchWriter(SSTableRewriter.java:322)
>         at org.apache.cassandra.io.sstable.SSTableRewriter.doPrepare(SSTableRewriter.java:370)
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173)
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish(Transactional.java:184)
>         at org.apache.cassandra.io.sstable.SSTableRewriter.finish(SSTableRewriter.java:357)
>         at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:291)
>         at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:1010)
>         at org.apache.cassandra.db.compaction.CompactionManager.access$200(CompactionManager.java:83)
>         at org.apache.cassandra.db.compaction.CompactionManager$3.execute(CompactionManager.java:391)
>         at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:312)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> At the moment I've moved the corrupted file away. If you need more info feel 
> free to ask
>   
> According to the source 
> [https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/io/compress/CompressionMetadata.java#L474]
> looks like the requested chunk length is <= 0


