[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869945#comment-15869945 ]
Branimir Lambov commented on CASSANDRA-10520:
---------------------------------------------

Rebased and updated the patch and triggered another round of testing.

bq. The micro benchmark looks different on my Linux machine

That's very similar to what I get with the page cache enabled. Is it possible you ran the benchmark without turning it off?

bq. When writing compressed chunks, the compressed buffer is sized to the max compression length. WDYT about just passing a buffer that's bounded to maxCompressedLength and handling the buffer-overflow exception to write it uncompressed?

This is a possibility, but since using exceptions on non-exceptional code paths is a frowned-upon practice, I am worried it could cause optimization headaches -- the JIT refusing to optimize, or doing the wrong thing, so that compression always takes longer than it should. At this point I don't want to risk that, but it is an option to explore if we get some free cycles later on to verify there are no performance issues in all relevant configurations.

bq. Just for clarification - is the following correct?

Yes, that is correct. {{<=}}/compressed is the typical path, hence placed first on the read side, and on the write path we have an {{if}} that is only triggered on the alternative. The latter could use a {{! <=}} pattern to make the subcondition identical, but that feels unnatural and more complex than necessary.

bq. Even if CRC checks are disabled...

Suggested patch included, thanks.

> Compressed writer and reader should support non-compressed data.
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-10520
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10520
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>              Labels: messaging-service-bump-required
>             Fix For: 4.x
>
>         Attachments: ReadWriteTestCompression.java
>
>
> Compressing uncompressible data, as done, for instance, to write SSTables during stress tests, results in chunks larger than 64k, which are a problem for the buffer pooling mechanisms employed by the {{CompressedRandomAccessReader}}. This results in non-negligible performance issues due to excessive memory allocation.
>
> To solve this problem and avoid decompression delays in the cases where it does not provide benefits, I think we should allow compressed files to store uncompressed chunks as an alternative to compressed data. Such a chunk could be written after compression returns a buffer larger than, for example, 90% of the input, and would not result in additional delays in writing. On reads it could be recognized by size (using a single global threshold constant in the compression metadata), and data could be directly transferred into the decompressed buffer, skipping the decompression step and ensuring a 64k buffer for compressed data always suffices.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
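The write/read threshold scheme described in the ticket can be sketched roughly as follows. This is a minimal illustration, not the actual Cassandra code: the names ({{ChunkCodec}}, {{encodeChunk}}, {{decodeChunk}}) are hypothetical, and {{java.util.zip}} stands in for Cassandra's own {{ICompressor}} implementations.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Rough sketch of the threshold idea from the ticket description.
// Hypothetical names; java.util.zip stands in for Cassandra's ICompressor.
public class ChunkCodec {
    // Store a chunk raw when compression saves less than ~10% of its size
    // (the "90% of the input" threshold suggested in the description).
    static int maxCompressedLength(int uncompressedLength) {
        return (int) (uncompressedLength * 0.9);
    }

    /** Write path: compress, but fall back to the raw bytes for incompressible data. */
    static byte[] encodeChunk(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length + 64]; // room for worst-case expansion
        int len = deflater.deflate(buf);
        deflater.end();
        // The write-path `if` fires only on the alternative (incompressible) case:
        if (len >= maxCompressedLength(input.length))
            return input;                  // store raw; the reader detects this by size
        return Arrays.copyOf(buf, len);
    }

    /** Read path: a stored length at or above the threshold marks a raw chunk. */
    static byte[] decodeChunk(byte[] stored, int uncompressedLength) throws DataFormatException {
        if (stored.length >= maxCompressedLength(uncompressedLength))
            return stored;                 // uncompressed chunk: skip decompression entirely
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        byte[] out = new byte[uncompressedLength];
        int n = inflater.inflate(out);
        inflater.end();
        return Arrays.copyOf(out, n);
    }
}
```

Compressible chunks round-trip through deflate/inflate as before; incompressible chunks are passed through untouched, which is what keeps the reader's buffer bounded by the chunk size.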
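The bounded-buffer alternative raised in the review (pass a buffer capped at {{maxCompressedLength}} and treat overflow as "store uncompressed") could look roughly like this. Again a hypothetical sketch with invented names ({{BoundedBufferWrite}}, {{Compressor}}); it exists only to show the catch-on-a-hot-path shape that the comment worries may pessimize JIT output.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Hypothetical sketch of the alternative discussed in the comment: hand the
// compressor a buffer capped at maxCompressedLength and treat overflow as
// "store this chunk uncompressed".
public class BoundedBufferWrite {
    interface Compressor {
        // Writes the compressed form of input into output; throws
        // BufferOverflowException if it does not fit.
        void compress(ByteBuffer input, ByteBuffer output);
    }

    static ByteBuffer encode(Compressor compressor, ByteBuffer input, int maxCompressedLength) {
        ByteBuffer bounded = ByteBuffer.allocate(maxCompressedLength);
        try {
            compressor.compress(input.duplicate(), bounded);
            bounded.flip();
            return bounded;               // fit under the threshold: keep compressed
        } catch (BufferOverflowException e) {
            // Exception on a non-exceptional path: this is the JIT concern
            // raised in the comment above.
            return input.duplicate();     // incompressible: store raw
        }
    }
}
```

A trivial pass-through compressor is enough to exercise both branches: with a cap smaller than the input the overflow path returns the raw bytes, otherwise the "compressed" copy is returned.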