[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861316#comment-15861316 ]
Robert Stupp commented on CASSANDRA-10520: ------------------------------------------ Generally, the whole patch LGTM. Great work! Some notes: * Even if CRC checks are disables, we always call {{ThreadLocalRandom.current().nextDouble()}}, which seems unnecessary. I've played around with that in [this commit|https://github.com/snazy/cassandra/commit/e30dcdef64ccdcff75043d50a50ecb95c26c9667]. Feel free to include it in your branch, if you think it's ok and fine to sneak this into this ticket. * When writing compressed chunks, the {{compressed}} buffer is sized to the max compression length. WDYT about just passing a buffer that's bounded to {{maxCompressedLength}} and handle the buffer-overflow-exception to write it uncompressed? My (unproven) hope is that compression can stop earlier - i.e. would not need to compress to more than {{maxCompressedLength}}. * Just for clarification - is the following correct? ** on write, if {{compressedDataLength > maxCompressedLength}}, data is written uncompressed, compressed otherwise ** on read, if {{chunkLength <= maxCompressedLength}} data is read compressed, uncompressed otherwise. * Nice minor catches btw - like the power-of-2 check and removal of boxed types The micro benchmark looks different on my Linux machine (also [added {{disk_access_mode=standard}}|https://github.com/snazy/cassandra/commit/87c528fa57c18900f5ad075fb564559ba9368944] for completeness). There are probably a lot of reasons why our JMH runs differ. Anyway, although my run shows not that big difference, it is still worth to do this as probably no-one wants to let data unnecessarily become bigger because of compression. {code} [java] Benchmark (accessMode) (compression) Mode Cnt Score Error Units [java] ReadWriteTestCompression.readFixed mmap {'enabled':false} avgt 15 5.549 ± 0.314 us/op [java] ReadWriteTestCompression.readFixed mmap {'class': 'LZ4Compressor', 'crc_check_chance': 0.0} avgt 15 5.732 ± 0.126 us/op [java] ReadWriteTestCompression.readFixed mmap {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0, 'crc_check_chance': 0.0} avgt 15 5.241 ± 0.108 us/op [java] ReadWriteTestCompression.readFixed mmap {'class': 'LZ4Compressor'} avgt 15 5.840 ± 0.388 us/op [java] ReadWriteTestCompression.readFixed mmap {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0} avgt 15 5.594 ± 0.085 us/op [java] ReadWriteTestCompression.readFixed mmap_index_only {'enabled':false} avgt 15 5.542 ± 0.128 us/op [java] ReadWriteTestCompression.readFixed mmap_index_only {'class': 'LZ4Compressor', 'crc_check_chance': 0.0} avgt 15 5.580 ± 0.027 us/op [java] ReadWriteTestCompression.readFixed mmap_index_only {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0, 'crc_check_chance': 0.0} avgt 15 5.431 ± 0.144 us/op [java] ReadWriteTestCompression.readFixed mmap_index_only {'class': 'LZ4Compressor'} avgt 15 5.132 ± 0.089 us/op [java] ReadWriteTestCompression.readFixed mmap_index_only {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0} avgt 15 5.531 ± 0.104 us/op [java] ReadWriteTestCompression.readFixed standard {'enabled':false} avgt 15 5.490 ± 0.016 us/op [java] ReadWriteTestCompression.readFixed standard {'class': 'LZ4Compressor', 'crc_check_chance': 0.0} avgt 15 5.479 ± 0.088 us/op [java] ReadWriteTestCompression.readFixed standard {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0, 'crc_check_chance': 0.0} avgt 15 5.480 ± 0.152 us/op [java] ReadWriteTestCompression.readFixed standard {'class': 'LZ4Compressor'} avgt 15 5.328 ± 0.346 us/op [java] ReadWriteTestCompression.readFixed standard {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0} avgt 15 5.552 ± 0.123 us/op [java] ReadWriteTestCompression.readRandom mmap {'enabled':false} avgt 15 6.142 ± 0.072 us/op [java] ReadWriteTestCompression.readRandom mmap {'class': 'LZ4Compressor', 'crc_check_chance': 0.0} avgt 15 6.121 ± 0.060 us/op [java] ReadWriteTestCompression.readRandom mmap {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0, 'crc_check_chance': 0.0} avgt 15 5.721 ± 0.095 us/op [java] ReadWriteTestCompression.readRandom mmap {'class': 'LZ4Compressor'} avgt 15 5.872 ± 0.089 us/op [java] ReadWriteTestCompression.readRandom mmap {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0} avgt 15 6.383 ± 0.118 us/op [java] ReadWriteTestCompression.readRandom mmap_index_only {'enabled':false} avgt 15 5.886 ± 0.117 us/op [java] ReadWriteTestCompression.readRandom mmap_index_only {'class': 'LZ4Compressor', 'crc_check_chance': 0.0} avgt 15 6.328 ± 0.111 us/op [java] ReadWriteTestCompression.readRandom mmap_index_only {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0, 'crc_check_chance': 0.0} avgt 15 5.779 ± 0.178 us/op [java] ReadWriteTestCompression.readRandom mmap_index_only {'class': 'LZ4Compressor'} avgt 15 5.945 ± 0.119 us/op [java] ReadWriteTestCompression.readRandom mmap_index_only {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0} avgt 15 5.779 ± 0.183 us/op [java] ReadWriteTestCompression.readRandom standard {'enabled':false} avgt 15 6.257 ± 0.049 us/op [java] ReadWriteTestCompression.readRandom standard {'class': 'LZ4Compressor', 'crc_check_chance': 0.0} avgt 15 5.755 ± 0.064 us/op [java] ReadWriteTestCompression.readRandom standard {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0, 'crc_check_chance': 0.0} avgt 15 5.864 ± 0.144 us/op [java] ReadWriteTestCompression.readRandom standard {'class': 'LZ4Compressor'} avgt 15 6.323 ± 0.087 us/op [java] ReadWriteTestCompression.readRandom standard {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0} avgt 15 5.858 ± 0.028 us/op {code} > Compressed writer and reader should support non-compressed data. > ---------------------------------------------------------------- > > Key: CASSANDRA-10520 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10520 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths > Reporter: Branimir Lambov > Assignee: Branimir Lambov > Labels: messaging-service-bump-required > Fix For: 4.x > > Attachments: ReadWriteTestCompression.java > > > Compressing uncompressible data, as done, for instance, to write SSTables > during stress-tests, results in chunks larger than 64k which are a problem > for the buffer pooling mechanisms employed by the > {{CompressedRandomAccessReader}}. This results in non-negligible performance > issues due to excessive memory allocation. > To solve this problem and avoid decompression delays in the cases where it > does not provide benefits, I think we should allow compressed files to store > uncompressed chunks as alternative to compressed data. Such a chunk could be > written after compression returns a buffer larger than, for example, 90% of > the input, and would not result in additional delays in writing. On reads it > could be recognized by size (using a single global threshold constant in the > compression metadata) and data could be directly transferred into the > decompressed buffer, skipping the decompression step and ensuring a 64k > buffer for compressed data always suffices. -- This message was sent by Atlassian JIRA (v6.3.15#6346)