[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869945#comment-15869945 ]
Branimir Lambov commented on CASSANDRA-10520:
---------------------------------------------

Rebased and updated the patch and triggered another round of testing.

bq. The micro benchmark looks different on my Linux machine

That's very similar to what I get with the page cache enabled. Is it possible you ran the benchmark without turning it off?

bq. When writing compressed chunks, the compressed buffer is sized to the max compression length. WDYT about just passing a buffer that's bounded to maxCompressedLength and handling the buffer-overflow exception to write it uncompressed?

This is a possibility, but since using exceptions on non-exceptional code paths is a frowned-upon practice, I am worried it could cause optimization headaches -- the JIT refusing to optimize, or doing the wrong thing, so that compression always takes longer than it should. At this point I don't want to risk that, but it is an option to explore if we get some free cycles later on to verify there are no performance issues in all relevant configurations.

bq. Just for clarification - is the following correct?

Yes, that is correct. {{<=}}/compressed is the typical path, hence placed first on the read side, and on the write path we have an {{if}} that is only triggered on the alternative. The latter could use a {{! <=}} pattern to make the subcondition identical, but that feels unnatural and more complex than necessary.

bq. Even if CRC checks are disabled...

Suggested patch included, thanks.

> Compressed writer and reader should support non-compressed data.
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-10520
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10520
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>              Labels: messaging-service-bump-required
>             Fix For: 4.x
>
>         Attachments: ReadWriteTestCompression.java
>
>
> Compressing uncompressible data, as done, for instance, to write SSTables during stress tests, results in chunks larger than 64k, which are a problem for the buffer pooling mechanisms employed by the {{CompressedRandomAccessReader}}. This results in non-negligible performance issues due to excessive memory allocation.
>
> To solve this problem and avoid decompression delays in the cases where it does not provide benefits, I think we should allow compressed files to store uncompressed chunks as an alternative to compressed data. Such a chunk could be written after compression returns a buffer larger than, for example, 90% of the input, and would not result in additional delays in writing. On reads it could be recognized by size (using a single global threshold constant in the compression metadata), and data could be directly transferred into the decompressed buffer, skipping the decompression step and ensuring a 64k buffer for compressed data always suffices.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
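The write/read threshold scheme described in the ticket can be sketched roughly as follows. This is a minimal illustration, not the actual Cassandra code: the names ({{ChunkCodec}}, {{encodeChunk}}, {{decodeChunk}}) are hypothetical, and {{java.util.zip}} stands in for Cassandra's own {{ICompressor}} implementations.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Rough sketch of the threshold idea from the ticket description.
// Hypothetical names; java.util.zip stands in for Cassandra's ICompressor.
public class ChunkCodec {
    // Store a chunk raw when compression saves less than ~10% of its size
    // (the "90% of the input" threshold suggested in the description).
    static int maxCompressedLength(int uncompressedLength) {
        return (int) (uncompressedLength * 0.9);
    }

    /** Write path: compress, but fall back to the raw bytes for incompressible data. */
    static byte[] encodeChunk(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length + 64]; // room for worst-case expansion
        int len = deflater.deflate(buf);
        deflater.end();
        // The write-path `if` fires only on the alternative (incompressible) case:
        if (len >= maxCompressedLength(input.length))
            return input;                  // store raw; the reader detects this by size
        return Arrays.copyOf(buf, len);
    }

    /** Read path: a stored length at or above the threshold marks a raw chunk. */
    static byte[] decodeChunk(byte[] stored, int uncompressedLength) throws DataFormatException {
        if (stored.length >= maxCompressedLength(uncompressedLength))
            return stored;                 // uncompressed chunk: skip decompression entirely
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        byte[] out = new byte[uncompressedLength];
        int n = inflater.inflate(out);
        inflater.end();
        return Arrays.copyOf(out, n);
    }
}
```

Compressible chunks round-trip through deflate/inflate as before; incompressible chunks are passed through untouched, which is what keeps the reader's buffer bounded by the chunk size.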
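The bounded-buffer alternative raised in the review (pass a buffer capped at {{maxCompressedLength}} and treat overflow as "store uncompressed") could look roughly like this. Again a hypothetical sketch with invented names ({{BoundedBufferWrite}}, {{Compressor}}); it exists only to show the catch-on-a-hot-path shape that the comment worries may pessimize JIT output.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Hypothetical sketch of the alternative discussed in the comment: hand the
// compressor a buffer capped at maxCompressedLength and treat overflow as
// "store this chunk uncompressed".
public class BoundedBufferWrite {
    interface Compressor {
        // Writes the compressed form of input into output; throws
        // BufferOverflowException if it does not fit.
        void compress(ByteBuffer input, ByteBuffer output);
    }

    static ByteBuffer encode(Compressor compressor, ByteBuffer input, int maxCompressedLength) {
        ByteBuffer bounded = ByteBuffer.allocate(maxCompressedLength);
        try {
            compressor.compress(input.duplicate(), bounded);
            bounded.flip();
            return bounded;               // fit under the threshold: keep compressed
        } catch (BufferOverflowException e) {
            // Exception on a non-exceptional path: this is the JIT concern
            // raised in the comment above.
            return input.duplicate();     // incompressible: store raw
        }
    }
}
```

A trivial pass-through compressor is enough to exercise both branches: with a cap smaller than the input the overflow path returns the raw bytes, otherwise the "compressed" copy is returned.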