[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-10-03 Thread Adam Holmberg (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189996#comment-16189996 ]

Adam Holmberg commented on CASSANDRA-10520:
---

I confirmed that it displays as expected when set in compression options. 

{code:none}
cassandra@cqlsh:test> desc t;

CREATE TABLE test.t (
k int PRIMARY KEY,
value int
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 130
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

cassandra@cqlsh:test> alter table test.t with compression = 
{'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor', 'min_compress_ratio': '1.2'};
cassandra@cqlsh:test> desc test.t;

CREATE TABLE test.t (
k int PRIMARY KEY,
value int
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor', 'min_compress_ratio': '1.2'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 130
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

cassandra@cqlsh:test>

{code}

I did note that the option is not present in the table metadata unless it is explicitly set.

> Compressed writer and reader should support non-compressed data.
> 
>
> Key: CASSANDRA-10520
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10520
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Branimir Lambov
>Assignee: Branimir Lambov
>  Labels: client-impacting, messaging-service-bump-required
> Fix For: 4.0
>
> Attachments: ReadWriteTestCompression.java
>
>
> Compressing uncompressible data, as done, for instance, to write SSTables 
> during stress-tests, results in chunks larger than 64k which are a problem 
> for the buffer pooling mechanisms employed by the 
> {{CompressedRandomAccessReader}}. This results in non-negligible performance 
> issues due to excessive memory allocation.
> To solve this problem and avoid decompression delays in the cases where it 
> does not provide benefits, I think we should allow compressed files to store 
> uncompressed chunks as alternative to compressed data. Such a chunk could be 
> written after compression returns a buffer larger than, for example, 90% of 
> the input, and would not result in additional delays in writing. On reads it 
> could be recognized by size (using a single global threshold constant in the 
> compression metadata) and data could be directly transferred into the 
> decompressed buffer, skipping the decompression step and ensuring a 64k 
> buffer for compressed data always suffices.
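
To make the proposal concrete, here is a minimal sketch of the write/read decision it describes (illustrative only: the {{Compressor}} interface and all names here are simplified assumptions, not the committed Cassandra code):

{code:java}
import java.nio.ByteBuffer;

// Illustrative sketch of the proposed chunk format decision, not the actual patch.
public class UncompressedChunkSketch
{
    interface Compressor
    {
        void compress(ByteBuffer in, ByteBuffer out);
        void uncompress(ByteBuffer in, ByteBuffer out);
    }

    // Write path: keep the compressed form only when it is small enough;
    // otherwise store the raw chunk. No extra work is added to writes.
    static ByteBuffer chunkToWrite(ByteBuffer raw, Compressor compressor, int maxCompressedLength)
    {
        ByteBuffer compressed = ByteBuffer.allocate(2 * raw.remaining()); // generous worst case
        compressor.compress(raw.duplicate(), compressed);
        compressed.flip();
        return compressed.remaining() <= maxCompressedLength ? compressed : raw.duplicate();
    }

    // Read path: a chunk whose on-disk length exceeds the threshold was stored
    // raw, so it can be copied straight into the destination buffer, skipping
    // decompression and keeping a 64k buffer sufficient for compressed data.
    static void readChunk(ByteBuffer onDisk, ByteBuffer dest, Compressor compressor, int maxCompressedLength)
    {
        if (onDisk.remaining() <= maxCompressedLength)
            compressor.uncompress(onDisk.duplicate(), dest);
        else
            dest.put(onDisk.duplicate());
    }
}
{code}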



[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-09-28 Thread Adam Holmberg (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184391#comment-16184391 ]

Adam Holmberg commented on CASSANDRA-10520:
---

Thanks for the heads up, Paulo. From my cursory read of the patch, it looks 
like this is just a new compression parameter, not a new top-level table 
option. I believe the drivers treat compression parameters as an opaque map. 
Are you saying it doesn't work in current cqlsh?

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-09-27 Thread Paulo Motta (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183606#comment-16183606 ]

Paulo Motta commented on CASSANDRA-10520:
-

Tagging this as {{client-impacting}} to support the new 
{{min_compress_ratio}} option. (cc [~aholmber] [~omichallat] [~adutra]) 
Please disregard if this was already taken care of, but I haven't found any 
ticket to support this, and the new option doesn't seem to be working on trunk 
cqlsh/stress.

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-03-29 Thread Branimir Lambov (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946982#comment-15946982 ]

Branimir Lambov commented on CASSANDRA-10520:
-

Committed follow-up patch as 
[b953f9eb2a3a1ba992fbb561b1e35531607739f3|https://github.com/apache/cassandra/commit/b953f9eb2a3a1ba992fbb561b1e35531607739f3].

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-03-28 Thread Branimir Lambov (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944662#comment-15944662 ]

Branimir Lambov commented on CASSANDRA-10520:
-

If there are no strong feelings for or against, tomorrow I am going to commit 
the latest patch, which leaves the new behaviour off by default in all cases, as 
the safest and least confusing option.

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-03-27 Thread Andrés de la Peña (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943380#comment-15943380 ]

Andrés de la Peña commented on CASSANDRA-10520:
---

Any news on this? This is blocking the resolution of test failures such as 
[CASSANDRA-13250|https://issues.apache.org/jira/browse/CASSANDRA-13250].

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-02-27 Thread Robert Stupp (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886000#comment-15886000 ]

Robert Stupp commented on CASSANDRA-10520:
--

(Opened CASSANDRA-13274 for the follow-up to fix the rolling-upgrade issue.)

I've got a slight preference for Sylvain's proposal to not touch existing 
tables but enable it on new tables.
This is a nice improvement for 4.0+ sstables, and all values for 
{{min_compress_ratio}} go through the changed code anyway - that is to say, I'm 
open to both proposals.

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-02-27 Thread Branimir Lambov (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885764#comment-15885764 ]

Branimir Lambov commented on CASSANDRA-10520:
-

I'm not very comfortable with using different defaults for existing vs new 
tables -- a choice like this looks fragile and can lead to confusion. 
Nevertheless, my personal preference is to not introduce unnecessary risk, and 
my vote would be to keep the option off by default (for both existing and new 
tables).

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-02-27 Thread Sylvain Lebresne (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885494#comment-15885494 ]

Sylvain Lebresne commented on CASSANDRA-10520:
--

bq. should reopen CASSANDRA-11128

We don't re-open issues that have made it in a release, but it's worth opening 
a followup, yes.

bq.  this means that for upgrades from 3.0/3.x to 4.0 users must ensure 
this/11128 is fixed.

Yes, and I wonder if avoiding this isn't a good enough reason to avoid enabling 
this by default, _at least on existing tables_. In general, I wonder whether 
we shouldn't default to being more conservative for existing tables on upgrade. 
That is, I'd suggest we default existing tables to having this disabled (no 
change from now), but enable it on new tables (basically, default to false if 
the table doesn't have the option, but force it to true on new tables if not 
provided).
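
A minimal sketch of that defaulting rule (the option key and method shape here are invented for illustration; the follow-up discussed above ultimately kept the feature off by default everywhere):

{code:java}
import java.util.Map;

// Hypothetical illustration of the suggested default: conservative for tables
// that pre-date the option, enabled for newly created tables, explicit wins.
final class UncompressedChunksDefault
{
    static boolean enabled(Map<String, String> compressionOptions, boolean isNewTable)
    {
        String explicit = compressionOptions.get("enable_uncompressed_chunks"); // invented key
        if (explicit != null)
            return Boolean.parseBoolean(explicit); // the user chose explicitly
        return isNewTable; // disabled on existing tables, enabled on new ones
    }
}
{code}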

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-02-26 Thread Stefania (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885063#comment-15885063 ]

Stefania commented on CASSANDRA-10520:
--

If the default value of {{min_compress_ratio}} is changed in the follow-up 
commit, the cqlshlib tests should also be updated, see 
[8bfa82769|https://github.com/apache/cassandra/commit/8bfa8276968db3d25d3e305a0fad156eeb904c34].

[{{test_describe}}|http://cassci.datastax.com/view/trunk/job/trunk_dtest/lastCompletedBuild/testReport/cqlsh_tests.cqlsh_tests/TestCqlsh/test_describe/] 
and 
[{{test_describe_mv}}|http://cassci.datastax.com/view/trunk/job/trunk_dtest/lastCompletedBuild/testReport/cqlsh_tests.cqlsh_tests/TestCqlsh/test_describe_mv/] 
need fixing if not already done, thanks.

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-02-24 Thread Robert Stupp (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882754#comment-15882754 ]

Robert Stupp commented on CASSANDRA-10520:
--

The upgrade failures are caused by the old nodes trying to issue a 
{{MigrationTask}} against upgraded node(s), which should be prevented by {{if 
(!MigrationManager.shouldPullSchemaFrom(endpoint))}} in 
{{MigrationTask.runMayThrow}}. However, due to CASSANDRA-11128, which 
introduced {{version = Math.min(version, current_version);}} in 
{{MessagingService.setVersion()}}, {{shouldPullSchemaFrom}} returns 
{{true}} even for nodes with a newer messaging version.

CASSANDRA-11128 was introduced in 3.0 and we haven't bumped the messaging 
version since.

I think we can safely resolve this issue as fixed but should reopen 
CASSANDRA-11128. /cc [~slebresne]

However, this means that for upgrades from 3.0/3.x to 4.0, users must ensure 
this/11128 is fixed.
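
In code terms, the interaction described above is roughly the following (a paraphrase built from the quoted snippets, not the actual {{MessagingService}}/{{MigrationManager}} source):

{code:java}
// Paraphrased sketch of the CASSANDRA-11128 interaction described above.
final class SchemaPullSketch
{
    // MessagingService.setVersion() clamps the recorded peer version to our own...
    static int recordPeerVersion(int reportedVersion, int ourVersion)
    {
        return Math.min(reportedVersion, ourVersion); // a newer (4.0) peer is recorded as ourVersion
    }

    // ...so a check meant to skip peers on a different messaging version can no
    // longer distinguish "same version as us" from "newer than us".
    static boolean shouldPullSchemaFrom(int recordedPeerVersion, int ourVersion)
    {
        return recordedPeerVersion == ourVersion; // true even for genuinely newer peers
    }
}
{code}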

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-02-24 Thread Branimir Lambov (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882551#comment-15882551 ]

Branimir Lambov commented on CASSANDRA-10520:
-

Patch:
|[code|https://github.com/blambov/cassandra/tree/10520-addendum]|[utest|http://cassci.datastax.com/job/blambov-10520-addendum-testall/]|[dtest|http://cassci.datastax.com/job/blambov-10520-addendum-dtest/]|

Addresses the problem by defaulting to not using uncompressed chunks, and by 
making sure the flag is not included in the compression-params-as-map when it 
matches the default. Unfortunately this means anyone who wants to use this 
optimization will have to set the flag manually (and, by doing so, assert that 
there won't be any need for communication with pre-V4 nodes), but currently 
this appears to be the only option.

The dtest patch above is not necessary with this change (it hadn't yet been 
committed).
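
The map behaviour can be sketched as follows (class shape and default constant are assumed; only the rule of omitting the option when it matches the default comes from this comment):

{code:java}
import java.util.HashMap;
import java.util.Map;

// Sketch of keeping the new option out of the schema map unless it was set.
final class CompressionParamsMapSketch
{
    static final double DEFAULT_MIN_COMPRESS_RATIO = 0.0; // feature off by default

    static Map<String, String> asMap(String compressorClass, int chunkLengthInKb, double minCompressRatio)
    {
        Map<String, String> map = new HashMap<>();
        map.put("class", compressorClass);
        map.put("chunk_length_in_kb", Integer.toString(chunkLengthInKb));
        // Emit the new option only when it differs from the default, so the
        // schema stays identical for pre-V4 nodes unless the user opts in.
        if (minCompressRatio != DEFAULT_MIN_COMPRESS_RATIO)
            map.put("min_compress_ratio", Double.toString(minCompressRatio));
        return map;
    }
}
{code}

This also matches the observation at the top of the thread that the option is absent from the table metadata unless explicitly set.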

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-02-21 Thread Branimir Lambov (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876204#comment-15876204 ]

Branimir Lambov commented on CASSANDRA-10520:
-

Thank you, committed as 
[f97db26f8e9989d2294cccbea8a06589253313f2|https://github.com/apache/cassandra/commit/f97db26f8e9989d2294cccbea8a06589253313f2].

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-02-21 Thread Robert Stupp (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875966#comment-15875966 ]

Robert Stupp commented on CASSANDRA-10520:
--

re: microbench: yeah - page cache, that makes sense. I don't have the 
possibility to run it on an appropriate Linux box until the weekend.

re: exceptions: exceptions vs. JIT - that was one of the question marks behind 
my question.

The 2 new dtest failures look to be fixed by the dtest patch, which LGTM.

Anyway, +1 on the patch. Nice work!


[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-02-16 Thread Branimir Lambov (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869945#comment-15869945 ]

Branimir Lambov commented on CASSANDRA-10520:
-

Rebased and updated the patch and triggered another round of testing.

bq. The micro benchmark looks different on my Linux machine

That's very similar to what I get with page cache enabled. Is it possible you 
have run the benchmark without turning it off?

bq. When writing compressed chunks, the compressed buffer is sized to the max 
compression length. WDYT about just passing a buffer that's bounded to 
maxCompressedLength and handle the buffer-overflow-exception to write it 
uncompressed?

This is a possibility, but as the use of exceptions on non-exceptional code 
paths is a bit of a frowned-upon practice, I am worried that it could cause 
optimization headaches -- the JIT refusing to optimize or doing the wrong thing, 
resulting in compression always taking longer than it should. At this point I 
don't really want to risk something like that, but it's an option to explore if 
we get some free cycles later on, to verify that there are no performance 
issues in all relevant configurations.

bq. Just for clarification - is the following correct?

Yes, that is correct. {{<=}}/compressed is the typical path, hence it is placed 
first on the read side, while on the write path we have an {{if}} that is only 
triggered on the alternative. The latter could use a {{! <=}} pattern to make 
the subcondition identical, but that feels unnatural and more complex than 
necessary.

bq. Even if CRC checks are disabled...

Suggested patch included, thanks.


[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-02-10 Thread Robert Stupp (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861316#comment-15861316 ]

Robert Stupp commented on CASSANDRA-10520:
--

Generally, the whole patch LGTM. Great work!

Some notes:
* Even if CRC checks are disabled, we always call 
{{ThreadLocalRandom.current().nextDouble()}}, which seems unnecessary. I've 
played around with that in [this 
commit|https://github.com/snazy/cassandra/commit/e30dcdef64ccdcff75043d50a50ecb95c26c9667]. 
Feel free to include it in your branch, if you think it's ok and fine to sneak 
this into this ticket.
* When writing compressed chunks, the {{compressed}} buffer is sized to the max 
compression length. WDYT about just passing a buffer that's bounded to 
{{maxCompressedLength}} and handling the buffer-overflow exception to write it 
uncompressed (a sketch of this idea follows the list)? My (unproven) hope is 
that compression could stop earlier - i.e. would not need to compress to more 
than {{maxCompressedLength}}.
* Just for clarification - is the following correct?
** on write, if {{compressedDataLength > maxCompressedLength}}, data is written 
uncompressed, compressed otherwise
** on read, if {{chunkLength <= maxCompressedLength}}, data is read compressed, 
uncompressed otherwise.
* Nice minor catches btw - like the power-of-2 check and removal of boxed types
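
A minimal sketch of the bounded-buffer idea from the second note above (assuming a compressor that throws {{BufferOverflowException}} when the output buffer is too small; this variant was never committed -- see Branimir's reply above about exceptions on hot paths):

{code:java}
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Sketch only: bound the output buffer and treat overflow as "store raw".
final class BoundedCompressSketch
{
    interface Compressor
    {
        void compress(ByteBuffer in, ByteBuffer out); // assumed to throw on overflow
    }

    static ByteBuffer compressBounded(ByteBuffer raw, Compressor compressor, int maxCompressedLength)
    {
        ByteBuffer bounded = ByteBuffer.allocate(maxCompressedLength);
        try
        {
            compressor.compress(raw.duplicate(), bounded);
            bounded.flip();
            return bounded;         // fits under the threshold: store compressed
        }
        catch (BufferOverflowException e)
        {
            return raw.duplicate(); // does not fit: store the chunk uncompressed
        }
    }
}
{code}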

The micro benchmark looks different on my Linux machine (I also [added 
{{disk_access_mode=standard}}|https://github.com/snazy/cassandra/commit/87c528fa57c18900f5ad075fb564559ba9368944] 
for completeness). There are probably a lot of reasons why our JMH runs differ. 
Anyway, although my run shows a smaller difference, this is still worth doing, 
as probably no one wants data to grow unnecessarily because of compression.

{code}
Benchmark                            (accessMode)     (compression)                                                                    Mode  Cnt  Score   Error  Units
ReadWriteTestCompression.readFixed   mmap             {'enabled':false}                                                                avgt   15  5.549 ± 0.314  us/op
ReadWriteTestCompression.readFixed   mmap             {'class': 'LZ4Compressor', 'crc_check_chance': 0.0}                              avgt   15  5.732 ± 0.126  us/op
ReadWriteTestCompression.readFixed   mmap             {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0, 'crc_check_chance': 0.0}   avgt   15  5.241 ± 0.108  us/op
ReadWriteTestCompression.readFixed   mmap             {'class': 'LZ4Compressor'}                                                       avgt   15  5.840 ± 0.388  us/op
ReadWriteTestCompression.readFixed   mmap             {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0}                            avgt   15  5.594 ± 0.085  us/op
ReadWriteTestCompression.readFixed   mmap_index_only  {'enabled':false}                                                                avgt   15  5.542 ± 0.128  us/op
ReadWriteTestCompression.readFixed   mmap_index_only  {'class': 'LZ4Compressor', 'crc_check_chance': 0.0}                              avgt   15  5.580 ± 0.027  us/op
ReadWriteTestCompression.readFixed   mmap_index_only  {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0, 'crc_check_chance': 0.0}   avgt   15  5.431 ± 0.144  us/op
ReadWriteTestCompression.readFixed   mmap_index_only  {'class': 'LZ4Compressor'}                                                       avgt   15  5.132 ± 0.089  us/op
ReadWriteTestCompression.readFixed   mmap_index_only  {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0}                            avgt   15  5.531 ± 0.104  us/op
ReadWriteTestCompression.readFixed   standard         {'enabled':false}                                                                avgt   15  5.490 ± 0.016  us/op
ReadWriteTestCompression.readFixed   standard         {'class': 'LZ4Compressor', 'crc_check_chance': 0.0}                              avgt   15  5.479 ± 0.088  us/op
ReadWriteTestCompression.readFixed   standard         {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0, 'crc_check_chance': 0.0}   avgt   15  5.480 ± 0.152  us/op
ReadWriteTestCompression.readFixed   standard         {'class': 'LZ4Compressor'}                                                       avgt   15  5.328 ± 0.346  us/op
ReadWriteTestCompression.readFixed   standard         {'class': 'LZ4Compressor', 'min_compress_ratio': 0.0}                            avgt   15  5.552 ± 0.123  us/op
ReadWriteTestCompression.readRandom  mmap             {'enabled':false}                                                                avgt   15  6.142 ± 0.072  us/op
ReadWriteTestCompression.readRandom  mmap
{code}

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2017-01-18 Thread Branimir Lambov (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828095#comment-15828095 ]

Branimir Lambov commented on CASSANDRA-10520:
-

Updated and rebased branch for 4.0:
|[trunk patch|https://github.com/blambov/cassandra/tree/10520-uncompressed-chunks]|[utest|http://cassci.datastax.com/job/blambov-10520-uncompressed-chunks-testall/]|[dtest|http://cassci.datastax.com/job/blambov-10520-uncompressed-chunks-dtest/]|[dtest patch|https://github.com/blambov/cassandra-dtest/tree/10520]|

Performance according to the attached JMH microbench, with the chunk cache 
disabled ({{file_cache_size: 32}} in {{cassandra.yaml}}) to force data accesses 
to perform a read:
{code}
Benchmark                           (accessMode)     (compression)                 Mode  Cnt   Score   Error  Units
ReadWriteTestCompression.readFixed  mmap             {compression off}             avgt   15   4.281 ± 0.048  us/op
ReadWriteTestCompression.readFixed  mmap             {LZ4, crc_check_chance: 0}    avgt   15   7.286 ± 0.107  us/op
ReadWriteTestCompression.readFixed  mmap             {LZ4}                         avgt   15   9.744 ± 0.085  us/op
ReadWriteTestCompression.readFixed  mmap             {LZ4, min_compress_ratio: 0}  avgt   15  14.353 ± 0.189  us/op
ReadWriteTestCompression.readFixed  mmap_index_only  {compression off}             avgt   15   5.264 ± 0.037  us/op
ReadWriteTestCompression.readFixed  mmap_index_only  {LZ4, crc_check_chance: 0}    avgt   15   8.284 ± 0.082  us/op
ReadWriteTestCompression.readFixed  mmap_index_only  {LZ4}                         avgt   15  11.662 ± 0.147  us/op
ReadWriteTestCompression.readFixed  mmap_index_only  {LZ4, min_compress_ratio: 0}  avgt   15  17.910 ± 0.110  us/op
{code}
The last option is the baseline (always store compressed); the third uses the 
default threshold (1.1), which in this case stores uncompressed chunks; the 
second is the same with CRC checks off; and the first has compression turned 
off entirely (which doesn't do CRC checks either). The new code saves 4.5-5 
microseconds of the query time.

It is possible to construct a special compressed rebufferer that is zero-copy 
and cache-bypassing for non-compressed memory-mapped or cached chunks, but that 
adds some complexity and IMHO should be done in a separate patch.

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2015-12-01 Thread Sylvain Lebresne (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033376#comment-15033376 ]

Sylvain Lebresne commented on CASSANDRA-10520:
--

bq. the patch does affect streaming as the sstable format differs. Is this a 
complete blocker?

We can't make a change that would prevent a new node from streaming (or sending 
any type of message) to an old node in a minor release, so if we can't get 
around that (and it's not immediately clear to me how we could), then it is a 
blocker. That's also reflected in the sstable versioning: we can use {{mb}} only 
if the change is basically backward compatible, which includes the addition of a 
field to a metadata component that is not essential to the decoding of the 
sstable (so that old nodes that don't know about the field are fine ignoring 
it), but not a whole lot more.

Long story short, I doubt we can make this change before 4.0, though I haven't 
looked closely at the patch, so it would be great if you could summarize the 
change it actually makes to the sstable format.

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2015-11-23 Thread Branimir Lambov (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021934#comment-15021934 ]

Branimir Lambov commented on CASSANDRA-10520:
-

The patch was made for 3.0, hoping that the format change would make it in. Too 
late for that now; I have rebased the patch and changed the versions accordingly 
(picked {{mb}}). Also added the tests you requested.

Even without the params serialization, the patch does affect streaming, as the 
sstable format differs. Is this a complete blocker?

[jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.

2015-11-19 Thread Robert Stupp (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013198#comment-15013198 ]

Robert Stupp commented on CASSANDRA-10520:
--

Generally your patch looks really good. Nice catch with the power-of-2 check in 
{{CompressionParams#validate}}!

Can you add tests for the changes using {{disk_access_mode = mmap}}? I think 
tests using mmap generally need some love (not just for this ticket).
Also some tests for the changes in {{CompressedInputStream}} (the new code path 
is not covered in {{CompressedInputStreamTest}}) and {{CompressionParams}} (I 
think the latter is covered by the CREATE/ALTER TABLE tests - but can you have 
a second look at this?).

The change in {{BigVersion}} indicates the file format change from 3.0.0 
({{ma}}). We will need a new file format (either {{mb}} if this patch goes into 
3.0.1/3.1, or {{na}} if it goes into 3.2?) and we must not touch what's known as 
{{ma}}, since it's used in the released 3.0.0.
So, instead of changing the legacy sstables for {{ma}}, we need new ones for 
{{mb}}/{{na}}. I'm not sure how we handle file format changes in the 3.x series 
- whether {{mb}} or {{na}} - I guess {{mb}} could work for 3.x. /cc 
[~iamaleksey] regarding the version string.

I *think* this ticket also affects messaging, which is probably bad because it 
means we cannot apply it to 3.x, since we guarantee 3.x interoperability 
(referring to the changes in {{CompressionParams.Serializer}} and 
{{CompressedInputStream}}).
We would also need a new {{MessagingVersion}} number.
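
For illustration, a power-of-two validation along the lines praised above might look like this (a sketch; the exact check and error message in {{CompressionParams#validate}} may differ):

{code:java}
// Sketch of a power-of-two chunk length validation.
final class ChunkLengthValidation
{
    static void validateChunkLength(int chunkLengthInKb)
    {
        // A positive int is a power of two iff exactly one bit is set.
        if (chunkLengthInKb <= 0 || Integer.bitCount(chunkLengthInKb) != 1)
            throw new IllegalArgumentException(
                "chunk_length_in_kb must be a positive power of 2, got " + chunkLengthInKb);
    }
}
{code}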

