[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17744151#comment-17744151
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 7/18/23 10:16 AM:
-------------------------------------------------------------------------

I want to highlight one not-so-obvious consequence of introducing this patch.

There is a flush_compression property in cassandra.yaml, documented as follows:

{code}
# Compression to apply to SSTables as they flush for compressed tables.
# Note that tables without compression enabled do not respect this flag.
#
# As high ratio compressors like LZ4HC, Zstd, and Deflate can potentially
# block flushes for too long, the default is to flush with a known fast
# compressor in those cases. Options are:
#
# none : Flush without compressing blocks but while still doing checksums.
# fast : Flush with a fast compressor. If the table is already using a
#        fast compressor that compressor is used.
# table: Always flush with the same compressor that the table uses. This
#        was the pre 4.0 behavior.
#
# flush_compression: fast
{code}

By default, flush_compression is "fast", and the "fast" compressor is LZ4. 
Hence, if one runs "nodetool snapshot", it snapshots all tables, user-defined 
as well as system tables, and the flushed SSTables are compressed with LZ4.

If one specifies a custom compressor in the newly introduced 
"sstable_compression" field, one of two situations can occur:

1) the default compressor is not "fast" (see ICompressor.Uses and 
ICompressor.recommendedUses()), so we need to fall back to a fast 
compressor, which defaults to LZ4
2) the default compressor is fast, so we are going to flush *system* tables 
with a *custom, user-specified, ICompressor*.
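
To make the two cases concrete, here is a minimal sketch of the selection logic 
under flush_compression. The class, enum, and method names 
(FlushCompressionSketch, Compressor, chooseFlushCompressor) are hypothetical 
stand-ins for illustration, not the actual Cassandra code; only the idea of 
checking a compressor's recommended uses comes from ICompressor.Uses and 
ICompressor.recommendedUses() mentioned above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-ins for illustration; not the real Cassandra classes.
class FlushCompressionSketch {
    enum Uses { GENERAL, FAST_COMPRESSION }
    enum FlushCompression { none, fast, table }

    // Minimal model of a compressor: a name plus the uses it recommends.
    static final class Compressor {
        final String name;
        final Set<Uses> recommendedUses;

        Compressor(String name, Set<Uses> recommendedUses) {
            this.name = name;
            this.recommendedUses = recommendedUses;
        }
    }

    // Assumed fallback for the "fast" mode, as described in the comment.
    static final Compressor LZ4 =
        new Compressor("LZ4", EnumSet.of(Uses.GENERAL, Uses.FAST_COMPRESSION));
    // Models "flush without compressing blocks".
    static final Compressor NOOP =
        new Compressor("Noop", EnumSet.of(Uses.GENERAL, Uses.FAST_COMPRESSION));

    static Compressor chooseFlushCompressor(FlushCompression mode, Compressor configured) {
        switch (mode) {
            case none:
                return NOOP;
            case fast:
                // Case 2: a fast configured compressor is used as-is.
                // Case 1: otherwise fall back to LZ4.
                return configured.recommendedUses.contains(Uses.FAST_COMPRESSION)
                       ? configured
                       : LZ4;
            case table:
            default:
                return configured;
        }
    }

    public static void main(String[] args) {
        Compressor highRatio = new Compressor("HighRatio", EnumSet.of(Uses.GENERAL));
        Compressor customFast = new Compressor("MyFastCompressor",
                                               EnumSet.of(Uses.GENERAL, Uses.FAST_COMPRESSION));
        System.out.println(chooseFlushCompressor(FlushCompression.fast, highRatio).name);  // LZ4
        System.out.println(chooseFlushCompressor(FlushCompression.fast, customFast).name); // MyFastCompressor
    }
}
```

In this model, the second call is the case in question: a custom compressor 
that advertises itself as fast is used for every flush, system tables included.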

I want to be sure that we are on the same page. While it makes sense to do it 
like that, since the user specified a custom compressor to use, it also means 
that system tables will be compressed with that compressor as well. I want to 
confirm that everybody is OK with this.


> Default setting (yaml) for SSTable compression
> ----------------------------------------------
>
>                 Key: CASSANDRA-12937
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Michael Semb Wever
>            Priority: Low
>              Labels: AdventCalendar2021
>             Fix For: 5.x
>
>          Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs) or specific disk configurations or even specific C* 
> versions (see CASSANDRA-10995).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the default compression parameters. In {{DatabaseDescriptor}}, a new 
> {{CompressionParams}} field should be added for the default compression. 
> This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. Wherever 
> {{CompressionParams.DEFAULT}} was used, the code should instead call 
> {{DatabaseDescriptor#getDefaultCompressionParams}}, which should return a 
> copy of the configured {{CompressionParams}}.
> A unit test using {{OverrideConfigurationLoader}} should verify that the 
> table schema uses the new default when a new table is created (see 
> CreateTest for an example).
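
The approach outlined in the quoted issue description can be sketched roughly 
as follows. This is a simplified model under stated assumptions, not the actual 
patch: DefaultCompressionSketch and its nested CompressionParams are 
hypothetical stand-ins, and only the names applySimpleConfig and 
getDefaultCompressionParams are taken from the description. It shows the getter 
returning a copy so that callers cannot mutate the configured default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; not the real Cassandra classes.
class DefaultCompressionSketch {
    // Minimal model of compression parameters: a compressor name plus options.
    static final class CompressionParams {
        final String compressor;
        final Map<String, String> options;

        CompressionParams(String compressor, Map<String, String> options) {
            this.compressor = compressor;
            this.options = new HashMap<>(options); // defensive copy
        }

        CompressionParams copy() {
            return new CompressionParams(compressor, options);
        }
    }

    // Stand-in for the hard-coded CompressionParams.DEFAULT.
    static final CompressionParams BUILT_IN_DEFAULT =
        new CompressionParams("LZ4Compressor", new HashMap<>());

    private static CompressionParams defaultCompression = BUILT_IN_DEFAULT;

    // Models initialization from cassandra.yaml; null means "not configured".
    static void applySimpleConfig(CompressionParams fromYaml) {
        defaultCompression = (fromYaml != null) ? fromYaml.copy() : BUILT_IN_DEFAULT;
    }

    // Returns a copy so callers cannot change the node-wide default.
    static CompressionParams getDefaultCompressionParams() {
        return defaultCompression.copy();
    }

    public static void main(String[] args) {
        Map<String, String> opts = new HashMap<>();
        opts.put("chunk_length_in_kb", "16");
        applySimpleConfig(new CompressionParams("ZstdCompressor", opts));

        CompressionParams forNewTable = getDefaultCompressionParams();
        forNewTable.options.put("chunk_length_in_kb", "64"); // affects only this copy

        System.out.println(getDefaultCompressionParams().options.get("chunk_length_in_kb")); // 16
    }
}
```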



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
