[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

Stefan Miklosovic (Jira) Tue, 18 Apr 2023 03:43:07 -0700


    [ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713529#comment-17713529 ]


Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 10:42 AM:
-------------------------------------------------------------------------

All I would prefer to see is a simple map of parameters in ParameterizedClass, 
with exactly the same names as their CQL counterparts. They would be used there 
literally, as-is. There do not seem to be any collisions with that. I do not get 
the "obsession" with having the parameters for these compressors follow the same 
names as CompressionParams (or the same units).

_The CompressionParams has 3 parameters that it extracts or creates from the 
parameters in the ParameterizedClass._ 

Why do they have to be extracted in the first place?

For hints_compression in cassandra.yaml we have:

{code}
# Compression to apply to the hint files. If omitted, hints files
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
#hints_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For commitlog_compression we have:

{code}
# Compression to apply to the commit log. If omitted, the commit log
# will be written uncompressed.  LZ4, Snappy, and Deflate compressors
# are supported.
# commitlog_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For sstable_compression, I would prefer to see exactly the same way of 
configuring it. Why are we trying to introduce a completely custom way of 
configuration, which exists nowhere else, by extracting some parameters 
outside? Why can we not use the same approach?
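For illustration only, here is a minimal sketch of what that could look like. It assumes that the new key keeps the name sstable_compression used above and that it is loaded as a ParameterizedClass, exactly like hints_compression and commitlog_compression. chunk_length_in_kb is the existing CQL compression option. The value is an arbitrary example, not a proposed default, and the values are quoted on the assumption that they should load as strings.

{code}
# Hypothetical sketch: default compression inherited by newly created tables.
# Same layout as hints_compression / commitlog_compression; parameter names
# are copied verbatim from the CQL compression options.
# CQL counterpart: compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': '16'}
sstable_compression:
  - class_name: LZ4Compressor
    parameters:
      - chunk_length_in_kb: "16"
{code}

Under these assumptions, nothing would need to be renamed or converted to other units between CQL and cassandra.yaml.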

I do not think that we should blindly follow "the parameter names and their 
units". I think we have already discussed this, and I have already explained 
all the advantages of following what we already have. If we make it explicitly 
clear that these parameters are exactly the same as those put into the 
compression params upon table creation, it would save us a lot of the headache 
of having something completely custom, and people could enter the parameters 
and names they are used to. Why do we want to change all of this and further 
confuse the user?

EDIT: To further support my case for having the same parameters and units in 
cassandra.yaml as are specified in CQL upon table creation: in practice, people 
who want to take advantage of this configuration would just copy-paste the CQL 
snippet for the compression params, turn it into map entries by hitting "enter" 
on the keyboard, and be done. I highly doubt that they would want to specify 
"other units" just for the sake of consistency with the rest of cassandra.yaml. 
I do not think they care at all. They just want to copy it over from CQL and 
call it a day.
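The copy-paste workflow described above might look roughly like the following. This sketch assumes the same sstable_compression layout and assumes that all options go into the single map under parameters, as with the existing ParameterizedClass entries in cassandra.yaml. ZstdCompressor and its compression_level option are used only as an example with more than one parameter.

{code}
# Hypothetical example. Starting from the table's CQL:
#   compression = {'class': 'ZstdCompressor', 'chunk_length_in_kb': '64', 'compression_level': '3'}
# 'class' becomes class_name; every other key is pasted as-is, one per line,
# inside the same (single) parameters map.
sstable_compression:
  - class_name: ZstdCompressor
    parameters:
      - chunk_length_in_kb: "64"
        compression_level: "3"
{code}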



> Default setting (yaml) for SSTable compression
> ----------------------------------------------
>
>                 Key: CASSANDRA-12937
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Michael Semb Wever
>            Assignee: Claude Warren
>            Priority: Low
>              Labels: AdventCalendar2021, lhf
>             Fix For: 5.x
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs) or specific disk configurations or even specific C* 
> versions (see CASSANDRA-10995).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the fields required for defining the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. At the various places where 
> {{CompressionParams.DEFAULT}} is used, the code should instead call 
> {{DatabaseDescriptor#getDefaultCompressionParams}}, which should return a 
> copy of the configured {{CompressionParams}}.
> A unit test using {{OverrideConfigurationLoader}} should verify that the 
> table schema uses the new default when a new table is created (see 
> CreateTest for an example).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
