[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837237#comment-17837237
 ] 

Alex Petrov edited comment on CASSANDRA-12937 at 4/15/24 1:08 PM:
------------------------------------------------------------------

bq. Yes, I think this is the most ideal solution. If somebody wants to 
experiment with a new compressor and similar, there would need to be some knob 
to override it, like some JMX method or similar, and all risks attached to that 
(divergence of the configuration caused by operator's negligence) would be on 
him.

Some things are actually quite useful to roll out gradually. Compression, for 
example: you probably do not want to rewrite your sstables across the entire 
cluster at once. Similar arguments can be made for canary deployments of 
memtable settings and other things.

I agree that it is fine if these parameters are completely transient (i.e. if 
you have set one to something that diverges from the cluster-wide value, it 
will get reverted after the node bounces). In that case, they will probably not 
go through TCM and will be purely node-local.

Examples of things that are now configurable via yaml but would become 
configurable via TCM if we go ahead with this proposal: partitioner, memtable 
configuration, default compaction strategy, compression. As Sam has mentioned, 
"which specific value makes it into schema just depends on which instance acts 
as the coordinator for a given DCL statement".

bq. but I remain unconvinced that just picking the defaults from whatever node 
happens to be coordinating is the right way to go.

I talked with Sam briefly just to make sure I understood it correctly before 
trying to describe it. Since this was first worded in a way that suggested a 
problem but did not directly propose a solution (possibly described elsewhere), 
I will attempt to do so. Sam has already described a part of the solution as:

bq. That should probably be in a parallel local datastructure though, not in 
the node's local log table as we don't want to ship those local defaults to 
peers when providing log catchup (because they should use their own defaults).

The part that was missing for me was where the values would come from and what 
the precedence would be. When executing a {{CREATE}} statement on some node 
_without_ specifying, say, compression, the statement will be created and 
executed without the compression value set at all. Every node will pick the 
value from the ephemeral parallel structure Sam described (which is also 
settable via JMX and the like, as Stefan mentioned). If no value is present in 
that structure, it will be picked from yaml (alternatively, we could just 
populate the structure from yaml too, but I consider these approaches roughly 
equivalent).
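The resolution order described above can be sketched as follows. This is a 
minimal, self-contained illustration, not actual Cassandra code; the class and 
method names here are invented for the sketch, and it assumes the precedence 
statement value > ephemeral node-local override > yaml default:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of node-local default resolution:
// 1. a value explicitly set in the CREATE statement always wins,
// 2. otherwise the ephemeral override structure (settable via JMX) is consulted,
// 3. otherwise the node falls back to its yaml-configured default.
public class LocalDefaults
{
    // Ephemeral, node-local overrides: lost on bounce, never shipped to
    // peers via log catchup, never part of TCM state.
    private final Map<String, String> ephemeralOverrides = new ConcurrentHashMap<>();
    private final Map<String, String> yamlDefaults;

    public LocalDefaults(Map<String, String> yamlDefaults)
    {
        this.yamlDefaults = yamlDefaults;
    }

    // A JMX-style hook; anything set here is reverted by a restart.
    public void setOverride(String param, String value)
    {
        ephemeralOverrides.put(param, value);
    }

    public String resolve(String param, Optional<String> statementValue)
    {
        if (statementValue.isPresent())
            return statementValue.get();
        String override = ephemeralOverrides.get(param);
        return override != null ? override : yamlDefaults.get(param);
    }
}
```

With this shape, two nodes executing the same {{CREATE}} without an explicit 
compression option can legitimately resolve different values, which is exactly 
the per-node-default behaviour discussed above.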



> Default setting (yaml) for SSTable compression
> ----------------------------------------------
>
>                 Key: CASSANDRA-12937
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Michael Semb Wever
>            Assignee: Stefan Miklosovic
>            Priority: Low
>              Labels: AdventCalendar2021
>             Fix For: 5.x
>
>          Time Spent: 8h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs) or specific disk configurations or even specific C* 
> versions (see CASSANDRA-10995 ).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the field required for defining the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where 
> {{CompressionParams.DEFAULT}} was used the code should call 
> {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some 
> copy of configured {{CompressionParams}}.
> A unit test using {{OverrideConfigurationLoader}} should verify that the 
> table schema uses the new default when a new table is created (see 
> CreateTest for some examples).
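As a rough illustration of the yaml addition the description asks for, the 
fragment below is hypothetical: the field name and layout are invented for the 
sketch and are not the committed syntax.

```yaml
# Hypothetical cassandra.yaml fragment: a node-level default applied when
# CREATE TABLE does not specify compression. Field name is illustrative.
default_compression:
    class_name: LZ4Compressor
    parameters:
        chunk_length_in_kb: 16
```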



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
