[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837237#comment-17837237 ]
Alex Petrov edited comment on CASSANDRA-12937 at 4/15/24 1:08 PM:
------------------------------------------------------------------

bq. Yes, I think this is the most ideal solution. If somebody wants to experiment with a new compressor and similar, there would need to be some knob to override it, like some JMX method or similar, and all risks attached to that (divergence of the configuration caused by operator's negligence) would be on him.

Some things are actually quite useful for gradual rollout. Compression, for example: you probably do not want to rewrite your sstables across the entire cluster at once. Similar arguments can be made for canary deployments of memtable settings and other options. I agree that it is fine if these parameters are completely transient (i.e. if you set a value that diverges from the cluster-wide one, it will revert after a node bounce). In that case, they probably will not go through TCM and will be purely node-local.

Examples of things that are now configurable via yaml but would become configurable via TCM if we go ahead with this proposal: partitioner, memtable configuration, default compaction strategy, compression. As Sam has mentioned, "which specific value makes it into schema just depends on which instance acts as the coordinator for a given DCL statement".

bq. but I remain unconvinced that just picking the defaults from whatever node happens to be coordinating is the right way to go.

I talked with Sam briefly to make sure I understand this correctly before trying to describe it. Since the concern was first worded in a way that suggested a problem without directly proposing a solution (possibly described elsewhere), I will attempt to do so here. Sam has already described part of the solution as:

bq. That should probably be in a parallel local datastructure though, not in the node's local log table as we don't want to ship those local defaults to peers when providing log catchup (because they should use their own defaults).

The part that was missing for me was where the values would come from, and what the precedence would be. When a {{CREATE}} statement is executed on some node _without_ specifying, say, compression, the statement will be created and executed without any value for compression set at all. Every node will then pick the value from the ephemeral parallel structure Sam described (which is also settable via JMX and the like, as Stefan mentioned). If no value is present in this structure, it will be picked from yaml (alternatively, we could just populate this structure from yaml too, but I consider these roughly equivalent).
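To make the precedence concrete, here is a minimal sketch of the ephemeral, node-local default store described above. All names ({{NodeLocalDefaults}}, {{setOverride}}, {{lookup}}) are illustrative, not actual Cassandra APIs: runtime overrides (set e.g. through JMX) take precedence over the yaml-derived defaults, and nothing is persisted, so any divergence disappears on a node bounce.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not actual Cassandra code) of the parallel
// local datastructure discussed in the comment. It is intentionally
// not backed by the node's local log table, so these values are never
// shipped to peers during log catchup.
public final class NodeLocalDefaults
{
    // transient per-node overrides, settable at runtime (e.g. via JMX)
    private final ConcurrentHashMap<String, String> overrides = new ConcurrentHashMap<>();
    // values loaded once from cassandra.yaml at startup
    private final Map<String, String> yamlDefaults;

    public NodeLocalDefaults(Map<String, String> yamlDefaults)
    {
        this.yamlDefaults = yamlDefaults;
    }

    // e.g. exposed through a JMX operation; lost on restart
    public void setOverride(String key, String value)
    {
        overrides.put(key, value);
    }

    // precedence: ephemeral override first, then the yaml default
    public Optional<String> lookup(String key)
    {
        String v = overrides.get(key);
        return Optional.ofNullable(v != null ? v : yamlDefaults.get(key));
    }
}
```

Under this scheme a {{CREATE}} statement that omits compression carries no value at all, and each replica resolves it locally through a lookup like the one above.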
> Default setting (yaml) for SSTable compression
> ----------------------------------------------
>
>                 Key: CASSANDRA-12937
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Michael Semb Wever
>            Assignee: Stefan Miklosovic
>            Priority: Low
>              Labels: AdventCalendar2021
>             Fix For: 5.x
>
>          Time Spent: 8h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant
> to the attached disks than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable
> compression that new tables will inherit (instead of the defaults found in
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly
> compression (btrfs, zfs), specific disk configurations, or even specific C*
> versions (see CASSANDRA-10995).
>
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying
> the default compression parameters. In {{DatabaseDescriptor}} a new
> {{CompressionParams}} field should be added for the default compression.
> This field should be initialized in
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where
> {{CompressionParams.DEFAULT}} was used, the code should instead call
> {{DatabaseDescriptor#getDefaultCompressionParams}}, which should return a
> copy of the configured {{CompressionParams}}. A unit test using
> {{OverrideConfigurationLoader}} should verify that the table schema uses
> the new default when a new table is created (see CreateTest for an
> example).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
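The newcomer instructions above can be sketched roughly as follows. This is an editor's illustration, not the actual patch: the real {{CompressionParams}} lives in {{org.apache.cassandra.schema}}, and the yaml key names ({{class_name}}, {{chunk_length_in_kb}}) are assumed here for demonstration. The important detail from the ticket is that {{getDefaultCompressionParams}} hands out a copy, so per-table mutation cannot leak into the configured default.

```java
import java.util.HashMap;
import java.util.Map;

public final class DatabaseDescriptorSketch
{
    // Stand-in for Cassandra's CompressionParams; illustrative only.
    public record CompressionParams(String className, Map<String, String> options)
    {
        static CompressionParams fromConfig(Map<String, String> raw)
        {
            Map<String, String> opts = new HashMap<>(raw);
            // "class_name" is an assumed yaml key for the compressor class
            String cls = opts.remove("class_name");
            return new CompressionParams(cls != null ? cls : "LZ4Compressor", opts);
        }
    }

    private static CompressionParams defaultCompression;

    // Would be invoked from DatabaseDescriptor.applySimpleConfig() with the
    // parsed sstable-compression section of cassandra.yaml.
    public static void applySimpleConfig(Map<String, String> sstableCompressionYaml)
    {
        defaultCompression = CompressionParams.fromConfig(sstableCompressionYaml);
    }

    // Callers that previously used CompressionParams.DEFAULT ask for a copy,
    // so a table mutating its options cannot alter the global default.
    public static CompressionParams getDefaultCompressionParams()
    {
        return new CompressionParams(defaultCompression.className(),
                                     new HashMap<>(defaultCompression.options()));
    }
}
```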