[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2023-05-03 Thread Maxim Muzafarov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718936#comment-17718936
 ] 

Maxim Muzafarov commented on CASSANDRA-17292:
-

Hello [~maedhroz],



I'd like to offer a few thoughts on that as well. I have been delving into the 
configuration usages while making the SettingsTable virtual table updatable and 
I think we can support nested structured configuration (with some limitations) 
as well as the flat one including storing the configuration in multiple files. 
Why can we do this? Well, we should move away from representing the configuration as 
nested POJO classes and store the configuration properties internally as a 
tree-based runtime data structure (the same concept is provided by Apache 
Commons Configuration, Lightbend Config etc.). This will give users a good deal 
of flexibility, so they can use/split their configuration as they wish.

I also mentioned some thoughts here in the {{= Alternatives =}} section, with 
all the drawbacks we might face:
[https://lists.apache.org/thread/gdtr3vp375d3nyj6h8xo7owth1s556lz]
h3. Why can we do so?

*The first thing* to note is that there is no need to map the yaml file 
structure directly to the POJO configuration classes, as these classes are not 
directly available to users and are only used in internal components. The only 
requirement is that we must clearly define the configuration properties on 
which the naming convention is to be based: sub-components must be properly 
prefixed (we can align properties using the @Replaces annotation or a mapping).

So a user can use any of the configuration layouts below; we just need to load 
the configuration into our internal structure (or a POJO class) with an 
appropriate YamlLoader.

This is valid:
{code:java}
commitlog_directory: String
commitlog_max_compression_buffers_in_pool: int
commitlog_periodic_queue_size: int
{code}
This is also valid:
{code:java}
commitlog:
  directory: String
  max_compression_buffers_in_pool: int
  periodic_queue_size: int
{code}
This is a valid case if we split the configuration into multiple files and put 
them in the classpath to load:
{code:java}
// Let's assume Cassandra configurations yaml has 'cassandra.(.*).yaml' pattern.
cassandra.accord.yaml
cassandra.yaml
{code}
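To illustrate how the flat and nested layouts above could end up as the same internal properties, here is a minimal sketch. It assumes the YAML has already been parsed into nested maps; {{ConfigFlattener}} and its methods are hypothetical names and not part of Cassandra. Joining keys with an underscore is only one possible naming convention, and it is one of the limitations mentioned earlier: a nested key and a flat key can collide, which the sketch rejects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Cassandra: normalises a parsed YAML tree
// so that nested and flat layouts produce the same property names.
final class ConfigFlattener
{
    private ConfigFlattener() {}

    // Joins nested keys with '_', e.g. commitlog -> directory becomes commitlog_directory.
    static Map<String, Object> flatten(Map<String, Object> tree)
    {
        Map<String, Object> flat = new LinkedHashMap<>();
        flattenInto("", tree, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Map<String, Object> node, Map<String, Object> out)
    {
        for (Map.Entry<String, Object> e : node.entrySet())
        {
            String name = prefix.isEmpty() ? e.getKey() : prefix + '_' + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map)
            {
                // Descend into the subtree, carrying the accumulated prefix.
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                flattenInto(name, child, out);
            }
            else
            {
                // The same property given in both flat and nested form is an error.
                if (out.containsKey(name))
                    throw new IllegalArgumentException("Property defined twice: " + name);
                out.put(name, value);
            }
        }
    }
}
```

With this convention, a file that is already flat passes through unchanged, so both examples above would be loaded into the same set of property names.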
*The second thing* to note is how the whole configuration can be validated. I 
guess the answer here is relatively simple - we can reuse all the apply methods 
we have now (applySSTableFormats(), applySimpleConfig(), applyPartitioner()) 
keeping them almost 'as is'.

*The third thing* is that if we use a runtime tree-based structure to configure 
the Cassandra cluster, we are able to inject a configuration subtree right 
where it is needed, for example with @Configuration(prefix="commitlog"), so there 
will be no need to keep a layer of thousands of lines, e.g. the DatabaseDescriptor 
class, in the source code to access the configuration. Of course, we will keep 
it to minimise the initial changes, but eventually we can get rid of it.
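A rough sketch of what such subtree injection might look like follows. The {{@Configuration}} annotation, {{CommitLogSettings}} and {{SubtreeInjector}} are all hypothetical names made up for illustration, and the property tree is assumed to be a flat map of underscore-prefixed names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical annotation: declares the prefix of the subtree a component needs.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Configuration
{
    String prefix();
}

// Hypothetical component that only ever sees commitlog_* properties.
@Configuration(prefix = "commitlog")
final class CommitLogSettings
{
    final Map<String, Object> properties;

    CommitLogSettings(Map<String, Object> properties)
    {
        this.properties = properties;
    }
}

final class SubtreeInjector
{
    private SubtreeInjector() {}

    // Returns the properties under the prefix declared on the type, with the prefix stripped.
    static Map<String, Object> subtreeFor(Class<?> type, Map<String, Object> all)
    {
        Configuration annotation = type.getAnnotation(Configuration.class);
        if (annotation == null)
            throw new IllegalArgumentException(type.getName() + " is not annotated with @Configuration");

        String prefix = annotation.prefix() + '_';
        Map<String, Object> subtree = new TreeMap<>();
        for (Map.Entry<String, Object> e : all.entrySet())
        {
            if (e.getKey().startsWith(prefix))
                subtree.put(e.getKey().substring(prefix.length()), e.getValue());
        }
        return subtree;
    }
}
```

A component constructed this way never touches the global configuration object, which is the property that would eventually make a DatabaseDescriptor-style access layer unnecessary.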

*Last but not least*, we should think carefully about the performance of 
accessing configuration fields, as this could affect the performance of the 
cluster as a whole. Direct class field access is the fastest way to read a 
property value, but I think in the Cassandra project it might be OK to have 
O(1) guarantees. Some of the frameworks cache configuration variables 
under the hood. For example, Netflix/archaius has this 
[https://github.com/Netflix/archaius/blob/2.x/archaius2-core/src/main/java/com/netflix/archaius/DefaultPropertyFactory.java#L213],
 but Commons Configuration doesn't seem to. If we go this way we will have 
to do benchmarks, but I think it will be fast enough, within measurement error.
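To make the caching idea concrete, here is a minimal sketch of a cache in front of a string-valued property store. It is not how Archaius or any other framework implements it; {{CachedProperties}} is a hypothetical class, and the sketch assumes each property name is always read with the same converter.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache in front of a string-valued property store.
final class CachedProperties
{
    private final Map<String, String> raw;
    private final Map<String, Object> converted = new ConcurrentHashMap<>();

    CachedProperties(Map<String, String> raw)
    {
        this.raw = new ConcurrentHashMap<>(raw);
    }

    // Converts the raw value on the first read and serves later reads from the cache.
    // Assumption: a given name is always read with the same converter.
    @SuppressWarnings("unchecked")
    <T> T get(String name, Function<String, T> converter)
    {
        return (T) converted.computeIfAbsent(name, key -> {
            String value = raw.get(key);
            if (value == null)
                throw new IllegalArgumentException("Unknown property: " + key);
            return converter.apply(value);
        });
    }

    // Updating a property drops its cached conversion so the next read re-converts.
    void set(String name, String value)
    {
        raw.put(name, value);
        converted.remove(name);
    }
}
```

A cached read is then one hash lookup, which is the O(1) guarantee mentioned above; whether that is close enough to a plain field read is exactly what the benchmarks would have to show.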
h3. Tree-based configuration frameworks

There are a lot of frameworks that store configuration in a runtime tree-based 
structure that might be considered for Cassandra: [Apache Commons 
Configuration|https://github.com/apache/commons-configuration], [Lightbend 
Config|https://github.com/lightbend/config], [Netflix 
Archaius|https://github.com/Netflix/archaius], and as I mentioned in the 
{{=Alternatives=}} section, we can consider adding the Apache Commons 
configuration. Adding something from 'apache commons' looks safer as we already 
have some libraries from 'commons', rather than adding a completely different 
configuration framework.

But whatever framework we consider, the following things need to be taken into 
account:
 - We have custom configuration datatypes such as DataStorageSpec;
 - We have a custom DurationSpec, so we either move it to Duration, preserving 
backwards compatibility for all supported APIs (yaml, JMX), or extend the 
chosen framework with new types; in the latter case we have to provide data 
type converters;
 - An additional dependency, so the key component (configuration) of the 
project 

[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-23 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496915#comment-17496915
 ] 

Caleb Rackliffe commented on CASSANDRA-17292:
-

[~smiklosovic] The nested format you're using there looks entirely consistent 
w/ at least my current proposal, so no objections here :)

> Move cassandra.yaml toward a nested structure around major database concepts
> 
>
> Key: CASSANDRA-17292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17292
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Config
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 5.x
>
>
> Recent mailing list conversation (see "[DISCUSS] Nested YAML configs for new 
> features") has made it clear we will gravitate toward appropriately nested 
> structures for new parameters in {{cassandra.yaml}}, but from the scattered 
> conversation across a few Guardrails tickets (see CASSANDRA-17212 and 
> CASSANDRA-17148) and CASSANDRA-15234, there is also a general desire to 
> eventually extend this to the rest of {{cassandra.yaml}}. The benefits of 
> this change include those we gain by doing it for new features (single point 
> of interest for feature documentation, typed configuration objects, logical 
> grouping for additional parameters added over time, discoverability, etc.), 
> but on a larger scale.
> This may overlap with ongoing work, including the Guardrails epic. Ideally, 
> even a rough cut of a design here would allow that to move forward in a 
> timely and coherent manner (with less long-term refactoring pain).
> Current proposals:
> From [~benedict] - 
> https://github.com/belliottsmith/cassandra/commits/CASSANDRA-15234-grouping-ideas
> From [~maedhroz] - 
> https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a
> From [~paulo] - 
> https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-23 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496752#comment-17496752
 ] 

Stefan Miklosovic commented on CASSANDRA-17292:
---

I am just letting people know that I am about to merge this (1) (17220)

There will be a grouped / nested startup_checks section. The (not so obvious) 
advantage of what we did there is that if you want to introduce a new startup 
check, you do not need to change anything configuration-related. We parse 
the config into a map whose key type is an enum, so in order to 
include a new check, one just has to add a new entry to that enum type and 
that's it. No change is needed on the configuration side in cassandra.yaml.

(1) https://github.com/apache/cassandra/pull/1448
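
As a rough illustration of the enum-keyed map described above (not the code from the pull request), the sketch below parses a startup_checks section into an {{EnumMap}}. The check names and the {{StartupChecksOptions}} class shape are assumptions made for the example.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical check names; adding a new check means adding a constant here.
enum StartupCheckType
{
    filesystem_ownership,
    data_resurrection
}

final class StartupChecksOptions
{
    private final Map<StartupCheckType, Map<String, Object>> options = new EnumMap<>(StartupCheckType.class);

    // Builds the enum-keyed map from the parsed startup_checks section.
    // Enum.valueOf rejects names that do not match any constant.
    StartupChecksOptions(Map<String, Map<String, Object>> section)
    {
        for (Map.Entry<String, Map<String, Object>> e : section.entrySet())
            options.put(StartupCheckType.valueOf(e.getKey()), e.getValue());
    }

    // In this sketch a check is disabled unless its section sets enabled: true.
    boolean isEnabled(StartupCheckType type)
    {
        Object enabled = options.getOrDefault(type, Collections.emptyMap()).get("enabled");
        return Boolean.TRUE.equals(enabled);
    }
}
```

The configuration class itself has no field per check, so a new enum constant is picked up by the parser without any other configuration-related change.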




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-22 Thread Paulo Motta (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496352#comment-17496352
 ] 

Paulo Motta commented on CASSANDRA-17292:
-

{quote}I think the main point of contention then is incremental vs. 
non-incremental migration of existing configuration.
{quote}
I think we can support the new layout for new configurations added before 5.X. 
For existing (legacy) configurations I see the following options:
a) Non-incrementally migrate all legacy properties to the new layout on 5.X
b) Incrementally migrate on 4.x while allowing users to opt-in to the new 
configuration, and switch that to opt-out on 5.x.

I'm slightly in favor of b) due to splitting the work into bite-sized chunks 
and making the new layout incrementally available earlier, but I'm also OK with 
a).
{quote}I think the thought that's hard for me to escape around this is that we 
really want a coherent design for the whole configuration up-front, given the 
lack of one is at least partially to blame for the current mess.
{quote}
This is my main motivation for chiming in here with this feature-centric 
proposal, since it allows anyone to pretty easily decide where a particular 
configuration belongs using the following heuristic when adding a new 
configuration option:
 * Does this configuration belong to an existing {{FeatureConfiguration}}?
 ** If yes, add the new property to the existing {{FeatureConfiguration}}.
 ** If not, create a new {{FeatureConfiguration}} subclass for the particular 
feature that you're adding.

No prior knowledge of the "domain model" is needed to use the heuristic above 
when deciding where a configuration should go.
{quote}Then, if we have that, and we can work out whatever small 
inconsistencies exist, we can present operators with a clean v2 config file 
format in 5.0 (that requires us to do very little thinking about compatibility, 
outside checking the version element).
{quote}
The migration of "legacy configuration" to the new feature-centric layout is 
also straightforward using the same heuristics above, for whenever we decide to 
perform a "big bang" switch to the new configuration layout.




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-22 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496332#comment-17496332
 ] 

Caleb Rackliffe commented on CASSANDRA-17292:
-

bq. The basic construct to create new feature configurations is the following 
class:
bq. For example this is how "HintsConfiguration" would look like:
bq. And would be represented as following on cassandra.yaml:

Gotcha. I don't think we'd be too far apart on any of that once we get into 
implementation space.

I think the main point of contention then is incremental vs. non-incremental 
migration of _existing_ configuration. (I emphasize "existing", because we have 
the opportunity to do new things like CASSANDRA-17148 without having to change 
it later if we have a coherent design for it to ultimately fit into. Having a 
small section of the config in the same format between v1 and v2 isn't really a 
problem.) There was actually a Slack thread about this very recently 
[here|https://the-asf.slack.com/archives/CK23JSY2K/p1645049135928759]. I think 
the thought that's hard for me to escape around this is that we _really_ want a 
coherent design for the whole configuration up-front, given the lack of one is 
at least partially to blame for the current mess. Then, if we have that, and we 
can work out whatever small inconsistencies exist, we can present operators 
with a clean v2 config file format in 5.0 (that requires us to do very little 
thinking about compatibility, outside checking the {{version}} element).




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-22 Thread Paulo Motta (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496330#comment-17496330
 ] 

Paulo Motta commented on CASSANDRA-17292:
-

Added an example of the new feature-centric layout mixed with legacy 
configuration in a single "cassandra.yaml" for illustration: 
https://gist.github.com/pauloricardomg/4369f4b0dd8b84421a11ae61bf2d2c7e




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-22 Thread Paulo Motta (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496307#comment-17496307
 ] 

Paulo Motta commented on CASSANDRA-17292:
-

One additional thing I would like to note is that my proposal consciously 
abstains from attempting to pre-define a full domain model upfront, in favor of 
an incremental feature-centric approach, where we migrate the properties from 
the legacy flat format to the new feature-centric format gradually - while new 
features can already start using the new format based on the 
{{FeatureConfiguration}} abstraction - as exemplified above in the migration of 
the "hints" configuration from the old to the new model.




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-22 Thread Paulo Motta (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496302#comment-17496302
 ] 

Paulo Motta commented on CASSANDRA-17292:
-

Thanks for the additional context [~maedhroz], that is very helpful to 
understand the reasoning behind the proposed nesting.
{quote}For a moment, let's ignore the fact that there's any kind of textual 
configuration file at all for the project, but we still have all the 
knobs/systems/etc. The very first thing I would do is create a "domain model" 
for C* configuration on the Java side, a hierarchy rooted in a Configuration 
container class, which would contain members w/ types like 
ClusterConfiguration, NetworkConfiguration, StorageConfiguration, etc. These 
would be easy to navigate, would provide reasonable points for inline 
documentation, could encapsulate validation logic for relationships between 
parameters within subsystems and features, and could be passed as little 
"kernels" of configuration around the codebase, allowing for better mocking, 
etc.
{quote}
I think we're not very far from what we want the end result to look like from 
the developer's perspective. My proposal is just a simplification of yours 
where, instead of a multi-level hierarchy rooted in physical resources 
(cluster/network/storage), I'm proposing a feature-centric domain model 
hierarchy with a single level: each feature defines its own configuration 
subtree.

The basic construct to create new feature configurations is the following class:
{code:java}
public abstract class FeatureConfiguration
{
    // is the feature enabled by default?
    boolean enabled = false;

    // the feature name to be used in the YAML/JSON
    public abstract String getFeatureName();

    // whether this feature can be disabled
    public boolean isOptional()
    {
        return true;
    }
}
{code}
This would make it easy to create typed configuration for each feature:
 * CommitlogConfiguration
 * HintsConfiguration
 * MaterializedViewsConfiguration

For example, this is how "HintsConfiguration" would look:
{code:java}
public class HintsConfiguration extends FeatureConfiguration
{
    public HintsConfiguration()
    {
        this.enabled = true;
    }

    public String getFeatureName()
    {
        return "hinted_handoff";
    }

    // Defaults are shown as string literals for brevity; Duration, Throttle and
    // Size stand for typed values that would be parsed from these strings.
    boolean auto_hints_cleanup = false;
    Duration max_hint_window = "3h";
    Throttle hinted_handoff_throttle = "1024KiB";
    int max_hints_delivery_threads = 2;
    Duration hints_flush_period = "1ms";
    Size max_hints_file_size = "128MiB";
}
{code}
And it would be represented as follows in {{cassandra.yaml}}:
{code:yaml}
# Commit log (cannot be disabled because isOptional()=false)
commit_log:
  commitlog_sync: periodic
  commitlog_sync_period: 1ms
  commitlog_segment_size: 32MiB

# Hinted Handoff
hinted_handoff:
  enabled: true
  auto_hints_cleanup: false
  max_hint_window: 3h
  hinted_handoff_throttle: 1024KiB
  max_hints_delivery_threads: 2
  hints_flush_period: 1ms
  max_hints_file_size: 128MiB

# MVs are experimental and not recommended for production-use
materialized_views:
  enabled: false
{code}
The approach above provides a very simple user experience while allowing typed 
configuration on the developer's side.
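
A possible way to connect a YAML section to its feature class is sketched below. It restates the {{FeatureConfiguration}} base class so that it compiles on its own; {{FeatureLoader}} and the bodies of the two subclasses are hypothetical, and only the {{enabled}} flag is handled.

```java
import java.util.Map;

// Restates the base class from the comment above so the sketch compiles on its own.
abstract class FeatureConfiguration
{
    boolean enabled = false;

    public abstract String getFeatureName();

    public boolean isOptional()
    {
        return true;
    }
}

// Hypothetical core feature that cannot be disabled.
class CommitlogConfiguration extends FeatureConfiguration
{
    CommitlogConfiguration()
    {
        this.enabled = true;
    }

    public String getFeatureName()
    {
        return "commit_log";
    }

    @Override
    public boolean isOptional()
    {
        return false;
    }
}

// Hypothetical optional feature, disabled by default.
class MaterializedViewsConfiguration extends FeatureConfiguration
{
    public String getFeatureName()
    {
        return "materialized_views";
    }
}

final class FeatureLoader
{
    private FeatureLoader() {}

    // Applies the 'enabled' flag from the feature's own YAML subtree, if present.
    static void apply(FeatureConfiguration feature, Map<String, Map<String, Object>> yaml)
    {
        Map<String, Object> section = yaml.get(feature.getFeatureName());
        if (section == null || !section.containsKey("enabled"))
            return; // keep the default chosen by the feature class

        boolean enabled = Boolean.TRUE.equals(section.get("enabled"));
        if (!enabled && !feature.isOptional())
            throw new IllegalArgumentException(feature.getFeatureName() + " cannot be disabled");
        feature.enabled = enabled;
    }
}
```

Each feature looks up only the subtree named by {{getFeatureName()}}, so the loader needs no knowledge of individual features.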

I think that we can easily fit most database configurations in this 
feature-centric view, but if there are some that we cannot fit into an existing 
feature we could create a new type {{ResourceConfiguration}}, which would allow 
configuring a resource not tied to a particular feature.
{quote}I'm still pretty strongly in support of a versioned but intact single 
configuration file.
{quote}
Perhaps I should've made it clear, but the split of configuration into multiple 
files is a mere optional convenience of my proposal, which also supports 
configurations in a single file for backward compatibility.

For instance, moving the configuration from the {{features.yaml}} to 
{{core.yaml}} would still render the same global configuration.
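
One way to read "would still render the same global configuration" is sketched below: each file is parsed into a map of top-level sections and the maps are merged. {{ConfigMerger}} is a hypothetical name, and rejecting a section that appears in two files is an assumption of the sketch, not something stated in the proposal.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ConfigMerger
{
    private ConfigMerger() {}

    // Merges the top-level sections of several parsed files into one global map.
    // Assumption of this sketch: a section may be defined in only one file.
    static Map<String, Object> merge(List<Map<String, Object>> files)
    {
        Map<String, Object> global = new LinkedHashMap<>();
        for (Map<String, Object> file : files)
        {
            for (Map.Entry<String, Object> e : file.entrySet())
            {
                if (global.containsKey(e.getKey()))
                    throw new IllegalArgumentException("Section defined in more than one file: " + e.getKey());
                global.put(e.getKey(), e.getValue());
            }
        }
        return global;
    }
}
```

Because the result depends only on which sections exist and not on which file they came from, moving a section from one file to another leaves the merged configuration unchanged.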

I think that the optional splitting of configuration into different files 
provides an organizational benefit of grouping together properties belonging to 
a similar category (i.e. core features which cannot be disabled, optional 
features and guardrails).

My original proposal of starting with 3 initial categories 
(core.yaml/features.yaml/guardrails.yaml) is mostly to facilitate the 
transition to the new configuration model:
 - cassandra.yaml (previously core.yaml): all legacy configurations would 
initially go here separated by section headers
 - features.yaml: all configurations compatible with the new 
{{FeatureConfiguration}} model would go here (including new features and 
"migrated" legacy features)
 - guardrails.yaml: all guardrails are collocated in the same file for 
operational simplicity

For instance, the hints configuration is currently flat so it would initially 
go in {{cassandra.yam

[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-22 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496259#comment-17496259
 ] 

Caleb Rackliffe commented on CASSANDRA-17292:
-

[~paulo] Thanks for the 
[proposal|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05].
 I've been able to give it and your comments a couple of read-throughs, but before 
I offer some feedback, a little diversion...

For a moment, let's ignore the fact that there's any kind of textual 
configuration file at all for the project, but we still have all the 
knobs/systems/etc. The very first thing I would do is create a "domain model" 
for C* configuration on the Java side, a hierarchy rooted in a 
{{Configuration}} container class, which would contain members w/ types like 
{{ClusterConfiguration}}, {{NetworkConfiguration}}, {{StorageConfiguration}}, 
etc. These would be easy to navigate, would provide reasonable points for 
inline documentation, could encapsulate validation logic for relationships 
between parameters within subsystems and features, and could be passed as 
little "kernels" of configuration around the codebase, allowing for better 
mocking, etc.
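
A very small sketch of such a domain model follows. Only the class names come from the paragraph above; every field, and the particular validation rule, is hypothetical and chosen just to show where cross-parameter checks could live.

```java
// Only the class names come from the comment above; every field is hypothetical.
final class NetworkConfiguration
{
    final int storagePort;
    final int nativeTransportPort;

    NetworkConfiguration(int storagePort, int nativeTransportPort)
    {
        // Cross-parameter validation lives next to the parameters it concerns.
        if (storagePort == nativeTransportPort)
            throw new IllegalArgumentException("storage and native transport ports must differ");
        this.storagePort = storagePort;
        this.nativeTransportPort = nativeTransportPort;
    }
}

final class StorageConfiguration
{
    final String commitlogDirectory;

    StorageConfiguration(String commitlogDirectory)
    {
        this.commitlogDirectory = commitlogDirectory;
    }
}

// Root container: callers can be handed only the "kernel" they need.
final class Configuration
{
    final NetworkConfiguration network;
    final StorageConfiguration storage;

    Configuration(NetworkConfiguration network, StorageConfiguration storage)
    {
        this.network = network;
        this.storage = storage;
    }
}
```

A component that only needs networking settings can take a {{NetworkConfiguration}} in its constructor, which is what makes it straightforward to supply a hand-built instance in tests.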

With that configuration model in hand, we could then deal w/ the problem of its 
mapping to and from some kind of human-readable format. In this case, something 
like a nested YAML file (or it could be JSON, etc.) seems to be the best 
option, in terms of its ease of use w/ tooling, its conceptual mapping, and 
with even minimal care around naming, its human navigability/readability.

Predictably then, I'm still pretty strongly in support of a versioned but 
intact single configuration file. I could imagine a synthesis of the two 
proposals that would minimize the amount of potential bouncing between files 
for operators trying to make sense of related configuration items, but simply 
having multiple files worries me. Within the structure of the individual files, 
I would also push for named hierarchies rather than relying on comments to 
denote sections of related parameters. (This has been one of the primary 
motivations behind moving toward a nested structure.)

bq. I think that the intermingling of feature/subsystem/resource in the yaml 
structure can get a little counterintuitive and does not provide a consistent 
framework for extending the properties.

This, however, is something I really want to dig into, because it echoes some 
of the concerns [~benedict] has had about the current single-file approach 
(although the most current iteration of it 
[here|https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a]
 was specifically built to address some of those concerns and integrates even 
future parameters like those we'll introduce in CASSANDRA-17148). Are there any 
major inconsistencies you could expand on?

> Move cassandra.yaml toward a nested structure around major database concepts
> 
>
> Key: CASSANDRA-17292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17292
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 5.x
>
>
> Recent mailing list conversation (see "[DISCUSS] Nested YAML configs for new 
> features") has made it clear we will gravitate toward appropriately nested 
> structures for new parameters in {{cassandra.yaml}}, but from the scattered 
> conversation across a few Guardrails tickets (see CASSANDRA-17212 and 
> CASSANDRA-17148) and CASSANDRA-15234, there is also a general desire to 
> eventually extend this to the rest of {{cassandra.yaml}}. The benefits of 
> this change include those we gain by doing it for new features (single point 
> of interest for feature documentation, typed configuration objects, logical 
> grouping for additional parameters added over time, discoverability, etc.), 
> but on a larger scale.
> This may overlap with ongoing work, including the Guardrails epic. Ideally, 
> even a rough cut of a design here would allow that to move forward in a 
> timely and coherent manner (with less long-term refactoring pain).
> Current proposals:
> From [~benedict] - 
> https://github.com/belliottsmith/cassandra/commits/CASSANDRA-15234-grouping-ideas
> From [~maedhroz] - 
> https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a
> From [~paulo] - 
> https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apach

[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-21 Thread Paulo Motta (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495648#comment-17495648
 ] 

Paulo Motta commented on CASSANDRA-17292:
-

Migrating from the previous configuration layout to the new one under the 
approach proposed above would involve:
 * Deciding which macro-categories to start with (e.g. core.yaml, 
guardrails.yaml, features.yaml)
 * Assigning existing properties to the corresponding macro-category "bucket" 
and grouping them into feature groups separated by a "section header".

The above would already provide a good starting point for new features moving 
forward:
 * Any new feature must be added to {{features.yaml}} guarded by a feature-flag 
unless it's a core feature (must go in {{{}core.yaml{}}}) or a guardrail 
(must go in {{{}guardrails.yaml{}}}).

After the new initial grouping is delivered, we can make incremental changes to 
the legacy categories via extraction and re-grouping while keeping most of 
the other new configuration unchanged.
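
The bucketing step described above can be sketched roughly as follows. This is only an illustration: the matching rules (name suffixes and prefixes) and the helper names {{assign_category}} and {{bucket}} are assumptions made for the example, not the rules used in the gist, and the sample property names are there only to exercise the rules.

```python
# Hypothetical rules mapping legacy flat property names to category files.
# The suffix/prefix heuristics are assumptions for illustration only.
CATEGORY_RULES = [
    ("guardrails.yaml",
     lambda name: name.endswith(("_warn_threshold", "_fail_threshold"))),
    ("features.yaml",
     lambda name: name.startswith("enable_") or name.endswith("_enabled")),
]
DEFAULT_CATEGORY = "core.yaml"


def assign_category(name):
    """Return the file a flat property would move to; the first matching rule wins."""
    for category, matches in CATEGORY_RULES:
        if matches(name):
            return category
    return DEFAULT_CATEGORY


def bucket(properties):
    """Group a flat {name: value} config into {file: {name: value}}."""
    buckets = {}
    for name, value in properties.items():
        buckets.setdefault(assign_category(name), {})[name] = value
    return buckets


# Sample flat config used only to exercise the rules above.
flat = {
    "commitlog_directory": "/var/lib/cassandra/commitlog",
    "enable_materialized_views": False,
    "tombstone_warn_threshold": 1000,
}
buckets = bucket(flat)
```

Anything that no rule claims falls through to {{core.yaml}}, which mirrors the idea that only features and guardrails need to be singled out at first.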

> Move cassandra.yaml toward a nested structure around major database concepts
> 
>
> Key: CASSANDRA-17292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17292
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 5.x
>
>
> Recent mailing list conversation (see "[DISCUSS] Nested YAML configs for new 
> features") has made it clear we will gravitate toward appropriately nested 
> structures for new parameters in {{cassandra.yaml}}, but from the scattered 
> conversation across a few Guardrails tickets (see CASSANDRA-17212 and 
> CASSANDRA-17148) and CASSANDRA-15234, there is also a general desire to 
> eventually extend this to the rest of {{cassandra.yaml}}. The benefits of 
> this change include those we gain by doing it for new features (single point 
> of interest for feature documentation, typed configuration objects, logical 
> grouping for additional parameters added over time, discoverability, etc.), 
> but on a larger scale.
> This may overlap with ongoing work, including the Guardrails epic. Ideally, 
> even a rough cut of a design here would allow that to move forward in a 
> timely and coherent manner (with less long-term refactoring pain).
> While these would have to be adjusted to CASSANDRA-15234 (probably after it 
> merges), there have been two proposals floated already for what this might 
> look like:
> From [~maedhroz] - 
> https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a
> From [~benedict] - 
> https://github.com/belliottsmith/cassandra/commits/CASSANDRA-15234-grouping-ideas






[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-21 Thread Paulo Motta (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495629#comment-17495629
 ] 

Paulo Motta commented on CASSANDRA-17292:
-

I took a look at the proposed layout, and while I think it is a great 
improvement over the status quo, I think that the intermingling of 
feature/subsystem/resource in the yaml structure can get a little 
counterintuitive and does not provide a consistent framework for extending the 
properties. Furthermore, nesting too many levels deep can get tricky pretty 
fast.

Why do we have to encode the subsystem/resource information in the YAML 
hierarchy? I think we can achieve a similar effect of improving discoverability 
by grouping co-related properties in different files and subsections within the 
same file.

I created an alternative proposal [on this 
gist|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05] 
that groups properties in two dimensions: category/feature group.

The category axis is represented by the name of the properties file 
("core.yaml", "guardrails.yaml", "features.yaml") and the feature group is 
represented by a comment header separating distinct feature groups within the 
same category.

One initial example of categories [from the 
gist|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05] 
would be:
 * {{{}core.yaml{}}}: core DB parameters
 * {{{}guardrails.yaml{}}}: any fail/warn thresholds
 * {{{}features.yaml{}}}: any (experimental/prod-ready) feature that can be 
enabled/disabled.

For instance adding new features is basically adding a new section to 
{{{}features.yaml{}}}.
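
One way to picture the multi-file layout is as several per-category maps that are merged into a single flat view at load time. The sketch below is an assumption about how such a merge could behave, not something taken from the gist: the function name {{merge_categories}} is hypothetical, the files are modeled as already-parsed dictionaries, and rejecting a property defined in two files is a design choice made for the example.

```python
def merge_categories(files):
    """Merge {filename: {property: value}} into one flat {property: value} view.

    Raises ValueError if the same property is defined in more than one file,
    so that splitting the config cannot silently introduce overrides.
    """
    merged = {}
    origin = {}
    for filename, properties in files.items():
        for name, value in properties.items():
            if name in origin:
                raise ValueError(
                    f"{name} defined in both {origin[name]} and {filename}")
            merged[name] = value
            origin[name] = filename
    return merged


# Each file is modeled as an already-parsed mapping; values are placeholders.
files = {
    "core.yaml": {"cluster_name": "Test Cluster"},
    "guardrails.yaml": {"tombstone_warn_threshold": 1000},
    "features.yaml": {"materialized_views_enabled": False},
}
config = merge_categories(files)
```

Extracting a section such as {{encryption}} into its own file would then only change which map a property lives in, not the merged result.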

This layout facilitates extracting subsections to a new file if the number of 
properties of that particular section grows too big. For instance, we could 
extract the {{encryption}} section of {{core.yaml}} into a new file 
{{encryption.yaml}} if the need for more specialization arises. Other 
macro-categories that we can have if necessary:
 * {{{}repair.yaml{}}}: all things repair
 * {{{}network.yaml{}}}: all things network

What do you guys think of this alternative? The proposed gist is by no means a 
complete example; it's just an initial draft to get a feel for what it would 
look like.




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-16 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493407#comment-17493407
 ] 

Benedict Elliott Smith commented on CASSANDRA-17292:


This iteration looks much better, FWIW. There remain some inconsistencies to 
address, that I'll comment on in due course, but at first glance they appear 
manageable.




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-15 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492922#comment-17492922
 ] 

Caleb Rackliffe commented on CASSANDRA-17292:
-

[~benedict] [~dcapwell] [~adelapena] [~e.dimitrova] Alright, took me a while, 
but I've pushed up a proposal 
[here|https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a],
 with some inline comments to explain some bits I'm not 100% happy about.

> Move cassandra.yaml toward a nested structure around major database concepts
> 
>
> Key: CASSANDRA-17292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17292
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 5.x
>
>
> Recent mailing list conversation (see "[DISCUSS] Nested YAML configs for new 
> features") has made it clear we will gravitate toward appropriately nested 
> structures for new parameters in {{cassandra.yaml}}, but from the scattered 
> conversation across a few Guardrails tickets (see CASSANDRA-17212 and 
> CASSANDRA-17148) and CASSANDRA-15234, there is also a general desire to 
> eventually extend this to the rest of {{cassandra.yaml}}. The benefits of 
> this change include those we gain by doing it for new features (single point 
> of interest for feature documentation, typed configuration objects, logical 
> grouping for additional parameters added over time, discoverability, etc.), 
> but on a larger scale.
> This may overlap with ongoing work, including the Guardrails epic. Ideally, 
> even a rough cut of a design here would allow that to move forward in a 
> timely and coherent manner (with less long-term refactoring pain).
> While these would have to be adjusted to CASSANDRA-15234 (probably after it 
> merges), there have been two proposals floated already for what this might 
> look like:
> From [~maedhroz] - 
> https://github.com/maedhroz/cassandra/commit/49e83c70eba3357978d1081ecf500bbbdee960d8
> From [~benedict] - 
> https://github.com/belliottsmith/cassandra/commits/CASSANDRA-15234-grouping-ideas






[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-08 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489133#comment-17489133
 ] 

Caleb Rackliffe commented on CASSANDRA-17292:
-

bq. I think the kind of examples I gave of logically inconsistent groupings, 
and ambiguous or arbitrary terminology and groupings are bad, and perhaps worse 
than what we have today (or at least not strictly better). I think if we want 
to use these groupings, we need to give a lot more thought to the groupings to 
make them more consistent and obvious.

I'm as much a fan of not obliterating the meaning of words as the next pedantic 
native English speaker, but again, I think we can vastly improve things without 
perfect logical consistency. Just to reiterate, I'm for whatever grouping and 
nesting...

1.) ...gives us documentation points for important concepts in the database 
inline in the YAML.
2.) ...makes it easier to build a set of domain objects in the configuration 
handling code that enforce relationships between options, etc.
3.) ...colocates parameters operators will need to touch (or at least make note 
of) to perform common operational tasks. (ex. As long as client encryption 
parameters are colocated, I don't care if they ultimately fall under 
{{encryption.client}}, if we feel encryption is a good top-level security 
concern, or {{network.client.encryption}} if we want to think of it as a 
sub-concern of the networking sub-system.)
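
Points 2 and 3 can be illustrated together with a small sketch: a typed domain object that enforces relationships between colocated options, built the same way regardless of where the group ends up in the hierarchy. The class, the helper, and the specific validation rules are assumptions for illustration, not the actual configuration handling code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClientEncryption:
    """Hypothetical typed view of a colocated client-encryption group."""
    enabled: bool = False
    keystore: Optional[str] = None
    require_client_auth: bool = False
    truststore: Optional[str] = None

    def __post_init__(self):
        # Relationships between options are enforced in one place.
        if self.enabled and not self.keystore:
            raise ValueError("enabled requires keystore")
        if self.require_client_auth and not self.truststore:
            raise ValueError("require_client_auth requires truststore")


def load_client_encryption(config, path):
    """Resolve a dotted path in a nested dict and build the typed object."""
    node = config
    for part in path.split("."):
        node = node[part]
    return ClientEncryption(**node)


options = {"enabled": True, "keystore": "conf/.keystore"}
by_feature = {"encryption": {"client": options}}
by_subsystem = {"network": {"client": {"encryption": options}}}

a = load_client_encryption(by_feature, "encryption.client")
b = load_client_encryption(by_subsystem, "network.client.encryption")
```

Both placements yield an equal object, which is the sense in which the exact parent matters less than keeping the parameters together.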




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-08 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489093#comment-17489093
 ] 

Benedict Elliott Smith commented on CASSANDRA-17292:


I suppose there is one possible alternative approach here, and that is to 
consider each level of nesting an independent dimension of lookup, or label, so 
that any ordering of path is equivalent, i.e. consider 
{{network.internode.encryption}} and {{encryption.network.internode}} to be the 
same. Then we only have to consider what the total set of suitable labels would 
be, and annotate each config file parameter with a set of labels that must be 
declared to be read.

I _suspect_ this would lead to more confusion though.
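
For concreteness, a minimal sketch of the label idea, assuming each parameter is keyed by an unordered set of labels rather than an ordered path. The class and method names are hypothetical, and the sketch ignores how labels would be declared in the file itself.

```python
class LabelledConfig:
    """Parameters keyed by an unordered set of labels instead of an ordered path."""

    def __init__(self):
        self._values = {}

    def declare(self, labels, value):
        """Register a parameter under a set of labels; each set must be unique."""
        key = frozenset(labels)
        if key in self._values:
            raise ValueError(f"duplicate label set: {sorted(key)}")
        self._values[key] = value

    def get(self, dotted_path):
        """Look up a parameter; any ordering of the same labels is equivalent."""
        return self._values[frozenset(dotted_path.split("."))]


config = LabelledConfig()
config.declare({"network", "internode", "encryption"}, "internode TLS options")

first = config.get("network.internode.encryption")
second = config.get("encryption.network.internode")
```

The lookup is order-independent by construction, but a reader of the file can no longer tell from one path where related parameters live, which may be the source of the confusion suspected above.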




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-08 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489087#comment-17489087
 ] 

Benedict Elliott Smith commented on CASSANDRA-17292:


Though I think resource limits are helpful to colocate for independent reasons, 
and fundamentally are the main use of the config file for most users, I am by 
no means wed to this concept or structure - like I said, I originally pursued a 
feature style structure.

I am, however, fairly wed to the idea of API consistency - I think the kind of 
examples I gave of logically inconsistent groupings, and ambiguous or arbitrary 
terminology and groupings are bad, and perhaps worse than what we have today 
(or at least not strictly better). I think if we want to use these groupings, 
we need to give a lot more thought to the groupings to make them more 
consistent and obvious.




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-08 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489080#comment-17489080
 ] 

Caleb Rackliffe commented on CASSANDRA-17292:
-

With CASSANDRA-15234 finally merged, I'm still planning on revamping my 
previous attempts at a new config structure.

Our goals are the same here, i.e. to make the config more readable and 
discoverable. There are good arguments for different axes in our nesting at the 
global, feature, and sub-system level. Everything the database does touches 
some resource(s), but it doesn't mean we have to frame every option in that 
context. (Even if we did, most things touch multiple resources.) There are 
things like encryption that we probably want to continue to group in feature 
space, although perhaps change slightly...

{noformat}
encryption:
  internode:
    ...
  client:
    ...
{noformat}

...and things like network that end up being much lower/protocol level, and 
might include things like protocol level back-pressure configuration...

{noformat}
network:
  internode:
    ...
  client:
    ...
{noformat}

...but not feature level limits, like the compaction backlog size at which we 
abort streaming/repair.

We can have a more readable config than we have today without complete logical 
consistency, especially if it affords us the opportunity to explain how the 
options for individual features and subsystems work together in our inline 
documentation. I'd like to start with an approach that favors feature grouping, 
given that I think the majority of our config is amenable to that, but then 
factor out pieces of that when and if it becomes the clearer option. (ex. It 
could end up being the case that having all our threading/SEDA options under 
one umbrella makes the most sense, and allows operators to think about CPU 
usage more naturally.)




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-05 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487494#comment-17487494
 ] 

Benedict Elliott Smith commented on CASSANDRA-17292:


bq. if you are looking at that and new to Cassandra, will you think validation 
is related to compaction? What about repair?

None of these are in my proposed layout file, in fact there is no separate 
validation compaction throughput limiter that I can see? In my proposal I see

{code}
throughput:
  streaming:
    local: 25MiB/s
    remote: 25MiB/s
  batchlog: 1MiB/s    # total for node; peers receive proportional share
  compaction: 16MiB/s
  hint_delivery: 1MiB/s
{code}

If you wanted to list a separate validation compaction limiter, I would 
probably call it e.g. {{compaction_for_repair}}. Today the 
{{concurrent_validations}} is a much better example of something that makes no 
sense already to a user without pre-existing knowledge, despite its partial 
context.
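
As a rough sketch of what grouping by resource buys, the {{throughput}} block above can be read as a single table of rate limits. The parser below is an assumption for illustration: it accepts only the {{<integer><unit>/s}} form shown in the example with binary units, and the names {{parse_rate}} and {{flatten_rates}} are hypothetical rather than part of the proposal.

```python
import re

# Binary units only; this mirrors the values in the example, nothing more.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}
_RATE = re.compile(r"^(\d+)(B|KiB|MiB|GiB)/s$")


def parse_rate(text):
    """Convert a string such as '16MiB/s' into bytes per second."""
    match = _RATE.match(text.strip())
    if match is None:
        raise ValueError(f"not a data rate: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS[unit]


def flatten_rates(node, prefix="throughput"):
    """Walk the nested group and return {dotted.path: bytes_per_second}."""
    rates = {}
    for key, value in node.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            rates.update(flatten_rates(value, path))
        else:
            rates[path] = parse_rate(value)
    return rates


# The values from the block above, modeled as an already-parsed mapping.
throughput = {
    "streaming": {"local": "25MiB/s", "remote": "25MiB/s"},
    "batchlog": "1MiB/s",
    "compaction": "16MiB/s",
    "hint_delivery": "1MiB/s",
}
rates = flatten_rates(throughput)
```

With every limiter under one parent, an operator (or a validation step) can review or sum them in one pass instead of hunting through unrelated feature sections.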




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-05 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487476#comment-17487476
 ] 

Benedict Elliott Smith commented on CASSANDRA-17292:


bq.  At the moment streaming and compaction are configured separately

We have a largely flat and messy config file today, so I don't think what we do 
today is relevant. Streaming and compaction are intrinsically linked by repair 
(except in the case of bootstrap). Streaming is gated by compaction throughput. 
Where does repair configuration sit in this world? Where should streaming 
network configurations sit?

You also haven't addressed the clear inconsistency of 
{{materialized_views.concurrent_writes}} and {{query.concurrent_writes}}, or 
{{materialized_views.enabled}} and {{query.enable_user_defined_functions}}. In 
each case we have semantically equivalent things dotted in entirely unrelated 
config.

Honestly, if we cannot come up with a _coherent_ strategy that avoids the above 
inconsistencies I prefer the grab bag of flat config we have today, just tidied 
up a bit. Nesting inconsistently is strictly worse for usability IMO.
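
The first kind of inconsistency can at least be detected mechanically. The sketch below, with the hypothetical helper {{scattered_leaves}}, assumes the layout is available as a list of dotted paths and reports leaf names that appear under more than one parent group.

```python
from collections import defaultdict


def scattered_leaves(paths):
    """Return {leaf: [parents]} for leaf names found under more than one parent."""
    parents = defaultdict(set)
    for path in paths:
        parent, _, leaf = path.rpartition(".")
        parents[leaf].add(parent)
    return {leaf: sorted(groups)
            for leaf, groups in parents.items() if len(groups) > 1}


# The four paths named in the comment above.
paths = [
    "materialized_views.concurrent_writes",
    "query.concurrent_writes",
    "materialized_views.enabled",
    "query.enable_user_defined_functions",
]
report = scattered_leaves(paths)
```

Note that a name-based check only catches the {{concurrent_writes}} pair; {{enabled}} versus {{enable_user_defined_functions}} is a semantic equivalence with different spellings, so it would still need human review.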

bq.  I have never worked on a project where I didn't ask how to configure a 
feature or a subsystem and instead wanted to look at all rate limiters together

You have never had to address database behaviour concerns that cut across 
features?

bq. but if we do actually implement pluggable storage, where will this be?

This same argument can likely be applied to concurrent_reads and 
concurrent_writes - it also applies to commit log (and implicitly CDC), repair, 
streaming, hints, memtables and compaction. Are we going to group these all 
under storage?






> Move cassandra.yaml toward a nested structure around major database concepts
> 
>
> Key: CASSANDRA-17292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17292
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 5.x
>
>
> Recent mailing list conversation (see "[DISCUSS] Nested YAML configs for new 
> features") has made it clear we will gravitate toward appropriately nested 
> structures for new parameters in {{cassandra.yaml}}, but from the scattered 
> conversation across a few Guardrails tickets (see CASSANDRA-17212 and 
> CASSANDRA-17148) and CASSANDRA-15234, there is also a general desire to 
> eventually extend this to the rest of {{cassandra.yaml}}. The benefits of 
> this change include those we gain by doing it for new features (single point 
> of interest for feature documentation, typed configuration objects, logical 
> grouping for additional parameters added over time, discoverability, etc.), 
> but on a larger scale.
> This may overlap with ongoing work, including the Guardrails epic. Ideally, 
> even a rough cut of a design here would allow that to move forward in a 
> timely and coherent manner (with less long-term refactoring pain).
> While these would have to be adjusted to CASSANDRA-15234 (probably after it 
> merges), there have been two proposals floated already for what this might 
> look like:
> From [~maedhroz] - 
> https://github.com/maedhroz/cassandra/commit/49e83c70eba3357978d1081ecf500bbbdee960d8
> From [~benedict] - 
> https://github.com/belliottsmith/cassandra/commits/CASSANDRA-15234-grouping-ideas



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-04 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487396#comment-17487396
 ] 

David Capwell commented on CASSANDRA-17292:
---

Using CASSANDRA-17166, I took a dump of all our configs (this doesn't include 
CASSANDRA-15234, which would have grouped things by feature more cleanly). 
Below is every config reachable from Config, with its type:

{code}
allocate_tokens_for_keyspace: String
allocate_tokens_for_local_replication_factor: int
audit_logging_options: # AuditLogOptions
  archive_command: String
  audit_logs_dir: String
  block: boolean
  enabled: boolean
  excluded_categories: String
  excluded_keyspaces: String
  excluded_users: String
  included_categories: String
  included_keyspaces: String
  included_users: String
  logger: # ParameterizedClass
    class_name: String
    parameters: Map
  max_archive_retries: int
  max_log_size: long
  max_queue_weight: int
  roll_cycle: String
auth_cache_warming_enabled: boolean
auth_read_consistency_level: String
auth_write_consistency_level: String
authenticator: String
authorizer: String
auto_bootstrap: boolean
auto_hints_cleanup_enabled: boolean
auto_optimise_full_repair_streams: boolean
auto_optimise_inc_repair_streams: boolean
auto_optimise_preview_repair_streams: boolean
auto_snapshot: boolean
autocompaction_on_startup_enabled: boolean
automatic_sstable_upgrade: boolean
available_processors: int
back_pressure_enabled: boolean
back_pressure_strategy: # ParameterizedClass
  class_name: String
  parameters: Map
batch_size_fail_threshold_in_kb: int
batch_size_warn_threshold_in_kb: int
batchlog_replay_throttle_in_kb: int
block_for_peers_in_remote_dcs: boolean
block_for_peers_timeout_in_secs: int
broadcast_address: String
broadcast_rpc_address: String
buffer_pool_use_heap_if_exhausted: boolean
cache_load_timeout_seconds: int
cas_contention_timeout_in_ms: long
cdc_block_writes: boolean
cdc_enabled: boolean
cdc_free_space_check_interval_ms: int
cdc_raw_directory: String
cdc_total_space_in_mb: int
check_for_duplicate_rows_during_compaction: boolean
check_for_duplicate_rows_during_reads: boolean
client_encryption_options: # EncryptionOptions
  accepted_protocols: List
  algorithm: String
  cipher_suites: List
  enabled: Boolean
  keystore: String
  keystore_password: String
  optional: boolean
  protocol: String
  require_client_auth: boolean
  require_endpoint_verification: boolean
  ssl_context_factory: # ParameterizedClass
    class_name: String
    parameters: Map
  store_type: String
  truststore: String
  truststore_password: String
client_error_reporting_exclusions: # SubnetGroups
  empty: boolean
  subnets: Set
cluster_name: String
column_index_cache_size_in_kb: int
column_index_size_in_kb: int
commit_failure_policy: Enum
commitlog_compression: # ParameterizedClass
  class_name: String
  parameters: Map
commitlog_directory: String
commitlog_max_compression_buffers_in_pool: int
commitlog_periodic_queue_size: int
commitlog_segment_size_in_mb: int
commitlog_sync: Enum
commitlog_sync_batch_window_in_ms: double
commitlog_sync_group_window_in_ms: double
commitlog_sync_period_in_ms: int
commitlog_total_space_in_mb: int
compaction_large_partition_warning_threshold_mb: int
compaction_throughput_mb_per_sec: int
compaction_tombstone_warning_threshold: int
concurrent_compactors: int
concurrent_counter_writes: int
concurrent_materialized_view_builders: int
concurrent_materialized_view_writes: int
concurrent_reads: int
concurrent_replicates: int
concurrent_validations: int
concurrent_writes: int
consecutive_message_errors_threshold: int
corrupted_tombstone_strategy: Enum
counter_cache_keys_to_save: int
counter_cache_save_period: int
counter_cache_size_in_mb: long
counter_write_request_timeout_in_ms: long
credentials_cache_active_update: boolean
credentials_cache_max_entries: int
credentials_update_interval_in_ms: int
credentials_validity_in_ms: int
cross_node_timeout: boolean
data_file_directories: String[]
default_keyspace_rf: int
denylist_consistency_level: Enum
denylist_initial_load_retry_seconds: int
denylist_max_keys_per_table: int
denylist_max_keys_total: int
denylist_refresh_seconds: int
diagnostic_events_enabled: boolean
disk_access_mode: Enum
disk_failure_policy: Enum
disk_optimization_estimate_percentile: double
disk_optimization_page_cross_chance: double
disk_optimization_strategy: Enum
dynamic_snitch: boolean
dynamic_snitch_badness_threshold: double
dynamic_snitch_reset_interval_in_ms: int
dynamic_snitch_update_interval_in_ms: int
enable_denylist_range_reads: boolean
enable_denylist_reads: boolean
enable_denylist_writes: boolean
enable_drop_compact_storage: boolean
enable_materialized_views: boolean
enable_partition_denylist: boolean
enable_sasi_indexes: boolean
enable_scripted_user_defined_functions: boolean
enable_transient_replication: boolean
enable_user_defined_functions: boolean
enable_user_defined_functions_threads: boo

[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-04 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487391#comment-17487391
 ] 

David Capwell commented on CASSANDRA-17292:
---

bq. streaming is equally as much compaction as it is network, as it also 
controls the disk

Most things we do involve the disk... At the moment streaming and compaction 
are configured separately, so the fact that they both touch the disk doesn't 
mean they should be grouped together; I don't follow your argument.

bq. If we control this under query why not also row_cache and key_cache

I can buy arguments for "query" or "storage", but does this mean that this type 
of grouping is broken?  I don't see why: most configs clearly belong to a group, 
and the minority of cases are blurry (can be argued for 2 groups) or have no 
clear group (such as cluster_name). These are outliers where we can debate on a 
case-by-case basis; I just don't follow the argument that they invalidate this 
style of grouping as a whole.

To me, I would expect storage.row_cache, as I normally see caches implemented at 
the storage layer, but in Cassandra we do this in CQL 
(SinglePartitionReadCommand); but if we do actually implement pluggable 
storage, where will this be?  Do we even want these caches if RocksDB is the 
storage backend?  If the answer is no (I would think not, as RocksDB provides 
its own caches) then it's clearly tied to storage, so storage.row_cache is the 
most ideal place.
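
As a rough sketch of the two placements being weighed here (the {{size}} key and its value are hypothetical, used only to give each group some content):

{code}
# Placement A: the row cache lives with the storage layer
storage:
  row_cache:
    size: 0mb   # hypothetical key and placeholder value

# Placement B: the row cache lives with query execution
query:
  row_cache:
    size: 0mb   # hypothetical key and placeholder value
{code}

Under placement A, a pluggable storage engine that ships its own caches could drop the whole {{storage.row_cache}} group without touching {{query}}.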

bq. back_pressure

{code}
$ grep -r back_pressure src/
src//java/org/apache/cassandra/config/Config.java:public volatile boolean back_pressure_enabled = false;
src//java/org/apache/cassandra/config/Config.java:public volatile ParameterizedClass back_pressure_strategy;
{code}

heh... dead code... 

We do have a network-based back pressure, and different features may be able to 
inform/work with it to maintain stability, so I always saw our current one as a 
network feature, but I could see different arguments.  If we want to have a 
discussion on where that makes the most sense, or if it should be its own top 
level thing, I feel that's productive.

bq. or other query execution topics? 

I believe that's my point: group the query-related topics together...

bq. Much IMO better to have e.g. enable: {user_defined_functions: true, 
materialized_views: true}

I find discoverability is much harder in this model. If you are asking how to 
configure something, do you say "I want to walk through all limits in isolation 
and provide values, then move to enable flags, then rate limiters" or do you 
say "I want to configure compaction"?  I have never worked on a project where I 
didn't ask how to configure a feature or a subsystem and instead wanted to look 
at all rate limiters together... If I want to configure the rate limiters in 
compaction I would look at the compaction configs; looking at the rate limiter 
configs can be confusing, as you don't know if the property you see is actually 
related to compaction.

{code}


[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-04 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487383#comment-17487383
 ] 

Benedict Elliott Smith commented on CASSANDRA-17292:


Also, UDFs are enabled/disabled inside {{query}}, but materialized views get a 
separate {{materialized_view}} heading, despite being an equivalent 
language-level feature. This is super inconsistent.

Much IMO better to have e.g.

{code}
enable:
  user_defined_functions: true
  materialized_views: true
  ...
{code}

This also helps the user find feature options and names. Like {{limits}}, it is 
much more discoverable.




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-04 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487382#comment-17487382
 ] 

Benedict Elliott Smith commented on CASSANDRA-17292:


I think streaming is equally as much compaction as it is network, as it also 
controls the disk. {{concurrent_reads}} and {{concurrent_writes}} also control 
disk throughput, and are framed as such in the config file. If we control this 
under {{query}} why not also {{row_cache}} and {{key_cache}}, or 
{{back_pressure}} or other query execution topics? This is the problem with 
this kind of grouping IMO: coming up with something consistent and intuitive is 
hard.






[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-04 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487380#comment-17487380
 ] 

David Capwell commented on CASSANDRA-17292:
---

bq. I have no idea what should go under the query heading

Configurations which directly impact query execution; your examples on how many 
threads to give the read/write stages logically fall under "queries" as they 
tune how many threads to give the query system...

CQL currently is our query system, so logically CQL configs should be grouped 
there as well. (We could use cql rather than query.cql; the main advantage of 
query.cql is that it's clear where the configs for a different query system 
would go (query), but without anyone actively working on one I don't 100% think 
we need to deal with that right now, and we can migrate if the time comes.)

{code}
cql:
  concurrent_read: 42
  concurrent_write: 42
  prepared_statements:
    cache_size: 1mb
  user_defined_functions:
    enabled: false
    threads_enabled: true
    timeout:
      warn: 500ms
      fail: 1500ms
    policy: die
{code}

bq. Streaming also has compaction limits (particularly concurrent_validators, 
but arguably also the bandwidth)

{code}
(trunk) $ grep -ir concurrent_validators src/
(trunk) $
{code}

I don't see a "concurrent_validators"; do you mean "concurrent_validations"?  
The code is physically in compaction, but this is part of repair, so logically 
it should be under a "repair" top level.

{code}
repair:
  validation:
    threads: 1
{code}

"bandwidth" is currently scoped to streaming 
(stream_throughput_outbound_megabits_per_sec, 
entire_sstable_stream_throughput_outbound_megabits_per_sec, 
inter_dc_stream_throughput_outbound_megabits_per_sec, 
entire_sstable_inter_dc_stream_throughput_outbound_megabits_per_sec, etc.)...  
if we want to ask "is streaming scoped to networking, or top level" I think 
there are arguments both ways, but the fact that streaming can be rate limited 
doesn't mean it should be grouped with other things which can be rate limited...
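
For illustration, the two scopings in question might look like the sketch below. The group names and shortened key names are assumptions derived from the existing flat property names, and the values are placeholders.

{code}
# Option A: streaming as its own top-level group
streaming:
  throughput_outbound_megabits_per_sec: 200            # placeholder value
  inter_dc_throughput_outbound_megabits_per_sec: 200   # placeholder value

# Option B: streaming scoped under networking
network:
  streaming:
    throughput_outbound_megabits_per_sec: 200            # placeholder value
    inter_dc_throughput_outbound_megabits_per_sec: 200   # placeholder value
{code}

Either way, the throughput limits stay next to the rest of the streaming settings rather than in a shared rate-limit group.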

bq. Should we use internode and native_transport terminology? They're very core 
developer focused, and not very user friendly. Perhaps cluster_network and 
client_network?

"cluster_network" and "client_network" make sense, a more common pattern I have 
seen is client/server:

{code}
network:
  client: # this is the CQL protocol; this may also include thrift in 3.0 (we
          # should not backport), and/or any alternatives that may come in the future
    ...
  server: # this is the internode protocol
    ...
{code}




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-04 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487371#comment-17487371
 ] 

Benedict Elliott Smith commented on CASSANDRA-17292:


bq. I am ok with guardrails choosing to go flat for the time being to unblock 
it...

+1

bq. query.local_read_size.abort_threshold

I'm strongly -1 (in sentiment, not in a veto sense) on any grouping that isn't 
fairly obvious or consistent about what should be contained within, and I have 
no idea what should go under the {{query}} heading. How do we decide what is 
considered a {{query}} option and what is not?

Some general notes:
- Should concurrent_reads, writes etc be grouped?
- There is confusion about where to put materialised view settings, and their 
concurrent_read/write settings. Some go under MVs, compaction, and global, and 
currently it is seemingly inconsistent.
- Streaming also has compaction limits (particularly concurrent_validators, but 
arguably also the bandwidth)
- Should we use {{internode}} and {{native_transport}} terminology? They're 
very core developer focused, and not very user friendly. Perhaps 
{{cluster_network}} and {{client_network}}?





[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-04 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487318#comment-17487318
 ] 

David Capwell commented on CASSANDRA-17292:
---

For track warnings, I don't mind marking the field transient, disabling it in 
the config layer and only exposing it via JMX.  I'd rather flesh this out and 
defer exposing track_warnings via configs than release with a config we plan to 
rename in the next release...

This ticket is to get agreement on what the structure should look like, and NOT 
to move all configs to this structure... once we agree on the end goal we can 
refactor track_warnings (I +1 Caleb's proposal, 
query.local_read_size.abort_threshold is what I prefer strongly)




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-04 Thread Andres de la Peña (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487229#comment-17487229
 ] 

Andres de la Peña commented on CASSANDRA-17292:
---

I guess that getting the proposal ready, discussing it and getting it approved 
can take a while.

I'll be happy to prepare a patch to return the guardrails config to the 
original flat structure that was proposed in the initial patch, if that helps 
to unblock progress on guardrails. That way we can keep adding guardrails while 
the restructuring of the yaml is under discussion. It should be quite easy to 
go back to the current nested format if we end up going that way. [~benedict] 
would this work for you?

As for track warnings, I understand that we should also flatten them in a 
similar way. Maybe we can do that as part of CASSANDRA-17341, once the config 
for guardrails is flattened. [~dcapwell] wdyt?




[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-03 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486713#comment-17486713
 ] 

Caleb Rackliffe commented on CASSANDRA-17292:
-

My plan here is to immediately (within a week or two) rework my proposal above 
once CASSANDRA-15234 merges. If there's a general consensus around that being 
roughly the direction we want to go with the config, it can inform the open 
guardrails tickets I've linked above. If not, I'd be perfectly fine w/ moving 
them forward w/ a flat config layout (which at least means avoiding potential 
clashes w/ a future nesting scheme and creating classes to manage it that would 
later be replaced) and making this an epic from which we can incrementally move 
things to the new structure.

CC [~dcapwell] [~blerer]
