
[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-23 Thread Stefan Miklosovic (Jira)



[https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496752#comment-17496752]

Stefan Miklosovic edited comment on CASSANDRA-17292 at 2/23/22, 1:51 PM:

I am just letting people know that I am about to merge this (1) (CASSANDRA-17220), 
after which the config will look like this (2).

There will be a grouped/nested {{startup_checks}} section. The (not so obvious) 
advantage of what we did there is that if you want to introduce a new startup 
check, you do not need to change anything configuration-related. We parse 
the config into a map whose key type is an enum, so to include a new check, 
one just adds a new entry to that enum type and is done. Nothing needs to change 
on the configuration side in cassandra.yaml for that.

I want to be sure we are on the same page here. Are we?

(1) https://github.com/apache/cassandra/pull/1448
(2) https://github.com/apache/cassandra/blob/8072fc0c842ba2821305fc27988a0eacb3e24b99/conf/cassandra.yaml#L1624-L1646
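A minimal Java sketch of the enum-keyed parsing described above, for illustration only. It assumes the YAML loader has already turned the {{startup_checks}} section into a map from section name to option map; the {{StartupCheck}} enum, its constants and the {{StartupChecksConfig}} class are hypothetical names, not the code from the linked PR. Adding a check means adding a constant, and an unknown section name fails fast via {{Enum.valueOf}}.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical check names; adding a new check means adding a constant here and nothing else.
enum StartupCheck
{
    check_filesystem_ownership,
    check_data_resurrection
}

// Hypothetical holder for the nested startup_checks section.
final class StartupChecksConfig
{
    private final Map<StartupCheck, Map<String, Object>> checks = new EnumMap<>(StartupCheck.class);

    // 'raw' maps each YAML section name to that check's options, as a generic YAML loader would produce it.
    StartupChecksConfig(Map<String, Map<String, Object>> raw)
    {
        for (Map.Entry<String, Map<String, Object>> entry : raw.entrySet())
        {
            StartupCheck check;
            try
            {
                // The enum is the single source of truth for valid section names.
                check = StartupCheck.valueOf(entry.getKey());
            }
            catch (IllegalArgumentException e)
            {
                throw new IllegalArgumentException("Unknown startup check: " + entry.getKey(), e);
            }
            Map<String, Object> options = entry.getValue() != null ? entry.getValue() : Map.<String, Object>of();
            checks.put(check, Collections.unmodifiableMap(new LinkedHashMap<>(options)));
        }
    }

    // A check that is absent from the YAML gets an empty option map, i.e. its defaults.
    Map<String, Object> options(StartupCheck check)
    {
        return checks.getOrDefault(check, Map.<String, Object>of());
    }

    boolean isEnabled(StartupCheck check)
    {
        return Boolean.TRUE.equals(options(check).get("enabled"));
    }
}
```

An {{EnumMap}} is array-backed and iterates in declaration order, which suits a small, closed set of checks.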



> Move cassandra.yaml toward a nested structure around major database concepts
> 
>
> Key: CASSANDRA-17292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17292
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Config
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 5.x
>
>
> Recent mailing list conversation (see "[DISCUSS] Nested YAML configs for new 
> features") has made it clear we will gravitate toward appropriately nested 
> structures for new parameters in {{cassandra.yaml}}, but from the scattered 
> conversation across a few Guardrails tickets (see CASSANDRA-17212 and 
> CASSANDRA-17148) and CASSANDRA-15234, there is also a general desire to 
> eventually extend this to the rest of {{cassandra.yaml}}. The benefits of 
> this change include those we gain by doing it for new features (single point 
> of interest for feature documentation, typed configuration objects, logical 
> grouping for additional parameters added over time, discoverability, etc.), 
> but on a larger scale.
> This may overlap with ongoing work, including the Guardrails epic. Ideally, 
> even a rough cut of a design here would allow that to move forward in a 
> timely and coherent manner (with less long-term refactoring pain).
> Current proposals:
> From [~benedict] - 
> https://github.com/belliottsmith/cassandra/commits/CASSANDRA-15234-grouping-ideas
> From [~maedhroz] - 
> https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a
> From [~paulo] - 
> https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-23 Thread Stefan Miklosovic (Jira)



[https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496752#comment-17496752]

Stefan Miklosovic edited comment on CASSANDRA-17292 at 2/23/22, 1:27 PM:

I am just letting people know that I am about to merge this (1) (CASSANDRA-17220), 
after which the config will look like this (2).

There will be a grouped/nested {{startup_checks}} section. The (not so obvious) 
advantage of what we did there is that if you want to introduce a new startup 
check, you do not need to change anything configuration-related. We parse 
the config into a map whose key type is an enum, so to include a new check, 
one just adds a new entry to that enum type and is done. Nothing needs to change 
on the configuration side in cassandra.yaml for that.

I want to be sure we are on the same page here. Are we?

(1) https://github.com/apache/cassandra/pull/1448
(2) https://github.com/apache/cassandra/blob/a7baa061c29047f5cd8de8fe7ba899ea6fa12404/conf/cassandra.yaml#L1624-L1643



[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-23 Thread Stefan Miklosovic (Jira)



[https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496752#comment-17496752]

Stefan Miklosovic edited comment on CASSANDRA-17292 at 2/23/22, 1:25 PM:

I am just letting people know that I am about to merge this (1) (CASSANDRA-17220), 
after which the config will look like this (2).

There will be a grouped/nested {{startup_checks}} section. The (not so obvious) 
advantage of what we did there is that if you want to introduce a new startup 
check, you do not need to change anything configuration-related. We parse 
the config into a map whose key type is an enum, so to include a new check, 
one just adds a new entry to that enum type and is done. Nothing needs to change 
on the configuration side in cassandra.yaml for that.

(1) https://github.com/apache/cassandra/pull/1448
(2) https://github.com/apache/cassandra/blob/a7baa061c29047f5cd8de8fe7ba899ea6fa12404/conf/cassandra.yaml#L1624-L1643



[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-23 Thread Stefan Miklosovic (Jira)



[https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496752#comment-17496752]

Stefan Miklosovic edited comment on CASSANDRA-17292 at 2/23/22, 1:23 PM:

I am just letting people know that I am about to merge this (1) (CASSANDRA-17220), 
after which the config will look like this (2).

There will be a grouped/nested {{startup_checks}} section. The (not so obvious) 
advantage of what we did there is that if you want to introduce a new startup 
check, you do not need to change anything configuration-related. We parse 
the config into a map whose key type is an enum, so to include a new check, 
one just adds a new entry to that enum type and is done. Nothing needs to change 
on the configuration side in cassandra.yaml for that.

(1) https://github.com/apache/cassandra/pull/1448
(2) https://github.com/apache/cassandra/blob/165be5e596bee21bdc747ae740b72c14f0a8979a/conf/cassandra.yaml#L1624-L1643


was (Author: smiklosovic):
I am just letting people know that I am about to merge this (1) (CASSANDRA-17220).

There will be a grouped/nested {{startup_checks}} section. The (not so obvious) 
advantage of what we did there is that if you want to introduce a new startup 
check, you do not need to change anything configuration-related. We parse 
the config into a map whose key type is an enum, so to include a new check, 
one just adds a new entry to that enum type and is done. Nothing needs to change 
on the configuration side in cassandra.yaml for that.

(1) https://github.com/apache/cassandra/pull/1448


[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-22 Thread Paulo Motta (Jira)



[https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496302#comment-17496302]

Paulo Motta edited comment on CASSANDRA-17292 at 2/22/22, 8:29 PM:

Thanks for the additional context, [~maedhroz]; it is very helpful for 
understanding the reasoning behind the proposed nesting.

{quote}For a moment, let's ignore the fact that there's any kind of textual 
configuration file at all for the project, but we still have all the 
knobs/systems/etc. The very first thing I would do is create a "domain model" 
for C* configuration on the Java side, a hierarchy rooted in a Configuration 
container class, which would contain members w/ types like 
ClusterConfiguration, NetworkConfiguration, StorageConfiguration, etc. These 
would be easy to navigate, would provide reasonable points for inline 
documentation, could encapsulate validation logic for relationships between 
parameters within subsystems and features, and could be passed as little 
"kernels" of configuration around the codebase, allowing for better mocking, 
etc.
{quote}

I think we're not very far apart on what we want the end result to look like from 
the developer's perspective. My proposal is just a simplification of yours: 
instead of a multi-level hierarchy rooted in physical resources 
(cluster/network/storage), I'm proposing a feature-centric domain model 
hierarchy with a single level, where each feature defines its own configuration 
subtree.

The basic construct to create new feature configurations is the following class:
{code:java}
public abstract class FeatureConfiguration
{
    // is the feature enabled by default?
    boolean enabled = false;

    // the feature name to be used in the YAML/JSON
    public abstract String getFeatureName();

    // whether this feature can be disabled
    public boolean isOptional()
    {
        return true;
    }
}
{code}
This makes it easy to create a typed configuration for each feature:
 * CommitlogConfiguration
 * HintsConfiguration
 * MaterializedViewsConfiguration

For example, this is what "HintsConfiguration" would look like:
{code:java}
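public class HintsConfiguration extends FeatureConfiguration
{
    public HintsConfiguration()
    {
        // hinted handoff is enabled by default
        this.enabled = true;
    }

    @Override
    public String getFeatureName()
    {
        return "hinted_handoff";
    }

    // Duration, Throttle and Size are assumed typed value classes that parse
    // the same human-readable strings used in the YAML (e.g. "3h", "128MiB")
    boolean auto_hints_cleanup = false;
    Duration max_hint_window = new Duration("3h");
    Throttle hinted_handoff_throttle = new Throttle("1024KiB");
    int max_hints_delivery_threads = 2;
    Duration hints_flush_period = new Duration("1ms");
    Size max_hints_file_size = new Size("128MiB");
}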
{code}

It would be represented as follows in {{cassandra.yaml}}:

{code:yaml}
# Commit log (cannot be disabled because isOptional()=false)
commit_log:
  commitlog_sync: periodic
  commitlog_sync_period: 1ms
  commitlog_segment_size: 32MiB

# Hinted Handoff
hinted_handoff:
  enabled: true
  auto_hints_cleanup: false
  max_hint_window: 3h
  hinted_handoff_throttle: 1024KiB
  max_hints_delivery_threads: 2
  hints_flush_period: 1ms
  max_hints_file_size: 128MiB

# MVs are experimental and not recommended for production-use
materialized_views:
  enabled: false 
{code}

The approach above provides a very simple user experience while allowing typed 
configuration on the developer's side.
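A minimal sketch of how the single-level, feature-centric model could be wired up, assuming each top-level YAML section has already been loaded into a map. {{FeatureRegistry}}, {{CommitlogConfiguration}} and {{MaterializedViewsConfiguration}} as written here are hypothetical illustrations, not code from the proposal; the point shown is that {{getFeatureName()}} selects the section and {{isOptional()}} lets the loader reject {{enabled: false}} for a core feature such as the commit log.

```java
import java.util.LinkedHashMap;
import java.util.Map;

abstract class FeatureConfiguration
{
    // is the feature enabled by default?
    boolean enabled = false;

    // the feature name to be used in the YAML/JSON
    public abstract String getFeatureName();

    // whether this feature can be disabled
    public boolean isOptional()
    {
        return true;
    }
}

final class CommitlogConfiguration extends FeatureConfiguration
{
    CommitlogConfiguration()
    {
        enabled = true; // core feature: always on
    }

    @Override
    public String getFeatureName() { return "commit_log"; }

    @Override
    public boolean isOptional() { return false; }
}

final class MaterializedViewsConfiguration extends FeatureConfiguration
{
    @Override
    public String getFeatureName() { return "materialized_views"; }
}

// Hypothetical registry that applies each top-level YAML section's 'enabled' flag.
final class FeatureRegistry
{
    private final Map<String, FeatureConfiguration> byName = new LinkedHashMap<>();

    void register(FeatureConfiguration feature)
    {
        if (byName.putIfAbsent(feature.getFeatureName(), feature) != null)
            throw new IllegalStateException("Duplicate feature name: " + feature.getFeatureName());
    }

    // 'yaml' maps each top-level section name to that section's options.
    void apply(Map<String, Map<String, Object>> yaml)
    {
        for (Map.Entry<String, Map<String, Object>> section : yaml.entrySet())
        {
            FeatureConfiguration feature = byName.get(section.getKey());
            if (feature == null)
                throw new IllegalArgumentException("Unknown feature section: " + section.getKey());
            Object flag = section.getValue() == null ? null : section.getValue().get("enabled");
            if (flag == null)
                continue; // no explicit flag: keep the feature's default
            if (!(flag instanceof Boolean))
                throw new IllegalArgumentException(section.getKey() + ".enabled must be a boolean");
            boolean enabled = (Boolean) flag;
            if (!enabled && !feature.isOptional())
                throw new IllegalArgumentException(section.getKey() + " cannot be disabled");
            feature.enabled = enabled;
        }
    }

    FeatureConfiguration get(String name)
    {
        return byName.get(name);
    }
}
```

Keeping the registry flat mirrors the single-level hierarchy: a section name maps to exactly one typed object, so lookups never walk a tree.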

I think we can easily fit most database configurations into this 
feature-centric view, but if some cannot be fitted into an existing 
feature, we could create a new type {{ResourceConfiguration}} that would allow 
configuring a resource not tied to a particular feature.

{quote}I'm still pretty strongly in support of a versioned but intact single 
configuration file.
{quote}
Perhaps I should have made it clearer, but splitting the configuration into multiple 
files is merely an optional convenience of my proposal, which also supports 
configuration in a single file for backward compatibility.

For instance, moving a configuration section from {{features.yaml}} to 
{{core.yaml}} would still render the same global configuration.
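A rough sketch of why the split is only a convenience, assuming each file has already been parsed into its top-level sections: merging the sections yields the same global map whichever file a section lives in, and a section defined in two files is rejected. {{SplitConfigMerger}} is a hypothetical name used for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: merges already-parsed files (file name -> top-level sections)
// into one global configuration map.
final class SplitConfigMerger
{
    static Map<String, Object> merge(Map<String, Map<String, Object>> sectionsByFile)
    {
        Map<String, Object> merged = new LinkedHashMap<>();
        Map<String, String> definedIn = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Object>> file : sectionsByFile.entrySet())
        {
            for (Map.Entry<String, Object> section : file.getValue().entrySet())
            {
                // A section may live in any file, but only in one of them.
                String previous = definedIn.putIfAbsent(section.getKey(), file.getKey());
                if (previous != null)
                    throw new IllegalArgumentException("Section '" + section.getKey()
                                                       + "' is defined in both " + previous
                                                       + " and " + file.getKey());
                merged.put(section.getKey(), section.getValue());
            }
        }
        return merged;
    }
}
```

Because map equality ignores order, a single {{core.yaml}} and a {{core.yaml}} plus {{features.yaml}} split compare equal after merging.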

I think that optionally splitting the configuration into different files provides 
an organizational benefit: it groups together properties belonging to a 
similar category (i.e., core features that cannot be disabled, optional features, 
and guardrails).

My original proposal of starting with 3 initial categories 
(core.yaml/features.yaml/guardrails.yaml) is mostly to facilitate the 
transition to the new configuration model:
 - cassandra.yaml (previously core.yaml): all legacy configurations would 
initially go here separated by section headers
 - features.yaml: all configurations compatible with the new 
{{FeatureConfiguration}} model would go here (including new features and 
"migrated" legacy features)
 - guardrails.yaml: all guardrails are collocated in the same file for 
operational simplicity

For instance, the hints configuration is curren

[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-22 Thread Paulo Motta (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496302#comment-17496302
 ] 

Paulo Motta edited comment on CASSANDRA-17292 at 2/22/22, 8:23 PM:

[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-21 Thread Paulo Motta (Jira)



[https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495648#comment-17495648]

Paulo Motta edited comment on CASSANDRA-17292 at 2/21/22, 5:52 PM:

Migrating from the previous configuration layout to the new one in the approach 
proposed above would be:
 * Decide which macro-categories to start with (i.e., {{core.yaml}}, {{guardrails.yaml}}, 
{{features.yaml}})
 * Assign existing properties to the corresponding macro-category "bucket" and 
group them into feature groups separated by a "section header".

The above would already provide a good starting point for new features moving 
forward:
 * Any new feature must be added to {{features.yaml}} guarded by a feature flag, 
unless it's a core feature (must go in {{core.yaml}}) or a guardrail 
(must go in {{guardrails.yaml}}).
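The feature-flag rule could be checked mechanically. A hypothetical lint, sketched under the assumption that {{features.yaml}} has already been parsed into a map of top-level sections, reports every section that lacks a boolean {{enabled}} key; {{FeatureFlagLint}} is an illustrative name only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical lint: every top-level section of features.yaml must carry a boolean 'enabled' flag.
final class FeatureFlagLint
{
    static List<String> sectionsMissingFlag(Map<String, Object> featuresYaml)
    {
        List<String> offenders = new ArrayList<>();
        for (Map.Entry<String, Object> section : featuresYaml.entrySet())
        {
            Object body = section.getValue();
            // A section passes only if it is a map whose 'enabled' value is a Boolean.
            boolean flagged = body instanceof Map && ((Map<?, ?>) body).get("enabled") instanceof Boolean;
            if (!flagged)
                offenders.add(section.getKey());
        }
        return offenders;
    }
}
```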

After the initial grouping is delivered, we can make incremental changes to 
the legacy properties via extraction and regrouping while keeping most 
other new configurations unchanged.


was (Author: paulo):
Migrating from the previous configuration layout to the new one in the approach 
proposed above would be:
 * Decide which macro-categories to start with (i.e., {{core.yaml}}, {{guardrails.yaml}}, 
{{features.yaml}})
 * Assign existing properties to the corresponding macro-category "bucket" and 
group them into feature groups separated by a "section header".

The above would already provide a good starting point for new features moving 
forward:
 * Any new feature must be added to {{features.yaml}} guarded by a feature flag, 
unless it's a core feature (must go in {{core.yaml}}) or a guardrail 
(must go in {{guardrails.yaml}}).

After the initial grouping is delivered, we can make incremental changes to 
the legacy categories via extraction and regrouping while keeping most 
other new configurations unchanged.

> Move cassandra.yaml toward a nested structure around major database concepts
> 
>
> Key: CASSANDRA-17292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17292
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Config
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 5.x
>
>
> Recent mailing list conversation (see "[DISCUSS] Nested YAML configs for new 
> features") has made it clear we will gravitate toward appropriately nested 
> structures for new parameters in {{cassandra.yaml}}, but from the scattered 
> conversation across a few Guardrails tickets (see CASSANDRA-17212 and 
> CASSANDRA-17148) and CASSANDRA-15234, there is also a general desire to 
> eventually extend this to the rest of {{cassandra.yaml}}. The benefits of 
> this change include those we gain by doing it for new features (single point 
> of interest for feature documentation, typed configuration objects, logical 
> grouping for additional parameters added over time, discoverability, etc.), 
> but on a larger scale.
> This may overlap with ongoing work, including the Guardrails epic. Ideally, 
> even a rough cut of a design here would allow that to move forward in a 
> timely and coherent manner (with less long-term refactoring pain).
> While these would have to be adjusted to CASSANDRA-15234 (probably after it 
> merges), there have been two proposals floated already for what this might 
> look like:
> From [~maedhroz] - 
> https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a
> From [~benedict] - 
> https://github.com/belliottsmith/cassandra/commits/CASSANDRA-15234-grouping-ideas




[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-21 Thread Paulo Motta (Jira)



[https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495629#comment-17495629]

Paulo Motta edited comment on CASSANDRA-17292 at 2/21/22, 4:51 PM:

I took a look at the proposed layout, and while I think it is a great 
improvement over the status quo, I think that intermingling 
feature/subsystem/resource in the YAML structure can get a little 
counterintuitive and does not provide a consistent framework for extending the 
properties. Furthermore, deep nesting can get tricky pretty fast.

Why do we have to encode the subsystem/resource information in the YAML 
hierarchy? I think we can achieve a similar improvement in discoverability 
by grouping related properties into different files and into subsections within the 
same file.

I created an alternative proposal [on this gist|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05] 
that groups properties along two dimensions: category and feature group.

The category axis is represented by the property file name 
("core.yaml", "guardrails.yaml", "features.yaml"), and the feature group is 
represented by a comment header separating distinct feature groups within the 
same category.

One initial example of categories [from the gist|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05] 
would be:
 * [core.yaml|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05#file-core-yaml]: core DB parameters
 * [guardrails.yaml|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05#file-guardrails-yaml]: any fail/warn thresholds
 * [features.yaml|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05#file-features-yaml]: any (experimental/prod-ready) feature that can be enabled/disabled.

For instance, adding a new feature basically means adding a new section to 
[features.yaml|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05#file-features-yaml].

This layout makes it easy to extract a subsection into a new file if the number of 
properties in that section grows too big. For instance, we could 
extract the {{encryption}} section of 
[core.yaml|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05#file-core-yaml] 
into a new file {{encryption.yaml}} if the need for more specialization 
arises. Other macro-categories that we could have if necessary:
 * {{repair.yaml}}: all things repair
 * {{network.yaml}}: all things network

What do you guys think of this alternative? The proposed gist is far from a 
complete example; it's just an initial draft to get a feel for how it would look.


was (Author: paulo):
I took a look at the proposed layout, and while I think this is a great 
improvement over the status quo, I think that the intermingling of 
feature/subsystem/resource in the yaml structure can get a little 
counterintuitive and does not provide a consistent framework for extending the 
properties. Furthermore the too-many-levels nesting can get tricky pretty fast.

Why do we have to encode the subsystem/resource information in the YAML 
hierarchy? I think we can achieve a similar effect of improving discoverability 
by grouping co-related properties in different files and subsections within the 
same file.

I created an alternative proposal [on this 
gist|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05] 
that groups properties in two dimensions: category/feature group.

The category axis is represented by the name of the property filename 
("core.yaml", "guardrails.yaml", "features.yaml") and the feature group is 
represented by a comment header separating distinct feature groups within the 
same category.

One initial example of categories [from the 
gist|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05] 
would be:
 * {{{}core.yaml{}}}: core DB parameters
 * {{{}guardrails.yaml{}}}: any fail/warn thresholds
 * {{{}features.yaml{}}}: any (experimental/prod-ready) feature that can be 
enabled/disabled.

For instance, adding a new feature basically means adding a new section to 
{{{}features.yaml{}}}.

This layout facilitates extracting subsections to a new file if the number of 
properties of that particular section grows too big. For instance, we could 
extract the {{encryption}} section of {{core.yaml}} into a new file 
{{encryption.yaml}} if the need for more specialization arises. Other 
macro-categories that we can have if necessary:
 * {{{}repair.yaml{}}}: all things repair
 * {{{}network.yaml{}}}: all things network

What do you guys think of this alternative? The proposed gist is far from a 
complete example; it's just an initial draft to get a feel for what it would 
look like.

> Move cassandra.yaml toward a nested structure around major database concepts
>

[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-15 Thread Caleb Rackliffe (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492922#comment-17492922
 ] 

Caleb Rackliffe edited comment on CASSANDRA-17292 at 2/16/22, 12:01 AM:


[~benedict] [~dcapwell] [~adelapena] [~e.dimitrova] Alright, took me a while, 
but I've pushed up a proposal 
[here|https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a],
 with some inline comments to explain some bits I'm not 100% happy about. In 
some areas, I've actually borrowed pretty heavily from Benedict's work. In the 
context of the ongoing guardrails work, I'd make particular note of the 
migration of those elements to {{schema}} and {{requests}}.


was (Author: maedhroz):
[~benedict] [~dcapwell] [~adelapena] [~e.dimitrova] Alright, took me a while, 
but I've pushed up a proposal 
[here|https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a],
 with some inline comments to explain some bits I'm not 100% happy about. In 
some areas, I've actually borrowed pretty heavily from Benedict's work.

> Move cassandra.yaml toward a nested structure around major database concepts
> 
>
> Key: CASSANDRA-17292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17292
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 5.x
>
>
> Recent mailing list conversation (see "[DISCUSS] Nested YAML configs for new 
> features") has made it clear we will gravitate toward appropriately nested 
> structures for new parameters in {{cassandra.yaml}}, but from the scattered 
> conversation across a few Guardrails tickets (see CASSANDRA-17212 and 
> CASSANDRA-17148) and CASSANDRA-15234, there is also a general desire to 
> eventually extend this to the rest of {{cassandra.yaml}}. The benefits of 
> this change include those we gain by doing it for new features (single point 
> of interest for feature documentation, typed configuration objects, logical 
> grouping for additional parameters added over time, discoverability, etc.), 
> but on a larger scale.
> This may overlap with ongoing work, including the Guardrails epic. Ideally, 
> even a rough cut of a design here would allow that to move forward in a 
> timely and coherent manner (with less long-term refactoring pain).
> While these would have to be adjusted to CASSANDRA-15234 (probably after it 
> merges), there have been two proposals floated already for what this might 
> look like:
> From [~maedhroz] - 
> https://github.com/maedhroz/cassandra/commit/49e83c70eba3357978d1081ecf500bbbdee960d8
> From [~benedict] - 
> https://github.com/belliottsmith/cassandra/commits/CASSANDRA-15234-grouping-ideas



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-15 Thread Caleb Rackliffe (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492922#comment-17492922
 ] 

Caleb Rackliffe edited comment on CASSANDRA-17292 at 2/16/22, 12:00 AM:


[~benedict] [~dcapwell] [~adelapena] [~e.dimitrova] Alright, took me a while, 
but I've pushed up a proposal 
[here|https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a],
 with some inline comments to explain some bits I'm not 100% happy about. In 
some areas, I've actually borrowed pretty heavily from Benedict's work.


was (Author: maedhroz):
[~benedict] [~dcapwell] [~adelapena] [~e.dimitrova] Alright, took me a while, 
but I've pushed up a proposal 
[here|https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a],
 with some inline comments to explain some bits I'm not 100% happy about.


[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-08 Thread Caleb Rackliffe (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489080#comment-17489080
 ] 

Caleb Rackliffe edited comment on CASSANDRA-17292 at 2/8/22, 7:39 PM:
--

With CASSANDRA-15234 finally merged, I'm still planning on revamping my 
previous attempts at a new config structure.

Our goals are the same here, i.e. to make the config more readable and 
discoverable. There are good arguments for different axes in our nesting at the 
global, feature, and sub-system level. Everything the database does touches 
some resource(s), but that doesn't mean we have to frame every option in that 
context. (Even if we did, most things touch multiple resources.) There are 
things like encryption that we probably want to continue to group in feature 
space, although perhaps changed slightly...

{noformat}
encryption:
  # document general concerns for internode encryption, including how parameters interact
  internode:
...
  # document general concerns for client encryption, including how parameters interact
  client:
...
{noformat}

...and things like network that end up being much lower/protocol level, and 
might include things like protocol level back-pressure configuration...

{noformat}
network:
  internode:
...
  client:
...
{noformat}

...but not feature level limits, like the compaction backlog size at which we 
abort streaming/repair.
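A rough Java sketch of why reusing child names such as {{internode}} and {{client}} under both {{encryption}} and {{network}} stays unambiguous: once the tree is flattened, every property is identified by its full dotted path. The {{DottedPaths}} helper and the leaf names ({{enabled}}, {{max_frame_size}}) are illustrative assumptions, not existing Cassandra settings.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: flatten a nested config tree into dotted paths. The same child name
// ("internode", "client") can sit under several top-level groups without
// clashing, because the full path identifies the property. The leaf names
// used in main() are hypothetical.
final class DottedPaths {

    static Map<String, Object> flatten(Map<?, ?> tree) {
        Map<String, Object> out = new TreeMap<>(); // sorted for stable output
        flattenInto("", tree, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<?, ?> node, Map<String, Object> out) {
        for (Map.Entry<?, ?> entry : node.entrySet()) {
            String key = String.valueOf(entry.getKey());
            String path = prefix.isEmpty() ? key : prefix + "." + key;
            Object value = entry.getValue();
            if (value instanceof Map) {
                flattenInto(path, (Map<?, ?>) value, out); // nested section: descend
            } else {
                out.put(path, value);                      // leaf: record its full path
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> tree = Map.of(
                "encryption", Map.of(
                        "internode", Map.of("enabled", true),
                        "client", Map.of("enabled", false)),
                "network", Map.of(
                        "internode", Map.of("max_frame_size", "16MiB")));
        flatten(tree).forEach((path, value) -> System.out.println(path + " = " + value));
    }
}
```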

We can have a more readable config than we have today without complete logical 
consistency, especially if it affords us the opportunity to explain how the 
options for individual features and subsystems work together in our inline 
documentation. I'd like to start with an approach that favors feature grouping, 
given that I think the majority of our config is amenable to that, but then 
factor out pieces of that when and if it becomes the clearer option. (ex. It 
could end up being the case that having all our threading/SEDA options under 
one umbrella makes the most sense, and allows operators to think about CPU 
usage more naturally.)
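Factoring options out into another group later means their paths change. One way to keep existing {{cassandra.yaml}} files working is a rename table applied before validation, sketched below in Java. {{LegacyNames}} and its table are hypothetical; the two entries simply reuse option names that come up elsewhere on this ticket ({{concurrent_materialized_view_writes}}, {{enable_materialized_views}} and their proposed nested forms).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: when an option is factored out into a different group,
// keep accepting its old flat name by rewriting it to the new dotted path
// before validation. The rename table only mirrors names discussed on this
// ticket; it is not Cassandra's actual mapping.
final class LegacyNames {

    static final Map<String, String> RENAMES = Map.of(
            "concurrent_materialized_view_writes", "materialized_views.concurrent_writes",
            "enable_materialized_views", "materialized_views.enabled");

    static Map<String, Object> migrate(Map<String, Object> flat) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : flat.entrySet()) {
            String target = RENAMES.getOrDefault(entry.getKey(), entry.getKey());
            if (out.containsKey(target)) {
                // Old and new names were both set: refuse rather than silently pick one.
                throw new IllegalArgumentException(
                        "'" + target + "' is set more than once (last via '" + entry.getKey() + "')");
            }
            out.put(target, entry.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> legacy = new LinkedHashMap<>();
        legacy.put("cluster_name", "Test Cluster");
        legacy.put("concurrent_materialized_view_writes", 32);
        // prints {cluster_name=Test Cluster, materialized_views.concurrent_writes=32}
        System.out.println(migrate(legacy));
    }
}
```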


was (Author: maedhroz):
With CASSANDRA-15234 finally merged, I'm still planning on revamping my 
previous attempts at a new config structure.

Our goals are the same here, i.e. to make the config more readable and 
discoverable. There are good arguments for different axes in our nesting at the 
global, feature, and sub-system level. Everything the database does touches 
some resource(s), but that doesn't mean we have to frame every option in that 
context. (Even if we did, most things touch multiple resources.) There are 
things like encryption that we probably want to continue to group in feature 
space, although perhaps changed slightly...

{noformat}
encryption:
  internode:
...
  client:
...
{noformat}

...and things like network that end up being much lower/protocol level, and 
might include things like protocol level back-pressure configuration...

{noformat}
network:
  internode:
...
  client:
...
{noformat}

...but not feature level limits, like the compaction backlog size at which we 
abort streaming/repair.

We can have a more readable config than we have today without complete logical 
consistency, especially if it affords us the opportunity to explain how the 
options for individual features and subsystems work together in our inline 
documentation. I'd like to start with an approach that favors feature grouping, 
given that I think the majority of our config is amenable to that, but then 
factor out pieces of that when and if it becomes the clearer option. (ex. It 
could end up being the case that having all our threading/SEDA options under 
one umbrella makes the most sense, and allows operators to think about CPU 
usage more naturally.)


[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-05 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487494#comment-17487494
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-17292 at 2/5/22, 1:57 PM:
-

bq. if you are looking at that and new to Cassandra, will you think validation 
is related to compaction? What about repair?

None of these are in my proposed layout file, in fact there is no separate 
validation compaction throughput limiter that I can see in either proposal, or 
your dump? In my proposal I see

{code}
throughput:
  streaming:
    local: 25MiB/s
    remote: 25MiB/s
  batchlog: 1MiB/s  # total for node; peers receive proportional share
  compaction: 16MiB/s
  hint_delivery: 1MiB/s
{code}

If you wanted to list a separate validation compaction limiter, I would 
probably call it e.g. {{compaction_for_repair}}. Today, 
{{concurrent_validations}} is a much better example of something that already 
makes no sense to a user without prior knowledge, despite its partial 
context.
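Values such as {{25MiB/s}} in the layout above carry their unit inline. A minimal Java sketch of turning such a string into bytes per second, assuming only whole numbers and binary (1024-based) units; {{Rates.bytesPerSecond}} is a hypothetical helper, not Cassandra's own parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parse rates such as "25MiB/s" into bytes per second.
// Accepts whole numbers with a binary unit (B, KiB, MiB, GiB) followed by "/s".
final class Rates {
    private static final Pattern RATE = Pattern.compile("(\\d+)(B|KiB|MiB|GiB)/s");

    static long bytesPerSecond(String text) {
        Matcher m = RATE.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a rate: " + text);
        }
        long amount = Long.parseLong(m.group(1));
        switch (m.group(2)) {
            case "B":   return amount;
            case "KiB": return Math.multiplyExact(amount, 1L << 10);
            case "MiB": return Math.multiplyExact(amount, 1L << 20);
            default:    return Math.multiplyExact(amount, 1L << 30); // GiB
        }
    }

    public static void main(String[] args) {
        System.out.println(bytesPerSecond("25MiB/s")); // prints 26214400
        System.out.println(bytesPerSecond("16MiB/s")); // prints 16777216
    }
}
```

For example, {{25MiB/s}} comes out as 25 * 1,048,576 = 26,214,400 bytes per second; {{Math.multiplyExact}} makes an overflowing value fail loudly instead of wrapping.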


was (Author: benedict):
bq. if you are looking at that and new to Cassandra, will you think validation 
is related to compaction? What about repair?

None of these are in my proposed layout file, in fact there is no separate 
validation compaction throughput limiter that I can see? In my proposal I see

{code}
throughput:
  streaming:
    local: 25MiB/s
    remote: 25MiB/s
  batchlog: 1MiB/s  # total for node; peers receive proportional share
  compaction: 16MiB/s
  hint_delivery: 1MiB/s
{code}

If you wanted to list a separate validation compaction limiter, I would 
probably call it e.g. {{compaction_for_repair}}. Today the 
{{concurrent_validations}} is a much better example of something that makes no 
sense already to a user without pre-existing knowledge, despite its partial 
context.


[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-05 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487476#comment-17487476
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-17292 at 2/5/22, 1:33 PM:
-

bq.  At the moment streaming and compaction are configured separately

We have a largely flat and messy config file today, so I don't think what we do 
today is relevant. Streaming and compaction are intrinsically linked by repair 
(except in the case of bootstrap). Streaming is gated by (validation) 
compaction throughput. Where does repair configuration sit in this world? Where 
should streaming network settings sit?

You also really need to address the logical inconsistency of 
{{materialized_views.concurrent_writes}} and {{query.concurrent_writes}}, or 
{{materialized_views.enabled}} and {{query.enable_user_defined_functions}}. In 
each case we have semantically equivalent things dotted in inconsistent config 
(if {{concurrent_writes}} is a query option, so is 
{{concurrent_materialized_view_writes}}; if {{enable_user_defined_functions}} 
is a query/cql option so is {{enable_materialized_views}}).

Honestly, if we cannot come up with a _coherent_ strategy that avoids the above 
inconsistencies I prefer the grab bag of flat config we have today, just tidied 
up a bit. Nesting inconsistently is strictly worse for usability IMO.
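One way to check a proposed layout for this kind of inconsistency is a small lint that reports leaf names appearing under more than one parent section. This Java sketch ({{LayoutLint}}, a hypothetical name) works on dotted paths only, so it flags {{materialized_views.concurrent_writes}} vs {{query.concurrent_writes}}, but it cannot catch pairs whose names differ, such as {{materialized_views.enabled}} vs {{query.enable_user_defined_functions}}; those still need human review.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch of a layout lint: report leaf names that appear under more than one
// parent section. A hit is not necessarily wrong, but it marks a place where
// a reader has to guess which section an option belongs to.
final class LayoutLint {

    static Map<String, Set<String>> repeatedLeaves(Collection<String> dottedPaths) {
        Map<String, Set<String>> parentsByLeaf = new TreeMap<>();
        for (String path : dottedPaths) {
            int dot = path.lastIndexOf('.');
            String leaf = path.substring(dot + 1);
            String parent = dot < 0 ? "<top level>" : path.substring(0, dot);
            parentsByLeaf.computeIfAbsent(leaf, k -> new TreeSet<>()).add(parent);
        }
        // Keep only leaves that live under two or more distinct parents.
        parentsByLeaf.values().removeIf(parents -> parents.size() < 2);
        return parentsByLeaf;
    }

    public static void main(String[] args) {
        List<String> paths = List.of(
                "query.concurrent_writes",
                "materialized_views.concurrent_writes",
                "materialized_views.enabled",
                "query.enable_user_defined_functions");
        // prints {concurrent_writes=[materialized_views, query]}
        System.out.println(repeatedLeaves(paths));
    }
}
```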

bq.  I have never worked on a project where I didn't ask how to configure a 
feature or a subsystem and instead wanted to look at all rate limiters together

You have never had to address database behaviour concerns that cut across 
features? 

In my stint as a database operator, most configuration was of no interest. I 
did not typically delve into feature-level configuration. What I was interested 
in is what settings I needed to set for it to operate correctly, what settings 
might affect the database performance, and what settings might affect security 
or other stability concerns. I would absolutely have preferred to see them 
presented together rather than spread across the many features I did not know 
of or understand.

bq. but if we do actually implement pluggable storage, where will this be?

The same argument can likely be applied to concurrent_reads and 
concurrent_writes; it also applies to commit log (and implicitly CDC), repair, 
streaming, hints, memtables and compaction. It applies to many of the 
guardrails too, particularly those involving tombstones (a storage-layer 
concept that not all implementations will share), and perhaps even to MVs (due 
to their special tombstones). Are we proposing to group all of these under {{storage}}?

IMO {{storage}} and {{query}} are such broad terms that almost everything can 
be justified as encompassed by them. To me this is poor API design, as the user 
has to guess what the authors were thinking, whether in this case it went under 
this heading, or that one, or if this one was important enough it got its own 
heading. Particularly if the user doesn't know a priori what the possible 
configuration options are.

was (Author: benedict):
bq.  At the moment streaming and compaction are configured separately

We have a largely flat and messy config file today, so I don't think what we do 
today is relevant. Streaming and compaction are intrinsically linked by repair 
(except in the case of bootstrap). Streaming is gated by (validation) 
compaction throughput. Where does repair configuration sit in this world? Where 
should streaming network settings sit?

You also really need to address the logical inconsistency of 
{{materialized_views.concurrent_writes}} and {{query.concurrent_writes}}, or 
{{materialized_views.enabled}} and {{query.enable_user_defined_functions}}. In 
each case we have semantically equivalent things dotted in inconsistent config 
(if {{concurrent_writes}} is a query option, so is 
{{concurrent_materialized_view_writes}}; if {{enable_user_defined_functions}} 
is a query/cql option so is {{enable_materialized_views}}).

Honestly, if we cannot come up with a _coherent_ strategy that avoids the above 
inconsistencies I prefer the grab bag of flat config we have today, just tidied 
up a bit. Nesting inconsistently is strictly worse for usability IMO.

bq.  I have never worked on a project where I didn't ask how to configure a 
feature or a subsystem and instead wanted to look at all rate limiters together

You have never had to address database behaviour concerns that cut across 
features? In my stint as a database operator, most configuration was of no 
interest. I did not typically delve into feature-level configuration. System 
settings, tuning and security are the only things I would be interested in, and 
I would absolutely have preferred to see them presented together rather than 
spread across the many features I did not know of or understand.

bq. but if we do actually implement pluggable stor

[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-05 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487476#comment-17487476
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-17292 at 2/5/22, 1:31 PM:
-

bq.  At the moment streaming and compaction are configured separately

We have a largely flat and messy config file today, so I don't think what we do 
today is relevant. Streaming and compaction are intrinsically linked by repair 
(except in the case of bootstrap). Streaming is gated by (validation) 
compaction throughput. Where does repair configuration sit in this world? Where 
should streaming network settings sit?

You also really need to address the logical inconsistency of 
{{materialized_views.concurrent_writes}} and {{query.concurrent_writes}}, or 
{{materialized_views.enabled}} and {{query.enable_user_defined_functions}}. In 
each case we have semantically equivalent things dotted in inconsistent config 
(if {{concurrent_writes}} is a query option, so is 
{{concurrent_materialized_view_writes}}; if {{enable_user_defined_functions}} 
is a query/cql option so is {{enable_materialized_views}}).

Honestly, if we cannot come up with a _coherent_ strategy that avoids the above 
inconsistencies I prefer the grab bag of flat config we have today, just tidied 
up a bit. Nesting inconsistently is strictly worse for usability IMO.

bq.  I have never worked on a project where I didn't ask how to configure a 
feature or a subsystem and instead wanted to look at all rate limiters together

You have never had to address database behaviour concerns that cut across 
features? In my stint as a database operator, most configuration was of no 
interest. I did not typically delve into feature-level configuration. System 
settings, tuning and security are the only things I would be interested in, and 
I would absolutely have preferred to see them presented together rather than 
spread across the many features I did not know of or understand.

bq. but if we do actually implement pluggable storage, where will this be?

The same argument can likely be applied to concurrent_reads and 
concurrent_writes; it also applies to commit log (and implicitly CDC), repair, 
streaming, hints, memtables and compaction. It applies to many of the 
guardrails too, particularly those involving tombstones (a storage-layer 
concept that not all implementations will share), and perhaps even to MVs (due 
to their special tombstones). Are we proposing to group all of these under {{storage}}?

IMO {{storage}} and {{query}} are such broad terms that almost everything can 
be justified as encompassed by them. To me this is poor API design, as the user 
has to guess what the authors were thinking, whether in this case it went under 
this heading, or that one, or if this one was important enough it got its own 
heading. Particularly if the user doesn't know a priori what the possible 
configuration options are.

was (Author: benedict):
bq.  At the moment streaming and compaction are configured separately

We have a largely flat and messy config file today, so I don't think what we do 
today is relevant. Streaming and compaction are intrinsically linked by repair 
(except in the case of bootstrap). Streaming is gated by compaction throughput. 
Where does repair configuration sit in this world? Where should streaming 
network configurations sit?

You also haven't addressed the clear inconsistency of 
{{materialized_views.concurrent_writes}} and {{query.concurrent_writes}}, or 
{{materialized_views.enabled}} and {{query.enable_user_defined_functions}}. In 
each case we have semantically equivalent things dotted in inconsistent config 
(if {{concurrent_writes}} is a query option, so is 
{{concurrent_materialized_view_writes}}; if {{enable_user_defined_functions}} 
is a query/cql option so is {{enable_materialized_views}}).

Honestly, if we cannot come up with a _coherent_ strategy that avoids the above 
inconsistencies I prefer the grab bag of flat config we have today, just tidied 
up a bit. Nesting inconsistently is strictly worse for usability IMO.

bq.  I have never worked on a project where I didn't ask how to configure a 
feature or a subsystem and instead wanted to look at all rate limiters together

You have never had to address database behaviour concerns that cut across 
features?

bq. but if we do actually implement pluggable storage, where will this be?

This same argument can likely be applied to concurrent_reads and 
concurrent_writes - it also applies to commit log (and implicitly CDC), repair, 
streaming, hints, memtables and compaction. Are we going to group these all 
under storage?


[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-05 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487476#comment-17487476
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-17292 at 2/5/22, 11:49 AM:
--

bq.  At the moment streaming and compaction are configured separately

We have a largely flat and messy config file today, so I don't think what we do 
today is relevant. Streaming and compaction are intrinsically linked by repair 
(except in the case of bootstrap). Streaming is gated by compaction throughput. 
Where does repair configuration sit in this world? Where should streaming 
network configurations sit?

You also haven't addressed the clear inconsistency of 
{{materialized_views.concurrent_writes}} and {{query.concurrent_writes}}, or 
{{materialized_views.enabled}} and {{query.enable_user_defined_functions}}. In 
each case we have semantically equivalent things dotted in inconsistent config 
(if {{concurrent_writes}} is a query option, so is 
{{concurrent_materialized_view_writes}}; if {{enable_user_defined_functions}} 
is a query/cql option so is {{enable_materialized_views}}).

Honestly, if we cannot come up with a _coherent_ strategy that avoids the above 
inconsistencies I prefer the grab bag of flat config we have today, just tidied 
up a bit. Nesting inconsistently is strictly worse for usability IMO.

bq.  I have never worked on a project where I didn't ask how to configure a 
feature or a subsystem and instead wanted to look at all rate limiters together

You have never had to address database behaviour concerns that cut across 
features?

bq. but if we do actually implement pluggable storage, where will this be?

This same argument can likely be applied to concurrent_reads and 
concurrent_writes - it also applies to commit log (and implicitly CDC), repair, 
streaming, hints, memtables and compaction. Are we going to group these all 
under storage?

was (Author: benedict):
bq.  At the moment streaming and compaction are configured separately

We have a largely flat and messy config file today, so I don't think what we do 
today is relevant. Streaming and compaction are intrinsically linked by repair 
(except in the case of bootstrap). Streaming is gated by compaction throughput. 
Where does repair configuration sit in this world? Where should streaming 
network configurations sit?

You also haven't addressed the clear inconsistency of 
{{materialized_views.concurrent_writes}} and {{query.concurrent_writes}}, or 
{{materialized_views.enabled}} and {{query.enable_user_defined_functions}}. In 
each case we have semantically equivalent things dotted in entirely unrelated 
config.

Honestly, if we cannot come up with a _coherent_ strategy that avoids the above 
inconsistencies I prefer the grab bag of flat config we have today, just tidied 
up a bit. Nesting inconsistently is strictly worse for usability IMO.

bq.  I have never worked on a project where I didn't ask how to configure a 
feature or a subsystem and instead wanted to look at all rate limiters together

You have never had to address database behaviour concerns that cut across 
features?

bq. but if we do actually implement pluggable storage, where will this be?

This same argument can likely be applied to concurrent_reads and 
concurrent_writes - it also applies to commit log (and implicitly CDC), repair, 
streaming, hints, memtables and compaction. Are we going to group these all 
under storage?


[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-04 Thread David Capwell (Jira)

[
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487391#comment-17487391
]

David Capwell edited comment on CASSANDRA-17292 at 2/5/22, 1:30 AM:

bq. streaming is equally as much compaction as it is network, as it also
controls the disk

Most things we do involve the disk... At the moment streaming and compaction
are configured separately, so the fact that they touch the disk doesn't mean
they should be grouped together; I don't follow your argument.

bq. If we control this under query why not also row_cache and key_cache

I can buy arguments for either "query" or "storage"; does this mean that this
type of grouping is broken? I don't see why: most configs clearly belong to a
group, and only a minority of cases are blurry (could be argued for two groups)
or have no clear group (such as cluster_name). These are outliers where we can
debate on a per-case basis; I just don't follow the argument that they
invalidate this style of grouping as a whole.

Personally, I would expect storage.row_cache, as I normally see caches
implemented at the storage layer, though in Cassandra we do this in CQL
(SinglePartitionReadCommand). But if we do actually implement pluggable
storage, where will this be? Do we even want these caches if RocksDB is the
storage backend? If the answer is no (I would think not, as RocksDB provides
its own caches), then it's clearly tied to storage, so storage.row_cache is the
ideal place.
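A hypothetical sketch of the two candidate homes for the row cache settings discussed above. The nested key names are illustrative assumptions, values are placeholders, and in practice only one of the two options would exist:

{code}
# Option A (hypothetical): caches treated as part of the storage layer
storage:
  row_cache:
    size: 0MiB
    save_period: 0s

# Option B (hypothetical): caches treated as part of query execution
query:
  row_cache:
    size: 0MiB
    save_period: 0s
{code}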

bq. back_pressure

{code}
$ grep -r back_pressure src/
src//java/org/apache/cassandra/config/Config.java:public volatile boolean back_pressure_enabled = false;
src//java/org/apache/cassandra/config/Config.java:public volatile ParameterizedClass back_pressure_strategy;
{code}

heh... dead code...

We do have network-based back pressure, and different features may be able to
inform/work with it to maintain stability, so I always saw our current one as a
network feature, but I can see other arguments. If we want to discuss where
that makes the most sense, or whether it should be its own top-level section, I
feel that's productive.
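The two placements mentioned above, sketched by nesting the existing {{back_pressure_enabled}} flag. The nested names are hypothetical, and per the grep output above the flag is currently not read anywhere:

{code}
# Option A (hypothetical): back pressure as a network feature
network:
  back_pressure:
    enabled: false

# Option B (hypothetical): back pressure as its own top-level section
back_pressure:
  enabled: false
{code}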

bq. or other query execution topics?

I believe that's my point: group the query-related topics together...

bq. Much IMO better to have e.g. enable: {user_defined_functions: true,
materialized_views: true}

I find discoverability much harder in this model. If you are asking how to
configure something, do you say "I want to walk through all limits in isolation
and provide values, then move to enable flags, then rate limiters" or do you
say "I want to configure compaction"? I have never worked on a project where I
didn't ask how to configure a feature or a subsystem and instead wanted to look
at all rate limiters together... If I want to configure the rate limiters in
compaction, I would look at the compaction configs; looking at the rate limiter
configs can be confusing, as you don't know whether the property you see is
actually related to compaction.

{code}
rate_limit:
  compaction_throughput: 10mb/s
  validation_throughput: 10mb/s
{code}

If you are looking at that and are new to Cassandra, will you think validation
is related to compaction? What about repair? What is a "validation" and why
would I put a rate limiter on it? Grouping by limits/flags/etc. loses the
context of what a property relates to, so I personally find this more confusing
than things are today.
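For comparison, here is a hypothetical feature-grouped version of the same two limits, which keeps each limit next to the feature it throttles. The nesting is an illustrative assumption; the values are copied from the example above:

{code}
# Hypothetical: each limit lives with the feature it belongs to
compaction:
  throughput: 10mb/s
repair:
  validation:
    throughput: 10mb/s
{code}

Since validation compactions run as part of repair (to build Merkle trees), placing that limit under {{repair}} gives a new reader the missing context directly.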


[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-04 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487383#comment-17487383
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-17292 at 2/5/22, 12:44 AM:
--

Also, having UDFs enabled/disabled inside {{query}} but a separate 
{{materialized_view}} heading, despite this being an equivalent language-level 
feature, is super inconsistent.

Much better IMO to have, e.g.:

{code}
enable:
  user_defined_functions: true
  materialized_views: true
  ...
{code}

This also helps the user find feature options and names. Like {{limits}}, it is 
much more discoverable.
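For contrast, the inconsistency described above, with one language-level feature toggled under {{query}} and the other under its own heading, looks roughly like this. It is a hypothetical sketch; the key names are illustrative assumptions:

{code}
# Hypothetical: the layout being criticized
query:
  user_defined_functions:
    enabled: true
materialized_views:
  enabled: true
{code}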






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

2022-02-04 Thread David Capwell (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487318#comment-17487318
 ] 

David Capwell edited comment on CASSANDRA-17292 at 2/4/22, 10:11 PM:
-

For track warnings, I don't mind marking the field {{transient}}, disabling it 
in the config layer and only exposing it via JMX. I'd rather flesh this out and 
defer exposing track_warnings via configs than release with a config we plan 
to rename in the next release...

This ticket is to get agreement on what the structure should look like, NOT to 
move all configs to this structure... Once we agree on the end goal, we can 
refactor track_warnings (I +1 Caleb's proposal; 
query.local_read_size.abort_threshold is what I strongly prefer. Grouping by 
feature makes the most sense to me, and it is how we mostly name our configs 
already: commitlog_directory vs commitlog.directory, commitlog_total_space vs 
commitlog.total_space... we already prefix configs with the feature...)
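A side-by-side sketch of the naming point above: the flat keys already carry the feature as a prefix, so nesting mostly turns the underscore into a level. The nested form is hypothetical and the values are illustrative placeholders:

{code}
# Flat, as the keys are named today
commitlog_directory: /var/lib/cassandra/commitlog
commitlog_total_space: 8192MiB

# Hypothetical nested equivalent
commitlog:
  directory: /var/lib/cassandra/commitlog
  total_space: 8192MiB
{code}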

I am OK with Guardrails choosing to go flat for the time being to unblock it...


