[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts

David Capwell (Jira) Fri, 04 Feb 2022 17:31:39 -0800


    [ https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487391#comment-17487391 ]


David Capwell edited comment on CASSANDRA-17292 at 2/5/22, 1:30 AM:
--------------------------------------------------------------------

bq. streaming is equally as much compaction as it is network, as it also 
controls the disk

Most things we do involve the disk. At the moment streaming and compaction are
configured separately, so the fact that both touch the disk doesn't mean they
should be grouped together; I don't follow that argument.

bq. If we control this under query why not also row_cache and key_cache

I can buy arguments for either "query" or "storage", but does that mean this
type of grouping is broken? I don't see why. Most configs clearly belong to a
group; the minority are either blurry (they can be argued into two groups) or
have no clear group at all (such as cluster_name). Those are outliers where we
can debate on a case-by-case basis; I just don't follow the argument that they
invalidate this style of grouping as a whole.
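
For illustration only, a hypothetical grouped layout (the nested key names and
values are placeholders, not existing cassandra.yaml settings): streaming and
compaction keep separate groups even though both touch the disk, and a setting
with no clear group, such as cluster_name, simply stays at the top level.

{code}
# Hypothetical layout: nested key names and values are illustrative only.
cluster_name: 'Test Cluster'   # no clear group, so it stays top-level

compaction:
  throughput: 10mb/s           # placeholder value
streaming:
  throughput: 10mb/s           # placeholder value
{code}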

Personally I would expect storage.row_cache, as I normally see caches
implemented at the storage layer, but in Cassandra we do this in the CQL layer
(SinglePartitionReadCommand). If we do actually implement pluggable storage,
where will this live? Do we even want these caches if RocksDB is the storage
backend? If the answer is no (I would think not, as RocksDB provides its own
caches), then it's clearly tied to storage, so storage.row_cache is the best
place for it.
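
A hypothetical sketch of that placement (the nested keys and values are
illustrative placeholders; today the cache settings are flat top-level keys):

{code}
# Hypothetical: caches grouped under storage, so a pluggable storage engine
# that brings its own caches (e.g. RocksDB) could own or ignore this group.
storage:
  row_cache:
    size: 0mb        # placeholder value
  key_cache:
    size: 0mb        # placeholder value
{code}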

bq. back_pressure

{code}
$ grep -r back_pressure src/
src//java/org/apache/cassandra/config/Config.java:    public volatile boolean back_pressure_enabled = false;
src//java/org/apache/cassandra/config/Config.java:    public volatile ParameterizedClass back_pressure_strategy;
{code}

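Heh... dead code: the only matches are the two field declarations in
Config.java, so nothing else in src/ references them.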

We do have network-based back pressure, and different features may be able to
inform or work with it to maintain stability, so I have always seen our current
one as a network feature, but I can see other arguments. If we want to discuss
where it makes the most sense, or whether it should be its own top-level group,
I think that's a productive discussion.
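
For comparison, the two placements under discussion might look like this
(hypothetical keys loosely mirroring the back_pressure_enabled field in
Config.java; not a proposed spec):

{code}
# Option A (hypothetical): back pressure as a network feature
network:
  back_pressure:
    enabled: false

# Option B (hypothetical): back pressure as its own top-level group
back_pressure:
  enabled: false
{code}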

bq. or other query execution topics? 

I believe that's my point: group the query-related topics together.

bq. Much IMO better to have e.g. [enable: {user_defined_functions: true, 
materialized_views: true}

I find discoverability much harder in this model. If you are asking how to
configure something, do you say "I want to walk through all limits in isolation
and provide values, then move to enable flags, then rate limiters", or do you
say "I want to configure compaction"? I have never worked on a project where,
instead of asking how to configure a feature or a subsystem, I wanted to look at
all rate limiters together. If I want to configure the rate limiters in
compaction, I would look at the compaction configs; looking at the rate limiter
configs instead can be confusing, as you don't know whether the property you see
is actually related to compaction:

{code}
rate_limit:
  compaction_throughput: 10mb/s
  validation_throughput: 10mb/s
{code}

If you are looking at that and are new to Cassandra, will you think validation
is related to compaction? What about repair? What is a "validation", and why
would I put a rate limiter on it? Grouping based on limits/flags/etc. loses the
context of what a property relates to, so I personally find this more confusing
than things are today.
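
For contrast, a hypothetical subsystem-oriented layout of the same two settings
(key names are illustrative only). Validation is the compaction that repair runs
to build Merkle trees, so placing its limit under repair keeps that context
visible.

{code}
# Hypothetical: each rate limit lives with the subsystem it throttles.
compaction:
  throughput: 10mb/s
repair:
  validation:
    throughput: 10mb/s
{code}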


> Move cassandra.yaml toward a nested structure around major database concepts
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17292
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17292
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 5.x
>
>
> Recent mailing list conversation (see "[DISCUSS] Nested YAML configs for new 
> features") has made it clear we will gravitate toward appropriately nested 
> structures for new parameters in {{cassandra.yaml}}, but from the scattered 
> conversation across a few Guardrails tickets (see CASSANDRA-17212 and 
> CASSANDRA-17148) and CASSANDRA-15234, there is also a general desire to 
> eventually extend this to the rest of {{cassandra.yaml}}. The benefits of 
> this change include those we gain by doing it for new features (single point 
> of interest for feature documentation, typed configuration objects, logical 
> grouping for additional parameters added over time, discoverability, etc.), 
> but on a larger scale.
> This may overlap with ongoing work, including the Guardrails epic. Ideally, 
> even a rough cut of a design here would allow that to move forward in a 
> timely and coherent manner (with less long-term refactoring pain).
> While these would have to be adjusted to CASSANDRA-15234 (probably after it 
> merges), there have been two proposals floated already for what this might 
> look like:
> From [~maedhroz] - 
> https://github.com/maedhroz/cassandra/commit/49e83c70eba3357978d1081ecf500bbbdee960d8
> From [~benedict] - 
> https://github.com/belliottsmith/cassandra/commits/CASSANDRA-15234-grouping-ideas



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

