[ 
https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487476#comment-17487476
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-17292 at 2/5/22, 1:33 PM:
-----------------------------------------------------------------------------

bq.  At the moment streaming and compaction are configured separately

We have a largely flat and messy config file today, so I don't think what we do 
today is relevant. Streaming and compaction are intrinsically linked by repair 
(except in the case of bootstrap). Streaming is gated by (validation) 
compaction throughput. Where does repair configuration sit in this world? Where 
should streaming network settings sit?

You also really need to address the logical inconsistency of 
{{materialized_views.concurrent_writes}} and {{query.concurrent_writes}}, or 
{{materialized_views.enabled}} and {{query.enable_user_defined_functions}}. In
each case, semantically equivalent settings are scattered under inconsistent
headings (if {{concurrent_writes}} is a query option, so is
{{concurrent_materialized_view_writes}}; if {{enable_user_defined_functions}}
is a query/CQL option, so is {{enable_materialized_views}}).
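To make the inconsistency concrete, the first YAML document below uses the nested names quoted above. The layout around them and all values are illustrative assumptions. The second document shows the same four settings as flat keys, named as in the 4.0 cassandra.yaml, where each pair at least sits at the same level:

```yaml
# Nested names under discussion (surrounding layout and values are illustrative).
# Each semantically equivalent pair is split across two headings.
query:
  concurrent_writes: 32                # concurrency limit under "query"
  enable_user_defined_functions: false # feature toggle under "query"
materialized_views:
  concurrent_writes: 32                # its MV counterpart under another heading
  enabled: false                       # and so is the MV feature toggle
---
# The same four settings as flat keys (4.0 names; values illustrative).
concurrent_writes: 32
concurrent_materialized_view_writes: 32
enable_user_defined_functions: false
enable_materialized_views: false
```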

Honestly, if we cannot come up with a _coherent_ strategy that avoids the above
inconsistencies, I would prefer the grab bag of flat config we have today, just
tidied up a bit. Nesting inconsistently is strictly worse for usability IMO.

bq.  I have never worked on a project where I didn't ask how to configure a 
feature or a subsystem and instead wanted to look at all rate limiters together

You have never had to address database behaviour concerns that cut across 
features? 

In my stint as a database operator, most configuration was of no interest. I
did not typically delve into feature-level configuration. What I cared about
was which settings I needed for the database to operate correctly, which
settings might affect its performance, and which might affect security or
stability. I would absolutely have preferred to see those presented together
rather than spread across the many features I did not know of or understand.
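One way to picture this operator's view is a hypothetical layout that groups settings by the concern an operator cares about rather than by feature. This is not a proposal from the ticket. The group headings are invented for illustration, the leaf keys are flat keys from the 4.0 cassandra.yaml, and the values are illustrative:

```yaml
# Hypothetical concern-oriented grouping; headings invented, leaf keys from 4.0.
performance:                 # settings that may affect database performance
  concurrent_reads: 32
  concurrent_writes: 32
  compaction_throughput_mb_per_sec: 64
security:                    # settings that may affect security
  authenticator: PasswordAuthenticator
  authorizer: CassandraAuthorizer
stability:                   # settings that guard against operational trouble
  tombstone_warn_threshold: 1000
  tombstone_failure_threshold: 100000
```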

bq. but if we do actually implement pluggable storage, where will this be?

This same argument can likely be applied to {{concurrent_reads}} and
{{concurrent_writes}}; it also applies to the commit log (and implicitly CDC),
repair, streaming, hints, memtables and compaction. It even applies to many of
the guardrails, particularly those involving tombstones (a storage-layer
concept that not all implementations will share), and perhaps even to MVs (due
to their special tombstones). Are we proposing to group all of these under
{{storage}}?
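Taken to its conclusion, the pluggable-storage argument produces something like the hypothetical block below. Every nested key name is an illustrative assumption, and `{}` stands in for each subsystem's settings. Almost nothing is left outside `storage`:

```yaml
# Hypothetical: everything a pluggable storage engine might replace, under one heading.
storage:
  concurrent_reads: 32
  concurrent_writes: 32
  commitlog: {}
  cdc: {}                    # implicitly tied to the commit log
  repair: {}
  streaming: {}
  hints: {}
  memtable: {}
  compaction: {}
  guardrails:
    tombstones: {}           # tombstones are a storage-layer concept
  materialized_views: {}     # perhaps, given their special tombstones
```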

IMO {{storage}} and {{query}} are such broad terms that almost anything can be
justified as falling under them. To me this is poor API design: the user has to
guess what the authors were thinking, whether a given setting went under this
heading or that one, or whether it was important enough to get its own heading.
This is especially hard if the user doesn't know a priori what the possible
configuration options are.
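As a hypothetical illustration of that guessing game, here are three YAML documents, each a defensible home for the same real 4.0 key, `tombstone_failure_threshold`. The headings are assumptions, not names from any proposal:

```yaml
# Hypothetical: which broad heading did the authors choose?
storage:
  tombstone_failure_threshold: 100000   # tombstones are a storage concept...
---
query:
  tombstone_failure_threshold: 100000   # ...but the limit applies to a single read query...
---
guardrails:
  tombstone_failure_threshold: 100000   # ...or is it important enough for its own heading?
```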






> Move cassandra.yaml toward a nested structure around major database concepts
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17292
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17292
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 5.x
>
>
> Recent mailing list conversation (see "[DISCUSS] Nested YAML configs for new 
> features") has made it clear we will gravitate toward appropriately nested 
> structures for new parameters in {{cassandra.yaml}}, but from the scattered 
> conversation across a few Guardrails tickets (see CASSANDRA-17212 and 
> CASSANDRA-17148) and CASSANDRA-15234, there is also a general desire to 
> eventually extend this to the rest of {{cassandra.yaml}}. The benefits of 
> this change include those we gain by doing it for new features (single point 
> of interest for feature documentation, typed configuration objects, logical 
> grouping for additional parameters added over time, discoverability, etc.), 
> but on a larger scale.
> This may overlap with ongoing work, including the Guardrails epic. Ideally, 
> even a rough cut of a design here would allow that to move forward in a 
> timely and coherent manner (with less long-term refactoring pain).
> While these would have to be adjusted to CASSANDRA-15234 (probably after it 
> merges), there have been two proposals floated already for what this might 
> look like:
> From [~maedhroz] - 
> https://github.com/maedhroz/cassandra/commit/49e83c70eba3357978d1081ecf500bbbdee960d8
> From [~benedict] - 
> https://github.com/belliottsmith/cassandra/commits/CASSANDRA-15234-grouping-ideas



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
