Re: [DISCUSS] Nested YAML configs for new features

David Capwell Mon, 29 Nov 2021 11:59:12 -0800

Thanks, everyone, for the comments. I hope the list below is a fair summary
of the talking points:

* We already use nested configs (networking, seed provider, commit log/hint
  compression, back pressure, etc.).
* Flat configs are easier to grep, but nested configs can be searched with
  grep -A/-B and/or yq.
* It would be possible to support flat versions of our configs in
  cassandra.yaml (in addition to the nested versions).
* The "Settings" vtable currently uses the "_" separator (for example,
  encryption/audit log).  Switching to "." would be a change in behavior
  which may impact some users.
* The "." separator for nested configs is common in other systems (yq,
  Elasticsearch, etc.).
* "Structured / nested config is easier for human eyes to read"... "Flat
  config is harder for human eyes but easy for simple scripts".
* For learning which configs are enabled, cassandra.yaml isn't the best
  interface, as it may not reflect the actual configs; we can better expose
  this in CQL and/or Sidecar.
* What should our default example cassandra.yaml file use (flat or nested)?
  Currently the default shows nested.
* When projecting the Config into CQL, we may want to consider UDTs to
  represent the complex types.
* Current limitations in CQL make nested structures hard to work with; it may
  be worthwhile to expand CQL support for nested structures.

I also took a quick stab at enhancing our cassandra.yaml logic to: 1) be 
reusable outside of YAML parsing, 2) support setters (we currently do, but 
setters must be snake case; I fixed that), 3) support both nested and 
flat, 4) support ignoring fields in a consistent way (the Settings vtable 
will include things SnakeYAML won’t, and vice versa).

https://github.com/apache/cassandra/pull/1335

This PR is NOT final or ready to merge; it is a POC to show how we can solve 
a lot of the core problems in a consistent and reusable manner.

The following cassandra.yaml was used to show that both worlds work fine in 
the config (and complement each other):

track_warnings:
  enabled: true
  # nested relative to the local level (TrackWarnings)
  coordinator_read_size.warn_threshold_kb: 1024
  local_read_size.abort_threshold_kb: 1024
  row_index_size:
    warn_threshold_kb: 1024
    abort_threshold_kb: 1024
# nested relative to the top level
track_warnings.coordinator_read_size.abort_threshold_kb: 42
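
To make the intent of the example concrete, here is a minimal Python sketch of
how mixed dotted and nested keys could be folded into one nested structure.
It is an illustration only, not the logic in the PR: the function names are
hypothetical, the input is a plain dict standing in for the parsed YAML, and
"later scalar wins on conflict" is an assumption.

```python
def merge_into(target, source):
    """Recursively merge source into target (later values win on conflict)."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge_into(target[key], value)
        else:
            target[key] = value


def expand_dotted_keys(config):
    """Return a fully nested dict from a mapping whose keys may contain dots."""
    result = {}
    for key, value in config.items():
        if isinstance(value, dict):
            # Dotted keys are resolved relative to the level they appear at.
            value = expand_dotted_keys(value)
        # Wrap the value in one dict per dotted part, innermost part first.
        for part in reversed(key.split(".")):
            value = {part: value}
        merge_into(result, value)
    return result


# Plain-dict stand-in for the parsed cassandra.yaml example above.
parsed = {
    "track_warnings": {
        "enabled": True,
        "coordinator_read_size.warn_threshold_kb": 1024,
        "local_read_size.abort_threshold_kb": 1024,
        "row_index_size": {
            "warn_threshold_kb": 1024,
            "abort_threshold_kb": 1024,
        },
    },
    "track_warnings.coordinator_read_size.abort_threshold_kb": 42,
}

expanded = expand_dotted_keys(parsed)
```

With this sketch, expanded["track_warnings"]["coordinator_read_size"] ends up
holding both warn_threshold_kb (1024, from the nested block) and
abort_threshold_kb (42, from the top-level dotted key).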

For the “Settings” vtable, a new Loader interface was added to get all the 
properties, and Properties.flatten turns every property into its “flattened” 
version, so that each resulting property is either a scalar (isPrimitive or 
not hasSubProperties) or a collection (isCollection).  This doesn’t solve 100% 
of the issues the vtable has (types such as Duration are scalar but still need 
a String -> Duration translation), and it doesn’t solve the fact that the 
table currently uses “_”.
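
As a rough illustration of the flattening idea (not the Properties.flatten
code from the PR), the Python sketch below walks a nested dict and emits one
entry per leaf, where a leaf is anything without sub-properties: a scalar, a
collection, or an empty mapping. The flatten function, its separator
parameter, and the sample values are assumptions made for this example.

```python
def flatten(properties, separator=".", prefix=""):
    """Flatten nested mappings into {joined_path: leaf_value}."""
    flat = {}
    for name, value in properties.items():
        path = f"{prefix}{separator}{name}" if prefix else name
        if isinstance(value, dict) and value:
            # Has sub-properties: keep descending.
            flat.update(flatten(value, separator, path))
        else:
            # Scalar, collection, or empty mapping: emit as a leaf.
            flat[path] = value
    return flat


# Illustrative values only.
config = {
    "track_warnings": {
        "enabled": True,
        "row_index_size": {"warn_threshold_kb": 1024},
    },
    "data_file_directories": ["/var/lib/cassandra/data"],
}

flat = flatten(config)
flat_underscore = flatten(config, separator="_")
```

Here flat contains the key "track_warnings.row_index_size.warn_threshold_kb",
while flat_underscore contains "track_warnings_row_index_size_warn_threshold_kb",
which is the shape of name the table produces today with “_”.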

> On Nov 29, 2021, at 10:11 AM, bened...@apache.org wrote:
> 
> I meant to imply we should improve our UDT usability to support this kind of 
> querying, essentially – but that if we support a simple text->property setup 
> we might want to offer LIKE support so we can search them (via simple 
> filtering, not any index) – which is actually pretty easy to provide.
> 
> I think we should aim to provide users all the facilities they need to 
> interact with config via vtables. If the user requires external tooling, it 
> suggests a weakness in CQL that we should address, and maybe help the user in 
> other scenarios too…
> 
> From: Joseph Lynch <joe.e.ly...@gmail.com>
> Date: Monday, 29 November 2021 at 17:32
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> On Mon, Nov 29, 2021 at 11:51 AM bened...@apache.org
> <bened...@apache.org> wrote:
>> 
>> Maybe we can make our query language more expressive 😊
>> 
>> We might anyway want to introduce e.g. a LIKE filtering option to 
>> find/discover flattened config parameters?
> 
> This sounds more complicated than just having the settings virtual
> table return text (dot encoded) -> text (json) and probably not even
> that much more useful. A full table scan on the settings table could
> return all top level keys (strings before the first dot) and if we
> just return a valid json string then users can bring their own
> querying capabilities via jq [1], or one line of code in almost any
> programming language (especially python, perl, etc ...).
> 
> Alternatively if we want to modify the grammar it seems supporting
> structured data querying on text fields would maybe be more preferable
> to LIKE since you could get what you want without a grammar change and
> if we could generalize to any text column it would be amazingly useful
> elsewhere to users. For example, we could emulate jq's query syntax in
> the select which is, imo, best-in-class for quickly querying into
> nested structures. Assuming a key (text) -> value (json) schema:
> 
> 'a' -> "{'b': [{'c': {'d': 4}}]}",
> 
> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
> 
> To have exactly jq syntax (but harder to parse) it would be:
> 
> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
> 
> Since we're not indexing the structured data in any way, filtering
> before selection probably doesn't give us much performance improvement
> as we'd still have to parse the whole text field in most cases.
> 
> -Joey
> 
> [1] https://stedolan.github.io/jq/
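
For reference, the client-side alternative Joey describes (the vtable returns
text -> JSON text and the user does the path lookup) could look roughly like
the Python sketch below. The query_json helper is hypothetical, the settings
dict stands in for rows read from the table, and the value uses double quotes
so that it is valid JSON.

```python
import json
import re


def query_json(text, path):
    """Look up a path such as 'b.0.c.d' or 'b[0].c.d' in a JSON string."""
    node = json.loads(text)
    # Split on dots and brackets so both spellings yield: b, 0, c, d.
    for part in re.findall(r"[^.\[\]]+", path):
        if isinstance(node, list):
            node = node[int(part)]  # numeric step into an array
        else:
            node = node[part]       # named step into an object
    return node


# Stand-in for: SELECT value FROM settings WHERE key = 'a';
settings = {"a": '{"b": [{"c": {"d": 4}}]}'}

dotted = query_json(settings["a"], "b.0.c.d")
jq_style = query_json(settings["a"], "b[0].c.d")
```

Both lookups return 4 for this row; the whole text field is still parsed on
every call, which matches the point about filtering not saving much work.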
