Re: [DISCUSS] Nested YAML configs for new features

Benjamin Lerer Mon, 29 Nov 2021 08:59:32 -0800

>
> We might anyway want to introduce e.g. a LIKE filtering option to
> find/discover flattened config parameters?



+100
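
The LIKE-style discovery of flattened parameters quoted above could look roughly like the sketch below. It is only an illustration under stated assumptions: the settings live in a plain in-memory dict rather than a real virtual table, `like_to_regex` and `select_like` are hypothetical helper names, and SQL-style wildcards are assumed (`%` for any run of characters, `_` for exactly one), which may not be what CQL ends up supporting.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL-style LIKE pattern into a compiled regex.

    Assumption: '%' matches any run of characters and '_' matches exactly
    one character; every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def select_like(settings: dict, pattern: str) -> dict:
    """Return the settings whose flattened name matches the LIKE pattern."""
    rx = like_to_regex(pattern)
    return {name: value for name, value in settings.items() if rx.fullmatch(name)}


# Hypothetical flattened settings; names and values are illustrative only.
SETTINGS = {
    "track_warnings.enabled": "true",
    "track_warnings.coordinator_read_size.warn_threshold": "10kb",
    "track_warnings.coordinator_read_size.abort_threshold": "1mb",
    "track_warnings.row_index_size.abort_threshold": "1gb",
    "hinted_handoff_enabled": "true",
}

# Select every abort threshold under track_warnings, whatever the middle key is.
abort_thresholds = select_like(SETTINGS, "track_warnings.%.abort_threshold")
```

One consequence of SQL-style wildcards is that `_` in a key name also acts as a single-character wildcard, so a pattern such as `track_warnings.%` matches slightly more than its literal spelling suggests.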

On Mon, 29 Nov 2021 at 17:51, [email protected] <[email protected]> wrote:

> Maybe we can make our query language more expressive 😊
>
> We might anyway want to introduce e.g. a LIKE filtering option to
> find/discover flattened config parameters?
>
> From: Benjamin Lerer <[email protected]>
> Date: Monday, 29 November 2021 at 16:41
> To: [email protected] <[email protected]>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >
> > I don’t think it’s necessarily a requirement that we use the flattened
> > version in vtables. At the very least we can make use of sets, lists,
> > etc. But we can probably also use UDTs if this improves clarity.
>
>
> In my opinion part of the issue is on the query side. How do we select a
> nested set or a specific set easily? UDTs are not great for this type of
> query. For collections we can use CONTAINS and element or range selection,
> but insertion might be the problem.
>
> On Mon, 29 Nov 2021 at 17:23, Bowen Song <[email protected]> wrote:
>
> > In ElasticSearch, the default is a flattened format with almost all
> > lines commented out. See
> > https://github.com/elastic/elasticsearch/blob/master/distribution/src/config/elasticsearch.yml
> >
> > I guess they chose to do that because users can uncomment individual
> > lines to make changes. In a structured config file, the user has to
> > uncomment all lines containing the parent keys to get it to work. For
> > example, if someone wants to set the config keyABB to a non-default
> > value, they have to correctly uncomment 3 lines: keyA, keyAB and
> > keyABB, which is annoying and makes it easy to make a mistake. If either
> > of the first two keys is not uncommented, the YAML file will still be
> > valid but a config like keyX.keyAB.keyABB might just get silently
> > ignored by the database.
> >
> >     keyX:
> >        keyY:
> >          keyZ: value
> >     # keyA:
> >     #   keyAA:
> >     #     keyAAA: value
> >     #   keyAB:
> >     #     keyABA: value
> >     #     keyABB: value
> >
> > On 29/11/2021 15:54, Benjamin Lerer wrote:
> > > I do not think that supporting both options is an issue. The settings
> > > virtual table would have to use the flattened version.
> > > If we support both formats, the question would be: what should be the
> > > one used by default in the configuration file?
> > >
> > > On Fri, 26 Nov 2021 at 15:40, [email protected] <[email protected]> wrote:
> > >
> > >> This is the approach I favour for config files also. We had a much
> > >> less engaged discussion on this topic only a few months ago, so glad
> > >> to see more people getting involved now.
> > >>
> > >> I would however personally prefer to see the configuration file slowly
> > >> deprecated (if perhaps never retired), in favour of virtual tables, so
> > >> that operators may easily set configurations for the entire cluster.
> > >> Ideally it would be possible to specify configuration per cluster, per
> > >> DC and per node, with the most specific configuration applying. I would
> > >> like to see a similar hierarchy for Keyspace, Table and Per-Query
> > >> options. Ideally only the barest minimum number of options would be
> > >> necessary to supply in a config file, and only on first launch – seed
> > >> nodes, for instance.
> > >>
> > >> So whatever design we employ here, we should IMO be aiming for it to
> > >> be compatible with a CQL representation also.
> > >>
> > >>
> > >> From: Bowen Song <[email protected]>
> > >> Date: Wednesday, 24 November 2021 at 18:15
> > >> To: [email protected] <[email protected]>
> > >> Subject: Re: [DISCUSS] Nested YAML configs for new features
> > >>
> > >> Since you mentioned ElasticSearch, I'm actually pretty happy with
> > >> their config file syntax. It allows the user to completely flatten out
> > >> the entire config file. To give people who aren't familiar with
> > >> ElasticSearch an idea, here is a config file we use:
> > >>
> > >>      cluster.name: foobar
> > >>
> > >>      node.remote_cluster_client: false
> > >>      node.name: "foo.example.com"
> > >>      node.master: true
> > >>      node.data: true
> > >>      node.ingest: true
> > >>      node.ml: false
> > >>
> > >>      xpack.ml.enabled: false
> > >>      xpack.security.enabled: false
> > >>      xpack.security.audit.enabled: false
> > >>      xpack.watcher.enabled: false
> > >>
> > >>      action.auto_create_index: "+.,-*"
> > >>
> > >>      network.host: _global_
> > >>
> > >>      discovery.zen.hosts_provider: file
> > >>      discovery.zen.minimum_master_nodes: 2
> > >>
> > >>      http.publish_host: "foo.example.com"
> > >>      http.publish_port: 443
> > >>      http.bind_host: 127.0.0.1
> > >>
> > >>      transport.publish_host: "bar.example.com"
> > >>      transport.bind_host: 0.0.0.0
> > >>
> > >>      indices.fielddata.cache.size: 1GB
> > >>      indices.breaker.total.use_real_memory: false
> > >>
> > >>      path.logs: /var/log/elasticsearch
> > >>      path.data: /var/lib/elasticsearch/data
> > >>
> > >> As you can see we can use the flat (grep-able) syntax for everything.
> > >> This is also human readable because we can group options together by
> > >> inserting empty lines between them.
> > >>
> > >> The equivalent of the above in a structured syntax will be:
> > >>
> > >>      cluster:
> > >>           name: foobar
> > >>
> > >>      node:
> > >>           remote_cluster_client: false
> > >>           name: "foo.example.com"
> > >>           master: true
> > >>           data: true
> > >>           ingest: true
> > >>           ml: false
> > >>
> > >>      xpack:
> > >>           ml:
> > >>               enabled: false
> > >>           security:
> > >>               enabled: false
> > >>               audit:
> > >>                   enabled: false
> > >>           watcher:
> > >>               enabled: false
> > >>
> > >>      action:
> > >>           auto_create_index: "+.,-*"
> > >>
> > >>      network:
> > >>           host: _global_
> > >>
> > >>      discovery:
> > >>           zen:
> > >>               hosts_provider: file
> > >>               minimum_master_nodes: 2
> > >>
> > >>      http:
> > >>           publish_host: "foo.example.com"
> > >>           publish_port: 443
> > >>           bind_host: 127.0.0.1
> > >>
> > >>      transport:
> > >>           publish_host: "bar.example.com"
> > >>           bind_host: 0.0.0.0
> > >>
> > >>      indices:
> > >>           fielddata:
> > >>               cache:
> > >>                   size: 1GB
> > >>           breaker:
> > >>               total:
> > >>                   use_real_memory: false
> > >>
> > >>      path:
> > >>           logs: /var/log/elasticsearch
> > >>           data: /var/lib/elasticsearch/data
> > >>
> > >> This may be easier to read for some people, but it is a total
> > >> nightmare for "grep" - so many keys have identical names, such as
> > >> "enabled".
> > >>
> > >> Also, for the virtual tables, it would be a lot easier to represent
> > >> individual values in a virtual table when the config is flat and keys
> > >> are unique. The virtual tables would need to either support the
> > >> encoding and decoding of the structured config into a flat structure,
> > >> or use a JSON encoded string value. The use of JSON would make querying
> > >> individual values much harder.
> > >>
> > >> On 22/11/2021 16:16, Joseph Lynch wrote:
> > >>> Isn't one of the primary reasons to have a YAML configuration instead
> > >>> of a properties file to allow typed and structured (implies nested)
> > >>> configuration? I think it makes a lot of sense to group related
> > >>> configuration options (e.g. a feature) into a typed class when we're
> > >>> talking about more than one or two related options.
> > >>>
> > >>> It's pretty standard elsewhere in the JVM ecosystem to encode YAMLs
> > >>> to period encoded key->value pairs when required (usually when
> > >>> providing a property or override layer); Spring and Elasticsearch
> > >>> yamls both come to mind. It seems pretty reasonable to support dot
> > >>> encoding and decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.
> > >>>
> > >>> Regarding quickly telling what configuration a node is running, I
> > >>> think we should lean on virtual tables for "what is the current
> > >>> configuration" now that we have them; as others have said, the written
> > >>> cassandra.yaml is not necessarily the current configuration ... and
> > >>> also grep -C or -A exist for this reason.
> > >>>
> > >>> -Joey
> > >>>
> > >>> On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer <[email protected]> wrote:
> > >>>> I do not have a strong opinion for one or the other but wanted to
> > >>>> raise the issue I see with the "Settings" virtual table.
> > >>>>
> > >>>> Currently the "Settings" virtual table converts nested options into
> > >>>> flat options using a "_" separator. For those options it allows a
> > >>>> user to query the whole set of options through some hack.
> > >>>> If we decide to move to more nesting (more than one level), it seems
> > >>>> to me that we need to change the way this table behaves and how we
> > >>>> can query its data.
> > >>>>
> > >>>> We would need to start using "." as a nesting separator to ensure
> > >>>> that things are consistent between the configuration and the table,
> > >>>> and add support for LIKE restrictions for filtering queries so that
> > >>>> operators can select the precise set of settings they are looking
> > >>>> for.
> > >>>>
> > >>>> Doing so is not really complicated in itself but might impact some
> > >>>> users.
> > >>>> On Fri, 19 Nov 2021 at 22:39, David Capwell <[email protected]> wrote:
> > >>>>
> > >>>>>> it is really handy to grep
> > >>>>>> cassandra.yaml on some config key and you know the value instantly.
> > >>>>> You can still do that
> > >>>>>
> > >>>>> $ grep -A2 coordinator_read_size conf/cassandra.yaml
> > >>>>> #     coordinator_read_size:
> > >>>>> #         warn_threshold_kb: 0
> > >>>>> #         abort_threshold_kb: 0
> > >>>>>
> > >>>>> I was also arguing we should support nested and flat, so if your
> > >>>>> infra works better with flat then you could use
> > >>>>>
> > >>>>> track_warnings.coordinator_read_size.warn_threshold_kb: 0
> > >>>>> track_warnings.coordinator_read_size.abort_threshold_kb: 0
> > >>>>>
> > >>>>>> On Nov 19, 2021, at 1:34 PM, David Capwell <[email protected]> wrote:
> > >>>>>>> With the flat structure it turns into properties file - would it
> > >>>>>>> be possible to support both formats - nested yaml and flat
> > >>>>>>> properties?
> > >>>>>> For the majority of our configs yes, but there is a subset where
> > >>>>>> flat properties are annoying:
> > >>>>>>
> > >>>>>> hinted_handoff_disabled_datacenters - set type, so you could do
> > >>>>>> hinted_handoff_disabled_datacenters="a,b,c,d" but we would need to
> > >>>>>> deal with separators as the format doesn’t support them
> > >>>>>>
> > >>>>>> seed_provider.parameters - this is a map type… so we would need to
> > >>>>>> do something like seed_provider.parameters="{\"a\": \"a\"}" …. Maybe
> > >>>>>> we special case maps as dynamic fields?  Then
> > >>>>>> seed_provider.parameters.a=a?  We have ParameterizedClass all over
> > >>>>>> the code
> > >>>>>>
> > >>>>>> So, as long as we define how to deal with java collections, we
> > >>>>>> could in theory support properties files (not arguing for that in
> > >>>>>> this thread) as well as system properties.
> > >>>>>>> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <[email protected]> wrote:
> > >>>>>>> With the flat structure it turns into properties file - would it
> > >>>>>>> be possible to support both formats - nested yaml and flat
> > >>>>>>> properties?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> - - -- --- ----- -------- -------------
> > >>>>>>> Jacek Lewandowski
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>>> If it's nested, "track_warnings" would still work if you're
> > >>>>>>>> grepping around in vim or less.
> > >>>>>>>>
> > >>>>>>>> I'd have to concede the point about grep output, although there
> > >>>>>>>> are tools like https://github.com/kislyuk/yq that could probably
> > >>>>>>>> be bent to do what you want.
> > >>>>>>>>
> > >>>>>>>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <[email protected]> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi David,
> > >>>>>>>>>
> > >>>>>>>>> while I do not oppose nested structure, it is really handy to
> > >>>>>>>>> grep cassandra.yaml on some config key and you know the value
> > >>>>>>>>> instantly. This is not possible (easily and quickly) when it is
> > >>>>>>>>> nested, as it is on two lines. Or maybe my grepping is just not
> > >>>>>>>>> advanced enough to cover this case? If it is flat, I can just
> > >>>>>>>>> grep "track_warnings" and I have them all.
> > >>>>>>>>>
> > >>>>>>>>> Can you elaborate on your last bullet point? Parsing layer ...
> > >>>>>>>>> What do you mean specifically?
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>>
> > >>>>>>>>> On Fri, 19 Nov 2021 at 19:36, David Capwell <[email protected]> wrote:
> > >>>>>>>>>> This has been brought up in a few tickets, so pushing to the
> > >>>>>>>>>> dev list.
> > >>>>>>>>>> CASSANDRA-15234 - Standardise config and JVM parameters
> > >>>>>>>>>> CASSANDRA-16896 - hard/soft limits for queries
> > >>>>>>>>>> CASSANDRA-17147 - Guardrails prototype
> > >>>>>>>>>>
> > >>>>>>>>>> In short, do we as a project wish to move "new features" into
> > >>>>>>>>>> nested YAML when the feature has "enough" to justify the
> > >>>>>>>>>> nesting?  I would really like to focus this discussion on new
> > >>>>>>>>>> features rather than retroactively grouping (leaving that to
> > >>>>>>>>>> CASSANDRA-15234), as there is already a place to talk about
> > >>>>>>>>>> that.
> > >>>>>>>>>>
> > >>>>>>>>>> To get things started, let's start with the track-warning
> > >>>>>>>>>> feature (hard/soft limits for queries), currently the configs
> > >>>>>>>>>> look as follows (assuming 15234)
> > >>>>>>>>>>
> > >>>>>>>>>> track_warnings:
> > >>>>>>>>>>     enabled: true
> > >>>>>>>>>>     coordinator_read_size:
> > >>>>>>>>>>         warn_threshold: 10kb
> > >>>>>>>>>>         abort_threshold: 1mb
> > >>>>>>>>>>     local_read_size:
> > >>>>>>>>>>         warn_threshold: 10kb
> > >>>>>>>>>>         abort_threshold: 1mb
> > >>>>>>>>>>     row_index_size:
> > >>>>>>>>>>         warn_threshold: 100mb
> > >>>>>>>>>>         abort_threshold: 1gb
> > >>>>>>>>>>
> > >>>>>>>>>> or should this be "flat"
> > >>>>>>>>>>
> > >>>>>>>>>> track_warnings_enabled: true
> > >>>>>>>>>> track_warnings_coordinator_read_size_warn_threshold: 10kb
> > >>>>>>>>>> track_warnings_coordinator_read_size_abort_threshold: 1mb
> > >>>>>>>>>> track_warnings_local_read_size_warn_threshold: 10kb
> > >>>>>>>>>> track_warnings_local_read_size_abort_threshold: 1mb
> > >>>>>>>>>> track_warnings_row_index_size_warn_threshold: 100mb
> > >>>>>>>>>> track_warnings_row_index_size_abort_threshold: 1gb
> > >>>>>>>>>>
> > >>>>>>>>>> For me I prefer nested for a few reasons
> > >>>>>>>>>> * easier to enforce consistency as the configs can use shared
> > >>>>>>>>>>   types; in the track warnings patch I had mismatches across
> > >>>>>>>>>>   configs (warn vs warns, fail vs abort, etc.) before going
> > >>>>>>>>>>   nested, now everything reuses the same types
> > >>>>>>>>>> * even though it is longer, it can be clearer how things are
> > >>>>>>>>>>   related
> > >>>>>>>>>> * parsing layer can add support for mixed or purely flat
> > >>>>>>>>>>   depending on user preference (example:
> > >>>>>>>>>>   track_warnings.row_index_size.abort_threshold, using the '.'
> > >>>>>>>>>>   notation to represent nested structures)
> > >>>>>>>>>>
> > >>>>>>>>>> Thoughts?
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> ---------------------------------------------------------------------
> > >>>>>>>>>> To unsubscribe, e-mail: [email protected]
> > >>>>>>>>>> For additional commands, e-mail: [email protected]
> > >>>>>>>>>>
>
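
The dot encoding and decoding discussed in the thread ({"a": {"b": 12}} -> "a.b": 12) can be sketched as follows. This is a minimal illustration, not Cassandra's parsing layer: `flatten` and `unflatten` are hypothetical names, the config is assumed to be already loaded into plain dicts, and every nested map is treated as nesting, so a map-typed option such as seed_provider.parameters would come out as dynamic fields.

```python
def flatten(nested: dict, prefix: str = "", sep: str = ".") -> dict:
    """Encode a nested mapping as flat 'a.b.c' -> value pairs."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Non-empty mappings are treated as nesting and recursed into.
            flat.update(flatten(value, path, sep))
        else:
            # Scalars, lists, sets and empty mappings are kept as leaf values.
            flat[path] = value
    return flat


def unflatten(flat: dict, sep: str = ".") -> dict:
    """Decode flat 'a.b.c' keys back into a nested mapping."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = nested
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{path!r} conflicts with the scalar at {part!r}")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{path!r} conflicts with nested keys under it")
        node[leaf] = value
    return nested


# The track_warnings example from the thread, as already-parsed YAML.
TRACK_WARNINGS = {
    "track_warnings": {
        "enabled": True,
        "coordinator_read_size": {"warn_threshold": "10kb", "abort_threshold": "1mb"},
        "local_read_size": {"warn_threshold": "10kb", "abort_threshold": "1mb"},
        "row_index_size": {"warn_threshold": "100mb", "abort_threshold": "1gb"},
    }
}

flat_settings = flatten(TRACK_WARNINGS)
round_tripped = unflatten(flat_settings)
```

A mixed file (partly nested, partly flat) could be handled by flattening whatever the YAML parser returns and merging the results, with the conflict checks above catching a key that is set both as a scalar and as a parent.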
