Hi everyone,

I like Timo's proposal to organize our configuration more hierarchical
since this is what the coding guide specifies. The benefit I see is that
config options belonging to the same concept will be found in the same
nested object. Moreover, it will be possible to split the configuration
into unrelated parts which are fed to the respective components. That way
one has a much better separation of concern since component A cannot read
the configuration of component B.

Concerning Timo's last two proposals:

If fail-on-missing-field is a common configuration shared by all formats,
then I would go with the first option:

format.kind: json
format.fail-on-missing-field: true

If fail-on-missing-field is specific for json, then one could go with

format: json
json.fail-on-missing-field: true

or

format.kind: json
format.json.fail-on-missing-field: true

Cheers,
Till


On Fri, May 1, 2020 at 11:55 AM Timo Walther <twal...@apache.org> wrote:

> Hi Jark,
>
> yes, in theory every connector can design options as they like. But for
> user experience and good coding style we should be consistent in Flink
> connectors and configuration. Because implementers of new connectors
> will copy the design of existing ones.
>
> Furthermore, I could image that people in the DataStream API would also
> like to configure their connector based on options in the near future.
> It might be the case that Flink DataStream API connectors will reuse the
> ConfigOptions from Table API for consistency.
>
> I'm favoring either:
>
> format.kind = json
> format.fail-on-missing-field: true
>
> Or:
>
> format = json
> json.fail-on-missing-field: true
>
> Both are valid hierarchies.
>
> Regards,
> Timo
>
>
> On 30.04.20 17:57, Jark Wu wrote:
> > Hi Dawid,
> >
> > I just want to mention one of your response,
> >
> >> What you described with
> >> 'format' = 'csv',
> >> 'csv.allow-comments' = 'true',
> >> 'csv.ignore-parse-errors' = 'true'
> >> would not work though as the `format` prefix is mandatory in the sources
> > as only the properties with format
> >>   will be passed to the format factory in majority of cases. We already
> > have some implicit contracts.
> >
> > IIUC, in FLIP-95 and FLIP-122, the property key style are totally decided
> > by connectors, not the framework.
> > So I custom connector can define above properties, and extract the value
> of
> > 'format', i.e. 'csv', to find the format factory.
> > And extract the properties with `csv.` prefix and remove the prefix, and
> > pass the properties (e.g. 'allow-comments' = 'true')
> > into the format factory to create format.
> >
> > So there is no a strict guarantee to have a "nested JSON style"
> properties.
> > Users can still develop a custom connector with this
> > un-hierarchy properties and works well.
> >
> > 'format' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > Best,
> > Jark
> >
> >
> > On Thu, 30 Apr 2020 at 14:29, Dawid Wysakowicz <dwysakow...@apache.org>
> > wrote:
> >
> >> Hi all,
> >>
> >> I'd like to start with a comment that I am ok with the current state of
> >> the FLIP-122 if there is a strong preference for it. Nevertheless I
> still
> >> like the idea of adding `type` to the `format` to have it as
> `format.type`
> >> = `json`.
> >>
> >> I wanted to clarify a few things though:
> >>
> >> @Jingsong As far as I see it most of the users copy/paste the properties
> >> from the documentation to the SQL, so I don't think additional four
> >> characters are too cumbersome. Plus if you force the additional suffix
> onto
> >> all the options of a format you introduce way more boilerplate than if
> we
> >> added the `type/kind/name`
> >>
> >> @Kurt I agree that we cannot force it, but I think it is more of a
> >> question to set standards/implicit contracts on the properties. What you
> >> described with
> >> 'format' = 'csv',
> >> 'csv.allow-comments' = 'true',
> >> 'csv.ignore-parse-errors' = 'true'
> >>
> >> would not work though as the `format` prefix is mandatory in the sources
> >> as only the properties with format will be passed to the format factory
> in
> >> majority of cases. We already have some implicit contracts.
> >>
> >> @Forward I did not necessarily get the example. Aren't json and bson two
> >> separate formats? Do you mean you can have those two at the same time?
> Why
> >> do you need to differentiate the options for each? The way I see it is:
> >>
> >> ‘format(.name)' = 'json',
> >> ‘format.fail-on-missing-field' = 'false'
> >>
> >> or
> >>
> >> ‘format(.name)' = 'bson',
> >> ‘format.fail-on-missing-field' = 'false'
> >>
> >> @Benchao I'd be fine with any of name, kind, type(this we already had in
> >> the past)
> >>
> >> Best,
> >> Dawid
> >>
> >> On 30/04/2020 04:17, Forward Xu wrote:
> >>
> >> Here I have a little doubt. At present, our json only supports the
> >> conventional json format. If we need to implement json with bson, json
> with
> >> avro, etc., how should we express it?
> >> Do you need like the following:
> >>
> >> ‘format.name' = 'json',
> >>
> >> ‘format.json.fail-on-missing-field' = 'false'
> >>
> >>
> >> ‘format.name' = 'bson',
> >>
> >> ‘format.bson.fail-on-missing-field' = ‘false'
> >>
> >>
> >> Best,
> >>
> >> Forward
> >>
> >> Benchao Li <libenc...@gmail.com> <libenc...@gmail.com> 于2020年4月30日周四
> 上午9:58写道:
> >>
> >>
> >> Thanks Timo for staring the discussion.
> >>
> >> Generally I like the idea to keep the config align with a standard like
> >> json/yaml.
> >>
> >>  From the user's perspective, I don't use table configs from a config
> file
> >> like yaml or json for now,
> >> And it's ok to change it to yaml like style. Actually we didn't know
> that
> >> this could be a yaml like
> >> configuration hierarchy. If it has a hierarchy, we maybe consider that
> in
> >> the future to load the
> >> config from a yaml/json file.
> >>
> >> Regarding the name,
> >> 'format.kind' looks fine to me. However there is another name from the
> top
> >> of my head:
> >> 'format.name', WDYT?
> >>
> >> Dawid Wysakowicz <dwysakow...@apache.org> <dwysakow...@apache.org>
> 于2020年4月29日周三 下午11:56写道:
> >>
> >>
> >> Hi all,
> >>
> >> I also wanted to share my opinion.
> >>
> >> When talking about a ConfigOption hierarchy we use for configuring Flink
> >> cluster I would be a strong advocate for keeping a yaml/hocon/json/...
> >> compatible style. Those options are primarily read from a file and thus
> >> should at least try to follow common practices for nested formats if we
> >> ever decide to switch to one.
> >>
> >> Here the question is about the properties we use in SQL statements. The
> >> origin/destination of these usually will be external catalog, usually in
> >>
> >> a
> >>
> >> flattened(key/value) representation so I agree it is not as important as
> >>
> >> in
> >>
> >> the aforementioned case. Nevertheless having a yaml based catalog or
> >>
> >> being
> >>
> >> able to have e.g. yaml based snapshots of a catalog in my opinion is
> >> appealing. At the same time cost of being able to have a nice
> >> yaml/hocon/json representation is just adding a single suffix to a
> >> single(at most 2 key + value) property. The question is between `format`
> >>
> >> =
> >>
> >> `json` vs `format.kind` = `json`. That said I'd be slighty in favor of
> >> doing it.
> >>
> >> Just to have a full picture. Both cases can be represented in yaml, but
> >> the difference is significant:
> >> format: 'json'
> >> format.option: 'value'
> >>
> >> vs
> >> format:
> >>      kind: 'json'
> >>
> >>      option: 'value'
> >>
> >> Best,
> >> Dawid
> >>
> >> On 29/04/2020 17:13, Flavio Pompermaier wrote:
> >>
> >> Personally I don't have any preference here.  Compliance wih standard
> >>
> >> YAML
> >>
> >> parser is probably more important
> >>
> >> On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <imj...@gmail.com> <
> imj...@gmail.com> wrote:
> >>
> >>
> >>  From a user's perspective, I prefer the shorter one "format=json",
> >>
> >> because
> >>
> >> it's more concise and straightforward. The "kind" is redundant for
> >>
> >> users.
> >>
> >> Is there a real case requires to represent the configuration in JSON
> >> style?
> >> As far as I can see, I don't see such requirement, and everything works
> >> fine by now.
> >>
> >> So I'm in favor of "format=json". But if the community insist to follow
> >> code style on this, I'm also fine with the longer one.
> >>
> >> Btw, I also CC user mailing list to listen more user's feedback.
> >>
> >> Because I
> >>
> >> think this is relative to usability.
> >>
> >> Best,
> >> Jark
> >>
> >> On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <ches...@apache.org> <
> ches...@apache.org>
> >> wrote:
> >>
> >>
> >>   > Therefore, should we advocate instead:
> >>   >
> >>   > 'format.kind' = 'json',
> >>   > 'format.fail-on-missing-field' = 'false'
> >>
> >> Yes. That's pretty much it.
> >>
> >> This is reasonable important to nail down as with such violations I
> >> believe we could not actually switch to a standard YAML parser.
> >>
> >> On 29/04/2020 16:05, Timo Walther wrote:
> >>
> >> Hi everyone,
> >>
> >> discussions around ConfigOption seem to be very popular recently.
> >>
> >> So I
> >>
> >> would also like to get some opinions on a different topic.
> >>
> >> How do we represent hierarchies in ConfigOption? In FLIP-122, we
> >> agreed on the following DDL syntax:
> >>
> >> CREATE TABLE fs_table (
> >>   ...
> >> ) WITH (
> >>   'connector' = 'filesystem',
> >>   'path' = 'file:///path/to/whatever',
> >>   'format' = 'csv',
> >>   'format.allow-comments' = 'true',
> >>   'format.ignore-parse-errors' = 'true'
> >> );
> >>
> >> Of course this is slightly different from regular Flink core
> >> configuration but a connector still needs to be configured based on
> >> these options.
> >>
> >> However, I think this FLIP violates our code style guidelines
> >>
> >> because
> >>
> >> 'format' = 'json',
> >> 'format.fail-on-missing-field' = 'false'
> >>
> >> is an invalid hierarchy. `format` cannot be a string and a top-level
> >> object at the same time.
> >>
> >> We have similar problems in our runtime configuration:
> >>
> >> state.backend=
> >> state.backend.incremental=
> >> restart-strategy=
> >> restart-strategy.fixed-delay.delay=
> >> high-availability=
> >> high-availability.cluster-id=
> >>
> >> The code style guide states "Think of the configuration as nested
> >> objects (JSON style)". So such hierarchies cannot be represented in
> >>
> >> a
> >>
> >> nested JSON style.
> >>
> >> Therefore, should we advocate instead:
> >>
> >> 'format.kind' = 'json',
> >> 'format.fail-on-missing-field' = 'false'
> >>
> >> What do you think?
> >>
> >> Thanks,
> >> Timo
> >>
> >> [1]
> >>
> >>
> >>
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> >>
> >> --
> >>
> >> Benchao Li
> >> School of Electronics Engineering and Computer Science, Peking
> UniversityTel:+86-15650713730
> >> Email: libenc...@gmail.com; libenc...@pku.edu.cn
> >>
> >>
> >
>
>

Reply via email to