Thanks all for the discussion, I have updated FLIP-105 and FLIP-122 to use
the new format option keys.

FLIP-105:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427289
FLIP-122:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory

Best,
Jark

On Wed, 6 May 2020 at 20:37, Timo Walther <twal...@apache.org> wrote:

> Cool, so let's go with:
>
> format = json
> json.fail-on-missing-field = true
> json.ignore-parse-error = true
>
> value.format = json
> value.json.fail-on-missing-field = true
> value.json.ignore-parse-error = true
>
> Regards,
> Timo
>
>
> On 06.05.20 12:39, Jingsong Li wrote:
> > Hi,
> >
> > +1 to:
> > format = parquet
> > parquet.compression = ...
> > parquet.block.size = ...
> > parquet.page.size = ...
> >
> > For the formats like parquet and orc,
> > Not just Flink itself, but this way also let Flink keys compatible with
> the
> > property keys of Hadoop / Hive / Spark.
> >
> > And like Jark said, this way works for Kafka key value too.
> >
> > Best,
> > Jingsong Lee
> >
> > On Wed, May 6, 2020 at 6:19 PM Jark Wu <imj...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I think Timo proposed a good idea to make both side happy. That is:
> >>
> >> format = json
> >> json.fail-on-missing-field = true
> >> json.ignore-parse-error = true
> >>
> >> value.format = json
> >> value.json.fail-on-missing-field = true
> >> value.json.ignore-parse-error = true
> >>
> >> This is a valid hierarchies. Besides, it's more clear that the option
> >> belongs to a specific component (i.e. json).
> >> This will be more readable when we introducing more formats, e.g.
> parquet.
> >>
> >> format = parquet
> >> parquet.compression = ...
> >> parquet.block.size = ...
> >> parquet.page.size = ...
> >>
> >> is more readable than current style:
> >>
> >> format = parquet
> >> format.compression = ...
> >> format.block.size = ...
> >> format.page.size = ...
> >>
> >> To sum up, I'm +1 to use "format = json",  "json.fail-on-missing-field =
> >> true".
> >>
> >> Best,
> >> Jark
> >>
> >> On Wed, 6 May 2020 at 17:12, Danny Chan <yuzhao....@gmail.com> wrote:
> >>
> >>> Hi, everyone ~
> >>>
> >>> Allows me to share some thoughts here.
> >>>
> >>> Personally i think for SQL, "format" is obviously better than "
> >> format.name",
> >>> it is more concise and straight-forward, similar with Presto FORMAT[2]
> >> and
> >>> KSQL VALUE_FORMAT[1]; i think we move from "connector.type" to
> >> "connector"
> >>> for the same reason, the "type" or "name" suffix is implicit, SQL
> syntax
> >>> like the DDL is a top-level user API, so from my side keeping good
> >>> user-friendly syntax is more important.
> >>>
> >>> @Timo I'm big +1 for the a good code style guide, but that does not
> mean
> >>> we should go for a json-style table options in the DDL, the DDL could
> >> have
> >>> its own contract. Can we move "represent these config options in YAML"
> to
> >>> another topic ? Otherwise, how should we handle the "connector" key,
> >> should
> >>> we prefix all the table options with "connector" ? The original
> inention
> >> of
> >>> FLIP-122 is to remove some redundant prefix/suffix of the table options
> >>> because they are obviously implicit there, and the "connector." prefix
> >> and
> >>> the ".type" or ".name" suffix are the ones we most want to delete.
> >>>
> >>> @Dawid Although ".type" is just another 4 characters, but we force the
> >> SQL
> >>> users to do the thing that is obvious reduadant, i know serialize
> catalog
> >>> table to YAML or use the options in DataStream has similar keys
> request,
> >>> but they are different use cases that i believe many SQL user would not
> >>> encounter, that means we force many users to obey rules for cases they
> >>> would never have.
> >>>
> >>>
> >>> [1] https://docs.ksqldb.io/en/latest/developer-guide/create-a-table/
> >>> [2] https://prestodb.io/docs/current/sql/create-table.html
> >>>
> >>> Best,
> >>> Danny Chan
> >>> 在 2020年5月4日 +0800 PM11:34,Till Rohrmann <trohrm...@apache.org>,写道:
> >>>> Hi everyone,
> >>>>
> >>>> I like Timo's proposal to organize our configuration more hierarchical
> >>>> since this is what the coding guide specifies. The benefit I see is
> >> that
> >>>> config options belonging to the same concept will be found in the same
> >>>> nested object. Moreover, it will be possible to split the
> configuration
> >>>> into unrelated parts which are fed to the respective components. That
> >> way
> >>>> one has a much better separation of concern since component A cannot
> >> read
> >>>> the configuration of component B.
> >>>>
> >>>> Concerning Timo's last two proposals:
> >>>>
> >>>> If fail-on-missing-field is a common configuration shared by all
> >> formats,
> >>>> then I would go with the first option:
> >>>>
> >>>> format.kind: json
> >>>> format.fail-on-missing-field: true
> >>>>
> >>>> If fail-on-missing-field is specific for json, then one could go with
> >>>>
> >>>> format: json
> >>>> json.fail-on-missing-field: true
> >>>>
> >>>> or
> >>>>
> >>>> format.kind: json
> >>>> format.json.fail-on-missing-field: true
> >>>>
> >>>> Cheers,
> >>>> Till
> >>>>
> >>>>
> >>>> On Fri, May 1, 2020 at 11:55 AM Timo Walther <twal...@apache.org>
> >> wrote:
> >>>>
> >>>>> Hi Jark,
> >>>>>
> >>>>> yes, in theory every connector can design options as they like. But
> >> for
> >>>>> user experience and good coding style we should be consistent in
> >> Flink
> >>>>> connectors and configuration. Because implementers of new connectors
> >>>>> will copy the design of existing ones.
> >>>>>
> >>>>> Furthermore, I could image that people in the DataStream API would
> >> also
> >>>>> like to configure their connector based on options in the near
> >> future.
> >>>>> It might be the case that Flink DataStream API connectors will reuse
> >>> the
> >>>>> ConfigOptions from Table API for consistency.
> >>>>>
> >>>>> I'm favoring either:
> >>>>>
> >>>>> format.kind = json
> >>>>> format.fail-on-missing-field: true
> >>>>>
> >>>>> Or:
> >>>>>
> >>>>> format = json
> >>>>> json.fail-on-missing-field: true
> >>>>>
> >>>>> Both are valid hierarchies.
> >>>>>
> >>>>> Regards,
> >>>>> Timo
> >>>>>
> >>>>>
> >>>>> On 30.04.20 17:57, Jark Wu wrote:
> >>>>>> Hi Dawid,
> >>>>>>
> >>>>>> I just want to mention one of your response,
> >>>>>>
> >>>>>>> What you described with
> >>>>>>> 'format' = 'csv',
> >>>>>>> 'csv.allow-comments' = 'true',
> >>>>>>> 'csv.ignore-parse-errors' = 'true'
> >>>>>>> would not work though as the `format` prefix is mandatory in the
> >>> sources
> >>>>>> as only the properties with format
> >>>>>>> will be passed to the format factory in majority of cases. We
> >>> already
> >>>>>> have some implicit contracts.
> >>>>>>
> >>>>>> IIUC, in FLIP-95 and FLIP-122, the property key style are totally
> >>> decided
> >>>>>> by connectors, not the framework.
> >>>>>> So I custom connector can define above properties, and extract the
> >>> value
> >>>>> of
> >>>>>> 'format', i.e. 'csv', to find the format factory.
> >>>>>> And extract the properties with `csv.` prefix and remove the
> >> prefix,
> >>> and
> >>>>>> pass the properties (e.g. 'allow-comments' = 'true')
> >>>>>> into the format factory to create format.
> >>>>>>
> >>>>>> So there is no a strict guarantee to have a "nested JSON style"
> >>>>> properties.
> >>>>>> Users can still develop a custom connector with this
> >>>>>> un-hierarchy properties and works well.
> >>>>>>
> >>>>>> 'format' = 'json',
> >>>>>> 'format.fail-on-missing-field' = 'false'
> >>>>>>
> >>>>>> Best,
> >>>>>> Jark
> >>>>>>
> >>>>>>
> >>>>>> On Thu, 30 Apr 2020 at 14:29, Dawid Wysakowicz <
> >>> dwysakow...@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I'd like to start with a comment that I am ok with the current
> >>> state of
> >>>>>>> the FLIP-122 if there is a strong preference for it.
> >> Nevertheless I
> >>>>> still
> >>>>>>> like the idea of adding `type` to the `format` to have it as
> >>>>> `format.type`
> >>>>>>> = `json`.
> >>>>>>>
> >>>>>>> I wanted to clarify a few things though:
> >>>>>>>
> >>>>>>> @Jingsong As far as I see it most of the users copy/paste the
> >>> properties
> >>>>>>> from the documentation to the SQL, so I don't think additional
> >> four
> >>>>>>> characters are too cumbersome. Plus if you force the additional
> >>> suffix
> >>>>> onto
> >>>>>>> all the options of a format you introduce way more boilerplate
> >>> than if
> >>>>> we
> >>>>>>> added the `type/kind/name`
> >>>>>>>
> >>>>>>> @Kurt I agree that we cannot force it, but I think it is more of
> >> a
> >>>>>>> question to set standards/implicit contracts on the properties.
> >>> What you
> >>>>>>> described with
> >>>>>>> 'format' = 'csv',
> >>>>>>> 'csv.allow-comments' = 'true',
> >>>>>>> 'csv.ignore-parse-errors' = 'true'
> >>>>>>>
> >>>>>>> would not work though as the `format` prefix is mandatory in the
> >>> sources
> >>>>>>> as only the properties with format will be passed to the format
> >>> factory
> >>>>> in
> >>>>>>> majority of cases. We already have some implicit contracts.
> >>>>>>>
> >>>>>>> @Forward I did not necessarily get the example. Aren't json and
> >>> bson two
> >>>>>>> separate formats? Do you mean you can have those two at the same
> >>> time?
> >>>>> Why
> >>>>>>> do you need to differentiate the options for each? The way I see
> >>> it is:
> >>>>>>>
> >>>>>>> ‘format(.name)' = 'json',
> >>>>>>> ‘format.fail-on-missing-field' = 'false'
> >>>>>>>
> >>>>>>> or
> >>>>>>>
> >>>>>>> ‘format(.name)' = 'bson',
> >>>>>>> ‘format.fail-on-missing-field' = 'false'
> >>>>>>>
> >>>>>>> @Benchao I'd be fine with any of name, kind, type(this we already
> >>> had in
> >>>>>>> the past)
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Dawid
> >>>>>>>
> >>>>>>> On 30/04/2020 04:17, Forward Xu wrote:
> >>>>>>>
> >>>>>>> Here I have a little doubt. At present, our json only supports
> >> the
> >>>>>>> conventional json format. If we need to implement json with bson,
> >>> json
> >>>>> with
> >>>>>>> avro, etc., how should we express it?
> >>>>>>> Do you need like the following:
> >>>>>>>
> >>>>>>> ‘format.name' = 'json',
> >>>>>>>
> >>>>>>> ‘format.json.fail-on-missing-field' = 'false'
> >>>>>>>
> >>>>>>>
> >>>>>>> ‘format.name' = 'bson',
> >>>>>>>
> >>>>>>> ‘format.bson.fail-on-missing-field' = ‘false'
> >>>>>>>
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> Forward
> >>>>>>>
> >>>>>>> Benchao Li <libenc...@gmail.com> <libenc...@gmail.com>
> >>> 于2020年4月30日周四
> >>>>> 上午9:58写道:
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks Timo for staring the discussion.
> >>>>>>>
> >>>>>>> Generally I like the idea to keep the config align with a
> >> standard
> >>> like
> >>>>>>> json/yaml.
> >>>>>>>
> >>>>>>>  From the user's perspective, I don't use table configs from a
> >>> config
> >>>>> file
> >>>>>>> like yaml or json for now,
> >>>>>>> And it's ok to change it to yaml like style. Actually we didn't
> >>> know
> >>>>> that
> >>>>>>> this could be a yaml like
> >>>>>>> configuration hierarchy. If it has a hierarchy, we maybe consider
> >>> that
> >>>>> in
> >>>>>>> the future to load the
> >>>>>>> config from a yaml/json file.
> >>>>>>>
> >>>>>>> Regarding the name,
> >>>>>>> 'format.kind' looks fine to me. However there is another name
> >> from
> >>> the
> >>>>> top
> >>>>>>> of my head:
> >>>>>>> 'format.name', WDYT?
> >>>>>>>
> >>>>>>> Dawid Wysakowicz <dwysakow...@apache.org> <
> >> dwysakow...@apache.org>
> >>>>> 于2020年4月29日周三 下午11:56写道:
> >>>>>>>
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I also wanted to share my opinion.
> >>>>>>>
> >>>>>>> When talking about a ConfigOption hierarchy we use for
> >> configuring
> >>> Flink
> >>>>>>> cluster I would be a strong advocate for keeping a
> >>> yaml/hocon/json/...
> >>>>>>> compatible style. Those options are primarily read from a file
> >> and
> >>> thus
> >>>>>>> should at least try to follow common practices for nested formats
> >>> if we
> >>>>>>> ever decide to switch to one.
> >>>>>>>
> >>>>>>> Here the question is about the properties we use in SQL
> >>> statements. The
> >>>>>>> origin/destination of these usually will be external catalog,
> >>> usually in
> >>>>>>>
> >>>>>>> a
> >>>>>>>
> >>>>>>> flattened(key/value) representation so I agree it is not as
> >>> important as
> >>>>>>>
> >>>>>>> in
> >>>>>>>
> >>>>>>> the aforementioned case. Nevertheless having a yaml based catalog
> >>> or
> >>>>>>>
> >>>>>>> being
> >>>>>>>
> >>>>>>> able to have e.g. yaml based snapshots of a catalog in my opinion
> >>> is
> >>>>>>> appealing. At the same time cost of being able to have a nice
> >>>>>>> yaml/hocon/json representation is just adding a single suffix to
> >> a
> >>>>>>> single(at most 2 key + value) property. The question is between
> >>> `format`
> >>>>>>>
> >>>>>>> =
> >>>>>>>
> >>>>>>> `json` vs `format.kind` = `json`. That said I'd be slighty in
> >>> favor of
> >>>>>>> doing it.
> >>>>>>>
> >>>>>>> Just to have a full picture. Both cases can be represented in
> >>> yaml, but
> >>>>>>> the difference is significant:
> >>>>>>> format: 'json'
> >>>>>>> format.option: 'value'
> >>>>>>>
> >>>>>>> vs
> >>>>>>> format:
> >>>>>>> kind: 'json'
> >>>>>>>
> >>>>>>> option: 'value'
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Dawid
> >>>>>>>
> >>>>>>> On 29/04/2020 17:13, Flavio Pompermaier wrote:
> >>>>>>>
> >>>>>>> Personally I don't have any preference here. Compliance wih
> >>> standard
> >>>>>>>
> >>>>>>> YAML
> >>>>>>>
> >>>>>>> parser is probably more important
> >>>>>>>
> >>>>>>> On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <imj...@gmail.com> <
> >>>>> imj...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>  From a user's perspective, I prefer the shorter one
> >> "format=json",
> >>>>>>>
> >>>>>>> because
> >>>>>>>
> >>>>>>> it's more concise and straightforward. The "kind" is redundant
> >> for
> >>>>>>>
> >>>>>>> users.
> >>>>>>>
> >>>>>>> Is there a real case requires to represent the configuration in
> >>> JSON
> >>>>>>> style?
> >>>>>>> As far as I can see, I don't see such requirement, and everything
> >>> works
> >>>>>>> fine by now.
> >>>>>>>
> >>>>>>> So I'm in favor of "format=json". But if the community insist to
> >>> follow
> >>>>>>> code style on this, I'm also fine with the longer one.
> >>>>>>>
> >>>>>>> Btw, I also CC user mailing list to listen more user's feedback.
> >>>>>>>
> >>>>>>> Because I
> >>>>>>>
> >>>>>>> think this is relative to usability.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jark
> >>>>>>>
> >>>>>>> On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <
> >> ches...@apache.org>
> >>> <
> >>>>> ches...@apache.org>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>> Therefore, should we advocate instead:
> >>>>>>>>
> >>>>>>>> 'format.kind' = 'json',
> >>>>>>>> 'format.fail-on-missing-field' = 'false'
> >>>>>>>
> >>>>>>> Yes. That's pretty much it.
> >>>>>>>
> >>>>>>> This is reasonable important to nail down as with such
> >> violations I
> >>>>>>> believe we could not actually switch to a standard YAML parser.
> >>>>>>>
> >>>>>>> On 29/04/2020 16:05, Timo Walther wrote:
> >>>>>>>
> >>>>>>> Hi everyone,
> >>>>>>>
> >>>>>>> discussions around ConfigOption seem to be very popular recently.
> >>>>>>>
> >>>>>>> So I
> >>>>>>>
> >>>>>>> would also like to get some opinions on a different topic.
> >>>>>>>
> >>>>>>> How do we represent hierarchies in ConfigOption? In FLIP-122, we
> >>>>>>> agreed on the following DDL syntax:
> >>>>>>>
> >>>>>>> CREATE TABLE fs_table (
> >>>>>>> ...
> >>>>>>> ) WITH (
> >>>>>>> 'connector' = 'filesystem',
> >>>>>>> 'path' = 'file:///path/to/whatever',
> >>>>>>> 'format' = 'csv',
> >>>>>>> 'format.allow-comments' = 'true',
> >>>>>>> 'format.ignore-parse-errors' = 'true'
> >>>>>>> );
> >>>>>>>
> >>>>>>> Of course this is slightly different from regular Flink core
> >>>>>>> configuration but a connector still needs to be configured based
> >> on
> >>>>>>> these options.
> >>>>>>>
> >>>>>>> However, I think this FLIP violates our code style guidelines
> >>>>>>>
> >>>>>>> because
> >>>>>>>
> >>>>>>> 'format' = 'json',
> >>>>>>> 'format.fail-on-missing-field' = 'false'
> >>>>>>>
> >>>>>>> is an invalid hierarchy. `format` cannot be a string and a
> >>> top-level
> >>>>>>> object at the same time.
> >>>>>>>
> >>>>>>> We have similar problems in our runtime configuration:
> >>>>>>>
> >>>>>>> state.backend=
> >>>>>>> state.backend.incremental=
> >>>>>>> restart-strategy=
> >>>>>>> restart-strategy.fixed-delay.delay=
> >>>>>>> high-availability=
> >>>>>>> high-availability.cluster-id=
> >>>>>>>
> >>>>>>> The code style guide states "Think of the configuration as nested
> >>>>>>> objects (JSON style)". So such hierarchies cannot be represented
> >> in
> >>>>>>>
> >>>>>>> a
> >>>>>>>
> >>>>>>> nested JSON style.
> >>>>>>>
> >>>>>>> Therefore, should we advocate instead:
> >>>>>>>
> >>>>>>> 'format.kind' = 'json',
> >>>>>>> 'format.fail-on-missing-field' = 'false'
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Timo
> >>>>>>>
> >>>>>>> [1]
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> >>>>>>>
> >>>>>>> --
> >>>>>>>
> >>>>>>> Benchao Li
> >>>>>>> School of Electronics Engineering and Computer Science, Peking
> >>>>> UniversityTel:+86-15650713730
> >>>>>>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>
> >>
> >
> >
>
>

Reply via email to