Thanks all for the discussion, I have updated FLIP-105 and FLIP-122 to use the new format option keys.
FLIP-105: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427289 FLIP-122: https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory Best, Jark On Wed, 6 May 2020 at 20:37, Timo Walther <twal...@apache.org> wrote: > Cool, so let's go with: > > format = json > json.fail-on-missing-field = true > json.ignore-parse-error = true > > value.format = json > value.json.fail-on-missing-field = true > value.json.ignore-parse-error = true > > Regards, > Timo > > > On 06.05.20 12:39, Jingsong Li wrote: > > Hi, > > > > +1 to: > > format = parquet > > parquet.compression = ... > > parquet.block.size = ... > > parquet.page.size = ... > > > > For the formats like parquet and orc, > > Not just Flink itself, but this way also let Flink keys compatible with > the > > property keys of Hadoop / Hive / Spark. > > > > And like Jark said, this way works for Kafka key value too. > > > > Best, > > Jingsong Lee > > > > On Wed, May 6, 2020 at 6:19 PM Jark Wu <imj...@gmail.com> wrote: > > > >> Hi, > >> > >> I think Timo proposed a good idea to make both side happy. That is: > >> > >> format = json > >> json.fail-on-missing-field = true > >> json.ignore-parse-error = true > >> > >> value.format = json > >> value.json.fail-on-missing-field = true > >> value.json.ignore-parse-error = true > >> > >> This is a valid hierarchies. Besides, it's more clear that the option > >> belongs to a specific component (i.e. json). > >> This will be more readable when we introducing more formats, e.g. > parquet. > >> > >> format = parquet > >> parquet.compression = ... > >> parquet.block.size = ... > >> parquet.page.size = ... > >> > >> is more readable than current style: > >> > >> format = parquet > >> format.compression = ... > >> format.block.size = ... > >> format.page.size = ... > >> > >> To sum up, I'm +1 to use "format = json", "json.fail-on-missing-field = > >> true". > >> > >> Best, > >> Jark > >> > >> On Wed, 6 May 2020 at 17:12, Danny Chan <yuzhao....@gmail.com> wrote: > >> > >>> Hi, everyone ~ > >>> > >>> Allows me to share some thoughts here. > >>> > >>> Personally i think for SQL, "format" is obviously better than " > >> format.name", > >>> it is more concise and straight-forward, similar with Presto FORMAT[2] > >> and > >>> KSQL VALUE_FORMAT[1]; i think we move from "connector.type" to > >> "connector" > >>> for the same reason, the "type" or "name" suffix is implicit, SQL > syntax > >>> like the DDL is a top-level user API, so from my side keeping good > >>> user-friendly syntax is more important. > >>> > >>> @Timo I'm big +1 for the a good code style guide, but that does not > mean > >>> we should go for a json-style table options in the DDL, the DDL could > >> have > >>> its own contract. Can we move "represent these config options in YAML" > to > >>> another topic ? Otherwise, how should we handle the "connector" key, > >> should > >>> we prefix all the table options with "connector" ? The original > inention > >> of > >>> FLIP-122 is to remove some redundant prefix/suffix of the table options > >>> because they are obviously implicit there, and the "connector." prefix > >> and > >>> the ".type" or ".name" suffix are the ones we most want to delete. > >>> > >>> @Dawid Although ".type" is just another 4 characters, but we force the > >> SQL > >>> users to do the thing that is obvious reduadant, i know serialize > catalog > >>> table to YAML or use the options in DataStream has similar keys > request, > >>> but they are different use cases that i believe many SQL user would not > >>> encounter, that means we force many users to obey rules for cases they > >>> would never have. > >>> > >>> > >>> [1] https://docs.ksqldb.io/en/latest/developer-guide/create-a-table/ > >>> [2] https://prestodb.io/docs/current/sql/create-table.html > >>> > >>> Best, > >>> Danny Chan > >>> 在 2020年5月4日 +0800 PM11:34,Till Rohrmann <trohrm...@apache.org>,写道: > >>>> Hi everyone, > >>>> > >>>> I like Timo's proposal to organize our configuration more hierarchical > >>>> since this is what the coding guide specifies. The benefit I see is > >> that > >>>> config options belonging to the same concept will be found in the same > >>>> nested object. Moreover, it will be possible to split the > configuration > >>>> into unrelated parts which are fed to the respective components. That > >> way > >>>> one has a much better separation of concern since component A cannot > >> read > >>>> the configuration of component B. > >>>> > >>>> Concerning Timo's last two proposals: > >>>> > >>>> If fail-on-missing-field is a common configuration shared by all > >> formats, > >>>> then I would go with the first option: > >>>> > >>>> format.kind: json > >>>> format.fail-on-missing-field: true > >>>> > >>>> If fail-on-missing-field is specific for json, then one could go with > >>>> > >>>> format: json > >>>> json.fail-on-missing-field: true > >>>> > >>>> or > >>>> > >>>> format.kind: json > >>>> format.json.fail-on-missing-field: true > >>>> > >>>> Cheers, > >>>> Till > >>>> > >>>> > >>>> On Fri, May 1, 2020 at 11:55 AM Timo Walther <twal...@apache.org> > >> wrote: > >>>> > >>>>> Hi Jark, > >>>>> > >>>>> yes, in theory every connector can design options as they like. But > >> for > >>>>> user experience and good coding style we should be consistent in > >> Flink > >>>>> connectors and configuration. Because implementers of new connectors > >>>>> will copy the design of existing ones. > >>>>> > >>>>> Furthermore, I could image that people in the DataStream API would > >> also > >>>>> like to configure their connector based on options in the near > >> future. > >>>>> It might be the case that Flink DataStream API connectors will reuse > >>> the > >>>>> ConfigOptions from Table API for consistency. > >>>>> > >>>>> I'm favoring either: > >>>>> > >>>>> format.kind = json > >>>>> format.fail-on-missing-field: true > >>>>> > >>>>> Or: > >>>>> > >>>>> format = json > >>>>> json.fail-on-missing-field: true > >>>>> > >>>>> Both are valid hierarchies. > >>>>> > >>>>> Regards, > >>>>> Timo > >>>>> > >>>>> > >>>>> On 30.04.20 17:57, Jark Wu wrote: > >>>>>> Hi Dawid, > >>>>>> > >>>>>> I just want to mention one of your response, > >>>>>> > >>>>>>> What you described with > >>>>>>> 'format' = 'csv', > >>>>>>> 'csv.allow-comments' = 'true', > >>>>>>> 'csv.ignore-parse-errors' = 'true' > >>>>>>> would not work though as the `format` prefix is mandatory in the > >>> sources > >>>>>> as only the properties with format > >>>>>>> will be passed to the format factory in majority of cases. We > >>> already > >>>>>> have some implicit contracts. > >>>>>> > >>>>>> IIUC, in FLIP-95 and FLIP-122, the property key style are totally > >>> decided > >>>>>> by connectors, not the framework. > >>>>>> So I custom connector can define above properties, and extract the > >>> value > >>>>> of > >>>>>> 'format', i.e. 'csv', to find the format factory. > >>>>>> And extract the properties with `csv.` prefix and remove the > >> prefix, > >>> and > >>>>>> pass the properties (e.g. 'allow-comments' = 'true') > >>>>>> into the format factory to create format. > >>>>>> > >>>>>> So there is no a strict guarantee to have a "nested JSON style" > >>>>> properties. > >>>>>> Users can still develop a custom connector with this > >>>>>> un-hierarchy properties and works well. > >>>>>> > >>>>>> 'format' = 'json', > >>>>>> 'format.fail-on-missing-field' = 'false' > >>>>>> > >>>>>> Best, > >>>>>> Jark > >>>>>> > >>>>>> > >>>>>> On Thu, 30 Apr 2020 at 14:29, Dawid Wysakowicz < > >>> dwysakow...@apache.org> > >>>>>> wrote: > >>>>>> > >>>>>>> Hi all, > >>>>>>> > >>>>>>> I'd like to start with a comment that I am ok with the current > >>> state of > >>>>>>> the FLIP-122 if there is a strong preference for it. > >> Nevertheless I > >>>>> still > >>>>>>> like the idea of adding `type` to the `format` to have it as > >>>>> `format.type` > >>>>>>> = `json`. > >>>>>>> > >>>>>>> I wanted to clarify a few things though: > >>>>>>> > >>>>>>> @Jingsong As far as I see it most of the users copy/paste the > >>> properties > >>>>>>> from the documentation to the SQL, so I don't think additional > >> four > >>>>>>> characters are too cumbersome. Plus if you force the additional > >>> suffix > >>>>> onto > >>>>>>> all the options of a format you introduce way more boilerplate > >>> than if > >>>>> we > >>>>>>> added the `type/kind/name` > >>>>>>> > >>>>>>> @Kurt I agree that we cannot force it, but I think it is more of > >> a > >>>>>>> question to set standards/implicit contracts on the properties. > >>> What you > >>>>>>> described with > >>>>>>> 'format' = 'csv', > >>>>>>> 'csv.allow-comments' = 'true', > >>>>>>> 'csv.ignore-parse-errors' = 'true' > >>>>>>> > >>>>>>> would not work though as the `format` prefix is mandatory in the > >>> sources > >>>>>>> as only the properties with format will be passed to the format > >>> factory > >>>>> in > >>>>>>> majority of cases. We already have some implicit contracts. > >>>>>>> > >>>>>>> @Forward I did not necessarily get the example. Aren't json and > >>> bson two > >>>>>>> separate formats? Do you mean you can have those two at the same > >>> time? > >>>>> Why > >>>>>>> do you need to differentiate the options for each? The way I see > >>> it is: > >>>>>>> > >>>>>>> ‘format(.name)' = 'json', > >>>>>>> ‘format.fail-on-missing-field' = 'false' > >>>>>>> > >>>>>>> or > >>>>>>> > >>>>>>> ‘format(.name)' = 'bson', > >>>>>>> ‘format.fail-on-missing-field' = 'false' > >>>>>>> > >>>>>>> @Benchao I'd be fine with any of name, kind, type(this we already > >>> had in > >>>>>>> the past) > >>>>>>> > >>>>>>> Best, > >>>>>>> Dawid > >>>>>>> > >>>>>>> On 30/04/2020 04:17, Forward Xu wrote: > >>>>>>> > >>>>>>> Here I have a little doubt. At present, our json only supports > >> the > >>>>>>> conventional json format. If we need to implement json with bson, > >>> json > >>>>> with > >>>>>>> avro, etc., how should we express it? > >>>>>>> Do you need like the following: > >>>>>>> > >>>>>>> ‘format.name' = 'json', > >>>>>>> > >>>>>>> ‘format.json.fail-on-missing-field' = 'false' > >>>>>>> > >>>>>>> > >>>>>>> ‘format.name' = 'bson', > >>>>>>> > >>>>>>> ‘format.bson.fail-on-missing-field' = ‘false' > >>>>>>> > >>>>>>> > >>>>>>> Best, > >>>>>>> > >>>>>>> Forward > >>>>>>> > >>>>>>> Benchao Li <libenc...@gmail.com> <libenc...@gmail.com> > >>> 于2020年4月30日周四 > >>>>> 上午9:58写道: > >>>>>>> > >>>>>>> > >>>>>>> Thanks Timo for staring the discussion. > >>>>>>> > >>>>>>> Generally I like the idea to keep the config align with a > >> standard > >>> like > >>>>>>> json/yaml. > >>>>>>> > >>>>>>> From the user's perspective, I don't use table configs from a > >>> config > >>>>> file > >>>>>>> like yaml or json for now, > >>>>>>> And it's ok to change it to yaml like style. Actually we didn't > >>> know > >>>>> that > >>>>>>> this could be a yaml like > >>>>>>> configuration hierarchy. If it has a hierarchy, we maybe consider > >>> that > >>>>> in > >>>>>>> the future to load the > >>>>>>> config from a yaml/json file. > >>>>>>> > >>>>>>> Regarding the name, > >>>>>>> 'format.kind' looks fine to me. However there is another name > >> from > >>> the > >>>>> top > >>>>>>> of my head: > >>>>>>> 'format.name', WDYT? > >>>>>>> > >>>>>>> Dawid Wysakowicz <dwysakow...@apache.org> < > >> dwysakow...@apache.org> > >>>>> 于2020年4月29日周三 下午11:56写道: > >>>>>>> > >>>>>>> > >>>>>>> Hi all, > >>>>>>> > >>>>>>> I also wanted to share my opinion. > >>>>>>> > >>>>>>> When talking about a ConfigOption hierarchy we use for > >> configuring > >>> Flink > >>>>>>> cluster I would be a strong advocate for keeping a > >>> yaml/hocon/json/... > >>>>>>> compatible style. Those options are primarily read from a file > >> and > >>> thus > >>>>>>> should at least try to follow common practices for nested formats > >>> if we > >>>>>>> ever decide to switch to one. > >>>>>>> > >>>>>>> Here the question is about the properties we use in SQL > >>> statements. The > >>>>>>> origin/destination of these usually will be external catalog, > >>> usually in > >>>>>>> > >>>>>>> a > >>>>>>> > >>>>>>> flattened(key/value) representation so I agree it is not as > >>> important as > >>>>>>> > >>>>>>> in > >>>>>>> > >>>>>>> the aforementioned case. Nevertheless having a yaml based catalog > >>> or > >>>>>>> > >>>>>>> being > >>>>>>> > >>>>>>> able to have e.g. yaml based snapshots of a catalog in my opinion > >>> is > >>>>>>> appealing. At the same time cost of being able to have a nice > >>>>>>> yaml/hocon/json representation is just adding a single suffix to > >> a > >>>>>>> single(at most 2 key + value) property. The question is between > >>> `format` > >>>>>>> > >>>>>>> = > >>>>>>> > >>>>>>> `json` vs `format.kind` = `json`. That said I'd be slighty in > >>> favor of > >>>>>>> doing it. > >>>>>>> > >>>>>>> Just to have a full picture. Both cases can be represented in > >>> yaml, but > >>>>>>> the difference is significant: > >>>>>>> format: 'json' > >>>>>>> format.option: 'value' > >>>>>>> > >>>>>>> vs > >>>>>>> format: > >>>>>>> kind: 'json' > >>>>>>> > >>>>>>> option: 'value' > >>>>>>> > >>>>>>> Best, > >>>>>>> Dawid > >>>>>>> > >>>>>>> On 29/04/2020 17:13, Flavio Pompermaier wrote: > >>>>>>> > >>>>>>> Personally I don't have any preference here. Compliance wih > >>> standard > >>>>>>> > >>>>>>> YAML > >>>>>>> > >>>>>>> parser is probably more important > >>>>>>> > >>>>>>> On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <imj...@gmail.com> < > >>>>> imj...@gmail.com> wrote: > >>>>>>> > >>>>>>> > >>>>>>> From a user's perspective, I prefer the shorter one > >> "format=json", > >>>>>>> > >>>>>>> because > >>>>>>> > >>>>>>> it's more concise and straightforward. The "kind" is redundant > >> for > >>>>>>> > >>>>>>> users. > >>>>>>> > >>>>>>> Is there a real case requires to represent the configuration in > >>> JSON > >>>>>>> style? > >>>>>>> As far as I can see, I don't see such requirement, and everything > >>> works > >>>>>>> fine by now. > >>>>>>> > >>>>>>> So I'm in favor of "format=json". But if the community insist to > >>> follow > >>>>>>> code style on this, I'm also fine with the longer one. > >>>>>>> > >>>>>>> Btw, I also CC user mailing list to listen more user's feedback. > >>>>>>> > >>>>>>> Because I > >>>>>>> > >>>>>>> think this is relative to usability. > >>>>>>> > >>>>>>> Best, > >>>>>>> Jark > >>>>>>> > >>>>>>> On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler < > >> ches...@apache.org> > >>> < > >>>>> ches...@apache.org> > >>>>>>> wrote: > >>>>>>> > >>>>>>> > >>>>>>>> Therefore, should we advocate instead: > >>>>>>>> > >>>>>>>> 'format.kind' = 'json', > >>>>>>>> 'format.fail-on-missing-field' = 'false' > >>>>>>> > >>>>>>> Yes. That's pretty much it. > >>>>>>> > >>>>>>> This is reasonable important to nail down as with such > >> violations I > >>>>>>> believe we could not actually switch to a standard YAML parser. > >>>>>>> > >>>>>>> On 29/04/2020 16:05, Timo Walther wrote: > >>>>>>> > >>>>>>> Hi everyone, > >>>>>>> > >>>>>>> discussions around ConfigOption seem to be very popular recently. > >>>>>>> > >>>>>>> So I > >>>>>>> > >>>>>>> would also like to get some opinions on a different topic. > >>>>>>> > >>>>>>> How do we represent hierarchies in ConfigOption? In FLIP-122, we > >>>>>>> agreed on the following DDL syntax: > >>>>>>> > >>>>>>> CREATE TABLE fs_table ( > >>>>>>> ... > >>>>>>> ) WITH ( > >>>>>>> 'connector' = 'filesystem', > >>>>>>> 'path' = 'file:///path/to/whatever', > >>>>>>> 'format' = 'csv', > >>>>>>> 'format.allow-comments' = 'true', > >>>>>>> 'format.ignore-parse-errors' = 'true' > >>>>>>> ); > >>>>>>> > >>>>>>> Of course this is slightly different from regular Flink core > >>>>>>> configuration but a connector still needs to be configured based > >> on > >>>>>>> these options. > >>>>>>> > >>>>>>> However, I think this FLIP violates our code style guidelines > >>>>>>> > >>>>>>> because > >>>>>>> > >>>>>>> 'format' = 'json', > >>>>>>> 'format.fail-on-missing-field' = 'false' > >>>>>>> > >>>>>>> is an invalid hierarchy. `format` cannot be a string and a > >>> top-level > >>>>>>> object at the same time. > >>>>>>> > >>>>>>> We have similar problems in our runtime configuration: > >>>>>>> > >>>>>>> state.backend= > >>>>>>> state.backend.incremental= > >>>>>>> restart-strategy= > >>>>>>> restart-strategy.fixed-delay.delay= > >>>>>>> high-availability= > >>>>>>> high-availability.cluster-id= > >>>>>>> > >>>>>>> The code style guide states "Think of the configuration as nested > >>>>>>> objects (JSON style)". So such hierarchies cannot be represented > >> in > >>>>>>> > >>>>>>> a > >>>>>>> > >>>>>>> nested JSON style. > >>>>>>> > >>>>>>> Therefore, should we advocate instead: > >>>>>>> > >>>>>>> 'format.kind' = 'json', > >>>>>>> 'format.fail-on-missing-field' = 'false' > >>>>>>> > >>>>>>> What do you think? > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Timo > >>>>>>> > >>>>>>> [1] > >>>>>>> > >>>>>>> > >>>>>>> > >>>>> > >>> > >> > https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes > >>>>>>> > >>>>>>> -- > >>>>>>> > >>>>>>> Benchao Li > >>>>>>> School of Electronics Engineering and Computer Science, Peking > >>>>> UniversityTel:+86-15650713730 > >>>>>>> Email: libenc...@gmail.com; libenc...@pku.edu.cn > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>>> > >>> > >> > > > > > >