Hi all, I like the overall design of the FLIP.
As for the withstanding concerns. I kind of like the approach to put the version into the factory identifier. I think it's the cleanest way to say that this version actually applies to the connector itself and not to the system it connects to. BTW, I think the outcome of this discussion will affect interfaces described in FLIP-95. If we put the version into the functionIdentifier, do we need Factory#factoryVersion? I also think it does make sense to have a versioning for the properties as well. Are we able to read all the current properties with the new factories? I think we could use the "connector.property-version" to alternate between different Factory interfaces to support the old set of properties. Otherwise the new factories need to understand both set of properties, don't they? Best, Dawid On 30/03/2020 17:07, Timo Walther wrote: > Hi Jark, > > thanks for the FLIP. I don't have a strong opinion on > DynamicTableFactory#factoryVersion() for distiguishing connector > versions. We can also include it in the factory identifier itself. For > Kafka, I would then just use `kafka` because the naming "universal" > was just a helper solution at that time. > > What we need is a good guide for how to design the options. We should > include that in the JavaDoc of factory interface itself, once it is > in-place. I like the difference between source and sink in properties. > > Regarding "connector.property-version": > > It was meant for backwards compatibility along the dimension of > "property design" compared to "connector version". However, since the > connector is now responsible for parsing all properties, it is not as > necessary as it was before. > > However, a change of the property design as it is done in FLIP-107 > becomes more difficult without a property version and versioning is > always a good idea in API world. > > Regards, > Timo > > > On 30.03.20 16:38, Benchao Li wrote: >> Hi Jark, >> >> Thanks for starting the discussion. The FLIP LGTM generally. >> >> Regarding connector version VS put version into connector's name, >> I favor the latter personally, using one property to locate a >> connector can make the error log more precise. >> From the users' side, using one property to match a connector will >> be easier. Especially we have many connectors, >> and some of the need version property required, and some of them not. >> >> Regarding Jingsong's suggestion, >> IMO, it's a very good complement to the FLIP. Distinguishing >> properties for source and sink can be very useful, and >> also this will make the connector property more precise. >> We are also sick of this for now, we cannot know whether a DDL is a >> source or sink unless we look through all queries where >> the table is used. >> Even more, some of the required properties are only required for >> source, bug we cannot leave it blank for sink, and vice versa. >> I think we can also add a type for dimension tables except source and >> sink. >> >> Kurt Young <[email protected] <mailto:[email protected]>> 于2020年3月30日 >> 周一 下午8:16写道: >> >> > It's not possible for the framework to throw such exception. >> Because the >> framework doesn't know what versions do the connector support. >> >> Not really, we can list all valid connectors framework could >> found. E.g. >> user mistyped 'kafka-0.x', the error message will looks like: >> >> we don't have any connector named "kafka-0.x", but we have: >> FileSystem >> Kafka-0.10 >> Kafka-0.11 >> ElasticSearch6 >> ElasticSearch7 >> >> Best, >> Kurt >> >> >> On Mon, Mar 30, 2020 at 5:11 PM Jark Wu <[email protected] >> <mailto:[email protected]>> wrote: >> >> > Hi Kurt, >> > >> > > 2) Lists all available connectors seems also quite >> straightforward, e.g >> > user provided a wrong "kafka-0.8", we tell user we have >> candidates of >> > "kakfa-0.11", "kafka-universal" >> > It's not possible for the framework to throw such exception. >> Because the >> > framework doesn't know what versions do the connector support. >> All the >> > version information is a blackbox in the identifier. But with >> > `Factory#factoryVersion()` interface, we can know all the >> supported >> > versions. >> > >> > > 3) I don't think so. We can still treat it as the same >> connector but with >> > different versions. >> > That's true but that's weird. Because from the plain DDL >> definition, they >> > look like different connectors with different "connector" >> value, e.g. >> > 'connector=kafka-0.8', 'connector=kafka-0.10'. >> > >> > > If users don't set any version, we will use "kafka-universal" >> instead. >> > The behavior is inconsistent IMO. >> > That is a long term vision when there is no kafka clusters >> with <0.11 >> > version. >> > At that point, "universal" is the only supported version in Flink >> and the >> > "version" key can be optional. >> > >> > --------------------------------------------- >> > >> > Hi Jingsong, >> > >> > > "version" vs "kafka.version" >> > I though about it. But if we prefix "kafka" to version, we should >> prefix >> > "kafka" for all other property keys, because they are all kafka >> specific >> > options. >> > However, that will make the property set verbose, see rejected >> option#2 in >> > the FLIP. >> > >> > > explicitly separate options for source and sink >> > That's a good topic. It's good to have a guideline for the new >> property >> > keys. >> > I'm fine to prefix with a "source"/"sink" for some connector >> keys. >> > Actually, we already do this in some connectors, e.g. jdbc and >> hbase. >> > >> > Best, >> > Jark >> > >> > On Mon, 30 Mar 2020 at 16:36, Jingsong Li <[email protected] >> <mailto:[email protected]>> wrote: >> > >> > > Thanks Jark for the proposal. >> > > >> > > +1 to the general idea. >> > > >> > > For "version", what about "kafka.version"? It is obvious to >> know its >> > > meaning. >> > > >> > > And I'd like to start a new topic: >> > > Should we need to explicitly separate source from sink? >> > > With the development of batch and streaming, more and more >> connectors >> > have >> > > both source and sink. >> > > >> > > So should we set a rule for table properties: >> > > - properties for both source and sink: without prefix, like >> "topic" >> > > - properties for source only: with "source." prefix, like >> > > "source.startup-mode" >> > > - properties for sink only: with "sink." prefix, like >> "sink.partitioner" >> > > >> > > What do you think? >> > > >> > > Best, >> > > Jingsong Lee >> > > >> > > On Mon, Mar 30, 2020 at 3:56 PM Jark Wu <[email protected] >> <mailto:[email protected]>> wrote: >> > > >> > > > Hi Kurt, >> > > > >> > > > That's good questions. >> > > > >> > > > > the meaning of "version" >> > > > There are two versions in the old design. One is property >> version >> > > > "connector.property-version" which can be used for backward >> > > compatibility. >> > > > The other one is "connector.version" which defines the >> version of >> > > external >> > > > system, e.g. 0.11" for kafka, "6" or "7" for ES. >> > > > In this proposal, the "version" is the previous >> "connector.version". >> > The >> > > > ""connector.property-version" is not introduced in new >> design. >> > > > >> > > > > how to keep the old capability which can evolve connector >> properties >> > > > The "connector.property-version" is an optional key in the >> old design >> > and >> > > > is never bumped up. >> > > > I'm not sure how "connector.property-version" should work >> in the >> > initial >> > > > design. Maybe @Timo Walther <[email protected] >> <mailto:[email protected]>> has more knowledge on >> > > > this. >> > > > But for the new properties, every options should be >> expressed as >> > > > `ConfigOption` which provides `withDeprecatedKeys(...)` >> method to >> > easily >> > > > support evolving keys. >> > > > >> > > > > a single keys instead of two, e.g. "kafka-0.11", >> "kafka-universal"? >> > > > There are several benefit to use separate "version" key I can >> see: >> > > > 1) it's more readable to separete them into different keys, >> because >> > they >> > > > are orthogonal concepts. >> > > > 2) the planner can give all the availble versions in the >> exception >> > > message, >> > > > if user uses a wrong version (this is often reported in >> user ML). >> > > > 3) If we use "kafka-0.11" as connector identifier, we may >> have to >> > write a >> > > > full documentation for each version, because they are >> different >> > > > "connector"? >> > > > IMO, for 0.11, 0.11, etc... kafka, they are actually >> the same >> > > connector >> > > > but with different "client jar" version, >> > > > they share all the same supported property keys and >> should reside >> > > > together. >> > > > 4) IMO, the future vision is version-free. At some point in >> the future, >> > > we >> > > > may don't need users to specify kafka version anymore, and >> make >> > > > "version=universal" as optional or removed in the future. >> This is can >> > be >> > > > done easily if they are separate keys. >> > > > >> > > > Best, >> > > > Jark >> > > > >> > > > >> > > > On Mon, 30 Mar 2020 at 14:45, Kurt Young <[email protected] >> <mailto:[email protected]>> wrote: >> > > > >> > > > > Hi Jark, >> > > > > >> > > > > Thanks for the proposal, I'm +1 to the general idea. >> However I have a >> > > > > question about "version", >> > > > > in the old design, the version seems to be aimed for >> tracking >> > property >> > > > > version, with different >> > > > > version, we could evolve these step by step without >> breaking backward >> > > > > compatibility. But in this >> > > > > design, version is representing external system's >> version, like >> > "0.11" >> > > > for >> > > > > kafka, "6" or "7" for ES. >> > > > > I'm not sure if this is necessary, what's the benefit of >> using two >> > keys >> > > > > instead of one, like "kafka-0.11" >> > > > > or "ES6" directly? And how about the old capability which >> could let >> > us >> > > > > evolving connector properties? >> > > > > >> > > > > Best, >> > > > > Kurt >> > > > > >> > > > > >> > > > > On Mon, Mar 30, 2020 at 2:36 PM LakeShen >> <[email protected] <mailto:[email protected]>> >> > > > > wrote: >> > > > > >> > > > > > Hi Jark, >> > > > > > >> > > > > > I am really looking forward to this feature. I think this >> feature >> > > > > > could simplify flink sql code,and at the same time , >> > > > > > it could make the developer more easlier to config the >> flink sql >> > WITH >> > > > > > options. >> > > > > > >> > > > > > Now when I am using flink sql to write flink task , >> sometimes I >> > think >> > > > the >> > > > > > WITH options is too long for user. >> > > > > > For example,I config the kafka source connector >> parameter,for >> > > consumer >> > > > > > group and brokers parameter: >> > > > > > >> > > > > > 'connector.properties.0.key' = 'group.id >> <http://group.id>' >> > > > > > > , 'connector.properties.0.value' = 'xxx' >> > > > > > > , 'connector.properties.1.key' = >> 'bootstrap.servers' >> > > > > > > , 'connector.properties.1.value' = 'xxxxx' >> > > > > > > >> > > > > > >> > > > > > I can understand this config , but for the flink fresh >> man,maybe it >> > > > > > is confused for him. >> > > > > > In my thought, I am really looking forward to this >> feature,thank >> > you >> > > to >> > > > > > propose this feature. >> > > > > > >> > > > > > Best wishes, >> > > > > > LakeShen >> > > > > > >> > > > > > >> > > > > > Jark Wu <[email protected] <mailto:[email protected]>> 于 >> 2020年3月30日周一 下午2:02写道: >> > > > > > >> > > > > > > Hi everyone, >> > > > > > > >> > > > > > > I want to start a discussion about further improve and >> simplify >> > our >> > > > > > current >> > > > > > > connector porperty keys, aka WITH options. Currently, >> we have a >> > > > > > > 'connector.' prefix for many properties, but they are >> verbose, >> > and >> > > we >> > > > > > see a >> > > > > > > big inconsistency between the properties when designing >> FLIP-107. >> > > > > > > >> > > > > > > So we propose to remove all the 'connector.' prefix and >> rename >> > > > > > > 'connector.type' to 'connector', 'format.type' to >> 'format'. So a >> > > new >> > > > > > Kafka >> > > > > > > DDL may look like this: >> > > > > > > >> > > > > > > CREATE TABLE kafka_table ( >> > > > > > > ... >> > > > > > > ) WITH ( >> > > > > > > 'connector' = 'kafka', >> > > > > > > 'version' = '0.10', >> > > > > > > 'topic' = 'test-topic', >> > > > > > > 'startup-mode' = 'earliest-offset', >> > > > > > > 'properties.bootstrap.servers' = 'localhost:9092', >> > > > > > > 'properties.group.id <http://properties.group.id>' = >> 'testGroup', >> > > > > > > 'format' = 'json', >> > > > > > > 'format.fail-on-missing-field' = 'false' >> > > > > > > ); >> > > > > > > >> > > > > > > The new connector property key set will come together >> with new >> > > > Factory >> > > > > > > inferface which is proposed in FLIP-95. Old properties >> are still >> > > > > > compatible >> > > > > > > with their existing implementation. New properties >> are only >> > > available >> > > > > in >> > > > > > > new DynamicTableFactory implementations. >> > > > > > > >> > > > > > > You can access the detailed FLIP here: >> > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory >> > > > > > > >> > > > > > > Best, >> > > > > > > Jark >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > > >> > > -- >> > > Best, Jingsong Lee >> > > >> > >> >> >> >> -- >> >> Benchao Li >> School of Electronics Engineering and Computer Science, Peking >> University >> Tel:+86-15650713730 >> Email:[email protected] >> <mailto:[email protected]>;[email protected] >> <mailto:[email protected]> >> >
signature.asc
Description: OpenPGP digital signature
