Sorry, I meant white-list ~

Danny Chan <danny0...@apache.org> wrote on Fri, Mar 27, 2020 at 12:40 PM:
> Thanks everyone for the feedback ~
>
> - On whether the global config option belongs to `ExecutionConfigOptions` or `OptimizerConfigOptions`: I have no strong objections. Switching to `OptimizerConfigOptions` is okay with me, and I have updated the wiki.
> - On white-list versus black-list: I share Timo's opinion, so black-list.
>
> I will start a vote soon if there are no other objections, thanks ~
>
> Timo Walther <twal...@apache.org> wrote on Thu, Mar 26, 2020 at 6:31 PM:
>
>> Hi everyone,
>>
>> it is not only about security concerns. Hint options should be well-defined. We had a couple of people who were concerned about changing the semantics with a concept that is called a "hint". These options are more like "debugging options" for someone who is developing a connector or using a notebook to quickly produce some rows.
>>
>> The final pipeline should use a temporary table instead. I suggest using a whitelist and forcing people to think about what should be exposed as a hint. By default, no option should be exposed. It is better to be conservative here.
>>
>> Regards,
>> Timo
>>
>>
>> On 26.03.20 10:31, Danny Chan wrote:
>> > Thanks Kurt for the suggestion ~
>> >
>> > In my opinion:
>> > - There is no need for TableFormatFactory#supportedHintOptions because all the format options can be configured dynamically; they have no security issues
>> > - Dynamic table options are not an optimization; from my side they are more like an execution behavior
>> >
>> > Kurt Young <ykt...@gmail.com> wrote on Thu, Mar 26, 2020 at 4:47 PM:
>> >
>> >> Hi Danny,
>> >>
>> >> Thanks for the updates.
>> >> I have 2 comments regarding the latest document:
>> >>
>> >> 1) I think we also need `supportedHintOptions` for `TableFormatFactory`
>> >> 2) IMO "dynamic-table-options.enabled" should belong to `OptimizerConfigOptions`
>> >>
>> >> Best,
>> >> Kurt
>> >>
>> >>
>> >> On Thu, Mar 26, 2020 at 4:40 PM Timo Walther <twal...@apache.org> wrote:
>> >>
>> >>> Thanks for the update Danny. +1 for this proposal.
>> >>>
>> >>> Regards,
>> >>> Timo
>> >>>
>> >>> On 26.03.20 04:51, Danny Chan wrote:
>> >>>> Thanks everyone who engaged in this discussion ~
>> >>>>
>> >>>> Our goal is "Supports Dynamic Table Options for Flink SQL". After an offline discussion with Kurt, Timo and Dawid, we have reached a final conclusion. Here is the summary:
>> >>>>
>> >>>> - Use comment-style syntax to specify the dynamic table options: "/*+ OPTIONS(k1='v1', k2='v2') */"
>> >>>> - Constrain the option keys: options that may bring in security problems should not be allowed, e.g. the Kafka connector's ZooKeeper endpoint URL and topic name
>> >>>> - Use a white-list to control the allowed options for each connector, which is safer for future extension
>> >>>> - Allow enabling/disabling this feature globally
>> >>>> - Implement based on the current code base first, and when FLIP-95 is checked in, implement this feature based on the new interface
>> >>>>
>> >>>> Any suggestions are appreciated ~
>> >>>>
>> >>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL
>> >>>>
>> >>>> Best,
>> >>>> Danny Chan
>> >>>>
>> >>>> Jark Wu <imj...@gmail.com> wrote on Wed, Mar 18, 2020 at 10:38 PM:
>> >>>>
>> >>>>> Hi everyone,
>> >>>>>
>> >>>>> Sorry, but I'm not sure about the `supportedHintOptions`. I'm afraid it doesn't solve the problems but increases the development and learning burden.
>> >>>>>
>> >>>>> # increase development and learning burden
>> >>>>>
>> >>>>> According to the discussion so far, we want to support overriding, in hints, a subset of options which doesn't affect semantics. With `supportedHintOptions`, it's up to the connector developers to decide which options will not affect semantics and can be hint options. However, the question is how to distinguish whether an option will *affect semantics*. What happens if an option affects semantics but is provided as a hint option?
>> >>>>> From my point of view, it's not easy to distinguish. For example, "format.ignore-parse-error" can be a very useful dynamic option, but it will affect semantics because the result is different (null vs. exception). As another example, the "connector.lookup.cache.*" options are also very useful for tuning jobs; however, they will also affect the job results. I can come up with many more options that are useful but may affect semantics.
>> >>>>>
>> >>>>> I can see that the community will be stuck in endless discussion around "can this option be a hint option?" and "will this option affect semantics?". You can also see that we already have different opinions on "ignore-parse-error". Those discussions are a waste of time! That's not what users want!
>> >>>>> The problem is that users need this, this, and this option, and HOW do we expose them? We should focus on that.
>> >>>>>
>> >>>>> Then there could be two endings in the future:
>> >>>>> 1) We compromise for the sake of usability: we drop the rule that hints don't affect semantics and allow all the useful options in the hints list.
>> >>>>> 2) We stick to the rule, and users will find this a stumbling feature which doesn't solve their problems.
>> >>>>> And they will be surprised that this option can't be set but the other can. *Semantics* are hard for users to understand.
>> >>>>>
>> >>>>> # doesn't solve the problems
>> >>>>>
>> >>>>> I think the purpose of this FLIP is to allow users to quickly override some connector properties to tune their jobs. However, `supportedHintOptions` is off track. It only allows a subset of options, and for the users it's not *clear* which subset is allowed.
>> >>>>>
>> >>>>> Besides, I'm not sure `supportedHintOptions` can work well for all cases. How could you support Kafka properties (`connector.properties.*`) as hint options? Some Kafka properties may affect semantics (bootstrap.servers), some may not (max.poll.records). Besides, I think it's not possible to list all the possible Kafka properties [1].
>> >>>>>
>> >>>>> In summary, IMO, `supportedHintOptions`:
>> >>>>> (1) increases the complexity of developing a connector
>> >>>>> (2) confuses users about which options can be used in hints and which cannot; they have to check the docs again and again
>> >>>>> (3) doesn't solve the problems which we want to solve with this FLIP
>> >>>>>
>> >>>>> I think we should avoid introducing partial solutions. Otherwise, we will be stuck in a loop of introduce new API -> deprecate API -> introduce new API....
>> >>>>>
>> >>>>> I'm personally in favor of an explicit WITH syntax after the table as a part of the query, which was mentioned by Kurt before, e.g. SELECT * FROM T WITH ('key' = 'value').
>> >>>>> It allows users to dynamically set options which can affect semantics. It will be flexible enough to solve users' problems so far.
>> >>>>>
>> >>>>> Best,
>> >>>>> Jark
>> >>>>>
>> >>>>> [1]: https://kafka.apache.org/documentation/#consumerconfigs
>> >>>>>
>> >>>>> On Wed, 18 Mar 2020 at 21:44, Danny Chan <yuzhao....@gmail.com> wrote:
>> >>>>>
>> >>>>>> My POC for the hint options merge is here [1].
>> >>>>>>
>> >>>>>> Personally, I have no strong objections to splitting hints from the CatalogTable. The only con is a more complex implementation, but the concept is clearer, and I have updated the wiki.
>> >>>>>>
>> >>>>>> I think it would be nice if we could support the format “ignore-parse error” option key. The CSV source already has such a key [2] and we can use that in `supportedHintOptions`; we can also support it for the common CSV and JSON formats. This is the only kind of key in formats that “does not change the semantics” (somehow). What do you think about this ~
>> >>>>>>
>> >>>>>> [1] https://github.com/danny0405/flink/commit/5d925fa16c3c553423c4b7d93001521b8e6e6bee#diff-6e569a6dd124fd2091c18e2790fb49c5
>> >>>>>> [2] https://github.com/apache/flink/blob/b83060dff6d403b6994b6646b3f29a374f599530/flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/table/sources/CsvTableSourceFactoryBase.java#L92
>> >>>>>>
>> >>>>>> Best,
>> >>>>>> Danny Chan
>> >>>>>> On Mar 18, 2020, 9:10 PM +0800, Timo Walther <twal...@apache.org> wrote:
>> >>>>>>> Hi everyone,
>> >>>>>>>
>> >>>>>>> +1 to Kurt's suggestion. Let's just have it in source and sink factories for now. We can still move this method up in the future. Currently, I don't see a need for catalogs or formats, because how would you target a format in the query?
>> >>>>>>>
>> >>>>>>> @Danny: Can you send a link to your PoC?
>> >>>>>>> I'm very skeptical about creating a new CatalogTable in the planner. Actually, CatalogTable should be immutable between Catalog and Factory, because a catalog can return its own factory and fully control the instantiation. Depending on the implementation, that means the catalog may have encoded more information in a concrete subclass implementing the interface. I vote for explicitly separating the concerns of catalog information and hints in the factory.
>> >>>>>>>
>> >>>>>>> Regards,
>> >>>>>>> Timo
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On 18.03.20 05:41, Jingsong Li wrote:
>> >>>>>>>> Hi,
>> >>>>>>>>
>> >>>>>>>> I am thinking we can provide hints to *table* related instances.
>> >>>>>>>> - TableFormatFactory: of course we need hints support; there are many format options in DDL too.
>> >>>>>>>> - catalog and module: I don't know, maybe in the future we can provide some hints for them.
>> >>>>>>>>
>> >>>>>>>> Best,
>> >>>>>>>> Jingsong Lee
>> >>>>>>>>
>> >>>>>>>> On Wed, Mar 18, 2020 at 12:28 PM Danny Chan <yuzhao....@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> Yes, I think we should move `supportedHintOptions` from TableFactory to TableSourceFactory, and we also need to add the interface to TableSinkFactory because the sink target table may also have hints attached.
>> >>>>>>>>>
>> >>>>>>>>> Best,
>> >>>>>>>>> Danny Chan
>> >>>>>>>>> On Mar 18, 2020, 11:08 AM +0800, Kurt Young <ykt...@gmail.com> wrote:
>> >>>>>>>>>> I have one question about adding the `supportedHintOptions` method to `TableFactory`. It seems `TableFactory` is a base factory interface for all *table module* related instances, such as catalog, module, format and so on.
>> >>>>>>>>>> It's not created only for *table*. Is it possible to move it to `TableSourceFactory`?
>> >>>>>>>>>>
>> >>>>>>>>>> Best,
>> >>>>>>>>>> Kurt
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Wed, Mar 18, 2020 at 10:59 AM Danny Chan <yuzhao....@gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Thanks Timo ~
>> >>>>>>>>>>>
>> >>>>>>>>>>> For the naming itself, I also think PROPERTIES is not that concise, so +1 for OPTIONS (I had thought about that, but much of the current Flink code calls it properties, e.g. DescriptorProperties and #getSupportedProperties). Let’s use OPTIONS if this is our new preference.
>> >>>>>>>>>>>
>> >>>>>>>>>>> +1 to `Set<ConfigOption> supportedHintOptions()` because ConfigOption can carry more info. AFAIK, Spark also calls them table options instead of properties. [1]
>> >>>>>>>>>>>
>> >>>>>>>>>>> In my local POC, I did create a new CatalogTable, and it works well for current connectors. All the DDL tables finally yield a CatalogTable instance and we can apply the options to that (in CatalogSourceTable, when we generate the TableSource). The pro is that we do not need to modify the connector code itself. If we split the options from CatalogTable, we may need to add some additional logic in each connector factory in order to merge these properties (and the logic is almost the same everywhere). What do you think about this?
>> >>>>>>>>>>>
>> >>>>>>>>>>> [1] https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best,
>> >>>>>>>>>>> Danny Chan
>> >>>>>>>>>>> On Mar 17, 2020, 10:10 PM +0800, Timo Walther <twal...@apache.org> wrote:
>> >>>>>>>>>>>> Hi Danny,
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> thanks for updating the FLIP. I think your current design is sufficient to separate hints from result-related properties.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> One remark on the naming itself: I would vote for calling the hints around table scan `OPTIONS('k'='v')`. We used the term "properties" in the past, but since we want to unify the Flink configuration experience, we should use consistent naming and classes around `ConfigOptions`.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> It would be nice to use `Set<ConfigOption> supportedHintOptions();` to start using config options instead of pure string properties. This will also allow us to generate documentation in the future around supported data types, ranges, etc. for options. At some point we would also like to drop the `DescriptorProperties` class. "Options" is also used in the documentation [1] and in the SQL/MED standard [2].
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Furthermore, I would still vote for separating CatalogTable and hint options. Otherwise the planner would need to create a new CatalogTable instance, which might not always be easy.
>> >>>>>>>>>>>> We should offer them via:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> org.apache.flink.table.factories.TableSourceFactory.Context#getHints: ReadableConfig
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> What do you think?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Regards,
>> >>>>>>>>>>>> Timo
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/create.html#create-table
>> >>>>>>>>>>>> [2] https://wiki.postgresql.org/wiki/SQL/MED
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On 12.03.20 15:06, Stephan Ewen wrote:
>> >>>>>>>>>>>>> @Danny sounds good.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Maybe it is worth listing all the classes of problems that you want to address and then looking at each class to see whether hints are a good default solution or a good optional way of simplifying things?
>> >>>>>>>>>>>>> The discussion has grown a lot and it is starting to be hard to distinguish the parts where everyone agrees from the parts where there are concerns.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Thu, Mar 12, 2020 at 2:31 PM Danny Chan <danny0...@apache.org> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Thanks Stephan ~
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> We can remove the support for properties that may change the semantics of the query if you think that is trouble.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> How about we support the /*+ properties() */ hint only for optimization parameters, such as the fetch size of a source or something like that? Does that make sense?
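
The separation discussed in the messages above (catalog table options stay untouched, hints arrive through their own accessor, and a factory honors only the keys it declared via something like `supportedHintOptions`) can be pictured with a minimal Java sketch. This is not Flink's API: `HintMergeSketch`, `mergeHints`, and the option keys are hypothetical names chosen for illustration, and plain `Map<String, String>` stands in for both the `CatalogTable` options and the proposed `ReadableConfig` hints.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of white-listed hint merging; not Flink API. */
class HintMergeSketch {

    /**
     * Returns a new map holding the catalog options overlaid with the hints.
     * A hint key outside the supported set is rejected, so nothing can be
     * overridden unless the factory explicitly declared it. The catalog
     * options themselves are never modified.
     */
    static Map<String, String> mergeHints(
            Map<String, String> catalogOptions,
            Map<String, String> hints,
            Set<String> supportedHintOptions) {
        Map<String, String> merged = new HashMap<>(catalogOptions);
        for (Map.Entry<String, String> hint : hints.entrySet()) {
            if (!supportedHintOptions.contains(hint.getKey())) {
                throw new IllegalArgumentException(
                        "Option '" + hint.getKey() + "' is not allowed as a hint");
            }
            merged.put(hint.getKey(), hint.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical option keys, for illustration only.
        Map<String, String> catalogOptions = new HashMap<>();
        catalogOptions.put("connector.topic", "orders");
        catalogOptions.put("connector.fetch-size", "100");

        Map<String, String> hints =
                Collections.singletonMap("connector.fetch-size", "500");
        Set<String> supported = Collections.singleton("connector.fetch-size");

        // The fetch size is overridden; the topic keeps its catalog value.
        System.out.println(mergeHints(catalogOptions, hints, supported));
    }
}
```

Because the merge logic is the same for every connector, a shared helper of this shape would address the concern that each factory needs its own copy, while still leaving the catalog's table instance immutable.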
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> wrote on Thu, Mar 12, 2020 at 7:45 PM:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> I think Bowen has actually put it very well.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> (1) Hints that change semantics look like trouble waiting to happen. For example, Kafka offset handling should be in filters. The Kafka source should support predicate pushdown.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> (2) Hints should not be a workaround for current shortcomings. A lot of what is suggested above sounds exactly like that: working around catalog/DDL shortcomings, missing exposure of metadata (offsets), missing predicate pushdown in Kafka. Abusing a feature like hints now as a quick fix for these issues, rather than fixing the root causes, will very likely come back to bite us badly in the future.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>> Stephan
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Thu, Mar 12, 2020 at 10:43 AM Kurt Young <ykt...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> It seems this FLIP's name is somewhat misleading. From my understanding, this FLIP is trying to address the dynamic parameter issue, and table hints is the way we want to choose.
>> >>>>>>>>>>>>>>>> I think we should focus on "what's the right way to solve dynamic properties" instead of discussing "whether table hints can affect query semantics".
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> For now, there are three proposed ways to achieve dynamic properties:
>> >>>>>>>>>>>>>>>> 1. FLIP-110: create temporary table xx like xx with (xxx)
>> >>>>>>>>>>>>>>>> 2. use custom "from t with (xxx)" syntax
>> >>>>>>>>>>>>>>>> 3. "Borrow" the table hints to have a special PROPERTIES hint.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> The first one doesn't break anything; the only problem I see is that it is a little more verbose than the table hint approach. I can imagine that when someone uses the SQL CLI interactively, he will quite often modify the table properties. Some use cases I can think of:
>> >>>>>>>>>>>>>>>> 1. The source contains some corrupted data, and I want to turn on the "ignore-error" flag for certain formats.
>> >>>>>>>>>>>>>>>> 2. I have a Kafka table and want to see some sample data from the beginning, so I change the offset to "earliest"; then I want to observe the latest data which keeps coming in, so I would write another query to select from the latest table.
>> >>>>>>>>>>>>>>>> 3. I want my JDBC sink to flush data more eagerly so that I can observe the data from the database side.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Most of such use cases are quite ad-hoc.
>> >>>>>>>>>>>>>>>> If every time I want a different experience I need to create a temporary table and then also modify my query, it doesn't feel smooth. Embedding such dynamic properties into the query would give a better user experience.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Both 2 & 3 can make this happen. The con of #2 is that it breaks SQL compliance; #3 only breaks some unwritten rules, and we can have an explanation for that. And I really doubt whether users would complain about this when they actually have a flexible and good experience using it.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> My preference would be #3 > #1 > #2, what do you think?
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>> Kurt
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Thu, Mar 12, 2020 at 1:11 PM Danny Chan <yuzhao....@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Thanks Aljoscha ~
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> I agree that most query hints are optional, as optimizer instructions, especially for a traditional RDBMS.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> But, just like BenChao said, Flink as a computation engine has many different kinds of data sources; thus, dynamic parameters like start_offset can only bind to each table scope. We cannot set a session config like KSQL does, because there everything is about Kafka:
>> >>>>>>>>>>>>>>>>>> SET 'auto.offset.reset'='earliest';
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Thus the most flexible way to set these dynamic params is to bind them to the table scope in the query when we want to override something, so we have the solutions below (with pros and cons from my side):
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> • 1. Select * from t(offset=123) (from Timo)
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Pros:
>> >>>>>>>>>>>>>>>>> - Easy to add
>> >>>>>>>>>>>>>>>>> - Parameters are part of the main query
>> >>>>>>>>>>>>>>>>> Cons:
>> >>>>>>>>>>>>>>>>> - Not SQL compliant
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> • 2.
>> >>>>>>>>>>>>>>>>> Select * from t /*+ PROPERTIES(offset=123) */ (from me)
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Pros:
>> >>>>>>>>>>>>>>>>> - Easy to add
>> >>>>>>>>>>>>>>>>> - SQL compliant because it is nested in comments
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Cons:
>> >>>>>>>>>>>>>>>>> - Parameters are not part of the main query
>> >>>>>>>>>>>>>>>>> - Cryptic syntax for new users
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> The biggest problem with the hints approach may be the question of whether “hints must be optional”. Actually, we thought about 1 for a while but abandoned it because it breaks the SQL standard too much. We replaced it with 2 because the hint syntax does not break the SQL standard (it is nested in comments).
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> What if we have a special /*+ PROPERTIES */ hint that allows overriding some properties of a table dynamically? It does not break anything, at least for current Flink use cases.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Planner hints are optional just because they are naturally enforcers of the planner; most of them aim to instruct the optimizer. But table hints are a little different: table hints can specify table metadata like an index column, and it is very convenient to specify table properties.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Or shall we not call /*+ PROPERTIES(offset=123) */ a table hint? We could call it table dynamic parameters.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>> Danny Chan
>> >>>>>>>>>>>>>>>>> On Mar 11, 2020, 9:20 PM +0800, Aljoscha Krettek <aljos...@apache.org> wrote:
>> >>>>>>>>>>>>>>>>>> Hi,
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> I don't understand this discussion. Hints, as I understand them, should work like this:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> - hints are *optional* advice for the optimizer to try and help it to find a good execution strategy
>> >>>>>>>>>>>>>>>>>> - hints should not change query semantics, i.e. they should not change connector properties; executing a query taking the hints into account *must* produce the same result as executing the query without taking the hints into account
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> From these simple requirements you can derive a solution that makes sense. I don't have a strong preference for the syntax but we should strive to be in line with prior work.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>>> Aljoscha
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> On 11.03.20 11:53, Danny Chan wrote:
>> >>>>>>>>>>>>>>>>>>> Thanks Timo for summarizing the 3 options ~
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> I agree with Kurt that option 2 is too complicated to use because:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> • As a Kafka topic consumer, the user must define the virtual column for the start offset and must also apply a special filter predicate after each query
>> >>>>>>>>>>>>>>>>>>> • For the internal implementation, metadata column pushdown is another hard topic: each kind of message queue may have its own offset attribute, so we need to consider the expression type for the different kinds; the source also needs to recognize the constant column as a config option (which is weird because usually what we push down is a table column)
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> For option 1 and option 3, I think there is no difference. Option 1 is also a hint syntax, which was introduced in Sybase and adopted, then deprecated, by MS-SQL in the 1990s because of its ambiguity.
>> >>>>>>>>>>>>>>>>>>> Personally, I prefer the /*+ */ style table hint over the WITH keyword for these reasons:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> • We do not break standard SQL; the hints are nested in SQL comments
>> >>>>>>>>>>>>>>>>>>> • We do not need to introduce an additional WITH keyword, which could appear anywhere in a query because a table can be referenced in all kinds of SQL contexts: INSERT/DELETE/FROM/JOIN …. That would make our SQL diverge too much from the standard
>> >>>>>>>>>>>>>>>>>>> • We would have a uniform syntax for table hints and query hints; one syntax fits all and is easier to use
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> And here is the reason why we chose a uniform Oracle-style query hint syntax, as explained by Julian Hyde when we designed the syntax in the Calcite community:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> I don’t much like the MSSQL-style syntax for table hints. It adds a new use of the WITH keyword that is unrelated to the use of WITH for common-table expressions.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> A historical note. Microsoft SQL Server inherited its hint syntax from Sybase a very long time ago.
>> >>>>>>>>>>>>>>>>>>> (See “Transact SQL Programming”[1], page 632, “Optimizer hints”. The book was written in 1999, and covers Microsoft SQL Server 6.5 / 7.0 and Sybase Adaptive Server 11.5, but the syntax very likely predates Sybase 4.3, from which Microsoft SQL Server was forked in 1993.)
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Microsoft later added the WITH keyword to make it less ambiguous, and has now deprecated the syntax that does not use WITH.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> They are forced to keep the syntax for backwards compatibility but that doesn’t mean that we should shoulder their burden.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> I think formatted comments are the right container for hints because it allows us to change the hint syntax without changing the SQL parser, and makes clear that we are at liberty to ignore hints entirely.
> Julian
>
> [1] https://www.amazon.com/s?k=9781565924017

Best,
Danny Chan

On March 11, 2020 at 4:03 PM (+0800), Timo Walther <twal...@apache.org> wrote:

Hi Danny,

it is true that our DDL is not standard compliant because it uses the WITH clause. Nevertheless, we aim to not diverge too much, and the LIKE clause is an example of that. It will solve things like overwriting WATERMARKs, adding or modifying properties, and inheriting the schema.

Bowen is right that Flink's DDL mixes 3 types of definitions together. We are not the first ones to try to solve this. There is also the SQL MED standard [1] that tried to tackle this problem. I think it was not considered when designing the current DDL.

Currently, I see 3 options for handling Kafka offsets.
I will give some examples and look forward to feedback here:

*Option 1* Runtime and semantic params as part of the query

`SELECT * FROM MyTable('offset'=123)`

Pros:
- Easy to add
- Parameters are part of the main query
- No complicated hinting syntax

Cons:
- Not SQL compliant

*Option 2* Use metadata in query

`CREATE TABLE MyTable (id INT, offset AS SYSTEM_METADATA('offset'))`

`SELECT * FROM MyTable WHERE offset > TIMESTAMP '2012-12-12 12:34:22'`

Pros:
- SQL compliant in the query
- Access to metadata in the DDL, which is required anyway
- Regular pushdown rules apply

Cons:
- Users need to add an additional column in the DDL

*Option 3*: Use hints for properties

`
SELECT *
FROM MyTable /*+ PROPERTIES('offset'=123) */
`

Pros:
- Easy to add

Cons:
- Parameters are not part of the main query
- Cryptic syntax for new users
- Not standard compliant.

If we go with this option, I would suggest making the hints available in a separate map and not mixing them with statically defined properties, so that the factory can decide which properties may be overwritten by the hints:

TableSourceFactory.Context.getQueryHints(): ReadableConfig

Regards,
Timo

[1] https://en.wikipedia.org/wiki/SQL/MED

Currently I see 3 options as a


On 11.03.20 07:21, Danny Chan wrote:

Thanks Bowen ~

I agree we should somehow categorize our connector parameters.

For type 1, I’m already preparing a solution along the lines of Confluent schema registry + Avro schema inference, so this may not be a problem in the near future.
For type 3, I have some questions:

> "SELECT * FROM mykafka WHERE offset > 12pm yesterday”

Where does the offset column come from? Is it a virtual column from the table schema? You said that

> They change almost every time a query starts and have nothing to do with metadata, thus should not be part of table definition/DDL

But then why can you reference it in the query? I’m confused about that, can you elaborate a little?

Best,
Danny Chan

On March 11, 2020 at 12:52 PM (+0800), Bowen Li <bowenl...@gmail.com> wrote:

Thanks Danny for kicking off the effort

The root cause of too much manual work is that Flink DDL has mixed 3 types of params together and doesn't handle each of them very well. Below is how I categorize them, with the corresponding solutions I have in mind:

- type 1: Metadata of external data, like external endpoint/url, username/pwd, schemas, formats.
Such metadata is mostly already accessible in the external system as long as endpoints and credentials are provided. Flink can get it through catalogs, but we don't have many catalogs yet and thus Flink just hasn't been able to leverage that. So the solution should be building more catalogs. Such params should be part of a Flink table DDL/definition, and not overridable by any means.


- type 2: Runtime params, like the jdbc connector's fetch size or the elasticsearch connector's bulk flush size.

Such params don't affect query results, but affect how results are produced (e.g. fast or slow, aka performance) - they are essentially execution and implementation details. They change often in exploration or development stages, but not very frequently in well-defined long-running pipelines. They should always have default values and can be missing in a query.
They can be part of a table DDL/definition, but should also be replaceable in a query - *this is what table "hints" in FLIP-113 should cover*.


- type 3: Semantic params, like the kafka connector's start offset.

Such params affect query results - the semantics. They are better expressed as filter conditions in the WHERE clause that can be pushed down. They change almost every time a query starts and have nothing to do with metadata, thus should not be part of the table definition/DDL, nor be persisted in catalogs. If they need to be kept, users should create views to keep such params around (note this is different from variable substitution).


Take Flink-Kafka as an example.
Once we get these params right, here are the steps users need to take to develop and run a Flink job:
- configure a Flink ConfluentSchemaRegistry with url, username, and password
- run "SELECT * FROM mykafka WHERE offset > 12pm yesterday" (simplified timestamp) in SQL CLI; Flink automatically retrieves all metadata of schema, file format, etc. and starts the job
- users want to make the job read the Kafka topic faster, so it becomes "SELECT * FROM mykafka /* faster_read_key=value*/ WHERE offset > 12pm yesterday"
- done and satisfied, users submit it to production


Regarding "CREATE TABLE t LIKE with (k1=v1, k2=v2)", I think it's a nice-to-have feature, but not a strategically critical, long-term solution, because
1) It may seem promising at the current stage to solve the too-much-manual-work problem, but that's only because Flink hasn't leveraged catalogs well and handled the 3 types of params above properly. Once we get the param types right, the LIKE syntax won't be that important, and will be just an easier way to create tables without retyping long fields like username and pwd.
2) Note that only some rare types of catalog can store k-v property pairs, so tables created this way often cannot be persisted. In the foreseeable future, the only such catalog will be HiveCatalog, and not everyone has a Hive metastore. To be honest, without persistence, recreating tables this way every time is still a lot of keyboard typing.

Cheers,
Bowen

On Tue, Mar 10, 2020 at 8:07 PM Kurt Young <ykt...@gmail.com> wrote:

If a specific connector wants to have such a parameter and read it out of the configuration, then that's fine.
If we are talking about a configuration for all kinds of sources, I would be super careful about that.
It's true it can solve maybe 80% of the cases, but it will also make the remaining 20% feel weird.

Best,
Kurt


On Wed, Mar 11, 2020 at 11:00 AM Jark Wu <imj...@gmail.com> wrote:

Hi Kurt,

#3 Regarding global offset:
I'm not suggesting that the planner use the global configuration to override connector properties.
Rather, the connector should take this configuration and translate it into its client API.
AFAIK, almost all message queues support earliest, latest, and a timestamp value as the start point.
So we can support 3 options for this configuration: "earliest", "latest", and a timestamp string value.
Of course, this can't solve 100% of the cases, but I guess it can solve 80% or 90% of them.
And the remaining cases, which I guess are not very common, can be resolved by the LIKE syntax.
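A minimal Python sketch of the three accepted values described above, under stated assumptions: the key `table.sources.start-offset` is only the name proposed in this thread, not an existing Flink option; the helper `parse_start_offset` is hypothetical; and the timestamp format (`YYYY-MM-DD HH:MM:SS`, interpreted as UTC) is an assumption because the thread does not specify one. Each connector would still have to translate the result into its own client API.

```python
from datetime import datetime, timezone

# Option key as proposed in this thread; not an existing Flink option.
START_OFFSET_KEY = "table.sources.start-offset"


def parse_start_offset(value):
    """Map the session-level value to a (mode, epoch_millis) pair.

    Assumption: timestamps are written as 'YYYY-MM-DD HH:MM:SS' in UTC.
    """
    normalized = value.strip().lower()
    if normalized in ("earliest", "latest"):
        return normalized, None
    try:
        ts = datetime.strptime(value.strip(), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Reject anything that is neither a keyword nor a timestamp.
        raise ValueError(
            "Unsupported value for {}: {!r}".format(START_OFFSET_KEY, value)
        ) from None
    millis = int(ts.replace(tzinfo=timezone.utc).timestamp() * 1000)
    return "timestamp", millis


mode, millis = parse_start_offset("2020-01-09 02:52:54")
```

Rejecting unknown values up front keeps the failure in one place instead of leaving every connector to interpret a malformed setting differently.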
Best,
Jark


On Wed, 11 Mar 2020 at 10:33, Kurt Young <ykt...@gmail.com> wrote:

Good to have such lovely discussions. I also want to share some of my opinions.

#1 Regarding error handling: I also think ignoring invalid hints would be dangerous; maybe the simplest solution is to just throw an exception.

#2 Regarding property replacement: I don't think we should constrain ourselves to the meaning of the word "hint" and forbid it from modifying any properties which can affect query results. IMO `PROPERTIES` is one of the table hints, and a powerful one. It can modify properties located in the DDL's WITH block.
But I also see the harm if we make it too flexible, like changing the kafka topic name with a hint. Such a use case is not common and sounds very dangerous to me. I would propose we have a map of hintable properties for each connector, and validate that all passed-in properties are actually hintable. And combining this with #1 error handling, we can throw an exception once we receive an invalid property.

#3 Regarding global offset: I'm not sure it's feasible. Different connectors will have totally different properties to represent the offset: some might be timestamps, some might be string literals like "earliest", and others might be just integers.
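The proposal in #2, combined with the error handling in #1, can be sketched in Python as follows. This is only an illustration under stated assumptions, not Flink's implementation: `HINTABLE_PROPERTIES`, `InvalidHintError`, and `apply_hints` are hypothetical names, and the connector names and property keys in the whitelist are illustrative.

```python
# Hypothetical per-connector whitelist; names and keys are illustrative only.
HINTABLE_PROPERTIES = {
    "kafka": {"connector.startup-mode", "connector.startup-timestamp-millis"},
    "jdbc": {"connector.read.fetch-size"},
}


class InvalidHintError(ValueError):
    """Raised when a hint tries to set a property that is not hintable."""


def apply_hints(connector, table_properties, hinted):
    """Return the DDL properties overlaid with the validated hint values."""
    # A connector without an entry exposes nothing to hints.
    allowed = HINTABLE_PROPERTIES.get(connector, set())
    rejected = sorted(key for key in hinted if key not in allowed)
    if rejected:
        raise InvalidHintError(
            "Properties not hintable for '{}': {}".format(
                connector, ", ".join(rejected)
            )
        )
    merged = dict(table_properties)  # leave the catalog definition untouched
    merged.update(hinted)
    return merged


ddl = {"connector.topic": "orders", "connector.startup-mode": "latest-offset"}
merged = apply_hints("kafka", ddl, {"connector.startup-mode": "earliest-offset"})
```

With this shape, an attempt to override `connector.topic` through a hint fails loudly instead of silently reading from a different topic.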
Best,
Kurt


On Tue, Mar 10, 2020 at 11:46 PM Jark Wu <imj...@gmail.com> wrote:

Hi everyone,

I want to jump into the discussion about the "dynamic start offset" problem.
First of all, I share the same concern as Timo and Fabian, that the "start offset" affects the query semantics, i.e. the query result.
But "hints" are meant only for optimization, which should not affect the result.

I think the "dynamic start offset" is a very important usability problem which will be faced by many streaming platforms.
I also agree that "CREATE TEMPORARY TABLE Temp (LIKE t) WITH ('connector.startup-timestamp-millis' = '1578538374471')" is verbose; what if we have 10 tables to join?
However, what I want to propose (this should be another thread) is a global configuration to reset the start offsets of all the source connectors in the query session, e.g. "table.sources.start-offset". This is possible now because `TableSourceFactory.Context` has a `getConfiguration` method to get the session configuration, which can be used to create an adapted TableSource.
Then we can also expose it to the SQL CLI via the SET command, e.g. `SET 'table.sources.start-offset'='earliest';`, which is pretty simple and straightforward.

This is very similar to KSQL's `SET 'auto.offset.reset'='earliest'`, which is very helpful IMO.
Best,
Jark


On Tue, 10 Mar 2020 at 22:29, Timo Walther <twal...@apache.org> wrote:

Hi Danny,

compared to the hints, FLIP-110 is fully compliant with the SQL standard.

I don't think that `CREATE TEMPORARY TABLE Temp (LIKE t) WITH (k=v)` is too verbose or awkward for the power of basically changing the entire connector. Usually, this statement would just precede the query in a multiline file. So it can be changed "in-place" like the hints you proposed.

Many companies have a well-defined set of tables that should be used. It would be dangerous if users could change the path or topic in a hint.
The catalog/catalog manager should be the entity that controls which tables exist and how they can be accessed.

> what’s the problem there if we use the table hints to support “start offset”?

IMHO it violates the meaning of a hint. According to the dictionary, a hint is "a statement that expresses indirectly what one prefers not to say explicitly". But offsets are a property that is very explicit.

If we go with the hint approach, it should be expressible in the TableSourceFactory which properties are supported for hinting. Or do you plan to offer those hints in a separate Map<String, String> that cannot overwrite existing properties?
I think this would be a different story...

Regards,
Timo


On 10.03.20 10:34, Danny Chan wrote:

Thanks Timo ~

Personally I would say that offset > 0 and start offset = 10 do not have the same semantics, so from the SQL aspect, we cannot implement a “starting offset” hint for a query with such a syntax.

And the CREATE TABLE LIKE syntax is DDL, which is just verbose for defining such dynamic parameters even if it could do that. Shall we force users to define a temporary table for each query with dynamic params? I would say it’s an awkward solution.
"Hints should give "hints" but not affect the actual produced result.” You mentioned that multiple times; could you give a reason? What’s the problem if we use the table hints to support “start offset”? From my side I see some benefits:


• It’s very convenient to set up these parameters; the syntax is very much like the DDL definition
• Its scope is very clear: right on the table it is attached to
• It does not affect the table schema, which means that in order to specify the offset there is no need to define an offset column, which is actually weird: offset should never be a column, it’s more like metadata or a start option.
So in total, FLIP-110 uses the offset more like Hive partition pruning. We can do that if we have an offset column, but in most cases we do not define one, so there is actually no conflict or overlap.

Best,
Danny Chan

On March 10, 2020 at 4:28 PM (+0800), Timo Walther <twal...@apache.org> wrote:

Hi Danny,

shouldn't FLIP-110[1] solve most of the problems we have around defining table properties more dynamically without manual schema work? Also, offset definition is easier with such a syntax. They don't have to be defined in the catalog but could be temporary tables that extend from the original table.
In general, we should aim to keep the syntax concise and not provide too many ways of doing the same thing. Hints should give "hints" but not affect the actual produced result.

Some connector properties might also change the plan or schema in the future. E.g. they might also define whether a table source supports certain push-downs (e.g. predicate push-down).

Dawid is currently working on a draft that might make it possible to expose a Kafka offset via the schema such that `SELECT * FROM Topic WHERE offset > 10` would become possible and could be pushed down. But this is, of course, not planned initially.

Regards,
Timo

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE

On 10.03.20 08:34, Danny Chan wrote:
Thanks Wenlong ~

For PROPERTIES Hint Error handling

Actually we have no way to figure out whether an erroneous hint is a PROPERTIES hint. For example, if the user writes a hint like 'PROPERTIAS', we do not know whether this hint was meant to be a PROPERTIES hint; what we know is that the hint name was not registered in Flink.

If the user writes the hint name correctly (i.e.
PROPERTIES), we can indeed enforce validation of the hint options through the pluggable HintOptionChecker.

For PROPERTIES Hint Option Format

For a key-value style hint option, the key can be either a simple identifier or a string literal, which means that it is compatible with our DDL syntax. We support simple identifiers because many other hints do not have compound keys like the table properties, and we want to unify the parse block.

Best,
Danny Chan

On March 10, 2020 at 3:19 PM (+0800), wenlong.lwl <wenlong88....@gmail.com> wrote:
Hi Danny, thanks for the proposal.
+1 for adding table hints, it is really a necessary feature for Flink SQL to integrate with a catalog.

For error handling, I think it would be more natural to throw an exception when an erroneous table hint is provided, because the properties in the hint will be merged and used to find the table factory, which would cause an exception when erroneous properties are provided, right? On the other hand, unlike other hints, which just affect the way the query is executed, the property table hint actually affects the result of the query, so we should never ignore the given property hints.

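To make the strict behaviour argued for above concrete, here is a minimal Java sketch, not taken from the Flink code base. The class `StrictHintMerge`, the method `merge`, and the `supportedKeys` set are hypothetical names introduced only for illustration; the property keys and values are example data. It assumes that hint properties simply override catalog properties key by key, and that an unsupported key should fail fast rather than be dropped.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Flink: strict merging of hint properties. */
class StrictHintMerge {

    /**
     * Returns a copy of the catalog properties with the hint properties applied on top.
     * Throws if a hint key is not in the supported set instead of silently ignoring it.
     */
    static Map<String, String> merge(
            Map<String, String> catalogProps,
            Map<String, String> hintProps,
            Set<String> supportedKeys) {
        Map<String, String> merged = new HashMap<>(catalogProps);
        for (Map.Entry<String, String> entry : hintProps.entrySet()) {
            if (!supportedKeys.contains(entry.getKey())) {
                throw new IllegalArgumentException(
                        "Unsupported property in table hint: " + entry.getKey());
            }
            // The value given in the hint overrides the value stored in the catalog.
            merged.put(entry.getKey(), entry.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        // Example data only; the keys are illustrative.
        Map<String, String> catalog = new HashMap<>();
        catalog.put("connector.type", "kafka");
        catalog.put("connector.startup-mode", "earliest-offset");

        Map<String, String> hint = new HashMap<>();
        hint.put("connector.startup-mode", "latest-offset");

        Set<String> supported =
                new HashSet<>(Arrays.asList("connector.type", "connector.startup-mode"));
        System.out.println(merge(catalog, hint, supported));
    }
}
```

With this design a misspelled key surfaces as an exception at planning time, which matches the concern that a property hint changes the query result and therefore should not be skipped quietly.
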
For the format of property hints: currently, in the SQL client, we accept properties in string format only in DDL: 'connector.type'='kafka'. I think the format of properties in a hint should be the same as the format we defined in DDL. What do you think?

Bests,
Wenlong Lyu

On Tue, 10 Mar 2020 at 14:22, Danny Chan <yuzhao....@gmail.com> wrote:

To Weike: About the Error Handling

To be consistent with other SQL vendors, the default is to log warnings, and if there is any error (invalid hint name or options), the hint is just ignored. I have already addressed this in the wiki.

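The lenient default described above (log a warning, ignore the hint) could look roughly like the following Java sketch. Everything here is an assumption made for illustration: `LenientHintFilter`, its `REGISTERED` set, and the in-memory `warnings` list are hypothetical and stand in for whatever hint registry and logger the planner actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of Flink: lenient handling of unknown hint names. */
class LenientHintFilter {

    /** Hint names the engine knows about; the single entry is an assumption. */
    static final Set<String> REGISTERED = new HashSet<>(Arrays.asList("PROPERTIES"));

    /** Warnings are collected here instead of using a real logger, to stay self-contained. */
    final List<String> warnings = new ArrayList<>();

    /** Keeps registered hint names and drops all others, recording one warning per drop. */
    List<String> filter(List<String> hintNames) {
        List<String> kept = new ArrayList<>();
        for (String name : hintNames) {
            if (REGISTERED.contains(name)) {
                kept.add(name);
            } else {
                warnings.add("Ignoring unregistered hint: " + name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        LenientHintFilter filter = new LenientHintFilter();
        // 'PROPERTIAS' is the misspelling used as an example in this thread.
        System.out.println(filter.filter(Arrays.asList("PROPERTIES", "PROPERTIAS")));
        System.out.println(filter.warnings);
    }
}
```

The trade-off is visible in the sketch: a misspelled hint name cannot be told apart from any other unregistered hint, so the query runs without the intended override and the only trace is the warning.
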
To Timo: About the PROPERTIES Table Hint

• The properties hint is also optional: the user can pass in an option to override the table properties, but this does not mean it is required.
• "They should not include semantics": do the properties belong to the semantics? I don't think so; the plan does not change, right?
The result set may be affected, but there are already some hints that do so, for example the MS-SQL MAXRECURSION and SNAPSHOT hints [1].
• `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL standard, compared to the hints way (which is included in comments).
• I actually didn't find any vendor that supports such a grammar, and there is no way to override table-level properties dynamically. For a normal RDBMS, I think there are no requests for such dynamic parameters because all the tables have the same storage and computation and they are almost all batch tables.
• Flink, as a computation engine, has many connectors. Especially for message queues like Kafka, we would have a start_offset that is different each time we start the query. Such parameters cannot be persisted to the catalog because they are not static. This is actually the background for proposing table hints to specify such properties dynamically.

To Jark and Jingsong: I have removed the query hints part and changed the title.

[1] https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15

Best,
Danny Chan

On March 9, 2020 at 5:46 PM (+0800), Timo Walther <twal...@apache.org> wrote:
Hi Danny,

thanks for the proposal. I agree with Jark and Jingsong. Planner hints and table hints are orthogonal topics that should be discussed separately.

I share Jingsong's opinion that we should not use planner hints for passing connector properties. Planner hints should be optional at any time.
They should not include semantics but only affect execution time. Connector properties are an important part of the query itself.

Have you thought about options such as `SELECT * FROM t(k=v, k=v)`? How do other vendors deal with this problem?

Regards,
Timo

On 09.03.20 10:37, Jingsong Li wrote:
Hi Danny, +1 for table hints, thanks for driving.

I took a look at the FLIP; most of the content is about query hints. That makes it hard to discuss and vote on. So +1 to splitting it, as Jark said.

Another thing is which configuration is suitable to set with table hints: "connector.path" and "connector.topic", are they really suitable for table hints? That looks weird to me, because I think these properties are the core of the table.

Best,
Jingsong Lee

On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <imj...@gmail.com> wrote:

Thanks Danny for starting the discussion.
+1 for this feature.

If we just focus on the table hints, not the query hints, in this release, could you split the FLIP into two FLIPs?
Because it's hard to vote on only part of a FLIP. You can keep the table hints proposal in FLIP-113 and move query hints into another FLIP, so that we can focus on the table hints in the FLIP.

Thanks,
Jark

On Mon, 9 Mar 2020 at 17:14, DONG, Weike <kyled...@connect.hku.hk> wrote:

Hi Danny,

This is a nice feature, +1.

One thing I am interested in that is not mentioned in the proposal is error handling. It is quite common for users to write inappropriate hints in SQL code; if illegal or "bad" hints are given, would the system simply ignore them or throw exceptions?

Thanks : )

Best,
Weike

On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <yuzhao....@gmail.com> wrote:

Note:
we only plan to support table hints in Flink release 1.11, so please focus mainly on the table
hints part and just ignore the planner hints, sorry for that mistake ~

Best,
Danny Chan

On March 9, 2020 at 4:36 PM (+0800), Danny Chan <yuzhao....@gmail.com> wrote:
Hi, fellows ~

I would like to propose support for SQL hints in Flink SQL.

We would support the following hint syntax:

select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb', parallelism='24') */
from
emp /*+ INDEX(idx1, idx2) */
join
dept /*+ PROPERTIES(k1='v1', k2='v2') */
on
emp.deptno = dept.deptno

Basically we would support both query hints (after the SELECT keyword) and table hints (after the referenced table name). For 1.11, we plan to support only table hints, with a hint probably named PROPERTIES:

table_name /*+ PROPERTIES(k1='v1', k2='v2') */

I am looking forward to your comments.

You can access the FLIP here:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints

Best,
Danny Chan