Thanks Jiabao for driving this. +1 to start a vote, a minor comment, should we change the FLIP title according this thread context as well?
Best, Leonard > 2024年1月3日 下午2:43,Jiabao Sun <jiabao....@xtransfer.cn.INVALID> 写道: > > Hi, > > Thank you again for the discussion on this FLIP. > If there are no further comments, I plan to start a voting thread tomorrow. > > Best, > Jiabao > > On 2023/12/20 14:09:49 Jiabao Sun wrote: >> Hi, >> >> Thank you to everyone for the discussion on this FLIP, >> especially Becket for providing guidance that made it more reasonable. >> >> The FLIP document[1] has been updated with the recent discussed content. >> Please take a look to double-check it when you have time. >> >> If we can reach a consensus on this, I will open the voting thread in recent >> days. >> >> Best, >> Jiabao >> >> [1] >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768 >> >> >>> 2023年12月20日 11:38,Jiabao Sun <ji...@xtransfer.cn.INVALID> 写道: >>> >>> Thanks Becket, >>> >>> The behavior description has been added to the Public Interfaces section. >>> >>> Best, >>> Jiabao >>> >>> >>>> 2023年12月20日 08:17,Becket Qin <be...@gmail.com <http://gmail.com/>> 写道: >>>> >>>> Hi Jiabao, >>>> >>>> Thanks for updating the FLIP. >>>> Can you add the behavior of the policies that are only applicable to some >>>> but not all of the databases? This is a part of the intended behavior of >>>> the proposed configuration. So, we should include that in the FLIP. >>>> Otherwise, the FLIP looks good to me. >>>> >>>> Cheers, >>>> >>>> Jiangjie (Becket) Qin >>>> >>>> On Tue, Dec 19, 2023 at 11:00 PM Jiabao Sun <ji...@xtransfer.cn.invalid> >>>> wrote: >>>> >>>>> Hi Becket, >>>>> >>>>> I share the same view as you regarding the prefix for this configuration >>>>> option. >>>>> >>>>> For the JDBC connector, I prefer setting 'filter.handling.policy' = 'FOO' >>>>> and throwing an exception when the database do not support that specific >>>>> policy. >>>>> >>>>> Not using a prefix can reduce the learning curve for users and avoid >>>>> introducing a new set of configuration options for every supported JDBC >>>>> database. >>>>> I think the policies we provide can be compatible with most databases that >>>>> follow the JDBC protocol. >>>>> However, there may be cases where certain databases cannot support some >>>>> policies. >>>>> Nevertheless, we can ensure fast failure and allow users to choose a >>>>> suitable policy in such situations. >>>>> >>>>> I have removed the contents about the configuration prefix. >>>>> Please help review it again. >>>>> >>>>> Thanks, >>>>> Jiabao >>>>> >>>>> >>>>>> 2023年12月19日 19:46,Becket Qin <be...@gmail.com <http://gmail.com/>> 写道: >>>>>> >>>>>> Hi Jiabao, >>>>>> >>>>>> Thanks for updating the FLIP. >>>>>> >>>>>> One more question regarding the JDBC connector, since it is a connector >>>>>> shared by multiple databases, what if there is a filter handling policy >>>>>> that is only applicable to one of the databases, but not the others? In >>>>>> that case, how would the users specify that policy? >>>>>> Unlike the example of orc format with 2nd+ level config, JDBC connector >>>>>> only looks at the URL to decide which driver to use. >>>>>> >>>>>> For example, MySql supports policy FOO while other databases do not. If >>>>>> users want to use FOO for MySql, what should they do? Will they set >>>>>> '*mysql.filter.hanlding.policy' >>>>>> = 'FOO', *which will only be picked up when the MySql driver is used? >>>>>> Or they should just set* 'filter.handling.policy' = 'FOO', *and throw >>>>>> exceptions when other JDBC drivers are used? Personally, I prefer the >>>>>> latter. If we pick that, do we still need to mention the following? >>>>>> >>>>>> *The prefix is needed when the option is for a 2nd+ level. * >>>>>>> 'connector' = 'filesystem', >>>>>>> 'format' = 'orc', >>>>>>> 'orc.filter.handling.policy' = 'NUBERIC_ONY' >>>>>>> >>>>>>> *In this case, the values of this configuration may be different >>>>> depending >>>>>>> on the format option. For example, orc format may have INDEXED_ONLY >>>>> while >>>>>>> parquet format may have something else. * >>>>>>> >>>>>> >>>>>> I found this is somewhat misleading, because the example here is not a >>>>> part >>>>>> of the proposal of this FLIP. It is just an example explaining when a >>>>>> prefix is needed, which seems orthogonal to the proposal in this FLIP. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Jiangjie (Becket) Qin >>>>>> >>>>>> >>>>>> On Tue, Dec 19, 2023 at 10:09 AM Jiabao Sun <jiabao....@xtransfer.cn >>>>>> <mailto:jiabao....@xtransfer.cn> >>>>> .invalid> >>>>>> wrote: >>>>>> >>>>>>> Thanks Becket for the suggestions, >>>>>>> >>>>>>> Updated. >>>>>>> Please help review it again when you have time. >>>>>>> >>>>>>> Best, >>>>>>> Jiabao >>>>>>> >>>>>>> >>>>>>>> 2023年12月19日 09:06,Becket Qin <be...@gmail.com <http://gmail.com/>> 写道: >>>>>>>> >>>>>>>> Hi JIabao, >>>>>>>> >>>>>>>> Thanks for updating the FLIP. It looks better. Some suggestions / >>>>>>> questions: >>>>>>>> >>>>>>>> 1. In the motivation section: >>>>>>>> >>>>>>>>> *Currently, Flink Table/SQL does not expose fine-grained control for >>>>>>> users >>>>>>>>> to control filter pushdown. **However, filter pushdown has some side >>>>>>>>> effects, such as additional computational pressure on external >>>>>>>>> systems. Moreover, Improper queries can lead to issues such as full >>>>>>> table >>>>>>>>> scans, which in turn can impact the stability of external systems.* >>>>>>>> >>>>>>>> This statement sounds like the side effects are there for all the >>>>>>> systems, >>>>>>>> which is inaccurate. Maybe we can say: >>>>>>>> *Currently, Flink Table/SQL does not expose fine-grained control for >>>>>>> users >>>>>>>> to control filter pushdown. **However, filter pushdown may have side >>>>>>>> effects in some cases, **such as additional computational pressure on >>>>>>>> external systems. The JDBC source is a typical example of that. If a >>>>>>> filter >>>>>>>> is pushed down to the database, an expensive full table scan may happen >>>>>>> if >>>>>>>> the filter involves unindexed columns.* >>>>>>>> >>>>>>>> 2. Regarding the prefix, usually a prefix is not required for the top >>>>>>> level >>>>>>>> connector options. This is because the *connector* option is already >>>>>>> there. >>>>>>>> So >>>>>>>> 'connector' = 'jdbc', >>>>>>>> 'filter.handling.policy' = 'ALWAYS' >>>>>>>> is sufficient. >>>>>>>> >>>>>>>> The prefix is needed when the option is for a 2nd+ level. For example, >>>>>>>> 'connector' = 'jdbc', >>>>>>>> 'format' = 'orc', >>>>>>>> 'orc.some.option' = 'some_value' >>>>>>>> In this case, the prefix of "orc" is needed to make it clear this >>>>> option >>>>>>> is >>>>>>>> for the format. >>>>>>>> >>>>>>>> I am guessing that the reason that currently the connector prefix is >>>>>>> there >>>>>>>> is because the values of this configuration may be different depending >>>>> on >>>>>>>> the connectors. For example, jdbc may have INDEXED_ONLY while MongoDB >>>>> may >>>>>>>> have something else. Personally speaking, I am fine if we do not have a >>>>>>>> prefix in this case because users have already specified the connector >>>>>>> type >>>>>>>> and it is intuitive enough that the option value is for that connector, >>>>>>> not >>>>>>>> others. >>>>>>>> >>>>>>>> 3. can we clarify on the following statement: >>>>>>>> >>>>>>>>> *Introduce the native configuration [prefix].filter.handling.policy in >>>>>>> the >>>>>>>>> connector.* >>>>>>>> >>>>>>>> What do you mean by "native configuration"? From what I understand, the >>>>>>>> FLIP does the following: >>>>>>>> - introduces a new configuration to the JDBC and MongoDB connector. >>>>>>>> - Suggests a convention option name if other connectors are going to >>>>> add >>>>>>> an >>>>>>>> option for the same purpose. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Jiangjie (Becket) Qin >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Dec 18, 2023 at 5:45 PM Jiabao Sun <jiabao....@xtransfer.cn >>>>>>>> <mailto:jiabao....@xtransfer.cn> >>>>>>> .invalid> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Becket, >>>>>>>>> >>>>>>>>> The FLIP document[1] has been updated. >>>>>>>>> Could you help take a look again? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Jiabao >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> >>>>>>> >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768 >>>>> >>>>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768> >>>>>>>>> >>>>>>>>> >>>>>>>>>> 2023年12月18日 16:53,Becket Qin <be...@gmail.com <http://gmail.com/>> >>>>>>>>>> 写道: >>>>>>>>>> >>>>>>>>>> Yes, that sounds reasonable to me. We can start with ALWAYS and >>>>> NEVER, >>>>>>>>> and >>>>>>>>>> add more policies as needed. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>>> >>>>>>>>>> On Mon, Dec 18, 2023 at 4:48 PM Jiabao Sun <jiabao....@xtransfer.cn >>>>>>>>>> <mailto:jiabao....@xtransfer.cn> >>>>>>>>> .invalid> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Thanks Bucket, >>>>>>>>>>> >>>>>>>>>>> The jdbc.filter.handling.policy is good to me as it provides >>>>>>> sufficient >>>>>>>>>>> extensibility for future filter pushdown optimizations. >>>>>>>>>>> However, currently, we don't have an implementation for the AUTO >>>>> mode, >>>>>>>>> and >>>>>>>>>>> it seems that the AUTO mode can easily be confused with the ALWAYS >>>>>>> mode >>>>>>>>>>> because users don't have the opportunity to MANUALLY decide which >>>>>>>>> filters >>>>>>>>>>> to push down. >>>>>>>>>>> >>>>>>>>>>> I suggest that we only introduce the ALWAYS and NEVER modes for now, >>>>>>> and >>>>>>>>>>> we can consider introducing more flexible policies in the future, >>>>>>>>>>> such as INDEX_ONLY, NUMBERIC_ONLY and so on. >>>>>>>>>>> >>>>>>>>>>> WDYT? >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Jiabao >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> 2023年12月18日 16:27,Becket Qin <be...@gmail.com <http://gmail.com/>> >>>>>>>>>>>> 写道: >>>>>>>>>>>> >>>>>>>>>>>> Hi Jiabao, >>>>>>>>>>>> >>>>>>>>>>>> Please see the reply inline. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> The MySQL connector is currently in the flink-connector-jdbc >>>>>>>>> repository >>>>>>>>>>>>> and is not a standalone connector. >>>>>>>>>>>>> Is it too unique to use "mysql" as the configuration option >>>>> prefix? >>>>>>>>>>>> >>>>>>>>>>>> If the intended behavior makes sense to all the supported JDBC >>>>>>> drivers, >>>>>>>>>>> we >>>>>>>>>>>> can make this a JDBC connector configuration. >>>>>>>>>>>> >>>>>>>>>>>> Also, I would like to ask about the difference in behavior between >>>>>>> AUTO >>>>>>>>>>> and >>>>>>>>>>>>> ALWAYS. >>>>>>>>>>>>> It seems that we cannot guarantee the pushing down of all filters >>>>> to >>>>>>>>> the >>>>>>>>>>>>> external system under the ALWAYS >>>>>>>>>>>>> mode because not all filters in Flink SQL are supported by the >>>>>>>>> external >>>>>>>>>>>>> system. >>>>>>>>>>>>> Should we throw an error when encountering a filter that cannot be >>>>>>>>>>> pushed >>>>>>>>>>>>> down in the ALWAYS mode? >>>>>>>>>>>> >>>>>>>>>>>> The idea of AUTO is to do efficiency-aware pushdowns. The source >>>>> will >>>>>>>>>>> query >>>>>>>>>>>> the external system (MySQL, Oracle, SQL Server, etc) first to >>>>>>> retrieve >>>>>>>>>>> the >>>>>>>>>>>> information of the table. With that information, the source will >>>>>>> decide >>>>>>>>>>>> whether to further push a filter to the external system based on >>>>> the >>>>>>>>>>>> efficiency. E.g. only push the indexed fields. In contrast, ALWAYS >>>>>>> will >>>>>>>>>>>> just always push the supported filters to the external system, >>>>>>>>> regardless >>>>>>>>>>>> of the efficiency. In case there are filters that are not >>>>> supported, >>>>>>>>>>>> according to the current contract of SupportsFilterPushdown, these >>>>>>>>>>>> unsupported filters should just be returned by the >>>>>>>>>>>> *SupportsFilterPushdown.applyFilters()* method as remaining >>>>> filters. >>>>>>>>>>>> Therefore, there is no need to throw exceptions here. This is >>>>> likely >>>>>>>>> the >>>>>>>>>>>> desired behavior for most users, IMO. If there are cases that users >>>>>>>>>>> really >>>>>>>>>>>> want to get alerted when a filter cannot be pushed to the external >>>>>>>>>>> system, >>>>>>>>>>>> we can add another value like "ENFORCED_ALWAYS", which behaves like >>>>>>>>>>> ALWAYS, >>>>>>>>>>>> but throws exceptions when a filter cannot be applied to the >>>>> external >>>>>>>>>>>> system. But personally I don't see much value in doing this. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Dec 18, 2023 at 3:54 PM Jiabao Sun < >>>>> jiabao....@xtransfer.cn <mailto:jiabao....@xtransfer.cn> >>>>>>>>>>> .invalid> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Becket, >>>>>>>>>>>>> >>>>>>>>>>>>> The MySQL connector is currently in the flink-connector-jdbc >>>>>>>>> repository >>>>>>>>>>>>> and is not a standalone connector. >>>>>>>>>>>>> Is it too unique to use "mysql" as the configuration option >>>>> prefix? >>>>>>>>>>>>> >>>>>>>>>>>>> Also, I would like to ask about the difference in behavior between >>>>>>>>> AUTO >>>>>>>>>>>>> and ALWAYS. >>>>>>>>>>>>> It seems that we cannot guarantee the pushing down of all filters >>>>> to >>>>>>>>> the >>>>>>>>>>>>> external system under the ALWAYS >>>>>>>>>>>>> mode because not all filters in Flink SQL are supported by the >>>>>>>>> external >>>>>>>>>>>>> system. >>>>>>>>>>>>> Should we throw an error when encountering a filter that cannot be >>>>>>>>>>> pushed >>>>>>>>>>>>> down in the ALWAYS mode? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Jiabao >>>>>>>>>>>>> >>>>>>>>>>>>>> 2023年12月18日 15:34,Becket Qin <be...@gmail.com >>>>>>>>>>>>>> <http://gmail.com/>> 写道: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi JIabao, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for updating the FLIP. Maybe I did not explain it clearly >>>>>>>>>>> enough. >>>>>>>>>>>>> My >>>>>>>>>>>>>> point is that given there are various good flavors of behaviors >>>>>>>>>>> handling >>>>>>>>>>>>>> filters pushed down, we should not have a common config of >>>>>>>>>>>>>> "ignore.filter.pushdown", because the behavior is not *common*. >>>>>>>>>>>>>> >>>>>>>>>>>>>> It looks like the original motivation of this FLIP is just for >>>>>>> MySql. >>>>>>>>>>>>> Let's >>>>>>>>>>>>>> focus on what is the best solution for MySql connector here >>>>> first. >>>>>>>>>>> After >>>>>>>>>>>>>> that, if people think the best behavior for MySql happens to be a >>>>>>>>>>> common >>>>>>>>>>>>>> one, we can then discuss whether that is worth being added to the >>>>>>>>> base >>>>>>>>>>>>>> implementation of source. For MySQL, if we are going to >>>>> introduce a >>>>>>>>>>>>> config >>>>>>>>>>>>>> to MySql, why not have something like >>>>>>> "mysql.filter.handling.policy" >>>>>>>>>>> with >>>>>>>>>>>>>> value of AUTO / NEVER / ALWAYS? Isn't that better than >>>>>>>>>>>>>> "ignore.filter.pushdown"? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sun, Dec 17, 2023 at 11:30 PM Jiabao Sun < >>>>>>> jiabao....@xtransfer.cn <mailto:jiabao....@xtransfer.cn> >>>>>>>>>>>>> .invalid> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Becket, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The FLIP document has been updated as well. >>>>>>>>>>>>>>> Please take a look when you have time. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Jiabao >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2023年12月17日 22:54,Jiabao Sun <ji...@xtransfer.cn.INVALID> >>>>>>> 写道: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks Becket, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I apologize for not being able to continue with this proposal >>>>> due >>>>>>>>> to >>>>>>>>>>>>>>> being too busy during this period. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The viewpoints you shared about the design of Flink Source make >>>>>>>>> sense >>>>>>>>>>>>> to >>>>>>>>>>>>>>> me >>>>>>>>>>>>>>>> The native configuration ‘ignore.filter.pushdown’ is good to >>>>> me. >>>>>>>>>>>>>>>> Having a unified name or naming style can indeed prevent >>>>>>> confusion >>>>>>>>>>> for >>>>>>>>>>>>>>> users regarding >>>>>>>>>>>>>>>> the inconsistent naming of this configuration across different >>>>>>>>>>>>>>> connectors. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Currently, there are not many external connectors that support >>>>>>>>> filter >>>>>>>>>>>>>>> pushdown. >>>>>>>>>>>>>>>> I propose that we first introduce it in flink-connector-jdbc >>>>> and >>>>>>>>>>>>>>> flink-connector-mongodb. >>>>>>>>>>>>>>>> Do you think this is feasible? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best, >>>>>>>>> > [message truncated...]