Thanks Leonard for your reminder,

The FLIP title has been changed as FLIP-377: 
Support fine-grained configuration to control filter push down for Table/SQL 
Sources.

Best,
Jiabao

On 2024/01/03 06:51:10 Leonard Xu wrote:
> Thanks Jiabao for driving this.
> 
> +1 to start a vote, a minor comment, should we change the FLIP title 
> according this thread context as well?
> 
> Best,
> Leonard
> 
> 
> 
> > 2024年1月3日 下午2:43,Jiabao Sun <ji...@xtransfer.cn.INVALID> 写道:
> > 
> > Hi,
> > 
> > Thank you again for the discussion on this FLIP.
> > If there are no further comments, I plan to start a voting thread tomorrow.
> > 
> > Best,
> > Jiabao
> > 
> > On 2023/12/20 14:09:49 Jiabao Sun wrote:
> >> Hi,
> >> 
> >> Thank you to everyone for the discussion on this FLIP, 
> >> especially Becket for providing guidance that made it more reasonable. 
> >> 
> >> The FLIP document[1] has been updated with the recent discussed content. 
> >> Please take a look to double-check it when you have time.
> >> 
> >> If we can reach a consensus on this, I will open the voting thread in 
> >> recent days.
> >> 
> >> Best,
> >> Jiabao
> >> 
> >> [1] 
> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
> >> 
> >> 
> >>> 2023年12月20日 11:38,Jiabao Sun <ji...@xtransfer.cn.INVALID> 写道:
> >>> 
> >>> Thanks Becket,
> >>> 
> >>> The behavior description has been added to the Public Interfaces section.
> >>> 
> >>> Best,
> >>> Jiabao
> >>> 
> >>> 
> >>>> 2023年12月20日 08:17,Becket Qin <be...@gmail.com <http://gmail.com/>> 写道:
> >>>> 
> >>>> Hi Jiabao,
> >>>> 
> >>>> Thanks for updating the FLIP.
> >>>> Can you add the behavior of the policies that are only applicable to some
> >>>> but not all of the databases? This is a part of the intended behavior of
> >>>> the proposed configuration. So, we should include that in the FLIP.
> >>>> Otherwise, the FLIP looks good to me.
> >>>> 
> >>>> Cheers,
> >>>> 
> >>>> Jiangjie (Becket) Qin
> >>>> 
> >>>> On Tue, Dec 19, 2023 at 11:00 PM Jiabao Sun <ji...@xtransfer.cn.invalid>
> >>>> wrote:
> >>>> 
> >>>>> Hi Becket,
> >>>>> 
> >>>>> I share the same view as you regarding the prefix for this configuration
> >>>>> option.
> >>>>> 
> >>>>> For the JDBC connector, I prefer setting 'filter.handling.policy' = 
> >>>>> 'FOO'
> >>>>> and throwing an exception when the database do not support that specific
> >>>>> policy.
> >>>>> 
> >>>>> Not using a prefix can reduce the learning curve for users and avoid
> >>>>> introducing a new set of configuration options for every supported JDBC
> >>>>> database.
> >>>>> I think the policies we provide can be compatible with most databases 
> >>>>> that
> >>>>> follow the JDBC protocol.
> >>>>> However, there may be cases where certain databases cannot support some
> >>>>> policies.
> >>>>> Nevertheless, we can ensure fast failure and allow users to choose a
> >>>>> suitable policy in such situations.
> >>>>> 
> >>>>> I have removed the contents about the configuration prefix.
> >>>>> Please help review it again.
> >>>>> 
> >>>>> Thanks,
> >>>>> Jiabao
> >>>>> 
> >>>>> 
> >>>>>> 2023年12月19日 19:46,Becket Qin <be...@gmail.com <http://gmail.com/>> 写道:
> >>>>>> 
> >>>>>> Hi Jiabao,
> >>>>>> 
> >>>>>> Thanks for updating the FLIP.
> >>>>>> 
> >>>>>> One more question regarding the JDBC connector, since it is a connector
> >>>>>> shared by multiple databases, what if there is a filter handling policy
> >>>>>> that is only applicable to one of the databases, but not the others? In
> >>>>>> that case, how would the users specify that policy?
> >>>>>> Unlike the example of orc format with 2nd+ level config, JDBC connector
> >>>>>> only looks at the URL to decide which driver to use.
> >>>>>> 
> >>>>>> For example, MySql supports policy FOO while other databases do not. If
> >>>>>> users want to use FOO for MySql, what should they do? Will they set
> >>>>>> '*mysql.filter.hanlding.policy'
> >>>>>> = 'FOO', *which will only be picked up when the MySql driver is used?
> >>>>>> Or they should just set* 'filter.handling.policy' = 'FOO', *and throw
> >>>>>> exceptions when other JDBC drivers are used? Personally, I prefer the
> >>>>>> latter. If we pick that, do we still need to mention the following?
> >>>>>> 
> >>>>>> *The prefix is needed when the option is for a 2nd+ level. *
> >>>>>>> 'connector' = 'filesystem',
> >>>>>>> 'format' = 'orc',
> >>>>>>> 'orc.filter.handling.policy' = 'NUBERIC_ONY'
> >>>>>>> 
> >>>>>>> *In this case, the values of this configuration may be different
> >>>>> depending
> >>>>>>> on the format option. For example, orc format may have INDEXED_ONLY
> >>>>> while
> >>>>>>> parquet format may have something else. *
> >>>>>>> 
> >>>>>> 
> >>>>>> I found this is somewhat misleading, because the example here is not a
> >>>>> part
> >>>>>> of the proposal of this FLIP. It is just an example explaining when a
> >>>>>> prefix is needed, which seems orthogonal to the proposal in this FLIP.
> >>>>>> 
> >>>>>> Thanks,
> >>>>>> 
> >>>>>> Jiangjie (Becket) Qin
> >>>>>> 
> >>>>>> 
> >>>>>> On Tue, Dec 19, 2023 at 10:09 AM Jiabao Sun <jiabao....@xtransfer.cn 
> >>>>>> <ma...@xtransfer.cn>
> >>>>> .invalid>
> >>>>>> wrote:
> >>>>>> 
> >>>>>>> Thanks Becket for the suggestions,
> >>>>>>> 
> >>>>>>> Updated.
> >>>>>>> Please help review it again when you have time.
> >>>>>>> 
> >>>>>>> Best,
> >>>>>>> Jiabao
> >>>>>>> 
> >>>>>>> 
> >>>>>>>> 2023年12月19日 09:06,Becket Qin <be...@gmail.com <http://gmail.com/>> 
> >>>>>>>> 写道:
> >>>>>>>> 
> >>>>>>>> Hi JIabao,
> >>>>>>>> 
> >>>>>>>> Thanks for updating the FLIP. It looks better. Some suggestions /
> >>>>>>> questions:
> >>>>>>>> 
> >>>>>>>> 1. In the motivation section:
> >>>>>>>> 
> >>>>>>>>> *Currently, Flink Table/SQL does not expose fine-grained control for
> >>>>>>> users
> >>>>>>>>> to control filter pushdown. **However, filter pushdown has some side
> >>>>>>>>> effects, such as additional computational pressure on external
> >>>>>>>>> systems. Moreover, Improper queries can lead to issues such as full
> >>>>>>> table
> >>>>>>>>> scans, which in turn can impact the stability of external systems.*
> >>>>>>>> 
> >>>>>>>> This statement sounds like the side effects are there for all the
> >>>>>>> systems,
> >>>>>>>> which is inaccurate. Maybe we can say:
> >>>>>>>> *Currently, Flink Table/SQL does not expose fine-grained control for
> >>>>>>> users
> >>>>>>>> to control filter pushdown. **However, filter pushdown may have side
> >>>>>>>> effects in some cases, **such as additional computational pressure on
> >>>>>>>> external systems. The JDBC source is a typical example of that. If a
> >>>>>>> filter
> >>>>>>>> is pushed down to the database, an expensive full table scan may 
> >>>>>>>> happen
> >>>>>>> if
> >>>>>>>> the filter involves unindexed columns.*
> >>>>>>>> 
> >>>>>>>> 2. Regarding the prefix, usually a prefix is not required for the top
> >>>>>>> level
> >>>>>>>> connector options. This is because the *connector* option is already
> >>>>>>> there.
> >>>>>>>> So
> >>>>>>>> 'connector' = 'jdbc',
> >>>>>>>> 'filter.handling.policy' = 'ALWAYS'
> >>>>>>>> is sufficient.
> >>>>>>>> 
> >>>>>>>> The prefix is needed when the option is for a 2nd+ level. For 
> >>>>>>>> example,
> >>>>>>>> 'connector' = 'jdbc',
> >>>>>>>> 'format' = 'orc',
> >>>>>>>> 'orc.some.option' = 'some_value'
> >>>>>>>> In this case, the prefix of "orc" is needed to make it clear this
> >>>>> option
> >>>>>>> is
> >>>>>>>> for the format.
> >>>>>>>> 
> >>>>>>>> I am guessing that the reason that currently the connector prefix is
> >>>>>>> there
> >>>>>>>> is because the values of this configuration may be different 
> >>>>>>>> depending
> >>>>> on
> >>>>>>>> the connectors. For example, jdbc may have INDEXED_ONLY while MongoDB
> >>>>> may
> >>>>>>>> have something else. Personally speaking, I am fine if we do not 
> >>>>>>>> have a
> >>>>>>>> prefix in this case because users have already specified the 
> >>>>>>>> connector
> >>>>>>> type
> >>>>>>>> and it is intuitive enough that the option value is for that 
> >>>>>>>> connector,
> >>>>>>> not
> >>>>>>>> others.
> >>>>>>>> 
> >>>>>>>> 3. can we clarify on the following statement:
> >>>>>>>> 
> >>>>>>>>> *Introduce the native configuration [prefix].filter.handling.policy 
> >>>>>>>>> in
> >>>>>>> the
> >>>>>>>>> connector.*
> >>>>>>>> 
> >>>>>>>> What do you mean by "native configuration"? From what I understand, 
> >>>>>>>> the
> >>>>>>>> FLIP does the following:
> >>>>>>>> - introduces a new configuration to the JDBC and MongoDB connector.
> >>>>>>>> - Suggests a convention option name if other connectors are going to
> >>>>> add
> >>>>>>> an
> >>>>>>>> option for the same purpose.
> >>>>>>>> 
> >>>>>>>> Thanks,
> >>>>>>>> 
> >>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> On Mon, Dec 18, 2023 at 5:45 PM Jiabao Sun <jiabao....@xtransfer.cn 
> >>>>>>>> <ma...@xtransfer.cn>
> >>>>>>> .invalid>
> >>>>>>>> wrote:
> >>>>>>>> 
> >>>>>>>>> Hi Becket,
> >>>>>>>>> 
> >>>>>>>>> The FLIP document[1] has been updated.
> >>>>>>>>> Could you help take a look again?
> >>>>>>>>> 
> >>>>>>>>> Thanks,
> >>>>>>>>> Jiabao
> >>>>>>>>> 
> >>>>>>>>> [1]
> >>>>>>>>> 
> >>>>>>> 
> >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
> >>>>>  
> >>>>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768>
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>>> 2023年12月18日 16:53,Becket Qin <be...@gmail.com <http://gmail.com/>> 
> >>>>>>>>>> 写道:
> >>>>>>>>>> 
> >>>>>>>>>> Yes, that sounds reasonable to me. We can start with ALWAYS and
> >>>>> NEVER,
> >>>>>>>>> and
> >>>>>>>>>> add more policies as needed.
> >>>>>>>>>> 
> >>>>>>>>>> Thanks,
> >>>>>>>>>> 
> >>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>> 
> >>>>>>>>>> On Mon, Dec 18, 2023 at 4:48 PM Jiabao Sun 
> >>>>>>>>>> <jiabao....@xtransfer.cn <ma...@xtransfer.cn>
> >>>>>>>>> .invalid>
> >>>>>>>>>> wrote:
> >>>>>>>>>> 
> >>>>>>>>>>> Thanks Bucket,
> >>>>>>>>>>> 
> >>>>>>>>>>> The jdbc.filter.handling.policy is good to me as it provides
> >>>>>>> sufficient
> >>>>>>>>>>> extensibility for future filter pushdown optimizations.
> >>>>>>>>>>> However, currently, we don't have an implementation for the AUTO
> >>>>> mode,
> >>>>>>>>> and
> >>>>>>>>>>> it seems that the AUTO mode can easily be confused with the ALWAYS
> >>>>>>> mode
> >>>>>>>>>>> because users don't have the opportunity to MANUALLY decide which
> >>>>>>>>> filters
> >>>>>>>>>>> to push down.
> >>>>>>>>>>> 
> >>>>>>>>>>> I suggest that we only introduce the ALWAYS and NEVER modes for 
> >>>>>>>>>>> now,
> >>>>>>> and
> >>>>>>>>>>> we can consider introducing more flexible policies in the future,
> >>>>>>>>>>> such as INDEX_ONLY, NUMBERIC_ONLY and so on.
> >>>>>>>>>>> 
> >>>>>>>>>>> WDYT?
> >>>>>>>>>>> 
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Jiabao
> >>>>>>>>>>> 
> >>>>>>>>>>> 
> >>>>>>>>>>> 
> >>>>>>>>>>>> 2023年12月18日 16:27,Becket Qin <be...@gmail.com 
> >>>>>>>>>>>> <http://gmail.com/>> 写道:
> >>>>>>>>>>>> 
> >>>>>>>>>>>> Hi Jiabao,
> >>>>>>>>>>>> 
> >>>>>>>>>>>> Please see the reply inline.
> >>>>>>>>>>>> 
> >>>>>>>>>>>> 
> >>>>>>>>>>>>> The MySQL connector is currently in the flink-connector-jdbc
> >>>>>>>>> repository
> >>>>>>>>>>>>> and is not a standalone connector.
> >>>>>>>>>>>>> Is it too unique to use "mysql" as the configuration option
> >>>>> prefix?
> >>>>>>>>>>>> 
> >>>>>>>>>>>> If the intended behavior makes sense to all the supported JDBC
> >>>>>>> drivers,
> >>>>>>>>>>> we
> >>>>>>>>>>>> can make this a JDBC connector configuration.
> >>>>>>>>>>>> 
> >>>>>>>>>>>> Also, I would like to ask about the difference in behavior 
> >>>>>>>>>>>> between
> >>>>>>> AUTO
> >>>>>>>>>>> and
> >>>>>>>>>>>>> ALWAYS.
> >>>>>>>>>>>>> It seems that we cannot guarantee the pushing down of all 
> >>>>>>>>>>>>> filters
> >>>>> to
> >>>>>>>>> the
> >>>>>>>>>>>>> external system under the ALWAYS
> >>>>>>>>>>>>> mode because not all filters in Flink SQL are supported by the
> >>>>>>>>> external
> >>>>>>>>>>>>> system.
> >>>>>>>>>>>>> Should we throw an error when encountering a filter that cannot 
> >>>>>>>>>>>>> be
> >>>>>>>>>>> pushed
> >>>>>>>>>>>>> down in the ALWAYS mode?
> >>>>>>>>>>>> 
> >>>>>>>>>>>> The idea of AUTO is to do efficiency-aware pushdowns. The source
> >>>>> will
> >>>>>>>>>>> query
> >>>>>>>>>>>> the external system (MySQL, Oracle, SQL Server, etc) first to
> >>>>>>> retrieve
> >>>>>>>>>>> the
> >>>>>>>>>>>> information of the table. With that information, the source will
> >>>>>>> decide
> >>>>>>>>>>>> whether to further push a filter to the external system based on
> >>>>> the
> >>>>>>>>>>>> efficiency. E.g. only push the indexed fields. In contrast, 
> >>>>>>>>>>>> ALWAYS
> >>>>>>> will
> >>>>>>>>>>>> just always push the supported filters to the external system,
> >>>>>>>>> regardless
> >>>>>>>>>>>> of the efficiency. In case there are filters that are not
> >>>>> supported,
> >>>>>>>>>>>> according to the current contract of SupportsFilterPushdown, 
> >>>>>>>>>>>> these
> >>>>>>>>>>>> unsupported filters should just be returned by the
> >>>>>>>>>>>> *SupportsFilterPushdown.applyFilters()* method as remaining
> >>>>> filters.
> >>>>>>>>>>>> Therefore, there is no need to throw exceptions here. This is
> >>>>> likely
> >>>>>>>>> the
> >>>>>>>>>>>> desired behavior for most users, IMO. If there are cases that 
> >>>>>>>>>>>> users
> >>>>>>>>>>> really
> >>>>>>>>>>>> want to get alerted when a filter cannot be pushed to the 
> >>>>>>>>>>>> external
> >>>>>>>>>>> system,
> >>>>>>>>>>>> we can add another value like "ENFORCED_ALWAYS", which behaves 
> >>>>>>>>>>>> like
> >>>>>>>>>>> ALWAYS,
> >>>>>>>>>>>> but throws exceptions when a filter cannot be applied to the
> >>>>> external
> >>>>>>>>>>>> system. But personally I don't see much value in doing this.
> >>>>>>>>>>>> 
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> 
> >>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>> 
> >>>>>>>>>>>> 
> >>>>>>>>>>>> 
> >>>>>>>>>>>> On Mon, Dec 18, 2023 at 3:54 PM Jiabao Sun <
> >>>>> jiabao....@xtransfer.cn <ma...@xtransfer.cn>
> >>>>>>>>>>> .invalid>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>> 
> >>>>>>>>>>>>> Hi Becket,
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> The MySQL connector is currently in the flink-connector-jdbc
> >>>>>>>>> repository
> >>>>>>>>>>>>> and is not a standalone connector.
> >>>>>>>>>>>>> Is it too unique to use "mysql" as the configuration option
> >>>>> prefix?
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> Also, I would like to ask about the difference in behavior 
> >>>>>>>>>>>>> between
> >>>>>>>>> AUTO
> >>>>>>>>>>>>> and ALWAYS.
> >>>>>>>>>>>>> It seems that we cannot guarantee the pushing down of all 
> >>>>>>>>>>>>> filters
> >>>>> to
> >>>>>>>>> the
> >>>>>>>>>>>>> external system under the ALWAYS
> >>>>>>>>>>>>> mode because not all filters in Flink SQL are supported by the
> >>>>>>>>> external
> >>>>>>>>>>>>> system.
> >>>>>>>>>>>>> Should we throw an error when encountering a filter that cannot 
> >>>>>>>>>>>>> be
> >>>>>>>>>>> pushed
> >>>>>>>>>>>>> down in the ALWAYS mode?
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Jiabao
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 2023年12月18日 15:34,Becket Qin <be...@gmail.com 
> >>>>>>>>>>>>>> <http://gmail.com/>> 写道:
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> Hi JIabao,
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> Thanks for updating the FLIP. Maybe I did not explain it 
> >>>>>>>>>>>>>> clearly
> >>>>>>>>>>> enough.
> >>>>>>>>>>>>> My
> >>>>>>>>>>>>>> point is that given there are various good flavors of behaviors
> >>>>>>>>>>> handling
> >>>>>>>>>>>>>> filters pushed down, we should not have a common config of
> >>>>>>>>>>>>>> "ignore.filter.pushdown", because the behavior is not *common*.
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> It looks like the original motivation of this FLIP is just for
> >>>>>>> MySql.
> >>>>>>>>>>>>> Let's
> >>>>>>>>>>>>>> focus on what is the best solution for MySql connector here
> >>>>> first.
> >>>>>>>>>>> After
> >>>>>>>>>>>>>> that, if people think the best behavior for MySql happens to 
> >>>>>>>>>>>>>> be a
> >>>>>>>>>>> common
> >>>>>>>>>>>>>> one, we can then discuss whether that is worth being added to 
> >>>>>>>>>>>>>> the
> >>>>>>>>> base
> >>>>>>>>>>>>>> implementation of source. For MySQL, if we are going to
> >>>>> introduce a
> >>>>>>>>>>>>> config
> >>>>>>>>>>>>>> to MySql, why not have something like
> >>>>>>> "mysql.filter.handling.policy"
> >>>>>>>>>>> with
> >>>>>>>>>>>>>> value of AUTO / NEVER / ALWAYS? Isn't that better than
> >>>>>>>>>>>>>> "ignore.filter.pushdown"?
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 
>
[message truncated...]

Reply via email to