Thanks Jiabao for driving this.

+1 to start a vote, a minor comment, should we change the FLIP title according 
this thread context as well?

Best,
Leonard



> 2024年1月3日 下午2:43,Jiabao Sun <jiabao....@xtransfer.cn.INVALID> 写道:
> 
> Hi,
> 
> Thank you again for the discussion on this FLIP.
> If there are no further comments, I plan to start a voting thread tomorrow.
> 
> Best,
> Jiabao
> 
> On 2023/12/20 14:09:49 Jiabao Sun wrote:
>> Hi,
>> 
>> Thank you to everyone for the discussion on this FLIP, 
>> especially Becket for providing guidance that made it more reasonable. 
>> 
>> The FLIP document[1] has been updated with the recent discussed content. 
>> Please take a look to double-check it when you have time.
>> 
>> If we can reach a consensus on this, I will open the voting thread in recent 
>> days.
>> 
>> Best,
>> Jiabao
>> 
>> [1] 
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
>> 
>> 
>>> 2023年12月20日 11:38,Jiabao Sun <ji...@xtransfer.cn.INVALID> 写道:
>>> 
>>> Thanks Becket,
>>> 
>>> The behavior description has been added to the Public Interfaces section.
>>> 
>>> Best,
>>> Jiabao
>>> 
>>> 
>>>> 2023年12月20日 08:17,Becket Qin <be...@gmail.com <http://gmail.com/>> 写道:
>>>> 
>>>> Hi Jiabao,
>>>> 
>>>> Thanks for updating the FLIP.
>>>> Can you add the behavior of the policies that are only applicable to some
>>>> but not all of the databases? This is a part of the intended behavior of
>>>> the proposed configuration. So, we should include that in the FLIP.
>>>> Otherwise, the FLIP looks good to me.
>>>> 
>>>> Cheers,
>>>> 
>>>> Jiangjie (Becket) Qin
>>>> 
>>>> On Tue, Dec 19, 2023 at 11:00 PM Jiabao Sun <ji...@xtransfer.cn.invalid>
>>>> wrote:
>>>> 
>>>>> Hi Becket,
>>>>> 
>>>>> I share the same view as you regarding the prefix for this configuration
>>>>> option.
>>>>> 
>>>>> For the JDBC connector, I prefer setting 'filter.handling.policy' = 'FOO'
>>>>> and throwing an exception when the database do not support that specific
>>>>> policy.
>>>>> 
>>>>> Not using a prefix can reduce the learning curve for users and avoid
>>>>> introducing a new set of configuration options for every supported JDBC
>>>>> database.
>>>>> I think the policies we provide can be compatible with most databases that
>>>>> follow the JDBC protocol.
>>>>> However, there may be cases where certain databases cannot support some
>>>>> policies.
>>>>> Nevertheless, we can ensure fast failure and allow users to choose a
>>>>> suitable policy in such situations.
>>>>> 
>>>>> I have removed the contents about the configuration prefix.
>>>>> Please help review it again.
>>>>> 
>>>>> Thanks,
>>>>> Jiabao
>>>>> 
>>>>> 
>>>>>> 2023年12月19日 19:46,Becket Qin <be...@gmail.com <http://gmail.com/>> 写道:
>>>>>> 
>>>>>> Hi Jiabao,
>>>>>> 
>>>>>> Thanks for updating the FLIP.
>>>>>> 
>>>>>> One more question regarding the JDBC connector, since it is a connector
>>>>>> shared by multiple databases, what if there is a filter handling policy
>>>>>> that is only applicable to one of the databases, but not the others? In
>>>>>> that case, how would the users specify that policy?
>>>>>> Unlike the example of orc format with 2nd+ level config, JDBC connector
>>>>>> only looks at the URL to decide which driver to use.
>>>>>> 
>>>>>> For example, MySql supports policy FOO while other databases do not. If
>>>>>> users want to use FOO for MySql, what should they do? Will they set
>>>>>> '*mysql.filter.hanlding.policy'
>>>>>> = 'FOO', *which will only be picked up when the MySql driver is used?
>>>>>> Or they should just set* 'filter.handling.policy' = 'FOO', *and throw
>>>>>> exceptions when other JDBC drivers are used? Personally, I prefer the
>>>>>> latter. If we pick that, do we still need to mention the following?
>>>>>> 
>>>>>> *The prefix is needed when the option is for a 2nd+ level. *
>>>>>>> 'connector' = 'filesystem',
>>>>>>> 'format' = 'orc',
>>>>>>> 'orc.filter.handling.policy' = 'NUBERIC_ONY'
>>>>>>> 
>>>>>>> *In this case, the values of this configuration may be different
>>>>> depending
>>>>>>> on the format option. For example, orc format may have INDEXED_ONLY
>>>>> while
>>>>>>> parquet format may have something else. *
>>>>>>> 
>>>>>> 
>>>>>> I found this is somewhat misleading, because the example here is not a
>>>>> part
>>>>>> of the proposal of this FLIP. It is just an example explaining when a
>>>>>> prefix is needed, which seems orthogonal to the proposal in this FLIP.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Jiangjie (Becket) Qin
>>>>>> 
>>>>>> 
>>>>>> On Tue, Dec 19, 2023 at 10:09 AM Jiabao Sun <jiabao....@xtransfer.cn 
>>>>>> <mailto:jiabao....@xtransfer.cn>
>>>>> .invalid>
>>>>>> wrote:
>>>>>> 
>>>>>>> Thanks Becket for the suggestions,
>>>>>>> 
>>>>>>> Updated.
>>>>>>> Please help review it again when you have time.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Jiabao
>>>>>>> 
>>>>>>> 
>>>>>>>> 2023年12月19日 09:06,Becket Qin <be...@gmail.com <http://gmail.com/>> 写道:
>>>>>>>> 
>>>>>>>> Hi JIabao,
>>>>>>>> 
>>>>>>>> Thanks for updating the FLIP. It looks better. Some suggestions /
>>>>>>> questions:
>>>>>>>> 
>>>>>>>> 1. In the motivation section:
>>>>>>>> 
>>>>>>>>> *Currently, Flink Table/SQL does not expose fine-grained control for
>>>>>>> users
>>>>>>>>> to control filter pushdown. **However, filter pushdown has some side
>>>>>>>>> effects, such as additional computational pressure on external
>>>>>>>>> systems. Moreover, Improper queries can lead to issues such as full
>>>>>>> table
>>>>>>>>> scans, which in turn can impact the stability of external systems.*
>>>>>>>> 
>>>>>>>> This statement sounds like the side effects are there for all the
>>>>>>> systems,
>>>>>>>> which is inaccurate. Maybe we can say:
>>>>>>>> *Currently, Flink Table/SQL does not expose fine-grained control for
>>>>>>> users
>>>>>>>> to control filter pushdown. **However, filter pushdown may have side
>>>>>>>> effects in some cases, **such as additional computational pressure on
>>>>>>>> external systems. The JDBC source is a typical example of that. If a
>>>>>>> filter
>>>>>>>> is pushed down to the database, an expensive full table scan may happen
>>>>>>> if
>>>>>>>> the filter involves unindexed columns.*
>>>>>>>> 
>>>>>>>> 2. Regarding the prefix, usually a prefix is not required for the top
>>>>>>> level
>>>>>>>> connector options. This is because the *connector* option is already
>>>>>>> there.
>>>>>>>> So
>>>>>>>> 'connector' = 'jdbc',
>>>>>>>> 'filter.handling.policy' = 'ALWAYS'
>>>>>>>> is sufficient.
>>>>>>>> 
>>>>>>>> The prefix is needed when the option is for a 2nd+ level. For example,
>>>>>>>> 'connector' = 'jdbc',
>>>>>>>> 'format' = 'orc',
>>>>>>>> 'orc.some.option' = 'some_value'
>>>>>>>> In this case, the prefix of "orc" is needed to make it clear this
>>>>> option
>>>>>>> is
>>>>>>>> for the format.
>>>>>>>> 
>>>>>>>> I am guessing that the reason that currently the connector prefix is
>>>>>>> there
>>>>>>>> is because the values of this configuration may be different depending
>>>>> on
>>>>>>>> the connectors. For example, jdbc may have INDEXED_ONLY while MongoDB
>>>>> may
>>>>>>>> have something else. Personally speaking, I am fine if we do not have a
>>>>>>>> prefix in this case because users have already specified the connector
>>>>>>> type
>>>>>>>> and it is intuitive enough that the option value is for that connector,
>>>>>>> not
>>>>>>>> others.
>>>>>>>> 
>>>>>>>> 3. can we clarify on the following statement:
>>>>>>>> 
>>>>>>>>> *Introduce the native configuration [prefix].filter.handling.policy in
>>>>>>> the
>>>>>>>>> connector.*
>>>>>>>> 
>>>>>>>> What do you mean by "native configuration"? From what I understand, the
>>>>>>>> FLIP does the following:
>>>>>>>> - introduces a new configuration to the JDBC and MongoDB connector.
>>>>>>>> - Suggests a convention option name if other connectors are going to
>>>>> add
>>>>>>> an
>>>>>>>> option for the same purpose.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Dec 18, 2023 at 5:45 PM Jiabao Sun <jiabao....@xtransfer.cn 
>>>>>>>> <mailto:jiabao....@xtransfer.cn>
>>>>>>> .invalid>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Becket,
>>>>>>>>> 
>>>>>>>>> The FLIP document[1] has been updated.
>>>>>>>>> Could you help take a look again?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Jiabao
>>>>>>>>> 
>>>>>>>>> [1]
>>>>>>>>> 
>>>>>>> 
>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
>>>>>  
>>>>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 2023年12月18日 16:53,Becket Qin <be...@gmail.com <http://gmail.com/>> 
>>>>>>>>>> 写道:
>>>>>>>>>> 
>>>>>>>>>> Yes, that sounds reasonable to me. We can start with ALWAYS and
>>>>> NEVER,
>>>>>>>>> and
>>>>>>>>>> add more policies as needed.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> 
>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>> 
>>>>>>>>>> On Mon, Dec 18, 2023 at 4:48 PM Jiabao Sun <jiabao....@xtransfer.cn 
>>>>>>>>>> <mailto:jiabao....@xtransfer.cn>
>>>>>>>>> .invalid>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Thanks Bucket,
>>>>>>>>>>> 
>>>>>>>>>>> The jdbc.filter.handling.policy is good to me as it provides
>>>>>>> sufficient
>>>>>>>>>>> extensibility for future filter pushdown optimizations.
>>>>>>>>>>> However, currently, we don't have an implementation for the AUTO
>>>>> mode,
>>>>>>>>> and
>>>>>>>>>>> it seems that the AUTO mode can easily be confused with the ALWAYS
>>>>>>> mode
>>>>>>>>>>> because users don't have the opportunity to MANUALLY decide which
>>>>>>>>> filters
>>>>>>>>>>> to push down.
>>>>>>>>>>> 
>>>>>>>>>>> I suggest that we only introduce the ALWAYS and NEVER modes for now,
>>>>>>> and
>>>>>>>>>>> we can consider introducing more flexible policies in the future,
>>>>>>>>>>> such as INDEX_ONLY, NUMBERIC_ONLY and so on.
>>>>>>>>>>> 
>>>>>>>>>>> WDYT?
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Jiabao
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 2023年12月18日 16:27,Becket Qin <be...@gmail.com <http://gmail.com/>> 
>>>>>>>>>>>> 写道:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi Jiabao,
>>>>>>>>>>>> 
>>>>>>>>>>>> Please see the reply inline.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> The MySQL connector is currently in the flink-connector-jdbc
>>>>>>>>> repository
>>>>>>>>>>>>> and is not a standalone connector.
>>>>>>>>>>>>> Is it too unique to use "mysql" as the configuration option
>>>>> prefix?
>>>>>>>>>>>> 
>>>>>>>>>>>> If the intended behavior makes sense to all the supported JDBC
>>>>>>> drivers,
>>>>>>>>>>> we
>>>>>>>>>>>> can make this a JDBC connector configuration.
>>>>>>>>>>>> 
>>>>>>>>>>>> Also, I would like to ask about the difference in behavior between
>>>>>>> AUTO
>>>>>>>>>>> and
>>>>>>>>>>>>> ALWAYS.
>>>>>>>>>>>>> It seems that we cannot guarantee the pushing down of all filters
>>>>> to
>>>>>>>>> the
>>>>>>>>>>>>> external system under the ALWAYS
>>>>>>>>>>>>> mode because not all filters in Flink SQL are supported by the
>>>>>>>>> external
>>>>>>>>>>>>> system.
>>>>>>>>>>>>> Should we throw an error when encountering a filter that cannot be
>>>>>>>>>>> pushed
>>>>>>>>>>>>> down in the ALWAYS mode?
>>>>>>>>>>>> 
>>>>>>>>>>>> The idea of AUTO is to do efficiency-aware pushdowns. The source
>>>>> will
>>>>>>>>>>> query
>>>>>>>>>>>> the external system (MySQL, Oracle, SQL Server, etc) first to
>>>>>>> retrieve
>>>>>>>>>>> the
>>>>>>>>>>>> information of the table. With that information, the source will
>>>>>>> decide
>>>>>>>>>>>> whether to further push a filter to the external system based on
>>>>> the
>>>>>>>>>>>> efficiency. E.g. only push the indexed fields. In contrast, ALWAYS
>>>>>>> will
>>>>>>>>>>>> just always push the supported filters to the external system,
>>>>>>>>> regardless
>>>>>>>>>>>> of the efficiency. In case there are filters that are not
>>>>> supported,
>>>>>>>>>>>> according to the current contract of SupportsFilterPushdown, these
>>>>>>>>>>>> unsupported filters should just be returned by the
>>>>>>>>>>>> *SupportsFilterPushdown.applyFilters()* method as remaining
>>>>> filters.
>>>>>>>>>>>> Therefore, there is no need to throw exceptions here. This is
>>>>> likely
>>>>>>>>> the
>>>>>>>>>>>> desired behavior for most users, IMO. If there are cases that users
>>>>>>>>>>> really
>>>>>>>>>>>> want to get alerted when a filter cannot be pushed to the external
>>>>>>>>>>> system,
>>>>>>>>>>>> we can add another value like "ENFORCED_ALWAYS", which behaves like
>>>>>>>>>>> ALWAYS,
>>>>>>>>>>>> but throws exceptions when a filter cannot be applied to the
>>>>> external
>>>>>>>>>>>> system. But personally I don't see much value in doing this.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> 
>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Dec 18, 2023 at 3:54 PM Jiabao Sun <
>>>>> jiabao....@xtransfer.cn <mailto:jiabao....@xtransfer.cn>
>>>>>>>>>>> .invalid>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Becket,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The MySQL connector is currently in the flink-connector-jdbc
>>>>>>>>> repository
>>>>>>>>>>>>> and is not a standalone connector.
>>>>>>>>>>>>> Is it too unique to use "mysql" as the configuration option
>>>>> prefix?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Also, I would like to ask about the difference in behavior between
>>>>>>>>> AUTO
>>>>>>>>>>>>> and ALWAYS.
>>>>>>>>>>>>> It seems that we cannot guarantee the pushing down of all filters
>>>>> to
>>>>>>>>> the
>>>>>>>>>>>>> external system under the ALWAYS
>>>>>>>>>>>>> mode because not all filters in Flink SQL are supported by the
>>>>>>>>> external
>>>>>>>>>>>>> system.
>>>>>>>>>>>>> Should we throw an error when encountering a filter that cannot be
>>>>>>>>>>> pushed
>>>>>>>>>>>>> down in the ALWAYS mode?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 2023年12月18日 15:34,Becket Qin <be...@gmail.com 
>>>>>>>>>>>>>> <http://gmail.com/>> 写道:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi JIabao,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for updating the FLIP. Maybe I did not explain it clearly
>>>>>>>>>>> enough.
>>>>>>>>>>>>> My
>>>>>>>>>>>>>> point is that given there are various good flavors of behaviors
>>>>>>>>>>> handling
>>>>>>>>>>>>>> filters pushed down, we should not have a common config of
>>>>>>>>>>>>>> "ignore.filter.pushdown", because the behavior is not *common*.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> It looks like the original motivation of this FLIP is just for
>>>>>>> MySql.
>>>>>>>>>>>>> Let's
>>>>>>>>>>>>>> focus on what is the best solution for MySql connector here
>>>>> first.
>>>>>>>>>>> After
>>>>>>>>>>>>>> that, if people think the best behavior for MySql happens to be a
>>>>>>>>>>> common
>>>>>>>>>>>>>> one, we can then discuss whether that is worth being added to the
>>>>>>>>> base
>>>>>>>>>>>>>> implementation of source. For MySQL, if we are going to
>>>>> introduce a
>>>>>>>>>>>>> config
>>>>>>>>>>>>>> to MySql, why not have something like
>>>>>>> "mysql.filter.handling.policy"
>>>>>>>>>>> with
>>>>>>>>>>>>>> value of AUTO / NEVER / ALWAYS? Isn't that better than
>>>>>>>>>>>>>> "ignore.filter.pushdown"?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Sun, Dec 17, 2023 at 11:30 PM Jiabao Sun <
>>>>>>> jiabao....@xtransfer.cn <mailto:jiabao....@xtransfer.cn>
>>>>>>>>>>>>> .invalid>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Becket,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The FLIP document has been updated as well.
>>>>>>>>>>>>>>> Please take a look when you have time.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 2023年12月17日 22:54,Jiabao Sun <ji...@xtransfer.cn.INVALID>
>>>>>>> 写道:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks Becket,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I apologize for not being able to continue with this proposal
>>>>> due
>>>>>>>>> to
>>>>>>>>>>>>>>> being too busy during this period.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The viewpoints you shared about the design of Flink Source make
>>>>>>>>> sense
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>> The native configuration ‘ignore.filter.pushdown’ is good to
>>>>> me.
>>>>>>>>>>>>>>>> Having a unified name or naming style can indeed prevent
>>>>>>> confusion
>>>>>>>>>>> for
>>>>>>>>>>>>>>> users regarding
>>>>>>>>>>>>>>>> the inconsistent naming of this configuration across different
>>>>>>>>>>>>>>> connectors.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Currently, there are not many external connectors that support
>>>>>>>>> filter
>>>>>>>>>>>>>>> pushdown.
>>>>>>>>>>>>>>>> I propose that we first introduce it in flink-connector-jdbc
>>>>> and
>>>>>>>>>>>>>>> flink-connector-mongodb.
>>>>>>>>>>>>>>>> Do you think this is feasible?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>> 
> [message truncated...]

Reply via email to