Thanks Hang for the suggestion.

I think the configuration of TableSource is not closely related to 
SourceReader, 
so I prefer to introduce a independent configuration class TableSourceOptions 
in the flink-table-common module, similar to LookupOptions.

For the second point, I suggest adding Java doc to the SupportsXXXPushDown 
interfaces, providing detailed information on these options that needs to be 
supported.

I have made updates in the FLIP document.
Please help check it again.


Best,
Jiabao


> 2023年10月30日 17:23,Hang Ruan <ruanhang1...@gmail.com> 写道:
> 
> Thanks for the improvements, Jiabao.
> 
> There are some details that I am not sure about.
> 1. The new option `source.filter-push-down.enabled` will be added to which
> class? I think it should be `SourceReaderOptions`.
> 2. How are the connector developers able to know and follow the FLIP? Do we
> need an abstract base class or provide a default method?
> 
> Best,
> Hang
> 
> Jiabao Sun <jiabao....@xtransfer.cn.invalid> 于2023年10月30日周一 14:45写道:
> 
>> Hi, all,
>> 
>> Thanks for the lively discussion.
>> 
>> Based on the discussion, I have made some adjustments to the FLIP document:
>> 
>> 1. The name of the newly added option has been changed to
>> "source.filter-push-down.enabled".
>> 2. Considering compatibility with older versions, the newly added
>> "source.filter-push-down.enabled" option needs to respect the optimizer's
>> "table.optimizer.source.predicate-pushdown-enabled" option.
>>    But there is a consideration to remove the old option in Flink 2.0.
>> 3. We can provide more options to disable other source abilities with side
>> effects, such as “source.aggregate.enabled” and “source.projection.enabled"
>>    This is not urgent and can be continuously introduced.
>> 
>> Looking forward to your feedback again.
>> 
>> Best,
>> Jiabao
>> 
>> 
>>> 2023年10月29日 08:45,Becket Qin <becket....@gmail.com> 写道:
>>> 
>>> Thanks for digging into the git history, Jark. I agree it makes sense to
>>> deprecate this API in 2.0.
>>> 
>>> Cheers,
>>> 
>>> Jiangjie (Becket) Qin
>>> 
>>> On Fri, Oct 27, 2023 at 5:47 PM Jark Wu <imj...@gmail.com> wrote:
>>> 
>>>> Hi Becket,
>>>> 
>>>> I checked the history of "
>>>> *table.optimizer.source.predicate-pushdown-enabled*",
>>>> it seems it was introduced since the legacy FilterableTableSource
>>>> interface
>>>> which might be an experiential feature at that time. I don't see the
>>>> necessity
>>>> of this option at the moment. Maybe we can deprecate this option and
>> drop
>>>> it
>>>> in Flink 2.0[1] if it is not necessary anymore. This may help to
>>>> simplify this discussion.
>>>> 
>>>> 
>>>> Best,
>>>> Jark
>>>> 
>>>> [1]: https://issues.apache.org/jira/browse/FLINK-32383
>>>> 
>>>> 
>>>> 
>>>> On Thu, 26 Oct 2023 at 10:14, Becket Qin <becket....@gmail.com> wrote:
>>>> 
>>>>> Thanks for the proposal, Jiabao. My two cents below:
>>>>> 
>>>>> 1. If I understand correctly, the motivation of the FLIP is mainly to
>>>>> make predicate pushdown optional on SOME of the Sources. If so,
>> intuitively
>>>>> the configuration should be Source specific instead of general.
>> Otherwise,
>>>>> we will end up with general configurations that may not take effect for
>>>>> some of the Source implementations. This violates the basic rule of a
>>>>> configuration - it does what it says, regardless of the implementation.
>>>>> While configuration standardization is usually a good thing, it should
>> not
>>>>> break the basic rules.
>>>>> If we really want to have this general configuration, for the sources
>>>>> this configuration does not apply, they should throw an exception to
>> make
>>>>> it clear that this configuration is not supported. However, that seems
>> ugly.
>>>>> 
>>>>> 2. I think the actual motivation of this FLIP is about "how a source
>>>>> should implement predicate pushdown efficiently", not "whether
>> predicate
>>>>> pushdown should be applied to the source." For example, if a source
>> wants
>>>>> to avoid additional computing load in the external system, it can
>> always
>>>>> read the entire record and apply the predicates by itself. However,
>> from
>>>>> the Flink perspective, the predicate pushdown is applied, it is just
>>>>> implemented differently by the source. So the design principle here is
>> that
>>>>> Flink only cares about whether a source supports predicate pushdown or
>> not,
>>>>> it does not care about the implementation efficiency / side effect of
>> the
>>>>> predicates pushdown. It is the Source implementation's responsibility
>> to
>>>>> ensure the predicates pushdown is implemented efficiently and does not
>>>>> impose excessive pressure on the external system. And it is OK to have
>>>>> additional configurations to achieve this goal. Obviously, such
>>>>> configurations will be source specific in this case.
>>>>> 
>>>>> 3. Regarding the existing configurations of
>> *table.optimizer.source.predicate-pushdown-enabled.
>>>>> *I am not sure why we need it. Supposedly, if a source implements a
>>>>> SupportsXXXPushDown interface, the optimizer should push the
>> corresponding
>>>>> predicates to the Source. I am not sure in which case this
>> configuration
>>>>> would be used. Any ideas @Jark Wu <imj...@gmail.com>?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Jiangjie (Becket) Qin
>>>>> 
>>>>> 
>>>>> On Wed, Oct 25, 2023 at 11:55 PM Jiabao Sun
>>>>> <jiabao....@xtransfer.cn.invalid> wrote:
>>>>> 
>>>>>> Thanks Jane for the detailed explanation.
>>>>>> 
>>>>>> I think that for users, we should respect conventions over
>>>>>> configurations.
>>>>>> Conventions can be default values explicitly specified in
>>>>>> configurations, or they can be behaviors that follow previous
>> versions.
>>>>>> If the same code has different behaviors in different versions, it
>> would
>>>>>> be a very bad thing.
>>>>>> 
>>>>>> I agree that for regular users, it is not necessary to understand all
>>>>>> the configurations related to Flink.
>>>>>> By following conventions, they can have a good experience.
>>>>>> 
>>>>>> Let's get back to the practical situation and consider it.
>>>>>> 
>>>>>> Case 1:
>>>>>> The user is not familiar with the purpose of the
>>>>>> table.optimizer.source.predicate-pushdown-enabled configuration but
>> follows
>>>>>> the convention of allowing predicate pushdown to the source by
>> default.
>>>>>> Just understanding the source.predicate-pushdown-enabled configuration
>>>>>> and performing fine-grained toggle control will work well.
>>>>>> 
>>>>>> Case 2:
>>>>>> The user understands the meaning of the
>>>>>> table.optimizer.source.predicate-pushdown-enabled configuration and
>> has set
>>>>>> its value to false.
>>>>>> We have reason to believe that the user understands the meaning of the
>>>>>> predicate pushdown configuration and the intention is to disable
>> predicate
>>>>>> pushdown (rather than whether or not to allow it).
>>>>>> The previous choice of globally disabling it is likely because it
>>>>>> couldn't be disabled on individual sources.
>>>>>> From this perspective, if we provide more fine-grained configuration
>>>>>> support and provide detailed explanations of the configuration
>> behaviors in
>>>>>> the documentation,
>>>>>> users can clearly understand the differences between these two
>>>>>> configurations and use them correctly.
>>>>>> 
>>>>>> Also, I don't agree that
>>>>>> table.optimizer.source.predicate-pushdown-enabled = true and
>>>>>> source.predicate-pushdown-enabled = false means that the local
>>>>>> configuration overrides the global configuration.
>>>>>> On the contrary, both configurations are functioning correctly.
>>>>>> The optimizer allows predicate pushdown to all sources, but some
>> sources
>>>>>> can reject the filters pushed down by the optimizer.
>>>>>> This is natural, just like different components at different levels
>> are
>>>>>> responsible for different tasks.
>>>>>> 
>>>>>> The more serious issue is that if "source.predicate-pushdown-enabled"
>>>>>> does not respect "table.optimizer.source.predicate-pushdown-enabled”,
>>>>>> the "table.optimizer.source.predicate-pushdown-enabled" configuration
>>>>>> will be invalidated.
>>>>>> This means that regardless of whether
>>>>>> "table.optimizer.source.predicate-pushdown-enabled" is set to true or
>>>>>> false, it will have no effect.
>>>>>> 
>>>>>> Best,
>>>>>> Jiabao
>>>>>> 
>>>>>> 
>>>>>>> 2023年10月25日 22:24,Jane Chan <qingyue....@gmail.com> 写道:
>>>>>>> 
>>>>>>> Hi Jiabao,
>>>>>>> 
>>>>>>> Thanks for the in-depth clarification. Here are my cents
>>>>>>> 
>>>>>>> However, "table.optimizer.source.predicate-pushdown-enabled" and
>>>>>>>> "scan.filter-push-down.enabled" are configurations for different
>>>>>>>> components(optimizer and source operator).
>>>>>>>> 
>>>>>>> 
>>>>>>> We cannot assume that every user would be interested in understanding
>>>>>> the
>>>>>>> internal components of Flink, such as the optimizer or connectors,
>> and
>>>>>> the
>>>>>>> specific configurations associated with each component. Instead,
>> users
>>>>>>> might be more concerned about knowing which configuration enables or
>>>>>>> disables the filter push-down feature for all source connectors, and
>>>>>> which
>>>>>>> parameter provides the flexibility to override this behavior for a
>>>>>> single
>>>>>>> source if needed.
>>>>>>> 
>>>>>>> So, from this perspective, I am inclined to divide these two
>> parameters
>>>>>>> based on the scope of their impact from the user's perspective (i.e.
>>>>>>> global-level or operator-level), rather than categorizing them based
>>>>>> on the
>>>>>>> component hierarchy from a developer's point of view. Therefore,
>> based
>>>>>> on
>>>>>>> this premise, it is intuitive and natural for users to
>>>>>>> understand fine-grained configuration options can override global
>>>>>>> configurations.
>>>>>>> 
>>>>>>> Additionally, if "scan.filter-push-down.enabled" doesn't respect to
>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" and the default
>>>>>> value
>>>>>>>> of "scan.filter-push-down.enabled" is defined as true,
>>>>>>>> it means that just modifying
>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" as false will
>>>>>> have no
>>>>>>>> effect, and filter pushdown will still be performed.
>>>>>>>> 
>>>>>>>> If we define the default value of "scan.filter-push-down.enabled" as
>>>>>>>> false, it would introduce a difference in behavior compared to the
>>>>>> previous
>>>>>>>> version.
>>>>>>>> 
>>>>>>> 
>>>>>>> <1>If I understand correctly, "scan.filter-push-down.enabled" is a
>>>>>>> connector option, which means the only way to configure it is to
>>>>>> explicitly
>>>>>>> specify it in DDL (no matter whether disable or enable), and the SET
>>>>>>> command is not applicable, so I think it's natural to still respect
>>>>>> user's
>>>>>>> specification here. Otherwise, users might be more confused about why
>>>>>> the
>>>>>>> DDL does not work as expected, and the reason is just because some
>>>>>> other
>>>>>>> "optimizer" configuration is set to a different value.
>>>>>>> 
>>>>>>> <2> From the implementation side, I am inclined to keep the
>> parameter's
>>>>>>> priority consistent for all conditions.
>>>>>>> 
>>>>>>> Let "global" denote
>>>>>> "table.optimizer.source.predicate-pushdown-enabled",
>>>>>>> and let "per-source" denote "scan.filter-push-down.enabled" for
>>>>>> specific
>>>>>>> source T,  the following Truth table (based on the current design)
>>>>>>> indicates the inconsistent behavior for "per-source override global".
>>>>>>> 
>>>>>>> .------------.---------------.-------------------
>>>>>>> ----.-------------------------------------.
>>>>>>> | global   | per-source | push-down for T | per-source override
>> global
>>>>>> |
>>>>>>> 
>>>>>> 
>> :-----------+--------------+-----------------------+------------------------------------:
>>>>>>> | true       | false         | false                    | Y
>>>>>>>                      |
>>>>>>> 
>>>>>> 
>> :-----------+--------------+-----------------------+------------------------------------:
>>>>>>> | false     | true           | false                    | N
>>>>>>>                      |
>>>>>>> 
>>>>>> 
>> .------------.---------------.-----------------------.-------------------------------------.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Jane
>>>>>>> 
>>>>>>> On Wed, Oct 25, 2023 at 6:22 PM Jiabao Sun <jiabao....@xtransfer.cn
>>>>>> .invalid>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Thanks Benchao for the feedback.
>>>>>>>> 
>>>>>>>> I understand that the configuration of global parallelism and task
>>>>>>>> parallelism is at different granularities but with the same
>>>>>> configuration.
>>>>>>>> However, "table.optimizer.source.predicate-pushdown-enabled" and
>>>>>>>> "scan.filter-push-down.enabled" are configurations for different
>>>>>>>> components(optimizer and source operator).
>>>>>>>> 
>>>>>>>> From a user's perspective, there are two scenarios:
>>>>>>>> 
>>>>>>>> 1. Disabling all filter pushdown
>>>>>>>> In this case, setting
>>>>>> "table.optimizer.source.predicate-pushdown-enabled"
>>>>>>>> to false is sufficient to meet the requirement.
>>>>>>>> 
>>>>>>>> 2. Disabling filter pushdown for specific sources
>>>>>>>> In this scenario, there is no need to adjust the value of
>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled".
>>>>>>>> Instead, the focus should be on the configuration of
>>>>>>>> "scan.filter-push-down.enabled" to meet the requirement.
>>>>>>>> In this case, users do not need to set
>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" to false and
>>>>>> manually
>>>>>>>> enable filter pushdown for specific sources.
>>>>>>>> 
>>>>>>>> Additionally, if "scan.filter-push-down.enabled" doesn't respect to
>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" and the default
>>>>>> value
>>>>>>>> of "scan.filter-push-down.enabled" is defined as true,
>>>>>>>> it means that just modifying
>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" as false will
>>>>>> have no
>>>>>>>> effect, and filter pushdown will still be performed.
>>>>>>>> 
>>>>>>>> If we define the default value of "scan.filter-push-down.enabled" as
>>>>>>>> false, it would introduce a difference in behavior compared to the
>>>>>> previous
>>>>>>>> version.
>>>>>>>> The same SQL query that could successfully push down filters in the
>>>>>> old
>>>>>>>> version but would no longer do so after the upgrade.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Jiabao
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 2023年10月25日 17:10,Benchao Li <libenc...@apache.org> 写道:
>>>>>>>>> 
>>>>>>>>> Thanks Jiabao for the detailed explanations, that helps a lot, I
>>>>>>>>> understand your rationale now.
>>>>>>>>> 
>>>>>>>>> Correct me if I'm wrong. Your perspective is from "developer",
>> which
>>>>>>>>> means there is an optimizer and connector component, and if we want
>>>>>> to
>>>>>>>>> enable this feature (pushing filters down into connectors), you
>> must
>>>>>>>>> enable it firstly in optimizer, and only then connector has the
>>>>>> chance
>>>>>>>>> to decide to use it or not.
>>>>>>>>> 
>>>>>>>>> My perspective is from "user" that (Why a user should care about
>> the
>>>>>>>>> difference of optimizer/connector) , this is a feature, and has two
>>>>>>>>> way to control it, one way is to config it job-level, the other one
>>>>>> is
>>>>>>>>> in table properties. What a user expects is that they can control a
>>>>>>>>> feature in a tiered way, that setting it per job, and then
>>>>>>>>> fine-grained tune it per table.
>>>>>>>>> 
>>>>>>>>> This is some kind of similar to other concepts, such as
>> parallelism,
>>>>>>>>> users can set a job level default parallelism, and then
>> fine-grained
>>>>>>>>> tune it per operator. There may be more such debate in the future
>>>>>>>>> e.g., we can have a job level config about adding key-by before
>>>>>> lookup
>>>>>>>>> join, and also a hint/table property way to fine-grained control it
>>>>>>>>> per lookup operator. Hence we'd better find a unified way for all
>>>>>>>>> those similar kind of features.
>>>>>>>>> 
>>>>>>>>> Jiabao Sun <jiabao....@xtransfer.cn.invalid> 于2023年10月25日周三
>> 15:27写道:
>>>>>>>>>> 
>>>>>>>>>> Thanks Jane for further explanation.
>>>>>>>>>> 
>>>>>>>>>> These two configurations correspond to different levels.
>>>>>>>> "scan.filter-push-down.enabled" does not make
>>>>>>>> "table.optimizer.source.predicate" invalid.
>>>>>>>>>> The planner will still push down predicates to all sources.
>>>>>>>>>> Whether filter pushdown is allowed or not is determined by the
>>>>>> specific
>>>>>>>> source's "scan.filter-push-down.enabled" configuration.
>>>>>>>>>> 
>>>>>>>>>> However, "table.optimizer.source.predicate" does directly affect
>>>>>>>> "scan.filter-push-down.enabled”.
>>>>>>>>>> When the planner disables predicate pushdown, the source-level
>>>>>> filter
>>>>>>>> pushdown will also not be executed, even if the source allows filter
>>>>>>>> pushdown.
>>>>>>>>>> 
>>>>>>>>>> Whatever, in point 1 and 2, our expectation is consistent.
>>>>>>>>>> For the 3rd point, I still think that the planner-level
>>>>>> configuration
>>>>>>>> takes precedence over the source-level configuration.
>>>>>>>>>> It may seem counterintuitive when we globally disable predicate
>>>>>>>> pushdown but allow filter pushdown at the source level.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Jiabao
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 2023年10月25日 14:35,Jane Chan <qingyue....@gmail.com> 写道:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Jiabao,
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for clarifying this. While by
>> "scan.filter-push-down.enabled
>>>>>>>> takes a
>>>>>>>>>>> higher priority" I meant that this value should be respected
>>>>>> whenever
>>>>>>>> it is
>>>>>>>>>>> set explicitly.
>>>>>>>>>>> 
>>>>>>>>>>> The conclusion that
>>>>>>>>>>> 
>>>>>>>>>>> 2. "table.optimizer.source.predicate" = "true" and
>>>>>>>>>>>> "scan.filter-push-down.enabled" = "false"
>>>>>>>>>>>> Allow the planner to perform predicate pushdown, but individual
>>>>>>>> sources do
>>>>>>>>>>>> not enable filter pushdown.
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> This indicates that the option "scan.filter-push-down.enabled =
>>>>>> false"
>>>>>>>> for
>>>>>>>>>>> an individual source connector does indeed override the
>>>>>> global-level
>>>>>>>>>>> planner settings to make a difference. And thus "has a higher
>>>>>>>> priority".
>>>>>>>>>>> 
>>>>>>>>>>> While for
>>>>>>>>>>> 
>>>>>>>>>>> 3. "table.optimizer.source.predicate" = "false"
>>>>>>>>>>>> Predicate pushdown is not allowed for the planner.
>>>>>>>>>>>> Regardless of the value of the "scan.filter-push-down.enabled"
>>>>>>>>>>>> configuration, filter pushdown is disabled.
>>>>>>>>>>>> In this scenario, the behavior remains consistent with the old
>>>>>>>> version as
>>>>>>>>>>>> well.
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I still think "scan.filter-push-down.enabled" should also be
>>>>>> respected
>>>>>>>> if
>>>>>>>>>>> it is enabled for individual connectors. WDYT?
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Jane
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Oct 25, 2023 at 1:27 PM Jiabao Sun <
>>>>>> jiabao....@xtransfer.cn
>>>>>>>> .invalid>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks Benchao for the feedback.
>>>>>>>>>>>> 
>>>>>>>>>>>> For the current proposal, we recommend keeping the default value
>>>>>> of
>>>>>>>>>>>> "table.optimizer.source.predicate" as true,
>>>>>>>>>>>> and setting the the default value of newly introduced option
>>>>>>>>>>>> "scan.filter-push-down.enabled" to true as well.
>>>>>>>>>>>> 
>>>>>>>>>>>> The main purpose of doing this is to maintain consistency with
>>>>>>>> previous
>>>>>>>>>>>> versions, as whether to perform
>>>>>>>>>>>> filter pushdown in the old version solely depends on the
>>>>>>>>>>>> "table.optimizer.source.predicate" option.
>>>>>>>>>>>> That means by default, as long as a TableSource implements the
>>>>>>>>>>>> SupportsFilterPushDown interface, filter pushdown is allowed.
>>>>>>>>>>>> And it seems that we don't have much benefit in changing the
>>>>>> default
>>>>>>>> value
>>>>>>>>>>>> of "table.optimizer.source.predicate" to false.
>>>>>>>>>>>> 
>>>>>>>>>>>> Regarding the priority of these two configurations, I believe
>> that
>>>>>>>>>>>> "table.optimizer.source.predicate"
>>>>>>>>>>>> takes precedence over "scan.filter-push-down.enabled" and it
>>>>>> exhibits
>>>>>>>> the
>>>>>>>>>>>> following behavior.
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. "table.optimizer.source.predicate" = "true" and
>>>>>>>>>>>> "scan.filter-push-down.enabled" = "true"
>>>>>>>>>>>> This is the default behavior, allowing filter pushdown for
>>>>>> sources.
>>>>>>>>>>>> 
>>>>>>>>>>>> 2. "table.optimizer.source.predicate" = "true" and
>>>>>>>>>>>> "scan.filter-push-down.enabled" = "false"
>>>>>>>>>>>> Allow the planner to perform predicate pushdown, but individual
>>>>>>>> sources do
>>>>>>>>>>>> not enable filter pushdown.
>>>>>>>>>>>> 
>>>>>>>>>>>> 3. "table.optimizer.source.predicate" = "false"
>>>>>>>>>>>> Predicate pushdown is not allowed for the planner.
>>>>>>>>>>>> Regardless of the value of the "scan.filter-push-down.enabled"
>>>>>>>>>>>> configuration, filter pushdown is disabled.
>>>>>>>>>>>> In this scenario, the behavior remains consistent with the old
>>>>>>>> version as
>>>>>>>>>>>> well.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> From an implementation perspective, setting the priority of
>>>>>>>>>>>> "scan.filter-push-down.enabled" higher than
>>>>>>>>>>>> "table.optimizer.source.predicate" is difficult to achieve now.
>>>>>>>>>>>> Because the PushFilterIntoSourceScanRuleBase at the planner
>> level
>>>>>>>> takes
>>>>>>>>>>>> precedence over the source-level FilterPushDownSpec.
>>>>>>>>>>>> Only when the PushFilterIntoSourceScanRuleBase is enabled, will
>>>>>> the
>>>>>>>>>>>> Source-level filter pushdown be performed.
>>>>>>>>>>>> 
>>>>>>>>>>>> Additionally, in my opinion, there doesn't seem to be much
>>>>>> benefit in
>>>>>>>>>>>> setting a higher priority for "scan.filter-push-down.enabled".
>>>>>>>>>>>> It may instead affect compatibility and increase implementation
>>>>>>>> complexity.
>>>>>>>>>>>> 
>>>>>>>>>>>> WDYT?
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Jiabao
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> 2023年10月25日 11:56,Benchao Li <libenc...@apache.org> 写道:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I agree with Jane that fine-grained configurations should have
>>>>>> higher
>>>>>>>>>>>>> priority than job level configurations.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> For current proposal, we can achieve that:
>>>>>>>>>>>>> - Set "table.optimizer.source.predicate" = "true" to enable by
>>>>>>>>>>>>> default, and set ""scan.filter-push-down.enabled" = "false" to
>>>>>>>> disable
>>>>>>>>>>>>> it per table source
>>>>>>>>>>>>> - Set "table.optimizer.source.predicate" = "false" to disable
>> by
>>>>>>>>>>>>> default, and set ""scan.filter-push-down.enabled" = "true" to
>>>>>> enable
>>>>>>>>>>>>> it per table source
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Jane Chan <qingyue....@gmail.com> 于2023年10月24日周二 23:55写道:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I believe that the configuration
>>>>>> "table.optimizer.source.predicate"
>>>>>>>>>>>> has a
>>>>>>>>>>>>>>> higher priority at the planner level than the configuration
>> at
>>>>>> the
>>>>>>>>>>>> source
>>>>>>>>>>>>>>> level,
>>>>>>>>>>>>>>> and it seems easy to implement now.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Correct me if I'm wrong, but I think the fine-grained
>>>>>> configuration
>>>>>>>>>>>>>> "scan.filter-push-down.enabled" should have a higher priority
>>>>>>>> because
>>>>>>>>>>>> the
>>>>>>>>>>>>>> default value of "table.optimizer.source.predicate" is true.
>> As
>>>>>> a
>>>>>>>>>>>> result,
>>>>>>>>>>>>>> turning off filter push-down for a specific source will not
>> take
>>>>>>>> effect
>>>>>>>>>>>>>> unless the default value of "table.optimizer.source.predicate"
>>>>>> is
>>>>>>>>>>>> changed
>>>>>>>>>>>>>> to false, or, alternatively, let users manually set
>>>>>>>>>>>>>> "table.optimizer.source.predicate" to false first and then
>>>>>>>> selectively
>>>>>>>>>>>>>> enable filter push-down for the desired sources, which is less
>>>>>>>>>>>> intuitive.
>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Jane
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun <
>>>>>> jiabao....@xtransfer.cn
>>>>>>>>>>>> .invalid>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks Jane,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I believe that the configuration
>>>>>> "table.optimizer.source.predicate"
>>>>>>>>>>>> has a
>>>>>>>>>>>>>>> higher priority at the planner level than the configuration
>> at
>>>>>> the
>>>>>>>>>>>> source
>>>>>>>>>>>>>>> level,
>>>>>>>>>>>>>>> and it seems easy to implement now.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 2023年10月24日 17:36,Jane Chan <qingyue....@gmail.com> 写道:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Jiabao,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks for driving this discussion. I have a small question
>>>>>> that
>>>>>>>> will
>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" take precedence over
>>>>>>>>>>>>>>>> "table.optimizer.source.predicate" when the two parameters
>>>>>> might
>>>>>>>>>>>> conflict
>>>>>>>>>>>>>>>> each other?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Jane
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun <
>>>>>>>> jiabao....@xtransfer.cn
>>>>>>>>>>>>>>> .invalid>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks Jark,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> If we only add configuration without adding the
>>>>>>>> enableFilterPushDown
>>>>>>>>>>>>>>>>> method in the SupportsFilterPushDown interface,
>>>>>>>>>>>>>>>>> each connector would have to handle the same logic in the
>>>>>>>>>>>> applyFilters
>>>>>>>>>>>>>>>>> method to determine whether filter pushdown is needed.
>>>>>>>>>>>>>>>>> This would increase complexity and violate the original
>>>>>> behavior
>>>>>>>> of
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> applyFilters method.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On the contrary, we only need to pass the configuration
>>>>>>>> parameter in
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> newly added enableFilterPushDown method
>>>>>>>>>>>>>>>>> to decide whether to perform predicate pushdown.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I think this approach would be clearer and simpler.
>>>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 2023年10月24日 16:58,Jark Wu <imj...@gmail.com> 写道:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi JIabao,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I think the current interface can already satisfy your
>>>>>>>> requirements.
>>>>>>>>>>>>>>>>>> The connector can reject all the filters by returning the
>>>>>> input
>>>>>>>>>>>> filters
>>>>>>>>>>>>>>>>>> as `Result#remainingFilters`.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> So maybe we don't need to introduce a new method to
>> disable
>>>>>>>>>>>>>>>>>> pushdown, but just introduce an option for the specific
>>>>>>>> connector.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu <
>> xbjt...@gmail.com
>>>>>>> 
>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks @Jiabao for kicking off this discussion.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Could you add a section to explain the difference between
>>>>>>>> proposed
>>>>>>>>>>>>>>>>>>> connector level config `scan.filter-push-down.enabled`
>> and
>>>>>>>> existing
>>>>>>>>>>>>>>>>> query
>>>>>>>>>>>>>>>>>>> level config
>>>>>>>> `table.optimizer.source.predicate-pushdown-enabled` ?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Leonard
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 2023年10月24日 下午4:18,Jiabao Sun <jiabao....@xtransfer.cn
>>>>>>>> .INVALID>
>>>>>>>>>>>> 写道:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Hi Devs,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I would like to start a discussion on FLIP-377: support
>>>>>>>>>>>> configuration
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> disable filter pushdown for Table/SQL Sources[1].
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Currently, Flink Table/SQL does not expose fine-grained
>>>>>>>> control
>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>> users to enable or disable filter pushdown.
>>>>>>>>>>>>>>>>>>>> However, filter pushdown has some side effects, such as
>>>>>>>> additional
>>>>>>>>>>>>>>>>>>> computational pressure on external systems.
>>>>>>>>>>>>>>>>>>>> Moreover, Improper queries can lead to issues such as
>> full
>>>>>>>> table
>>>>>>>>>>>>>>> scans,
>>>>>>>>>>>>>>>>>>> which in turn can impact the stability of external
>> systems.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Suppose we have an SQL query with two sources: Kafka
>> and a
>>>>>>>>>>>> database.
>>>>>>>>>>>>>>>>>>>> The database is sensitive to pressure, and we want to
>>>>>>>> configure
>>>>>>>>>>>> it to
>>>>>>>>>>>>>>>>>>> not perform filter pushdown to the database source.
>>>>>>>>>>>>>>>>>>>> However, we still want to perform filter pushdown to the
>>>>>> Kafka
>>>>>>>>>>>> source
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> decrease network IO.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I propose to support configuration to disable filter
>> push
>>>>>>>> down for
>>>>>>>>>>>>>>>>>>> Table/SQL sources to let user decide whether to perform
>>>>>> filter
>>>>>>>>>>>>>>> pushdown.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Benchao Li
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Benchao Li
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>> 
>> 

Reply via email to