Thanks Hang and Lincoln for the good point.

'source.predicate-pushdown.enabled’ is great to me. I have changed the proposal 
document.

Do we need to maintain consistency in hyphen-separated naming style between 
'source.predicate-pushdown-enabled' and 
'table.optimizer.source.predicate-pushdown-enabled'?

Best,
Jiabao


> 2023年10月25日 20:14,Hang Ruan <ruanhang1...@gmail.com> 写道:
> 
> Hi, all,
> 
> Thanks for the lively discussion.
> 
> I agree with Jiabao. I think enabling "scan.filter-push-down.enabled"
> relies on enabling "table.optimizer.source.predicate-pushdown-enabled".
> It is a little strange that the planner still needs to push down the
> filters when we set "scan.filter-push-down.enabled=false" and
> "table.optimizer.source.predicate-pushdown-enabled=true".
> Maybe we need to add some checks to warn the users when setting
> "scan.filter-push-down.enabled=true" and
> "table.optimizer.source.predicate-pushdown-enabled=false".
> 
> Besides that, I am +1 for renaming 'scan.filter-push-down.enabled' to
> 'source.predicate-pushdown.enabled'.
> 
> Best,
> Hang
> 
> Jiabao Sun <jiabao....@xtransfer.cn.invalid> 于2023年10月25日周三 18:23写道:
> 
>> Thanks Benchao for the feedback.
>> 
>> I understand that the configuration of global parallelism and task
>> parallelism is at different granularities but with the same configuration.
>> However, "table.optimizer.source.predicate-pushdown-enabled" and
>> "scan.filter-push-down.enabled" are configurations for different
>> components(optimizer and source operator).
>> 
>> From a user's perspective, there are two scenarios:
>> 
>> 1. Disabling all filter pushdown
>> In this case, setting "table.optimizer.source.predicate-pushdown-enabled"
>> to false is sufficient to meet the requirement.
>> 
>> 2. Disabling filter pushdown for specific sources
>> In this scenario, there is no need to adjust the value of
>> "table.optimizer.source.predicate-pushdown-enabled".
>> Instead, the focus should be on the configuration of
>> "scan.filter-push-down.enabled" to meet the requirement.
>> In this case, users do not need to set
>> "table.optimizer.source.predicate-pushdown-enabled" to false and manually
>> enable filter pushdown for specific sources.
>> 
>> Additionally, if "scan.filter-push-down.enabled" doesn't respect to
>> "table.optimizer.source.predicate-pushdown-enabled" and the default value
>> of "scan.filter-push-down.enabled" is defined as true,
>> it means that just modifying
>> "table.optimizer.source.predicate-pushdown-enabled" as false will have no
>> effect, and filter pushdown will still be performed.
>> 
>> If we define the default value of "scan.filter-push-down.enabled" as
>> false, it would introduce a difference in behavior compared to the previous
>> version.
>> The same SQL query that could successfully push down filters in the old
>> version but would no longer do so after the upgrade.
>> 
>> Best,
>> Jiabao
>> 
>> 
>>> 2023年10月25日 17:10,Benchao Li <libenc...@apache.org> 写道:
>>> 
>>> Thanks Jiabao for the detailed explanations, that helps a lot, I
>>> understand your rationale now.
>>> 
>>> Correct me if I'm wrong. Your perspective is from "developer", which
>>> means there is an optimizer and connector component, and if we want to
>>> enable this feature (pushing filters down into connectors), you must
>>> enable it firstly in optimizer, and only then connector has the chance
>>> to decide to use it or not.
>>> 
>>> My perspective is from "user" that (Why a user should care about the
>>> difference of optimizer/connector) , this is a feature, and has two
>>> way to control it, one way is to config it job-level, the other one is
>>> in table properties. What a user expects is that they can control a
>>> feature in a tiered way, that setting it per job, and then
>>> fine-grained tune it per table.
>>> 
>>> This is some kind of similar to other concepts, such as parallelism,
>>> users can set a job level default parallelism, and then fine-grained
>>> tune it per operator. There may be more such debate in the future
>>> e.g., we can have a job level config about adding key-by before lookup
>>> join, and also a hint/table property way to fine-grained control it
>>> per lookup operator. Hence we'd better find a unified way for all
>>> those similar kind of features.
>>> 
>>> Jiabao Sun <jiabao....@xtransfer.cn.invalid> 于2023年10月25日周三 15:27写道:
>>>> 
>>>> Thanks Jane for further explanation.
>>>> 
>>>> These two configurations correspond to different levels.
>> "scan.filter-push-down.enabled" does not make
>> "table.optimizer.source.predicate" invalid.
>>>> The planner will still push down predicates to all sources.
>>>> Whether filter pushdown is allowed or not is determined by the specific
>> source's "scan.filter-push-down.enabled" configuration.
>>>> 
>>>> However, "table.optimizer.source.predicate" does directly affect
>> "scan.filter-push-down.enabled”.
>>>> When the planner disables predicate pushdown, the source-level filter
>> pushdown will also not be executed, even if the source allows filter
>> pushdown.
>>>> 
>>>> Whatever, in point 1 and 2, our expectation is consistent.
>>>> For the 3rd point, I still think that the planner-level configuration
>> takes precedence over the source-level configuration.
>>>> It may seem counterintuitive when we globally disable predicate
>> pushdown but allow filter pushdown at the source level.
>>>> 
>>>> Best,
>>>> Jiabao
>>>> 
>>>> 
>>>> 
>>>>> 2023年10月25日 14:35,Jane Chan <qingyue....@gmail.com> 写道:
>>>>> 
>>>>> Hi Jiabao,
>>>>> 
>>>>> Thanks for clarifying this. While by "scan.filter-push-down.enabled
>> takes a
>>>>> higher priority" I meant that this value should be respected whenever
>> it is
>>>>> set explicitly.
>>>>> 
>>>>> The conclusion that
>>>>> 
>>>>> 2. "table.optimizer.source.predicate" = "true" and
>>>>>> "scan.filter-push-down.enabled" = "false"
>>>>>> Allow the planner to perform predicate pushdown, but individual
>> sources do
>>>>>> not enable filter pushdown.
>>>>>> 
>>>>> 
>>>>> This indicates that the option "scan.filter-push-down.enabled = false"
>> for
>>>>> an individual source connector does indeed override the global-level
>>>>> planner settings to make a difference. And thus "has a higher
>> priority".
>>>>> 
>>>>> While for
>>>>> 
>>>>> 3. "table.optimizer.source.predicate" = "false"
>>>>>> Predicate pushdown is not allowed for the planner.
>>>>>> Regardless of the value of the "scan.filter-push-down.enabled"
>>>>>> configuration, filter pushdown is disabled.
>>>>>> In this scenario, the behavior remains consistent with the old
>> version as
>>>>>> well.
>>>>>> 
>>>>> 
>>>>> I still think "scan.filter-push-down.enabled" should also be respected
>> if
>>>>> it is enabled for individual connectors. WDYT?
>>>>> 
>>>>> Best,
>>>>> Jane
>>>>> 
>>>>> On Wed, Oct 25, 2023 at 1:27 PM Jiabao Sun <jiabao....@xtransfer.cn
>> .invalid>
>>>>> wrote:
>>>>> 
>>>>>> Thanks Benchao for the feedback.
>>>>>> 
>>>>>> For the current proposal, we recommend keeping the default value of
>>>>>> "table.optimizer.source.predicate" as true,
>>>>>> and setting the the default value of newly introduced option
>>>>>> "scan.filter-push-down.enabled" to true as well.
>>>>>> 
>>>>>> The main purpose of doing this is to maintain consistency with
>> previous
>>>>>> versions, as whether to perform
>>>>>> filter pushdown in the old version solely depends on the
>>>>>> "table.optimizer.source.predicate" option.
>>>>>> That means by default, as long as a TableSource implements the
>>>>>> SupportsFilterPushDown interface, filter pushdown is allowed.
>>>>>> And it seems that we don't have much benefit in changing the default
>> value
>>>>>> of "table.optimizer.source.predicate" to false.
>>>>>> 
>>>>>> Regarding the priority of these two configurations, I believe that
>>>>>> "table.optimizer.source.predicate"
>>>>>> takes precedence over "scan.filter-push-down.enabled" and it exhibits
>> the
>>>>>> following behavior.
>>>>>> 
>>>>>> 1. "table.optimizer.source.predicate" = "true" and
>>>>>> "scan.filter-push-down.enabled" = "true"
>>>>>> This is the default behavior, allowing filter pushdown for sources.
>>>>>> 
>>>>>> 2. "table.optimizer.source.predicate" = "true" and
>>>>>> "scan.filter-push-down.enabled" = "false"
>>>>>> Allow the planner to perform predicate pushdown, but individual
>> sources do
>>>>>> not enable filter pushdown.
>>>>>> 
>>>>>> 3. "table.optimizer.source.predicate" = "false"
>>>>>> Predicate pushdown is not allowed for the planner.
>>>>>> Regardless of the value of the "scan.filter-push-down.enabled"
>>>>>> configuration, filter pushdown is disabled.
>>>>>> In this scenario, the behavior remains consistent with the old
>> version as
>>>>>> well.
>>>>>> 
>>>>>> 
>>>>>> From an implementation perspective, setting the priority of
>>>>>> "scan.filter-push-down.enabled" higher than
>>>>>> "table.optimizer.source.predicate" is difficult to achieve now.
>>>>>> Because the PushFilterIntoSourceScanRuleBase at the planner level
>> takes
>>>>>> precedence over the source-level FilterPushDownSpec.
>>>>>> Only when the PushFilterIntoSourceScanRuleBase is enabled, will the
>>>>>> Source-level filter pushdown be performed.
>>>>>> 
>>>>>> Additionally, in my opinion, there doesn't seem to be much benefit in
>>>>>> setting a higher priority for "scan.filter-push-down.enabled".
>>>>>> It may instead affect compatibility and increase implementation
>> complexity.
>>>>>> 
>>>>>> WDYT?
>>>>>> 
>>>>>> Best,
>>>>>> Jiabao
>>>>>> 
>>>>>> 
>>>>>>> 2023年10月25日 11:56,Benchao Li <libenc...@apache.org> 写道:
>>>>>>> 
>>>>>>> I agree with Jane that fine-grained configurations should have higher
>>>>>>> priority than job level configurations.
>>>>>>> 
>>>>>>> For current proposal, we can achieve that:
>>>>>>> - Set "table.optimizer.source.predicate" = "true" to enable by
>>>>>>> default, and set ""scan.filter-push-down.enabled" = "false" to
>> disable
>>>>>>> it per table source
>>>>>>> - Set "table.optimizer.source.predicate" = "false" to disable by
>>>>>>> default, and set ""scan.filter-push-down.enabled" = "true" to enable
>>>>>>> it per table source
>>>>>>> 
>>>>>>> Jane Chan <qingyue....@gmail.com> 于2023年10月24日周二 23:55写道:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I believe that the configuration "table.optimizer.source.predicate"
>>>>>> has a
>>>>>>>>> higher priority at the planner level than the configuration at the
>>>>>> source
>>>>>>>>> level,
>>>>>>>>> and it seems easy to implement now.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Correct me if I'm wrong, but I think the fine-grained configuration
>>>>>>>> "scan.filter-push-down.enabled" should have a higher priority
>> because
>>>>>> the
>>>>>>>> default value of "table.optimizer.source.predicate" is true. As a
>>>>>> result,
>>>>>>>> turning off filter push-down for a specific source will not take
>> effect
>>>>>>>> unless the default value of "table.optimizer.source.predicate" is
>>>>>> changed
>>>>>>>> to false, or, alternatively, let users manually set
>>>>>>>> "table.optimizer.source.predicate" to false first and then
>> selectively
>>>>>>>> enable filter push-down for the desired sources, which is less
>>>>>> intuitive.
>>>>>>>> WDYT?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Jane
>>>>>>>> 
>>>>>>>> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun <jiabao....@xtransfer.cn
>>>>>> .invalid>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Thanks Jane,
>>>>>>>>> 
>>>>>>>>> I believe that the configuration "table.optimizer.source.predicate"
>>>>>> has a
>>>>>>>>> higher priority at the planner level than the configuration at the
>>>>>> source
>>>>>>>>> level,
>>>>>>>>> and it seems easy to implement now.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Jiabao
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 2023年10月24日 17:36,Jane Chan <qingyue....@gmail.com> 写道:
>>>>>>>>>> 
>>>>>>>>>> Hi Jiabao,
>>>>>>>>>> 
>>>>>>>>>> Thanks for driving this discussion. I have a small question that
>> will
>>>>>>>>>> "scan.filter-push-down.enabled" take precedence over
>>>>>>>>>> "table.optimizer.source.predicate" when the two parameters might
>>>>>> conflict
>>>>>>>>>> each other?
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Jane
>>>>>>>>>> 
>>>>>>>>>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun <
>> jiabao....@xtransfer.cn
>>>>>>>>> .invalid>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Thanks Jark,
>>>>>>>>>>> 
>>>>>>>>>>> If we only add configuration without adding the
>> enableFilterPushDown
>>>>>>>>>>> method in the SupportsFilterPushDown interface,
>>>>>>>>>>> each connector would have to handle the same logic in the
>>>>>> applyFilters
>>>>>>>>>>> method to determine whether filter pushdown is needed.
>>>>>>>>>>> This would increase complexity and violate the original behavior
>> of
>>>>>> the
>>>>>>>>>>> applyFilters method.
>>>>>>>>>>> 
>>>>>>>>>>> On the contrary, we only need to pass the configuration
>> parameter in
>>>>>> the
>>>>>>>>>>> newly added enableFilterPushDown method
>>>>>>>>>>> to decide whether to perform predicate pushdown.
>>>>>>>>>>> 
>>>>>>>>>>> I think this approach would be clearer and simpler.
>>>>>>>>>>> WDYT?
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Jiabao
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 2023年10月24日 16:58,Jark Wu <imj...@gmail.com> 写道:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi JIabao,
>>>>>>>>>>>> 
>>>>>>>>>>>> I think the current interface can already satisfy your
>> requirements.
>>>>>>>>>>>> The connector can reject all the filters by returning the input
>>>>>> filters
>>>>>>>>>>>> as `Result#remainingFilters`.
>>>>>>>>>>>> 
>>>>>>>>>>>> So maybe we don't need to introduce a new method to disable
>>>>>>>>>>>> pushdown, but just introduce an option for the specific
>> connector.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Jark
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu <xbjt...@gmail.com>
>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks @Jiabao for kicking off this discussion.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Could you add a section to explain the difference between
>> proposed
>>>>>>>>>>>>> connector level config `scan.filter-push-down.enabled` and
>> existing
>>>>>>>>>>> query
>>>>>>>>>>>>> level config
>> `table.optimizer.source.predicate-pushdown-enabled` ?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Leonard
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 2023年10月24日 下午4:18,Jiabao Sun <jiabao....@xtransfer.cn
>> .INVALID>
>>>>>> 写道:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Devs,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I would like to start a discussion on FLIP-377: support
>>>>>> configuration
>>>>>>>>>>> to
>>>>>>>>>>>>> disable filter pushdown for Table/SQL Sources[1].
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Currently, Flink Table/SQL does not expose fine-grained
>> control
>>>>>> for
>>>>>>>>>>>>> users to enable or disable filter pushdown.
>>>>>>>>>>>>>> However, filter pushdown has some side effects, such as
>> additional
>>>>>>>>>>>>> computational pressure on external systems.
>>>>>>>>>>>>>> Moreover, Improper queries can lead to issues such as full
>> table
>>>>>>>>> scans,
>>>>>>>>>>>>> which in turn can impact the stability of external systems.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Suppose we have an SQL query with two sources: Kafka and a
>>>>>> database.
>>>>>>>>>>>>>> The database is sensitive to pressure, and we want to
>> configure
>>>>>> it to
>>>>>>>>>>>>> not perform filter pushdown to the database source.
>>>>>>>>>>>>>> However, we still want to perform filter pushdown to the Kafka
>>>>>> source
>>>>>>>>>>> to
>>>>>>>>>>>>> decrease network IO.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I propose to support configuration to disable filter push
>> down for
>>>>>>>>>>>>> Table/SQL sources to let user decide whether to perform filter
>>>>>>>>> pushdown.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>> 
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Jiabao
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> 
>>>>>>> Best,
>>>>>>> Benchao Li
>>>>>> 
>>>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> 
>>> Best,
>>> Benchao Li
>> 
>> 

Reply via email to