Thanks Jiabao for the detailed explanations, that helps a lot, I
understand your rationale now.

Correct me if I'm wrong. Your perspective is from "developer", which
means there is an optimizer and connector component, and if we want to
enable this feature (pushing filters down into connectors), you must
enable it firstly in optimizer, and only then connector has the chance
to decide to use it or not.

My perspective is from "user" that (Why a user should care about the
difference of optimizer/connector) , this is a feature, and has two
way to control it, one way is to config it job-level, the other one is
in table properties. What a user expects is that they can control a
feature in a tiered way, that setting it per job, and then
fine-grained tune it per table.

This is some kind of similar to other concepts, such as parallelism,
users can set a job level default parallelism, and then fine-grained
tune it per operator. There may be more such debate in the future
e.g., we can have a job level config about adding key-by before lookup
join, and also a hint/table property way to fine-grained control it
per lookup operator. Hence we'd better find a unified way for all
those similar kind of features.

Jiabao Sun <jiabao....@xtransfer.cn.invalid> 于2023年10月25日周三 15:27写道:
>
> Thanks Jane for further explanation.
>
> These two configurations correspond to different levels. 
> "scan.filter-push-down.enabled" does not make 
> "table.optimizer.source.predicate" invalid.
> The planner will still push down predicates to all sources.
> Whether filter pushdown is allowed or not is determined by the specific 
> source's "scan.filter-push-down.enabled" configuration.
>
> However, "table.optimizer.source.predicate" does directly affect 
> "scan.filter-push-down.enabled”.
> When the planner disables predicate pushdown, the source-level filter 
> pushdown will also not be executed, even if the source allows filter pushdown.
>
> Whatever, in point 1 and 2, our expectation is consistent.
> For the 3rd point, I still think that the planner-level configuration takes 
> precedence over the source-level configuration.
> It may seem counterintuitive when we globally disable predicate pushdown but 
> allow filter pushdown at the source level.
>
> Best,
> Jiabao
>
>
>
> > 2023年10月25日 14:35,Jane Chan <qingyue....@gmail.com> 写道:
> >
> > Hi Jiabao,
> >
> > Thanks for clarifying this. While by "scan.filter-push-down.enabled takes a
> > higher priority" I meant that this value should be respected whenever it is
> > set explicitly.
> >
> > The conclusion that
> >
> > 2. "table.optimizer.source.predicate" = "true" and
> >> "scan.filter-push-down.enabled" = "false"
> >> Allow the planner to perform predicate pushdown, but individual sources do
> >> not enable filter pushdown.
> >>
> >
> > This indicates that the option "scan.filter-push-down.enabled = false" for
> > an individual source connector does indeed override the global-level
> > planner settings to make a difference. And thus "has a higher priority".
> >
> > While for
> >
> > 3. "table.optimizer.source.predicate" = "false"
> >> Predicate pushdown is not allowed for the planner.
> >> Regardless of the value of the "scan.filter-push-down.enabled"
> >> configuration, filter pushdown is disabled.
> >> In this scenario, the behavior remains consistent with the old version as
> >> well.
> >>
> >
> > I still think "scan.filter-push-down.enabled" should also be respected if
> > it is enabled for individual connectors. WDYT?
> >
> > Best,
> > Jane
> >
> > On Wed, Oct 25, 2023 at 1:27 PM Jiabao Sun <jiabao....@xtransfer.cn.invalid>
> > wrote:
> >
> >> Thanks Benchao for the feedback.
> >>
> >> For the current proposal, we recommend keeping the default value of
> >> "table.optimizer.source.predicate" as true,
> >> and setting the the default value of newly introduced option
> >> "scan.filter-push-down.enabled" to true as well.
> >>
> >> The main purpose of doing this is to maintain consistency with previous
> >> versions, as whether to perform
> >> filter pushdown in the old version solely depends on the
> >> "table.optimizer.source.predicate" option.
> >> That means by default, as long as a TableSource implements the
> >> SupportsFilterPushDown interface, filter pushdown is allowed.
> >> And it seems that we don't have much benefit in changing the default value
> >> of "table.optimizer.source.predicate" to false.
> >>
> >> Regarding the priority of these two configurations, I believe that
> >> "table.optimizer.source.predicate"
> >> takes precedence over "scan.filter-push-down.enabled" and it exhibits the
> >> following behavior.
> >>
> >> 1. "table.optimizer.source.predicate" = "true" and
> >> "scan.filter-push-down.enabled" = "true"
> >> This is the default behavior, allowing filter pushdown for sources.
> >>
> >> 2. "table.optimizer.source.predicate" = "true" and
> >> "scan.filter-push-down.enabled" = "false"
> >> Allow the planner to perform predicate pushdown, but individual sources do
> >> not enable filter pushdown.
> >>
> >> 3. "table.optimizer.source.predicate" = "false"
> >> Predicate pushdown is not allowed for the planner.
> >> Regardless of the value of the "scan.filter-push-down.enabled"
> >> configuration, filter pushdown is disabled.
> >> In this scenario, the behavior remains consistent with the old version as
> >> well.
> >>
> >>
> >> From an implementation perspective, setting the priority of
> >> "scan.filter-push-down.enabled" higher than
> >> "table.optimizer.source.predicate" is difficult to achieve now.
> >> Because the PushFilterIntoSourceScanRuleBase at the planner level takes
> >> precedence over the source-level FilterPushDownSpec.
> >> Only when the PushFilterIntoSourceScanRuleBase is enabled, will the
> >> Source-level filter pushdown be performed.
> >>
> >> Additionally, in my opinion, there doesn't seem to be much benefit in
> >> setting a higher priority for "scan.filter-push-down.enabled".
> >> It may instead affect compatibility and increase implementation complexity.
> >>
> >> WDYT?
> >>
> >> Best,
> >> Jiabao
> >>
> >>
> >>> 2023年10月25日 11:56,Benchao Li <libenc...@apache.org> 写道:
> >>>
> >>> I agree with Jane that fine-grained configurations should have higher
> >>> priority than job level configurations.
> >>>
> >>> For current proposal, we can achieve that:
> >>> - Set "table.optimizer.source.predicate" = "true" to enable by
> >>> default, and set ""scan.filter-push-down.enabled" = "false" to disable
> >>> it per table source
> >>> - Set "table.optimizer.source.predicate" = "false" to disable by
> >>> default, and set ""scan.filter-push-down.enabled" = "true" to enable
> >>> it per table source
> >>>
> >>> Jane Chan <qingyue....@gmail.com> 于2023年10月24日周二 23:55写道:
> >>>>
> >>>>>
> >>>>> I believe that the configuration "table.optimizer.source.predicate"
> >> has a
> >>>>> higher priority at the planner level than the configuration at the
> >> source
> >>>>> level,
> >>>>> and it seems easy to implement now.
> >>>>>
> >>>>
> >>>> Correct me if I'm wrong, but I think the fine-grained configuration
> >>>> "scan.filter-push-down.enabled" should have a higher priority because
> >> the
> >>>> default value of "table.optimizer.source.predicate" is true. As a
> >> result,
> >>>> turning off filter push-down for a specific source will not take effect
> >>>> unless the default value of "table.optimizer.source.predicate" is
> >> changed
> >>>> to false, or, alternatively, let users manually set
> >>>> "table.optimizer.source.predicate" to false first and then selectively
> >>>> enable filter push-down for the desired sources, which is less
> >> intuitive.
> >>>> WDYT?
> >>>>
> >>>> Best,
> >>>> Jane
> >>>>
> >>>> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun <jiabao....@xtransfer.cn
> >> .invalid>
> >>>> wrote:
> >>>>
> >>>>> Thanks Jane,
> >>>>>
> >>>>> I believe that the configuration "table.optimizer.source.predicate"
> >> has a
> >>>>> higher priority at the planner level than the configuration at the
> >> source
> >>>>> level,
> >>>>> and it seems easy to implement now.
> >>>>>
> >>>>> Best,
> >>>>> Jiabao
> >>>>>
> >>>>>
> >>>>>> 2023年10月24日 17:36,Jane Chan <qingyue....@gmail.com> 写道:
> >>>>>>
> >>>>>> Hi Jiabao,
> >>>>>>
> >>>>>> Thanks for driving this discussion. I have a small question that will
> >>>>>> "scan.filter-push-down.enabled" take precedence over
> >>>>>> "table.optimizer.source.predicate" when the two parameters might
> >> conflict
> >>>>>> each other?
> >>>>>>
> >>>>>> Best,
> >>>>>> Jane
> >>>>>>
> >>>>>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun <jiabao....@xtransfer.cn
> >>>>> .invalid>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks Jark,
> >>>>>>>
> >>>>>>> If we only add configuration without adding the enableFilterPushDown
> >>>>>>> method in the SupportsFilterPushDown interface,
> >>>>>>> each connector would have to handle the same logic in the
> >> applyFilters
> >>>>>>> method to determine whether filter pushdown is needed.
> >>>>>>> This would increase complexity and violate the original behavior of
> >> the
> >>>>>>> applyFilters method.
> >>>>>>>
> >>>>>>> On the contrary, we only need to pass the configuration parameter in
> >> the
> >>>>>>> newly added enableFilterPushDown method
> >>>>>>> to decide whether to perform predicate pushdown.
> >>>>>>>
> >>>>>>> I think this approach would be clearer and simpler.
> >>>>>>> WDYT?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jiabao
> >>>>>>>
> >>>>>>>
> >>>>>>>> 2023年10月24日 16:58,Jark Wu <imj...@gmail.com> 写道:
> >>>>>>>>
> >>>>>>>> Hi JIabao,
> >>>>>>>>
> >>>>>>>> I think the current interface can already satisfy your requirements.
> >>>>>>>> The connector can reject all the filters by returning the input
> >> filters
> >>>>>>>> as `Result#remainingFilters`.
> >>>>>>>>
> >>>>>>>> So maybe we don't need to introduce a new method to disable
> >>>>>>>> pushdown, but just introduce an option for the specific connector.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jark
> >>>>>>>>
> >>>>>>>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu <xbjt...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Thanks @Jiabao for kicking off this discussion.
> >>>>>>>>>
> >>>>>>>>> Could you add a section to explain the difference between proposed
> >>>>>>>>> connector level config `scan.filter-push-down.enabled` and existing
> >>>>>>> query
> >>>>>>>>> level config `table.optimizer.source.predicate-pushdown-enabled` ?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Leonard
> >>>>>>>>>
> >>>>>>>>>> 2023年10月24日 下午4:18,Jiabao Sun <jiabao....@xtransfer.cn.INVALID>
> >> 写道:
> >>>>>>>>>>
> >>>>>>>>>> Hi Devs,
> >>>>>>>>>>
> >>>>>>>>>> I would like to start a discussion on FLIP-377: support
> >> configuration
> >>>>>>> to
> >>>>>>>>> disable filter pushdown for Table/SQL Sources[1].
> >>>>>>>>>>
> >>>>>>>>>> Currently, Flink Table/SQL does not expose fine-grained control
> >> for
> >>>>>>>>> users to enable or disable filter pushdown.
> >>>>>>>>>> However, filter pushdown has some side effects, such as additional
> >>>>>>>>> computational pressure on external systems.
> >>>>>>>>>> Moreover, Improper queries can lead to issues such as full table
> >>>>> scans,
> >>>>>>>>> which in turn can impact the stability of external systems.
> >>>>>>>>>>
> >>>>>>>>>> Suppose we have an SQL query with two sources: Kafka and a
> >> database.
> >>>>>>>>>> The database is sensitive to pressure, and we want to configure
> >> it to
> >>>>>>>>> not perform filter pushdown to the database source.
> >>>>>>>>>> However, we still want to perform filter pushdown to the Kafka
> >> source
> >>>>>>> to
> >>>>>>>>> decrease network IO.
> >>>>>>>>>>
> >>>>>>>>>> I propose to support configuration to disable filter push down for
> >>>>>>>>> Table/SQL sources to let user decide whether to perform filter
> >>>>> pushdown.
> >>>>>>>>>>
> >>>>>>>>>> Looking forward to your feedback.
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Jiabao
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>> Best,
> >>> Benchao Li
> >>
> >>
>


-- 

Best,
Benchao Li

Reply via email to