Thanks Hang and Lincoln for the good point. 'source.predicate-pushdown.enabled’ is great to me. I have changed the proposal document.
Do we need to maintain consistency in hyphen-separated naming style between 'source.predicate-pushdown-enabled' and 'table.optimizer.source.predicate-pushdown-enabled'? Best, Jiabao > 2023年10月25日 20:14,Hang Ruan <ruanhang1...@gmail.com> 写道: > > Hi, all, > > Thanks for the lively discussion. > > I agree with Jiabao. I think enabling "scan.filter-push-down.enabled" > relies on enabling "table.optimizer.source.predicate-pushdown-enabled". > It is a little strange that the planner still needs to push down the > filters when we set "scan.filter-push-down.enabled=false" and > "table.optimizer.source.predicate-pushdown-enabled=true". > Maybe we need to add some checks to warn the users when setting > "scan.filter-push-down.enabled=true" and > "table.optimizer.source.predicate-pushdown-enabled=false". > > Besides that, I am +1 for renaming 'scan.filter-push-down.enabled' to > 'source.predicate-pushdown.enabled'. > > Best, > Hang > > Jiabao Sun <jiabao....@xtransfer.cn.invalid> 于2023年10月25日周三 18:23写道: > >> Thanks Benchao for the feedback. >> >> I understand that the configuration of global parallelism and task >> parallelism is at different granularities but with the same configuration. >> However, "table.optimizer.source.predicate-pushdown-enabled" and >> "scan.filter-push-down.enabled" are configurations for different >> components(optimizer and source operator). >> >> From a user's perspective, there are two scenarios: >> >> 1. Disabling all filter pushdown >> In this case, setting "table.optimizer.source.predicate-pushdown-enabled" >> to false is sufficient to meet the requirement. >> >> 2. Disabling filter pushdown for specific sources >> In this scenario, there is no need to adjust the value of >> "table.optimizer.source.predicate-pushdown-enabled". >> Instead, the focus should be on the configuration of >> "scan.filter-push-down.enabled" to meet the requirement. >> In this case, users do not need to set >> "table.optimizer.source.predicate-pushdown-enabled" to false and manually >> enable filter pushdown for specific sources. >> >> Additionally, if "scan.filter-push-down.enabled" doesn't respect to >> "table.optimizer.source.predicate-pushdown-enabled" and the default value >> of "scan.filter-push-down.enabled" is defined as true, >> it means that just modifying >> "table.optimizer.source.predicate-pushdown-enabled" as false will have no >> effect, and filter pushdown will still be performed. >> >> If we define the default value of "scan.filter-push-down.enabled" as >> false, it would introduce a difference in behavior compared to the previous >> version. >> The same SQL query that could successfully push down filters in the old >> version but would no longer do so after the upgrade. >> >> Best, >> Jiabao >> >> >>> 2023年10月25日 17:10,Benchao Li <libenc...@apache.org> 写道: >>> >>> Thanks Jiabao for the detailed explanations, that helps a lot, I >>> understand your rationale now. >>> >>> Correct me if I'm wrong. Your perspective is from "developer", which >>> means there is an optimizer and connector component, and if we want to >>> enable this feature (pushing filters down into connectors), you must >>> enable it firstly in optimizer, and only then connector has the chance >>> to decide to use it or not. >>> >>> My perspective is from "user" that (Why a user should care about the >>> difference of optimizer/connector) , this is a feature, and has two >>> way to control it, one way is to config it job-level, the other one is >>> in table properties. What a user expects is that they can control a >>> feature in a tiered way, that setting it per job, and then >>> fine-grained tune it per table. >>> >>> This is some kind of similar to other concepts, such as parallelism, >>> users can set a job level default parallelism, and then fine-grained >>> tune it per operator. There may be more such debate in the future >>> e.g., we can have a job level config about adding key-by before lookup >>> join, and also a hint/table property way to fine-grained control it >>> per lookup operator. Hence we'd better find a unified way for all >>> those similar kind of features. >>> >>> Jiabao Sun <jiabao....@xtransfer.cn.invalid> 于2023年10月25日周三 15:27写道: >>>> >>>> Thanks Jane for further explanation. >>>> >>>> These two configurations correspond to different levels. >> "scan.filter-push-down.enabled" does not make >> "table.optimizer.source.predicate" invalid. >>>> The planner will still push down predicates to all sources. >>>> Whether filter pushdown is allowed or not is determined by the specific >> source's "scan.filter-push-down.enabled" configuration. >>>> >>>> However, "table.optimizer.source.predicate" does directly affect >> "scan.filter-push-down.enabled”. >>>> When the planner disables predicate pushdown, the source-level filter >> pushdown will also not be executed, even if the source allows filter >> pushdown. >>>> >>>> Whatever, in point 1 and 2, our expectation is consistent. >>>> For the 3rd point, I still think that the planner-level configuration >> takes precedence over the source-level configuration. >>>> It may seem counterintuitive when we globally disable predicate >> pushdown but allow filter pushdown at the source level. >>>> >>>> Best, >>>> Jiabao >>>> >>>> >>>> >>>>> 2023年10月25日 14:35,Jane Chan <qingyue....@gmail.com> 写道: >>>>> >>>>> Hi Jiabao, >>>>> >>>>> Thanks for clarifying this. While by "scan.filter-push-down.enabled >> takes a >>>>> higher priority" I meant that this value should be respected whenever >> it is >>>>> set explicitly. >>>>> >>>>> The conclusion that >>>>> >>>>> 2. "table.optimizer.source.predicate" = "true" and >>>>>> "scan.filter-push-down.enabled" = "false" >>>>>> Allow the planner to perform predicate pushdown, but individual >> sources do >>>>>> not enable filter pushdown. >>>>>> >>>>> >>>>> This indicates that the option "scan.filter-push-down.enabled = false" >> for >>>>> an individual source connector does indeed override the global-level >>>>> planner settings to make a difference. And thus "has a higher >> priority". >>>>> >>>>> While for >>>>> >>>>> 3. "table.optimizer.source.predicate" = "false" >>>>>> Predicate pushdown is not allowed for the planner. >>>>>> Regardless of the value of the "scan.filter-push-down.enabled" >>>>>> configuration, filter pushdown is disabled. >>>>>> In this scenario, the behavior remains consistent with the old >> version as >>>>>> well. >>>>>> >>>>> >>>>> I still think "scan.filter-push-down.enabled" should also be respected >> if >>>>> it is enabled for individual connectors. WDYT? >>>>> >>>>> Best, >>>>> Jane >>>>> >>>>> On Wed, Oct 25, 2023 at 1:27 PM Jiabao Sun <jiabao....@xtransfer.cn >> .invalid> >>>>> wrote: >>>>> >>>>>> Thanks Benchao for the feedback. >>>>>> >>>>>> For the current proposal, we recommend keeping the default value of >>>>>> "table.optimizer.source.predicate" as true, >>>>>> and setting the the default value of newly introduced option >>>>>> "scan.filter-push-down.enabled" to true as well. >>>>>> >>>>>> The main purpose of doing this is to maintain consistency with >> previous >>>>>> versions, as whether to perform >>>>>> filter pushdown in the old version solely depends on the >>>>>> "table.optimizer.source.predicate" option. >>>>>> That means by default, as long as a TableSource implements the >>>>>> SupportsFilterPushDown interface, filter pushdown is allowed. >>>>>> And it seems that we don't have much benefit in changing the default >> value >>>>>> of "table.optimizer.source.predicate" to false. >>>>>> >>>>>> Regarding the priority of these two configurations, I believe that >>>>>> "table.optimizer.source.predicate" >>>>>> takes precedence over "scan.filter-push-down.enabled" and it exhibits >> the >>>>>> following behavior. >>>>>> >>>>>> 1. "table.optimizer.source.predicate" = "true" and >>>>>> "scan.filter-push-down.enabled" = "true" >>>>>> This is the default behavior, allowing filter pushdown for sources. >>>>>> >>>>>> 2. "table.optimizer.source.predicate" = "true" and >>>>>> "scan.filter-push-down.enabled" = "false" >>>>>> Allow the planner to perform predicate pushdown, but individual >> sources do >>>>>> not enable filter pushdown. >>>>>> >>>>>> 3. "table.optimizer.source.predicate" = "false" >>>>>> Predicate pushdown is not allowed for the planner. >>>>>> Regardless of the value of the "scan.filter-push-down.enabled" >>>>>> configuration, filter pushdown is disabled. >>>>>> In this scenario, the behavior remains consistent with the old >> version as >>>>>> well. >>>>>> >>>>>> >>>>>> From an implementation perspective, setting the priority of >>>>>> "scan.filter-push-down.enabled" higher than >>>>>> "table.optimizer.source.predicate" is difficult to achieve now. >>>>>> Because the PushFilterIntoSourceScanRuleBase at the planner level >> takes >>>>>> precedence over the source-level FilterPushDownSpec. >>>>>> Only when the PushFilterIntoSourceScanRuleBase is enabled, will the >>>>>> Source-level filter pushdown be performed. >>>>>> >>>>>> Additionally, in my opinion, there doesn't seem to be much benefit in >>>>>> setting a higher priority for "scan.filter-push-down.enabled". >>>>>> It may instead affect compatibility and increase implementation >> complexity. >>>>>> >>>>>> WDYT? >>>>>> >>>>>> Best, >>>>>> Jiabao >>>>>> >>>>>> >>>>>>> 2023年10月25日 11:56,Benchao Li <libenc...@apache.org> 写道: >>>>>>> >>>>>>> I agree with Jane that fine-grained configurations should have higher >>>>>>> priority than job level configurations. >>>>>>> >>>>>>> For current proposal, we can achieve that: >>>>>>> - Set "table.optimizer.source.predicate" = "true" to enable by >>>>>>> default, and set ""scan.filter-push-down.enabled" = "false" to >> disable >>>>>>> it per table source >>>>>>> - Set "table.optimizer.source.predicate" = "false" to disable by >>>>>>> default, and set ""scan.filter-push-down.enabled" = "true" to enable >>>>>>> it per table source >>>>>>> >>>>>>> Jane Chan <qingyue....@gmail.com> 于2023年10月24日周二 23:55写道: >>>>>>>> >>>>>>>>> >>>>>>>>> I believe that the configuration "table.optimizer.source.predicate" >>>>>> has a >>>>>>>>> higher priority at the planner level than the configuration at the >>>>>> source >>>>>>>>> level, >>>>>>>>> and it seems easy to implement now. >>>>>>>>> >>>>>>>> >>>>>>>> Correct me if I'm wrong, but I think the fine-grained configuration >>>>>>>> "scan.filter-push-down.enabled" should have a higher priority >> because >>>>>> the >>>>>>>> default value of "table.optimizer.source.predicate" is true. As a >>>>>> result, >>>>>>>> turning off filter push-down for a specific source will not take >> effect >>>>>>>> unless the default value of "table.optimizer.source.predicate" is >>>>>> changed >>>>>>>> to false, or, alternatively, let users manually set >>>>>>>> "table.optimizer.source.predicate" to false first and then >> selectively >>>>>>>> enable filter push-down for the desired sources, which is less >>>>>> intuitive. >>>>>>>> WDYT? >>>>>>>> >>>>>>>> Best, >>>>>>>> Jane >>>>>>>> >>>>>>>> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun <jiabao....@xtransfer.cn >>>>>> .invalid> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Thanks Jane, >>>>>>>>> >>>>>>>>> I believe that the configuration "table.optimizer.source.predicate" >>>>>> has a >>>>>>>>> higher priority at the planner level than the configuration at the >>>>>> source >>>>>>>>> level, >>>>>>>>> and it seems easy to implement now. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Jiabao >>>>>>>>> >>>>>>>>> >>>>>>>>>> 2023年10月24日 17:36,Jane Chan <qingyue....@gmail.com> 写道: >>>>>>>>>> >>>>>>>>>> Hi Jiabao, >>>>>>>>>> >>>>>>>>>> Thanks for driving this discussion. I have a small question that >> will >>>>>>>>>> "scan.filter-push-down.enabled" take precedence over >>>>>>>>>> "table.optimizer.source.predicate" when the two parameters might >>>>>> conflict >>>>>>>>>> each other? >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Jane >>>>>>>>>> >>>>>>>>>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun < >> jiabao....@xtransfer.cn >>>>>>>>> .invalid> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Thanks Jark, >>>>>>>>>>> >>>>>>>>>>> If we only add configuration without adding the >> enableFilterPushDown >>>>>>>>>>> method in the SupportsFilterPushDown interface, >>>>>>>>>>> each connector would have to handle the same logic in the >>>>>> applyFilters >>>>>>>>>>> method to determine whether filter pushdown is needed. >>>>>>>>>>> This would increase complexity and violate the original behavior >> of >>>>>> the >>>>>>>>>>> applyFilters method. >>>>>>>>>>> >>>>>>>>>>> On the contrary, we only need to pass the configuration >> parameter in >>>>>> the >>>>>>>>>>> newly added enableFilterPushDown method >>>>>>>>>>> to decide whether to perform predicate pushdown. >>>>>>>>>>> >>>>>>>>>>> I think this approach would be clearer and simpler. >>>>>>>>>>> WDYT? >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Jiabao >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> 2023年10月24日 16:58,Jark Wu <imj...@gmail.com> 写道: >>>>>>>>>>>> >>>>>>>>>>>> Hi JIabao, >>>>>>>>>>>> >>>>>>>>>>>> I think the current interface can already satisfy your >> requirements. >>>>>>>>>>>> The connector can reject all the filters by returning the input >>>>>> filters >>>>>>>>>>>> as `Result#remainingFilters`. >>>>>>>>>>>> >>>>>>>>>>>> So maybe we don't need to introduce a new method to disable >>>>>>>>>>>> pushdown, but just introduce an option for the specific >> connector. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Jark >>>>>>>>>>>> >>>>>>>>>>>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu <xbjt...@gmail.com> >> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Thanks @Jiabao for kicking off this discussion. >>>>>>>>>>>>> >>>>>>>>>>>>> Could you add a section to explain the difference between >> proposed >>>>>>>>>>>>> connector level config `scan.filter-push-down.enabled` and >> existing >>>>>>>>>>> query >>>>>>>>>>>>> level config >> `table.optimizer.source.predicate-pushdown-enabled` ? >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Leonard >>>>>>>>>>>>> >>>>>>>>>>>>>> 2023年10月24日 下午4:18,Jiabao Sun <jiabao....@xtransfer.cn >> .INVALID> >>>>>> 写道: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Devs, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I would like to start a discussion on FLIP-377: support >>>>>> configuration >>>>>>>>>>> to >>>>>>>>>>>>> disable filter pushdown for Table/SQL Sources[1]. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Currently, Flink Table/SQL does not expose fine-grained >> control >>>>>> for >>>>>>>>>>>>> users to enable or disable filter pushdown. >>>>>>>>>>>>>> However, filter pushdown has some side effects, such as >> additional >>>>>>>>>>>>> computational pressure on external systems. >>>>>>>>>>>>>> Moreover, Improper queries can lead to issues such as full >> table >>>>>>>>> scans, >>>>>>>>>>>>> which in turn can impact the stability of external systems. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Suppose we have an SQL query with two sources: Kafka and a >>>>>> database. >>>>>>>>>>>>>> The database is sensitive to pressure, and we want to >> configure >>>>>> it to >>>>>>>>>>>>> not perform filter pushdown to the database source. >>>>>>>>>>>>>> However, we still want to perform filter pushdown to the Kafka >>>>>> source >>>>>>>>>>> to >>>>>>>>>>>>> decrease network IO. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I propose to support configuration to disable filter push >> down for >>>>>>>>>>>>> Table/SQL sources to let user decide whether to perform filter >>>>>>>>> pushdown. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Looking forward to your feedback. >>>>>>>>>>>>>> >>>>>>>>>>>>>> [1] >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768 >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> Jiabao >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Best, >>>>>>> Benchao Li >>>>>> >>>>>> >>>> >>> >>> >>> -- >>> >>> Best, >>> Benchao Li >> >>