Thanks Becket for the feedback, Regarding concerns about common configurations, I think we can introduce FiltersApplier to unify the behavior of various connectors.
public static class FiltersApplier { private final ReadableConfig config; private final Function<List<ResolvedExpression>, Result> action; private FiltersApplier( ReadableConfig config, Function<List<ResolvedExpression>, Result> action) { this.config = config; this.action = action; } public Result applyFilters(List<ResolvedExpression> filters) { if (config.get(ENABLE_FILTER_PUSH_DOWN)) { return action.apply(filters); } else { return Result.of(Collections.emptyList(), filters); } } public static FiltersApplier of( ReadableConfig config, Function<List<ResolvedExpression>, Result> action) { return new FiltersApplier(config, action); } } For connectors implementation: @Override public Result applyFilters(List<ResolvedExpression> filters) { return FiltersApplier.of(config, f -> Result.of(new ArrayList<>(filters), Collections.emptyList())); } As for the name, whether it is "source.filter-push-down.enabled" or "source.ignore-pushed-down-filters.enabled", I think both are okay. Do you think this change is feasible? Best, Jiabao > 2023年11月15日 23:44,Becket Qin <becket....@gmail.com> 写道: > > Hi Jiabao, > > Yes, I still have concerns. > > The FLIP violates the following two principles regarding configuration: > > 1.* A config of a class should never negate the semantic of a decorative > interface implemented by that class. * > A decorative interface is a public contract with other components, while a > config is only internal to the class itself. The configurations for the > Sources are not (and should never be) visible or understood to > other components (e.g. optimizer). A configuration of a Source only > controls the behavior of that Source, provided it is not violating the API > contract / semantic defined by the decorative interface. So when a Source > implementation implements SupportsFilterPushdown, this is a clear public > contract with Flink that filters should be pushed down to that Source. > Therefore, for the same source, there should not be a configuration > "source.filter-push-down.enabled" which stops the filters from being pushed > down to that Source. However, that specific source implementation can have > its own config to control its internal behavior, e.g. > "ignore-pushed-down-filters.enabled" which may push back all the pushed > down filters back to the Flink optimizer. > > 2. When we are talking about "common configs", in fact we are talking about > "configs for common (abstract) implementation classes". With that as a > context, *a common config should always be backed by a common > implementation class, so that consistent behavior can be guaranteed. * > The LookupOptions you mentioned are configurations defined for classes > DefaultLookupCache / PeriodicCacheReloadTrigger / TimedCacheReloadTrigger. > These configs are considered as "common" only because the implementation > classes using them are common building blocks for lookup table > implementations. It would not make sense to have a dangling config in the > LookupOptions without the underlying common implementation class, but only > relies on a specific source to implement the stated behavior. > As a bad example, there is this outlier config "max-retries" in > LookupOptions, which I don't think should be here. This is because the > retry behavior can be very implementation specific. For example, there can > be many different flavors of retry related configurations, retry-backoff, > retry-timeout, retry-async, etc. Why only max-retry is put here? should all > of them be put here? If we put all such kinds of configs in the common > configs for "standardization and unification", the number of "common > configs" can easily go crazy. And I don't see material benefits of doing > that. So here I don't think the configuration "max-retry" should be in > LookupOptions, because it is not backed by any common implementation > classes. If max-retry is implemented in the HBase source, it should stay > there. For the same reason, the config proposed in this FLIP (probably with > a name less confusing for the first reason mentioned above) should stay in > the specific Source implementation. > > For the two reasons above, I am -1 to what the FLIP currently proposes. > > I think the right way to address the motivation here is still to have a > config like "ignore-pushed-down-filters.enabled" for the specific source > implementation. Please let me know if this solves the problem you are > facing. > > Thanks, > > Jiangjie (Becket) Qin > > > On Wed, Nov 15, 2023 at 11:52 AM Jiabao Sun <jiabao....@xtransfer.cn.invalid> > wrote: > >> Hi Becket, >> >> The purpose of introducing this configuration is that not all filter >> pushdowns can improve overall performance. >> If the filter can hit the external index, then pushdown is definitely >> worth it, as it can not only improve query time but also decrease network >> overhead. >> However, for filters that do not hit the external index, it may increase a >> lot of performance overhead on the external system. >> >> Undeniably, if the connector can make accurate decisions for good and bad >> filters, we may not need to introduce this configuration option to disable >> pushing down filters to the external system. >> However, it is currently not easy to achieve. >> >> IMO, supporting filter pushdown does not mean that always filter pushdown >> is better. >> In the absence of automatic decision-making, I think we should leave this >> decision to users. >> >> The newly introduced configuration option is similar to LookupOptions, >> providing unified naming and default values to avoid confusion caused by >> inconsistent naming in different connectors for users. >> Setting the default value to true allows it to maintain compatibility with >> the default behavior of "always pushdown". >> >> Do you have any other concerns about this proposal? Please let me know. >> >> Thanks, >> Jiabao >> >> >>> 2023年10月31日 17:29,Jiabao Sun <jiabao....@xtransfer.cn.INVALID> 写道: >>> >>> Hi Becket, >>> >>> Actually, for FileSystemSource, it is not always desired, only OCR file >> formats support filter pushdown. >>> >>> We can disable predicate pushdown for FileSystemSource by setting >> 'table.optimizer.source.predicate-pushdown-enabled' to false. >>> I think we can also disable filter pushdown at a more granular level >> through fine-grained configuration. >>> >>> >>> Best, >>> Jiabao >>> >>> >>>> 2023年10月31日 16:50,Becket Qin <becket....@gmail.com> 写道: >>>> >>>> Hi Jiabao, >>>> >>>> Thanks for the explanation. Maybe it's easier to explain with an >> example. >>>> >>>> Let's take FileSystemTableSource as an example. Currently it implements >>>> SupportsFilterPushDown interface. With your proposal, does it have to >>>> support `source.filter-push-down.enabled` as well? But this >> configuration >>>> does not quite make sense for the FileSystemTableSource because filter >>>> pushdown is always desired. However, because this configuration is a >> part >>>> of the SupportsFilterPushDown interface (which sounds confusing to begin >>>> with), the FileSystemTableSource can only do one of the following: >>>> >>>> 1. Ignore the user configuration to always apply the pushed down >> filters - >>>> this is an apparent anti-pattern because a configuration should always >> do >>>> what it says. >>>> 2. Throw an exception telling users that this configuration is not >>>> applicable to the FileSystemTableSource. >>>> 3. Implement this configuration to push back the pushed down filters, >> even >>>> though this is never desired. >>>> >>>> None of the above options looks awkward. I am curious what your >> solution is >>>> here? >>>> >>>> Thanks, >>>> >>>> Jiangjie (Becket) Qin >>>> >>>> On Tue, Oct 31, 2023 at 3:11 PM Jiabao Sun <jiabao....@xtransfer.cn >> .invalid> >>>> wrote: >>>> >>>>> Thanks Becket for the further explanation. >>>>> >>>>> Perhaps I didn't explain it clearly. >>>>> >>>>> 1. If a source does not implement the SupportsFilterPushDown interface, >>>>> the newly added configurations do not need to be added to either the >>>>> requiredOptions or optionalOptions. >>>>> Similar to LookupOptions, if a source does not implement >>>>> LookupTableSource, there is no need to add LookupOptions to either >>>>> requiredOptions or optionalOptions. >>>>> >>>>> 2. "And these configs are specific to those sources, instead of common >>>>> configs." >>>>> The newly introduced configurations define standardized names and >> default >>>>> values. >>>>> They still belong to the configuration at the individual source level. >>>>> The purpose is to avoid scattered configuration items when different >>>>> sources implement the same logic. >>>>> Whether a source should accept these configurations is determined by >> the >>>>> source's Factory. >>>>> >>>>> Best, >>>>> Jiabao >>>>> >>>>> >>>>>> 2023年10月31日 13:47,Becket Qin <becket....@gmail.com> 写道: >>>>>> >>>>>> Hi Jiabao, >>>>>> >>>>>> Please see the replies inline. >>>>>> >>>>>> Introducing common configurations does not mean that all sources must >>>>>>> accept these configuration options. >>>>>>> The configuration options supported by a source are determined by the >>>>>>> requiredOptions and optionalOptions in the Factory interface. >>>>>> >>>>>> This is not true. Both required and optional options are SUPPORTED. >> That >>>>>> means they are implemented and if one specifies an optional config it >>>>> will >>>>>> still take effect. An OptionalConfig is "Optional" because this >>>>>> configuration has a default value. Hence, it is OK that users do not >>>>>> specify their own value. In another word, it is "optional" for the end >>>>>> users to set the config, but the implementation and support for that >>>>> config >>>>>> is NOT optional. In case a source does not support a common config, an >>>>>> exception must be thrown when the config is provided by the end users. >>>>>> However, the config we are talking about in this FLIP is a common >> config >>>>>> optional to implement, meaning that sometimes the claimed behavior >> won't >>>>> be >>>>>> there even if users specify that config. >>>>>> >>>>>> Similar to sources that do not implement the LookupTableSource >> interface, >>>>>>> sources that do not implement the SupportsFilterPushDown interface >> also >>>>> do >>>>>>> not need to accept newly introduced options. >>>>>> >>>>>> First of all, filter pushdown is a behavior of the query optimizer, >> not >>>>> the >>>>>> behavior of Sources. The Sources tells the optimizer that it has the >>>>>> ability to accept pushed down filters by implementing the >>>>>> SupportsFilterPushDown interface. And this is the only contract >> between >>>>> the >>>>>> Source and Optimizer regarding whether filters should be pushed down. >> As >>>>>> long as a specific source implements this decorative interface, filter >>>>>> pushdown should always take place, i.e. >>>>>> *SupportsFilterPushDown.applyFilters()* will be called. There should >> be >>>>> no >>>>>> other config to disable that call. However, Sources can decide how to >>>>>> behave based on their own configurations after *applyFilters()* is >>>>> called. >>>>>> And these configs are specific to those sources, instead of common >>>>> configs. >>>>>> Please see the examples I mentioned in the previous email. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Jiangjie (Becket) Qin >>>>>> >>>>>> On Tue, Oct 31, 2023 at 10:27 AM Jiabao Sun <jiabao....@xtransfer.cn >>>>> .invalid> >>>>>> wrote: >>>>>> >>>>>>> Hi Becket, >>>>>>> >>>>>>> Sorry, there was a typo in the second point. Let me correct it: >>>>>>> >>>>>>> Introducing common configurations does not mean that all sources must >>>>>>> accept these configuration options. >>>>>>> The configuration options supported by a source are determined by the >>>>>>> requiredOptions and optionalOptions in the Factory interface. >>>>>>> >>>>>>> Similar to sources that do not implement the LookupTableSource >>>>> interface, >>>>>>> sources that do not implement the SupportsFilterPushDown interface >> also >>>>> do >>>>>>> not need to accept newly introduced options. >>>>>>> >>>>>>> Best, >>>>>>> Jiabao >>>>>>> >>>>>>> >>>>>>>> 2023年10月31日 10:13,Jiabao Sun <jiabao....@xtransfer.cn.INVALID> 写道: >>>>>>>> >>>>>>>> Thanks Becket for the feedback. >>>>>>>> >>>>>>>> 1. Currently, the SupportsFilterPushDown#applyFilters method >> returns a >>>>>>> result that includes acceptedFilters and remainingFilters. The source >>>>> can >>>>>>> decide to push down some filters or not accept any of them. >>>>>>>> 2. Introducing common configuration options does not mean that a >> source >>>>>>> that supports the SupportsFilterPushDown capability must accept this >>>>>>> configuration. Similar to LookupOptions, only sources that implement >> the >>>>>>> LookupTableSource interface are necessary to accept these >> configuration >>>>>>> options. >>>>>>>> >>>>>>>> Best, >>>>>>>> Jiabao >>>>>>>> >>>>>>>> >>>>>>>>> 2023年10月31日 07:49,Becket Qin <becket....@gmail.com> 写道: >>>>>>>>> >>>>>>>>> Hi Jiabao and Ruanhang, >>>>>>>>> >>>>>>>>> Adding a configuration of source.filter-push-down.enabled as a >> common >>>>>>>>> source configuration seems problematic. >>>>>>>>> 1. The config name is misleading. filter pushdown should only be >>>>>>> determined >>>>>>>>> by whether the SupportsFilterPushdown interface is implemented or >> not. >>>>>>>>> 2. The behavior of this configuration is only applicable to some >>>>> source >>>>>>>>> implementations. Why is it a common configuration? >>>>>>>>> >>>>>>>>> Here's my suggestion for design principles: >>>>>>>>> 1. Only add source impl specific configuration to corresponding >>>>> sources. >>>>>>>>> 2. The configuration name should not overrule existing common >>>>> contracts. >>>>>>>>> >>>>>>>>> For example, in the case of MySql source. There are several >> options: >>>>>>>>> 1. Have a configuration of `*mysql.avoid.remote.full.table.scan`*. >> If >>>>>>> this >>>>>>>>> configuration is set, and a filter pushdown does not hit an index, >> the >>>>>>>>> MySql source impl would not further pushdown the filter to MySql >>>>>>> servers. >>>>>>>>> Note that this assumes the MySql source can retrieve the index >>>>>>> information >>>>>>>>> from the MySql servers. >>>>>>>>> 2. If the MySql index information is not available to the MySql >>>>> source, >>>>>>> the >>>>>>>>> configuration could be something like >>>>>>> *`mysql.pushback.pushed.down.filters`*. >>>>>>>>> Once set to true, MySql source would just add all the filters to >> the >>>>>>>>> RemainingFilters in the Result returned by >>>>>>>>> *SupportsFilterPushdown.applyFilters().* >>>>>>>>> 3. An alternative to option 2 is to have a ` >>>>>>>>> *mysql.apply.predicates.after.scan*`. When it is set to true, MySql >>>>>>> source >>>>>>>>> will not push the filter down to the MySql servers, but apply the >>>>>>> filters >>>>>>>>> inside the MySql source itself. >>>>>>>>> >>>>>>>>> As you may see, the above configurations do not disable filter >>>>> pushdown >>>>>>>>> itself. They just allow various implementations of filter pushdown. >>>>> And >>>>>>> the >>>>>>>>> configuration name does not give any illusion that filter pushdown >> is >>>>>>>>> disabled. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>> >>>>>>>>> On Mon, Oct 30, 2023 at 11:58 PM Jiabao Sun < >> jiabao....@xtransfer.cn >>>>>>> .invalid> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Thanks Hang for the suggestion. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I think the configuration of TableSource is not closely related to >>>>>>>>>> SourceReader, >>>>>>>>>> so I prefer to introduce a independent configuration class >>>>>>>>>> TableSourceOptions in the flink-table-common module, similar to >>>>>>>>>> LookupOptions. >>>>>>>>>> >>>>>>>>>> For the second point, I suggest adding Java doc to the >>>>>>> SupportsXXXPushDown >>>>>>>>>> interfaces, providing detailed information on these options that >>>>> needs >>>>>>> to >>>>>>>>>> be supported. >>>>>>>>>> >>>>>>>>>> I have made updates in the FLIP document. >>>>>>>>>> Please help check it again. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Jiabao >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> 2023年10月30日 17:23,Hang Ruan <ruanhang1...@gmail.com> 写道: >>>>>>>>>>> >>>>>>>>>>> Thanks for the improvements, Jiabao. >>>>>>>>>>> >>>>>>>>>>> There are some details that I am not sure about. >>>>>>>>>>> 1. The new option `source.filter-push-down.enabled` will be >> added to >>>>>>>>>> which >>>>>>>>>>> class? I think it should be `SourceReaderOptions`. >>>>>>>>>>> 2. How are the connector developers able to know and follow the >>>>> FLIP? >>>>>>> Do >>>>>>>>>> we >>>>>>>>>>> need an abstract base class or provide a default method? >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Hang >>>>>>>>>>> >>>>>>>>>>> Jiabao Sun <jiabao....@xtransfer.cn.invalid> 于2023年10月30日周一 >>>>> 14:45写道: >>>>>>>>>>> >>>>>>>>>>>> Hi, all, >>>>>>>>>>>> >>>>>>>>>>>> Thanks for the lively discussion. >>>>>>>>>>>> >>>>>>>>>>>> Based on the discussion, I have made some adjustments to the >> FLIP >>>>>>>>>> document: >>>>>>>>>>>> >>>>>>>>>>>> 1. The name of the newly added option has been changed to >>>>>>>>>>>> "source.filter-push-down.enabled". >>>>>>>>>>>> 2. Considering compatibility with older versions, the newly >> added >>>>>>>>>>>> "source.filter-push-down.enabled" option needs to respect the >>>>>>>>>> optimizer's >>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" option. >>>>>>>>>>>> But there is a consideration to remove the old option in Flink >> 2.0. >>>>>>>>>>>> 3. We can provide more options to disable other source abilities >>>>> with >>>>>>>>>> side >>>>>>>>>>>> effects, such as “source.aggregate.enabled” and >>>>>>>>>> “source.projection.enabled" >>>>>>>>>>>> This is not urgent and can be continuously introduced. >>>>>>>>>>>> >>>>>>>>>>>> Looking forward to your feedback again. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Jiabao >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> 2023年10月29日 08:45,Becket Qin <becket....@gmail.com> 写道: >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for digging into the git history, Jark. I agree it makes >>>>>>> sense >>>>>>>>>> to >>>>>>>>>>>>> deprecate this API in 2.0. >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> >>>>>>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Oct 27, 2023 at 5:47 PM Jark Wu <imj...@gmail.com> >> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Becket, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I checked the history of " >>>>>>>>>>>>>> *table.optimizer.source.predicate-pushdown-enabled*", >>>>>>>>>>>>>> it seems it was introduced since the legacy >> FilterableTableSource >>>>>>>>>>>>>> interface >>>>>>>>>>>>>> which might be an experiential feature at that time. I don't >> see >>>>>>> the >>>>>>>>>>>>>> necessity >>>>>>>>>>>>>> of this option at the moment. Maybe we can deprecate this >> option >>>>>>> and >>>>>>>>>>>> drop >>>>>>>>>>>>>> it >>>>>>>>>>>>>> in Flink 2.0[1] if it is not necessary anymore. This may help >> to >>>>>>>>>>>>>> simplify this discussion. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> Jark >>>>>>>>>>>>>> >>>>>>>>>>>>>> [1]: https://issues.apache.org/jira/browse/FLINK-32383 >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, 26 Oct 2023 at 10:14, Becket Qin < >> becket....@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for the proposal, Jiabao. My two cents below: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1. If I understand correctly, the motivation of the FLIP is >>>>>>> mainly to >>>>>>>>>>>>>>> make predicate pushdown optional on SOME of the Sources. If >> so, >>>>>>>>>>>> intuitively >>>>>>>>>>>>>>> the configuration should be Source specific instead of >> general. >>>>>>>>>>>> Otherwise, >>>>>>>>>>>>>>> we will end up with general configurations that may not take >>>>>>> effect >>>>>>>>>> for >>>>>>>>>>>>>>> some of the Source implementations. This violates the basic >> rule >>>>>>> of a >>>>>>>>>>>>>>> configuration - it does what it says, regardless of the >>>>>>>>>> implementation. >>>>>>>>>>>>>>> While configuration standardization is usually a good thing, >> it >>>>>>>>>> should >>>>>>>>>>>> not >>>>>>>>>>>>>>> break the basic rules. >>>>>>>>>>>>>>> If we really want to have this general configuration, for the >>>>>>> sources >>>>>>>>>>>>>>> this configuration does not apply, they should throw an >>>>> exception >>>>>>> to >>>>>>>>>>>> make >>>>>>>>>>>>>>> it clear that this configuration is not supported. However, >> that >>>>>>>>>> seems >>>>>>>>>>>> ugly. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2. I think the actual motivation of this FLIP is about "how a >>>>>>> source >>>>>>>>>>>>>>> should implement predicate pushdown efficiently", not >> "whether >>>>>>>>>>>> predicate >>>>>>>>>>>>>>> pushdown should be applied to the source." For example, if a >>>>>>> source >>>>>>>>>>>> wants >>>>>>>>>>>>>>> to avoid additional computing load in the external system, it >>>>> can >>>>>>>>>>>> always >>>>>>>>>>>>>>> read the entire record and apply the predicates by itself. >>>>>>> However, >>>>>>>>>>>> from >>>>>>>>>>>>>>> the Flink perspective, the predicate pushdown is applied, it >> is >>>>>>> just >>>>>>>>>>>>>>> implemented differently by the source. So the design >> principle >>>>>>> here >>>>>>>>>> is >>>>>>>>>>>> that >>>>>>>>>>>>>>> Flink only cares about whether a source supports predicate >>>>>>> pushdown >>>>>>>>>> or >>>>>>>>>>>> not, >>>>>>>>>>>>>>> it does not care about the implementation efficiency / side >>>>>>> effect of >>>>>>>>>>>> the >>>>>>>>>>>>>>> predicates pushdown. It is the Source implementation's >>>>>>> responsibility >>>>>>>>>>>> to >>>>>>>>>>>>>>> ensure the predicates pushdown is implemented efficiently and >>>>> does >>>>>>>>>> not >>>>>>>>>>>>>>> impose excessive pressure on the external system. And it is >> OK >>>>> to >>>>>>>>>> have >>>>>>>>>>>>>>> additional configurations to achieve this goal. Obviously, >> such >>>>>>>>>>>>>>> configurations will be source specific in this case. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 3. Regarding the existing configurations of >>>>>>>>>>>> *table.optimizer.source.predicate-pushdown-enabled. >>>>>>>>>>>>>>> *I am not sure why we need it. Supposedly, if a source >>>>> implements >>>>>>> a >>>>>>>>>>>>>>> SupportsXXXPushDown interface, the optimizer should push the >>>>>>>>>>>> corresponding >>>>>>>>>>>>>>> predicates to the Source. I am not sure in which case this >>>>>>>>>>>> configuration >>>>>>>>>>>>>>> would be used. Any ideas @Jark Wu <imj...@gmail.com>? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Oct 25, 2023 at 11:55 PM Jiabao Sun >>>>>>>>>>>>>>> <jiabao....@xtransfer.cn.invalid> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks Jane for the detailed explanation. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think that for users, we should respect conventions over >>>>>>>>>>>>>>>> configurations. >>>>>>>>>>>>>>>> Conventions can be default values explicitly specified in >>>>>>>>>>>>>>>> configurations, or they can be behaviors that follow >> previous >>>>>>>>>>>> versions. >>>>>>>>>>>>>>>> If the same code has different behaviors in different >> versions, >>>>>>> it >>>>>>>>>>>> would >>>>>>>>>>>>>>>> be a very bad thing. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I agree that for regular users, it is not necessary to >>>>> understand >>>>>>>>>> all >>>>>>>>>>>>>>>> the configurations related to Flink. >>>>>>>>>>>>>>>> By following conventions, they can have a good experience. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Let's get back to the practical situation and consider it. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Case 1: >>>>>>>>>>>>>>>> The user is not familiar with the purpose of the >>>>>>>>>>>>>>>> table.optimizer.source.predicate-pushdown-enabled >> configuration >>>>>>> but >>>>>>>>>>>> follows >>>>>>>>>>>>>>>> the convention of allowing predicate pushdown to the source >> by >>>>>>>>>>>> default. >>>>>>>>>>>>>>>> Just understanding the source.predicate-pushdown-enabled >>>>>>>>>> configuration >>>>>>>>>>>>>>>> and performing fine-grained toggle control will work well. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Case 2: >>>>>>>>>>>>>>>> The user understands the meaning of the >>>>>>>>>>>>>>>> table.optimizer.source.predicate-pushdown-enabled >> configuration >>>>>>> and >>>>>>>>>>>> has set >>>>>>>>>>>>>>>> its value to false. >>>>>>>>>>>>>>>> We have reason to believe that the user understands the >> meaning >>>>>>> of >>>>>>>>>> the >>>>>>>>>>>>>>>> predicate pushdown configuration and the intention is to >>>>> disable >>>>>>>>>>>> predicate >>>>>>>>>>>>>>>> pushdown (rather than whether or not to allow it). >>>>>>>>>>>>>>>> The previous choice of globally disabling it is likely >> because >>>>> it >>>>>>>>>>>>>>>> couldn't be disabled on individual sources. >>>>>>>>>>>>>>>> From this perspective, if we provide more fine-grained >>>>>>> configuration >>>>>>>>>>>>>>>> support and provide detailed explanations of the >> configuration >>>>>>>>>>>> behaviors in >>>>>>>>>>>>>>>> the documentation, >>>>>>>>>>>>>>>> users can clearly understand the differences between these >> two >>>>>>>>>>>>>>>> configurations and use them correctly. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Also, I don't agree that >>>>>>>>>>>>>>>> table.optimizer.source.predicate-pushdown-enabled = true and >>>>>>>>>>>>>>>> source.predicate-pushdown-enabled = false means that the >> local >>>>>>>>>>>>>>>> configuration overrides the global configuration. >>>>>>>>>>>>>>>> On the contrary, both configurations are functioning >> correctly. >>>>>>>>>>>>>>>> The optimizer allows predicate pushdown to all sources, but >>>>> some >>>>>>>>>>>> sources >>>>>>>>>>>>>>>> can reject the filters pushed down by the optimizer. >>>>>>>>>>>>>>>> This is natural, just like different components at different >>>>>>> levels >>>>>>>>>>>> are >>>>>>>>>>>>>>>> responsible for different tasks. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The more serious issue is that if >>>>>>>>>> "source.predicate-pushdown-enabled" >>>>>>>>>>>>>>>> does not respect >>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled”, >>>>>>>>>>>>>>>> the "table.optimizer.source.predicate-pushdown-enabled" >>>>>>>>>> configuration >>>>>>>>>>>>>>>> will be invalidated. >>>>>>>>>>>>>>>> This means that regardless of whether >>>>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" is set >> to >>>>>>> true >>>>>>>>>> or >>>>>>>>>>>>>>>> false, it will have no effect. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>> Jiabao >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2023年10月25日 22:24,Jane Chan <qingyue....@gmail.com> 写道: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Jiabao, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks for the in-depth clarification. Here are my cents >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> However, >> "table.optimizer.source.predicate-pushdown-enabled" >>>>> and >>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" are configurations for >>>>>>> different >>>>>>>>>>>>>>>>>> components(optimizer and source operator). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> We cannot assume that every user would be interested in >>>>>>>>>> understanding >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> internal components of Flink, such as the optimizer or >>>>>>> connectors, >>>>>>>>>>>> and >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> specific configurations associated with each component. >>>>> Instead, >>>>>>>>>>>> users >>>>>>>>>>>>>>>>> might be more concerned about knowing which configuration >>>>>>> enables >>>>>>>>>> or >>>>>>>>>>>>>>>>> disables the filter push-down feature for all source >>>>> connectors, >>>>>>>>>> and >>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>> parameter provides the flexibility to override this >> behavior >>>>>>> for a >>>>>>>>>>>>>>>> single >>>>>>>>>>>>>>>>> source if needed. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> So, from this perspective, I am inclined to divide these >> two >>>>>>>>>>>> parameters >>>>>>>>>>>>>>>>> based on the scope of their impact from the user's >> perspective >>>>>>>>>> (i.e. >>>>>>>>>>>>>>>>> global-level or operator-level), rather than categorizing >> them >>>>>>>>>> based >>>>>>>>>>>>>>>> on the >>>>>>>>>>>>>>>>> component hierarchy from a developer's point of view. >>>>> Therefore, >>>>>>>>>>>> based >>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>> this premise, it is intuitive and natural for users to >>>>>>>>>>>>>>>>> understand fine-grained configuration options can override >>>>>>> global >>>>>>>>>>>>>>>>> configurations. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Additionally, if "scan.filter-push-down.enabled" doesn't >>>>>>> respect to >>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" and >> the >>>>>>>>>> default >>>>>>>>>>>>>>>> value >>>>>>>>>>>>>>>>>> of "scan.filter-push-down.enabled" is defined as true, >>>>>>>>>>>>>>>>>> it means that just modifying >>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" as >> false >>>>>>> will >>>>>>>>>>>>>>>> have no >>>>>>>>>>>>>>>>>> effect, and filter pushdown will still be performed. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> If we define the default value of >>>>>>> "scan.filter-push-down.enabled" >>>>>>>>>> as >>>>>>>>>>>>>>>>>> false, it would introduce a difference in behavior >> compared >>>>> to >>>>>>> the >>>>>>>>>>>>>>>> previous >>>>>>>>>>>>>>>>>> version. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> <1>If I understand correctly, >> "scan.filter-push-down.enabled" >>>>>>> is a >>>>>>>>>>>>>>>>> connector option, which means the only way to configure it >> is >>>>> to >>>>>>>>>>>>>>>> explicitly >>>>>>>>>>>>>>>>> specify it in DDL (no matter whether disable or enable), >> and >>>>> the >>>>>>>>>> SET >>>>>>>>>>>>>>>>> command is not applicable, so I think it's natural to still >>>>>>> respect >>>>>>>>>>>>>>>> user's >>>>>>>>>>>>>>>>> specification here. Otherwise, users might be more confused >>>>>>> about >>>>>>>>>> why >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> DDL does not work as expected, and the reason is just >> because >>>>>>> some >>>>>>>>>>>>>>>> other >>>>>>>>>>>>>>>>> "optimizer" configuration is set to a different value. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> <2> From the implementation side, I am inclined to keep the >>>>>>>>>>>> parameter's >>>>>>>>>>>>>>>>> priority consistent for all conditions. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Let "global" denote >>>>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled", >>>>>>>>>>>>>>>>> and let "per-source" denote "scan.filter-push-down.enabled" >>>>> for >>>>>>>>>>>>>>>> specific >>>>>>>>>>>>>>>>> source T, the following Truth table (based on the current >>>>>>> design) >>>>>>>>>>>>>>>>> indicates the inconsistent behavior for "per-source >> override >>>>>>>>>> global". >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> .------------.---------------.------------------- >>>>>>>>>>>>>>>>> ----.-------------------------------------. >>>>>>>>>>>>>>>>> | global | per-source | push-down for T | per-source >>>>> override >>>>>>>>>>>> global >>>>>>>>>>>>>>>> | >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>> >>>>> >> :-----------+--------------+-----------------------+------------------------------------: >>>>>>>>>>>>>>>>> | true | false | false | Y >>>>>>>>>>>>>>>>> | >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>> >>>>> >> :-----------+--------------+-----------------------+------------------------------------: >>>>>>>>>>>>>>>>> | false | true | false | N >>>>>>>>>>>>>>>>> | >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>> >>>>> >> .------------.---------------.-----------------------.-------------------------------------. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Jane >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Oct 25, 2023 at 6:22 PM Jiabao Sun < >>>>>>>>>> jiabao....@xtransfer.cn >>>>>>>>>>>>>>>> .invalid> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks Benchao for the feedback. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I understand that the configuration of global parallelism >> and >>>>>>> task >>>>>>>>>>>>>>>>>> parallelism is at different granularities but with the >> same >>>>>>>>>>>>>>>> configuration. >>>>>>>>>>>>>>>>>> However, >> "table.optimizer.source.predicate-pushdown-enabled" >>>>>>> and >>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" are configurations for >>>>>>> different >>>>>>>>>>>>>>>>>> components(optimizer and source operator). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> From a user's perspective, there are two scenarios: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 1. Disabling all filter pushdown >>>>>>>>>>>>>>>>>> In this case, setting >>>>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" >>>>>>>>>>>>>>>>>> to false is sufficient to meet the requirement. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 2. Disabling filter pushdown for specific sources >>>>>>>>>>>>>>>>>> In this scenario, there is no need to adjust the value of >>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled". >>>>>>>>>>>>>>>>>> Instead, the focus should be on the configuration of >>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" to meet the requirement. >>>>>>>>>>>>>>>>>> In this case, users do not need to set >>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" to >> false >>>>>>> and >>>>>>>>>>>>>>>> manually >>>>>>>>>>>>>>>>>> enable filter pushdown for specific sources. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Additionally, if "scan.filter-push-down.enabled" doesn't >>>>>>> respect >>>>>>>>>> to >>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" and >> the >>>>>>>>>> default >>>>>>>>>>>>>>>> value >>>>>>>>>>>>>>>>>> of "scan.filter-push-down.enabled" is defined as true, >>>>>>>>>>>>>>>>>> it means that just modifying >>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate-pushdown-enabled" as >> false >>>>>>> will >>>>>>>>>>>>>>>> have no >>>>>>>>>>>>>>>>>> effect, and filter pushdown will still be performed. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> If we define the default value of >>>>>>> "scan.filter-push-down.enabled" >>>>>>>>>> as >>>>>>>>>>>>>>>>>> false, it would introduce a difference in behavior >> compared >>>>> to >>>>>>> the >>>>>>>>>>>>>>>> previous >>>>>>>>>>>>>>>>>> version. >>>>>>>>>>>>>>>>>> The same SQL query that could successfully push down >> filters >>>>> in >>>>>>>>>> the >>>>>>>>>>>>>>>> old >>>>>>>>>>>>>>>>>> version but would no longer do so after the upgrade. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>> Jiabao >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 2023年10月25日 17:10,Benchao Li <libenc...@apache.org> 写道: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks Jiabao for the detailed explanations, that helps a >>>>>>> lot, I >>>>>>>>>>>>>>>>>>> understand your rationale now. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Correct me if I'm wrong. Your perspective is from >>>>> "developer", >>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>> means there is an optimizer and connector component, and >> if >>>>> we >>>>>>>>>> want >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>> enable this feature (pushing filters down into >> connectors), >>>>>>> you >>>>>>>>>>>> must >>>>>>>>>>>>>>>>>>> enable it firstly in optimizer, and only then connector >> has >>>>>>> the >>>>>>>>>>>>>>>> chance >>>>>>>>>>>>>>>>>>> to decide to use it or not. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> My perspective is from "user" that (Why a user should >> care >>>>>>> about >>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>> difference of optimizer/connector) , this is a feature, >> and >>>>>>> has >>>>>>>>>> two >>>>>>>>>>>>>>>>>>> way to control it, one way is to config it job-level, the >>>>>>> other >>>>>>>>>> one >>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>>> in table properties. What a user expects is that they can >>>>>>>>>> control a >>>>>>>>>>>>>>>>>>> feature in a tiered way, that setting it per job, and >> then >>>>>>>>>>>>>>>>>>> fine-grained tune it per table. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> This is some kind of similar to other concepts, such as >>>>>>>>>>>> parallelism, >>>>>>>>>>>>>>>>>>> users can set a job level default parallelism, and then >>>>>>>>>>>> fine-grained >>>>>>>>>>>>>>>>>>> tune it per operator. There may be more such debate in >> the >>>>>>> future >>>>>>>>>>>>>>>>>>> e.g., we can have a job level config about adding key-by >>>>>>> before >>>>>>>>>>>>>>>> lookup >>>>>>>>>>>>>>>>>>> join, and also a hint/table property way to fine-grained >>>>>>> control >>>>>>>>>> it >>>>>>>>>>>>>>>>>>> per lookup operator. Hence we'd better find a unified way >>>>> for >>>>>>> all >>>>>>>>>>>>>>>>>>> those similar kind of features. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Jiabao Sun <jiabao....@xtransfer.cn.invalid> >> 于2023年10月25日周三 >>>>>>>>>>>> 15:27写道: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks Jane for further explanation. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> These two configurations correspond to different levels. >>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" does not make >>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate" invalid. >>>>>>>>>>>>>>>>>>>> The planner will still push down predicates to all >> sources. >>>>>>>>>>>>>>>>>>>> Whether filter pushdown is allowed or not is determined >> by >>>>>>> the >>>>>>>>>>>>>>>> specific >>>>>>>>>>>>>>>>>> source's "scan.filter-push-down.enabled" configuration. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> However, "table.optimizer.source.predicate" does >> directly >>>>>>> affect >>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled”. >>>>>>>>>>>>>>>>>>>> When the planner disables predicate pushdown, the >>>>>>> source-level >>>>>>>>>>>>>>>> filter >>>>>>>>>>>>>>>>>> pushdown will also not be executed, even if the source >> allows >>>>>>>>>> filter >>>>>>>>>>>>>>>>>> pushdown. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Whatever, in point 1 and 2, our expectation is >> consistent. >>>>>>>>>>>>>>>>>>>> For the 3rd point, I still think that the planner-level >>>>>>>>>>>>>>>> configuration >>>>>>>>>>>>>>>>>> takes precedence over the source-level configuration. >>>>>>>>>>>>>>>>>>>> It may seem counterintuitive when we globally disable >>>>>>> predicate >>>>>>>>>>>>>>>>>> pushdown but allow filter pushdown at the source level. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>> Jiabao >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 2023年10月25日 14:35,Jane Chan <qingyue....@gmail.com> >> 写道: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi Jiabao, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks for clarifying this. While by >>>>>>>>>>>> "scan.filter-push-down.enabled >>>>>>>>>>>>>>>>>> takes a >>>>>>>>>>>>>>>>>>>>> higher priority" I meant that this value should be >>>>> respected >>>>>>>>>>>>>>>> whenever >>>>>>>>>>>>>>>>>> it is >>>>>>>>>>>>>>>>>>>>> set explicitly. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The conclusion that >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 2. "table.optimizer.source.predicate" = "true" and >>>>>>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" = "false" >>>>>>>>>>>>>>>>>>>>>> Allow the planner to perform predicate pushdown, but >>>>>>>>>> individual >>>>>>>>>>>>>>>>>> sources do >>>>>>>>>>>>>>>>>>>>>> not enable filter pushdown. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> This indicates that the option >>>>>>> "scan.filter-push-down.enabled = >>>>>>>>>>>>>>>> false" >>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>>>> an individual source connector does indeed override the >>>>>>>>>>>>>>>> global-level >>>>>>>>>>>>>>>>>>>>> planner settings to make a difference. And thus "has a >>>>>>> higher >>>>>>>>>>>>>>>>>> priority". >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> While for >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 3. "table.optimizer.source.predicate" = "false" >>>>>>>>>>>>>>>>>>>>>> Predicate pushdown is not allowed for the planner. >>>>>>>>>>>>>>>>>>>>>> Regardless of the value of the >>>>>>> "scan.filter-push-down.enabled" >>>>>>>>>>>>>>>>>>>>>> configuration, filter pushdown is disabled. >>>>>>>>>>>>>>>>>>>>>> In this scenario, the behavior remains consistent with >>>>> the >>>>>>> old >>>>>>>>>>>>>>>>>> version as >>>>>>>>>>>>>>>>>>>>>> well. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I still think "scan.filter-push-down.enabled" should >> also >>>>> be >>>>>>>>>>>>>>>> respected >>>>>>>>>>>>>>>>>> if >>>>>>>>>>>>>>>>>>>>> it is enabled for individual connectors. WDYT? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>> Jane >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Wed, Oct 25, 2023 at 1:27 PM Jiabao Sun < >>>>>>>>>>>>>>>> jiabao....@xtransfer.cn >>>>>>>>>>>>>>>>>> .invalid> >>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks Benchao for the feedback. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> For the current proposal, we recommend keeping the >>>>> default >>>>>>>>>> value >>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate" as true, >>>>>>>>>>>>>>>>>>>>>> and setting the the default value of newly introduced >>>>>>> option >>>>>>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" to true as well. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> The main purpose of doing this is to maintain >> consistency >>>>>>> with >>>>>>>>>>>>>>>>>> previous >>>>>>>>>>>>>>>>>>>>>> versions, as whether to perform >>>>>>>>>>>>>>>>>>>>>> filter pushdown in the old version solely depends on >> the >>>>>>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate" option. >>>>>>>>>>>>>>>>>>>>>> That means by default, as long as a TableSource >>>>> implements >>>>>>> the >>>>>>>>>>>>>>>>>>>>>> SupportsFilterPushDown interface, filter pushdown is >>>>>>> allowed. >>>>>>>>>>>>>>>>>>>>>> And it seems that we don't have much benefit in >> changing >>>>>>> the >>>>>>>>>>>>>>>> default >>>>>>>>>>>>>>>>>> value >>>>>>>>>>>>>>>>>>>>>> of "table.optimizer.source.predicate" to false. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Regarding the priority of these two configurations, I >>>>>>> believe >>>>>>>>>>>> that >>>>>>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate" >>>>>>>>>>>>>>>>>>>>>> takes precedence over "scan.filter-push-down.enabled" >> and >>>>>>> it >>>>>>>>>>>>>>>> exhibits >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>> following behavior. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 1. "table.optimizer.source.predicate" = "true" and >>>>>>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" = "true" >>>>>>>>>>>>>>>>>>>>>> This is the default behavior, allowing filter pushdown >>>>> for >>>>>>>>>>>>>>>> sources. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 2. "table.optimizer.source.predicate" = "true" and >>>>>>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" = "false" >>>>>>>>>>>>>>>>>>>>>> Allow the planner to perform predicate pushdown, but >>>>>>>>>> individual >>>>>>>>>>>>>>>>>> sources do >>>>>>>>>>>>>>>>>>>>>> not enable filter pushdown. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 3. "table.optimizer.source.predicate" = "false" >>>>>>>>>>>>>>>>>>>>>> Predicate pushdown is not allowed for the planner. >>>>>>>>>>>>>>>>>>>>>> Regardless of the value of the >>>>>>> "scan.filter-push-down.enabled" >>>>>>>>>>>>>>>>>>>>>> configuration, filter pushdown is disabled. >>>>>>>>>>>>>>>>>>>>>> In this scenario, the behavior remains consistent with >>>>> the >>>>>>> old >>>>>>>>>>>>>>>>>> version as >>>>>>>>>>>>>>>>>>>>>> well. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> From an implementation perspective, setting the >> priority >>>>> of >>>>>>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" higher than >>>>>>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate" is difficult to >>>>> achieve >>>>>>>>>> now. >>>>>>>>>>>>>>>>>>>>>> Because the PushFilterIntoSourceScanRuleBase at the >>>>> planner >>>>>>>>>>>> level >>>>>>>>>>>>>>>>>> takes >>>>>>>>>>>>>>>>>>>>>> precedence over the source-level FilterPushDownSpec. >>>>>>>>>>>>>>>>>>>>>> Only when the PushFilterIntoSourceScanRuleBase is >>>>> enabled, >>>>>>>>>> will >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>> Source-level filter pushdown be performed. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Additionally, in my opinion, there doesn't seem to be >>>>> much >>>>>>>>>>>>>>>> benefit in >>>>>>>>>>>>>>>>>>>>>> setting a higher priority for >>>>>>> "scan.filter-push-down.enabled". >>>>>>>>>>>>>>>>>>>>>> It may instead affect compatibility and increase >>>>>>>>>> implementation >>>>>>>>>>>>>>>>>> complexity. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> WDYT? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>> Jiabao >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> 2023年10月25日 11:56,Benchao Li <libenc...@apache.org> >> 写道: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I agree with Jane that fine-grained configurations >>>>> should >>>>>>>>>> have >>>>>>>>>>>>>>>> higher >>>>>>>>>>>>>>>>>>>>>>> priority than job level configurations. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> For current proposal, we can achieve that: >>>>>>>>>>>>>>>>>>>>>>> - Set "table.optimizer.source.predicate" = "true" to >>>>>>> enable >>>>>>>>>> by >>>>>>>>>>>>>>>>>>>>>>> default, and set ""scan.filter-push-down.enabled" = >>>>>>> "false" >>>>>>>>>> to >>>>>>>>>>>>>>>>>> disable >>>>>>>>>>>>>>>>>>>>>>> it per table source >>>>>>>>>>>>>>>>>>>>>>> - Set "table.optimizer.source.predicate" = "false" to >>>>>>> disable >>>>>>>>>>>> by >>>>>>>>>>>>>>>>>>>>>>> default, and set ""scan.filter-push-down.enabled" = >>>>>>> "true" to >>>>>>>>>>>>>>>> enable >>>>>>>>>>>>>>>>>>>>>>> it per table source >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Jane Chan <qingyue....@gmail.com> 于2023年10月24日周二 >>>>> 23:55写道: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I believe that the configuration >>>>>>>>>>>>>>>> "table.optimizer.source.predicate" >>>>>>>>>>>>>>>>>>>>>> has a >>>>>>>>>>>>>>>>>>>>>>>>> higher priority at the planner level than the >>>>>>> configuration >>>>>>>>>>>> at >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>> source >>>>>>>>>>>>>>>>>>>>>>>>> level, >>>>>>>>>>>>>>>>>>>>>>>>> and it seems easy to implement now. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Correct me if I'm wrong, but I think the >> fine-grained >>>>>>>>>>>>>>>> configuration >>>>>>>>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" should have a higher >>>>>>>>>> priority >>>>>>>>>>>>>>>>>> because >>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>> default value of "table.optimizer.source.predicate" >> is >>>>>>> true. >>>>>>>>>>>> As >>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>>>>>> result, >>>>>>>>>>>>>>>>>>>>>>>> turning off filter push-down for a specific source >> will >>>>>>> not >>>>>>>>>>>> take >>>>>>>>>>>>>>>>>> effect >>>>>>>>>>>>>>>>>>>>>>>> unless the default value of >>>>>>>>>> "table.optimizer.source.predicate" >>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>>>>>> changed >>>>>>>>>>>>>>>>>>>>>>>> to false, or, alternatively, let users manually set >>>>>>>>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate" to false first >> and >>>>>>> then >>>>>>>>>>>>>>>>>> selectively >>>>>>>>>>>>>>>>>>>>>>>> enable filter push-down for the desired sources, >> which >>>>> is >>>>>>>>>> less >>>>>>>>>>>>>>>>>>>>>> intuitive. >>>>>>>>>>>>>>>>>>>>>>>> WDYT? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>>> Jane >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun < >>>>>>>>>>>>>>>> jiabao....@xtransfer.cn >>>>>>>>>>>>>>>>>>>>>> .invalid> >>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks Jane, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I believe that the configuration >>>>>>>>>>>>>>>> "table.optimizer.source.predicate" >>>>>>>>>>>>>>>>>>>>>> has a >>>>>>>>>>>>>>>>>>>>>>>>> higher priority at the planner level than the >>>>>>> configuration >>>>>>>>>>>> at >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>> source >>>>>>>>>>>>>>>>>>>>>>>>> level, >>>>>>>>>>>>>>>>>>>>>>>>> and it seems easy to implement now. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>>>> Jiabao >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> 2023年10月24日 17:36,Jane Chan < >> qingyue....@gmail.com> >>>>>>> 写道: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jiabao, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for driving this discussion. I have a small >>>>>>>>>> question >>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>>>> will >>>>>>>>>>>>>>>>>>>>>>>>>> "scan.filter-push-down.enabled" take precedence >> over >>>>>>>>>>>>>>>>>>>>>>>>>> "table.optimizer.source.predicate" when the two >>>>>>> parameters >>>>>>>>>>>>>>>> might >>>>>>>>>>>>>>>>>>>>>> conflict >>>>>>>>>>>>>>>>>>>>>>>>>> each other? >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>>>>> Jane >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun < >>>>>>>>>>>>>>>>>> jiabao....@xtransfer.cn >>>>>>>>>>>>>>>>>>>>>>>>> .invalid> >>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Jark, >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> If we only add configuration without adding the >>>>>>>>>>>>>>>>>> enableFilterPushDown >>>>>>>>>>>>>>>>>>>>>>>>>>> method in the SupportsFilterPushDown interface, >>>>>>>>>>>>>>>>>>>>>>>>>>> each connector would have to handle the same >> logic >>>>> in >>>>>>> the >>>>>>>>>>>>>>>>>>>>>> applyFilters >>>>>>>>>>>>>>>>>>>>>>>>>>> method to determine whether filter pushdown is >>>>> needed. >>>>>>>>>>>>>>>>>>>>>>>>>>> This would increase complexity and violate the >>>>>>> original >>>>>>>>>>>>>>>> behavior >>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>>> applyFilters method. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On the contrary, we only need to pass the >>>>>>> configuration >>>>>>>>>>>>>>>>>> parameter in >>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>>> newly added enableFilterPushDown method >>>>>>>>>>>>>>>>>>>>>>>>>>> to decide whether to perform predicate pushdown. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I think this approach would be clearer and >> simpler. >>>>>>>>>>>>>>>>>>>>>>>>>>> WDYT? >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>>>>>> Jiabao >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> 2023年10月24日 16:58,Jark Wu <imj...@gmail.com> >> 写道: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi JIabao, >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I think the current interface can already >> satisfy >>>>>>> your >>>>>>>>>>>>>>>>>> requirements. >>>>>>>>>>>>>>>>>>>>>>>>>>>> The connector can reject all the filters by >>>>> returning >>>>>>>>>> the >>>>>>>>>>>>>>>> input >>>>>>>>>>>>>>>>>>>>>> filters >>>>>>>>>>>>>>>>>>>>>>>>>>>> as `Result#remainingFilters`. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> So maybe we don't need to introduce a new >> method to >>>>>>>>>>>> disable >>>>>>>>>>>>>>>>>>>>>>>>>>>> pushdown, but just introduce an option for the >>>>>>> specific >>>>>>>>>>>>>>>>>> connector. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>>>>>>> Jark >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu < >>>>>>>>>>>> xbjt...@gmail.com >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks @Jiabao for kicking off this discussion. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Could you add a section to explain the >> difference >>>>>>>>>> between >>>>>>>>>>>>>>>>>> proposed >>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector level config >>>>>>> `scan.filter-push-down.enabled` >>>>>>>>>>>> and >>>>>>>>>>>>>>>>>> existing >>>>>>>>>>>>>>>>>>>>>>>>>>> query >>>>>>>>>>>>>>>>>>>>>>>>>>>>> level config >>>>>>>>>>>>>>>>>> `table.optimizer.source.predicate-pushdown-enabled` ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Leonard >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2023年10月24日 下午4:18,Jiabao Sun < >>>>>>>>>> jiabao....@xtransfer.cn >>>>>>>>>>>>>>>>>> .INVALID> >>>>>>>>>>>>>>>>>>>>>> 写道: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Devs, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to start a discussion on >> FLIP-377: >>>>>>>>>> support >>>>>>>>>>>>>>>>>>>>>> configuration >>>>>>>>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>>>>>>> disable filter pushdown for Table/SQL >> Sources[1]. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Currently, Flink Table/SQL does not expose >>>>>>>>>> fine-grained >>>>>>>>>>>>>>>>>> control >>>>>>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>>>>>>>>>>>> users to enable or disable filter pushdown. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> However, filter pushdown has some side >> effects, >>>>>>> such >>>>>>>>>> as >>>>>>>>>>>>>>>>>> additional >>>>>>>>>>>>>>>>>>>>>>>>>>>>> computational pressure on external systems. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Moreover, Improper queries can lead to issues >>>>> such >>>>>>> as >>>>>>>>>>>> full >>>>>>>>>>>>>>>>>> table >>>>>>>>>>>>>>>>>>>>>>>>> scans, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> which in turn can impact the stability of >> external >>>>>>>>>>>> systems. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Suppose we have an SQL query with two sources: >>>>>>> Kafka >>>>>>>>>>>> and a >>>>>>>>>>>>>>>>>>>>>> database. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The database is sensitive to pressure, and we >>>>> want >>>>>>> to >>>>>>>>>>>>>>>>>> configure >>>>>>>>>>>>>>>>>>>>>> it to >>>>>>>>>>>>>>>>>>>>>>>>>>>>> not perform filter pushdown to the database >>>>> source. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> However, we still want to perform filter >> pushdown >>>>>>> to >>>>>>>>>> the >>>>>>>>>>>>>>>> Kafka >>>>>>>>>>>>>>>>>>>>>> source >>>>>>>>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>>>>>>> decrease network IO. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I propose to support configuration to disable >>>>>>> filter >>>>>>>>>>>> push >>>>>>>>>>>>>>>>>> down for >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Table/SQL sources to let user decide whether to >>>>>>> perform >>>>>>>>>>>>>>>> filter >>>>>>>>>>>>>>>>>>>>>>>>> pushdown. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to your feedback. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>> >>>>> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jiabao >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>> Benchao Li >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>> Benchao Li >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>>>> >>> >> >>