So if NestedFieldReferenceExpression doesn't need inputIndex, is there a need to introduce a base class `ReferenceExpression`?
Best, Jingsong On Mon, Aug 28, 2023 at 2:09 PM Jingsong Li <jingsongl...@gmail.com> wrote: > > Hi thanks all for your discussion. > > What is inputIndex in NestedFieldReferenceExpression? > > I know inputIndex has special usage in FieldReferenceExpression, but > it is only for Join operators, and it is only for SQL optimization. It > looks like there is no requirement for Nested. > > Best, > Jingsong > > On Mon, Aug 28, 2023 at 1:13 PM Venkatakrishnan Sowrirajan > <vsowr...@asu.edu> wrote: > > > > Thanks for all the feedback and discussion everyone. Looks like we have > > reached a consensus here. > > > > Just to summarize: > > > > 1. Introduce a new *ReferenceExpression* (or *BaseReferenceExpression*) > > abstract class which will be extended by both *FieldReferenceExpression* > > and *NestedFieldReferenceExpression* (to be introduced as part of this FLIP) > > 2. No need of *supportsNestedFilters *check as the current > > *SupportsFilterPushDown* should already ignore unknown expressions ( > > *NestedFieldReferenceExpression* for example) and return them as > > *remainingFilters. > > *Maybe this should be clarified explicitly in the Javadoc of > > *SupportsFilterPushDown. > > *I will file a separate JIRA to fix the documentation. > > 3. Refactor *SupportsProjectionPushDown* to use *ReferenceExpression > > *instead > > of existing 2-d arrays to consolidate and be consistent with other > > Supports*PushDown APIs - *outside the scope of this FLIP* > > 4. Similarly *SupportsAggregatePushDown* should also be evolved whenever > > nested fields support is added to use the *ReferenceExpression - **outside > > the scope of this FLIP* > > > > Does this sound good? Please let me know if I have missed anything here. If > > there are no concerns, I will start a vote tomorrow. I will also get the > > FLIP-356 wiki updated. Thanks everyone once again! > > > > Regards > > Venkata krishnan > > > > > > On Thu, Aug 24, 2023 at 8:19 PM Becket Qin <becket....@gmail.com> wrote: > > > > > Hi Jark, > > > > > > How about having a separate NestedFieldReferenceExpression, and > > > > abstracting a common base class "ReferenceExpression" for > > > > NestedFieldReferenceExpression and FieldReferenceExpression? This makes > > > > unifying expressions in > > > > "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> > > > > ...)" > > > > possible. > > > > > > > > > I'd be fine with this. It at least provides a consistent API style / > > > formality. > > > > > > Re: Yunhong, > > > > > > 3. Finally, I think we need to look at the costs and benefits of unifying > > > > the SupportsFilterPushDown and SupportsProjectionPushDown (or others) > > > from > > > > the perspective of interface implementers. A stable API can reduce user > > > > development and change costs, if the current API can fully meet the > > > > functional requirements at the framework level, I personal suggest > > > reducing > > > > the impact on connector developers. > > > > > > > > > > I agree that the cost and benefit should be measured. And the measurement > > > should be in the long term instead of short term. That is why we always > > > need to align on the ideal end state first. > > > Meeting functionality requirements is the bare minimum bar for an API. > > > Simplicity, intuitiveness, robustness and evolvability are also important. > > > In addition, for projects with many APIs, such as Flink, a consistent API > > > style is also critical for the user adoption as well as bug avoidance. It > > > is very helpful for the community to agree on some API design conventions > > > / > > > principles. > > > For example, in this particular case, via our discussion, hopefully we > > > sort > > > of established the following API design conventions / principles for all > > > the Supports*PushDown interfaces. > > > > > > 1. By default, expressions should be used if applicable instead of other > > > representations. > > > 2. In general, the pushdown method should not assume all the pushdowns > > > will > > > succeed. So the applyX() method should return a boolean or List<X>, to > > > handle the cases that some of the pushdowns cannot be fulfilled by the > > > implementation. > > > > > > Establishing such conventions and principles demands careful thinking for > > > the aspects I mentioned earlier in addition to the API functionalities. > > > This helps lower the bar of understanding, reduces the chance of having > > > loose ends in the API, and will benefit all the participants in the > > > project > > > over time. I think this is the right way to achieve real API stability. > > > Otherwise, we may end up chasing our tails to find ways not to change the > > > existing non-ideal APIs. > > > > > > Thanks, > > > > > > Jiangjie (Becket) Qin > > > > > > On Fri, Aug 25, 2023 at 9:33 AM yh z <zhengyunhon...@gmail.com> wrote: > > > > > > > Hi, Venkat, > > > > > > > > Thanks for the FLIP, it sounds good to support nested fields filter > > > > pushdown. Based on the design of flip and the above options, I would > > > > like > > > > to make a few suggestions: > > > > > > > > 1. At present, introducing NestedFieldReferenceExpression looks like a > > > > better solution, which can fully meet our requirements while reducing > > > > modifications to base class FieldReferenceExpression. In the long run, I > > > > tend to abstract a basic class for NestedFieldReferenceExpression and > > > > FieldReferenceExpression as u suggested. > > > > > > > > 2. Personally, I don't recommend introducing *supportsNestedFilters() in > > > > supportsFilterPushdown. We just need to better declare the return value > > > of > > > > the method *applyFilters. > > > > > > > > 3. Finally, I think we need to look at the costs and benefits of > > > > unifying > > > > the SupportsFilterPushDown and SupportsProjectionPushDown (or others) > > > from > > > > the perspective of interface implementers. A stable API can reduce user > > > > development and change costs, if the current API can fully meet the > > > > functional requirements at the framework level, I personal suggest > > > reducing > > > > the impact on connector developers. > > > > > > > > Regards, > > > > Yunhong Zheng (Swuferhong) > > > > > > > > > > > > Venkatakrishnan Sowrirajan <vsowr...@asu.edu> 于2023年8月25日周五 01:25写道: > > > > > > > > > To keep it backwards compatible, introduce another API > > > > > *applyAggregates > > > > > *with > > > > > *List<ReferenceExpression> *when nested field support is added and > > > > > deprecate the current API. This will by default throw an exception. In > > > > > flink planner, *applyAggregates *with nested fields and if it throws > > > > > exception then *applyAggregates* without nested fields. > > > > > > > > > > Regards > > > > > Venkata krishnan > > > > > > > > > > > > > > > On Thu, Aug 24, 2023 at 10:13 AM Venkatakrishnan Sowrirajan < > > > > > vsowr...@asu.edu> wrote: > > > > > > > > > > > Jark, > > > > > > > > > > > > How about having a separate NestedFieldReferenceExpression, and > > > > > >> abstracting a common base class "ReferenceExpression" for > > > > > >> NestedFieldReferenceExpression and FieldReferenceExpression? This > > > > makes > > > > > >> unifying expressions in > > > > > >> > > > "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> > > > > > >> ...)" > > > > > >> possible. > > > > > > > > > > > > This should be fine for *SupportsProjectionPushDown* and > > > > > > *SupportsFilterPushDown*. One concern in the case of > > > > > > *SupportsAggregatePushDown* with nested fields support (to be added > > > in > > > > > > the future), with this proposal, the API will become backwards > > > > > incompatible > > > > > > as the *args *for the aggregate function is > > > > > *List<FieldReferenceExpression> > > > > > > *that needs to change to *List<ReferenceExpression>*. > > > > > > > > > > > > Regards > > > > > > Venkata krishnan > > > > > > > > > > > > > > > > > > On Thu, Aug 24, 2023 at 1:18 AM Jark Wu <imj...@gmail.com> wrote: > > > > > > > > > > > >> Hi Becket, > > > > > >> > > > > > >> I think it is the second case, that a FieldReferenceExpression is > > > > > >> constructed > > > > > >> by the framework and passed to the connector (interfaces listed by > > > > > >> Venkata[1] > > > > > >> and Catalog#listPartitionsByFilter). Besides, understanding the > > > nested > > > > > >> field > > > > > >> is optional for users/connectors (just treat it as an unknown > > > > expression > > > > > >> if > > > > > >> the > > > > > >> connector doesn't want to support it). > > > > > >> > > > > > >> If we extend FieldReferenceExpression, in the case of "where > > > > col.nested > > > > > > > > > > > >> 10", > > > > > >> for the connectors already supported filter/delete pushdown, they > > > may > > > > > >> wrongly > > > > > >> pushdown "col > 10" instead of "nested > 10" because they still > > > treat > > > > > >> FieldReferenceExpression as a top-level column. This problem can be > > > > > >> resolved > > > > > >> by introducing an additional "supportedNestedPushdown" for each > > > > > interface, > > > > > >> but that method is not elegant and is hard to remove in the future, > > > > and > > > > > >> this could > > > > > >> be avoided if we have a separate NestedFieldReferenceExpression. > > > > > >> > > > > > >> If we want to extend FieldReferenceExpression, we have to add > > > > > protections > > > > > >> for every related API in one shot. Besides, > > > > > >> FieldReferenceExpression > > > > is > > > > > a > > > > > >> fundamental class in the planner, we have to go through all the > > > > > >> code > > > > > that > > > > > >> is using it to make sure it properly handling it if it is a nested > > > > field > > > > > >> which > > > > > >> is a big effort for the community. > > > > > >> > > > > > >> If we were designing this API on day 1, I fully support merging > > > > > >> them > > > > in > > > > > a > > > > > >> FieldReferenceExpression. But in this case, I'm thinking about how > > > to > > > > > >> provide > > > > > >> users with a smooth migration path, and allow the community to > > > > gradually > > > > > >> put efforts into evolving the API, and not block the "Nested Fields > > > > > Filter > > > > > >> Pushdown" > > > > > >> requirement. > > > > > >> > > > > > >> How about having a separate NestedFieldReferenceExpression, and > > > > > >> abstracting a common base class "ReferenceExpression" for > > > > > >> NestedFieldReferenceExpression and FieldReferenceExpression? This > > > > makes > > > > > >> unifying expressions in > > > > > >> > > > "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> > > > > > >> ...)" > > > > > >> possible. > > > > > >> > > > > > >> Best, > > > > > >> Jark > > > > > >> > > > > > >> On Thu, 24 Aug 2023 at 07:00, Venkatakrishnan Sowrirajan < > > > > > >> vsowr...@asu.edu> > > > > > >> wrote: > > > > > >> > > > > > >> > Becket and Jark, > > > > > >> > > > > > > >> > Deprecate all the other > > > > > >> > > methods except tryApplyFilters() and tryApplyProjections(). > > > > > >> > > > > > > >> > For *SupportsProjectionPushDown*, we still need a > > > > > >> > *supportsNestedProjections* API on the table source as some of > > > > > >> > the > > > > > table > > > > > >> > sources might not be able to handle nested fields and therefore > > > the > > > > > >> Flink > > > > > >> > planner should not push down the nested projections or else the > > > > > >> > *applyProjection > > > > > >> > *API has to be appropriately changed to return > > > > > >> > *unconvertibleProjections *similar > > > > > >> > to *SupportsFilterPushDown*. > > > > > >> > > > > > > >> > Or we have to introduce two different applyProjections() > > > > > >> > > methods for FieldReferenceExpression / > > > > > NestedFieldReferenceExpression > > > > > >> > > respectively. > > > > > >> > > > > > > >> > Agree this is not preferred. Given that *supportNestedProjections > > > > > >> *cannot > > > > > >> > be deprecated/removed based on the current API form, extending > > > > > >> > *FieldReferenceExpression* to support nested fields should be > > > okay. > > > > > >> > > > > > > >> > Another alternative could be to change *applyProjections *to take > > > > > >> > List<ResolvedExpression> and on the connector side they choose to > > > > > handle > > > > > >> > *FieldReferenceExpression* and *NestedFieldReferenceExpression > > > > > >> > *as > > > > > >> > applicable and return the remainingProjections. In the case of > > > > nested > > > > > >> field > > > > > >> > projections not supported, it should return them back but only > > > > > >> projecting > > > > > >> > the top level fields. IMO, this is also *not preferred*. > > > > > >> > > > > > > >> > *SupportsAggregatePushDown* > > > > > >> > > > > > > >> > *AggregateExpression *currently takes in a list of > > > > > >> > *FieldReferenceExpression* as args for the aggregate function, if > > > in > > > > > >> future > > > > > >> > *SupportsAggregatePushDown* adds support for aggregate pushdown > > > > > >> > on > > > > > >> nested > > > > > >> > fields then the AggregateExpression API also has to change if a > > > new > > > > > >> > NestedFieldReferenceExpression is introduced for nested fields. > > > > > >> > > > > > > >> > If we add a > > > > > >> > > flag for each new filter, > > > > > >> > > the interface will be filled with lots of flags (e.g., > > > > > >> supportsBetween, > > > > > >> > > supportsIN) > > > > > >> > > > > > > >> > In an ideal situation, I completely agree with you. But in the > > > > current > > > > > >> > state, *supportsNestedFilters* can act as a bridge to reach the > > > > > eventual > > > > > >> > desired state which is to have a clean and consistent set of APIs > > > > > >> > throughout all Supports*PushDown. > > > > > >> > > > > > > >> > Also shared some thoughts on the end state API > > > > > >> > < > > > > > >> > > > > > > >> > > > > > > > > > > > > https://urldefense.com/v3/__https://docs.google.com/document/d/1stLRPKOcxlEv8eHblkrOh0Zf5PLM-h76WMhEINHOyPY/edit?usp=sharing__;!!IKRxdwAv5BmarQ!ZZ2nS1PYlXLnEGFcikS3NsYG7tMaV3wU_z7FmvihNwQBmoLZk2WmcpuRWszK0FFmsInh9A6cndkJrQ$ > > > > > >> > > > > > > > >> > with extension to the *FieldReferenceExpression* to support > > > > > >> > nested > > > > > >> fields. > > > > > >> > Please take a look. > > > > > >> > > > > > > >> > Regards > > > > > >> > Venkata krishnan > > > > > >> > > > > > > >> > On Tue, Aug 22, 2023 at 5:02 PM Becket Qin <becket....@gmail.com> > > > > > >> wrote: > > > > > >> > > > > > > >> > > Hi Jark, > > > > > >> > > > > > > > >> > > Regarding the migration path, it would be useful to scrutinize > > > the > > > > > use > > > > > >> > case > > > > > >> > > of FiledReferenceExpression and ResolvedExpressions. There are > > > two > > > > > >> kinds > > > > > >> > of > > > > > >> > > use cases: > > > > > >> > > > > > > > >> > > 1. A ResolvedExpression is constructed by the user or connector > > > / > > > > > >> plugin > > > > > >> > > developers. > > > > > >> > > 2. A ResolvedExpression is constructed by the framework and > > > passed > > > > > to > > > > > >> > user > > > > > >> > > or connector / plugin developers. > > > > > >> > > > > > > > >> > > For the first case, both of the approaches provide the same > > > > > migration > > > > > >> > > experience. > > > > > >> > > > > > > > >> > > For the second case, generally speaking, introducing > > > > > >> > > NestedFieldReferenceExpression and extending > > > > > FieldReferenceExpression > > > > > >> > would > > > > > >> > > have the same impact for backwards compatibility. > > > > > >> SupportsFilterPushDown > > > > > >> > is > > > > > >> > > a special case here because understanding the filter > > > > > >> > > expressions > > > > is > > > > > >> > > optional for the source implementation. In other use cases, if > > > > > >> > > understanding the reference to a nested field is a must have, > > > the > > > > > user > > > > > >> > code > > > > > >> > > has to be changed, regardless of which approach we take to > > > support > > > > > >> nested > > > > > >> > > fields. > > > > > >> > > > > > > > >> > > Therefore, I think we have to check each public API where the > > > > nested > > > > > >> > field > > > > > >> > > reference is exposed. If we have many public APIs where > > > > > understanding > > > > > >> > > nested fields is optional for the user / plugin / connector > > > > > >> developers, > > > > > >> > > having a separate NestedFieldReferenceExpression would have a > > > more > > > > > >> smooth > > > > > >> > > migration. Otherwise, there seems to be no difference between > > > the > > > > > two > > > > > >> > > approaches. > > > > > >> > > > > > > > >> > > Migration path aside, the main reason I prefer extending > > > > > >> > > FieldReferenceExpression over a new > > > NestedFieldReferenceExpression > > > > > is > > > > > >> > > because this makes the SupportsProjectionPushDown interface > > > > simpler. > > > > > >> > > Otherwise, we have to treat it as a special case that does not > > > > match > > > > > >> the > > > > > >> > > overall API style. Or we have to introduce two different > > > > > >> > applyProjections() > > > > > >> > > methods for FieldReferenceExpression / > > > > > NestedFieldReferenceExpression > > > > > >> > > respectively. This issue further extends to implementation in > > > > > >> addition to > > > > > >> > > public API. A single FieldReferenceExpression might help > > > simplify > > > > > the > > > > > >> > > implementation code a little bit. For example, in a recursive > > > > > >> processing > > > > > >> > of > > > > > >> > > a row with nested rows, we may not need to switch between > > > > > >> > > FieldReferenceExpression and NestedFieldReferenceExpression > > > > > depending > > > > > >> on > > > > > >> > > whether the record being processed is a top level record or > > > nested > > > > > >> > record. > > > > > >> > > > > > > > >> > > Thanks, > > > > > >> > > > > > > > >> > > Jiangjie (Becket) Qin > > > > > >> > > > > > > > >> > > > > > > > >> > > On Tue, Aug 22, 2023 at 11:43 PM Jark Wu <imj...@gmail.com> > > > > wrote: > > > > > >> > > > > > > > >> > > > Hi Becket, > > > > > >> > > > > > > > > >> > > > I totally agree we should try to have a consistent API for a > > > > final > > > > > >> > state. > > > > > >> > > > The only concern I have mentioned is the "smooth" migration > > > > path. > > > > > >> > > > The FiledReferenceExpression is widely used in many public > > > APIs, > > > > > >> > > > not only in the SupportsFilterPushDown. Yes, we can change > > > every > > > > > >> > > > methods in 2-steps, but is it good to change API back and > > > forth > > > > > for > > > > > >> > this? > > > > > >> > > > Personally, I'm fine with a separate > > > > > NestedFieldReferenceExpression > > > > > >> > > class. > > > > > >> > > > TBH, I prefer the separated way because it makes the > > > > > >> > > > reference > > > > > >> > expression > > > > > >> > > > more clear and concise. > > > > > >> > > > > > > > > >> > > > Best, > > > > > >> > > > Jark > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > On Tue, 22 Aug 2023 at 16:53, Becket Qin < > > > becket....@gmail.com> > > > > > >> wrote: > > > > > >> > > > > > > > > >> > > > > Thanks for the reply, Jark. > > > > > >> > > > > > > > > > >> > > > > I think it will be helpful to understand the final state we > > > > want > > > > > >> to > > > > > >> > > > > eventually achieve first, then we can discuss the steps > > > > towards > > > > > >> that > > > > > >> > > > final > > > > > >> > > > > state. > > > > > >> > > > > > > > > > >> > > > > It looks like there are two proposed end states now: > > > > > >> > > > > > > > > > >> > > > > 1. Have a separate NestedFieldReferenceExpression class; > > > keep > > > > > >> > > > > SupportsFilterPushDown and SupportsProjectionPushDown the > > > > same. > > > > > >> It is > > > > > >> > > > just > > > > > >> > > > > a one step change. > > > > > >> > > > > - Regarding the supportsNestedFilterPushDown() method, > > > > > >> > > > > if > > > > our > > > > > >> > > contract > > > > > >> > > > > with the connector developer today is "The implementation > > > > should > > > > > >> > ignore > > > > > >> > > > > unrecognized expressions by putting them into the remaining > > > > > >> filters, > > > > > >> > > > > instead of throwing exceptions". Then there is no need for > > > > this > > > > > >> > > method. I > > > > > >> > > > > am not sure about the current contract. We should probably > > > > make > > > > > it > > > > > >> > > clear > > > > > >> > > > in > > > > > >> > > > > the interface Java doc. > > > > > >> > > > > > > > > > >> > > > > 2. Extend the existing FiledReferenceExpression class to > > > > support > > > > > >> > nested > > > > > >> > > > > fields; SupportsFilterPushDown only has one method of > > > > > >> > > > > applyFilters(List<ResolvedExpression>); > > > > > SupportsProjectionPushDown > > > > > >> > only > > > > > >> > > > has > > > > > >> > > > > one method of > > > applyProjections(List<FieldReferenceExpression>, > > > > > >> > > DataType). > > > > > >> > > > > It could just be two steps if we are not too obsessed with > > > the > > > > > >> exact > > > > > >> > > > names > > > > > >> > > > > of "applyFilters" and "applyProjections". More > > > > > >> > > > > specifically, > > > > it > > > > > >> takes > > > > > >> > > two > > > > > >> > > > > steps to achieve this final state: > > > > > >> > > > > a. introduce a new method > > > > > >> > tryApplyFilters(List<ResolvedExpression>) > > > > > >> > > > to > > > > > >> > > > > SupportsFilterPushDown, which may have > > > > FiledReferenceExpression > > > > > >> with > > > > > >> > > > nested > > > > > >> > > > > fields. The default implementation throws an exception. The > > > > > >> runtime > > > > > >> > > will > > > > > >> > > > > first call tryApplyFilters() with nested fields. In case of > > > > > >> > exception, > > > > > >> > > it > > > > > >> > > > > calls the existing applyFilters() without including the > > > nested > > > > > >> > filters. > > > > > >> > > > > Similarly, in SupportsProjectionPushDown, introduce a > > > > > >> > > > > tryApplyProjections<List<NestedFieldReference> method > > > > returning > > > > > a > > > > > >> > > Result. > > > > > >> > > > > The Result also contains the accepted and unapplicable > > > > > >> projections. > > > > > >> > The > > > > > >> > > > > default implementation also throws an exception. Deprecate > > > all > > > > > the > > > > > >> > > other > > > > > >> > > > > methods except tryApplyFilters() and tryApplyProjections(). > > > > > >> > > > > b. remove the deprecated methods in the next major > > > version > > > > > >> bump. > > > > > >> > > > > > > > > > >> > > > > Now the question is putting the migration steps aside, > > > > > >> > > > > which > > > > end > > > > > >> > state > > > > > >> > > do > > > > > >> > > > > we prefer? While the first end state is acceptable for me, > > > > > >> > personally, > > > > > >> > > I > > > > > >> > > > > prefer the latter if we are designing from scratch. It is > > > > clean, > > > > > >> > > > consistent > > > > > >> > > > > and intuitive. Given the size of Flink, keeping APIs in the > > > > same > > > > > >> > style > > > > > >> > > > over > > > > > >> > > > > time is important. The migration is also not that > > > complicated. > > > > > >> > > > > > > > > > >> > > > > Thanks, > > > > > >> > > > > > > > > > >> > > > > Jiangjie (Becket) Qin > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > On Tue, Aug 22, 2023 at 2:23 PM Jark Wu <imj...@gmail.com> > > > > > wrote: > > > > > >> > > > > > > > > > >> > > > > > Hi Venkat, > > > > > >> > > > > > > > > > > >> > > > > > Thanks for the proposal. > > > > > >> > > > > > > > > > > >> > > > > > I have some minor comments about the FLIP. > > > > > >> > > > > > > > > > > >> > > > > > 1. I think we don't need to > > > > > >> > > > > > add SupportsFilterPushDown#supportsNestedFilters() > > > > > >> > > > > > method, > > > > > >> > > > > > because connectors can skip nested filters by putting > > > > > >> > > > > > them > > > > in > > > > > >> > > > > > Result#remainingFilters(). > > > > > >> > > > > > And this is backward-compatible because unknown > > > expressions > > > > > were > > > > > >> > > added > > > > > >> > > > to > > > > > >> > > > > > the remaining filters. > > > > > >> > > > > > Planner should push predicate expressions as more as > > > > possible. > > > > > >> If > > > > > >> > we > > > > > >> > > > add > > > > > >> > > > > a > > > > > >> > > > > > flag for each new filter, > > > > > >> > > > > > the interface will be filled with lots of flags (e.g., > > > > > >> > > supportsBetween, > > > > > >> > > > > > supportsIN). > > > > > >> > > > > > > > > > > >> > > > > > 2. NestedFieldReferenceExpression#nestedFieldName should > > > be > > > > an > > > > > >> > array > > > > > >> > > of > > > > > >> > > > > > field names? > > > > > >> > > > > > Each string represents a field name part of the field > > > path. > > > > > Just > > > > > >> > keep > > > > > >> > > > > > aligning with `nestedFieldIndexArray`. > > > > > >> > > > > > > > > > > >> > > > > > 3. My concern about making FieldReferenceExpression > > > support > > > > > >> nested > > > > > >> > > > fields > > > > > >> > > > > > is the compatibility. > > > > > >> > > > > > It is a public API and users/connectors are already using > > > > it. > > > > > >> > People > > > > > >> > > > > > assumed it is a top-level column > > > > > >> > > > > > reference, and applied logic on it. But that's not true > > > now > > > > > and > > > > > >> > this > > > > > >> > > > may > > > > > >> > > > > > lead to unexpected errors. > > > > > >> > > > > > Having a separate NestedFieldReferenceExpression sounds > > > > safer > > > > > to > > > > > >> > me. > > > > > >> > > > > Mixing > > > > > >> > > > > > them in a class may > > > > > >> > > > > > confuse users what's the meaning of getFieldName() and > > > > > >> > > > getFieldIndex(). > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > Regarding using NestedFieldReferenceExpression in > > > > > >> > > > > > SupportsProjectionPushDown, do you > > > > > >> > > > > > have any concerns @Timo Walther <twal...@apache.org> ? > > > > > >> > > > > > > > > > > >> > > > > > Best, > > > > > >> > > > > > Jark > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > On Tue, 22 Aug 2023 at 05:55, Venkatakrishnan Sowrirajan > > > > > >> > > > > > < > > > > > >> > > > > vsowr...@asu.edu > > > > > >> > > > > > > > > > > > >> > > > > > wrote: > > > > > >> > > > > > > > > > > >> > > > > > > Sounds like a great suggestion, Becket. +1. Agree with > > > > > >> cleaning > > > > > >> > up > > > > > >> > > > the > > > > > >> > > > > > APIs > > > > > >> > > > > > > and making it consistent in all the pushdown APIs. > > > > > >> > > > > > > > > > > > >> > > > > > > Your suggested approach seems fine to me, unless anyone > > > > else > > > > > >> has > > > > > >> > > any > > > > > >> > > > > > other > > > > > >> > > > > > > concerns. Just have couple of clarifying questions: > > > > > >> > > > > > > > > > > > >> > > > > > > 1. Do you think we should standardize the APIs across > > > all > > > > > the > > > > > >> > > > pushdown > > > > > >> > > > > > > supports like SupportsPartitionPushdown, > > > > > >> SupportsDynamicFiltering > > > > > >> > > etc > > > > > >> > > > > in > > > > > >> > > > > > > the end state? > > > > > >> > > > > > > > > > > > >> > > > > > > The current proposal works if we do not want to migrate > > > > > >> > > > > > > > SupportsFilterPushdown to also use > > > > > >> > NestedFieldReferenceExpression > > > > > >> > > > in > > > > > >> > > > > > the > > > > > >> > > > > > > > long term. > > > > > >> > > > > > > > > > > > > >> > > > > > > Did you mean *FieldReferenceExpression* instead of > > > > > >> > > > > > > *NestedFieldReferenceExpression*? > > > > > >> > > > > > > > > > > > >> > > > > > > 2. Extend the FieldReferenceExpression to support > > > > > >> > > > > > > nested > > > > > >> fields. > > > > > >> > > > > > > > - Change the index field type from int to int[]. > > > > > >> > > > > > > > > > > > >> > > > > > > - Add a new method int[] getFieldIndexArray(). > > > > > >> > > > > > > > - Deprecate the int getFieldIndex() method, the > > > code > > > > > >> will > > > > > >> > be > > > > > >> > > > > > removed > > > > > >> > > > > > > in > > > > > >> > > > > > > > the next major version bump. > > > > > >> > > > > > > > > > > > >> > > > > > > I assume getFieldIndex would return fieldIndexArray[0], > > > > > right? > > > > > >> > > > > > > > > > > > >> > > > > > > Thanks > > > > > >> > > > > > > Venkat > > > > > >> > > > > > > > > > > > >> > > > > > > On Fri, Aug 18, 2023 at 4:47 PM Becket Qin < > > > > > >> becket....@gmail.com > > > > > >> > > > > > > > >> > > > > wrote: > > > > > >> > > > > > > > > > > > >> > > > > > > > Thanks for the proposal, Venkata. > > > > > >> > > > > > > > > > > > > >> > > > > > > > The current proposal works if we do not want to > > > migrate > > > > > >> > > > > > > > SupportsFilterPushdown to also use > > > > > >> > NestedFieldReferenceExpression > > > > > >> > > > in > > > > > >> > > > > > the > > > > > >> > > > > > > > long term. > > > > > >> > > > > > > > > > > > > >> > > > > > > Did you mean *FieldReferenceExpression* instead of > > > > > >> > > > > > > *NestedFieldReferenceExpression*? > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > Otherwise, the alternative solution briefly mentioned > > > in > > > > > the > > > > > >> > > > rejected > > > > > >> > > > > > > > alternatives would be the following: > > > > > >> > > > > > > > Phase 1: > > > > > >> > > > > > > > 1. Introduce a supportsNestedFilters() method to the > > > > > >> > > > > > > SupportsFilterPushdown > > > > > >> > > > > > > > interface. (same as current proposal). > > > > > >> > > > > > > > 2. Extend the FieldReferenceExpression to support > > > nested > > > > > >> > fields. > > > > > >> > > > > > > > - Change the index field type from int to int[]. > > > > > >> > > > > > > > > > > > >> > > > > > > - Add a new method int[] getFieldIndexArray(). > > > > > >> > > > > > > > - Deprecate the int getFieldIndex() method, the > > > code > > > > > >> will > > > > > >> > be > > > > > >> > > > > > removed > > > > > >> > > > > > > in > > > > > >> > > > > > > > the next major version bump. > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > 3. In the SupportsProjectionPushDown interface > > > > > >> > > > > > > > - add a new method > > > > > >> > > > > applyProjection(List<FieldReferenceExpression>, > > > > > >> > > > > > > > DataType), with default implementation invoking > > > > > >> > > > > > applyProjection(int[][], > > > > > >> > > > > > > > DataType) > > > > > >> > > > > > > > - deprecate the current applyProjection(int[][], > > > > > >> DataType) > > > > > >> > > > method > > > > > >> > > > > > > > > > > > > >> > > > > > > > Phase 2 (in the next major version bump) > > > > > >> > > > > > > > 1. remove the deprecated methods. > > > > > >> > > > > > > > > > > > > >> > > > > > > > Phase 3 (optional) > > > > > >> > > > > > > > 1. deprecate and remove the supportsNestedFilters() / > > > > > >> > > > > > > > supportsNestedProjection() methods from the > > > > > >> > > SupportsFilterPushDown > > > > > >> > > > / > > > > > >> > > > > > > > SupportsProjectionPushDown interfaces. > > > > > >> > > > > > > > > > > > > >> > > > > > > > Personally I prefer this alternative. It takes longer > > > to > > > > > >> finish > > > > > >> > > the > > > > > >> > > > > > work, > > > > > >> > > > > > > > but the API eventually becomes clean and consistent. > > > > But I > > > > > >> can > > > > > >> > > live > > > > > >> > > > > > with > > > > > >> > > > > > > > the current proposal. > > > > > >> > > > > > > > > > > > > >> > > > > > > > Thanks, > > > > > >> > > > > > > > > > > > > >> > > > > > > > Jiangjie (Becket) Qin > > > > > >> > > > > > > > > > > > > >> > > > > > > > On Sat, Aug 19, 2023 at 12:09 AM Venkatakrishnan > > > > > Sowrirajan > > > > > >> < > > > > > >> > > > > > > > vsowr...@asu.edu> wrote: > > > > > >> > > > > > > > > > > > > >> > > > > > > > > Gentle ping for reviews/feedback. > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > On Tue, Aug 15, 2023, 5:37 PM Venkatakrishnan > > > > > Sowrirajan < > > > > > >> > > > > > > > vsowr...@asu.edu > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > wrote: > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > Hi All, > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > I am opening this thread to discuss FLIP-356: > > > > Support > > > > > >> > Nested > > > > > >> > > > > Fields > > > > > >> > > > > > > > > > Filter Pushdown. The FLIP can be found at > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$ > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > This FLIP adds support for pushing down nested > > > > fields > > > > > >> > filters > > > > > >> > > > to > > > > > >> > > > > > the > > > > > >> > > > > > > > > > underlying TableSource. In our data lake, we find > > > a > > > > > lot > > > > > >> of > > > > > >> > > > > datasets > > > > > >> > > > > > > > have > > > > > >> > > > > > > > > > nested fields and also user queries with filters > > > > > >> defined on > > > > > >> > > the > > > > > >> > > > > > > nested > > > > > >> > > > > > > > > > fields. This would drastically improve the > > > > performance > > > > > >> for > > > > > >> > > > those > > > > > >> > > > > > sets > > > > > >> > > > > > > > of > > > > > >> > > > > > > > > > queries. > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > Appreciate any comments or feedback you may have > > > on > > > > > this > > > > > >> > > > > proposal. > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > Regards > > > > > >> > > > > > > > > > Venkata krishnan > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > >