So if NestedFieldReferenceExpression doesn't need inputIndex, is there
a need to introduce a base class `ReferenceExpression`?
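
To make the question concrete, here is a rough sketch of the hierarchy as I understand the proposal. It is only an illustration under my own assumptions: the classes are simplified stand-ins rather than Flink's real ones, and the field names, constructors and the describe() helper are hypothetical. If only FieldReferenceExpression carries inputIndex, the base class ends up sharing little more than a name:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: simplified stand-ins, not Flink's actual classes.
class ReferenceSketch {

    /** Hypothetical base class: holds only what both kinds of reference share. */
    abstract static class ReferenceExpression {
        private final String name;

        ReferenceExpression(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    /** Top-level column reference; inputIndex only matters for multi-input operators such as joins. */
    static final class FieldReferenceExpression extends ReferenceExpression {
        final int inputIndex;
        final int fieldIndex;

        FieldReferenceExpression(String name, int inputIndex, int fieldIndex) {
            super(name);
            this.inputIndex = inputIndex;
            this.fieldIndex = fieldIndex;
        }
    }

    /** Nested reference: a path of names and indices, with no inputIndex (assumption). */
    static final class NestedFieldReferenceExpression extends ReferenceExpression {
        final String[] fieldNames;
        final int[] fieldIndices;

        NestedFieldReferenceExpression(String[] fieldNames, int[] fieldIndices) {
            super(String.join(".", fieldNames));
            this.fieldNames = fieldNames.clone();
            this.fieldIndices = fieldIndices.clone();
        }
    }

    /** A consumer that accepts both kinds of reference through the shared base type. */
    static String describe(ReferenceExpression ref) {
        if (ref instanceof FieldReferenceExpression) {
            FieldReferenceExpression f = (FieldReferenceExpression) ref;
            return "field " + f.getName() + " (input " + f.inputIndex + ", index " + f.fieldIndex + ")";
        }
        if (ref instanceof NestedFieldReferenceExpression) {
            NestedFieldReferenceExpression n = (NestedFieldReferenceExpression) ref;
            return "nested " + n.getName() + " (path " + Arrays.toString(n.fieldIndices) + ")";
        }
        return "unknown " + ref.getName();
    }

    public static void main(String[] args) {
        List<ReferenceExpression> refs = Arrays.asList(
                new FieldReferenceExpression("col", 0, 2),
                new NestedFieldReferenceExpression(new String[] {"col", "nested"}, new int[] {2, 0}));
        for (ReferenceExpression ref : refs) {
            System.out.println(describe(ref));
        }
    }
}
```

In this sketch the only benefit of the base class is that a method can take a List<ReferenceExpression> containing both kinds of reference.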

Best,
Jingsong

On Mon, Aug 28, 2023 at 2:09 PM Jingsong Li <jingsongl...@gmail.com> wrote:
>
> Hi, thanks all for the discussion.
>
> What is inputIndex in NestedFieldReferenceExpression?
>
> I know inputIndex has a special use in FieldReferenceExpression, but
> it is only needed for join operators, and only for SQL optimization.
> There seems to be no such requirement for nested fields.
>
> Best,
> Jingsong
>
> On Mon, Aug 28, 2023 at 1:13 PM Venkatakrishnan Sowrirajan
> <vsowr...@asu.edu> wrote:
> >
> > Thanks for all the feedback and discussion everyone. Looks like we have
> > reached a consensus here.
> >
> > Just to summarize:
> >
> > 1. Introduce a new *ReferenceExpression* (or *BaseReferenceExpression*)
> > abstract class which will be extended by both *FieldReferenceExpression*
> > and *NestedFieldReferenceExpression* (to be introduced as part of this FLIP).
> > 2. No need for a *supportsNestedFilters* check, as the current
> > *SupportsFilterPushDown* should already ignore unknown expressions
> > (*NestedFieldReferenceExpression*, for example) and return them as
> > *remainingFilters*. Maybe this should be clarified explicitly in the
> > Javadoc of *SupportsFilterPushDown*. I will file a separate JIRA to fix
> > the documentation.
> > 3. Refactor *SupportsProjectionPushDown* to use *ReferenceExpression*
> > instead of the existing 2-d arrays, to consolidate and be consistent with
> > the other Supports*PushDown APIs - *outside the scope of this FLIP*.
> > 4. Similarly, *SupportsAggregatePushDown* should also be evolved to use
> > *ReferenceExpression* whenever nested fields support is added - *outside
> > the scope of this FLIP*.
> >
> > Does this sound good? Please let me know if I have missed anything here. If
> > there are no concerns, I will start a vote tomorrow. I will also get the
> > FLIP-356 wiki updated. Thanks everyone once again!
> >
> > Regards
> > Venkata krishnan
> >
> >
> > On Thu, Aug 24, 2023 at 8:19 PM Becket Qin <becket....@gmail.com> wrote:
> >
> > > Hi Jark,
> > >
> > > > How about having a separate NestedFieldReferenceExpression, and
> > > > abstracting a common base class "ReferenceExpression" for
> > > > NestedFieldReferenceExpression and FieldReferenceExpression? This makes
> > > > unifying expressions in
> > > > "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression>
> > > > ...)"
> > > > possible.
> > >
> > >
> > > I'd be fine with this. It at least provides a consistent API style /
> > > formality.
> > >
> > >  Re: Yunhong,
> > >
> > > > 3. Finally, I think we need to look at the costs and benefits of unifying
> > > > the SupportsFilterPushDown and SupportsProjectionPushDown (or others) from
> > > > the perspective of interface implementers. A stable API can reduce user
> > > > development and change costs, if the current API can fully meet the
> > > > functional requirements at the framework level, I personal suggest reducing
> > > > the impact on connector developers.
> > > >
> > >
> > > I agree that the cost and benefit should be measured. And the measurement
> > > should be in the long term instead of short term. That is why we always
> > > need to align on the ideal end state first.
> > > Meeting functionality requirements is the bare minimum bar for an API.
> > > Simplicity, intuitiveness, robustness and evolvability are also important.
> > > In addition, for projects with many APIs, such as Flink, a consistent API
> > > style is also critical for the user adoption as well as bug avoidance. It
> > > is very helpful for the community to agree on some API design conventions /
> > > principles.
> > > For example, in this particular case, via our discussion, hopefully we sort
> > > of established the following API design conventions / principles for all
> > > the Supports*PushDown interfaces.
> > >
> > > 1. By default, expressions should be used if applicable instead of other
> > > representations.
> > > 2. In general, the pushdown method should not assume all the pushdowns will
> > > succeed. So the applyX() method should return a boolean or List<X>, to
> > > handle the cases that some of the pushdowns cannot be fulfilled by the
> > > implementation.
> > >
> > > Establishing such conventions and principles demands careful thinking for
> > > the aspects I mentioned earlier in addition to the API functionalities.
> > > This helps lower the bar of understanding, reduces the chance of having
> > > loose ends in the API, and will benefit all the participants in the project
> > > over time. I think this is the right way to achieve real API stability.
> > > Otherwise, we may end up chasing our tails to find ways not to change the
> > > existing non-ideal APIs.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Fri, Aug 25, 2023 at 9:33 AM yh z <zhengyunhon...@gmail.com> wrote:
> > >
> > > > Hi, Venkat,
> > > >
> > > > Thanks for the FLIP, it sounds good to support nested fields filter
> > > > pushdown. Based on the design of the FLIP and the above options, I would
> > > > like to make a few suggestions:
> > > >
> > > > 1. At present, introducing NestedFieldReferenceExpression looks like a
> > > > better solution, which can fully meet our requirements while reducing
> > > > modifications to the base class FieldReferenceExpression. In the long run, I
> > > > tend to abstract a base class for NestedFieldReferenceExpression and
> > > > FieldReferenceExpression, as you suggested.
> > > >
> > > > 2. Personally, I don't recommend introducing *supportsNestedFilters()* in
> > > > SupportsFilterPushDown. We just need to better declare the return value
> > > > of the method *applyFilters*.
> > > >
> > > > 3. Finally, I think we need to look at the costs and benefits of unifying
> > > > the SupportsFilterPushDown and SupportsProjectionPushDown (or others) from
> > > > the perspective of interface implementers. A stable API can reduce user
> > > > development and change costs. If the current API can fully meet the
> > > > functional requirements at the framework level, I personally suggest
> > > > reducing the impact on connector developers.
> > > >
> > > > Regards,
> > > > Yunhong Zheng (Swuferhong)
> > > >
> > > >
> > > > On Fri, Aug 25, 2023 at 1:25 AM Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote:
> > > >
> > > > > To keep it backwards compatible, introduce another *applyAggregates*
> > > > > API taking *List<ReferenceExpression>* when nested field support is
> > > > > added, and deprecate the current API. The new method will throw an
> > > > > exception by default. The Flink planner calls *applyAggregates* with
> > > > > nested fields first and, if it throws an exception, falls back to
> > > > > *applyAggregates* without nested fields.
> > > > >
> > > > > Regards
> > > > > Venkata krishnan
> > > > >
> > > > >
> > > > > On Thu, Aug 24, 2023 at 10:13 AM Venkatakrishnan Sowrirajan <
> > > > > vsowr...@asu.edu> wrote:
> > > > >
> > > > > > Jark,
> > > > > >
> > > > > >> How about having a separate NestedFieldReferenceExpression, and
> > > > > >> abstracting a common base class "ReferenceExpression" for
> > > > > >> NestedFieldReferenceExpression and FieldReferenceExpression? This
> > > > > >> makes unifying expressions in
> > > > > >> "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression>
> > > > > >> ...)"
> > > > > >> possible.
> > > > > >
> > > > > > This should be fine for *SupportsProjectionPushDown* and
> > > > > > *SupportsFilterPushDown*. One concern is *SupportsAggregatePushDown*
> > > > > > with nested fields support (to be added in the future): with this
> > > > > > proposal, the API will become backwards incompatible, as the *args*
> > > > > > of the aggregate function are *List<FieldReferenceExpression>*,
> > > > > > which would need to change to *List<ReferenceExpression>*.
> > > > > >
> > > > > > Regards
> > > > > > Venkata krishnan
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 24, 2023 at 1:18 AM Jark Wu <imj...@gmail.com> wrote:
> > > > > >
> > > > > >> Hi Becket,
> > > > > >>
> > > > > >> I think it is the second case, that a FieldReferenceExpression is
> > > > > >> constructed by the framework and passed to the connector (interfaces
> > > > > >> listed by Venkata[1] and Catalog#listPartitionsByFilter). Besides,
> > > > > >> understanding the nested field is optional for users/connectors (just
> > > > > >> treat it as an unknown expression if the connector doesn't want to
> > > > > >> support it).
> > > > > >>
> > > > > >> If we extend FieldReferenceExpression, in the case of "where
> > > > > >> col.nested > 10", connectors that already support filter/delete
> > > > > >> pushdown may wrongly push down "col > 10" instead of "nested > 10",
> > > > > >> because they still treat FieldReferenceExpression as a top-level
> > > > > >> column. This problem can be resolved by introducing an additional
> > > > > >> "supportedNestedPushdown" for each interface, but that method is not
> > > > > >> elegant and is hard to remove in the future, and this could be avoided
> > > > > >> if we have a separate NestedFieldReferenceExpression.
> > > > > >>
> > > > > >> If we want to extend FieldReferenceExpression, we have to add
> > > > > >> protections for every related API in one shot. Besides,
> > > > > >> FieldReferenceExpression is a fundamental class in the planner, so we
> > > > > >> have to go through all the code that uses it to make sure it properly
> > > > > >> handles the nested-field case, which is a big effort for the community.
> > > > > >>
> > > > > >> If we were designing this API on day 1, I would fully support merging
> > > > > >> them in a single FieldReferenceExpression. But in this case, I'm
> > > > > >> thinking about how to provide users with a smooth migration path, and
> > > > > >> allow the community to gradually put effort into evolving the API,
> > > > > >> without blocking the "Nested Fields Filter Pushdown" requirement.
> > > > > >>
> > > > > >> How about having a separate NestedFieldReferenceExpression, and
> > > > > >> abstracting a common base class "ReferenceExpression" for
> > > > > >> NestedFieldReferenceExpression and FieldReferenceExpression? This
> > > > > >> makes unifying expressions in
> > > > > >> "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression>
> > > > > >> ...)"
> > > > > >> possible.
> > > > > >>
> > > > > >> Best,
> > > > > >> Jark
> > > > > >>
> > > > > >> On Thu, 24 Aug 2023 at 07:00, Venkatakrishnan Sowrirajan <
> > > > > >> vsowr...@asu.edu>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Becket and Jark,
> > > > > >> >
> > > > > >> > > Deprecate all the other
> > > > > >> > > methods except tryApplyFilters() and tryApplyProjections().
> > > > > >> >
> > > > > >> > For *SupportsProjectionPushDown*, we still need a
> > > > > >> > *supportsNestedProjections* API on the table source, as some of the
> > > > > >> > table sources might not be able to handle nested fields and therefore
> > > > > >> > the Flink planner should not push down the nested projections.
> > > > > >> > Otherwise, the *applyProjection* API has to be changed to return
> > > > > >> > *unconvertibleProjections*, similar to *SupportsFilterPushDown*.
> > > > > >> >
> > > > > >> > > Or we have to introduce two different applyProjections()
> > > > > >> > > methods for FieldReferenceExpression / NestedFieldReferenceExpression
> > > > > >> > > respectively.
> > > > > >> >
> > > > > >> > Agree this is not preferred. Given that *supportsNestedProjections*
> > > > > >> > cannot be deprecated/removed based on the current API form, extending
> > > > > >> > *FieldReferenceExpression* to support nested fields should be okay.
> > > > > >> >
> > > > > >> > Another alternative could be to change *applyProjections* to take
> > > > > >> > List<ResolvedExpression>, with connectors choosing to handle
> > > > > >> > *FieldReferenceExpression* and *NestedFieldReferenceExpression* as
> > > > > >> > applicable and returning the remainingProjections. If nested field
> > > > > >> > projections are not supported, the connector should return them back
> > > > > >> > and only project the top-level fields. IMO, this is also *not
> > > > > >> > preferred*.
> > > > > >> >
> > > > > >> > *SupportsAggregatePushDown*
> > > > > >> >
> > > > > >> > *AggregateExpression* currently takes a list of
> > > > > >> > *FieldReferenceExpression* as args for the aggregate function. If, in
> > > > > >> > the future, *SupportsAggregatePushDown* adds support for aggregate
> > > > > >> > pushdown on nested fields, then the AggregateExpression API also has
> > > > > >> > to change if a new NestedFieldReferenceExpression is introduced for
> > > > > >> > nested fields.
> > > > > >> >
> > > > > >> > > If we add a
> > > > > >> > > flag for each new filter,
> > > > > >> > > the interface will be filled with lots of flags (e.g.,
> > > > > >> > > supportsBetween, supportsIN)
> > > > > >> >
> > > > > >> > In an ideal situation, I completely agree with you. But in the
> > > > > >> > current state, *supportsNestedFilters* can act as a bridge to reach
> > > > > >> > the eventual desired state, which is to have a clean and consistent
> > > > > >> > set of APIs throughout all Supports*PushDown.
> > > > > >> >
> > > > > >> > Also shared some thoughts on the end state API
> > > > > >> > <https://urldefense.com/v3/__https://docs.google.com/document/d/1stLRPKOcxlEv8eHblkrOh0Zf5PLM-h76WMhEINHOyPY/edit?usp=sharing__;!!IKRxdwAv5BmarQ!ZZ2nS1PYlXLnEGFcikS3NsYG7tMaV3wU_z7FmvihNwQBmoLZk2WmcpuRWszK0FFmsInh9A6cndkJrQ$>
> > > > > >> > with extension to the *FieldReferenceExpression* to support nested
> > > > > >> > fields. Please take a look.
> > > > > >> >
> > > > > >> > Regards
> > > > > >> > Venkata krishnan
> > > > > >> >
> > > > > >> > On Tue, Aug 22, 2023 at 5:02 PM Becket Qin <becket....@gmail.com>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > Hi Jark,
> > > > > >> > >
> > > > > >> > > Regarding the migration path, it would be useful to scrutinize the
> > > > > >> > > use cases of FieldReferenceExpression and ResolvedExpressions. There
> > > > > >> > > are two kinds of use cases:
> > > > > >> > >
> > > > > >> > > 1. A ResolvedExpression is constructed by the user or connector /
> > > > > >> > > plugin developers.
> > > > > >> > > 2. A ResolvedExpression is constructed by the framework and passed to
> > > > > >> > > user or connector / plugin developers.
> > > > > >> > >
> > > > > >> > > For the first case, both of the approaches provide the same migration
> > > > > >> > > experience.
> > > > > >> > >
> > > > > >> > > For the second case, generally speaking, introducing
> > > > > >> > > NestedFieldReferenceExpression and extending FieldReferenceExpression
> > > > > >> > > would have the same impact for backwards compatibility.
> > > > > >> > > SupportsFilterPushDown is a special case here because understanding
> > > > > >> > > the filter expressions is optional for the source implementation. In
> > > > > >> > > other use cases, if understanding the reference to a nested field is
> > > > > >> > > a must have, the user code has to be changed, regardless of which
> > > > > >> > > approach we take to support nested fields.
> > > > > >> > >
> > > > > >> > > Therefore, I think we have to check each public API where the nested
> > > > > >> > > field reference is exposed. If we have many public APIs where
> > > > > >> > > understanding nested fields is optional for the user / plugin /
> > > > > >> > > connector developers, having a separate NestedFieldReferenceExpression
> > > > > >> > > would give a smoother migration. Otherwise, there seems to be no
> > > > > >> > > difference between the two approaches.
> > > > > >> > >
> > > > > >> > > Migration path aside, the main reason I prefer extending
> > > > > >> > > FieldReferenceExpression over a new NestedFieldReferenceExpression is
> > > > > >> > > because this makes the SupportsProjectionPushDown interface simpler.
> > > > > >> > > Otherwise, we have to treat it as a special case that does not match
> > > > > >> > > the overall API style. Or we have to introduce two different
> > > > > >> > > applyProjections() methods for FieldReferenceExpression /
> > > > > >> > > NestedFieldReferenceExpression respectively. This issue further
> > > > > >> > > extends to implementation in addition to public API. A single
> > > > > >> > > FieldReferenceExpression might help simplify the implementation code
> > > > > >> > > a little bit. For example, in a recursive processing of a row with
> > > > > >> > > nested rows, we may not need to switch between
> > > > > >> > > FieldReferenceExpression and NestedFieldReferenceExpression depending
> > > > > >> > > on whether the record being processed is a top level record or nested
> > > > > >> > > record.
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > >
> > > > > >> > > Jiangjie (Becket) Qin
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Tue, Aug 22, 2023 at 11:43 PM Jark Wu <imj...@gmail.com>
> > > > wrote:
> > > > > >> > >
> > > > > >> > > > Hi Becket,
> > > > > >> > > >
> > > > > >> > > > I totally agree we should try to have a consistent API for a final
> > > > > >> > > > state.
> > > > > >> > > > The only concern I have mentioned is the "smooth" migration path.
> > > > > >> > > > The FieldReferenceExpression is widely used in many public APIs,
> > > > > >> > > > not only in the SupportsFilterPushDown. Yes, we can change every
> > > > > >> > > > method in 2 steps, but is it good to change the API back and forth
> > > > > >> > > > for this?
> > > > > >> > > > Personally, I'm fine with a separate NestedFieldReferenceExpression
> > > > > >> > > > class.
> > > > > >> > > > TBH, I prefer the separated way because it makes the reference
> > > > > >> > > > expression more clear and concise.
> > > > > >> > > >
> > > > > >> > > > Best,
> > > > > >> > > > Jark
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > On Tue, 22 Aug 2023 at 16:53, Becket Qin <becket....@gmail.com> wrote:
> > > > > >> > > >
> > > > > >> > > > > Thanks for the reply, Jark.
> > > > > >> > > > >
> > > > > >> > > > > I think it will be helpful to understand the final state we want
> > > > > >> > > > > to eventually achieve first, then we can discuss the steps towards
> > > > > >> > > > > that final state.
> > > > > >> > > > >
> > > > > >> > > > > It looks like there are two proposed end states now:
> > > > > >> > > > >
> > > > > >> > > > > 1. Have a separate NestedFieldReferenceExpression class; keep
> > > > > >> > > > > SupportsFilterPushDown and SupportsProjectionPushDown the same. It
> > > > > >> > > > > is just a one step change.
> > > > > >> > > > >    - Regarding the supportsNestedFilterPushDown() method, if our
> > > > > >> > > > > contract with the connector developer today is "The implementation
> > > > > >> > > > > should ignore unrecognized expressions by putting them into the
> > > > > >> > > > > remaining filters, instead of throwing exceptions", then there is
> > > > > >> > > > > no need for this method. I am not sure about the current contract.
> > > > > >> > > > > We should probably make it clear in the interface Java doc.
> > > > > >> > > > >
> > > > > >> > > > > 2. Extend the existing FieldReferenceExpression class to support
> > > > > >> > > > > nested fields; SupportsFilterPushDown only has one method of
> > > > > >> > > > > applyFilters(List<ResolvedExpression>); SupportsProjectionPushDown
> > > > > >> > > > > only has one method of
> > > > > >> > > > > applyProjections(List<FieldReferenceExpression>, DataType).
> > > > > >> > > > > It could just be two steps if we are not too obsessed with the
> > > > > >> > > > > exact names of "applyFilters" and "applyProjections". More
> > > > > >> > > > > specifically, it takes two steps to achieve this final state:
> > > > > >> > > > >     a. introduce a new method
> > > > > >> > > > > tryApplyFilters(List<ResolvedExpression>) to
> > > > > >> > > > > SupportsFilterPushDown, which may have FieldReferenceExpression
> > > > > >> > > > > with nested fields. The default implementation throws an
> > > > > >> > > > > exception. The runtime will first call tryApplyFilters() with
> > > > > >> > > > > nested fields. In case of exception, it calls the existing
> > > > > >> > > > > applyFilters() without including the nested filters.
> > > > > >> > > > > Similarly, in SupportsProjectionPushDown, introduce a
> > > > > >> > > > > tryApplyProjections(List<NestedFieldReference>) method returning a
> > > > > >> > > > > Result. The Result also contains the accepted and inapplicable
> > > > > >> > > > > projections. The default implementation also throws an exception.
> > > > > >> > > > > Deprecate all the other methods except tryApplyFilters() and
> > > > > >> > > > > tryApplyProjections().
> > > > > >> > > > >     b. remove the deprecated methods in the next major version
> > > > > >> > > > > bump.
> > > > > >> > > > >
> > > > > >> > > > > Now the question is, putting the migration steps aside, which end
> > > > > >> > > > > state do we prefer? While the first end state is acceptable for
> > > > > >> > > > > me, personally, I prefer the latter if we are designing from
> > > > > >> > > > > scratch. It is clean, consistent and intuitive. Given the size of
> > > > > >> > > > > Flink, keeping APIs in the same style over time is important. The
> > > > > >> > > > > migration is also not that complicated.
> > > > > >> > > > >
> > > > > >> > > > > Thanks,
> > > > > >> > > > >
> > > > > >> > > > > Jiangjie (Becket) Qin
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > On Tue, Aug 22, 2023 at 2:23 PM Jark Wu <imj...@gmail.com>
> > > > > wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Hi Venkat,
> > > > > >> > > > > >
> > > > > >> > > > > > Thanks for the proposal.
> > > > > >> > > > > >
> > > > > >> > > > > > I have some minor comments about the FLIP.
> > > > > >> > > > > >
> > > > > >> > > > > > 1. I think we don't need to add the
> > > > > >> > > > > > SupportsFilterPushDown#supportsNestedFilters() method,
> > > > > >> > > > > > because connectors can skip nested filters by putting them in
> > > > > >> > > > > > Result#remainingFilters().
> > > > > >> > > > > > And this is backward-compatible because unknown expressions
> > > > > >> > > > > > were added to the remaining filters.
> > > > > >> > > > > > The planner should push down as many predicate expressions as
> > > > > >> > > > > > possible. If we add a flag for each new filter,
> > > > > >> > > > > > the interface will be filled with lots of flags (e.g.,
> > > > > >> > > > > > supportsBetween, supportsIN).
> > > > > >> > > > > >
> > > > > >> > > > > > 2. Should NestedFieldReferenceExpression#nestedFieldName be an
> > > > > >> > > > > > array of field names?
> > > > > >> > > > > > Each string represents a field name part of the field path.
> > > > > >> > > > > > This keeps it aligned with `nestedFieldIndexArray`.
> > > > > >> > > > > >
> > > > > >> > > > > > 3. My concern about making FieldReferenceExpression support
> > > > > >> > > > > > nested fields is the compatibility.
> > > > > >> > > > > > It is a public API and users/connectors are already using it.
> > > > > >> > > > > > People assumed it is a top-level column
> > > > > >> > > > > > reference, and applied logic on it. But that's not true now and
> > > > > >> > > > > > this may lead to unexpected errors.
> > > > > >> > > > > > Having a separate NestedFieldReferenceExpression sounds safer
> > > > > >> > > > > > to me. Mixing them in a class may
> > > > > >> > > > > > confuse users about the meaning of getFieldName() and
> > > > > >> > > > > > getFieldIndex().
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > Regarding using NestedFieldReferenceExpression in
> > > > > >> > > > > > SupportsProjectionPushDown, do you
> > > > > >> > > > > > have any concerns @Timo Walther <twal...@apache.org> ?
> > > > > >> > > > > >
> > > > > >> > > > > > Best,
> > > > > >> > > > > > Jark
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > On Tue, 22 Aug 2023 at 05:55, Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > Sounds like a great suggestion, Becket. +1. Agree with
> > > > > >> > > > > > > cleaning up the APIs and making them consistent across all
> > > > > >> > > > > > > the pushdown APIs.
> > > > > >> > > > > > >
> > > > > >> > > > > > > Your suggested approach seems fine to me, unless anyone else
> > > > > >> > > > > > > has any other concerns. Just have a couple of clarifying
> > > > > >> > > > > > > questions:
> > > > > >> > > > > > >
> > > > > >> > > > > > > 1. Do you think we should standardize the APIs across all
> > > > > >> > > > > > > the pushdown supports like SupportsPartitionPushdown,
> > > > > >> > > > > > > SupportsDynamicFiltering etc. in the end state?
> > > > > >> > > > > > >
> > > > > >> > > > > > > > The current proposal works if we do not want to migrate
> > > > > >> > > > > > > > SupportsFilterPushdown to also use
> > > > > >> > > > > > > > NestedFieldReferenceExpression in the long term.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > Did you mean *FieldReferenceExpression* instead of
> > > > > >> > > > > > > *NestedFieldReferenceExpression*?
> > > > > >> > > > > > >
> > > > > >> > > > > > > > 2. Extend the FieldReferenceExpression to support nested
> > > > > >> > > > > > > > fields.
> > > > > >> > > > > > > >     - Change the index field type from int to int[].
> > > > > >> > > > > > > >     - Add a new method int[] getFieldIndexArray().
> > > > > >> > > > > > > >     - Deprecate the int getFieldIndex() method, the code
> > > > > >> > > > > > > > will be removed in the next major version bump.
> > > > > >> > > > > > >
> > > > > >> > > > > > > I assume getFieldIndex would return fieldIndexArray[0], right?
> > > > > >> > > > > > >
> > > > > >> > > > > > > Thanks
> > > > > >> > > > > > > Venkat
> > > > > >> > > > > > >
> > > > > >> > > > > > > On Fri, Aug 18, 2023 at 4:47 PM Becket Qin <becket....@gmail.com> wrote:
> > > > > >> > > > > > >
> > > > > >> > > > > > > > Thanks for the proposal, Venkata.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > The current proposal works if we do not want to migrate
> > > > > >> > > > > > > > SupportsFilterPushdown to also use
> > > > > >> > > > > > > > NestedFieldReferenceExpression in the long term.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Otherwise, the alternative solution briefly mentioned in
> > > > > >> > > > > > > > the rejected alternatives would be the following:
> > > > > >> > > > > > > > Phase 1:
> > > > > >> > > > > > > > 1. Introduce a supportsNestedFilters() method to the
> > > > > >> > > > > > > > SupportsFilterPushdown interface. (same as current
> > > > > >> > > > > > > > proposal).
> > > > > >> > > > > > > > 2. Extend the FieldReferenceExpression to support nested
> > > > > >> > > > > > > > fields.
> > > > > >> > > > > > > >     - Change the index field type from int to int[].
> > > > > >> > > > > > > >     - Add a new method int[] getFieldIndexArray().
> > > > > >> > > > > > > >     - Deprecate the int getFieldIndex() method, the code
> > > > > >> > > > > > > > will be removed in the next major version bump.
> > > > > >> > > > > > > > 3. In the SupportsProjectionPushDown interface
> > > > > >> > > > > > > >     - add a new method
> > > > > >> > > > > > > > applyProjection(List<FieldReferenceExpression>, DataType),
> > > > > >> > > > > > > > with default implementation invoking
> > > > > >> > > > > > > > applyProjection(int[][], DataType)
> > > > > >> > > > > > > >     - deprecate the current applyProjection(int[][],
> > > > > >> > > > > > > > DataType) method
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Phase 2 (in the next major version bump)
> > > > > >> > > > > > > > 1. remove the deprecated methods.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Phase 3 (optional)
> > > > > >> > > > > > > > 1. deprecate and remove the supportsNestedFilters() /
> > > > > >> > > > > > > > supportsNestedProjection() methods from the
> > > > > >> > > > > > > > SupportsFilterPushDown / SupportsProjectionPushDown
> > > > > >> > > > > > > > interfaces.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Personally I prefer this alternative. It takes longer to
> > > > > >> > > > > > > > finish the work, but the API eventually becomes clean and
> > > > > >> > > > > > > > consistent. But I can live with the current proposal.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Thanks,
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Jiangjie (Becket) Qin
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > On Sat, Aug 19, 2023 at 12:09 AM Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote:
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > Gentle ping for reviews/feedback.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > On Tue, Aug 15, 2023, 5:37 PM Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > Hi All,
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > I am opening this thread to discuss FLIP-356: Support
> > > > > >> > > > > > > > > > Nested Fields Filter Pushdown. The FLIP can be found at
> > > > > >> > > > > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > This FLIP adds support for pushing down nested fields
> > > > > >> > > > > > > > > > filters to the underlying TableSource. In our data
> > > > > >> > > > > > > > > > lake, we find a lot of datasets have nested fields and
> > > > > >> > > > > > > > > > also user queries with filters defined on the nested
> > > > > >> > > > > > > > > > fields. This would drastically improve the performance
> > > > > >> > > > > > > > > > for those sets of queries.
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Appreciate any comments or feedback you may have on
> > > > > >> > > > > > > > > > this proposal.
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Regards
> > > > > >> > > > > > > > > > Venkata krishnan
