Thanks for the reply, Jark.

I think it will be helpful to understand the final state we want to
eventually achieve first, then we can discuss the steps towards that final
state.

It looks like there are two proposed end states now:

1. Have a separate NestedFieldReferenceExpression class; keep
SupportsFilterPushDown and SupportsProjectionPushDown the same. It is just
a one step change.
   - Regarding the supportsNestedFilterPushDown() method, if our contract
with the connector developer today is "The implementation should ignore
unrecognized expressions by putting them into the remaining filters,
instead of throwing exceptions". Then there is no need for this method. I
am not sure about the current contract. We should probably make it clear in
the interface Java doc.

2. Extend the existing FiledReferenceExpression class to support nested
fields; SupportsFilterPushDown only has one method of
applyFilters(List<ResolvedExpression>); SupportsProjectionPushDown only has
one method of applyProjections(List<FieldReferenceExpression>, DataType).
It could just be two steps if we are not too obsessed with the exact names
of "applyFilters" and "applyProjections". More specifically, it takes two
steps to achieve this final state:
    a. introduce a new method tryApplyFilters(List<ResolvedExpression>) to
SupportsFilterPushDown, which may have FiledReferenceExpression with nested
fields. The default implementation throws an exception. The runtime will
first call tryApplyFilters() with nested fields. In case of exception, it
calls the existing applyFilters() without including the nested filters.
Similarly, in SupportsProjectionPushDown, introduce a
tryApplyProjections<List<NestedFieldReference> method returning a Result.
The Result also contains the accepted and unapplicable projections. The
default implementation also throws an exception. Deprecate all the other
methods except tryApplyFilters() and tryApplyProjections().
    b. remove the deprecated methods in the next major version bump.

Now the question is putting the migration steps aside, which end state do
we prefer? While the first end state is acceptable for me, personally, I
prefer the latter if we are designing from scratch. It is clean, consistent
and intuitive. Given the size of Flink, keeping APIs in the same style over
time is important. The migration is also not that complicated.

Thanks,

Jiangjie (Becket) Qin


On Tue, Aug 22, 2023 at 2:23 PM Jark Wu <imj...@gmail.com> wrote:

> Hi Venkat,
>
> Thanks for the proposal.
>
> I have some minor comments about the FLIP.
>
> 1. I think we don't need to
> add SupportsFilterPushDown#supportsNestedFilters() method,
> because connectors can skip nested filters by putting them in
> Result#remainingFilters().
> And this is backward-compatible because unknown expressions were added to
> the remaining filters.
> Planner should push predicate expressions as more as possible. If we add a
> flag for each new filter,
> the interface will be filled with lots of flags (e.g., supportsBetween,
> supportsIN).
>
> 2. NestedFieldReferenceExpression#nestedFieldName should be an array of
> field names?
> Each string represents a field name part of the field path. Just keep
> aligning with `nestedFieldIndexArray`.
>
> 3. My concern about making FieldReferenceExpression support nested fields
> is the compatibility.
> It is a public API and users/connectors are already using it. People
> assumed it is a top-level column
> reference, and applied logic on it. But that's not true now and this may
> lead to unexpected errors.
> Having a separate NestedFieldReferenceExpression sounds safer to me. Mixing
> them in a class may
>  confuse users what's the meaning of getFieldName() and getFieldIndex().
>
>
> Regarding using NestedFieldReferenceExpression in
> SupportsProjectionPushDown, do you
> have any concerns @Timo Walther <twal...@apache.org> ?
>
> Best,
> Jark
>
>
>
> On Tue, 22 Aug 2023 at 05:55, Venkatakrishnan Sowrirajan <vsowr...@asu.edu
> >
> wrote:
>
> > Sounds like a great suggestion, Becket. +1. Agree with cleaning up the
> APIs
> > and making it consistent in all the pushdown APIs.
> >
> > Your suggested approach seems fine to me, unless anyone else has any
> other
> > concerns. Just have couple of clarifying questions:
> >
> > 1. Do you think we should standardize the APIs across all the pushdown
> > supports like SupportsPartitionPushdown, SupportsDynamicFiltering etc in
> > the end state?
> >
> > The current proposal works if we do not want to migrate
> > > SupportsFilterPushdown to also use NestedFieldReferenceExpression in
> the
> > > long term.
> > >
> > Did you mean *FieldReferenceExpression* instead of
> > *NestedFieldReferenceExpression*?
> >
> > 2. Extend the FieldReferenceExpression to support nested fields.
> > >     - Change the index field type from int to int[].
> >
> >     - Add a new method int[] getFieldIndexArray().
> > >     - Deprecate the int getFieldIndex() method, the code will be
> removed
> > in
> > > the next major version bump.
> >
> > I assume getFieldIndex would return fieldIndexArray[0], right?
> >
> > Thanks
> > Venkat
> >
> > On Fri, Aug 18, 2023 at 4:47 PM Becket Qin <becket....@gmail.com> wrote:
> >
> > > Thanks for the proposal, Venkata.
> > >
> > > The current proposal works if we do not want to migrate
> > > SupportsFilterPushdown to also use NestedFieldReferenceExpression in
> the
> > > long term.
> > >
> > Did you mean *FieldReferenceExpression* instead of
> > *NestedFieldReferenceExpression*?
> >
> > >
> > > Otherwise, the alternative solution briefly mentioned in the rejected
> > > alternatives would be the following:
> > > Phase 1:
> > > 1. Introduce a supportsNestedFilters() method to the
> > SupportsFilterPushdown
> > > interface. (same as current proposal).
> > > 2. Extend the FieldReferenceExpression to support nested fields.
> > >     - Change the index field type from int to int[].
> >
> >     - Add a new method int[] getFieldIndexArray().
> > >     - Deprecate the int getFieldIndex() method, the code will be
> removed
> > in
> > > the next major version bump.
> >
> >
> >
> > 3. In the SupportsProjectionPushDown interface
> > >     - add a new method applyProjection(List<FieldReferenceExpression>,
> > > DataType), with default implementation invoking
> applyProjection(int[][],
> > > DataType)
> > >     - deprecate the current applyProjection(int[][], DataType) method
> > >
> > > Phase 2 (in the next major version bump)
> > > 1. remove the deprecated methods.
> > >
> > > Phase 3 (optional)
> > > 1. deprecate and remove the supportsNestedFilters() /
> > > supportsNestedProjection() methods from the SupportsFilterPushDown /
> > > SupportsProjectionPushDown interfaces.
> > >
> > > Personally I prefer this alternative. It takes longer to finish the
> work,
> > > but the API eventually becomes clean and consistent. But I can live
> with
> > > the current proposal.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Sat, Aug 19, 2023 at 12:09 AM Venkatakrishnan Sowrirajan <
> > > vsowr...@asu.edu> wrote:
> > >
> > > > Gentle ping for reviews/feedback.
> > > >
> > > > On Tue, Aug 15, 2023, 5:37 PM Venkatakrishnan Sowrirajan <
> > > vsowr...@asu.edu
> > > > >
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I am opening this thread to discuss FLIP-356: Support Nested Fields
> > > > > Filter Pushdown. The FLIP can be found at
> > > > >
> > > >
> > >
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$
> > > > >
> > > > > This FLIP adds support for pushing down nested fields filters to
> the
> > > > > underlying TableSource. In our data lake, we find a lot of datasets
> > > have
> > > > > nested fields and also user queries with filters defined on the
> > nested
> > > > > fields. This would drastically improve the performance for those
> sets
> > > of
> > > > > queries.
> > > > >
> > > > > Appreciate any comments or feedback you may have on this proposal.
> > > > >
> > > > > Regards
> > > > > Venkata krishnan
> > > > >
> > > >
> > >
> >
>

Reply via email to