Gentle ping

On Wed, Aug 16, 2023, 11:56 AM Venkatakrishnan Sowrirajan <vsowr...@asu.edu>
wrote:

> Forgot to share the link -
> https://lists.apache.org/thread/686bhgwrrb4xmbfzlk60szwxos4z64t7 in the
> last email.
>
> Regards
> Venkata krishnan
>
>
> On Wed, Aug 16, 2023 at 11:55 AM Venkatakrishnan Sowrirajan <
> vsowr...@asu.edu> wrote:
>
>> Btw, this is the FLIP proposal discussion thread. Please share your
>> thoughts. Thanks.
>>
>> Regards
>> Venkata krishnan
>>
>>
>> On Sun, Aug 13, 2023 at 6:35 AM liu ron <ron9....@gmail.com> wrote:
>>
>>> Hi, Venkata krishnan
>>>
>>> Thanks for driving this work, look forward to your FLIP.
>>>
>>> Best,
>>> Ron
>>>
>>> Venkatakrishnan Sowrirajan <vsowr...@asu.edu> 于2023年8月13日周日 14:34写道:
>>>
>>> > Thanks Yunhong. That's correct. I am able to make it work locally.
>>> > Currently, in the process of writing a FLIP for the necessary changes
>>> to
>>> > the SupportsFilterPushDown API to support nested fields filter push
>>> down.
>>> >
>>> > Regards
>>> > Venkata krishnan
>>> >
>>> >
>>> > On Mon, Aug 7, 2023 at 8:28 PM yh z <zhengyunhon...@gmail.com> wrote:
>>> >
>>> > > Hi Venkatakrishnan,
>>> > > Sorry for the late reply. I have looked at the code and feel like you
>>> > need
>>> > > to modify the logic of the
>>> > > ExpressionConverter.visit(FieldReferenceExpression expression)
>>> method to
>>> > > support nested types,
>>> > > which are not currently supported in currently code.
>>> > >
>>> > > Regards,
>>> > > Yunhong Zheng (Swuferhong)
>>> > >
>>> > > Venkatakrishnan Sowrirajan <vsowr...@asu.edu> 于2023年8月7日周一 13:30写道:
>>> > >
>>> > > > (Sorry, I pressed send too early)
>>> > > >
>>> > > > Thanks for the help @zhengyunhon...@gmail.com.
>>> > > >
>>> > > > Agree on not changing the API as much as possible as well as wrt
>>> > > > simplifying Projection pushdown with nested fields eventually as
>>> well.
>>> > > >
>>> > > > In terms of the code itself, currently I am trying to leverage the
>>> > > > FieldReferenceExpression to also handle nested fields for filter
>>> push
>>> > > down.
>>> > > > But where I am currently struggling to make progress is, once the
>>> > filters
>>> > > > are pushed to the table source itself, in
>>> > > >
>>> >
>>> PushFilterIntoSourceScanRuleBase#resolveFiltersAndCreateTableSourceTable
>>> > > > there is a conversion from List<ResolvedExpression (in this case
>>> > > > FieldReferenceExpression) to the List<RexNode> itself.
>>> > > >
>>> > > > If you have some pointers for that, please let me know. Thanks.
>>> > > >
>>> > > > Regards
>>> > > > Venkata krishnan
>>> > > >
>>> > > >
>>> > > > On Sun, Aug 6, 2023 at 10:23 PM Venkatakrishnan Sowrirajan <
>>> > > > vsowr...@asu.edu>
>>> > > > wrote:
>>> > > >
>>> > > > > Thanks @zhengyunhon...@gmail.com
>>> > > > > Regards
>>> > > > > Venkata krishnan
>>> > > > >
>>> > > > >
>>> > > > > On Sun, Aug 6, 2023 at 6:16 PM yh z <zhengyunhon...@gmail.com>
>>> > wrote:
>>> > > > >
>>> > > > >> Hi, Venkatakrishnan,
>>> > > > >> I think this is a very useful feature. I have been focusing on
>>> the
>>> > > > >> development of the flink-table-planner module recently, so if
>>> you
>>> > need
>>> > > > >> some
>>> > > > >> help, I can assist you in completing the development of some
>>> > sub-tasks
>>> > > > or
>>> > > > >> code review.
>>> > > > >>
>>> > > > >> Returning to the design itself, I think it's necessary to modify
>>> > > > >> FieldReferenceExpression or re-implement a
>>> > > > NestedFieldReferenceExpression.
>>> > > > >> As for modifying the interface of SupportsProjectionPushDown, I
>>> > think
>>> > > we
>>> > > > >> need to make some trade-offs. As a connector developer, the
>>> > stability
>>> > > of
>>> > > > >> the interface is very important. If there are no unresolved
>>> bugs, I
>>> > > > >> personally do not recommend modifying the interface. However,
>>> when I
>>> > > > first
>>> > > > >> read the code of SupportsProjectionPushDown, the design of
>>> int[][]
>>> > was
>>> > > > >> very
>>> > > > >> confusing for me, and it took me a long time to understand it by
>>> > > running
>>> > > > >> specify UT tests. Therefore, in terms of the design of this
>>> > interface
>>> > > > and
>>> > > > >> the consistency between different interfaces, there is indeed
>>> room
>>> > for
>>> > > > >> improvement it.
>>> > > > >>
>>> > > > >> Thanks,
>>> > > > >> Yunhong Zheng (Swuferhong)
>>> > > > >>
>>> > > > >>
>>> > > > >>
>>> > > > >>
>>> > > > >> Becket Qin <becket....@gmail.com> 于2023年8月3日周四 07:44写道:
>>> > > > >>
>>> > > > >> > Hi Jark,
>>> > > > >> >
>>> > > > >> > If the FieldReferenceExpression contains an int[] to support a
>>> > > nested
>>> > > > >> field
>>> > > > >> > reference, List<FieldReferenceExpression> (or
>>> > > > >> FieldReferenceExpression[])
>>> > > > >> > and int[][] are actually equivalent. If we are designing this
>>> from
>>> > > > >> scratch,
>>> > > > >> > personally I prefer using List<FieldReferenceExpression> for
>>> > > > >> consistency,
>>> > > > >> > i.e. always resolving everything to expressions for users.
>>> > > Projection
>>> > > > >> is a
>>> > > > >> > simpler case, but should not be a special case. This avoids
>>> doing
>>> > > the
>>> > > > >> same
>>> > > > >> > thing in different ways which is also a confusion to the
>>> users. To
>>> > > me,
>>> > > > >> the
>>> > > > >> > int[][] format would become kind of a technical debt after we
>>> > extend
>>> > > > the
>>> > > > >> > FieldReferenceExpression. Although we don't have to address it
>>> > right
>>> > > > >> away
>>> > > > >> > in the same FLIP, this kind of debt accumulates over time and
>>> > makes
>>> > > > the
>>> > > > >> > project harder to learn and maintain. So, personally I prefer
>>> to
>>> > > > address
>>> > > > >> > these technical debts as soon as possible.
>>> > > > >> >
>>> > > > >> > Thanks,
>>> > > > >> >
>>> > > > >> > Jiangjie (Becket) Qin
>>> > > > >> >
>>> > > > >> > On Wed, Aug 2, 2023 at 8:19 PM Jark Wu <imj...@gmail.com>
>>> wrote:
>>> > > > >> >
>>> > > > >> > > Hi,
>>> > > > >> > >
>>> > > > >> > > I agree with Becket that we may need to extend
>>> > > > >> FieldReferenceExpression
>>> > > > >> > to
>>> > > > >> > > support nested field access (or maybe a new
>>> > > > >> > > NestedFieldReferenceExpression).
>>> > > > >> > > But I have some concerns about evolving the
>>> > > > >> > > SupportsProjectionPushDown.applyProjection.
>>> > > > >> > > A projection is much simpler than Filter Expression which
>>> only
>>> > > needs
>>> > > > >> to
>>> > > > >> > > represent the field indexes.
>>> > > > >> > > If we evolve `applyProjection` to accept
>>> > > > >> `List<FieldReferenceExpression>
>>> > > > >> > > projectedFields`,
>>> > > > >> > > users have to convert the `List<FieldReferenceExpression>`
>>> back
>>> > to
>>> > > > >> > int[][]
>>> > > > >> > > which is an overhead for users.
>>> > > > >> > > Field indexes (int[][]) is required to project schemas with
>>> the
>>> > > > >> > > utility org.apache.flink.table.connector.Projection.
>>> > > > >> > >
>>> > > > >> > >
>>> > > > >> > > Best,
>>> > > > >> > > Jark
>>> > > > >> > >
>>> > > > >> > >
>>> > > > >> > >
>>> > > > >> > > On Wed, 2 Aug 2023 at 07:40, Venkatakrishnan Sowrirajan <
>>> > > > >> > vsowr...@asu.edu>
>>> > > > >> > > wrote:
>>> > > > >> > >
>>> > > > >> > > > Thanks Becket for the suggestion. That makes sense. Let
>>> me try
>>> > > it
>>> > > > >> out
>>> > > > >> > and
>>> > > > >> > > > get back to you.
>>> > > > >> > > >
>>> > > > >> > > > Regards
>>> > > > >> > > > Venkata krishnan
>>> > > > >> > > >
>>> > > > >> > > >
>>> > > > >> > > > On Tue, Aug 1, 2023 at 9:04 AM Becket Qin <
>>> > becket....@gmail.com
>>> > > >
>>> > > > >> > wrote:
>>> > > > >> > > >
>>> > > > >> > > > > This is a very useful feature in practice.
>>> > > > >> > > > >
>>> > > > >> > > > > It looks to me that the key issue here is that Flink
>>> > > > >> > ResolvedExpression
>>> > > > >> > > > > does not have necessary abstraction for nested field
>>> access.
>>> > > So
>>> > > > >> the
>>> > > > >> > > > Calcite
>>> > > > >> > > > > RexFieldAccess does not have a counterpart in the
>>> > > > >> ResolvedExpression.
>>> > > > >> > > The
>>> > > > >> > > > > FieldReferenceExpression only supports direct access to
>>> the
>>> > > > >> fields,
>>> > > > >> > not
>>> > > > >> > > > > nested access.
>>> > > > >> > > > >
>>> > > > >> > > > > Theoretically speaking, this nested field reference is
>>> also
>>> > > > >> required
>>> > > > >> > by
>>> > > > >> > > > > projection pushdown. However, we addressed that by
>>> using an
>>> > > > >> int[][]
>>> > > > >> > in
>>> > > > >> > > > the
>>> > > > >> > > > > SupportsProjectionPushDown interface. Maybe we can do
>>> the
>>> > > > >> following:
>>> > > > >> > > > >
>>> > > > >> > > > > 1. Extend the FieldReferenceExpression to include an
>>> int[]
>>> > for
>>> > > > >> nested
>>> > > > >> > > > field
>>> > > > >> > > > > access,
>>> > > > >> > > > > 2. By doing (1),
>>> > > > >> > > > >
>>> > SupportsFilterPushDown#applyFilters(List<ResolvedExpression>)
>>> > > > can
>>> > > > >> > > support
>>> > > > >> > > > > nested field access.
>>> > > > >> > > > > 3. Evolve the
>>> > > SupportsProjectionPushDown.applyProjection(int[][]
>>> > > > >> > > > > projectedFields, DataType producedDataType) to
>>> > > > >> > > > > applyProjection(List<FieldReferenceExpression>
>>> > > projectedFields,
>>> > > > >> > > DataType
>>> > > > >> > > > > producedDataType)
>>> > > > >> > > > >
>>> > > > >> > > > > This will need a FLIP.
>>> > > > >> > > > >
>>> > > > >> > > > > Thanks,
>>> > > > >> > > > >
>>> > > > >> > > > > Jiangjie (Becket) Qin
>>> > > > >> > > > >
>>> > > > >> > > > > On Tue, Aug 1, 2023 at 11:42 PM Venkatakrishnan
>>> Sowrirajan <
>>> > > > >> > > > > vsowr...@asu.edu>
>>> > > > >> > > > > wrote:
>>> > > > >> > > > >
>>> > > > >> > > > > > Thanks for the response. Looking forward to your
>>> pointers.
>>> > > In
>>> > > > >> the
>>> > > > >> > > > > > meanwhile, let me figure out how we can implement it.
>>> Will
>>> > > > keep
>>> > > > >> you
>>> > > > >> > > > > posted.
>>> > > > >> > > > > >
>>> > > > >> > > > > > On Mon, Jul 31, 2023, 11:43 PM liu ron <
>>> > ron9....@gmail.com>
>>> > > > >> wrote:
>>> > > > >> > > > > >
>>> > > > >> > > > > > > Hi, Venkata
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Thanks for reporting this issue. Currently, Flink
>>> > doesn't
>>> > > > >> support
>>> > > > >> > > > > nested
>>> > > > >> > > > > > > filter pushdown. I also think that this optimization
>>> > would
>>> > > > be
>>> > > > >> > > useful,
>>> > > > >> > > > > > > especially for jobs, which may need to read a lot of
>>> > data
>>> > > > from
>>> > > > >> > the
>>> > > > >> > > > > > parquet
>>> > > > >> > > > > > > or orc file. We didn't move forward with this for
>>> some
>>> > > > >> priority
>>> > > > >> > > > > reasons.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Regarding your three questions, I will respond to
>>> you
>>> > > later
>>> > > > >> after
>>> > > > >> > > my
>>> > > > >> > > > > > > on-call is finished because I need to dive into the
>>> > source
>>> > > > >> code.
>>> > > > >> > > > About
>>> > > > >> > > > > > your
>>> > > > >> > > > > > > commit, I don't think it's the right solution
>>> because
>>> > > > >> > > > > > > FieldReferenceExpression doesn't currently support
>>> > nested
>>> > > > >> field
>>> > > > >> > > > filter
>>> > > > >> > > > > > > pushdown, maybe we need to extend it.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > You can also look further into reasonable solutions,
>>> > which
>>> > > > >> we'll
>>> > > > >> > > > > discuss
>>> > > > >> > > > > > > further later on.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Best,
>>> > > > >> > > > > > > Ron
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Venkatakrishnan Sowrirajan <vsowr...@asu.edu>
>>> > > 于2023年7月29日周六
>>> > > > >> > > 03:31写道:
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > > Hi all,
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > Currently, I am working on adding support for
>>> nested
>>> > > > fields
>>> > > > >> > > filter
>>> > > > >> > > > > push
>>> > > > >> > > > > > > > down. In our use case running Flink on Batch, we
>>> found
>>> > > > >> nested
>>> > > > >> > > > fields
>>> > > > >> > > > > > > filter
>>> > > > >> > > > > > > > push down is key - without it, it is significantly
>>> > slow.
>>> > > > >> Note:
>>> > > > >> > > > Spark
>>> > > > >> > > > > > SQL
>>> > > > >> > > > > > > > supports nested fields filter push down.
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > While debugging the code using IcebergTableSource
>>> as
>>> > the
>>> > > > >> table
>>> > > > >> > > > > source,
>>> > > > >> > > > > > > > narrowed down the issue to missing support for
>>> > > > >> > > > > > > >
>>> > > > >> RexNodeExtractor#RexNodeToExpressionConverter#visitFieldAccess.
>>> > > > >> > > > > > > > As part of fixing it, I made changes by returning
>>> an
>>> > > > >> > > > > > > > Option(FieldReferenceExpression)
>>> > > > >> > > > > > > > with appropriate reference to the parent index
>>> and the
>>> > > > child
>>> > > > >> > > index
>>> > > > >> > > > > for
>>> > > > >> > > > > > > the
>>> > > > >> > > > > > > > nested field with the data type info.
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > But this new ResolvedExpression cannot be
>>> converted to
>>> > > > >> RexNode
>>> > > > >> > > > which
>>> > > > >> > > > > > > > happens in PushFilterIntoSourceScanRuleBase
>>> > > > >> > > > > > > > <
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://urldefense.com/v3/__https://github.com/apache/flink/blob/3f63e03e83144e9857834f8db1895637d2aa218a/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/logical/PushFilterIntoSourceScanRuleBase.java*L104__;Iw!!IKRxdwAv5BmarQ!fNgxcul8ZGwkNE9ygOeVGlWlU6m_MLMXf4A3S3oQu9LBzYTPF90pZ7uXSGMr-5dFmzRn37-e9Q5cMnVs$
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > .
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > Few questions
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > 1. Does FieldReferenceExpression support nested
>>> fields
>>> > > > >> > currently
>>> > > > >> > > or
>>> > > > >> > > > > > > should
>>> > > > >> > > > > > > > it be extended to support nested fields? I
>>> couldn't
>>> > > figure
>>> > > > >> this
>>> > > > >> > > out
>>> > > > >> > > > > > from
>>> > > > >> > > > > > > > the PushProjectIntoTableScanRule that supports
>>> nested
>>> > > > column
>>> > > > >> > > > > projection
>>> > > > >> > > > > > > > push down.
>>> > > > >> > > > > > > > 2. ExpressionConverter
>>> > > > >> > > > > > > > <
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://urldefense.com/v3/__https://github.com/apache/flink/blob/3f63e03e83144e9857834f8db1895637d2aa218a/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/expressions/converter/ExpressionConverter.java*L197__;Iw!!IKRxdwAv5BmarQ!fNgxcul8ZGwkNE9ygOeVGlWlU6m_MLMXf4A3S3oQu9LBzYTPF90pZ7uXSGMr-5dFmzRn37-e9Z6jnkJm$
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > converts ResolvedExpression -> RexNode but the new
>>> > > > >> > > > > > > FieldReferenceExpression
>>> > > > >> > > > > > > > with the nested field cannot be converted to
>>> RexNode.
>>> > > This
>>> > > > >> is
>>> > > > >> > why
>>> > > > >> > > > the
>>> > > > >> > > > > > > > answer to the 1st question is key.
>>> > > > >> > > > > > > > 3. Anything else that I'm missing here? or is
>>> there an
>>> > > > even
>>> > > > >> > > easier
>>> > > > >> > > > > way
>>> > > > >> > > > > > to
>>> > > > >> > > > > > > > add support for nested fields filter push down?
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > Partially working changes - Commit
>>> > > > >> > > > > > > > <
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://urldefense.com/v3/__https://github.com/venkata91/flink/commit/00cdf34ecf9be3ba669a97baaed4b69b85cd26f9__;!!IKRxdwAv5BmarQ!fNgxcul8ZGwkNE9ygOeVGlWlU6m_MLMXf4A3S3oQu9LBzYTPF90pZ7uXSGMr-5dFmzRn37-e9XeOjJ_a$
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > Please
>>> > > > >> > > > > > > > feel free to leave a comment directly in the
>>> commit.
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > Any pointers here would be much appreciated!
>>> Thanks in
>>> > > > >> advance.
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > Disclaimer: Relatively new to Flink code base
>>> > especially
>>> > > > >> Table
>>> > > > >> > > > > planner
>>> > > > >> > > > > > > :-).
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > Regards
>>> > > > >> > > > > > > > Venkata krishnan
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>

Reply via email to