Gentle ping On Wed, Aug 16, 2023, 11:56 AM Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote:
> Forgot to share the link - > https://lists.apache.org/thread/686bhgwrrb4xmbfzlk60szwxos4z64t7 in the > last email. > > Regards > Venkata krishnan > > > On Wed, Aug 16, 2023 at 11:55 AM Venkatakrishnan Sowrirajan < > vsowr...@asu.edu> wrote: > >> Btw, this is the FLIP proposal discussion thread. Please share your >> thoughts. Thanks. >> >> Regards >> Venkata krishnan >> >> >> On Sun, Aug 13, 2023 at 6:35 AM liu ron <ron9....@gmail.com> wrote: >> >>> Hi, Venkata krishnan >>> >>> Thanks for driving this work, look forward to your FLIP. >>> >>> Best, >>> Ron >>> >>> Venkatakrishnan Sowrirajan <vsowr...@asu.edu> 于2023年8月13日周日 14:34写道: >>> >>> > Thanks Yunhong. That's correct. I am able to make it work locally. >>> > Currently, in the process of writing a FLIP for the necessary changes >>> to >>> > the SupportsFilterPushDown API to support nested fields filter push >>> down. >>> > >>> > Regards >>> > Venkata krishnan >>> > >>> > >>> > On Mon, Aug 7, 2023 at 8:28 PM yh z <zhengyunhon...@gmail.com> wrote: >>> > >>> > > Hi Venkatakrishnan, >>> > > Sorry for the late reply. I have looked at the code and feel like you >>> > need >>> > > to modify the logic of the >>> > > ExpressionConverter.visit(FieldReferenceExpression expression) >>> method to >>> > > support nested types, >>> > > which are not currently supported in currently code. >>> > > >>> > > Regards, >>> > > Yunhong Zheng (Swuferhong) >>> > > >>> > > Venkatakrishnan Sowrirajan <vsowr...@asu.edu> 于2023年8月7日周一 13:30写道: >>> > > >>> > > > (Sorry, I pressed send too early) >>> > > > >>> > > > Thanks for the help @zhengyunhon...@gmail.com. >>> > > > >>> > > > Agree on not changing the API as much as possible as well as wrt >>> > > > simplifying Projection pushdown with nested fields eventually as >>> well. >>> > > > >>> > > > In terms of the code itself, currently I am trying to leverage the >>> > > > FieldReferenceExpression to also handle nested fields for filter >>> push >>> > > down. >>> > > > But where I am currently struggling to make progress is, once the >>> > filters >>> > > > are pushed to the table source itself, in >>> > > > >>> > >>> PushFilterIntoSourceScanRuleBase#resolveFiltersAndCreateTableSourceTable >>> > > > there is a conversion from List<ResolvedExpression (in this case >>> > > > FieldReferenceExpression) to the List<RexNode> itself. >>> > > > >>> > > > If you have some pointers for that, please let me know. Thanks. >>> > > > >>> > > > Regards >>> > > > Venkata krishnan >>> > > > >>> > > > >>> > > > On Sun, Aug 6, 2023 at 10:23 PM Venkatakrishnan Sowrirajan < >>> > > > vsowr...@asu.edu> >>> > > > wrote: >>> > > > >>> > > > > Thanks @zhengyunhon...@gmail.com >>> > > > > Regards >>> > > > > Venkata krishnan >>> > > > > >>> > > > > >>> > > > > On Sun, Aug 6, 2023 at 6:16 PM yh z <zhengyunhon...@gmail.com> >>> > wrote: >>> > > > > >>> > > > >> Hi, Venkatakrishnan, >>> > > > >> I think this is a very useful feature. I have been focusing on >>> the >>> > > > >> development of the flink-table-planner module recently, so if >>> you >>> > need >>> > > > >> some >>> > > > >> help, I can assist you in completing the development of some >>> > sub-tasks >>> > > > or >>> > > > >> code review. >>> > > > >> >>> > > > >> Returning to the design itself, I think it's necessary to modify >>> > > > >> FieldReferenceExpression or re-implement a >>> > > > NestedFieldReferenceExpression. >>> > > > >> As for modifying the interface of SupportsProjectionPushDown, I >>> > think >>> > > we >>> > > > >> need to make some trade-offs. As a connector developer, the >>> > stability >>> > > of >>> > > > >> the interface is very important. If there are no unresolved >>> bugs, I >>> > > > >> personally do not recommend modifying the interface. However, >>> when I >>> > > > first >>> > > > >> read the code of SupportsProjectionPushDown, the design of >>> int[][] >>> > was >>> > > > >> very >>> > > > >> confusing for me, and it took me a long time to understand it by >>> > > running >>> > > > >> specify UT tests. Therefore, in terms of the design of this >>> > interface >>> > > > and >>> > > > >> the consistency between different interfaces, there is indeed >>> room >>> > for >>> > > > >> improvement it. >>> > > > >> >>> > > > >> Thanks, >>> > > > >> Yunhong Zheng (Swuferhong) >>> > > > >> >>> > > > >> >>> > > > >> >>> > > > >> >>> > > > >> Becket Qin <becket....@gmail.com> 于2023年8月3日周四 07:44写道: >>> > > > >> >>> > > > >> > Hi Jark, >>> > > > >> > >>> > > > >> > If the FieldReferenceExpression contains an int[] to support a >>> > > nested >>> > > > >> field >>> > > > >> > reference, List<FieldReferenceExpression> (or >>> > > > >> FieldReferenceExpression[]) >>> > > > >> > and int[][] are actually equivalent. If we are designing this >>> from >>> > > > >> scratch, >>> > > > >> > personally I prefer using List<FieldReferenceExpression> for >>> > > > >> consistency, >>> > > > >> > i.e. always resolving everything to expressions for users. >>> > > Projection >>> > > > >> is a >>> > > > >> > simpler case, but should not be a special case. This avoids >>> doing >>> > > the >>> > > > >> same >>> > > > >> > thing in different ways which is also a confusion to the >>> users. To >>> > > me, >>> > > > >> the >>> > > > >> > int[][] format would become kind of a technical debt after we >>> > extend >>> > > > the >>> > > > >> > FieldReferenceExpression. Although we don't have to address it >>> > right >>> > > > >> away >>> > > > >> > in the same FLIP, this kind of debt accumulates over time and >>> > makes >>> > > > the >>> > > > >> > project harder to learn and maintain. So, personally I prefer >>> to >>> > > > address >>> > > > >> > these technical debts as soon as possible. >>> > > > >> > >>> > > > >> > Thanks, >>> > > > >> > >>> > > > >> > Jiangjie (Becket) Qin >>> > > > >> > >>> > > > >> > On Wed, Aug 2, 2023 at 8:19 PM Jark Wu <imj...@gmail.com> >>> wrote: >>> > > > >> > >>> > > > >> > > Hi, >>> > > > >> > > >>> > > > >> > > I agree with Becket that we may need to extend >>> > > > >> FieldReferenceExpression >>> > > > >> > to >>> > > > >> > > support nested field access (or maybe a new >>> > > > >> > > NestedFieldReferenceExpression). >>> > > > >> > > But I have some concerns about evolving the >>> > > > >> > > SupportsProjectionPushDown.applyProjection. >>> > > > >> > > A projection is much simpler than Filter Expression which >>> only >>> > > needs >>> > > > >> to >>> > > > >> > > represent the field indexes. >>> > > > >> > > If we evolve `applyProjection` to accept >>> > > > >> `List<FieldReferenceExpression> >>> > > > >> > > projectedFields`, >>> > > > >> > > users have to convert the `List<FieldReferenceExpression>` >>> back >>> > to >>> > > > >> > int[][] >>> > > > >> > > which is an overhead for users. >>> > > > >> > > Field indexes (int[][]) is required to project schemas with >>> the >>> > > > >> > > utility org.apache.flink.table.connector.Projection. >>> > > > >> > > >>> > > > >> > > >>> > > > >> > > Best, >>> > > > >> > > Jark >>> > > > >> > > >>> > > > >> > > >>> > > > >> > > >>> > > > >> > > On Wed, 2 Aug 2023 at 07:40, Venkatakrishnan Sowrirajan < >>> > > > >> > vsowr...@asu.edu> >>> > > > >> > > wrote: >>> > > > >> > > >>> > > > >> > > > Thanks Becket for the suggestion. That makes sense. Let >>> me try >>> > > it >>> > > > >> out >>> > > > >> > and >>> > > > >> > > > get back to you. >>> > > > >> > > > >>> > > > >> > > > Regards >>> > > > >> > > > Venkata krishnan >>> > > > >> > > > >>> > > > >> > > > >>> > > > >> > > > On Tue, Aug 1, 2023 at 9:04 AM Becket Qin < >>> > becket....@gmail.com >>> > > > >>> > > > >> > wrote: >>> > > > >> > > > >>> > > > >> > > > > This is a very useful feature in practice. >>> > > > >> > > > > >>> > > > >> > > > > It looks to me that the key issue here is that Flink >>> > > > >> > ResolvedExpression >>> > > > >> > > > > does not have necessary abstraction for nested field >>> access. >>> > > So >>> > > > >> the >>> > > > >> > > > Calcite >>> > > > >> > > > > RexFieldAccess does not have a counterpart in the >>> > > > >> ResolvedExpression. >>> > > > >> > > The >>> > > > >> > > > > FieldReferenceExpression only supports direct access to >>> the >>> > > > >> fields, >>> > > > >> > not >>> > > > >> > > > > nested access. >>> > > > >> > > > > >>> > > > >> > > > > Theoretically speaking, this nested field reference is >>> also >>> > > > >> required >>> > > > >> > by >>> > > > >> > > > > projection pushdown. However, we addressed that by >>> using an >>> > > > >> int[][] >>> > > > >> > in >>> > > > >> > > > the >>> > > > >> > > > > SupportsProjectionPushDown interface. Maybe we can do >>> the >>> > > > >> following: >>> > > > >> > > > > >>> > > > >> > > > > 1. Extend the FieldReferenceExpression to include an >>> int[] >>> > for >>> > > > >> nested >>> > > > >> > > > field >>> > > > >> > > > > access, >>> > > > >> > > > > 2. By doing (1), >>> > > > >> > > > > >>> > SupportsFilterPushDown#applyFilters(List<ResolvedExpression>) >>> > > > can >>> > > > >> > > support >>> > > > >> > > > > nested field access. >>> > > > >> > > > > 3. Evolve the >>> > > SupportsProjectionPushDown.applyProjection(int[][] >>> > > > >> > > > > projectedFields, DataType producedDataType) to >>> > > > >> > > > > applyProjection(List<FieldReferenceExpression> >>> > > projectedFields, >>> > > > >> > > DataType >>> > > > >> > > > > producedDataType) >>> > > > >> > > > > >>> > > > >> > > > > This will need a FLIP. >>> > > > >> > > > > >>> > > > >> > > > > Thanks, >>> > > > >> > > > > >>> > > > >> > > > > Jiangjie (Becket) Qin >>> > > > >> > > > > >>> > > > >> > > > > On Tue, Aug 1, 2023 at 11:42 PM Venkatakrishnan >>> Sowrirajan < >>> > > > >> > > > > vsowr...@asu.edu> >>> > > > >> > > > > wrote: >>> > > > >> > > > > >>> > > > >> > > > > > Thanks for the response. Looking forward to your >>> pointers. >>> > > In >>> > > > >> the >>> > > > >> > > > > > meanwhile, let me figure out how we can implement it. >>> Will >>> > > > keep >>> > > > >> you >>> > > > >> > > > > posted. >>> > > > >> > > > > > >>> > > > >> > > > > > On Mon, Jul 31, 2023, 11:43 PM liu ron < >>> > ron9....@gmail.com> >>> > > > >> wrote: >>> > > > >> > > > > > >>> > > > >> > > > > > > Hi, Venkata >>> > > > >> > > > > > > >>> > > > >> > > > > > > Thanks for reporting this issue. Currently, Flink >>> > doesn't >>> > > > >> support >>> > > > >> > > > > nested >>> > > > >> > > > > > > filter pushdown. I also think that this optimization >>> > would >>> > > > be >>> > > > >> > > useful, >>> > > > >> > > > > > > especially for jobs, which may need to read a lot of >>> > data >>> > > > from >>> > > > >> > the >>> > > > >> > > > > > parquet >>> > > > >> > > > > > > or orc file. We didn't move forward with this for >>> some >>> > > > >> priority >>> > > > >> > > > > reasons. >>> > > > >> > > > > > > >>> > > > >> > > > > > > Regarding your three questions, I will respond to >>> you >>> > > later >>> > > > >> after >>> > > > >> > > my >>> > > > >> > > > > > > on-call is finished because I need to dive into the >>> > source >>> > > > >> code. >>> > > > >> > > > About >>> > > > >> > > > > > your >>> > > > >> > > > > > > commit, I don't think it's the right solution >>> because >>> > > > >> > > > > > > FieldReferenceExpression doesn't currently support >>> > nested >>> > > > >> field >>> > > > >> > > > filter >>> > > > >> > > > > > > pushdown, maybe we need to extend it. >>> > > > >> > > > > > > >>> > > > >> > > > > > > You can also look further into reasonable solutions, >>> > which >>> > > > >> we'll >>> > > > >> > > > > discuss >>> > > > >> > > > > > > further later on. >>> > > > >> > > > > > > >>> > > > >> > > > > > > Best, >>> > > > >> > > > > > > Ron >>> > > > >> > > > > > > >>> > > > >> > > > > > > >>> > > > >> > > > > > > Venkatakrishnan Sowrirajan <vsowr...@asu.edu> >>> > > 于2023年7月29日周六 >>> > > > >> > > 03:31写道: >>> > > > >> > > > > > > >>> > > > >> > > > > > > > Hi all, >>> > > > >> > > > > > > > >>> > > > >> > > > > > > > Currently, I am working on adding support for >>> nested >>> > > > fields >>> > > > >> > > filter >>> > > > >> > > > > push >>> > > > >> > > > > > > > down. In our use case running Flink on Batch, we >>> found >>> > > > >> nested >>> > > > >> > > > fields >>> > > > >> > > > > > > filter >>> > > > >> > > > > > > > push down is key - without it, it is significantly >>> > slow. >>> > > > >> Note: >>> > > > >> > > > Spark >>> > > > >> > > > > > SQL >>> > > > >> > > > > > > > supports nested fields filter push down. >>> > > > >> > > > > > > > >>> > > > >> > > > > > > > While debugging the code using IcebergTableSource >>> as >>> > the >>> > > > >> table >>> > > > >> > > > > source, >>> > > > >> > > > > > > > narrowed down the issue to missing support for >>> > > > >> > > > > > > > >>> > > > >> RexNodeExtractor#RexNodeToExpressionConverter#visitFieldAccess. >>> > > > >> > > > > > > > As part of fixing it, I made changes by returning >>> an >>> > > > >> > > > > > > > Option(FieldReferenceExpression) >>> > > > >> > > > > > > > with appropriate reference to the parent index >>> and the >>> > > > child >>> > > > >> > > index >>> > > > >> > > > > for >>> > > > >> > > > > > > the >>> > > > >> > > > > > > > nested field with the data type info. >>> > > > >> > > > > > > > >>> > > > >> > > > > > > > But this new ResolvedExpression cannot be >>> converted to >>> > > > >> RexNode >>> > > > >> > > > which >>> > > > >> > > > > > > > happens in PushFilterIntoSourceScanRuleBase >>> > > > >> > > > > > > > < >>> > > > >> > > > > > > > >>> > > > >> > > > > > > >>> > > > >> > > > > > >>> > > > >> > > > > >>> > > > >> > > > >>> > > > >> > > >>> > > > >> > >>> > > > >> >>> > > > >>> > > >>> > >>> https://urldefense.com/v3/__https://github.com/apache/flink/blob/3f63e03e83144e9857834f8db1895637d2aa218a/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/logical/PushFilterIntoSourceScanRuleBase.java*L104__;Iw!!IKRxdwAv5BmarQ!fNgxcul8ZGwkNE9ygOeVGlWlU6m_MLMXf4A3S3oQu9LBzYTPF90pZ7uXSGMr-5dFmzRn37-e9Q5cMnVs$ >>> > > > >> > > > > > > > > >>> > > > >> > > > > > > > . >>> > > > >> > > > > > > > >>> > > > >> > > > > > > > Few questions >>> > > > >> > > > > > > > >>> > > > >> > > > > > > > 1. Does FieldReferenceExpression support nested >>> fields >>> > > > >> > currently >>> > > > >> > > or >>> > > > >> > > > > > > should >>> > > > >> > > > > > > > it be extended to support nested fields? I >>> couldn't >>> > > figure >>> > > > >> this >>> > > > >> > > out >>> > > > >> > > > > > from >>> > > > >> > > > > > > > the PushProjectIntoTableScanRule that supports >>> nested >>> > > > column >>> > > > >> > > > > projection >>> > > > >> > > > > > > > push down. >>> > > > >> > > > > > > > 2. ExpressionConverter >>> > > > >> > > > > > > > < >>> > > > >> > > > > > > > >>> > > > >> > > > > > > >>> > > > >> > > > > > >>> > > > >> > > > > >>> > > > >> > > > >>> > > > >> > > >>> > > > >> > >>> > > > >> >>> > > > >>> > > >>> > >>> https://urldefense.com/v3/__https://github.com/apache/flink/blob/3f63e03e83144e9857834f8db1895637d2aa218a/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/expressions/converter/ExpressionConverter.java*L197__;Iw!!IKRxdwAv5BmarQ!fNgxcul8ZGwkNE9ygOeVGlWlU6m_MLMXf4A3S3oQu9LBzYTPF90pZ7uXSGMr-5dFmzRn37-e9Z6jnkJm$ >>> > > > >> > > > > > > > > >>> > > > >> > > > > > > > converts ResolvedExpression -> RexNode but the new >>> > > > >> > > > > > > FieldReferenceExpression >>> > > > >> > > > > > > > with the nested field cannot be converted to >>> RexNode. >>> > > This >>> > > > >> is >>> > > > >> > why >>> > > > >> > > > the >>> > > > >> > > > > > > > answer to the 1st question is key. >>> > > > >> > > > > > > > 3. Anything else that I'm missing here? or is >>> there an >>> > > > even >>> > > > >> > > easier >>> > > > >> > > > > way >>> > > > >> > > > > > to >>> > > > >> > > > > > > > add support for nested fields filter push down? >>> > > > >> > > > > > > > >>> > > > >> > > > > > > > Partially working changes - Commit >>> > > > >> > > > > > > > < >>> > > > >> > > > > > > > >>> > > > >> > > > > > > >>> > > > >> > > > > > >>> > > > >> > > > > >>> > > > >> > > > >>> > > > >> > > >>> > > > >> > >>> > > > >> >>> > > > >>> > > >>> > >>> https://urldefense.com/v3/__https://github.com/venkata91/flink/commit/00cdf34ecf9be3ba669a97baaed4b69b85cd26f9__;!!IKRxdwAv5BmarQ!fNgxcul8ZGwkNE9ygOeVGlWlU6m_MLMXf4A3S3oQu9LBzYTPF90pZ7uXSGMr-5dFmzRn37-e9XeOjJ_a$ >>> > > > >> > > > > > > > > >>> > > > >> > > > > > > > Please >>> > > > >> > > > > > > > feel free to leave a comment directly in the >>> commit. >>> > > > >> > > > > > > > >>> > > > >> > > > > > > > Any pointers here would be much appreciated! >>> Thanks in >>> > > > >> advance. >>> > > > >> > > > > > > > >>> > > > >> > > > > > > > Disclaimer: Relatively new to Flink code base >>> > especially >>> > > > >> Table >>> > > > >> > > > > planner >>> > > > >> > > > > > > :-). >>> > > > >> > > > > > > > >>> > > > >> > > > > > > > Regards >>> > > > >> > > > > > > > Venkata krishnan >>> > > > >> > > > > > > > >>> > > > >> > > > > > > >>> > > > >> > > > > > >>> > > > >> > > > > >>> > > > >> > > > >>> > > > >> > > >>> > > > >> > >>> > > > >> >>> > > > > >>> > > > >>> > > >>> > >>> >>