LiaCastaneda commented on code in PR #17478: URL: https://github.com/apache/datafusion/pull/17478#discussion_r2336365081
########## datafusion/substrait/tests/cases/consumer_integration.rs: ########## @@ -605,26 +605,30 @@ mod tests { #[tokio::test] async fn test_multiple_joins() -> Result<()> { let plan_str = test_plan_to_string("multiple_joins.json").await?; - assert_eq!( + assert_snapshot!( plan_str, - "Projection: left.count(Int64(1)) AS count_first, left.category, left.count(Int64(1)):1 AS count_second, right.count(Int64(1)) AS count_third\ - \n Left Join: left.id = right.id\ - \n SubqueryAlias: left\ - \n Left Join: left.id = right.id\ - \n SubqueryAlias: left\ - \n Left Join: left.id = right.id\ - \n SubqueryAlias: left\ - \n Aggregate: groupBy=[[id]], aggr=[[count(Int64(1))]]\ - \n Values: (Int64(1)), (Int64(2))\ - \n SubqueryAlias: right\ - \n Aggregate: groupBy=[[id, category]], aggr=[[]]\ - \n Values: (Int64(1), Utf8(\"info\")), (Int64(2), Utf8(\"low\"))\ - \n SubqueryAlias: right\ - \n Aggregate: groupBy=[[id]], aggr=[[count(Int64(1))]]\ - \n Values: (Int64(1)), (Int64(2))\ - \n SubqueryAlias: right\ - \n Aggregate: groupBy=[[id]], aggr=[[count(Int64(1))]]\ - \n Values: (Int64(1)), (Int64(2))" + @r#" + Projection: left.count(Int64(1)) AS count_first, left.category, left.count(Int64(1)):1 AS count_second, right.count(Int64(1)) AS count_third + Left Join: left.id = right.id + SubqueryAlias: left + Projection: left.id, left.count(Int64(1)), left.id:1, left.category, right.id AS id:2, right.count(Int64(1)) AS count(Int64(1)):1 Review Comment: I think this solves my doubt above, this is not requalifying the column name but its adding the `:<N> `as an alias, so I guess the optimizer sees the column names for resultion not the alias? ########## datafusion/expr/src/logical_plan/plan.rs: ########## @@ -2223,13 +2223,15 @@ impl SubqueryAlias { alias: impl Into<TableReference>, ) -> Result<Self> { let alias = alias.into(); - let fields = change_redundant_column(plan.schema().fields()); - let meta_data = plan.schema().as_ref().metadata().clone(); - let schema: Schema = - DFSchema::from_unqualified_fields(fields.into(), meta_data)?.into(); - // Since schema is the same, other than qualifier, we can use existing - // functional dependencies: + let plan = maybe_project_redundant_column(plan)?; + + let fields = plan.schema().fields().clone(); + let meta_data = plan.schema().metadata().clone(); let func_dependencies = plan.schema().functional_dependencies().clone(); + + let schema = DFSchema::from_unqualified_fields(fields, meta_data)?; Review Comment: Here the join.schema() will have the` :<N>` still no? since we are returning the schema from the top level projection ( if it was inserted ). I know this solves the issue + makes the optimizer pass work but I don't understand how the optimizer works with this fix if we are still keeping the` :<N>` in the top level schema.. ########## datafusion/expr/src/logical_plan/builder.rs: ########## @@ -2675,34 +2703,6 @@ mod tests { Ok(()) } - #[test] - fn test_change_redundant_column() -> Result<()> { Review Comment: maybe we should add a test for maybe_project_redundant_column? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org