maropu commented on a change in pull request #28556: URL: https://github.com/apache/spark/pull/28556#discussion_r426336730
########## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ########## @@ -82,6 +87,8 @@ object NestedColumnAliasing { case _: LocalLimit => true case _: Repartition => true case _: Sample => true + case _: RepartitionByExpression => true + case _: Join => true Review comment: Btw, joins update the nullability of input attributes (outer join cases), but does this logic work correctly? Either way, I think we need some tests for the case. It seems the current PR only has tests for inner join cases? ########## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ########## @@ -82,6 +87,8 @@ object NestedColumnAliasing { case _: LocalLimit => true case _: Repartition => true case _: Sample => true + case _: RepartitionByExpression => true + case _: Join => true Review comment: nit: Could you update the title like `Nested column aliasing for RepartitionByExpression/Join`? ########## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ########## @@ -34,7 +34,8 @@ object NestedColumnAliasing { : Option[(Map[ExtractValue, Alias], Map[ExprId, Seq[Alias]])] = plan match { case Project(projectList, child) if SQLConf.get.nestedSchemaPruningEnabled && canProjectPushThrough(child) => - getAliasSubMap(projectList) + val exprsToPrune = projectList ++ child.expressions Review comment: nit: `exprsToPrune` -> `exprCandidatesToPrune`? ########## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ########## @@ -68,10 +69,14 @@ object NestedColumnAliasing { */ def replaceChildrenWithAliases( plan: LogicalPlan, + nestedFieldToAlias: Map[ExtractValue, Alias], attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = { plan.withNewChildren(plan.children.map { plan => Project(plan.output.flatMap(a => attrToAliases.getOrElse(a.exprId, Seq(a))), plan) - }) + }).transformExpressions { + case f: ExtractValue if nestedFieldToAlias.contains(f) => + nestedFieldToAlias(f).toAttribute Review comment: Is this change needed only for supporting joins? ########## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ########## @@ -204,15 +211,8 @@ object GeneratorNestedColumnAliasing { g: Generate, nestedFieldToAlias: Map[ExtractValue, Alias], attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = { - val newGenerator = g.generator.transform { - case f: ExtractValue if nestedFieldToAlias.contains(f) => - nestedFieldToAlias(f).toAttribute - }.asInstanceOf[Generator] - // Defer updating `Generate.unrequiredChildIndex` to next round of `ColumnPruning`. - val newGenerate = g.copy(generator = newGenerator) - - NestedColumnAliasing.replaceChildrenWithAliases(newGenerate, attrToAliases) + NestedColumnAliasing.replaceChildrenWithAliases(g, nestedFieldToAlias, attrToAliases) Review comment: nit: I think we need to update the method name of `replaceChildrenWithAliases`. We don't need `Children` in the name, anymore? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org