[GitHub] [spark] maropu commented on a change in pull request #28556: [SPARK-31736][SQL] Nested column aliasing for other operators

GitBox Sun, 17 May 2020 19:02:28 -0700


maropu commented on a change in pull request #28556:
URL: https://github.com/apache/spark/pull/28556#discussion_r426336730




##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##########
@@ -82,6 +87,8 @@ object NestedColumnAliasing {
     case _: LocalLimit => true
     case _: Repartition => true
     case _: Sample => true
+    case _: RepartitionByExpression => true
+    case _: Join => true

Review comment:
       Btw, joins update the nullability of input attributes (outer join 
cases), but does this logic work correctly? Either way, I think we need some 
tests for the case. It seems the current PR only has tests for inner join cases?

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##########
@@ -82,6 +87,8 @@ object NestedColumnAliasing {
     case _: LocalLimit => true
     case _: Repartition => true
     case _: Sample => true
+    case _: RepartitionByExpression => true
+    case _: Join => true

Review comment:
       nit: Could you update the title like `Nested column aliasing for 
RepartitionByExpression/Join`?

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##########
@@ -34,7 +34,8 @@ object NestedColumnAliasing {
     : Option[(Map[ExtractValue, Alias], Map[ExprId, Seq[Alias]])] = plan match 
{
     case Project(projectList, child)
         if SQLConf.get.nestedSchemaPruningEnabled && 
canProjectPushThrough(child) =>
-      getAliasSubMap(projectList)
+      val exprsToPrune = projectList ++ child.expressions

Review comment:
       nit: `exprsToPrune` -> `exprCandidatesToPrune`?

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##########
@@ -68,10 +69,14 @@ object NestedColumnAliasing {
    */
   def replaceChildrenWithAliases(
       plan: LogicalPlan,
+      nestedFieldToAlias: Map[ExtractValue, Alias],
       attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = {
     plan.withNewChildren(plan.children.map { plan =>
       Project(plan.output.flatMap(a => attrToAliases.getOrElse(a.exprId, 
Seq(a))), plan)
-    })
+    }).transformExpressions {
+      case f: ExtractValue if nestedFieldToAlias.contains(f) =>
+        nestedFieldToAlias(f).toAttribute

Review comment:
       Is this change needed only for supporting joins?

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##########
@@ -204,15 +211,8 @@ object GeneratorNestedColumnAliasing {
       g: Generate,
       nestedFieldToAlias: Map[ExtractValue, Alias],
       attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = {
-    val newGenerator = g.generator.transform {
-      case f: ExtractValue if nestedFieldToAlias.contains(f) =>
-        nestedFieldToAlias(f).toAttribute
-    }.asInstanceOf[Generator]
-
     // Defer updating `Generate.unrequiredChildIndex` to next round of 
`ColumnPruning`.
-    val newGenerate = g.copy(generator = newGenerator)
-
-    NestedColumnAliasing.replaceChildrenWithAliases(newGenerate, attrToAliases)
+    NestedColumnAliasing.replaceChildrenWithAliases(g, nestedFieldToAlias, 
attrToAliases)

Review comment:
       nit: I think we need to update the method name of 
`replaceChildrenWithAliases`. We don't need `Children` in the name, anymore?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] maropu commented on a change in pull request #28556: [SPARK-31736][SQL] Nested column aliasing for other operators

Reply via email to