[GitHub] [spark] viirya commented on a change in pull request #28560: [SPARK-27217][SQL] Nested column aliasing for more operators which can prune nested column

GitBox Sun, 17 May 2020 23:22:28 -0700


viirya commented on a change in pull request #28560:
URL: https://github.com/apache/spark/pull/28560#discussion_r426391779




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##########
@@ -68,10 +76,23 @@ object NestedColumnAliasing {
    */
   def replaceChildrenWithAliases(
       plan: LogicalPlan,
+      nestedFieldToAlias: Map[ExtractValue, Alias],
       attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = {
     plan.withNewChildren(plan.children.map { plan =>
       Project(plan.output.flatMap(a => attrToAliases.getOrElse(a.exprId, Seq(a))), plan)
-    })
+    }).transformExpressions {
+      case f: ExtractValue if nestedFieldToAlias.contains(f) =>
+        nestedFieldToAlias(f).toAttribute
+    }
+  }
+
+  /**
+   * Returns true for those operators that we can prune nested column on it.
+   */
+  private def canPruneOn(plan: LogicalPlan) = plan match {
+    case _: Aggregate => true
+    case _: Expand => true
+    case _ => false

Review comment:
       I think I was wrong. Re-checking the Python API for `FlatMapGroupsInPandas`, it looks like this:
   
   ```python
   df.groupby("id").apply(normalize).show()
   ```
   
   So basically the Python UDF takes the full columns of the DataFrame rather than a selection of nested columns. It doesn't do nested column pruning.
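
To illustrate the point, here is a toy model in plain Python. It is only a sketch under stated assumptions: `grouped_apply` is a hypothetical stand-in for the grouped-map call, not a Spark or pandas API, and the nested column `s` with fields `v` and `unused` is invented for the example. The user function is opaque to the caller, so every group is handed over with all of its columns intact, even though the function happens to read only `s.v`.

```python
# Toy model only: plain-Python stand-ins, not Spark or pandas APIs.
from collections import defaultdict


def grouped_apply(rows, key, func):
    """Hypothetical stand-in for groupby(key).apply(func).

    func is a black box here, so each group is passed with every
    column (including whole nested structs); nothing is pruned.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    out = []
    for k in sorted(groups):
        out.extend(func(groups[k]))
    return out


def normalize(group):
    """User function: centers the nested field s.v within each group."""
    values = [row["s"]["v"] for row in group]
    mean = sum(values) / len(values)
    return [{"id": row["id"], "v": row["s"]["v"] - mean} for row in group]


# Assumed sample data: "s" is a nested column with one field the
# function uses ("v") and one it never touches ("unused").
rows = [
    {"id": 1, "s": {"v": 1.0, "unused": "a"}},
    {"id": 1, "s": {"v": 3.0, "unused": "b"}},
    {"id": 2, "s": {"v": 5.0, "unused": "c"}},
]

result = grouped_apply(rows, "id", normalize)
```

In this sketch `s.unused` still reaches `normalize`, because the caller cannot see which nested fields the function reads. That is the sense in which the operator takes full columns.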




