viirya commented on a change in pull request #28133: [SPARK-31156][SQL] 
DataFrameStatFunctions API to be consistent with respect to Column type
URL: https://github.com/apache/spark/pull/28133#discussion_r404541728
 
 

 ##########
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/stat/FrequentItems.scala
 ##########
 @@ -66,6 +68,19 @@ object FrequentItems extends Logging {
     }
   }
 
+  /** Helper function to resolve a column to an expression (if not already resolved) */
+  // TODO: it might be helpful to have this helper in Dataset.scala,
+  // e.g. `drop` function uses exactly the same flow to deal with
+  // `Column` arguments
+  private def resolveColumn(df: DataFrame, col: Column): Column = {
+    col match {
+      case Column(u: UnresolvedAttribute) =>
+        Column(df.queryExecution.analyzed.resolveQuoted(
+          u.name, df.sparkSession.sessionState.analyzer.resolver).getOrElse(u))
+      case Column(_expr: Expression) => col
+    }
+  }
 
 Review comment:
   The problem with `Column` is that it can contain an arbitrary unresolved 
expression, for example `UnresolvedAttribute + UnresolvedAttribute ...`.
   
   When only a column name is allowed, we can rely on `df.resolve(colName)` 
to resolve it. Once the API is extended to accept `Column`, you cannot 
perform the same check as before.
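
   A minimal sketch of the issue, using simplified stand-in classes (not 
Spark's actual Catalyst types) so the pattern-matching behavior is easy to 
see in isolation: matching only on a top-level unresolved attribute leaves 
compound expressions untouched.

```scala
// Simplified model of the problem (hypothetical classes, not Spark's real
// Catalyst expression hierarchy).
sealed trait Expression
case class UnresolvedAttribute(name: String) extends Expression
case class AttributeReference(name: String) extends Expression
case class Add(left: Expression, right: Expression) extends Expression

object ResolveSketch {
  // Mimics the helper's match: only a bare UnresolvedAttribute at the top
  // level gets resolved; any other expression shape passes through as-is.
  def resolveTopLevel(expr: Expression): Expression = expr match {
    case UnresolvedAttribute(name) => AttributeReference(name)
    case other                     => other // compound exprs slip through unresolved
  }
}
```

   With this sketch, `resolveTopLevel(UnresolvedAttribute("a"))` resolves as 
expected, but `resolveTopLevel(Add(UnresolvedAttribute("a"), 
UnresolvedAttribute("b")))` returns the `Add` unchanged, still containing 
unresolved attributes — which is the hole the review comment points at.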
