This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 058dcbf3fb0 [SPARK-43240][SQL][3.3] Fix the wrong result issue when calling df.describe() method

058dcbf3fb0 is described below

commit 058dcbf3fb0b17a4295f6e0b516f5c955cfa2d59
Author: Jia Ke <ke.a....@intel.com>
AuthorDate: Wed Apr 26 17:24:46 2023 +0800

    [SPARK-43240][SQL][3.3] Fix the wrong result issue when calling df.describe() method

    ### What changes were proposed in this pull request?
    The df.describe() method caches an RDD. If the cached RDD is an RDD[UnsafeRow], whose row objects may be released after each row is used, the result can be wrong. We need to copy the rows before caching, as the [TakeOrderedAndProjectExec](https://github.com/apache/spark/blob/d68d46c9e2cec04541e2457f4778117b570d8cdb/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala#L204) operator does.

    ### Why are the changes needed?
    Bug fix.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?

    Closes #40914 from JkSelf/describe.

    Authored-by: Jia Ke <ke.a....@intel.com>
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>
---
 .../main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala
index 9155c1cb6e7..ff6c08cea00 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala
@@ -288,7 +288,7 @@ object StatFunctions extends Logging {
     }
     // If there is no selected columns, we don't need to run this aggregate, so make it a lazy val.
-    lazy val aggResult = ds.select(aggExprs: _*).queryExecution.toRdd.collect().head
+    lazy val aggResult = ds.select(aggExprs: _*).queryExecution.toRdd.map(_.copy()).collect().head

     // We will have one row for each selected statistic in the result.
     val result = Array.fill[InternalRow](selectedStatistics.length) {

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
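The hazard this one-line fix addresses can be illustrated without Spark. Internally, Spark may reuse a single mutable UnsafeRow buffer for every record it produces, so holding references to the rows (as collect-then-cache effectively does) without calling copy() leaves every reference pointing at whatever the buffer last held. The sketch below uses a hypothetical MutableRow class as a stand-in for that reuse behavior; it is an illustration of the pattern, not Spark code.

```scala
// MutableRow is a hypothetical stand-in for a reused row buffer (like UnsafeRow).
class MutableRow(var value: Int) {
  def copy(): MutableRow = new MutableRow(value)
}

object ReuseDemo {
  // Feed every value through ONE reused row object and keep references to it.
  // Every element of the result is the same object, holding only the last value.
  def collectWithoutCopy(values: Seq[Int]): Seq[MutableRow] = {
    val reused = new MutableRow(0)
    values.map { v => reused.value = v; reused }
  }

  // Same iteration, but take a defensive copy per record, as the fix does
  // with .map(_.copy()) before collect().
  def collectWithCopy(values: Seq[Int]): Seq[MutableRow] = {
    val reused = new MutableRow(0)
    values.map { v => reused.value = v; reused.copy() }
  }

  def main(args: Array[String]): Unit = {
    println(collectWithoutCopy(Seq(1, 2, 3)).map(_.value)) // List(3, 3, 3) -- corrupted
    println(collectWithCopy(Seq(1, 2, 3)).map(_.value))    // List(1, 2, 3) -- preserved
  }
}
```

This is the same reason TakeOrderedAndProjectExec (linked above) copies rows before handing them to an operator that retains them.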