[jira] [Commented] (SPARK-46992) Inconsistent results with 'sort', 'cache', and AQE.

Nicholas Chammas (Jira) Tue, 06 Feb 2024 09:44:05 -0800


    [ https://issues.apache.org/jira/browse/SPARK-46992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814913#comment-17814913 ]


Nicholas Chammas commented on SPARK-46992:
------------------------------------------

I can confirm the behavior described above is still present on {{master}} at 
[{{5d5b3a5}}|https://github.com/apache/spark/commit/5d5b3a54b7b5fb4308fe40da696ba805c72983fc].

Adding the {{correctness}} label.

> Inconsistent results with 'sort', 'cache', and AQE.
> ---------------------------------------------------
>
>                 Key: SPARK-46992
>                 URL: https://issues.apache.org/jira/browse/SPARK-46992
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.2, 3.5.0
>            Reporter: Denis Tarima
>            Priority: Critical
>
>  
> With AQE enabled, having {{sort}} in the plan changes {{sample}} results 
> after caching.
> Moreover, when cached, {{collect}} returns records as if the DataFrame were 
> not cached, which is inconsistent with {{count}} and {{show}}.
> A script to reproduce:
> {code:scala}
> import spark.implicits._
> val df = (1 to 4).toDF("id").sort("id").sample(0.4, 123)
> println("NON CACHED:")
> println("  count: " + df.count())
> println("  collect: " + df.collect().mkString(" "))
> df.show()
> println("CACHED:")
> df.cache().count()
> println("  count: " + df.count())
> println("  collect: " + df.collect().mkString(" "))
> df.show()
> df.unpersist()
> {code}
> output:
> {code}
> NON CACHED:
>   count: 2
>   collect: [1] [4]
> +---+
> | id|
> +---+
> |  1|
> |  4|
> +---+
> CACHED:
>   count: 3
>   collect: [1] [4]
> +---+
> | id|
> +---+
> |  1|
> |  2|
> |  3|
> +---+
> {code}
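
To narrow down the role of AQE, the reproduction above can be rerun with adaptive execution toggled. The following is only a minimal sketch, assuming a local Spark installation; the object name AqeCacheCheck and the app name are hypothetical, and spark.sql.adaptive.enabled is the standard Spark SQL setting for AQE. The report only states that the problem appears with AQE enabled, so no particular output is claimed here for either setting.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object; the name is not part of the original report.
object AqeCacheCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")               // assumption: local mode is sufficient
      .appName("SPARK-46992-check")     // hypothetical app name
      .getOrCreate()
    import spark.implicits._

    // Run the same steps as the reported script, once per AQE setting.
    for (aqe <- Seq("true", "false")) {
      spark.conf.set("spark.sql.adaptive.enabled", aqe)

      val df = (1 to 4).toDF("id").sort("id").sample(0.4, 123)

      // Results before caching.
      val before = df.collect().mkString(" ")

      // Materialize the cache, then query the cached DataFrame.
      df.cache().count()
      val cachedCount = df.count()
      val cachedCollect = df.collect().mkString(" ")

      println(s"AQE=$aqe before=$before cachedCount=$cachedCount cachedCollect=$cachedCollect")

      // Print the physical plan to see how the cached relation is used.
      df.explain()

      df.unpersist()
    }

    spark.stop()
  }
}
```

Comparing the two printed lines, together with the plans from explain(), should show whether count, collect, and show agree with each other under each setting.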


