[ https://issues.apache.org/jira/browse/SPARK-45592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-45592.
---------------------------------
    Fix Version/s: 4.0.0
                   3.5.1
       Resolution: Fixed

Issue resolved by pull request 43435
[https://github.com/apache/spark/pull/43435]

> AQE and InMemoryTableScanExec correctness bug
> ---------------------------------------------
>
>                 Key: SPARK-45592
>                 URL: https://issues.apache.org/jira/browse/SPARK-45592
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Emil Ejbyfeldt
>            Assignee: Apache Spark
>            Priority: Blocker
>              Labels: correctness, pull-request-available
>             Fix For: 3.5.1, 4.0.0
>
>
> The following query should return 1000000:
> {code:java}
> // Run in spark-shell, which pre-imports spark.implicits._ and org.apache.spark.sql.functions._
> import org.apache.spark.storage.StorageLevel
>
> val df = spark.range(0, 1000000, 1, 5).map(l => (l, l))
> val ee = df.select($"_1".as("src"), $"_2".as("dst"))
>   .persist(StorageLevel.MEMORY_AND_DISK)
> ee.count()
>
> val minNbrs1 = ee
>   .groupBy("src").agg(min(col("dst")).as("min_number"))
>   .persist(StorageLevel.MEMORY_AND_DISK)
> val join = ee.join(minNbrs1, "src")
> join.count()
> {code}
> but on Spark 3.5.0 a correctness bug causes it to return `104800`
> or some other smaller value.
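
Why 1000000 is the correct answer for the SPARK-45592 repro: every `src` value produced by `spark.range` is unique, so `groupBy("src")` yields exactly one row per key, and the inner join on `src` keeps every one of the 1000000 rows of `ee`. The following is a minimal sketch that models the same query with plain Scala collections. The object and method names are hypothetical, and the sketch only illustrates the expected result; it does not exercise AQE, caching, or InMemoryTableScanExec, so it cannot reproduce the bug itself.

```scala
// Hypothetical plain-Scala model of the repro query (no Spark, no AQE, no caching).
object ExpectedJoinCount {
  def expectedCount(n: Long): Long = {
    // Models ee: rows (src, dst) with src == dst == l for l in [0, n)
    val ee: Seq[(Long, Long)] = (0L until n).map(l => (l, l))

    // Models groupBy("src").agg(min("dst")): one entry per distinct src
    val minNbrs1: Map[Long, Long] =
      ee.groupBy(_._1).map { case (src, rows) => src -> rows.map(_._2).min }

    // Models ee.join(minNbrs1, "src"): because minNbrs1 has unique keys,
    // each row of ee matches at most one row, so the inner-join row count
    // equals the number of ee rows whose src appears in minNbrs1.
    ee.count { case (src, _) => minNbrs1.contains(src) }.toLong
  }

  def main(args: Array[String]): Unit =
    println(expectedCount(1000L))
}
```

The model uses 1000 rows rather than 1000000 only to keep memory use small; the argument does not depend on the size, so the same reasoning gives 1000000 for the original query.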