[ https://issues.apache.org/jira/browse/SPARK-35767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-35767.
-----------------------------------
    Fix Version/s: 3.1.3
                   3.2.0
                   3.0.3
         Assignee: Andy Grove
       Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/32920

> CoalesceExec can execute child plan twice
> -----------------------------------------
>
>                 Key: SPARK-35767
>                 URL: https://issues.apache.org/jira/browse/SPARK-35767
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.2, 3.1.2, 3.2.0
>            Reporter: Andy Grove
>            Assignee: Andy Grove
>            Priority: Minor
>             Fix For: 3.0.3, 3.2.0, 3.1.3
>
>
> CoalesceExec calls `child.execute()` in the if condition and throws away the
> results, then calls `child.execute()` again in the else condition. This could
> cause a section of the plan to be executed twice.
> {code:java}
> protected override def doExecute(): RDD[InternalRow] = {
>   if (numPartitions == 1 && child.execute().getNumPartitions < 1) {
>     // Make sure we don't output an RDD with 0 partitions, when claiming that we have a
>     // `SinglePartition`.
>     new CoalesceExec.EmptyRDDWithPartitions(sparkContext, numPartitions)
>   } else {
>     child.execute().coalesce(numPartitions, shuffle = false)
>   }
> }
> {code}
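
For illustration only, the double-execution pattern described in the report can be reproduced without Spark. The sketch below is not the change made in the linked pull request; `FakeRdd`, `CountingChild`, `executeTwice`, and `executeOnce` are hypothetical stand-ins invented here. It shows one plausible way to avoid the problem: bind the result of `execute()` to a local value once and reuse it in both the condition and the else branch.

```scala
object CoalesceSketch {
  // Hypothetical stand-in for an RDD: only the partition count matters here.
  final case class FakeRdd(numPartitions: Int) {
    // Like coalesce with shuffle = false, this can only reduce the partition count.
    def coalesce(n: Int): FakeRdd = FakeRdd(math.min(n, numPartitions))
  }

  // Hypothetical stand-in for a child plan that counts how often it is executed.
  final class CountingChild(partitions: Int) {
    var executions: Int = 0
    def execute(): FakeRdd = {
      executions += 1
      FakeRdd(partitions)
    }
  }

  // Pattern from the report: execute() is called in the condition, its result is
  // discarded, and execute() is called again in the else branch.
  def executeTwice(child: CountingChild, numPartitions: Int): FakeRdd =
    if (numPartitions == 1 && child.execute().numPartitions < 1) {
      FakeRdd(numPartitions)
    } else {
      child.execute().coalesce(numPartitions)
    }

  // Assumed shape of a fix: execute the child once and reuse the result.
  def executeOnce(child: CountingChild, numPartitions: Int): FakeRdd = {
    val rdd = child.execute()
    if (numPartitions == 1 && rdd.numPartitions < 1) {
      FakeRdd(numPartitions)
    } else {
      rdd.coalesce(numPartitions)
    }
  }

  def main(args: Array[String]): Unit = {
    val a = new CountingChild(4)
    executeTwice(a, 1)
    println(s"executeTwice ran the child ${a.executions} time(s)")

    val b = new CountingChild(4)
    executeOnce(b, 1)
    println(s"executeOnce ran the child ${b.executions} time(s)")
  }
}
```

In this model the first variant runs the child twice only when `numPartitions == 1`, because `&&` short-circuits otherwise, which is consistent with the report's wording that the plan "can" be executed twice rather than always.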