cloud-fan commented on code in PR #45234:
URL: https://github.com/apache/spark/pull/45234#discussion_r1684091860

##########
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala:
##########
@@ -47,6 +47,15 @@ import org.apache.spark.util.random.XORShiftRandom
  */
 trait ShuffleExchangeLike extends Exchange {
 
+  /**
+   * The asynchronous job that materializes the shuffle. It also does the preparations work,
+   * such as waiting for the subqueries.
+   */
+  @transient private lazy val shuffleFuture: Future[MapOutputStatistics] = executeQuery {
+    materializationStarted.set(true)

Review Comment:
   After a closer look, I don't think this change works as we expect. We set the `materializationStarted` flag before we return the `Future`, which means we are still on the AQE loop's main thread. In other words, once we submit a query stage, its `materializationStarted` becomes true immediately, so we can't really avoid the wasted query stage execution. The test passed because `ShuffleExchangeExec` calls `child.execute()` before returning the `Future`. Then we exit the AQE loop without cancelling other stages.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: reviews-h...@spark.apache.org
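
To illustrate the timing concern in the review comment, here is a minimal standalone sketch. It is not Spark code: `FlagTimingSketch`, `Stage`, `setFlagInsideTask`, `submit`, and `release` are hypothetical names, and the `Promise` gate only simulates an asynchronous job that has not started yet. It shows that a flag set before the `Future` is returned is already true on the submitting thread, whereas a flag set inside the asynchronous body stays false until that body actually runs. The second variant is included only for contrast and is not a suggestion for how the PR should be changed.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object FlagTimingSketch {
  // Hypothetical stand-in for a query stage; NOT Spark's ShuffleExchangeExec.
  final class Stage(setFlagInsideTask: Boolean)(implicit ec: ExecutionContext) {
    val materializationStarted = new AtomicBoolean(false)

    // The async body cannot run until release() completes this gate.
    private val gate = Promise[Unit]()

    def submit(): Future[Int] = {
      // Runs synchronously on the caller's thread, before the Future is returned.
      if (!setFlagInsideTask) materializationStarted.set(true)
      gate.future.map { _ =>
        // Runs later, on an execution-context thread.
        if (setFlagInsideTask) materializationStarted.set(true)
        42
      }
    }

    def release(): Unit = gate.success(())
  }

  def main(args: Array[String]): Unit = {
    import scala.concurrent.ExecutionContext.Implicits.global

    val eager = new Stage(setFlagInsideTask = false)
    eager.submit()
    // The async body has not run, yet the flag is already set.
    println(s"eager, right after submit: ${eager.materializationStarted.get()}") // true

    val deferred = new Stage(setFlagInsideTask = true)
    val f = deferred.submit()
    println(s"deferred, right after submit: ${deferred.materializationStarted.get()}") // false
    deferred.release()
    Await.result(f, 5.seconds)
    println(s"deferred, after the task ran: ${deferred.materializationStarted.get()}") // true
  }
}
```

With the first variant, a caller that checks the flag right after submission cannot distinguish "submitted" from "actually running", which is the situation the comment describes.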