cloud-fan commented on code in PR #45234:
URL: https://github.com/apache/spark/pull/45234#discussion_r1684091860


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala:
##########
@@ -47,6 +47,15 @@ import org.apache.spark.util.random.XORShiftRandom
  */
 trait ShuffleExchangeLike extends Exchange {
 
+  /**
+   * The asynchronous job that materializes the shuffle. It also does the preparation work,
+   * such as waiting for the subqueries.
+   */
+  @transient private lazy val shuffleFuture: Future[MapOutputStatistics] = executeQuery {
+    materializationStarted.set(true)

Review Comment:
   After a closer look, I don't think this change works as expected. We set the
`materializationStarted` flag before returning the `Future`, which means we
are still on the AQE loop's main thread. As a result, once we submit a query
stage, its `materializationStarted` becomes true immediately, and we can't
really avoid the wasted query stage execution when cancelling it.
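To make the timing concrete, here is a minimal standalone Scala sketch. It is not Spark code: `SketchStage`, `gate`, and `flagBeforeJobRuns` are hypothetical names, a plain `scala.concurrent.Future` stands in for the shuffle job, and a `CountDownLatch` simulates a stage that has been submitted but has not started running yet. It shows that a flag set in the synchronous part of a lazy initializer is already `true` before any async work runs, while a flag set inside the `Future` body is not. Moving the flag into the async body is only an illustrative option here, not a fix proposed in this PR.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for a query stage; it is NOT Spark's ShuffleExchangeLike.
// `flagInsideFuture` selects where the "materialization started" flag is set.
final class SketchStage(flagInsideFuture: Boolean)(implicit ec: ExecutionContext) {
  val materializationStarted = new AtomicBoolean(false)

  // Holds the async job back, simulating a job that is submitted but not yet running.
  val gate = new CountDownLatch(1)

  // Like `shuffleFuture`, the future is created lazily on first access.
  lazy val future: Future[Int] = {
    // Variant A (the pattern in the diff): runs synchronously on the caller's
    // thread, so the flag flips as soon as the stage is submitted.
    if (!flagInsideFuture) materializationStarted.set(true)
    Future {
      gate.await()
      // Variant B: runs on a pool thread, only once the job actually starts.
      if (flagInsideFuture) materializationStarted.set(true)
      42
    }
  }
}

object SketchStageDemo {
  // Submits a stage and reports whether the flag was already set before the job ran.
  def flagBeforeJobRuns(flagInsideFuture: Boolean): Boolean = {
    import ExecutionContext.Implicits.global
    val stage = new SketchStage(flagInsideFuture)
    stage.future // "submit" the stage from the current (driver-like) thread
    val before = stage.materializationStarted.get()
    stage.gate.countDown() // now let the async job run
    Await.result(stage.future, 5.seconds)
    before
  }

  def main(args: Array[String]): Unit = {
    println(s"flag set on caller thread: ${flagBeforeJobRuns(flagInsideFuture = false)}")
    println(s"flag set inside the future: ${flagBeforeJobRuns(flagInsideFuture = true)}")
  }
}
```

With variant A the flag reads `true` before the job has done any work, which is why a cancellation check based on it cannot tell a queued stage from a running one.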
   
   The test passed because `ShuffleExchangeExec` calls `child.execute()` before 
returning the `Future`. Then we exit the AQE loop without cancelling other 
stages.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

