erenavsarogullari commented on code in PR #45234:
URL: https://github.com/apache/spark/pull/45234#discussion_r1520503365


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala:
##########
@@ -897,6 +897,138 @@ class AdaptiveQueryExecSuite
     }
   }
 
+  test("SPARK-47148: AQE should avoid to materialize ShuffleQueryStage on the cancellation") {
+    withSQLConf(
+      SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+      SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> "true",
+      SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
+      withTable("bucketed_table1", "bucketed_table2", "bucketed_table3") {
+        val df = (0 until 50).map(i => (i % 5, i % 13, i.toString)).toDF("i", "j", "k")
+        df.write.format("parquet").bucketBy(8, "i").saveAsTable("bucketed_table1")
+        df.write.format("parquet").bucketBy(8, "i").saveAsTable("bucketed_table2")
+        df.write.format("parquet").bucketBy(8, "i").saveAsTable("bucketed_table3")
+
+        val warehouseFilePath = new URI(spark.sessionState.conf.warehousePath).getPath
+        val tableDir = new File(warehouseFilePath, "bucketed_table2")
+        Utils.deleteRecursively(tableDir)
+        df.write.parquet(tableDir.getAbsolutePath)

Review Comment:
   Yes, it helps break `QueryStage` materialization, which triggers `QueryStage` cancellation, so we can simulate the failure case. For this test case, the execution steps are as follows:
   ```
   ShuffleQueryStage-1 - is materialized successfully,
   ShuffleQueryStage-2 - materialization fails, and this stage is excluded from cancellation by being flagged as 'earlyFailedStage',
   ShuffleQueryStage-1 - cancellation is kicked off,
   ShuffleQueryStage-3 - cancellation is kicked off (although it is not materialized).
   ```
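
   The steps above can be sketched as a toy model. This is only an illustration, not Spark's AQE code: `Stage`, `StageState`, and `stagesToCancel` are hypothetical names made up for this sketch, and the only behaviour it models is "request cancellation for every stage except the one that failed first, whether or not it has been materialized".

   ```scala
   // Toy model only: these types are hypothetical and are NOT Spark's AQE classes.
   sealed trait StageState
   case object Materialized extends StageState
   case object Failed extends StageState
   case object NotMaterialized extends StageState

   final case class Stage(id: Int, state: StageState)

   object CancellationSketch {
     // Every stage except the one that failed first gets a cancellation request,
     // regardless of whether it has been materialized yet.
     def stagesToCancel(stages: Seq[Stage], earlyFailedStageId: Int): Seq[Int] =
       stages.filter(_.id != earlyFailedStageId).map(_.id)

     def main(args: Array[String]): Unit = {
       val stages = Seq(
         Stage(1, Materialized),    // ShuffleQueryStage-1
         Stage(2, Failed),          // ShuffleQueryStage-2, the early failed stage
         Stage(3, NotMaterialized)) // ShuffleQueryStage-3
       // Pick the first failed stage and cancel all the others.
       val earlyFailed = stages.collectFirst { case Stage(id, Failed) => id }
       earlyFailed.foreach { id =>
         println(stagesToCancel(stages, id).mkString(", ")) // prints "1, 3"
       }
     }
   }
   ```

   In this model, stage 3 receives a cancellation request even though it was never materialized, which is the situation the test is meant to reproduce.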



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

