[PR] [SPARK-48861][SQL] Enable shuffle file removal/skipMigration for all SQL executions [spark]

via GitHub Mon, 15 Jul 2024 13:35:15 -0700


abellina opened a new pull request, #47360:
URL: https://github.com/apache/spark/pull/47360


   This PR follows https://github.com/apache/spark/pull/45930 and 
https://github.com/apache/spark/pull/46302 (which is open as of the creation of 
this PR) to enable shuffle file cleanup for all SQL executions, not just Spark 
Connect.
   
   The prior PR https://github.com/apache/spark/pull/45930 introduced two new 
configs: `spark.sql.shuffleDependency.skipMigration.enabled` and 
`spark.sql.shuffleDependency.fileCleanup.enabled`. These two configs are not 
specifically namespaced to Spark Connect, and I'd like to make sure we can use 
them from all QueryExecutions. Before this PR, only Spark Connect could enable 
them.
   
   My change moves the check for `shuffleCleanupMode` inside 
`QueryExecution`, instead of having it passed to this class in the 
constructor. I am also explicitly turning on these features in the tests, 
rather than relying on `Utils.isTesting`.
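
   To illustrate the shape of this change, here is a minimal sketch in plain 
Python rather than Spark's Scala. It is a toy model, not the actual patch: the 
class `QueryExecutionSketch`, the enum member names, the `false` defaults, and 
the precedence of file cleanup over skip-migration are all assumptions made 
for illustration. Only the two config keys come from the description above.

```python
from enum import Enum
from typing import Mapping

# Config keys named in the PR description.
FILE_CLEANUP_KEY = "spark.sql.shuffleDependency.fileCleanup.enabled"
SKIP_MIGRATION_KEY = "spark.sql.shuffleDependency.skipMigration.enabled"


class ShuffleCleanupMode(Enum):
    # Hypothetical stand-ins for the cleanup modes; the names are assumptions.
    DO_NOT_CLEANUP = "do_not_cleanup"
    SKIP_MIGRATION = "skip_migration"
    REMOVE_SHUFFLE_FILES = "remove_shuffle_files"


def resolve_cleanup_mode(conf: Mapping[str, str]) -> ShuffleCleanupMode:
    """Derive the cleanup mode from session config.

    Assumptions: both flags default to false, and file cleanup takes
    precedence over skip-migration when both are enabled.
    """
    def enabled(key: str) -> bool:
        return conf.get(key, "false").strip().lower() == "true"

    if enabled(FILE_CLEANUP_KEY):
        return ShuffleCleanupMode.REMOVE_SHUFFLE_FILES
    if enabled(SKIP_MIGRATION_KEY):
        return ShuffleCleanupMode.SKIP_MIGRATION
    return ShuffleCleanupMode.DO_NOT_CLEANUP


class QueryExecutionSketch:
    """Toy model: the mode is computed from config inside the class,
    so no caller has to pass it through the constructor."""

    def __init__(self, conf: Mapping[str, str]) -> None:
        self.conf = dict(conf)

    @property
    def shuffle_cleanup_mode(self) -> ShuffleCleanupMode:
        return resolve_cleanup_mode(self.conf)
```

   In this model, any caller that builds a `QueryExecutionSketch` gets the 
same behavior from the same config, which is the point of moving the check out 
of the constructor arguments.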
   
   I would love to hear any concerns about why we shouldn't do this, or what 
testing you want to see. I have run Standalone tests (note that I needed 
https://github.com/apache/spark/pull/46302) and can run other tests if 
required, or write new ones.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


