holdenk opened a new pull request #28038: [SPARK-31208][CORE][KUBERNETES] Add an experimental cleanShuffleDependencies
URL: https://github.com/apache/spark/pull/28038

### What changes were proposed in this pull request?

Add cleanShuffleDependencies as an experimental developer API that lets users clean up shuffle files more aggressively than Spark currently does.

### Why are the changes needed?

Dynamic scaling on Kubernetes (introduced in Spark 3) relies on shutting down only executors that hold no shuffle files. However, Spark does not aggressively clean up shuffle files (see SPARK-5836); it instead depends on JVM GC on the driver to trigger deletes. We already have a mechanism for explicitly cleaning up shuffle files in the ALS algorithm, which creates many quickly orphaned shuffle files. We should expose this as an advanced developer feature so that people can clean up shuffle files more effectively, improving dynamic scaling of their jobs on Kubernetes.

### Does this PR introduce any user-facing change?

This adds a new experimental API.

### How was this patch tested?

ALS already used a mechanism like this. This PR retargets the ALS code to the new interface and was tested with the existing ALS tests.
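
To make the idea of explicit shuffle cleanup concrete, here is a minimal toy model in Python. It is not Spark's implementation and does not use any Spark API: `Dataset`, `Dep`, `shuffle_ids_to_clean`, `clean_shuffle_dependencies`, and the `shuffle_files` dictionary are all hypothetical names chosen for illustration. The sketch assumes that each dataset records its parent dependencies, that a dependency carrying a shuffle id stands for a shuffle, and that cleaning means dropping the tracked files for every shuffle found among the dataset's ancestors, without waiting for garbage collection.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Dep:
    """Edge to a parent dataset; shuffle_id is None for a narrow dependency."""
    parent: "Dataset"
    shuffle_id: Optional[int] = None


@dataclass
class Dataset:
    """Toy stand-in for an RDD: a name plus its dependencies (hypothetical)."""
    name: str
    deps: List[Dep] = field(default_factory=list)


def shuffle_ids_to_clean(ds: Dataset) -> Set[int]:
    """Collect the shuffle ids reachable from ds by walking its ancestors."""
    found: Set[int] = set()
    seen: Set[int] = set()
    stack = [ds]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # a dataset shared by several children is visited once
        seen.add(id(node))
        for dep in node.deps:
            if dep.shuffle_id is not None:
                found.add(dep.shuffle_id)
            stack.append(dep.parent)
    return found


def clean_shuffle_dependencies(
    ds: Dataset, shuffle_files: Dict[int, List[str]]
) -> Set[int]:
    """Eagerly drop tracked files for every shuffle that ds depends on.

    Returns the shuffle ids whose files were actually removed.
    """
    removed: Set[int] = set()
    for sid in shuffle_ids_to_clean(ds):
        if shuffle_files.pop(sid, None) is not None:
            removed.add(sid)
    return removed


if __name__ == "__main__":
    source = Dataset("source")
    grouped = Dataset("grouped", [Dep(source, shuffle_id=0)])
    mapped = Dataset("mapped", [Dep(grouped)])  # narrow dependency
    joined = Dataset("joined", [Dep(mapped, shuffle_id=1)])

    # Shuffle 2 belongs to an unrelated job and must survive the cleanup.
    files = {0: ["shuffle_0_a"], 1: ["shuffle_1_a"], 2: ["shuffle_2_a"]}
    print(sorted(clean_shuffle_dependencies(joined, files)))
    print(sorted(files))
```

In this model, only shuffles in the lineage of the dataset being cleaned are touched, so files belonging to unrelated work stay in place. Recomputing a cleaned dataset would require regenerating those shuffles, which is one reason to treat such an operation as an advanced feature.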