[ https://issues.apache.org/jira/browse/SPARK-31208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved SPARK-31208.
-----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 28038
[https://github.com/apache/spark/pull/28038]

> Expose the ability for user to cleanup shuffle files
> ----------------------------------------------------
>
>                 Key: SPARK-31208
>                 URL: https://issues.apache.org/jira/browse/SPARK-31208
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Holden Karau
>            Assignee: Holden Karau
>            Priority: Major
>             Fix For: 3.1.0
>
>
> Dynamic scaling on Kubernetes (introduced in Spark 3) depends on only
> shutting down executors without shuffle files. However Spark does not
> aggressively clean up shuffle files (see SPARK-5836) and instead depends on
> JVM GC on the driver to trigger deletes. We already have a mechanism to
> explicitly clean up shuffle files from the ALS algorithm where we create a
> lot of quickly orphaned shuffle files. We should expose this as an advanced
> developer feature to enable people to better clean-up shuffle files improving
> dynamic scaling of their jobs on Kubernetes.
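
The difference between GC-driven and explicit shuffle cleanup described in the issue can be illustrated with a small toy model. This is only a sketch and not Spark code: ShuffleStore, ShuffleHandle, idle_executors and the executor names are hypothetical, and a Python weakref finalizer stands in for the driver-side JVM GC that the issue mentions.

```python
import weakref


class ShuffleStore:
    """Toy stand-in for the shuffle files held on executors (hypothetical)."""

    def __init__(self):
        self.files = {}  # shuffle_id -> executor_id holding the files

    def write(self, shuffle_id, executor_id):
        self.files[shuffle_id] = executor_id

    def remove(self, shuffle_id):
        # Removing an unknown or already removed shuffle is a no-op.
        self.files.pop(shuffle_id, None)

    def executors_with_shuffle_files(self):
        return set(self.files.values())


class ShuffleHandle:
    """Toy driver-side object whose lifetime controls cleanup (hypothetical)."""

    def __init__(self, store, shuffle_id, executor_id):
        self.shuffle_id = shuffle_id
        store.write(shuffle_id, executor_id)
        # GC-driven path: the files go away only once this handle is collected.
        self._finalizer = weakref.finalize(self, store.remove, shuffle_id)

    def clean_up(self):
        # Explicit path: delete the files now, without waiting for collection.
        # Calling a finalizer more than once is safe; it only runs once.
        self._finalizer()


def idle_executors(store, all_executors):
    """Executors that hold no shuffle files and could be shut down."""
    return sorted(set(all_executors) - store.executors_with_shuffle_files())


if __name__ == "__main__":
    store = ShuffleStore()
    executors = {"exec-1", "exec-2"}
    first = ShuffleHandle(store, 0, "exec-1")
    second = ShuffleHandle(store, 1, "exec-2")

    print(idle_executors(store, executors))  # both executors still hold files
    first.clean_up()                         # explicit cleanup frees exec-1
    print(idle_executors(store, executors))
```

In this model an executor only becomes eligible for shutdown once its handle is either collected or cleaned up explicitly, which is why a long-lived driver-side reference can hold back scale-down.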