holdenk opened a new pull request #28038: [SPARK-31208][CORE][KUBERNETES] Add 
an experimental cleanShuffleDependencies
URL: https://github.com/apache/spark/pull/28038
 
 
   ### What changes were proposed in this pull request?
   
   Add `cleanShuffleDependencies` as an experimental developer API that lets 
users clean up shuffle files more aggressively than Spark currently does.
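   
   As a rough illustration of how such an API might be called, here is a minimal sketch. It assumes the method is exposed on `RDD` as `cleanShuffleDependencies(blocking: Boolean)`; the description above does not state the signature, so treat that as an assumption. The object name and the sample data are made up for the example.
   
   ```scala
   import org.apache.spark.sql.SparkSession

   // Hypothetical example object; not part of the PR.
   object CleanShuffleSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .appName("clean-shuffle-sketch")
         .master("local[2]")
         .getOrCreate()
       val sc = spark.sparkContext

       // reduceByKey introduces a shuffle dependency, so running it
       // writes shuffle files on the executors.
       val counts = sc.parallelize(1 to 100000)
         .map(i => (i % 10, 1))
         .reduceByKey(_ + _)

       // Materialize the result on the driver; the RDD is not reused below.
       val result = counts.collect()

       // Assumed signature: explicitly remove the shuffle files behind this
       // RDD's shuffle dependencies instead of waiting for driver-side GC.
       counts.cleanShuffleDependencies(blocking = true)

       println(result.sortBy(_._1).mkString(", "))
       spark.stop()
     }
   }
   ```
   
   The call is placed after the last use of the RDD on purpose. If the RDD had to be reused afterwards, a cautious pattern would be to checkpoint and materialize it first, so that it no longer depends on the deleted shuffle files.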
   
   ### Why are the changes needed?
   
   Dynamic scaling on Kubernetes (introduced in Spark 3) depends on only 
shutting down executors that hold no shuffle files. However, Spark does not 
aggressively clean up shuffle files (see SPARK-5836) and instead relies on JVM 
GC on the driver to trigger deletes. The ALS algorithm, which creates many 
quickly orphaned shuffle files, already has a mechanism to clean them up 
explicitly. We should expose this as an advanced developer feature so that 
people can clean up shuffle files sooner, improving dynamic scaling of their 
jobs on Kubernetes.
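   
   For context, the dynamic scaling described here is typically enabled through configuration. The sketch below shows the Spark 3 settings I would expect to be involved; the PR description does not name them, so which settings a given deployment uses is an assumption, and the object name is hypothetical.
   
   ```scala
   import org.apache.spark.SparkConf

   // Hypothetical helper; not part of the PR.
   object ShuffleTrackingConf {
     def build(): SparkConf = new SparkConf()
       // Let Spark add and remove executors based on load.
       .set("spark.dynamicAllocation.enabled", "true")
       // Without an external shuffle service, track which executors still
       // hold shuffle files and keep those alive.
       .set("spark.dynamicAllocation.shuffleTracking.enabled", "true")
   }
   ```
   
   With shuffle tracking on, an executor holding shuffle files is kept alive, which is why cleaning those files up sooner can let the cluster scale down sooner.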
   
   ### Does this PR introduce any user-facing change?
   
   This adds a new experimental API.
   
   ### How was this patch tested?
   
   ALS already used a mechanism like this. This PR re-targets the ALS code to 
the new interface, which is tested with the existing ALS tests.
