[ https://issues.apache.org/jira/browse/SPARK-21097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113264#comment-16113264 ]
Brad commented on SPARK-21097:
------------------------------

I'm still working on thoroughly benchmarking and testing this change. If anyone is interested in this, send me a message. Thanks.

> Dynamic allocation will preserve cached data
> --------------------------------------------
>
>                 Key: SPARK-21097
>                 URL: https://issues.apache.org/jira/browse/SPARK-21097
>             Project: Spark
>          Issue Type: Improvement
>          Components: Block Manager, Scheduler, Spark Core
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Brad
>         Attachments: Preserving Cached Data with Dynamic Allocation.pdf
>
> We want to use dynamic allocation to distribute resources among many notebook users on our Spark clusters. One difficulty is that if a user has cached data, we are either prevented from de-allocating any of their executors, or we are forced to drop their cached data, which can lead to a bad user experience.
>
> We propose adding a feature to preserve cached data by copying it to other executors before de-allocation. This behavior would be enabled by a simple Spark config like "spark.dynamicAllocation.recoverCachedData". With the feature enabled, when an executor reaches its configured idle timeout, instead of killing it on the spot, we will stop sending it new tasks, replicate all of its RDD blocks onto other executors, and then kill it. If there is an issue while we replicate the data (an error, it takes too long, or there isn't enough space), we fall back to the original behavior: drop the data and kill the executor.
>
> This feature should allow anyone with notebook users to use their cluster resources more efficiently. Also, since it is completely opt-in, it is unlikely to cause problems for other use cases.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
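
The proposed idle-timeout flow (stop new tasks, replicate cached blocks, then kill, falling back to drop-and-kill on failure) can be sketched as a small simulation. This is illustrative pseudologic only, not Spark's actual implementation: every class and method name here is hypothetical; only the flag name "spark.dynamicAllocation.recoverCachedData" comes from the proposal itself.

```python
# Hypothetical model of the proposed behavior; not Spark internals.

class ReplicationError(Exception):
    """Raised when a cached block cannot be copied (error, timeout, no space)."""

class Executor:
    def __init__(self, blocks):
        self.blocks = list(blocks)    # ids of cached RDD blocks held here
        self.accepting_tasks = True
        self.alive = True

class Cluster:
    def __init__(self, free_slots):
        self.free_slots = free_slots  # capacity left on other executors
        self.replicated = []          # blocks successfully copied elsewhere

    def replicate(self, block):
        if self.free_slots == 0:
            raise ReplicationError(block)
        self.free_slots -= 1
        self.replicated.append(block)

def retire_executor(executor, cluster, recover_cached_data):
    """Idle-timeout handling per the proposal: replicate cached blocks,
    then kill; on any failure, fall back to dropping the data."""
    executor.accepting_tasks = False      # step 1: stop sending new tasks
    if recover_cached_data:               # the proposed opt-in config flag
        try:
            for block in executor.blocks: # step 2: copy blocks elsewhere
                cluster.replicate(block)
        except ReplicationError:
            pass                          # fall back: unreplicated data is dropped
    executor.alive = False                # step 3: kill the executor
```

For example, with two cached blocks and only one free slot elsewhere, the first block is preserved, the second replication fails, and the executor is killed anyway, matching the proposal's fallback behavior.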