[ https://issues.apache.org/jira/browse/SPARK-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-12147:
---------------------------------
    Labels: bulk-closed  (was: )

> Off heap storage and dynamicAllocation operation
> ------------------------------------------------
>
>                 Key: SPARK-12147
>                 URL: https://issues.apache.org/jira/browse/SPARK-12147
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.5.2
>         Environment: Cloudera Hadoop 2.6.0-cdh5.4.8
> Tachyon 0.7.1
> Yarn
>            Reporter: Rares Mirica
>            Priority: Minor
>              Labels: bulk-closed
>         Attachments: spark-defaults.conf
>
>
> To increase computation density and efficiency, I set out to test off-heap
> storage (using Tachyon) with dynamicAllocation enabled.
> Following the available documentation (the programming guide for Spark
> 1.5.2), I expected data to be cached in Tachyon for the lifetime of the
> application (driver instance) or until unpersist() is called. This
> expectation was supported by the doc: "Cached data is not lost if individual
> executors crash.", where I take "crash" to also cover graceful decommission.
> Furthermore, the description of graceful decommission in the job-scheduling
> document also hints at preserving cached data through off-heap storage.
> Since Tachyon is now mature enough to deliver on this, I consider it a bug
> that upon graceful decommission of an executor the off-heap data is deleted
> (presumably as part of the cleanup phase).
> Preserving the off-heap persisted data after graceful decommission under
> dynamic allocation would yield significant improvements in resource
> allocation, especially on YARN, where executors occupy compute "slots" even
> when idle. After a long, expensive computation that takes advantage of the
> dynamically scaled executors, the remaining Spark jobs could use the cached
> data while the compute resources are released for other cluster tasks.
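
For readers without access to the attachment, the kind of setup the report describes might look roughly like the fragment below. This is only an illustrative sketch and not the reporter's actual spark-defaults.conf: the property names are the ones documented for Spark 1.5.x as far as I recall, while the Tachyon host name and all numeric values are placeholder assumptions.

```properties
# Illustrative sketch only; NOT the attached spark-defaults.conf.

# Dynamic allocation on YARN (requires the external shuffle service).
spark.dynamicAllocation.enabled              true
spark.shuffle.service.enabled                true
# Placeholder bounds and timeout; tune for the cluster.
spark.dynamicAllocation.minExecutors         1
spark.dynamicAllocation.maxExecutors         20
spark.dynamicAllocation.executorIdleTimeout  60s

# Off-heap (external block store) backed by Tachyon.
# The host below is hypothetical; 19998 is the default Tachyon master port.
spark.externalBlockStore.url                 tachyon://tachyon-master.example.com:19998
```

With a configuration along these lines, data persisted with StorageLevel.OFF_HEAP is written to Tachyon, and the behaviour under discussion is what happens to those blocks once an idle executor is released.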