[jira] [Updated] (SPARK-12147) Off heap storage and dynamicAllocation operation

2019-05-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-12147:
-
Labels: bulk-closed  (was: )

> Off heap storage and dynamicAllocation operation
> 
>
> Key: SPARK-12147
> URL: https://issues.apache.org/jira/browse/SPARK-12147
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.2
> Environment: Cloudera Hadoop 2.6.0-cdh5.4.8
> Tachyon 0.7.1
> Yarn
>Reporter: Rares Mirica
>Priority: Minor
>  Labels: bulk-closed
> Attachments: spark-defaults.conf
>
>
> For the purpose of increasing computation density and efficiency, I set out 
> to test off-heap storage (using Tachyon) with dynamicAllocation enabled.
> Following the available documentation (the programming guide for Spark 
> 1.5.2), I expected data to be cached in Tachyon for the lifetime of the 
> application (driver instance) or until unpersist() is called. This 
> expectation was supported by the doc's statement that "Cached data is not 
> lost if individual executors crash.", where I take "crash" to also cover 
> graceful decommission. Furthermore, the graceful-decommission description in 
> the job-scheduling document also hints at preserving cached data through 
> off-heap storage.
> Seeing how Tachyon has now reached a state where these promises are well 
> within reach, I consider it a bug that upon graceful decommission of an 
> executor the off-heap data is deleted (presumably as part of the cleanup 
> phase).
> Needless to say, preserving off-heap persisted data after graceful 
> decommission under dynamic allocation would yield significant improvements 
> in resource allocation, especially on YARN, where executors occupy compute 
> "slots" even when idle. After a long, expensive computation that takes 
> advantage of the dynamically scaled executors, subsequent Spark jobs could 
> use the cached data while the compute resources are released for other 
> cluster tasks.
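
For reference, a minimal spark-defaults.conf sketch of the setup being described (property names are from the Spark 1.5.x configuration docs; the Tachyon URL, base directory, and timeout below are illustrative placeholders, not the reporter's actual values — those are in the attached spark-defaults.conf):

```
# Dynamic allocation (requires the external shuffle service on YARN)
spark.dynamicAllocation.enabled              true
spark.shuffle.service.enabled                true
spark.dynamicAllocation.executorIdleTimeout  60s

# Off-heap block store backed by Tachyon (Spark 1.5.x property names)
spark.externalBlockStore.blockManager  org.apache.spark.storage.TachyonBlockManager
spark.externalBlockStore.url           tachyon://master:19998
spark.externalBlockStore.baseDir       /spark
```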



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12147) Off heap storage and dynamicAllocation operation

2015-12-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12147:
--
  Priority: Minor  (was: Major)
Issue Type: Improvement  (was: Bug)

(Make the title more specific?)

I disagree that it's a 'bug'. Cached data is not "lost" when an executor goes 
away, in the sense that it was only a cached copy: the copy is lost, but it 
can be recreated.

Tachyon is not required, and it doesn't look like it's going to be, so I'm not 
sure it's a general solution here. Your suggestion also requires doubling the 
amount of storage for cached data: things would live in memory or on local 
disk and also in Tachyon or something similar. Right?

It's also not true that, after executors are decommissioned, others can keep 
using the cached data: there aren't enough executors left to keep the cached 
data live any more.

I think this has problems but maybe you can clarify what you mean.
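
The "cached copy can be recreated" point can be sketched in plain Python (a hypothetical stand-in for Spark's lineage-based recomputation, not Spark code — the class and function names here are made up for illustration):

```python
# Sketch: a "cached" result is just a materialized copy of a deterministic
# computation. Losing the copy (e.g. an executor being decommissioned)
# costs recomputation time, not data.

def expensive_transform(data):
    # Stands in for a deterministic lineage of transformations.
    return [x * x for x in data]

class CachedDataset:
    def __init__(self, source, transform):
        self.source = source        # durable input (e.g. files on HDFS)
        self.transform = transform  # lineage: how to rebuild the result
        self._cache = None

    def get(self):
        if self._cache is None:     # cache miss -> recompute from lineage
            self._cache = self.transform(self.source)
        return self._cache

    def evict(self):                # executor lost: cached copy dropped
        self._cache = None

ds = CachedDataset([1, 2, 3], expensive_transform)
first = ds.get()   # computed and cached
ds.evict()         # cached copy lost
second = ds.get()  # recomputed from lineage; same result as before
```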




[jira] [Updated] (SPARK-12147) Off heap storage and dynamicAllocation operation

2015-12-04 Thread Rares Mirica (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rares Mirica updated SPARK-12147:
-
Attachment: spark-defaults.conf
