[jira] [Commented] (SPARK-9708) Spark should create local temporary directories in Mesos sandbox when launched with Mesos

2015-09-08 Thread Iulian Dragos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735011#comment-14735011
 ] 

Iulian Dragos commented on SPARK-9708:
--

This won't work when the external shuffle service is enabled, so that could be 
the flag here (actually, its negation: enable this behavior only when the 
external shuffle service is disabled).
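
A minimal sketch of that gating rule, written in Python for brevity rather than
as Spark code. `spark.shuffle.service.enabled` is the existing Spark setting;
the `MESOS_SANDBOX` variable name and both function names are assumptions made
for illustration only.

```python
import tempfile


def use_mesos_sandbox(conf, env):
    """Return True when scratch dirs may live in the Mesos sandbox."""
    # Assumption: the sandbox path is exposed to the executor as MESOS_SANDBOX.
    in_mesos = bool(env.get("MESOS_SANDBOX"))
    # The external shuffle service is treated as off unless set to "true".
    shuffle_service = (
        conf.get("spark.shuffle.service.enabled", "false").lower() == "true"
    )
    # Sandbox is only safe when nothing has to outlive the executor.
    return in_mesos and not shuffle_service


def choose_local_root(conf, env):
    """Hypothetical helper: pick the root for local temporary directories."""
    if use_mesos_sandbox(conf, env):
        return env["MESOS_SANDBOX"]
    # Otherwise fall back to a temporary directory on the host.
    return tempfile.gettempdir()
```

With the shuffle service enabled, the sketch falls back to the host temporary
directory, so shuffle files are not removed together with the sandbox.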

> Spark should create local temporary directories in Mesos sandbox when 
> launched with Mesos
> -
>
> Key: SPARK-9708
> URL: https://issues.apache.org/jira/browse/SPARK-9708
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Reporter: Timothy Chen
>
> Currently Spark creates temporary directories with 
> Utils.getConfiguredLocalDirs. It writes to the YARN directories if YARN is 
> detected; otherwise it just writes to a temporary directory on the host.
> However, Mesos creates a directory per task, and ideally Spark should create 
> its local temporary directories there, since they can then be cleaned up 
> when the task is gone instead of being left on the host, uncleaned until 
> reboot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9708) Spark should create local temporary directories in Mesos sandbox when launched with Mesos

2015-09-03 Thread Chris Bannister (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728864#comment-14728864
 ] 

Chris Bannister commented on SPARK-9708:


When Mesos kills the executor, it does not immediately clean up the sandbox 
dir; it waits for a GC time period based on total disk usage in the work_dir. 
I'm not entirely sure what happens once the executor is stopped, in particular 
whether the data is still readable by external applications.

Regarding spark.local.dir: as far as I understand it, when running in YARN this 
is overridden by the YARN config. I intended to do something similar here.

Would it be better to add a config option to explicitly enable this behaviour?




[jira] [Commented] (SPARK-9708) Spark should create local temporary directories in Mesos sandbox when launched with Mesos

2015-09-01 Thread Iulian Dragos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725657#comment-14725657
 ] 

Iulian Dragos commented on SPARK-9708:
--

I'm not sure this is the entire story. Remember that shuffle files need to 
outlive the executor when dynamic allocation is enabled. So, with this proposed 
change, if the scheduler decides to kill an executor, its shuffle files will be 
gone and the external shuffle service won't be able to find them anymore. At 
the very least, shuffle files need to go in another directory, not under the 
sandbox.

Also, Spark allows one to configure `spark.local.dir`, and that should take 
precedence. In the Hadoop world, this can be used to specify several 
directories on different physical disks (to allow fast parallel writes).
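
To make the requested precedence concrete, here is a rough sketch in Python
(not Spark's actual implementation). It assumes `spark.local.dir` holds a
comma-separated list and that the sandbox path arrives in a `MESOS_SANDBOX`
environment variable; `resolve_local_dirs` is a hypothetical name. It does not
address the shuffle-file concern above.

```python
import tempfile


def resolve_local_dirs(conf, env):
    """Pick scratch directories, most specific setting first."""
    configured = conf.get("spark.local.dir", "")
    # A comma-separated value can name one directory per physical disk.
    dirs = [d.strip() for d in configured.split(",") if d.strip()]
    if dirs:
        return dirs  # explicit user configuration takes precedence
    sandbox = env.get("MESOS_SANDBOX")  # assumed variable name
    if sandbox:
        return [sandbox]  # per-task directory, removed along with the task
    return [tempfile.gettempdir()]  # last resort: host temporary directory
```

For example, `resolve_local_dirs({"spark.local.dir": "/disk1/tmp,/disk2/tmp"}, env)`
returns both configured directories and ignores the sandbox entirely.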




[jira] [Commented] (SPARK-9708) Spark should create local temporary directories in Mesos sandbox when launched with Mesos

2015-08-21 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706799#comment-14706799
 ] 

Apache Spark commented on SPARK-9708:
-

User 'Zariel' has created a pull request for this issue:
https://github.com/apache/spark/pull/8358
