[jira] [Commented] (SPARK-9708) Spark should create local temporary directories in Mesos sandbox when launched with Mesos
[ https://issues.apache.org/jira/browse/SPARK-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735011#comment-14735011 ]

Iulian Dragos commented on SPARK-9708:
--------------------------------------

This won't work when the external shuffle service is enabled, so that could be the flag here (actually, its negation: enable this behavior only when the external shuffle service is disabled).

> Spark should create local temporary directories in Mesos sandbox when
> launched with Mesos
> ---------------------------------------------------------------------
>
>          Key: SPARK-9708
>          URL: https://issues.apache.org/jira/browse/SPARK-9708
>      Project: Spark
>   Issue Type: Bug
>   Components: Mesos
>     Reporter: Timothy Chen
>
> Currently Spark creates temporary directories with
> Utils.getConfiguredLocalDirs: it writes to YARN directories if YARN is
> detected, and otherwise writes to a temporary directory on the host.
> However, Mesos does create a directory per task, and ideally Spark should
> use that directory for its local temporary directories, since it can then
> be cleaned up when the task is gone, rather than left on the host or kept
> until reboot.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
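The guard described in this comment (use the sandbox only when the external shuffle service is disabled, so that shuffle files remain reachable after the executor exits) could be sketched as follows. This is an illustrative Python sketch, not Spark's actual Scala code; the function name and the assumption that the sandbox path is exposed via a `MESOS_SANDBOX` environment variable are mine, not the source's.

```python
def use_mesos_sandbox(conf: dict, env: dict) -> bool:
    """Return True only when it is safe to place local dirs in the Mesos
    sandbox: the external shuffle service must be disabled (otherwise the
    shuffle service needs shuffle files to outlive the executor), and the
    sandbox path must actually be available in the environment."""
    shuffle_service_enabled = (
        conf.get("spark.shuffle.service.enabled", "false") == "true"
    )
    return not shuffle_service_enabled and "MESOS_SANDBOX" in env
```

For example, with the shuffle service enabled the sandbox would be rejected even when `MESOS_SANDBOX` is set, which is exactly the negated flag behavior the comment proposes.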
[jira] [Commented] (SPARK-9708) Spark should create local temporary directories in Mesos sandbox when launched with Mesos
[ https://issues.apache.org/jira/browse/SPARK-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728864#comment-14728864 ]

Chris Bannister commented on SPARK-9708:
----------------------------------------

When the executor is killed by Mesos, it does not immediately clean up the sandbox dir; it waits for a GC time period based on total disk usage in the work_dir. I'm not entirely sure what will happen if the executor is stopped, i.e. whether the data is still readable by external applications.

Regarding spark.local.dir: as far as I understand it, when running in YARN this is overridden by the YARN config, and I intended to do something similar here. Would it be better to add a config option to explicitly enable this behaviour?
[jira] [Commented] (SPARK-9708) Spark should create local temporary directories in Mesos sandbox when launched with Mesos
[ https://issues.apache.org/jira/browse/SPARK-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725657#comment-14725657 ]

Iulian Dragos commented on SPARK-9708:
--------------------------------------

I'm not sure this is the entire story. Remember that shuffle files need to survive the executor when dynamic allocation is enabled. So, with this proposed change, if the scheduler decides to kill an executor, its shuffle files will be gone and the external shuffle server won't be able to find them anymore. At the very least, shuffle files need to go in another directory, not under the sandbox.

Also, Spark allows one to configure `spark.local.dir`, and that should take precedence. In the Hadoop world, this can be used to specify several directories on different physical disks (to allow fast parallel writes).
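The precedence this comment argues for (an explicit `spark.local.dir`, possibly listing several disks, always wins; the sandbox is only a fallback) could be sketched like this. Again a Python sketch rather than Spark's actual `Utils.getConfiguredLocalDirs`; the function name and the `MESOS_SANDBOX` environment variable are illustrative assumptions.

```python
import tempfile


def resolve_local_dirs(conf: dict, env: dict) -> list:
    """Illustrative precedence for choosing local scratch directories:
    1. an explicit spark.local.dir (comma-separated, so several physical
       disks can be used for parallel writes) takes precedence;
    2. otherwise fall back to the Mesos sandbox, if available;
    3. otherwise use the host's temporary directory."""
    explicit = conf.get("spark.local.dir")
    if explicit:
        return [d.strip() for d in explicit.split(",") if d.strip()]
    if "MESOS_SANDBOX" in env:
        return [env["MESOS_SANDBOX"]]
    return [tempfile.gettempdir()]
```

Under this ordering, a user who has tuned `spark.local.dir` for multiple disks is unaffected by the sandbox change, which addresses the precedence concern raised above.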
[jira] [Commented] (SPARK-9708) Spark should create local temporary directories in Mesos sandbox when launched with Mesos
[ https://issues.apache.org/jira/browse/SPARK-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706799#comment-14706799 ]

Apache Spark commented on SPARK-9708:
-------------------------------------

User 'Zariel' has created a pull request for this issue:
https://github.com/apache/spark/pull/8358