[ 
https://issues.apache.org/jira/browse/SPARK-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-8716:
-----------------------------
    Description: 
More specifically, this is the feature that is currently flagged by 
`spark.files.useFetchCache`.

This is a complicated feature that has no tests. I cannot say with confidence 
that it actually works on all cluster managers. In particular, I believe it 
doesn't work on Mesos, because the code path that falls into this else case 
creates its own temp directory per executor: 
https://github.com/apache/spark/blob/881662e9c93893430756320f51cef0fc6643f681/core/src/main/scala/org/apache/spark/util/Utils.scala#L739.

It's also not immediately clear that it works in standalone mode, due to the 
lack of comments. It actually does work there, because the Worker happens to 
set a `SPARK_EXECUTOR_DIRS` variable, but that linkage should be documented 
more explicitly in the code.
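To illustrate why the feature is so sensitive to where local dirs come from, here is a simplified sketch (not the actual `Utils.getLocalDir` code; the env-var handling is reduced to the bare shape described above): executors whose local dirs come from a Worker-set `SPARK_EXECUTOR_DIRS` resolve to the same stable path and can share a fetch cache, while executors that fall back to a fresh per-executor temp directory cannot.

```scala
// Simplified sketch of the directory resolution that the fetch cache
// depends on. This is NOT the real Utils.scala logic, just its shape.
object FetchCacheSketch {
  // Prefer the Worker-provided SPARK_EXECUTOR_DIRS (stable across
  // executors on the same host); otherwise fall back to a fresh
  // per-executor temp dir, which defeats cross-executor sharing.
  def localDir(env: Map[String, String], mkTempDir: () => String): String =
    env.get("SPARK_EXECUTOR_DIRS") match {
      case Some(dirs) => dirs.split(java.io.File.pathSeparator).head
      case None       => mkTempDir() // unique per executor: no sharing
    }

  // Two executors can share the fetch cache only if they resolve to
  // the same local directory.
  def canShareCache(dir1: String, dir2: String): Boolean = dir1 == dir2
}
```

With `SPARK_EXECUTOR_DIRS` set (standalone Worker), two executors resolve the same dir and sharing works; without it (the Mesos-style fallback above), each call to `mkTempDir` yields a distinct dir and the cache is never shared.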

This is difficult to write tests for, but it's still important to do so. 
Otherwise, semi-related changes in the future may easily break it without 
anyone noticing.
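A test would not need a real cluster: it could simulate two executors fetching the same file through a shared cache directory and assert that the remote download happens only once. A minimal sketch of that idea (the `CachedFetcher` class and its names are hypothetical, not the real `Utils.fetchFile` API):

```scala
import java.nio.file.{Files, Path}

// Hypothetical stand-in for the fetch-cache behavior under test:
// a file is downloaded at most once into a shared cache dir, and
// subsequent "executors" copy it from the cache.
class CachedFetcher(cacheDir: Path) {
  var downloads = 0 // counts actual "remote" fetches

  def fetch(url: String, dest: Path, download: String => Array[Byte]): Unit = {
    val cached = cacheDir.resolve(url.hashCode.toHexString)
    if (!Files.exists(cached)) {
      downloads += 1                       // cache miss: hit the "network"
      Files.write(cached, download(url))
    }
    Files.copy(cached, dest)               // cache hit path for everyone else
  }
}
```

A regression test along these lines would catch a semi-related change that silently breaks the sharing, because `downloads` would jump from 1 to 2.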

Related issues: SPARK-8130, SPARK-6313, SPARK-2713

  was:
More specifically, this is the feature that is currently flagged by 
`spark.files.useFetchCache`. There are several reasons why we should remove it.

(1) It doesn't even work. As of a recent change, each executor gets its own 
unique temp directory for security reasons.

(2) There is no way to fix it. The constraints in (1) are fundamentally opposed 
to sharing resources across executors.

(3) It is very complex. The method Utils.fetchFile would be greatly simplified 
without this feature that is not even used.

(4) There are no tests for it and it is difficult to test.

Note that we can't just revert the respective patches because they were merged 
a long time ago.

Related issues: SPARK-8130, SPARK-6313, SPARK-2713


> Write tests for executor shared cache feature
> ---------------------------------------------
>
>                 Key: SPARK-8716
>                 URL: https://issues.apache.org/jira/browse/SPARK-8716
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Andrew Or
>
> More specifically, this is the feature that is currently flagged by 
> `spark.files.useFetchCache`.
> This is a complicated feature that has no tests. I cannot say with confidence 
> that it actually works on all cluster managers. In particular, I believe it 
> doesn't work on Mesos because whatever goes into this else case creates its 
> own temp directory per executor: 
> https://github.com/apache/spark/blob/881662e9c93893430756320f51cef0fc6643f681/core/src/main/scala/org/apache/spark/util/Utils.scala#L739.
> It's also not immediately clear that it works on standalone mode due to the 
> lack of comments. It actually does work there because the Worker happens to 
> set a `SPARK_EXECUTOR_DIRS` variable. The linkage could be more explicitly 
> documented in the code.
> This is difficult to write tests for, but it's still important to do so. 
> Otherwise, semi-related changes in the future may easily break it without 
> anyone noticing.
> Related issues: SPARK-8130, SPARK-6313, SPARK-2713



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
