[ 
https://issues.apache.org/jira/browse/SPARK-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205346#comment-14205346
 ] 

Sandy Ryza commented on SPARK-4290:
-----------------------------------

SparkFiles.get needs to be called, but the files will only be downloaded once 
per executor.  The distributed cache is lazy in this way too.
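
To make the "lazy, but only once per executor" behaviour concrete, here is a 
small stand-alone sketch. It is a toy model, not Spark's implementation: the 
LazyFileCache class, its fields, and the lookup.txt file are hypothetical names 
made up for illustration. In a real job the file would be registered with 
SparkContext.addFile and resolved inside a task with SparkFiles.get.

```python
import os
import shutil
import tempfile


class LazyFileCache:
    """Toy model of one executor's local file cache (not Spark's actual code)."""

    def __init__(self, registered, local_dir):
        self.registered = registered  # file name -> source path, as registered by the driver
        self.local_dir = local_dir    # this executor's private working directory
        self.downloads = 0            # number of real copies this executor performed

    def get(self, name):
        """Return a local path for `name`, copying the file on first use only."""
        local_path = os.path.join(self.local_dir, name)
        if not os.path.exists(local_path):
            # Nothing is fetched until some task actually asks for the file.
            shutil.copy(self.registered[name], local_path)
            self.downloads += 1
        return local_path


def demo():
    """Simulate three tasks on the same executor asking for the same file."""
    with tempfile.TemporaryDirectory() as src_dir, \
            tempfile.TemporaryDirectory() as exec_dir:
        source = os.path.join(src_dir, "lookup.txt")
        with open(source, "w") as f:
            f.write("k1=v1\n")

        cache = LazyFileCache({"lookup.txt": source}, exec_dir)
        before = cache.downloads  # no task has asked yet, so nothing was copied
        paths = [cache.get("lookup.txt") for _ in range(3)]
        with open(paths[0]) as f:
            content = f.read()
        return before, cache.downloads, content
```

Calling demo() returns the download count before any lookup, the count after 
three lookups, and the file content, so the single copy can be checked directly.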

> Provide an equivalent functionality of distributed cache as MR does
> -------------------------------------------------------------------
>
>                 Key: SPARK-4290
>                 URL: https://issues.apache.org/jira/browse/SPARK-4290
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Xuefu Zhang
>
> MapReduce allows a client to specify files to be put in the distributed cache 
> for a job, and the framework guarantees that each file will be available in 
> the local file system of any node where a task of the job runs, before the 
> task actually starts. While this might be achieved with YARN via hacks, it's 
> not available in other clusters. It would be nice to have equivalent 
> functionality in Spark.
> It would also complement Spark's broadcast variable, which may not be 
> suitable in certain scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

