[ https://issues.apache.org/jira/browse/SPARK-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201614#comment-14201614 ]
Reynold Xin commented on SPARK-4290:
------------------------------------

We might be able to provide an alternative broadcast implementation in the future, but for now I think you can implement this quickly and entirely in application code. On the executor side, you can have a static method that checks whether the file is already local and, if not, copies it from HDFS to a local tmp folder. Make sure that method is synchronized.

> Provide an equivalent functionality of distributed cache as MR does
> -------------------------------------------------------------------
>
>                 Key: SPARK-4290
>                 URL: https://issues.apache.org/jira/browse/SPARK-4290
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Xuefu Zhang
>
> MapReduce allows a client to specify files to be put in the distributed cache for a
> job, and the framework guarantees that each file will be available in the local
> file system of a node where a task of the job runs, before the task
> actually starts. While this might be achieved with Yarn via hacks, it's not
> available in other clusters. It would be nice to have equivalent
> functionality in Spark.
> It would also complement Spark's broadcast variable, which may not be
> suitable in certain scenarios.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
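
The executor-side workaround suggested in the comment above (a synchronized static method that checks for a local copy and fetches the file only when it is missing) could be sketched roughly as follows. This is only an illustration, not code from Spark or from the issue: the class name `LocalFileCache`, the `Fetcher` interface, and the cache directory name are all hypothetical, and the remote copy step is left as a pluggable callback so the sketch runs without a cluster. In a real job that callback would presumably wrap a Hadoop copy such as `FileSystem.copyToLocalFile`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Spark: keeps one local copy of a remote file per node.
final class LocalFileCache {

    /** Hypothetical hook for the remote copy step (e.g. an HDFS-to-local copy). */
    public interface Fetcher {
        // Must write the remote content to localTarget, overwriting the empty placeholder.
        void fetch(String remoteUri, Path localTarget) throws IOException;
    }

    // Assumed location: a subfolder of the JVM's temp directory.
    private static final Path CACHE_DIR =
            Paths.get(System.getProperty("java.io.tmpdir"), "local-file-cache");

    private LocalFileCache() {}

    /**
     * Returns a local path for remoteUri, fetching the file only if no local copy exists.
     * Declared static synchronized so that concurrent tasks in the same executor JVM
     * cannot copy the same file at the same time.
     */
    public static synchronized Path ensureLocal(String remoteUri, Fetcher fetcher)
            throws IOException {
        Files.createDirectories(CACHE_DIR);

        // Prefix the file name with a hash of the full URI so that files with the
        // same name in different remote directories do not collide locally.
        String name = remoteUri.substring(remoteUri.lastIndexOf('/') + 1);
        String key = Integer.toHexString(remoteUri.hashCode()) + "-" + name;
        Path target = CACHE_DIR.resolve(key);

        if (Files.exists(target)) {
            return target; // already local, nothing to copy
        }

        // Fetch into a temporary file first, then rename, so that a half-written
        // file is never visible under the final name.
        Path partial = Files.createTempFile(CACHE_DIR, key, ".part");
        try {
            fetcher.fetch(remoteUri, partial);
            Files.move(partial, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(partial); // no-op after a successful move
        }
        return target;
    }
}
```

A task would call `LocalFileCache.ensureLocal(uri, fetcher)` before reading the file, for example at the start of a per-partition function. Note that `synchronized` only coordinates threads inside one executor JVM; the rename from a temporary file is there to limit the damage if several executors on the same node share the tmp folder.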