[ https://issues.apache.org/jira/browse/SPARK-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360896#comment-14360896 ]
Josh Rosen commented on SPARK-6313:
-----------------------------------

Thanks for the pointer to the Lucene lock factory code. It's fine for the locks to be advisory, in the sense that nothing should break if multiple executors acquire the lock and try to download the same file. There is a potential problem, though, if the lock isn't released when the JVM that acquired it exits abnormally: other executors could block indefinitely while waiting for the original lock owner to finish downloading the file.

One approach might be to write the PID of the original lock owner into the lock file, which would allow blocked executors to time out and re-attempt the lock acquisition if they detect that the original lock holder has died. This might face its own portability challenges, though, and seems complex.

A simple hotfix might be to add a SparkConf setting that forces this caching to always be bypassed (this would be a two-line change to Executor.scala). This would lose the performance benefits of the caching, though.

If you're using NFS and the shared filesystem is mounted at the same path on all nodes, I think you should be able to use {{local://path/to/nfs/}} to specify the paths to your files / JARs, which will cause them to be read from the executor-local filesystem rather than fetched remotely. In this case they would be read from NFS, so you may be able to use this technique to recover, for large files, the performance benefits that would be lost by disabling the caching.

I'd be happy to review patches for this issue.

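To make the PID-in-lock-file idea above a little more concrete, here is a minimal sketch in Python. It is not Spark code and not a proposed patch: the function names, the timeout and the polling interval are all made up for illustration. It assumes a POSIX host (on Windows, `os.kill(pid, 0)` terminates the target instead of probing it), and it assumes every waiter runs on the same machine as the lock owner, since a PID written by another host is meaningless locally. Those two assumptions are a fair picture of the portability challenges mentioned above.

```python
import os
import time


def pid_alive(pid):
    """Return True if a process with this PID appears to exist on this host (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def try_acquire(lock_path):
    """Create the lock file exclusively and record our PID; return False if it exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True


def read_owner(lock_path):
    """Return the PID stored in the lock file, or None if it is missing or unreadable."""
    try:
        with open(lock_path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def acquire_with_stale_check(lock_path, timeout=30.0, poll=0.1):
    """Poll for the lock, removing it if the recorded owner is dead.

    Returns False when the timeout expires while a live owner still holds the lock.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if try_acquire(lock_path):
            return True
        owner = read_owner(lock_path)
        if owner is not None and not pid_alive(owner):
            # The owner died without releasing. Two waiters can race here and one
            # may delete a lock the other just took; that is tolerable only
            # because the lock is advisory.
            try:
                os.remove(lock_path)
            except FileNotFoundError:
                pass
            continue
        time.sleep(poll)
    return False


def release(lock_path):
    """Remove the lock file, but only if this process is the recorded owner."""
    if read_owner(lock_path) == os.getpid():
        os.remove(lock_path)
```

A caller that gets `False` back could simply go ahead and download the file itself, which matches the point above that nothing should break if more than one executor fetches the same file.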
> Fetch File Lock file creation doesnt work when Spark working dir is on a NFS mount
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-6313
>                 URL: https://issues.apache.org/jira/browse/SPARK-6313
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0, 1.3.0, 1.2.1
>            Reporter: Nathan McCarthy
>            Priority: Critical
>
> When running in cluster mode and mounting the Spark work dir on an NFS volume
> (or some other volume which doesn't support file locking), the fetchFile method
> in Spark's Utils class (used for downloading JARs etc. on the executors) will fail.
> This file locking was introduced as an improvement with SPARK-2713.
> See
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L415
>
> Introduced in 1.2 in commit:
> https://github.com/apache/spark/commit/7aacb7bfad4ec73fd8f18555c72ef696
>
> As this locking is an optimisation for fetching files, could we take a
> different approach here and create a temp/advisory lock file?
>
> Typically you would just mount local disks (in, say, ext4 format) and provide
> them as a comma-separated list; however, we are trying to run Spark on MapR.
> With MapR we can do a loopback mount to a volume on the local node and take
> advantage of MapR's disk pools. This also means we don't need specific mounts
> for Spark, and it improves the generic nature of the cluster.