[jira] [Commented] (SPARK-6313) Fetch File Lock file creation doesn't work when Spark working dir is on an NFS mount

2015-03-17 Thread Josh Rosen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365452#comment-14365452 ]

Josh Rosen commented on SPARK-6313:
---

I've merged Nathan's patch into 1.4.0, 1.3.1, and 1.2.2. After this patch, users can work around this bug by setting {{spark.files.useFetchCache=false}} in their SparkConf.
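
For reference, a minimal sketch of applying that workaround from application code (the app name and master URL below are purely illustrative):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Bypass the lock-file based fetch cache so each executor fetches files directly,
// avoiding the FileChannel.lock() call that fails on NFS-style mounts.
val conf = new SparkConf()
  .setAppName("nfs-workdir-app")        // illustrative app name
  .setMaster("spark://master:7077")     // illustrative master URL
  .set("spark.files.useFetchCache", "false")

val sc = new SparkContext(conf)
{code}

The same property can also be supplied at submit time, e.g. {{--conf spark.files.useFetchCache=false}} with spark-submit.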

> Fetch File Lock file creation doesn't work when Spark working dir is on an NFS mount
> --
>
> Key: SPARK-6313
> URL: https://issues.apache.org/jira/browse/SPARK-6313
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0, 1.3.0, 1.2.1
>Reporter: Nathan McCarthy
>Assignee: Nathan McCarthy
>Priority: Critical
> Fix For: 1.2.2, 1.4.0, 1.3.1
>
>
> When running in cluster mode with the Spark work dir mounted on an NFS volume (or any volume which doesn't support file locking), the fetchFile method in Spark's Utils class (used for downloading JARs etc. on the executors) will fail. This file locking was introduced as an improvement with SPARK-2713.
> See https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L415
> Introduced in 1.2 in commit:
> https://github.com/apache/spark/commit/7aacb7bfad4ec73fd8f18555c72ef696
> As this locking is an optimisation for fetching files, could we take a different approach here, e.g. creating a temp/advisory lock file?
> Typically you would just mount local disks (in, say, ext4 format) and provide these as a comma-separated list; however, we are trying to run Spark on MapR. With MapR we can do a loop-back mount to a volume on the local node and take advantage of MapR's disk pools. This also means we don't need specific mounts for Spark, which improves the generic nature of the cluster.






[jira] [Commented] (SPARK-6313) Fetch File Lock file creation doesn't work when Spark working dir is on an NFS mount

2015-03-15 Thread Nathan McCarthy (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362820#comment-14362820 ]

Nathan McCarthy commented on SPARK-6313:


Thanks for the feedback guys. The config option workaround seems like the path of least resistance for now, with more testing required for a different implementation. For us it would be great to get a fix ASAP. I've created PR 5036: https://github.com/apache/spark/pull/5036







[jira] [Commented] (SPARK-6313) Fetch File Lock file creation doesn't work when Spark working dir is on an NFS mount

2015-03-15 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362818#comment-14362818 ]

Apache Spark commented on SPARK-6313:
-

User 'nemccarthy' has created a pull request for this issue:
https://github.com/apache/spark/pull/5036







[jira] [Commented] (SPARK-6313) Fetch File Lock file creation doesn't work when Spark working dir is on an NFS mount

2015-03-15 Thread Patrick Wendell (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362687#comment-14362687 ]

Patrick Wendell commented on SPARK-6313:


[~joshrosen] changing the default caching behavior seems like it could silently regress performance for the vast majority of users who aren't on NFS. What about a hotfix for 1.3.1 that just exposes the config for NFS users (a very small population) but doesn't change the default? That may be sufficient in itself... or, if we want a real fix that makes it work out of the box on NFS, we can put it in 1.4.







[jira] [Commented] (SPARK-6313) Fetch File Lock file creation doesn't work when Spark working dir is on an NFS mount

2015-03-15 Thread Nathan McCarthy (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362605#comment-14362605 ]

Nathan McCarthy commented on SPARK-6313:


Stacktrace:

14/12/12 18:18:24 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 0.0 (TID 8, hadoop-016): java.io.IOException: Permission denied
    at sun.nio.ch.FileDispatcherImpl.lock0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.lock(FileDispatcherImpl.java:91)
    at sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:1022)
    at java.nio.channels.FileChannel.lock(FileChannel.java:1052)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:379)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:350)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:347)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:347)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)







[jira] [Commented] (SPARK-6313) Fetch File Lock file creation doesn't work when Spark working dir is on an NFS mount

2015-03-13 Thread Josh Rosen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360896#comment-14360896 ]

Josh Rosen commented on SPARK-6313:
---

Thanks for the pointer to the Lucene lock factory code.

It's fine for the locks to be advisory in the sense that things shouldn't break if multiple executors acquire the lock and try to download the same file, but there's potentially a problem if the lock isn't released after the JVM that acquired it exits abnormally, since this could cause other executors to block indefinitely while waiting for the original lock owner to download the file. One approach might be to write the PID of the original lock owner into the lock file, which would allow blocked executors to time out and re-attempt the lock acquisition if they detect that the original lock holder has died. This might face its own portability challenges, though, and seems complex.
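
A rough sketch of that idea (purely illustrative; this is not Spark code, and the helper name, timeout and PID handling are assumptions):

{code:scala}
import java.io.File
import java.lang.management.ManagementFactory
import java.nio.charset.StandardCharsets
import java.nio.file.{FileAlreadyExistsException, Files, StandardOpenOption}

// Create the lock file atomically, record this JVM's PID in it, and let waiters
// time out instead of blocking forever if the owner exited without cleaning up.
def tryAcquireAdvisoryLock(lockFile: File, timeoutMs: Long): Boolean = {
  // RuntimeMXBean names look like "pid@hostname" on common JVMs
  val pid = ManagementFactory.getRuntimeMXBean.getName.split("@")(0)
  val deadline = System.currentTimeMillis() + timeoutMs
  while (System.currentTimeMillis() < deadline) {
    try {
      // CREATE_NEW fails if the file already exists, giving an atomic create-and-claim
      Files.write(lockFile.toPath, pid.getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE_NEW)
      return true
    } catch {
      case _: FileAlreadyExistsException =>
        // Another executor holds the lock; a real implementation could read the PID
        // here and check whether that process is still alive before retrying.
        Thread.sleep(200)
    }
  }
  false // timed out; the caller could treat the lock as stale and re-fetch itself
}
{code}

Whether the create is actually atomic on a given shared filesystem is part of the portability concern above.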

A simple hotfix might be to add a SparkConf setting to force this caching to always be bypassed (this would be a two-line change to Executor.scala). This might lose the performance benefits of the caching, though.

If you're using NFS and the shared filesystem is mounted at the same path on all nodes, I think you should be able to use {{local://path/to/nfs/}} to specify the paths to your files / JARs, which will cause them to be read from the executor-local filesystem rather than fetched remotely. In this case that means they would be read from NFS, so for large files this technique may recover any performance benefits that would otherwise be lost by disabling the caching.
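
As a sketch of that suggestion (the mount point and JAR name are made up, and it assumes {{sc}} is an existing SparkContext and the share is mounted at the same path on the driver and every worker):

{code:scala}
// With a "local:" URI the JAR is expected to already exist on each executor's
// filesystem (here, the shared NFS mount), so nothing is fetched from the driver.
sc.addJar("local:/mnt/nfs/shared/libs/my-deps.jar")
// The same idea applies at submit time, e.g.:
//   spark-submit --jars local:/mnt/nfs/shared/libs/my-deps.jar ...
{code}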

I'd be happy to review patches for this issue.







[jira] [Commented] (SPARK-6313) Fetch File Lock file creation doesn't work when Spark working dir is on an NFS mount

2015-03-13 Thread Josh Rosen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360842#comment-14360842 ]

Josh Rosen commented on SPARK-6313:
---

Could you update this ticket with more details on the error message or symptom that you've observed (such as a stack trace)? This would help make the issue more searchable / discoverable.







[jira] [Commented] (SPARK-6313) Fetch File Lock file creation doesn't work when Spark working dir is on an NFS mount

2015-03-12 Thread Nathan McCarthy (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359972#comment-14359972 ]

Nathan McCarthy commented on SPARK-6313:


Since the lock file name (`val lockFileName = s"${url.hashCode}${timestamp}_lock"`) includes a timestamp, I can't see there being too many problems with hanging/left-over lock files.
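
For reference, a simplified sketch of the locking pattern in question (not the exact Spark code; the url, timestamp, and directory values are illustrative):

{code:scala}
import java.io.{File, RandomAccessFile}

// FileChannel.lock() is the call that throws "java.io.IOException: Permission denied"
// on mounts that don't support file locking (see the stacktrace above).
val url = "http://driver-host:33000/jars/app.jar"       // illustrative
val timestamp = System.currentTimeMillis()              // illustrative
val localDir = new File("/mnt/nfs/spark-work")          // illustrative NFS work dir

val lockFileName = s"${url.hashCode}${timestamp}_lock"  // the naming quoted above
val lockFile = new File(localDir, lockFileName)
val raf = new RandomAccessFile(lockFile, "rw")
val lock = raf.getChannel().lock()                      // fails on NFS / MapR loopback
try {
  // ...download the file into the cache while holding the lock...
} finally {
  lock.release()
  raf.close()
}
{code}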







[jira] [Commented] (SPARK-6313) Fetch File Lock file creation doesn't work when Spark working dir is on an NFS mount

2015-03-12 Thread Nathan McCarthy (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359964#comment-14359964 ]

Nathan McCarthy commented on SPARK-6313:


Suggestion along the lines of:
https://github.com/apache/lucene-solr/blob/5314a56924f46522993baf106e6deca0e48a967f/lucene/core/src/java/org/apache/lucene/store/SimpleFSLockFactory.java
or
https://github.com/graphhopper/graphhopper/blob/master/core/src/main/java/com/graphhopper/storage/SimpleFSLockFactory.java




