[ https://issues.apache.org/jira/browse/SPARK-21570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111701#comment-16111701 ]

Albert Chu commented on SPARK-21570:
------------------------------------

My setup is somewhat unusual.  The key part is that when configuring Hadoop, I 
configure it to use only a local file system, i.e. the "file:" URI.  I don't 
set up HDFS at all.  Hadoop's defaultFS is configured accordingly:

{noformat}
<property>
  <name>fs.defaultFS</name>
  <value>file:///</value>
</property>
{noformat}

Because of this, there are no HDFS configs.  The other important settings, such 
as temp dirs, are configured as follows, always pointing to a networked file 
system accessible from all nodes.  (Note the "node-0" text below; it is 
adjusted depending on which node you are on, i.e. "node-1", "node-2", etc. on 
other nodes.)
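To make the per-node substitution concrete, here is a minimal sketch of how such paths could be derived.  This is purely illustrative: the helper name and the assumption that "node-0", "node-1", etc. match each node's short hostname are mine, not from my actual setup, and the base path below drops the doubled slash from my real configs.

```shell
#!/bin/sh
# Hypothetical sketch of the per-node path substitution described above.
# Assumes node names like "node-0" correspond to each node's short hostname.
BASE=/p/lcratery/achu/testing/rawnetworkfs/test/1181121

per_node_tmpdir() {
  # $1 = short hostname of the node, e.g. "node-0"
  printf '%s/%s\n' "$BASE" "$1"
}

# On node-0 this yields the hadoop.tmp.dir value shown below:
per_node_tmpdir node-0
```

In practice something like `per_node_tmpdir "$(hostname -s)"` would run on each node when the configs are generated.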

in core-site.xml

{noformat}
<property>
  <name>hadoop.tmp.dir</name>
  <value>/p/lcratery/achu/testing/rawnetworkfs//test/1181121/node-0</value>
</property>
{noformat}

in mapred-site.xml (I'm excluding mapreduce.cluster.local.dir, 
mapreduce.jobtracker.system.dir, mapreduce.jobtracker.staging.root.dir, 
mapreduce.cluster.temp.dir, which have paths based on ${hadoop.tmp.dir} above)

{noformat}
<property>
  <name>yarn.app.mapreduce.am.staging-dir</name>
  <value>/p/lcratery/achu/testing/rawnetworkfs//test/1181121/node-0/yarn/</value>
</property>
{noformat}

In spark-defaults.conf

{noformat}
spark.local.dir          /p/lcratery/achu/testing/rawnetworkfs//test/1181121/node-0/spark/node-0
{noformat}

(I can post the full config files if you're interested, but I suspect that's 
excessive.)

The paths above happen to be on a Lustre file system, but the problem was also 
exhibited on NFS.

Re-running my tests today, things still work on Spark 2.1.1 but are broken on 
Spark 2.2.0.  I re-ran against Spark 1.6.0 too, and the test passed there.

The job run itself isn't particularly magical.  It's just a simple 
spark-submit call with the Spark wordcount example.  The Spark binaries are 
stored in NFS, where they are available on all nodes.  The data is also in 
NFS, readable by all nodes (it's a small 4-line file; this is just a sanity 
test).

{noformat}
spark-submit --class org.apache.spark.examples.JavaWordCount ${PATH_IN_NFS}/spark-examples_2.11-2.2.0.jar file://<A FILE IN NFS>
{noformat}
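Since the failure only occurs when launching via YARN, the effective invocation presumably includes --master yarn (either on the command line or via spark-defaults.conf).  A hypothetical reconstruction, with placeholder paths that are not my actual ones:

```shell
#!/bin/sh
# Hypothetical full command line; the JAR and input paths below are
# placeholders, and --master yarn is assumed from the report's context.
JAR="${PATH_IN_NFS:-/nfs/spark}/spark-examples_2.11-2.2.0.jar"
INPUT="file:///nfs/data/words.txt"
CMD="spark-submit --master yarn --class org.apache.spark.examples.JavaWordCount $JAR $INPUT"
echo "$CMD"
```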


> File __spark_libs__XXX.zip does not exist on networked file system w/ yarn
> --------------------------------------------------------------------------
>
>                 Key: SPARK-21570
>                 URL: https://issues.apache.org/jira/browse/SPARK-21570
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.2.0
>            Reporter: Albert Chu
>
> I have a set of scripts that run Spark with data in a networked file system.  
> One of my unit tests to make sure things don't break between Spark releases 
> is to simply run a word count (via org.apache.spark.examples.JavaWordCount) 
> on a file in the networked file system.  This test broke with Spark 2.2.0 
> when I use yarn to launch the job (using the spark standalone scheduler 
> things still work).  I'm currently using Hadoop 2.7.0.  I get the following 
> error:
> {noformat}
> Diagnostics: File 
> file:/p/lcratery/achu/testing/rawnetworkfs/test/1181015/node-0/spark/node-0/spark-292938be-7ae3-460f-aca7-294083ebb790/__spark_libs__695301535722158702.zip
>  does not exist
> java.io.FileNotFoundException: File 
> file:/p/lcratery/achu/testing/rawnetworkfs/test/1181015/node-0/spark/node-0/spark-292938be-7ae3-460f-aca7-294083ebb790/__spark_libs__695301535722158702.zip
>  does not exist
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606)
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:819)
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:596)
>       at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
>       at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
>       at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
>       at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
>       at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
>       at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:748)
> {noformat}
> While debugging, I sat and watched the directory and did see that 
> /p/lcratery/achu/testing/rawnetworkfs/test/1181015/node-0/spark/node-0/spark-292938be-7ae3-460f-aca7-294083ebb790/__spark_libs__695301535722158702.zip
>  does show up at some point.
> Wondering if it's possible something racy was introduced.  Nothing in the 
> Spark 2.2.0 release notes suggests any type of configuration change that 
> needs to be done.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
