[ https://issues.apache.org/jira/browse/SPARK-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng updated SPARK-1900:
---------------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: SPARK-1652

> Fix running PySpark files on YARN
> ----------------------------------
>
>                 Key: SPARK-1900
>                 URL: https://issues.apache.org/jira/browse/SPARK-1900
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 1.0.0
>            Reporter: Andrew Or
>            Assignee: Andrew Or
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> If I run the following on a YARN cluster:
> {code}
> bin/spark-submit sheep.py --master yarn-client
> {code}
> it fails because of a mismatch in paths: `spark-submit` thinks that
> `sheep.py` resides on HDFS, and balks when it can't find the file there. A
> natural workaround is to add the `file:` prefix to the path:
> {code}
> bin/spark-submit file:/path/to/sheep.py --master yarn-client
> {code}
> However, this also fails, this time because Python does not understand
> URI schemes.
> This PR fixes the problem by automatically resolving all paths passed as
> command-line arguments to `spark-submit`. This has the added benefit of
> keeping file and jar paths consistent across different cluster modes.
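The two-sided fix described in the issue can be illustrated with a minimal Python sketch. It is not Spark's implementation: `resolve_uri`, `resolve_uri_list` and `to_local_path` are hypothetical helpers, and POSIX-style paths are assumed. A bare path is qualified as an absolute `file:` URI so it is not resolved against the cluster's default filesystem (HDFS here), and the `file:` scheme is stripped again before the script is handed to the Python interpreter, which expects a plain filesystem path.

```python
import os
from pathlib import Path
from urllib.parse import unquote, urlparse


def resolve_uri(path: str) -> str:
    """Hypothetical helper: qualify one command-line path as a URI.

    Paths that already carry a scheme (hdfs:, file:, ...) are returned
    unchanged; bare paths are treated as local files and turned into
    absolute file: URIs (assumes POSIX paths).
    """
    if urlparse(path).scheme:
        return path
    # as_uri() requires an absolute path and percent-encodes special characters.
    return Path(os.path.abspath(path)).as_uri()


def resolve_uri_list(paths: str) -> str:
    """Hypothetical helper: resolve a comma-separated list such as a --py-files value."""
    return ",".join(resolve_uri(p) for p in paths.split(",") if p)


def to_local_path(uri: str) -> str:
    """Hypothetical helper: turn a local URI back into a path Python can open."""
    parsed = urlparse(uri)
    if parsed.scheme == "":
        return uri  # already a plain path
    if parsed.scheme == "file":
        # Undo the percent-encoding added by resolve_uri, e.g. %20 -> space.
        return unquote(parsed.path)
    raise ValueError(f"not a local file, cannot run with Python directly: {uri}")


if __name__ == "__main__":
    print(resolve_uri("sheep.py"))                         # file:///<cwd>/sheep.py
    print(to_local_path("file:/path/to/sheep.py"))         # /path/to/sheep.py
    print(resolve_uri("hdfs://namenode:8020/apps/x.py"))   # unchanged
```

Resolving every path once, up front, means the rest of the submission logic only ever sees fully qualified URIs, so the same argument behaves identically whether the application runs locally or on YARN.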