Andrew Or created SPARK-1900:
--------------------------------

             Summary: Fix running PySpark files on YARN
                 Key: SPARK-1900
                 URL: https://issues.apache.org/jira/browse/SPARK-1900
             Project: Spark
          Issue Type: Bug
            Reporter: Andrew Or
            Priority: Blocker

Running PySpark files on YARN currently fails because of a path mismatch. On a YARN cluster, spark-submit automatically assumes the file is on HDFS, even if it is given as a relative path that refers to a local file. A natural workaround is to explicitly specify the "file:" prefix. However, the python executable does not understand this prefix and fails with the following:

{code}
python: can't open file 'file:path/to/my/file.py': [Errno 2] No such file or directory
{code}
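
To illustrate the mismatch, here is a minimal sketch of the kind of normalization that would let a "file:"-prefixed path reach python as a plain path. The helper name `to_local_path` is hypothetical and is not part of Spark; this only shows the idea under the assumption that the path is either scheme-less or uses the "file" scheme, and it is not the actual fix.

```python
from urllib.parse import unquote, urlparse


def to_local_path(path):
    """Return a plain filesystem path for a scheme-less or "file:" path.

    Hypothetical helper for illustration only; not a Spark API.
    """
    parsed = urlparse(path)
    if parsed.scheme == "":
        # No scheme at all: already a plain local path.
        return path
    if parsed.scheme == "file":
        # Drop the "file:" scheme (and any "//" authority marker),
        # keeping only the path component that python can open.
        return unquote(parsed.path)
    # Anything else (for example "hdfs:") is not a local file.
    raise ValueError("not a local path: %s" % path)


if __name__ == "__main__":
    print(to_local_path("file:path/to/my/file.py"))
```

With this in place, the path from the error message above would be handed to python as `path/to/my/file.py`, while a non-local scheme such as "hdfs:" is rejected rather than silently misread.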