Hyukjin Kwon created SPARK-24384:
------------------------------------

Summary: spark-submit --py-files with .py files doesn't work in client mode before context initialization
Key: SPARK-24384
URL: https://issues.apache.org/jira/browse/SPARK-24384
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.3.0, 2.4.0
Reporter: Hyukjin Kwon
When the file passed to --py-files is a .py file (a .zip file seems to work fine), the Python path appears to be updated only after the context has been initialized, so the module cannot be imported before that point.

Given this pyFile:

{code}
$ cat /home/spark/tmp.py
def testtest():
    return 1
{code}

This works:

{code}
$ cat app.py
import pyspark
pyspark.sql.SparkSession.builder.getOrCreate()
import tmp
print("************************%s" % tmp.testtest())

$ ./bin/spark-submit --master yarn --deploy-mode client --py-files /home/spark/tmp.py app.py
...
************************1
{code}

but this doesn't:

{code}
$ cat app.py
import pyspark
import tmp
pyspark.sql.SparkSession.builder.getOrCreate()
print("************************%s" % tmp.testtest())

$ ./bin/spark-submit --master yarn --deploy-mode client --py-files /home/spark/tmp.py app.py
Traceback (most recent call last):
  File "/home/spark/spark/app.py", line 2, in <module>
    import tmp
ImportError: No module named tmp
{code}

See https://issues.apache.org/jira/browse/SPARK-21945?focusedCommentId=16488486&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16488486
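To illustrate the suspected mechanism outside Spark, here is a minimal, Spark-free Python sketch. It assumes (not confirmed in this report) that context initialization is what makes a --py-files .py file importable, by putting the directory that holds it on sys.path. The module name tmp_spark24384 and the helpers write_module and try_import are hypothetical stand-ins. Before the directory is on sys.path the import fails, like line 2 of the failing app.py; after it is added, the import succeeds.

```python
import importlib
import os
import sys
import tempfile


def write_module(directory, name, source):
    """Write a small module file into `directory` (stands in for --py-files tmp.py)."""
    path = os.path.join(directory, name + ".py")
    with open(path, "w") as f:
        f.write(source)
    return path


def try_import(name):
    """Return the imported module, or None if it is not importable yet."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Hypothetical staging directory holding the shipped .py file.
staging_dir = tempfile.mkdtemp()
write_module(staging_dir, "tmp_spark24384", "def testtest():\n    return 1\n")

# Before simulated "context initialization": the directory is not on
# sys.path, so the import fails, as `import tmp` does in the failing app.py.
before = try_import("tmp_spark24384")

# Simulated context initialization (assumption): the directory holding the
# .py file is inserted into sys.path only at this point.
sys.path.insert(1, staging_dir)
importlib.invalidate_caches()

# Now the same import succeeds, as in the working app.py.
after = try_import("tmp_spark24384")
print(before is None, after.testtest())
```

Under that assumption, importing such modules only after SparkSession.builder.getOrCreate(), as the working app.py does, avoids the ImportError.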