Hi all,

I am working in Databricks. When I submit a Spark job with the --py-files argument, it seems the first two files are read in but the third is ignored.
"--py-files", "s3://some_path/appl_src.py", "s3://some_path/main.py", "s3://a_different_path/common.py", I can see the first two acknowledged in the Log4j but not the third. 24/02/28 21:41:00 INFO Utils: Fetching s3://some_path/appl_src.py to ... 24/02/28 21:41:00 INFO Utils: Fetching s3://some_path/main.py to ... As a result, the job fails because appl_src.py is importing from common.py but can't find it. I posted to both Databricks community here<https://community.databricks.com/t5/data-engineering/spark-submit-not-reading-one-of-my-py-files-arguments/m-p/62361#M31953> and Stack Overflow here<https://stackoverflow.com/questions/78077822/databricks-spark-submit-getting-error-with-py-files> but did not get a response. I'm aware that we could use a .zip file, so I tried zipping the first two arguments but then got a totally different error: "Exception in thread "main" org.apache.spark.SparkException: Failed to get main class in JAR with error 'null'. Please specify one with --class." Basically I just want the application code in one s3 path and a "common" utilities package in another path. Thanks for your help. Kind regards, Chuck Pedro ________________________________ This message (including any attachments) may contain confidential, proprietary, privileged and/or private information. The information is intended to be for the use of the individual or entity designated above. If you are not the intended recipient of this message, please notify the sender immediately, and delete the message and any attachments. Any disclosure, reproduction, distribution or other use of this message or any attachments by an individual or entity other than the intended recipient is prohibited. TRVDiscDefault::1201