Hi all! I ran into this problem when trying to run a Python application on Amazon's EMR YARN cluster.
It is possible to run the bundled example applications on EMR, but I cannot figure out how to run a slightly more complex Python application that depends on some other Python scripts. I tried adding those files with '--py-files'; it works fine in local mode, but when run on EMR it fails with the following message:

Error: Only local python files are supported: s3://pathtomybucket/mylibrary.py

Simplest way to reproduce locally:

bin/spark-submit --py-files s3://whatever.path.com/library.py main.py

Actual commands to run it on EMR:

# launch cluster
aws emr create-cluster --name SparkCluster --ami-version 3.3.1 --instance-type m1.medium --instance-count 2 --ec2-attributes KeyName=key20141114 --log-uri s3://pathtomybucket/cluster_logs --enable-debugging --use-default-roles --bootstrap-action Name=Spark,Path=s3://pathtomybucket/bootstrap-actions/spark/install-spark,Args=["-s","http://pathtomybucket/bootstrap-actions/spark","-l","WARN","-v","1.2","-b","2014121700","-x"]
# {
#     "ClusterId": "j-2Y58DME79MPQJ"
# }

# run application
aws emr add-steps --cluster-id "j-2Y58DME79MPQJ" --steps ActionOnFailure=CONTINUE,Name=SparkPy,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn-cluster,--py-files,s3://pathtomybucket/tasks/demo/main.py,main.py]
# {
#     "StepIds": [
#         "s-2UP4PP75YX0KU"
#     ]
# }

And in the stderr of that step I get:

Error: Only local python files are supported: s3://pathtomybucket/tasks/demo/main.py

What is the workaround or the correct way to do this? Should I use Hadoop's distcp to copy the dependency files from S3 to the nodes as a separate pre-step?
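One workaround along those lines (a sketch only, not tested against this exact cluster; the bucket paths, script names, and /home/hadoop locations below are placeholders based on the paths in this post): instead of pointing script-runner at spark-submit directly, point it at a small wrapper script that first copies the application and its Python dependencies from S3 to the master node's local filesystem, and only then invokes spark-submit with local paths, which is all it accepts for --py-files.

#!/bin/bash
# run_spark_step.sh -- hypothetical wrapper; upload to S3 and run via script-runner.
set -e

# Copy the driver script and its dependency from S3 to the local
# filesystem, so spark-submit only ever sees local paths.
aws s3 cp s3://pathtomybucket/tasks/demo/main.py /home/hadoop/main.py
aws s3 cp s3://pathtomybucket/mylibrary.py /home/hadoop/mylibrary.py

# --py-files now references a local file, which spark-submit accepts.
/home/hadoop/spark/bin/spark-submit \
    --master yarn-cluster \
    --py-files /home/hadoop/mylibrary.py \
    /home/hadoop/main.py

The wrapper itself can then be launched with the same add-steps call as above, passing the wrapper's S3 location as the only argument to script-runner (which downloads the script to the master node and executes it there), e.g.:

aws emr add-steps --cluster-id "j-2Y58DME79MPQJ" --steps ActionOnFailure=CONTINUE,Name=SparkPy,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[s3://pathtomybucket/tasks/demo/run_spark_step.sh]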