Hi all!

I ran into this problem when trying to run a Python application on Amazon's
EMR YARN cluster.

The bundled example applications run fine on EMR, but I cannot figure out
how to run a slightly more complex Python application that depends on other
Python scripts. I tried adding those files with '--py-files'; it works fine
in local mode, but on EMR it fails with the following message:
"Error: Only local python files are supported:
s3://pathtomybucket/mylibrary.py".

Simplest way to reproduce locally:
bin/spark-submit --py-files s3://whatever.path.com/library.py main.py

Actual commands to run it on EMR:
#launch cluster
aws emr create-cluster --name SparkCluster --ami-version 3.3.1
--instance-type m1.medium --instance-count 2  --ec2-attributes
KeyName=key20141114 --log-uri s3://pathtomybucket/cluster_logs
--enable-debugging --use-default-roles  --bootstrap-action
Name=Spark,Path=s3://pathtomybucket/bootstrap-actions/spark/install-spark,Args=["-s","http://pathtomybucket/bootstrap-actions/spark","-l","WARN","-v","1.2","-b","2014121700","-x"]
#{
#   "ClusterId": "j-2Y58DME79MPQJ"
#}

#run application
aws emr add-steps --cluster-id "j-2Y58DME79MPQJ" --steps
ActionOnFailure=CONTINUE,Name=SparkPy,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn-cluster,--py-files,s3://pathtomybucket/tasks/demo/main.py,main.py]
#{
#    "StepIds": [
#        "s-2UP4PP75YX0KU"
#    ]
#}
In the stderr of that step I get "Error: Only local python files are
supported: s3://pathtomybucket/tasks/demo/main.py".

What is the workaround, or the correct way to do this? Should I use Hadoop's
distcp to copy the dependency files from S3 to the nodes as another pre-step?
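The pre-step I have in mind would look roughly like the sketch below: copy
the dependencies from S3 to the local filesystem first, so spark-submit only
ever sees local paths. The bucket and file names are just placeholders from
my setup, not anything official:

```shell
#!/bin/bash
# Illustrative pre-step (paths are placeholders for my actual setup):
# copy the Python dependencies from S3 to local disk, then submit with
# local --py-files paths instead of s3:// URIs.
set -e

aws s3 cp s3://pathtomybucket/mylibrary.py /home/hadoop/mylibrary.py
aws s3 cp s3://pathtomybucket/tasks/demo/main.py /home/hadoop/main.py

/home/hadoop/spark/bin/spark-submit \
  --deploy-mode cluster \
  --master yarn-cluster \
  --py-files /home/hadoop/mylibrary.py \
  /home/hadoop/main.py
```

But that feels clumsy for something this common, so I suspect I'm missing a
supported way to do it.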

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-submit-py-files-remote-Only-local-additional-python-files-are-supported-tp21216.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

