[ 
https://issues.apache.org/jira/browse/SPARK-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296652#comment-14296652
 ] 

Lianhui Wang edited comment on SPARK-5162 at 1/29/15 11:24 AM:
---------------------------------------------------------------

Vladimir Grigor, I have created JIRA SPARK-5479 to support non-local files.
In addition, I have updated the PR: https://github.com/apache/spark/pull/3976
I can run Python files stored on HDFS in yarn-cluster mode. Example:
spark-submit --master yarn-cluster --name python_test --num-executors 1
--driver-memory 1g --executor-memory 1g --py-files hdfs://xx/test.py
hdfs://xx/test2.py
However, I do not know whether this also works for S3 files, so please try it
again. If it works for your S3 files, I hope you can reply on my PR so that
this feature can be merged quickly.
If you have any problems, please tell me. Thanks.
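
As a rough illustration of the local versus non-local distinction discussed
above, the sketch below sorts the entries of a --py-files value by URI scheme.
It is not the code from the PR or from Spark. The set of schemes treated as
local and the helper names is_non_local and split_py_files are assumptions made
for this example only.

```python
from urllib.parse import urlparse

# Schemes treated as "local" in this sketch. This set is an assumption made
# for illustration; it is not taken from the Spark source or from the PR.
LOCAL_SCHEMES = {"", "file", "local"}


def is_non_local(path):
    """Return True if the URI scheme of `path` is not in LOCAL_SCHEMES."""
    return urlparse(path).scheme.lower() not in LOCAL_SCHEMES


def split_py_files(py_files):
    """Split a comma-separated --py-files value into (local, non_local) lists."""
    local, non_local = [], []
    for entry in py_files.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty entries such as a trailing comma
        (non_local if is_non_local(entry) else local).append(entry)
    return local, non_local
```

For example, split_py_files("hdfs://xx/test.py,deps.egg") places deps.egg (a
hypothetical local file) in the first list and hdfs://xx/test.py in the second.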


> Python yarn-cluster mode
> ------------------------
>
>                 Key: SPARK-5162
>                 URL: https://issues.apache.org/jira/browse/SPARK-5162
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark, YARN
>            Reporter: Dana Klassen
>              Labels: cluster, python, yarn
>
> Running PySpark on YARN is currently limited to ‘yarn-client’ mode. It would 
> be great to be able to submit Python applications to the cluster and (just 
> like Java classes) have the resource manager set up an AM on any node in the 
> cluster. Does anyone know the issues blocking this feature? I was snooping 
> around with enabling Python apps by
> removing the logic that blocks Python in yarn-cluster mode from SparkSubmit.scala:
> ...
>     // The following modes are not supported or applicable
>     (clusterManager, deployMode) match {
>       ...
>       case (_, CLUSTER) if args.isPython =>
>         printErrorAndExit("Cluster deploy mode is currently not supported for python applications.")
>       ...
>     }
> ...
> and submitting the application via:
> HADOOP_CONF_DIR={{insert conf dir}} ./bin/spark-submit --master yarn-cluster 
> --num-executors 2 --py-files {{insert location of egg here}} 
> --executor-cores 1 ../tools/canary.py
> Everything appears to run fine: PythonRunner is picked up as the main class, 
> resources get set up, and the YARN client gets launched, but then it falls 
> flat on its face:
> 2015-01-08 18:48:03,444 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { 
> {{redacted}}/.sparkStaging/application_1420594669313_4687/canary.py, 
> 1420742868009, FILE, null }, Resource 
> {{redacted}}/.sparkStaging/application_1420594669313_4687/canary.py changed 
> on src filesystem (expected 1420742868009, was 1420742869284
> and
> 2015-01-08 18:48:03,446 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> {{redacted}}/.sparkStaging/application_1420594669313_4687/canary.py(->/data/4/yarn/nm/usercache/klassen/filecache/11/canary.py)
>  transitioned from DOWNLOADING to FAILED
> I tracked this down to the Apache Hadoop code (FSDownload.java, line 249) 
> that handles container localization of files on download. At this point I 
> thought it would be best to raise the issue here and get input.



