[ https://issues.apache.org/jira/browse/SPARK-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296303#comment-14296303 ]
Dana Klassen commented on SPARK-5162:
-------------------------------------

Yes, of course. I tried these combinations:

```
HADOOP_CONF_DIR=conf/conf.cloudera.yarn ./bin/spark-submit --master yarn-cluster --num-executors 2 --executor-cores 1 /Users/klassen/Desktop/test.py

HADOOP_CONF_DIR=conf/conf.cloudera.yarn ./bin/spark-submit --master yarn-cluster --py-files '/path/to/package1.egg' --num-executors 2 --executor-cores 1 /Users/klassen/Desktop/test.py

HADOOP_CONF_DIR=conf/conf.cloudera.yarn ./bin/spark-submit --master yarn-cluster --py-files '/path/to/package1.egg,/path/to/package2.egg' --num-executors 2 --executor-cores 1 /Users/klassen/Desktop/test.py
```

(The test script in this case makes no use of the resources in the eggs.)

I forgot to include enough of the logs to show that packages are uploaded to the HDFS sparkStaging directory, as follows:

```
15/01/28 21:38:07 INFO Client: Source and destination file systems are the same. Not copying hdfs://nn01.chi.shopify.com:8020/user/sparkles/spark-assembly-python-submit.jar
15/01/28 21:38:07 INFO Client: Uploading resource file:/Users/klassen/Desktop/test.py -> hdfs://nn01.chi.shopify.com:8020/user/klassen/.sparkStaging/application_1422398120127_3034/test.py
```

The same is seen for the packages as well. Before these packages are downloaded to the container and set up, they are cleared from sparkStaging (seen at the end of the previous logs).
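For context on what shipping eggs via `--py-files` buys you: Spark adds each shipped `.egg`/`.zip` to `sys.path` on the driver and executors, and Python's zip importer makes packages inside them importable. The sketch below simulates that mechanism with a throwaway zip built in a temp directory; `package1` and its `VALUE` attribute are made up for illustration and are not from this thread.

```python
# Sketch: how a file shipped with --py-files becomes importable.
# We build a tiny egg-style zip on the fly and put it on sys.path,
# which is essentially what SparkContext does for each --py-files entry.
import os
import sys
import tempfile
import zipfile

tmp = tempfile.mkdtemp()
egg = os.path.join(tmp, "package1.egg")
with zipfile.ZipFile(egg, "w") as z:
    # A minimal package inside the archive (hypothetical contents).
    z.writestr("package1/__init__.py", "VALUE = 42\n")

sys.path.insert(0, egg)  # analogous to Spark distributing the egg
import package1          # resolved via Python's zip importer

print(package1.VALUE)
```

Note that the first command above shipped no eggs at all, so any failure there cannot be about egg importability; it points at the staging/localization machinery itself.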
> Python yarn-cluster mode
> ------------------------
>
>                 Key: SPARK-5162
>                 URL: https://issues.apache.org/jira/browse/SPARK-5162
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark, YARN
>            Reporter: Dana Klassen
>              Labels: cluster, python, yarn
>
> Running pyspark in YARN is currently limited to 'yarn-client' mode. It would
> be great to be able to submit python applications to the cluster and (just
> like java classes) have the resource manager set up an AM on any node in the
> cluster. Does anyone know the issues blocking this feature? I was snooping
> around with enabling python apps:
>
> Removing the logic stopping python and yarn-cluster from SparkSubmit.scala:
>
> ...
> // The following modes are not supported or applicable
> (clusterManager, deployMode) match {
>   ...
>   case (_, CLUSTER) if args.isPython =>
>     printErrorAndExit("Cluster deploy mode is currently not supported for python applications.")
>   ...
> }
> ...
>
> and submitting the application via:
>
> HADOOP_CONF_DIR={{insert conf dir}} ./bin/spark-submit --master yarn-cluster --num-executors 2 --py-files {{insert location of egg here}} --executor-cores 1 ../tools/canary.py
>
> Everything looks to run all right: PythonRunner is picked up as the main class, resources get set up, and the YARN client gets launched, but it falls flat on its face:
>
> 2015-01-08 18:48:03,444 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: DEBUG: FAILED { {{redacted}}/.sparkStaging/application_1420594669313_4687/canary.py, 1420742868009, FILE, null }, Resource {{redacted}}/.sparkStaging/application_1420594669313_4687/canary.py changed on src filesystem (expected 1420742868009, was 1420742869284)
>
> and
>
> 2015-01-08 18:48:03,446 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource {{redacted}}/.sparkStaging/application_1420594669313_4687/canary.py(->/data/4/yarn/nm/usercache/klassen/filecache/11/canary.py) transitioned from DOWNLOADING to FAILED
>
> Tracked this down to the Apache Hadoop code (FSDownload.java, line 249) related to container localization of files upon downloading. At this point I thought it would be best to raise the issue here and get input.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
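The "changed on src filesystem" failure quoted above comes down to a simple equality check during container localization: YARN records each resource's modification time when the application is submitted, and the NodeManager refuses to localize the file if its current timestamp on the source filesystem no longer matches. The sketch below is a plain-Python illustration of that check (not the actual Hadoop source); the function name and the reuse of the timestamps from the log are for illustration only.

```python
# Sketch of the timestamp verification YARN's FSDownload performs when
# localizing a resource into a container. If the file was re-uploaded or
# touched after the LocalResource was registered, the mtimes differ and
# localization fails with the error seen in the logs above.

def verify_resource_timestamp(expected_mtime_ms: int,
                              actual_mtime_ms: int,
                              path: str) -> None:
    """Raise if the staged file's timestamp no longer matches the recorded one."""
    if actual_mtime_ms != expected_mtime_ms:
        raise IOError(
            f"Resource {path} changed on src filesystem "
            f"(expected {expected_mtime_ms}, was {actual_mtime_ms})"
        )

# Timestamps from the quoted failure: the staged canary.py was roughly
# 1.3 seconds newer than the timestamp recorded at submission time.
try:
    verify_resource_timestamp(
        1420742868009, 1420742869284,
        ".sparkStaging/application_1420594669313_4687/canary.py")
except IOError as err:
    print(err)
```

This suggests the staged file was written (or rewritten) after its metadata was captured for the ContainerLaunchContext, which fits the reporter's observation that the bug lives in the submission/localization path rather than in the Python application itself.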