[
https://issues.apache.org/jira/browse/SPARK-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293271#comment-14293271
]
Lianhui Wang commented on SPARK-5162:
-------------------------------------
[~vgrigor] Thank you for the questions. What you said is right: this PR does
not currently support non-local files.
I agree that it should support non-local (Python) files, and I will take a
look at it.
But I have a question: can you run non-local (Python) files in yarn-client
mode? If not, we should consider yarn-cluster and yarn-client mode together.
[[email protected]] Did you use this PR? If you have any
questions, please tell me. Thanks.
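
To make the local versus non-local distinction above concrete, here is a minimal sketch, assuming that a path counts as local when it has no URI scheme or uses the "file" or "local" scheme. The helper names is_local_file and split_py_files are hypothetical and are not taken from the PR or from Spark's code.

```python
from urllib.parse import urlparse

# Assumption: these schemes mean "already on the local filesystem".
# Anything else (hdfs, s3, http, ...) is treated as non-local.
LOCAL_SCHEMES = {"", "file", "local"}


def is_local_file(uri: str) -> bool:
    """Return True when the URI has no scheme or a local one (hypothetical helper)."""
    return urlparse(uri).scheme in LOCAL_SCHEMES


def split_py_files(py_files: str):
    """Split a comma-separated --py-files value into (local, non_local) lists."""
    entries = [p.strip() for p in py_files.split(",") if p.strip()]
    local = [p for p in entries if is_local_file(p)]
    non_local = [p for p in entries if not is_local_file(p)]
    return local, non_local
```

A check like this would let both yarn-client and yarn-cluster mode decide in one place which files can be shipped directly and which would have to be fetched first.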
> Python yarn-cluster mode
> ------------------------
>
> Key: SPARK-5162
> URL: https://issues.apache.org/jira/browse/SPARK-5162
> Project: Spark
> Issue Type: New Feature
> Components: PySpark, YARN
> Reporter: Dana Klassen
> Labels: cluster, python, yarn
>
> Running PySpark on YARN is currently limited to ‘yarn-client’ mode. It would
> be great to be able to submit Python applications to the cluster and (just
> like Java classes) have the resource manager set up an AM on any node in the
> cluster. Does anyone know the issues blocking this feature? I was
> experimenting with enabling Python apps by
> removing the logic that blocks Python with yarn-cluster in SparkSubmit.scala:
> ...
> // The following modes are not supported or applicable
> (clusterManager, deployMode) match {
> ...
> case (_, CLUSTER) if args.isPython =>
> printErrorAndExit("Cluster deploy mode is currently not supported for
> python applications.")
> ...
> }
> ...
> and submitting application via:
> HADOOP_CONF_DIR={{insert conf dir}} ./bin/spark-submit --master yarn-cluster
> --num-executors 2 --py-files {{insert location of egg here}}
> --executor-cores 1 ../tools/canary.py
> Everything appears to run fine: PythonRunner is picked up as the main class,
> resources get set up, and the YARN client gets launched, but then it fails:
> 2015-01-08 18:48:03,444 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
> DEBUG: FAILED {
> {{redacted}}/.sparkStaging/application_1420594669313_4687/canary.py,
> 1420742868009, FILE, null }, Resource
> {{redacted}}/.sparkStaging/application_1420594669313_4687/canary.py changed
> on src filesystem (expected 1420742868009, was 1420742869284
> and
> 2015-01-08 18:48:03,446 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> {{redacted}}/.sparkStaging/application_1420594669313_4687/canary.py(->/data/4/yarn/nm/usercache/klassen/filecache/11/canary.py)
> transitioned from DOWNLOADING to FAILED
> I tracked this down to the Apache Hadoop code (FSDownload.java, line 249) that
> handles container localization of files on download. At this point I thought it
> would be best to raise the issue here and get input.
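
The "changed on src filesystem" message in the log above suggests that localization compares the modification time recorded for a resource at submission with the time found on the source filesystem when the file is downloaded. The following is a rough sketch of that kind of check, not the Hadoop implementation; the class and function names are hypothetical, and only the two timestamps are taken from the log.

```python
from dataclasses import dataclass


@dataclass
class LocalResource:
    """Hypothetical stand-in for a resource registered for localization."""
    path: str
    expected_timestamp: int  # modification time recorded at submission, in ms


class ResourceChangedError(IOError):
    """Raised when the source file's timestamp differs from the recorded one."""


def verify_unchanged(resource: LocalResource, actual_timestamp: int) -> None:
    # Any difference is treated as a change, even one of about a second,
    # as in the log (1420742868009 vs 1420742869284).
    if actual_timestamp != resource.expected_timestamp:
        raise ResourceChangedError(
            f"Resource {resource.path} changed on src filesystem "
            f"(expected {resource.expected_timestamp}, was {actual_timestamp}"
        )
```

Under this reading, the failure would occur if canary.py were written to the staging directory again after its timestamp had been recorded, which is one possibility worth checking.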