Yes, you can launch PySpark scripts in yarn-cluster mode from Java code without using the spark-submit script.
Check the SparkLauncher code at
<https://github.com/apache/spark/tree/master/launcher/src/main/java/org/apache/spark/launcher>.
SparkLauncher does not depend on the Spark core jars, so it is very easy to
integrate into your project.

Code example for launching a Spark job without the spark-submit script:

    Process spark = new SparkLauncher()
        .setSparkHome("path_to_spark")
        .setAppName(pythonScriptName)
        .setMaster("yarn-cluster")
        .setAppResource(sparkScriptPath.toString())
        .addAppArgs(params)
        .addPyFile(otherPythonScriptPath.toString())
        .launch();

But in order to correctly handle adding 3rd-party packages to the Python
path, which Marcelo implemented in SPARK-5479
<https://issues.apache.org/jira/browse/SPARK-5479>, download the latest
Spark source code and build it yourself with Maven. The pre-built Spark
releases do not include that patch.

On Fri, Jul 10, 2015 at 9:52 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

> To add to this, conceptually, it makes no sense to launch something in
> yarn-cluster mode by creating a SparkContext on the client - the whole
> point of yarn-cluster mode is that the SparkContext runs on the cluster,
> not on the client.
>
> On Thu, Jul 9, 2015 at 2:35 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>
>> You cannot run Spark in cluster mode by instantiating a SparkContext
>> like that.
>>
>> You have to launch it with the "spark-submit" command line script.
>>
>> On Thu, Jul 9, 2015 at 2:23 PM, jegordon <jgordo...@gmail.com> wrote:
>>
>>> Hi to all,
>>>
>>> Is there any way to run pyspark scripts in yarn-cluster mode without
>>> using the spark-submit script? I need it this way because I will
>>> integrate this code into a Django web app.
>>>
>>> When I try to run any script in yarn-cluster mode I get the following
>>> error:
>>>
>>> org.apache.spark.SparkException: Detected yarn-cluster mode, but isn't
>>> running on a cluster. Deployment to YARN is not supported directly by
>>> SparkContext.
>>> Please use spark-submit.
>>>
>>> I'm creating the SparkContext in the following way:
>>>
>>> conf = (SparkConf()
>>>         .setMaster("yarn-cluster")
>>>         .setAppName("DataFrameTest"))
>>>
>>> sc = SparkContext(conf=conf)
>>>
>>> # DataFrame code ...
>>>
>>> Thanks
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Pyspark-not-working-on-yarn-cluster-mode-tp23755.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org

--
Best regards,
Elkhan Dadashov
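P.S. One detail worth adding to the SparkLauncher example above: launch()
hands back a plain java.lang.Process, so the caller must drain its output
(otherwise the child can block once the pipe buffer fills) and wait for the
exit code. A minimal sketch of that driving loop — the ProcessBuilder below
is a stand-in for SparkLauncher.launch() so this compiles and runs without
Spark on the classpath; in real code you would substitute the launcher call:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class DriveSparkProcess {

    // Drain the process's combined output, echoing each line,
    // then block until the process exits and return its exit code.
    static int drainAndWait(Process p) throws Exception {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println("[spark] " + line);
            }
        }
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for:
        //   Process spark = new SparkLauncher()...launch();
        // With launch(), drain stderr too (or a full pipe can stall the job);
        // here we merge the streams via redirectErrorStream for simplicity.
        Process p = new ProcessBuilder("echo", "job finished")
                .redirectErrorStream(true)
                .start();
        int exit = drainAndWait(p);
        System.out.println("exit=" + exit);
    }
}
```

Note the exit code only tells you whether the launcher process itself
succeeded; in yarn-cluster mode the final application status still has to be
checked on the YARN side.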