Yes, you can launch PySpark scripts in yarn-cluster mode from Java code
without using the spark-submit script.

Check out the SparkLauncher code at
<https://github.com/apache/spark/tree/master/launcher/src/main/java/org/apache/spark/launcher>.
SparkLauncher does not depend on the Spark core jars, so it is very easy to
integrate into your project.
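
For example, you can pull it into your build as a standalone dependency. A
minimal Maven snippet (the Scala suffix and version here are illustrative;
match them to your Spark release):

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-launcher_2.10</artifactId>
      <version>1.4.0</version> <!-- illustrative; use your Spark version -->
    </dependency>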

Here is a code example for launching a Spark job without the spark-submit script:

    Process spark = new SparkLauncher()
        .setSparkHome("path_to_spark")
        .setAppName(pythonScriptName)
        .setMaster("yarn-cluster")
        .setAppResource(sparkScriptPath.toString())
        .addAppArgs(params)
        .addPyFile(otherPythonScriptPath.toString())
        .launch();
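
A rough sketch of driving the returned Process (not part of the snippet
above; assumes java.io.BufferedReader, InputStreamReader, and IOException
imports, and an enclosing method that declares throws IOException,
InterruptedException). Drain both output streams so the child cannot block
on a full pipe, then wait for the exit code:

    // Drain stderr on a separate thread to avoid pipe-buffer deadlock.
    new Thread(() -> {
        try (BufferedReader err = new BufferedReader(
                new InputStreamReader(spark.getErrorStream()))) {
            String line;
            while ((line = err.readLine()) != null) {
                System.err.println("[spark-err] " + line);
            }
        } catch (IOException ignored) {
        }
    }).start();

    // Read stdout on the current thread.
    try (BufferedReader out = new BufferedReader(
            new InputStreamReader(spark.getInputStream()))) {
        String line;
        while ((line = out.readLine()) != null) {
            System.out.println("[spark] " + line);
        }
    }

    // 0 means the spark-submit process itself exited cleanly.
    int exitCode = spark.waitFor();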

However, to correctly handle adding third-party packages to the Python
path, which Marcelo implemented in SPARK-5479
<https://issues.apache.org/jira/browse/SPARK-5479>, download the latest
Spark source code and build it yourself with Maven.

The pre-built Spark releases do not include that patch yet.
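
For reference, the Maven build from the Spark source tree looks something
like this (the profiles and Hadoop version are illustrative; pick the ones
that match your cluster):

    build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package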



On Fri, Jul 10, 2015 at 9:52 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

> To add to this, conceptually, it makes no sense to launch something in
> yarn-cluster mode by creating a SparkContext on the client - the whole
> point of yarn-cluster mode is that the SparkContext runs on the cluster,
> not on the client.
>
> On Thu, Jul 9, 2015 at 2:35 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
>
>> You cannot run Spark in cluster mode by instantiating a SparkContext like
>> that.
>>
>> You have to launch it with the "spark-submit" command line script.
>>
>> On Thu, Jul 9, 2015 at 2:23 PM, jegordon <jgordo...@gmail.com> wrote:
>>
>>> Hi to all,
>>>
>>> Is there any way to run PySpark scripts in yarn-cluster mode without
>>> using the spark-submit script? I need it this way because I will
>>> integrate this code into a Django web app.
>>>
>>> When I try to run any script in yarn-cluster mode, I get the following
>>> error:
>>>
>>> org.apache.spark.SparkException: Detected yarn-cluster mode, but isn't
>>> running on a cluster. Deployment to YARN is not supported directly by
>>> SparkContext. Please use spark-submit.
>>>
>>>
>>> I'm creating the SparkContext in the following way:
>>>
>>>         from pyspark import SparkConf, SparkContext
>>>
>>>         conf = (SparkConf()
>>>             .setMaster("yarn-cluster")
>>>             .setAppName("DataFrameTest"))
>>>
>>>         sc = SparkContext(conf=conf)
>>>
>>>         # DataFrame code ....
>>>
>>> Thanks
>>>
>>>
>>>
>>
>>
>> --
>> Marcelo
>>
>
>


-- 

Best regards,
Elkhan Dadashov
