Hi Elkhan,

There are a couple of ways to do this.

1) Spark-jobserver is a popular REST job server for submitting and managing Spark 
jobs (a minimal client sketch follows the link below).

https://github.com/spark-jobserver/spark-jobserver 
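
For reference, here is a minimal, untested client sketch. It assumes a jobserver 
listening on localhost:8090 and uses the /jars and /jobs REST endpoints described 
in the project's README; the app name "myapp", the jar path, and the job class are 
placeholders (and, per the project docs, the job class has to implement the 
jobserver's job API rather than being an arbitrary main class).

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class JobServerClient {

  // POSTs a body to the given URL and returns the HTTP status code.
  static int post(String url, byte[] body) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body);
    }
    return conn.getResponseCode();
  }

  public static void main(String[] args) throws IOException {
    // 1) Upload the application jar under an app name ("myapp" is a placeholder).
    byte[] jar = Files.readAllBytes(Paths.get("/my/app.jar"));
    System.out.println(post("http://localhost:8090/jars/myapp", jar));

    // 2) Ask the jobserver to run a job class from that jar (placeholder class name).
    System.out.println(post(
        "http://localhost:8090/jobs?appName=myapp&classPath=my.spark.app.MyJob",
        new byte[0]));
  }
}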

2) The spark-submit script sets up the classpath and configuration for the job. 
Bypassing spark-submit means you have to manage some of that work in your 
program itself (a sketch of what that involves follows the link below).

Here is a link with some discussions around how to handle this scenario. 

http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/What-dependencies-to-submit-Spark-jobs-programmatically-not-via/td-p/24721
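
To make point 2 concrete, here is a rough, untested sketch of the kind of setup 
the program has to own when spark-submit is bypassed. The config key 
spark.yarn.jar, the HDFS and local paths, and the app name are illustrative, and 
the details vary with the Spark and Hadoop versions in use:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class NoSparkSubmit {
  public static void main(String[] args) {
    // Configuration that spark-submit would normally assemble for you.
    SparkConf conf = new SparkConf()
        .setAppName("no-spark-submit-demo")
        .setMaster("yarn-client")
        // Point executors at the Spark assembly instead of relying on spark-submit's classpath.
        .set("spark.yarn.jar", "hdfs:///libs/spark-assembly-1.3.1-hadoop2.3.0.jar")
        // Ship the application's own classes to the executors.
        .setJars(new String[] {"/local/path/my-app.jar"});

    // The Hadoop/Yarn client configs (HADOOP_CONF_DIR) must already be on the driver classpath.
    JavaSparkContext sc = new JavaSparkContext(conf);
    try {
      System.out.println(sc.parallelize(Arrays.asList(1, 2, 3)).count());
    } finally {
      sc.stop();
    }
  }
}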


Guru Medasani
gdm...@gmail.com



> On Jun 17, 2015, at 6:01 PM, Elkhan Dadashov <elkhan8...@gmail.com> wrote:
> 
> This is not an independent, programmatic way of running a Spark job on a Yarn 
> cluster.
> 
> That example demonstrates running in yarn-client mode, and it also depends on 
> Jetty. Users writing Spark programs do not want to depend on that.
> 
> I found the SparkLauncher class introduced in Spark 1.4 
> (https://github.com/apache/spark/tree/master/launcher), which allows launching 
> Spark jobs programmatically.
> 
> SparkLauncher exists in the Java and Scala APIs, but I could not find it in 
> the Python API.
> 
> I have not tried it yet, but it seems promising.
> 
> Example:
> 
> import org.apache.spark.launcher.SparkLauncher;
> 
> public class MyLauncher {
>   public static void main(String[] args) throws Exception {
>     // Launches a spark-submit child process for the given application jar.
>     Process spark = new SparkLauncher()
>         .setAppResource("/my/app.jar")
>         .setMainClass("my.spark.app.Main")
>         .setMaster("local")
>         .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
>         .launch();
>     // Block until the launched job finishes.
>     spark.waitFor();
>   }
> }
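> 
> For the Yarn use case in this thread, a yarn-cluster variant of the same 
> example might look like the following untested sketch; the Spark home, the 
> HADOOP_CONF_DIR path, the jar path, and the main class are placeholders:
> 
> import java.util.HashMap;
> import java.util.Map;
> 
> import org.apache.spark.launcher.SparkLauncher;
> 
> public class MyYarnLauncher {
>   public static void main(String[] args) throws Exception {
>     // Environment for the child process; spark-submit reads the Yarn configs from here.
>     Map<String, String> env = new HashMap<String, String>();
>     env.put("HADOOP_CONF_DIR", "/etc/hadoop/conf"); // placeholder path
> 
>     Process spark = new SparkLauncher(env)
>         .setSparkHome("/opt/spark")                 // placeholder: an unpacked Spark distribution
>         .setAppResource("/my/app.jar")
>         .setMainClass("my.spark.app.Main")
>         .setMaster("yarn-cluster")
>         .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
>         .launch();
> 
>     int exitCode = spark.waitFor();
>     System.out.println("spark-submit exited with code " + exitCode);
>   }
> }
> 
> Note that SparkLauncher still forks the spark-submit script from the 
> configured Spark home under the hood; it only wraps it in a Java API, so a 
> Spark distribution has to be available on the machine doing the launching.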
> 
> 
> 
> On Wed, Jun 17, 2015 at 5:51 PM, Corey Nolet <cjno...@gmail.com> wrote:
> An example of how to do this is provided in the Spark Jetty Server 
> project [1].
> 
> [1] https://github.com/calrissian/spark-jetty-server 
> 
> On Wed, Jun 17, 2015 at 8:29 PM, Elkhan Dadashov <elkhan8...@gmail.com> wrote:
> Hi all,
> 
> Is there any way to run a Spark job programmatically on a Yarn cluster 
> without using the spark-submit script?
> 
> I cannot include the Spark jars in my Java application (due to dependency 
> conflicts and other reasons), so I will ship the Spark assembly uber jar 
> (spark-assembly-1.3.1-hadoop2.3.0.jar) to the Yarn cluster and then execute 
> the job (Python or Java) in yarn-cluster mode.
> 
> So is there any way to run a Spark job implemented in a Python file or Java 
> class without calling it through the spark-submit script?
> 
> Thanks.
> 
> -- 
> 
> Best regards,
> Elkhan Dadashov
