Re: How do you run your spark app?
Hello Michael, I have a quick question for you. Can you clarify the statement "build fat JARs and build dist-style TAR.GZ packages with launch scripts, JARs and everything needed to run a Job"? Can you give an example?

I am also using sbt assembly to create a fat JAR, and I supply the Spark and Hadoop locations on the classpath. Inside the main() function where the Spark context is created, I use SparkContext.jarOfClass(this).toList to add the fat JAR to my Spark context. However, I seem to be running into issues with this approach. I was wondering if you had any input, Michael.

Thanks,
Shivani
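For context, the jarOfClass pattern described above generally looks something like the sketch below. Note that jarOfClass takes a Class, so it is usually called with getClass rather than the instance itself; the app name and master URL here are purely illustrative and not taken from the thread.

    import org.apache.spark.{SparkConf, SparkContext}

    object MyJob {
      def main(args: Array[String]): Unit = {
        // Locate the fat JAR that contains this class so executors can fetch it.
        // jarOfClass returns an Option[String]; toList turns it into a Seq for setJars.
        val jars = SparkContext.jarOfClass(this.getClass).toList

        val conf = new SparkConf()
          .setAppName("my-job")                 // illustrative
          .setMaster("spark://master:7077")     // illustrative standalone master URL
          .setJars(jars)                        // ship the fat JAR to the executors

        val sc = new SparkContext(conf)
        // ... job logic ...
        sc.stop()
      }
    }

One common pitfall with this approach is that jarOfClass only returns a path when the class was actually loaded from a JAR, so it typically yields nothing when the job is run from an IDE or from unpacked classes.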
How do you run your spark app?
I want to ask this, not because I can't read the endless documentation and several tutorials, but because there seem to be many ways of doing things and I keep running into issues. How do you run your Spark app?

I had it working when I was only using YARN + Hadoop 1 (Cloudera), then I had to get Spark and Shark working and ended up upgrading everything and dropping CDH support. Anyway, this is what I used, with master=yarn-client and APP_JAR being Scala code compiled with Maven:

    java -cp $CLASSPATH -Dspark.jars=$APP_JAR -Dspark.master=$MASTER $CLASSNAME $ARGS

Do you use this, or something else? I could never figure out this method:

    SPARK_HOME/bin/spark jar APP_JAR ARGS

For example:

    bin/spark-class jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 10

Do you use SBT or Maven to compile, or something else?

It seems that I can't get subscribed to the mailing list; I tried both my work email and personal.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-run-your-spark-app-tp7935.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: How do you run your spark app?
I use SBT, create an assembly, and then add the assembly JARs when I create my Spark context. The driver itself I launch with something like: java -cp ... MyDriver.

That said, as of Spark 1.0 the preferred way to run Spark applications is via spark-submit: http://spark.apache.org/docs/latest/submitting-applications.html
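To make the spark-submit recommendation concrete: a driver written for spark-submit usually hard-codes neither the master URL nor the application JAR, because spark-submit supplies both. A minimal sketch, with a hypothetical class and app name:

    import org.apache.spark.{SparkConf, SparkContext}

    object MyDriver {
      def main(args: Array[String]): Unit = {
        // No setMaster or setJars here: spark-submit passes the master and the
        // application JAR on its command line. (A plain `java -cp ...` launch can
        // achieve a similar effect because SparkConf also picks up -Dspark.master
        // and -Dspark.jars system properties.)
        val conf = new SparkConf().setAppName("my-driver")
        val sc = new SparkContext(conf)

        // Trivial job just to show the wiring.
        val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
        println(s"even numbers: $evens")

        sc.stop()
      }
    }

It would then be launched with something along the lines of: spark-submit --class MyDriver --master yarn-client my-driver-assembly.jar, where the flags shown are illustrative.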
Re: How do you run your spark app?
When you start seriously using Spark in production there are basically two things everyone eventually needs:

1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
2. Always-On Jobs - jobs that require monitoring, restarting, etc.

There are lots of ways to implement these requirements, everything from crontab through to workflow managers like Oozie. We opted for the following stack:

- Apache Mesos http://mesosphere.io/ (mesosphere.io distribution)
- Marathon https://github.com/mesosphere/marathon - an init/control system for starting, stopping, and maintaining always-on applications.
- Chronos http://airbnb.github.io/chronos/ - a general-purpose scheduler for Mesos that supports job dependency graphs.
- Spark Job Server https://github.com/ooyala/spark-jobserver - primarily for its ability to reuse shared contexts across multiple jobs.

The majority of our jobs are periodic (batch) jobs run through spark-submit, and we have several always-on Spark Streaming jobs (also run through spark-submit). We always use client mode with spark-submit, because the Mesos cluster has direct connectivity to the Spark cluster and it means all the Spark stdout/stderr is externalised into the Mesos logs, which helps when diagnosing problems.

I thoroughly recommend you explore using Mesos/Marathon/Chronos to run Spark and manage your jobs; the Mesosphere tutorials are awesome and you can be up and running in literally minutes. The web UIs for both make it easy to get started without talking to REST APIs, etc.

Best,
Michael
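As an illustration of what an "always-on" job of the kind described above looks like in code, a Spark Streaming driver is essentially a job that never returns from awaitTermination(); the supervisor (Marathon in this setup) restarts the process if it dies. A minimal sketch, with a hypothetical socket source and app name:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object AlwaysOnJob {
      def main(args: Array[String]): Unit = {
        // Master and jars come from spark-submit; only the app name is set here.
        val conf = new SparkConf().setAppName("always-on-stream")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Hypothetical input: a text stream on a local socket. Replace with the
        // real source (Kafka, Flume, files, ...) in an actual deployment.
        val lines = ssc.socketTextStream("localhost", 9999)
        lines.count().print()

        ssc.start()
        ssc.awaitTermination()   // blocks forever, keeping the job "always on"
      }
    }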
Re: How do you run your spark app?
P.S. Last but not least, we use sbt-assembly to build fat JARs and to build dist-style TAR.GZ packages with launch scripts, JARs and everything needed to run a job. These are automatically built from source by our Jenkins and stored in HDFS. Our Chronos/Marathon jobs fetch the latest release TAR.GZ directly from HDFS, unpack it and launch the appropriate script.

It makes for a much cleaner development / testing / deployment cycle to package everything required in one go, instead of relying on cluster-specific classpath additions or any add-jars functionality.
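For completeness, the sbt-assembly setup behind those fat JARs typically amounts to adding the plugin and marking Spark (and Hadoop, if needed) as "provided", so the cluster's own copies are used at runtime instead of being bundled. The exact setting keys vary by plugin version; the sketch below assumes an sbt 0.13-era build, and the names and version numbers are illustrative:

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

    // build.sbt
    import AssemblyKeys._   // needed by older plugin versions; newer ones auto-import their keys

    assemblySettings

    name := "my-spark-job"

    scalaVersion := "2.10.4"

    // "provided" keeps Spark out of the fat JAR; the cluster supplies it at runtime.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"

    jarName in assembly := "my-spark-job-assembly.jar"

Running sbt assembly then drops the fat JAR under target/, which is what would get packed into the TAR.GZ alongside the launch scripts.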
Re: How do you run your spark app?
We use Maven for building our code and then invoke spark-submit through the exec plugin, passing in our parameters. Works well for us.

Best Regards,
Sonal
Nube Technologies
http://www.nubetech.co
http://in.linkedin.com/in/sonalgoyal