Re: How do you run your spark app?
Hello Michael, I have a quick question for you. Can you clarify the statement "build fat JARs and build dist-style TAR.GZ packages with launch scripts, JARs and everything needed to run a Job"? Can you give an example?

I am also using sbt assembly to create a fat JAR, and I supply the Spark and Hadoop locations on the classpath. Inside the main() function where the Spark context is created, I use SparkContext.jarOfClass(this).toList to add the fat JAR to my Spark context. However, I seem to be running into issues with this approach. I was wondering if you had any input, Michael.

Thanks,
Shivani
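For context, the jarOfClass pattern described above generally looks something like the sketch below. Note that jarOfClass takes a Class, so it is usually called with getClass rather than the instance itself; the app name and master URL here are purely illustrative and not taken from the thread.

    import org.apache.spark.{SparkConf, SparkContext}

    object MyJob {
      def main(args: Array[String]): Unit = {
        // Locate the fat JAR that contains this class so executors can fetch it.
        // jarOfClass returns an Option[String]; toList turns it into a Seq for setJars.
        val jars = SparkContext.jarOfClass(this.getClass).toList

        val conf = new SparkConf()
          .setAppName("my-job")                 // illustrative
          .setMaster("spark://master:7077")     // illustrative standalone master URL
          .setJars(jars)                        // ship the fat JAR to the executors

        val sc = new SparkContext(conf)
        // ... job logic ...
        sc.stop()
      }
    }

One common pitfall with this approach is that jarOfClass only returns a path when the class was actually loaded from a JAR, so it typically yields nothing when the job is run from an IDE or from unpacked classes.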
How do you run your spark app?
I want to ask this, not because I can't read the endless documentation and several tutorials, but because there seem to be many ways of doing things and I keep running into issues. How do you run your Spark app?

I had it working when I was only using YARN + Hadoop 1 (Cloudera), then I had to get Spark and Shark working and ended up upgrading everything and dropping CDH support. Anyway, this is what I used, with master=yarn-client and APP_JAR being Scala code compiled with Maven:

    java -cp $CLASSPATH -Dspark.jars=$APP_JAR -Dspark.master=$MASTER $CLASSNAME $ARGS

Do you use this, or something else? I could never figure out this method:

    SPARK_HOME/bin/spark jar APP_JAR ARGS

For example:

    bin/spark-class jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 10

Do you use SBT or Maven to compile, or something else?

It seems that I can't get subscribed to the mailing list; I tried both my work email and personal.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-run-your-spark-app-tp7935.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: How do you run your spark app?
I use SBT, create an assembly, and then add the assembly JARs when I create my Spark context. The driver itself I launch with something like: java -cp ... MyDriver.

That said, as of Spark 1.0 the preferred way to run Spark applications is via spark-submit: http://spark.apache.org/docs/latest/submitting-applications.html
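To make the spark-submit recommendation concrete: a driver written for spark-submit usually hard-codes neither the master URL nor the application JAR, because spark-submit supplies both. A minimal sketch, with a hypothetical class and app name:

    import org.apache.spark.{SparkConf, SparkContext}

    object MyDriver {
      def main(args: Array[String]): Unit = {
        // No setMaster or setJars here: spark-submit passes the master and the
        // application JAR on its command line. (A plain `java -cp ...` launch can
        // achieve a similar effect because SparkConf also picks up -Dspark.master
        // and -Dspark.jars system properties.)
        val conf = new SparkConf().setAppName("my-driver")
        val sc = new SparkContext(conf)

        // Trivial job just to show the wiring.
        val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
        println(s"even numbers: $evens")

        sc.stop()
      }
    }

It would then be launched with something along the lines of: spark-submit --class MyDriver --master yarn-client my-driver-assembly.jar, where the flags shown are illustrative.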
Re: How do you run your spark app?
When you start seriously using Spark in production there are basically two things everyone eventually needs:

1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
2. Always-On Jobs - jobs that require monitoring, restarting, etc.

There are lots of ways to implement these requirements, everything from crontab through to workflow managers like Oozie. We opted for the following stack:

- Apache Mesos http://mesosphere.io/ (mesosphere.io distribution)
- Marathon https://github.com/mesosphere/marathon - an init/control system for starting, stopping, and maintaining always-on applications.
- Chronos http://airbnb.github.io/chronos/ - a general-purpose scheduler for Mesos that supports job dependency graphs.
- Spark Job Server https://github.com/ooyala/spark-jobserver - primarily for its ability to reuse shared contexts across multiple jobs.

The majority of our jobs are periodic (batch) jobs run through spark-submit, and we have several always-on Spark Streaming jobs (also run through spark-submit). We always use client mode with spark-submit, because the Mesos cluster has direct connectivity to the Spark cluster and it means all the Spark stdout/stderr is externalised into the Mesos logs, which helps when diagnosing problems.

I thoroughly recommend you explore using Mesos/Marathon/Chronos to run Spark and manage your jobs; the Mesosphere tutorials are awesome and you can be up and running in literally minutes. The web UIs for both make it easy to get started without talking to REST APIs, etc.

Best,
Michael
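As an illustration of what an "always-on" job of the kind described above looks like in code, a Spark Streaming driver is essentially a job that never returns from awaitTermination(); the supervisor (Marathon in this setup) restarts the process if it dies. A minimal sketch, with a hypothetical socket source and app name:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object AlwaysOnJob {
      def main(args: Array[String]): Unit = {
        // Master and jars come from spark-submit; only the app name is set here.
        val conf = new SparkConf().setAppName("always-on-stream")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Hypothetical input: a text stream on a local socket. Replace with the
        // real source (Kafka, Flume, files, ...) in an actual deployment.
        val lines = ssc.socketTextStream("localhost", 9999)
        lines.count().print()

        ssc.start()
        ssc.awaitTermination()   // blocks forever, keeping the job "always on"
      }
    }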
Re: How do you run your spark app?
P.S. Last but not least, we use sbt-assembly to build fat JARs and to build dist-style TAR.GZ packages with launch scripts, JARs and everything needed to run a job. These are automatically built from source by our Jenkins and stored in HDFS. Our Chronos/Marathon jobs fetch the latest release TAR.GZ directly from HDFS, unpack it and launch the appropriate script.

It makes for a much cleaner development / testing / deployment cycle to package everything required in one go, instead of relying on cluster-specific classpath additions or any add-jars functionality.
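For completeness, the sbt-assembly setup behind those fat JARs typically amounts to adding the plugin and marking Spark (and Hadoop, if needed) as "provided", so the cluster's own copies are used at runtime instead of being bundled. The exact setting keys vary by plugin version; the sketch below assumes an sbt 0.13-era build, and the names and version numbers are illustrative:

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

    // build.sbt
    import AssemblyKeys._   // needed by older plugin versions; newer ones auto-import their keys

    assemblySettings

    name := "my-spark-job"

    scalaVersion := "2.10.4"

    // "provided" keeps Spark out of the fat JAR; the cluster supplies it at runtime.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"

    jarName in assembly := "my-spark-job-assembly.jar"

Running sbt assembly then drops the fat JAR under target/, which is what would get packed into the TAR.GZ alongside the launch scripts.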
Re: How do you run your spark app?
We use Maven for building our code and then invoke spark-submit through the exec plugin, passing in our parameters. Works well for us.

Best Regards,
Sonal
Nube Technologies
http://www.nubetech.co
http://in.linkedin.com/in/sonalgoyal