Re: Programmatically running of the Spark Jobs.

2014-09-04 Thread Matt Chu
https://github.com/spark-jobserver/spark-jobserver

Ooyala's Spark jobserver is the current de facto standard, IIUC. I just
added it to our prototype stack, and will begin trying it out soon. Note
that you can only do standalone or Mesos; YARN isn't quite there yet.

(The repo just moved from https://github.com/ooyala/spark-jobserver, so
don't trust Google on this one (yet); development is happening in the new
repo, i.e. the first link above.)
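
For reference, jobs meant for the jobserver aren't submitted as main() programs;
they implement the server's job trait and get handed a shared SparkContext plus a
per-request config. A rough sketch along the lines of the project's README (exact
trait and package names are from memory and may differ by version):

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import scala.util.Try
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

// A word-count job the jobserver can run on demand: instead of defining its
// own main(), it receives a managed SparkContext and the per-request config.
object WordCountExample extends SparkJob {

  // Reject requests that don't carry the expected "input.string" parameter.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    Try(config.getString("input.string"))
      .map(_ => SparkJobValid)
      .getOrElse(SparkJobInvalid("No input.string config param"))

  // The actual work; the return value is sent back to the REST caller.
  override def runJob(sc: SparkContext, config: Config): Any =
    sc.parallelize(config.getString("input.string").split(" ").toSeq).countByValue()
}

You then upload a jar containing such classes to the server over HTTP and trigger
individual runs through its REST API.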



On Wed, Sep 3, 2014 at 11:39 PM, Vicky Kak vicky@gmail.com wrote:

 I have been able to submit Spark jobs using the submit script, but I
 would like to do it via code.
 I have been unable to find anything matching my need.
 I am thinking of using org.apache.spark.deploy.SparkSubmit to do so; maybe I
 have to write some utility that passes the parameters required by this
 class.
 I would be interested to know how the community is doing this.

 Thanks,
 Vicky



RE: Programmatically running of the Spark Jobs.

2014-09-04 Thread Ruebenacker, Oliver A

 Hello,

  Can this be used as a library from within another application?
  Thanks!

 Best, Oliver



Re: Programmatically running of the Spark Jobs.

2014-09-04 Thread Vicky Kak
I don't think so.


On Thu, Sep 4, 2014 at 5:36 PM, Ruebenacker, Oliver A
oliver.ruebenac...@altisource.com wrote:

  Hello,

   Can this be used as a library from within another application?

   Thanks!

  Best, Oliver






Re: Programmatically running of the Spark Jobs.

2014-09-04 Thread ericacm
Ahh - that probably explains an issue I am seeing.  I am a brand-new user and
I tried running the SimpleApp class from the Quick Start page
(http://spark.apache.org/docs/latest/quick-start.html).

When I use conf.setMaster("local") I can run the class directly from my
IDE.  But when I point the master at my standalone cluster with
conf.setMaster("spark://myhost:7077") and run the class directly from
the IDE, I get this error in the local application (running from the IDE):

14/09/01 10:56:04 ERROR scheduler.TaskSetManager: Task 0.0:0 failed 4 times;
aborting job
14/09/01 10:56:04 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0,
whose tasks have all completed, from pool 
14/09/01 10:56:04 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
14/09/01 10:56:04 INFO client.AppClient$ClientActor: Executor updated:
app-20140901105546-0001/3 is now EXITED (Command exited with code 52)
14/09/01 10:56:04 INFO cluster.SparkDeploySchedulerBackend: Executor
app-20140901105546-0001/3 removed: Command exited with code 52
14/09/01 10:56:04 INFO scheduler.DAGScheduler: Failed to run count at
SimpleApp.scala:17
Exception in thread "main" 14/09/01 10:56:04 INFO
client.AppClient$ClientActor: Executor added: app-20140901105546-0001/4 on
worker-20140901105055-10.0.1.5-56156 (10.0.1.5:56156) with 8 cores
org.apache.spark.SparkException: Job aborted due to stage failure: Task
0.0:0 failed 4 times, most recent failure: TID 3 on host 10.0.1.5 failed for
unknown reason

and this error in the worker stderr:

14/09/01 10:55:54 ERROR Executor: Exception in task ID 1
java.lang.OutOfMemoryError: Java heap space
at
org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:183)
at 
org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2378)
at 
org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
at 
org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
at
org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1004)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1872)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)

This made no sense to me, because I gave the worker 1 GB of heap and it was only
processing a 4 KB README.md file.  I'm guessing it must have tried to
deserialize a bogus object because I was not submitting the job correctly
(i.e. via spark-submit or this spark-jobserver)?
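
For reference, here is roughly what I am running - a minimal sketch based on the
Quick Start example, with the master URL and jar path as placeholders. The setJars
call is my assumption about how to ship the application jar to the executors when
not going through spark-submit; I have not confirmed that this is the missing piece.

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("spark://myhost:7077")                // standalone master instead of "local"
      .setJars(Seq("target/simple-app_2.10-1.0.jar"))  // ship the application jar to the executors
    val sc = new SparkContext(conf)

    // Same logic as the Quick Start example: count lines containing 'a' and 'b'.
    val logData = sc.textFile("README.md").cache()
    val numAs = logData.filter(_.contains("a")).count()
    val numBs = logData.filter(_.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}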

Thanks,






Re: Programmatically running of the Spark Jobs.

2014-09-04 Thread Guru Medasani
I am able to run Spark jobs and Spark Streaming jobs successfully via YARN on a
CDH cluster.

When you say YARN isn't quite there yet, do you mean for submitting jobs
programmatically, or just in general?
 




Re: Programmatically running of the Spark Jobs.

2014-09-04 Thread Vicky Kak
I don't want to use YARN or Mesos; I'm just trying the standalone Spark cluster.
We need a way to do seamless submission via an API, which I don't see.
To my surprise I was hit by this issue when I tried running the submit from
another machine; it is crazy that I have to submit the job from the worker
node or play with environment variables. It is anything but seamless:
http://apache-spark-user-list.1001560.n3.nabble.com/executor-failed-cannot-find-compute-classpath-sh-td859.html
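
The closest thing I have found is calling the spark-submit entry point directly, as
I mentioned earlier. A rough sketch of what I have in mind (the class name, jar path
and master URL are placeholders, and org.apache.spark.deploy.SparkSubmit is not a
documented public API, so this may well change between releases):

import org.apache.spark.deploy.SparkSubmit

object ProgrammaticSubmit {
  def main(args: Array[String]): Unit = {
    // Build the same argument list that the spark-submit script would pass along.
    val submitArgs = Array(
      "--class", "com.example.MyJob",       // placeholder application class
      "--master", "spark://myhost:7077",    // standalone master URL
      "--executor-memory", "1g",
      "/path/to/my-job-assembly.jar",       // placeholder application jar
      "arg1", "arg2")                       // arguments handed to MyJob.main
    SparkSubmit.main(submitArgs)            // same entry point the spark-submit script invokes
  }
}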




Re: Programmatically running of the Spark Jobs.

2014-09-04 Thread Vicky Kak
I get this error when I run it from the IDE:
***

Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Master removed our application: FAILED
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

***


