Re: Programmatically running Spark jobs
https://github.com/spark-jobserver/spark-jobserver

Ooyala's Spark jobserver is the current de facto standard, IIUC. I just added it to our prototype stack and will begin trying it out soon. Note that you can only do standalone or Mesos; YARN isn't quite there yet. (The repo just moved from https://github.com/ooyala/spark-jobserver, so don't trust Google on this one yet; development is happening in the first repo.)

On Wed, Sep 3, 2014 at 11:39 PM, Vicky Kak vicky@gmail.com wrote:
I have been able to submit Spark jobs using the submit script, but I would like to do it via code. I have been unable to find anything matching my need. I am thinking of using org.apache.spark.deploy.SparkSubmit, perhaps writing a utility that passes the parameters this class requires. I would be interested to know how the community is doing this. Thanks, Vicky
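[Editorial note: a minimal sketch of the approach Vicky describes, calling SparkSubmit's main() with the same arguments the spark-submit script would pass. SparkSubmit is an internal entry point rather than a stable public API (it may call System.exit() on failure), and the class name, jar path, and master URL below are placeholders:]

import org.apache.spark.deploy.SparkSubmit

object ProgrammaticSubmit {
  def main(args: Array[String]): Unit = {
    // Equivalent to: spark-submit --class com.example.MyJob --master ... my-job-assembly.jar
    SparkSubmit.main(Array(
      "--class", "com.example.MyJob",      // hypothetical job class
      "--master", "spark://myhost:7077",   // standalone master URL (placeholder)
      "--executor-memory", "1g",
      "/path/to/my-job-assembly.jar"       // hypothetical application jar
    ))
  }
}

[A thin wrapper like this works for fire-and-forget submission, but gives no handle on the running job; the jobserver above exists largely to fill that gap.]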
RE: Programmatically running Spark jobs
Hello,

Can this be used as a library from within another application?

Thanks!

Best, Oliver
Re: Programmatically running Spark jobs
I don't think so.

On Thu, Sep 4, 2014 at 5:36 PM, Ruebenacker, Oliver A oliver.ruebenac...@altisource.com wrote:
Can this be used as a library from within another application?
Re: Programmatically running Spark jobs
Ahh - that probably explains an issue I am seeing. I am a brand-new user, and I tried running the SimpleApp class from the Quick Start page (http://spark.apache.org/docs/latest/quick-start.html). When I use conf.setMaster("local"), I can run the class directly from my IDE. But when I point the master at my standalone cluster with conf.setMaster("spark://myhost:7077") and run the class directly from the IDE, I get this error in the local application (running from the IDE):

14/09/01 10:56:04 ERROR scheduler.TaskSetManager: Task 0.0:0 failed 4 times; aborting job
14/09/01 10:56:04 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/09/01 10:56:04 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
14/09/01 10:56:04 INFO client.AppClient$ClientActor: Executor updated: app-20140901105546-0001/3 is now EXITED (Command exited with code 52)
14/09/01 10:56:04 INFO cluster.SparkDeploySchedulerBackend: Executor app-20140901105546-0001/3 removed: Command exited with code 52
14/09/01 10:56:04 INFO scheduler.DAGScheduler: Failed to run count at SimpleApp.scala:17
Exception in thread "main"
14/09/01 10:56:04 INFO client.AppClient$ClientActor: Executor added: app-20140901105546-0001/4 on worker-20140901105055-10.0.1.5-56156 (10.0.1.5:56156) with 8 cores
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: TID 3 on host 10.0.1.5 failed for unknown reason

and this error in the worker stderr:

14/09/01 10:55:54 ERROR Executor: Exception in task ID 1
java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:183)
	at org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2378)
	at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
	at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
	at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:42)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1004)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1872)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)

This made no sense, because I gave the worker 1 GB of heap and it was only trying to process a 4 KB README.md file. I'm guessing it tried to deserialize a bogus object because I was not submitting the job correctly (via spark-submit or this spark-jobserver)?

Thanks,
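[Editorial note: one common cause of opaque executor failures when driving a standalone cluster directly from an IDE is that the application jar is never shipped to the executors, so tasks fail during deserialization. Below is a minimal sketch of the usual setup, assuming a pre-built assembly jar; the master URL and jar path are placeholders, and this is not guaranteed to be the cause of the OutOfMemoryError above:]

import org.apache.spark.{SparkConf, SparkContext}

object SimpleAppFromIde {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Simple App")
      .setMaster("spark://myhost:7077")                  // standalone master (placeholder)
      .setJars(Seq("/path/to/simple-app-assembly.jar"))  // ship the app classes to the executors
    val sc = new SparkContext(conf)
    val lines = sc.textFile("README.md")  // small sample input, as on the Quick Start page
    println("Line count: " + lines.count())
    sc.stop()
  }
}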
Re: Programmatically running Spark jobs
I am able to run Spark jobs and Spark Streaming jobs successfully via YARN on a CDH cluster. When you say YARN isn't quite there yet, do you mean for submitting jobs programmatically, or just in general?

On Sep 4, 2014, at 1:45 AM, Matt Chu m...@kabam.com wrote:
Note that you can only do standalone or Mesos; YARN isn't quite there yet.
Re: Programmatically running Spark jobs
I don't want to use YARN or Mesos; I'm just trying the standalone Spark cluster. We need a way to do seamless submission with an API, which I don't see. To my surprise, I was hit by this issue when I tried running the submit from another machine; it is crazy that I have to submit the job from a worker node or play with environment variables. That is hardly seamless:
http://apache-spark-user-list.1001560.n3.nabble.com/executor-failed-cannot-find-compute-classpath-sh-td859.html

On Fri, Sep 5, 2014 at 8:33 AM, Guru Medasani gdm...@outlook.com wrote:
When you say YARN isn't quite there yet, do you mean for submitting jobs programmatically, or just in general?
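[Editorial note: for a pure-API submission path against a standalone cluster, the jobserver mentioned earlier in the thread accepts jobs over HTTP. A rough sketch follows, assuming the /jars/<appName> and /jobs?appName=...&classPath=... endpoints described in the spark-jobserver README; the host, port, app name, job class, and jar path are all placeholders:]

import java.net.{HttpURLConnection, URL}
import java.nio.file.{Files, Paths}
import scala.io.Source

object JobServerClient {
  // POST the given bytes to a URL and return the response body.
  private def post(url: String, body: Array[Byte]): String = {
    val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    val out = conn.getOutputStream
    out.write(body)
    out.close()
    val resp = Source.fromInputStream(conn.getInputStream).mkString
    conn.disconnect()
    resp
  }

  def main(args: Array[String]): Unit = {
    val server = "http://localhost:8090"  // jobserver's default port, per its docs
    // 1. Upload the application jar under an app name.
    val jar = Files.readAllBytes(Paths.get("/path/to/my-job.jar"))
    println(post(s"$server/jars/myapp", jar))
    // 2. Start a job by app name and job class, with an empty config body.
    println(post(s"$server/jobs?appName=myapp&classPath=com.example.MyJob", Array.empty[Byte]))
  }
}

[The job class itself would need to implement the jobserver's SparkJob trait rather than defining its own main().]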
Re: Programmatically running Spark jobs
I get this error when I run it from the IDE:

***
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Master removed our application: FAILED
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
***

On Fri, Sep 5, 2014 at 7:35 AM, ericacm eric...@gmail.com wrote:
Ahh - that probably explains an issue I am seeing.