Usually you add exception handling within the transformations themselves; in
your case it is only in the driver code. A driver-side try/catch cannot catch
exceptions that are thrown inside the executors.

e.g.:

    try {
      val rdd = sc.parallelize(1 to 100)

      // Thrown inside the executor; this could just as well be rdd.map etc.
      rdd.foreach(x => throw new Exception("Real failure!"))

      val count = rdd.count

      println(s"Count: $count")

      throw new Exception("Fail!") // thrown in the driver

    } finally {
      sc.stop()
    }
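
If you want per-record failures handled where they happen and still surfaced
to the driver, one option (just a sketch; riskyTransform below is a made-up
per-record function, not something from your job) is to wrap the work in
scala.util.Try inside the transformation and carry the outcome back as an
Either:

    import scala.util.{Try, Success, Failure}

    // Hypothetical per-record function that can fail on some inputs.
    def riskyTransform(x: Int): Int =
      if (x % 10 == 0) throw new Exception(s"bad record $x") else x * 2

    val results = sc.parallelize(1 to 100).map { x =>
      // Runs on the executor: failures become data instead of failing the stage.
      Try(riskyTransform(x)) match {
        case Success(v)  => Right(v)
        case Failure(ex) => Left(s"record $x failed: ${ex.getMessage}")
      }
    }

    val failures = results.filter(_.isLeft).count()
    if (failures > 0) {
      // Back on the driver you can log, rethrow, or exit with a non-zero code.
      println(s"$failures records failed")
    }

That way the stage finishes even when some records fail, and the driver can
still decide to fail the whole job (rethrow, or call sys.exit with a non-zero
code) based on the failure count.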

Thanks
Best Regards

On Wed, Oct 28, 2015 at 7:10 AM, Isabelle Phan <nlip...@gmail.com> wrote:

> Hello,
>
> I had a question about error handling in Spark job: if an exception occurs
> during the job, what is the best way to get notification of the failure?
> Can Spark jobs return with different exit codes?
>
> For example, I wrote a dummy Spark job just throwing out an Exception, as
> follows:
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkContext._
> import org.apache.spark.SparkConf
>
> object ExampleJob {
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf().setAppName("Test Job")
>     val sc = new SparkContext(conf)
>     try {
>       val count = sc.parallelize(1 to 100).count
>       println(s"Count: $count")
>
>       throw new Exception("Fail!")
>
>     } finally {
>       sc.stop
>     }
>   }
>
> }
>
> The spark-submit execution trace shows the error:
> spark-submit --class com.test.ExampleJob test.jar
> 15/10/03 03:13:16 INFO SparkContext: Running Spark version 1.4.0
> 15/10/03 03:13:19 WARN SparkConf: In Spark 1.0 and later spark.local.dir
> will be overridden by the value set by the cluster manager (via
> SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
> 15/10/03 03:13:19 WARN SparkConf:
> ...
> 15/10/03 03:13:59 INFO DAGScheduler: Job 0 finished: count at
> ExampleJob.scala:12, took 18.879104 s
> Count: 100
> 15/10/03 03:13:59 INFO SparkUI: Stopped Spark web UI at []
> 15/10/03 03:13:59 INFO DAGScheduler: Stopping DAGScheduler
> 15/10/03 03:13:59 INFO SparkDeploySchedulerBackend: Shutting down all
> executors
> 15/10/03 03:13:59 INFO SparkDeploySchedulerBackend: Asking each executor
> to shut down
> 15/10/03 03:13:59 INFO MapOutputTrackerMasterEndpoint:
> MapOutputTrackerMasterEndpoint stopped!
> 15/10/03 03:13:59 INFO Utils: path =
> /data1/spark/tmp/spark-d8c0a18f-6e45-46c8-a208-1e1ad36ae596/blockmgr-d8e40805-3b8c-45f4-97b3-b89874158796,
> already present as root for deletion.
> 15/10/03 03:13:59 INFO MemoryStore: MemoryStore cleared
> 15/10/03 03:13:59 INFO BlockManager: BlockManager stopped
> 15/10/03 03:13:59 INFO BlockManagerMaster: BlockManagerMaster stopped
> 15/10/03 03:13:59 INFO
> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
> OutputCommitCoordinator stopped!
> 15/10/03 03:13:59 INFO SparkContext: Successfully stopped SparkContext
> Exception in thread "main" java.lang.Exception: Fail!
> at com.test.ExampleJob$.main(ExampleJob.scala:14)
> at com.test.ExampleJob.main(ExampleJob.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 15/10/03 03:13:59 INFO RemoteActorRefProvider$RemotingTerminator: Shutting
> down remote daemon.
> 15/10/03 03:13:59 INFO RemoteActorRefProvider$RemotingTerminator: Remote
> daemon shut down; proceeding with flushing remote transports.
> 15/10/03 03:13:59 INFO Utils: Shutdown hook called
> 15/10/03 03:13:59 INFO Utils: Deleting directory
> /data1/spark/tmp/spark-d8c0a18f-6e45-46c8-a208-1e1ad36ae596
> 15/10/03 03:14:00 INFO RemoteActorRefProvider$RemotingTerminator: Remoting
> shut down.
>
>
> However, the Spark UI just shows the status as "FINISHED". Is this a
> configuration error on my side?
> [image: Inline image 1]
>
>
> Thanks,
>
> Isabelle
>
