[ 
https://issues.apache.org/jira/browse/SPARK-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737453#comment-14737453
 ] 

Timothy Chen commented on SPARK-9503:
-------------------------------------

Sorry, this is indeed a bug, and the fix is already in 1.5.
Please try the just-released 1.5; the NPE should no longer occur there.

> Mesos dispatcher NullPointerException (MesosClusterScheduler)
> -------------------------------------------------------------
>
>                 Key: SPARK-9503
>                 URL: https://issues.apache.org/jira/browse/SPARK-9503
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 1.4.1
>         Environment: branch-1.4 #8dfdca46dd2f527bf653ea96777b23652bc4eb83
>            Reporter: Sebastian YEPES FERNANDEZ
>              Labels: mesosphere
>
> Hello,
> I have just started using start-mesos-dispatcher and have been noticing 
> random crashes caused by NPEs.
> From the exception, it looks like in certain situations "queuedDrivers" 
> is empty, which causes the NPE at "submission.cores":
> https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516
> {code:title=log|borderStyle=solid}
> 15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on port 7077
> Exception in thread "Thread-1647" java.lang.NullPointerException
>         at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437)
>         at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436)
>         at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512)
> I0731 00:53:52.969518  7014 sched.cpp:1625] Asked to abort the driver
> I0731 00:53:52.969895  7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0000'
> 15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
> {code}
> A side effect of this NPE is that, after the crash, the dispatcher will not 
> start again because it is already registered (see SPARK-7831):
> {code:title=log|borderStyle=solid}
> 15/07/31 09:55:47 INFO MesosClusterUI: Started MesosClusterUI at http://192.168.0.254:8081
> I0731 09:55:47.715039  8162 sched.cpp:157] Version: 0.23.0
> I0731 09:55:47.717013  8163 sched.cpp:254] New master detected at master@192.168.0.254:5050
> I0731 09:55:47.717381  8163 sched.cpp:264] No credentials provided. Attempting to register without authentication
> I0731 09:55:47.718246  8177 sched.cpp:819] Got error 'Completed framework attempted to re-register'
> I0731 09:55:47.718268  8177 sched.cpp:1625] Asked to abort the driver
> 15/07/31 09:55:47 ERROR MesosClusterScheduler: Error received: Completed framework attempted to re-register
> I0731 09:55:47.719091  8177 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0038'
> 15/07/31 09:55:47 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
> 15/07/31 09:55:47 INFO Utils: Shutdown hook called
> {code}
> I can get around this by removing the zk data:
> {code:title=zkCli.sh|borderStyle=solid}
> rmr /spark_mesos_dispatcher
> {code}
