Sebastian YEPES FERNANDEZ created SPARK-9503:
------------------------------------------------

             Summary: Mesos dispatcher NullPointerException (MesosClusterScheduler)
                 Key: SPARK-9503
                 URL: https://issues.apache.org/jira/browse/SPARK-9503
             Project: Spark
          Issue Type: Bug
          Components: Mesos
    Affects Versions: 1.4.1
         Environment: branch-1.4 #8dfdca46dd2f527bf653ea96777b23652bc4eb83
            Reporter: Sebastian YEPES FERNANDEZ


Hello,

I have just started using start-mesos-dispatcher and have been noticing 
random crashes caused by NPEs.

Looking at the exception, it seems that in certain situations 
"queuedDrivers" is empty, which causes the NPE at "submission.cores":

https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516

{code:title=log|borderStyle=solid}
15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on port 7077
Exception in thread "Thread-1647" java.lang.NullPointerException
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512)
I0731 00:53:52.969518  7014 sched.cpp:1625] Asked to abort the driver
I0731 00:53:52.969895  7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0000'
15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
{code}
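
For illustration only: the trace above shows the NPE being raised inside the closure that ArrayBuffer.foreach calls, and that closure only runs when the buffer holds at least one element. So one possible reading is that an element of the buffer is null, rather than the buffer being empty. The sketch below reproduces that pattern in isolation. "Submission", "requestedCores" and "NullEntrySketch" are hypothetical names made up for this example and are not Spark's actual classes or methods, and filtering out null entries is only a defensive guard, not a confirmed fix for the root cause.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Try

// Hypothetical stand-in for a queued driver description (not Spark's real class).
final case class Submission(id: String, cores: Double, mem: Int)

object NullEntrySketch {
  // Sum the requested cores, skipping null entries instead of dereferencing them.
  def requestedCores(candidates: ArrayBuffer[Submission]): Double =
    candidates.iterator.filter(_ != null).map(_.cores).sum

  def main(args: Array[String]): Unit = {
    // An empty buffer never invokes the closure, so it cannot throw here.
    val empty = ArrayBuffer.empty[Submission]
    println(Try(empty.foreach(s => s.cores)).isSuccess)

    // A buffer containing a null entry throws as soon as the entry is dereferenced.
    val queued = ArrayBuffer[Submission](
      Submission("driver-1", 1.0, 512),
      null,
      Submission("driver-2", 2.0, 1024))
    println(Try(queued.foreach(s => s.cores)).isFailure)

    // The guarded version skips the null entry.
    println(requestedCores(queued))
  }
}
```

If the null entry comes from unsynchronized concurrent access to the buffer (an assumption, not something the log confirms), a guard like this would only hide the symptom, and the access itself would need to be synchronized.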

A side effect of this NPE is that after the crash the dispatcher will not 
start because it is already registered (see SPARK-7831).
I can get around this by removing the ZooKeeper data:
{code:title=zkCli.sh|borderStyle=solid}
rmr /spark_mesos_dispatcher
{code}



