[ https://issues.apache.org/jira/browse/SPARK-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian YEPES FERNANDEZ updated SPARK-9503:
---------------------------------------------
    Description: 
Hello,

I have just started using start-mesos-dispatcher and have been noticing 
some random crashes caused by NPEs.

Judging by the exception, it looks like in certain situations 
"queuedDrivers" is empty, which causes the NPE at "submission.cores":

https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516
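One detail worth keeping in mind when reading the trace: calling foreach on an empty buffer never runs the loop body, so an NPE raised inside the closure (MesosClusterScheduler.scala:437) may instead point to a null entry in the queue. That is only a guess from the stack trace, not a confirmed root cause. The sketch below shows one way such a loop could skip null entries rather than dereference them. `Submission`, `QueueGuardSketch` and `schedulable` are hypothetical names made up for illustration and are not Spark's actual classes or methods.

```scala
// Hypothetical stand-in for a queued driver submission; NOT Spark's real class.
final case class Submission(id: String, cores: Double, mem: Int)

object QueueGuardSketch {
  // Returns the queued submissions that fit the offered resources.
  // Null entries are skipped instead of being dereferenced.
  def schedulable(
      queue: Seq[Submission],
      offeredCores: Double,
      offeredMem: Int): Seq[Submission] =
    queue.filter { s =>
      // Option(null) is None, so a null slot counts as "not schedulable".
      Option(s).exists(sub => sub.cores <= offeredCores && sub.mem <= offeredMem)
    }

  def main(args: Array[String]): Unit = {
    // Assumed scenario: one slot in the queue is null.
    val queue = Seq(
      Submission("driver-1", 1.0, 512),
      null,
      Submission("driver-2", 8.0, 4096))
    println(schedulable(queue, offeredCores = 2.0, offeredMem = 1024).map(_.id))
  }
}
```

A guard like this would only hide the symptom. If the null entry comes from unsynchronized access to a mutable buffer, the proper fix would be in how the queue is shared between threads.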

{code:title=log|borderStyle=solid}
15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on port 7077
Exception in thread "Thread-1647" java.lang.NullPointerException
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512)
I0731 00:53:52.969518  7014 sched.cpp:1625] Asked to abort the driver
I0731 00:53:52.969895  7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0000'
15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
{code}

A side effect of this NPE is that after the crash the dispatcher will not start 
because it is already registered (see SPARK-7831):
{code:title=log|borderStyle=solid}
15/07/31 09:55:47 INFO MesosClusterUI: Started MesosClusterUI at http://192.168.0.254:8081
I0731 09:55:47.715039  8162 sched.cpp:157] Version: 0.23.0
I0731 09:55:47.717013  8163 sched.cpp:254] New master detected at master@192.168.0.254:5050
I0731 09:55:47.717381  8163 sched.cpp:264] No credentials provided. Attempting to register without authentication
I0731 09:55:47.718246  8177 sched.cpp:819] Got error 'Completed framework attempted to re-register'
I0731 09:55:47.718268  8177 sched.cpp:1625] Asked to abort the driver
15/07/31 09:55:47 ERROR MesosClusterScheduler: Error received: Completed framework attempted to re-register
I0731 09:55:47.719091  8177 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0038'
15/07/31 09:55:47 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
15/07/31 09:55:47 INFO Utils: Shutdown hook called
{code}

I can work around this by removing the ZooKeeper data:
{code:title=zkCli.sh|borderStyle=solid}
rmr /spark_mesos_dispatcher
{code}



> Mesos dispatcher NullPointerException (MesosClusterScheduler)
> -------------------------------------------------------------
>
>                 Key: SPARK-9503
>                 URL: https://issues.apache.org/jira/browse/SPARK-9503
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 1.4.1
>         Environment: branch-1.4 #8dfdca46dd2f527bf653ea96777b23652bc4eb83
>            Reporter: Sebastian YEPES FERNANDEZ
>              Labels: mesosphere


