I am running on a Spark 1.5.1 cluster managed by Mesos. I have an
application that handles a chemistry problem whose size can be increased by
increasing the number of atoms, which in turn increases the number of Spark
stages. I do a repartition at each stage; Stage 9 is the last stage. At each
stage the size and complexity increase by a factor of 8 or so.
Problems with 8 stages run with no difficulty. Ones with 9 stages never
work: they always crash in a manner similar to the stack dump below (sorry
for the length, but NONE of the steps are mine).
I do not see any slaves throwing an exception (which would show different
errors anyway).
I am completely baffled and believe the error is in something Spark is
doing. I use 7000 or so tasks to try to divide the work. I see the same
issue when I cut the parallelism to 256, but the tasks run longer. My mean
task takes about 5 minutes (oh yes, I expect the job to take about 8 hours
on my 15 node cluster).
Any bright ideas?


[Stage 9:======================================>             (5827 + 60) / 7776]
Exception in thread "main" org.apache.spark.SparkException: Job 0 cancelled because Stage 9 was cancelled
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
        at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1229)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleStageCancellation$1.apply$mcVI$sp(DAGScheduler.scala:1217)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleStageCancellation$1.apply(DAGScheduler.scala:1216)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleStageCancellation$1.apply(DAGScheduler.scala:1216)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofInt.foreach(ArrayOps.scala:156)
        at org.apache.spark.scheduler.DAGScheduler.handleStageCancellation(DAGScheduler.scala:1216)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1469)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1822)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1835)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1848)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1919)
        at org.apache.spark.rdd.RDD.count(RDD.scala:1121)
        at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:445)
        at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:47)
        at com.lordjoe.molgen.SparkAtomGenerator.run(SparkAtomGenerator.java:150)
        at com.lordjoe.molgen.SparkAtomGenerator.run(SparkAtomGenerator.java:110)
        at com.lordjoe.molgen.VariantCounter.main(VariantCounter.java:80)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/12/14 09:53:20 WARN ServletHandler: /stages/stage/kill/
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at org.apache.spark.ui.jobs.StagesTab.handleKillRequest(StagesTab.scala:49)
        at org.apache.spark.ui.SparkUI$$anonfun$3.apply(SparkUI.scala:71)
        at org.apache.spark.ui.SparkUI$$anonfun$3.apply(SparkUI.scala:71)
        at org.apache.spark.ui.JettyUtils$$anon$2.doRequest(JettyUtils.scala:141)
        at org.apache.spark.ui.JettyUtils$$anon$2.doGet(JettyUtils.scala:128)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
        at org.spark-project.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
        at org.spark-project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
        at org.spark-project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
        at org.spark-project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
        at org.spark-project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
        at org.spark-project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.spark-project.jetty.server.handler.GzipHandler.handle(GzipHandler.java:264)
        at org.spark-project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.spark-project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.spark-project.jetty.server.Server.handle(Server.java:370)
        at org.spark-project.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
        at org.spark-project.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
        at org.spark-project.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
        at org.spark-project.jetty.http.HttpParser.parseNext(HttpParser.java:644)
        at org.spark-project.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.spark-project.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
        at org.spark-project.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
        at org.spark-project.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
        at org.spark-project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)
I1214 09:53:20.040680 31127 sched.cpp:1589] Asked to stop the driver
I1214 09:53:20.040848 22738 sched.cpp:831] Stopping framework '20151020-114053-711206558-5050-2549-0220'
