[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162913#comment-14162913 ] Xu Zhongxing commented on SPARK-3005:
-
Resolved in https://github.com/apache/spark/pull/2453

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
> Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
> I am using Spark, Mesos, and spark-cassandra-connector to do some work on a
> Cassandra cluster.
> While the job was running, I killed the Cassandra daemon to simulate some
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the Spark driver program
> throws an exception and shuts down cleanly.
> But when I run the job in Mesos fine-grained mode, the Spark driver program
> hangs.
> The Spark log is:
> {code}
> INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 Logging.scala (line 58) Cancelling stage 1
> INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>     at org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>     at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>     at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>     at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>     at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>     at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>     at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>     at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>     at scala.Option.foreach(Option.scala:236)
>     at org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>     at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>     at scala.Option.foreach(Option.scala:236)
>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> {code}
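The top of the trace can be illustrated with a minimal sketch (Java here purely for illustration; Spark's actual classes are Scala, and the names below only mirror theirs): a scheduler-backend base class whose default killTask() throws, and a fine-grained backend that, before the fix, never overrode it.

```java
// Simplified, hypothetical illustration of the failure mode in the trace.
// Class names mirror Spark's but this is not Spark code.
abstract class SchedulerBackendSketch {
    // Default implementation: backends that cannot kill individual
    // tasks inherit this and throw on any cancellation attempt.
    void killTask(long taskId, String executorId) {
        throw new UnsupportedOperationException();
    }
}

class FineGrainedBackendSketch extends SchedulerBackendSketch {
    // killTask() deliberately not overridden: cancelTasks() calls that
    // reach this backend propagate the exception to the DAG scheduler.
}

public class Main {
    public static void main(String[] args) {
        SchedulerBackendSketch backend = new FineGrainedBackendSketch();
        try {
            backend.killTask(7L, "executor-1");
        } catch (UnsupportedOperationException e) {
            // This is the exception logged as "Could not cancel tasks for stage 1".
            System.out.println("killTask threw UnsupportedOperationException");
        }
    }
}
```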
[jira] [Closed] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Zhongxing closed SPARK-3005.
---
Resolution: Fixed
[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118069#comment-14118069 ] Xu Zhongxing edited comment on SPARK-3005 at 9/2/14 9:59 AM:
-
By "tasks themselves already died and exited", I mean that even if we do nothing in killTask(), there won't be any zombie tasks left on the slaves. This is what I observe from testing Mesos fine-grained mode. If I'm wrong, please correct me. But the logic here is incomplete or inconsistent, and needs to be fixed.

was (Author: xuzhongxing):
By "tasks themselves already died and exited", I mean that even if we do nothing in killTask(), there won't be any zombie tasks left on the slaves. This is what I observe from testing. If I'm wrong, please correct me. But the logic here is incomplete or inconsistent, and needs to be fixed.
[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118069#comment-14118069 ] Xu Zhongxing commented on SPARK-3005:
-
By "tasks themselves already died and exited", I mean that even if we do nothing in killTask(), there won't be any zombie tasks left on the slaves. This is what I observe from testing. If I'm wrong, please correct me. But the logic here is incomplete or inconsistent, and needs to be fixed.
[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108677#comment-14108677 ] Xu Zhongxing edited comment on SPARK-3005 at 8/25/14 2:33 AM:
--
[SPARK-1749] didn't fix this problem. It just catches the UnsupportedOperationException and logs it, then sets ableToCancelStages = false. That is exactly what causes the hang, because the code only does cleanup when ableToCancelStages = true:
{code}
+    if (ableToCancelStages) {
+      job.listener.jobFailed(error)
+      cleanupStateForJobAndIndependentStages(job, resultStage)
+      listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))
{code}
The fact is that in the Mesos fine-grained case, calling killTask() is unnecessary, so throwing UnsupportedOperationException and setting ableToCancelStages = false is the wrong behaviour for this case. We just need to do nothing in killTask() and let the driver do the rest of the cleanup. The problem is in the MesosSchedulerBackend: it does not need to kill tasks and should not throw UnsupportedOperationException. The tasks themselves have already died and exited.

was (Author: xuzhongxing):
[SPARK-1749] didn't fix this problem. It just catches the UnsupportedOperationException and logs it, then sets ableToCancelStages = false. That is exactly what causes the hang, because the code only does cleanup when ableToCancelStages = true:
{code}
+    if (ableToCancelStages) {
+      job.listener.jobFailed(error)
+      cleanupStateForJobAndIndependentStages(job, resultStage)
+      listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))
{code}
The fact is that in the Mesos fine-grained case, calling killTask() is unnecessary, so throwing UnsupportedOperationException and setting ableToCancelStages = false is the wrong behaviour for this case. We just need to do nothing in killTask() and let the driver do the rest of the cleanup. The problem is in the MesosSchedulerBackend: it does not need to kill tasks and should not throw UnsupportedOperationException. The tasks themselves have already died and exited.
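The ableToCancelStages guard described in the comment can be modeled with a short sketch (Java purely for illustration; only the flag name comes from the quoted diff, everything else is invented for this example): when every killTask() attempt throws, the flag stays false, the job-failed notification never fires, and the driver waits forever.

```java
import java.util.List;

// Hypothetical backend interface; a lambda can supply either a throwing
// or a no-op killTask.
interface Backend {
    void killTask(long taskId);
}

// Models the cleanup path: cleanup runs only if no kill attempt threw
// (ableToCancelStages stayed true).
class CancelSketch {
    boolean jobFailedDelivered = false;

    void failJobAndIndependentStages(Backend backend, List<Long> taskIds) {
        boolean ableToCancelStages = true;
        for (long id : taskIds) {
            try {
                backend.killTask(id);
            } catch (UnsupportedOperationException e) {
                // SPARK-1749 behaviour: log and give up on cancellation.
                ableToCancelStages = false;
            }
        }
        if (ableToCancelStages) {
            // In Spark this is job.listener.jobFailed(error) etc.;
            // skipping it leaves the caller blocked -- the hang.
            jobFailedDelivered = true;
        }
    }
}

public class Main {
    public static void main(String[] args) {
        Backend throwing = taskId -> { throw new UnsupportedOperationException(); };
        Backend noOp = taskId -> { /* tasks already exited; nothing to kill */ };

        CancelSketch hung = new CancelSketch();
        hung.failJobAndIndependentStages(throwing, List.of(1L, 2L));
        System.out.println("throwing backend -> cleanup ran: " + hung.jobFailedDelivered);

        CancelSketch ok = new CancelSketch();
        ok.failJobAndIndependentStages(noOp, List.of(1L, 2L));
        System.out.println("no-op backend -> cleanup ran: " + ok.jobFailedDelivered);
    }
}
```

With the throwing backend the cleanup flag never fires, matching the reported hang; with a no-op killTask the same code path completes.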
[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108677#comment-14108677 ] Xu Zhongxing edited comment on SPARK-3005 at 8/25/14 2:32 AM:
--
[SPARK-1749] didn't fix this problem. It just catches the UnsupportedOperationException and logs it, then sets ableToCancelStages = false. That is exactly what causes the hang, because the code only does cleanup when ableToCancelStages = true:
{code}
+    if (ableToCancelStages) {
+      job.listener.jobFailed(error)
+      cleanupStateForJobAndIndependentStages(job, resultStage)
+      listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))
{code}
The fact is that in the Mesos fine-grained case, calling killTask() is unnecessary, so throwing UnsupportedOperationException and setting ableToCancelStages = false is the wrong behaviour for this case. We just need to do nothing in killTask() and let the driver do the rest of the cleanup. The problem is in the MesosSchedulerBackend: it does not need to kill tasks and should not throw UnsupportedOperationException. The tasks themselves have already died and exited.

was (Author: xuzhongxing):
[SPARK-1749] didn't fix this problem. It just catches the UnsupportedOperationException and logs it, then sets ableToCancelStages = false. That is exactly what causes the hang, because the code only does cleanup when ableToCancelStages = true:
{code}
+    if (ableToCancelStages) {
+      job.listener.jobFailed(error)
+      cleanupStateForJobAndIndependentStages(job, resultStage)
+      listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))
{code}
The fact is that in the Mesos fine-grained case, calling killTask() is unnecessary, so throwing UnsupportedOperationException and setting ableToCancelStages = false is the wrong behaviour for this case. We just need to do nothing in killTask() and let the driver do the rest of the cleanup. The problem is in the MesosSchedulerBackend: it does not need to kill tasks and should not throw UnsupportedOperationException. The tasks themselves have already died and exited.
[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108677#comment-14108677 ] Xu Zhongxing edited comment on SPARK-3005 at 8/25/14 2:30 AM: -- [SPARK-1749] didn't fix this problem. It just catches the UnsupportedOperationException and logs it. Then it sets ableToCancelStages = false. This is exactly the reason that causes the hang. Because the code only do cleanup when ableToCancelStages = true. + if (ableToCancelStages) { + job.listener.jobFailed(error) + cleanupStateForJobAndIndependentStages(job, resultStage) + listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error))) The fact is that in the mesos fine-grained case, it is unnecessary to killTask(). So throwing UnsupportedOperationException and set ableToCancelStages = false is wrong behaviour for this case. We just need to do nothing in killTask() and let the driver do the rest of the cleanup. The problem here is in the MesosSchedulerBackend. The MesosSchedulerBackend does not need to kill tasks, and should not throw UnsupportedOperationException. The tasks themselves already died and exited. was (Author: xuzhongxing): [SPARK-1749] didn't fix this problem completely. It just catch the UnsupportedOperationException and log it. Then it sets ableToCancelStages = false. This is exactly the reason that causes the hang. Because the code only do cleanup when ableToCancelStages = true. + if (ableToCancelStages) { + job.listener.jobFailed(error) + cleanupStateForJobAndIndependentStages(job, resultStage) + listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error))) The fact is that in the mesos fine-grained case, it is unnecessary to killTask(). So throwing UnsupportedOperationException and set ableToCancelStages = false is wrong behaviour for this case. We just need to do nothing in killTask() and let the driver do the rest of the cleanup. It actually is able to cancel later stages. It does not need to kill tasks. 
The tasks themselves already died and exited.
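The hang mechanism described above can be sketched with a small, self-contained toy model (these are illustrative classes, not Spark's actual code): the backend's default killTask() throws, the scheduler catches the exception and sets ableToCancelStages = false, and the guarded cleanup block never runs, so nothing ever notifies the job listener of failure.

```scala
// Toy model of the cleanup guard: when killTask() throws, the
// ableToCancelStages flag stays false and the failure callback
// (standing in for job.listener.jobFailed) is skipped -- the hang.
object CleanupGuardSketch {
  // Mirrors SchedulerBackend.killTask's default behaviour of throwing.
  class ThrowingBackend {
    def killTask(taskId: Long): Unit = throw new UnsupportedOperationException
  }

  // Returns true only if the job-failed callback would have fired.
  def failJob(backend: ThrowingBackend, taskIds: Seq[Long]): Boolean = {
    var ableToCancelStages = true
    try {
      taskIds.foreach(backend.killTask)
    } catch {
      case _: UnsupportedOperationException => ableToCancelStages = false
    }
    var listenerNotified = false
    if (ableToCancelStages) {
      listenerNotified = true // stands in for the guarded cleanup block
    }
    listenerNotified
  }

  def main(args: Array[String]): Unit = {
    // The listener is never notified, so a caller blocking on the
    // job result waits forever.
    assert(!failJob(new ThrowingBackend, Seq(1L, 2L)))
    println("cleanup ran: " + failJob(new ThrowingBackend, Seq(1L, 2L)))
  }
}
```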
[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108677#comment-14108677 ] Xu Zhongxing edited comment on SPARK-3005 at 8/25/14 2:19 AM: -- [SPARK-1749] didn't fix this problem completely. It just catches the UnsupportedOperationException and logs it, then sets ableToCancelStages = false. That is exactly what causes the hang, because the code only performs cleanup when ableToCancelStages = true:

{code}
+ if (ableToCancelStages) {
+   job.listener.jobFailed(error)
+   cleanupStateForJobAndIndependentStages(job, resultStage)
+   listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))
{code}

The fact is that in the Mesos fine-grained case, killTask() is unnecessary, so throwing UnsupportedOperationException and setting ableToCancelStages = false is the wrong behaviour for this case. We just need to do nothing in killTask() and let the driver do the rest of the cleanup. It actually is able to cancel later stages. It does not need to kill tasks. The tasks themselves have already died and exited.

was (Author: xuzhongxing): [SPARK-1749] didn't fix this problem completely. It just catches the UnsupportedOperationException and logs it, then sets ableToCancelStages = false. That is exactly what causes the hang. The fact is that in the Mesos fine-grained case, killTask() is unnecessary, so throwing UnsupportedOperationException and setting ableToCancelStages = false is the wrong behaviour. We should suppress the exception and let the code take the normal path. It actually is able to cancel later stages. It does not need to kill tasks. The tasks themselves have already died and exited. We just need to tell the driver the truth.
[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108677#comment-14108677 ] Xu Zhongxing commented on SPARK-3005: - [SPARK-1749] didn't fix this problem completely. It just catches the UnsupportedOperationException and logs it, then sets ableToCancelStages = false. That is exactly what causes the hang. The fact is that in the Mesos fine-grained case, killTask() is unnecessary, so throwing UnsupportedOperationException and setting ableToCancelStages = false is the wrong behaviour. We should suppress the exception and let the code take the normal path. It actually is able to cancel later stages. It does not need to kill tasks. The tasks themselves have already died and exited. We just need to tell the driver the truth.
[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108674#comment-14108674 ] Xu Zhongxing edited comment on SPARK-3005 at 8/25/14 2:04 AM: -- I was using Spark to process data from Cassandra. When Cassandra is under heavy load, the executors on the slaves throw timeout exceptions and the tasks fail, so the driver needs to cancel the job. In coarse-grained mode this works fine: the coarse-grained backend implements the killTask() method. But in fine-grained mode the backend didn't override killTask(), so it throws the exception. I could tune Cassandra to avoid the read timeouts, but I just want Spark to exit when the job fails. Currently it throws that operation-not-supported exception and hangs there, which is unacceptable behaviour.

was (Author: xuzhongxing): I was using Spark to process data from Cassandra. When Cassandra is under heavy load, the executors on the slaves throw timeout exceptions and the tasks fail, so the driver needs to cancel the job. In coarse-grained mode this works fine: the coarse-grained backend implements the killTask() method. But in fine-grained mode the backend didn't override killTask(), so it throws the exception.
[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108674#comment-14108674 ] Xu Zhongxing commented on SPARK-3005: - I was using Spark to process data from Cassandra. When Cassandra is under heavy load, the executors on the slaves throw timeout exceptions and the tasks fail, so the driver needs to cancel the job. In coarse-grained mode this works fine: the coarse-grained backend implements the killTask() method. But in fine-grained mode the backend didn't override killTask(), so it throws the exception.
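The difference between the two modes described above can be modelled with a toy class hierarchy (illustrative names, not the real Spark classes): the base trait's killTask() throws by default, the coarse-grained backend overrides it, and the fine-grained backend (before the fix) inherits the throwing default.

```scala
// Toy model of the backend hierarchy: only the coarse-grained
// subclass overrides killTask, so cancellation fails on the
// fine-grained one.
object BackendSketch {
  trait SchedulerBackendLike {
    // Mirrors the default that throws when a backend can't kill tasks.
    def killTask(taskId: Long): Unit =
      throw new UnsupportedOperationException("killTask not supported")
  }

  class CoarseGrainedLike extends SchedulerBackendLike {
    // In real Spark this would send a kill message to the executor.
    override def killTask(taskId: Long): Unit = ()
  }

  // No override: inherits the throwing default, like the pre-fix
  // MesosSchedulerBackend.
  class FineGrainedLike extends SchedulerBackendLike

  // Returns true when the cancel path completes without throwing.
  def cancels(b: SchedulerBackendLike): Boolean =
    try { b.killTask(0L); true }
    catch { case _: UnsupportedOperationException => false }

  def main(args: Array[String]): Unit = {
    assert(cancels(new CoarseGrainedLike))  // clean shutdown path
    assert(!cancels(new FineGrainedLike))   // throws, reported as the hang
  }
}
```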
[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096539#comment-14096539 ] Xu Zhongxing edited comment on SPARK-3005 at 8/14/14 7:37 AM: -- Could adding an empty killTask method to MesosSchedulerBackend fix this problem?

{code}
override def killTask(taskId: Long, executorId: String, interruptThread: Boolean) {}
{code}

This works for my tests.

was (Author: xuzhongxing): Could adding an empty killTask method to MesosSchedulerBackend fix this problem?

override def killTask(taskId: Long, executorId: String, interruptThread: Boolean) {}

This works for my tests.
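The effect of the no-op override proposed above can be checked with a small self-contained sketch (toy classes under the same assumption as the comment: the Mesos tasks have already died on their own, so there is genuinely nothing to kill):

```scala
// Sketch of the proposed fix: an empty killTask override lets the
// driver's cancellation path complete instead of throwing.
object NoOpKillTaskSketch {
  trait SchedulerBackendLike {
    def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit =
      throw new UnsupportedOperationException
  }

  // The one-line change from the comment: do nothing, because the
  // tasks have already exited.
  class PatchedFineGrained extends SchedulerBackendLike {
    override def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit = {}
  }

  // Returns true when cancellation completes without an exception.
  def tryCancel(b: SchedulerBackendLike): Boolean =
    try { b.killTask(0L, "exec-1", interruptThread = false); true }
    catch { case _: UnsupportedOperationException => false }

  def main(args: Array[String]): Unit = {
    // With the override in place, the cancel path succeeds, so the
    // driver can run its cleanup and exit instead of hanging.
    assert(tryCancel(new PatchedFineGrained))
  }
}
```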
[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096539#comment-14096539 ] Xu Zhongxing edited comment on SPARK-3005 at 8/14/14 7:37 AM: -- Could adding an empty killTask method to MesosSchedulerBackend fix this problem?

{code:title=MesosSchedulerBackend.scala}
override def killTask(taskId: Long, executorId: String, interruptThread: Boolean) {}
{code}

This works for my tests.

was (Author: xuzhongxing): Could adding an empty killTask method to MesosSchedulerBackend fix this problem?

{code}
override def killTask(taskId: Long, executorId: String, interruptThread: Boolean) {}
{code}

This works for my tests.
[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095288#comment-14095288 ] Xu Zhongxing edited comment on SPARK-3005 at 8/14/14 7:36 AM: -- Some additional driver logs during the spark driver hang:

{code}
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,908 Logging.scala (line 66) Checking for newly runnable parent stages
INFO [Result resolver thread-1] 2014-08-13 15:58:15,908 Logging.scala (line 58) Removed TaskSet 1.0, whose tasks have all completed, from pool
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 Logging.scala (line 66) running: Set(Stage 1, Stage 2)
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 Logging.scala (line 66) waiting: Set(Stage 0)
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 Logging.scala (line 66) failed: Set()
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 Logging.scala (line 62) submitStage(Stage 0)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 Logging.scala (line 62) missing: List(Stage 1, Stage 2)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 Logging.scala (line 62) submitStage(Stage 1)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 Logging.scala (line 62) submitStage(Stage 2)
TRACE [spark-akka.actor.default-dispatcher-3] 2014-08-13 15:58:56,643 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster.
TRACE [spark-akka.actor.default-dispatcher-6] 2014-08-13 15:59:56,653 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster.
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 16:00:56,652 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster.
{code}

was (Author: xuzhongxing): Some additional logs during the spark driver hang:

TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,908 Logging.scala (line 66) Checking for newly runnable parent stages
INFO [Result resolver thread-1] 2014-08-13 15:58:15,908 Logging.scala (line 58) Removed TaskSet 1.0, whose tasks have all completed, from pool
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 Logging.scala (line 66) running: Set(Stage 1, Stage 2)
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 Logging.scala (line 66) waiting: Set(Stage 0)
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 Logging.scala (line 66) failed: Set()
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 Logging.scala (line 62) submitStage(Stage 0)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 Logging.scala (line 62) missing: List(Stage 1, Stage 2)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 Logging.scala (line 62) submitStage(Stage 1)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 Logging.scala (line 62) submitStage(Stage 2)
TRACE [spark-akka.actor.default-dispatcher-3] 2014-08-13 15:58:56,643 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster.
TRACE [spark-akka.actor.default-dispatcher-6] 2014-08-13 15:59:56,653 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster.
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 16:00:56,652 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster.
[jira] [Issue Comment Deleted] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Zhongxing updated SPARK-3005: Comment: was deleted (was: A related question: why do fine-grained mode and coarse-grained mode behave differently? Neither of them implements the killTask() method.)
[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096539#comment-14096539 ] Xu Zhongxing edited comment on SPARK-3005 at 8/14/14 5:57 AM: -- Could adding an empty killTask method to MesosSchedulerBackend fix this problem? {code} override def killTask(taskId: Long, executorId: String, interruptThread: Boolean) {} {code} This works in my tests.
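The no-op override above unblocks cancellation but silently leaves the Mesos tasks running. A sketch of what a non-empty implementation might look like is below; this is a hypothetical illustration, not the committed fix, and it assumes the backend holds its registered org.apache.mesos.SchedulerDriver in a field (here named driver, an invented name):

```scala
import org.apache.mesos.Protos.TaskID
import org.apache.mesos.SchedulerDriver

// Hypothetical sketch: forward the kill request to Mesos instead of
// inheriting SchedulerBackend.killTask, which throws
// UnsupportedOperationException and makes TaskSchedulerImpl.cancelTasks fail.
class KillTaskSketch(driver: SchedulerDriver) {
  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit = {
    // Spark task IDs are Longs; Mesos TaskIDs are protobuf-wrapped strings.
    driver.killTask(TaskID.newBuilder().setValue(taskId.toString).build())
  }
}
```

Either variant stops the UnsupportedOperationException from propagating out of cancelTasks, which is why the empty override alone is enough to prevent the driver hang in the tests above.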
[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096539#comment-14096539 ] Xu Zhongxing commented on SPARK-3005: - Could adding an empty killTask method to MesosSchedulerBackend fix this problem? {code} override def killTask(taskId: Long, executorId: String, interruptThread: Boolean) {} {code}
[jira] [Issue Comment Deleted] (SPARK-2204) Scheduler for Mesos in fine-grained mode launches tasks on wrong executors
[ https://issues.apache.org/jira/browse/SPARK-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Zhongxing updated SPARK-2204: Comment: was deleted (was: I encountered this issue again when I use Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector master branch. Maybe this is not fixed on some failure/exception paths. I run spark in coarse-grained mode. There are some exceptions thrown at the executors. But the spark driver is waiting and printing repeatedly: TRACE [spark-akka.actor.default-dispatcher-17] 2014-08-11 10:57:32,998 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster. The mesos master WARNING log: W0811 10:32:58.172175 1646 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-2 on slave 20140808-113811-858302656-5050-1645-2 (ndb9) W0811 10:32:58.181217 1649 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-5 on slave 20140808-113811-858302656-5050-1645-5 (ndb5) W0811 10:32:58.277014 1650 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-3 on slave 20140808-113811-858302656-5050-1645-3 (ndb6) W0811 10:32:58.344130 1648 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-0 on slave 20140808-113811-858302656-5050-1645-0 (ndb0) W0811 10:32:58.354117 1651 master.cpp:2103] Ignoring unknown exited executor 20140804-095254-505981120-5050-20258-11 on slave 20140804-095254-505981120-5050-20258-11 (ndb2) W0811 10:32:58.550233 1647 master.cpp:2103] Ignoring unknown exited executor 20140804-172212-505981120-5050-26571-2 on slave 20140804-172212-505981120-5050-26571-2 (ndb3) W0811 10:32:58.793258 1653 master.cpp:2103] Ignoring unknown exited executor 20140804-095254-505981120-5050-20258-19 on slave 20140804-095254-505981120-5050-20258-19 (ndb1) W0811 10:32:58.904842 1652 master.cpp:2103] Ignoring unknown exited executor 
20140804-172212-505981120-5050-26571-0 on slave 20140804-172212-505981120-5050-26571-0 (ndb4) Some other logs are at: https://github.com/datastax/spark-cassandra-connector/issues/134 ) > Scheduler for Mesos in fine-grained mode launches tasks on wrong executors > -- > > Key: SPARK-2204 > URL: https://issues.apache.org/jira/browse/SPARK-2204 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.0.0 >Reporter: Sebastien Rainville >Assignee: Sebastien Rainville >Priority: Blocker > Fix For: 1.0.1, 1.1.0 > > > MesosSchedulerBackend.resourceOffers(SchedulerDriver, List[Offer]) is > assuming that TaskSchedulerImpl.resourceOffers(Seq[WorkerOffer]) is returning > task lists in the same order as the offers it was passed, but in the current > implementation TaskSchedulerImpl.resourceOffers shuffles the offers to avoid > assigning the tasks always to the same executors. The result is that the > tasks are launched on the wrong executors. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
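The SPARK-2204 description above (the Mesos backend assuming offers come back in their original order while TaskSchedulerImpl shuffles them) can be illustrated with a self-contained sketch; all names here are invented for illustration and are not Spark code:

```scala
// Illustration of the ordering bug described above: the scheduler shuffles
// offers before assigning tasks, but the caller zips the resulting task
// lists back against the ORIGINAL offer order, so tasks can be paired with
// the wrong executors whenever the shuffle is not the identity permutation.
object OfferOrderingSketch {
  final case class Offer(executorId: String)

  // Assigns one task per offer, but over a shuffled view of the offers,
  // returning only the task descriptions (order information is lost).
  def assignTasks(offers: Seq[Offer]): Seq[String] = {
    val shuffled = scala.util.Random.shuffle(offers)
    shuffled.map(o => s"task-for-${o.executorId}")
  }

  def main(args: Array[String]): Unit = {
    val offers = Seq(Offer("e1"), Offer("e2"), Offer("e3"))
    // Buggy pattern: pairing positionally with the original order.
    val launched = offers.zip(assignTasks(offers))
    launched.foreach { case (offer, task) =>
      println(s"launching $task on ${offer.executorId}")
    }
  }
}
```

The fix direction implied by the description is to keep the task lists keyed to the offers actually used (or return them in the caller's order) rather than relying on positional alignment.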
[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095288#comment-14095288 ] Xu Zhongxing commented on SPARK-3005: - Some additional logs during the spark driver hang: TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,908 Logging.scala (line 66) Checking for newly runnable parent stages INFO [Result resolver thread-1] 2014-08-13 15:58:15,908 Logging.scala (line 58) Removed TaskSet 1.0, whose tasks have all completed, from pool TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 Logging.scala (line 66) running: Set(Stage 1, Stage 2) TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 Logging.scala (line 66) waiting: Set(Stage 0) TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 Logging.scala (line 66) failed: Set() DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 Logging.scala (line 62) submitStage(Stage 0) DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 Logging.scala (line 62) missing: List(Stage 1, Stage 2) DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 Logging.scala (line 62) submitStage(Stage 1) DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 Logging.scala (line 62) submitStage(Stage 2) TRACE [spark-akka.actor.default-dispatcher-3] 2014-08-13 15:58:56,643 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster. TRACE [spark-akka.actor.default-dispatcher-6] 2014-08-13 15:59:56,653 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster. TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 16:00:56,652 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster. 
[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
[ https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095285#comment-14095285 ] Xu Zhongxing commented on SPARK-3005: - A related question: why do fine-grained mode and coarse-grained mode behave differently? Neither of them implements the killTask() method.
[jira] [Created] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()
Xu Zhongxing created SPARK-3005: --- Summary: Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask() Key: SPARK-3005 URL: https://issues.apache.org/jira/browse/SPARK-3005 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.2 Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector Reporter: Xu Zhongxing I am using Spark, Mesos, spark-cassandra-connector to do some work on a cassandra cluster. During the job running, I killed the Cassandra daemon to simulate some failure cases. This results in task failures. If I run the job in Mesos coarse-grained mode, the spark driver program throws an exception and shutdown cleanly. But when I run the job in Mesos fine-grained mode, the spark driver program hangs. The spark log is: INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 Logging.scala (line 58) Cancelling stage 1 INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 Logging.scala (line 79) Could not cancel tasks for stage 1 java.lang.UnsupportedOperationException at org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32) at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176) at scala.Option.foreach(Option.scala:236) at 
org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[jira] [Comment Edited] (SPARK-2204) Scheduler for Mesos in fine-grained mode launches tasks on wrong executors
[ https://issues.apache.org/jira/browse/SPARK-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092376#comment-14092376 ]

Xu Zhongxing edited comment on SPARK-2204 at 8/11/14 6:49 AM:
--

I encountered this issue again using Spark 1.0.2, Mesos 0.18.1, and the spark-cassandra-connector master branch. Maybe this is not fixed on some failure/exception paths.

I run Spark in coarse-grained mode. Some exceptions are thrown at the executors, but the Spark driver just waits, repeatedly printing:

{code}
TRACE [spark-akka.actor.default-dispatcher-17] 2014-08-11 10:57:32,998 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster.
{code}

The Mesos master WARNING log:

{code}
W0811 10:32:58.172175 1646 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-2 on slave 20140808-113811-858302656-5050-1645-2 (ndb9)
W0811 10:32:58.181217 1649 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-5 on slave 20140808-113811-858302656-5050-1645-5 (ndb5)
W0811 10:32:58.277014 1650 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-3 on slave 20140808-113811-858302656-5050-1645-3 (ndb6)
W0811 10:32:58.344130 1648 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-0 on slave 20140808-113811-858302656-5050-1645-0 (ndb0)
W0811 10:32:58.354117 1651 master.cpp:2103] Ignoring unknown exited executor 20140804-095254-505981120-5050-20258-11 on slave 20140804-095254-505981120-5050-20258-11 (ndb2)
W0811 10:32:58.550233 1647 master.cpp:2103] Ignoring unknown exited executor 20140804-172212-505981120-5050-26571-2 on slave 20140804-172212-505981120-5050-26571-2 (ndb3)
W0811 10:32:58.793258 1653 master.cpp:2103] Ignoring unknown exited executor 20140804-095254-505981120-5050-20258-19 on slave 20140804-095254-505981120-5050-20258-19 (ndb1)
W0811 10:32:58.904842 1652 master.cpp:2103] Ignoring unknown exited executor 20140804-172212-505981120-5050-26571-0 on slave 20140804-172212-505981120-5050-26571-0 (ndb4)
{code}

Some other logs are at: https://github.com/datastax/spark-cassandra-connector/issues/134

> Scheduler for Mesos in fine-grained mode launches tasks on wrong executors
> --
>
> Key: SPARK-2204
> URL: https://issues.apache.org/jira/browse/SPARK-2204
> Project: Spark
> Issue Type: Bug
> Components: Mesos
> Affects Versions: 1.0.0
> Reporter: Sebastien Rainville
> Assignee: Sebastien Rainville
> Priority: Blocker
> Fix For: 1.0.1, 1.1.0
>
> MesosSchedulerBackend.resourceOffers(SchedulerDriver, List[Offer]) is
> assuming that TaskSchedulerImpl.resourceOffers(Seq[WorkerOffer]) is returning
> task lists in the same order as the offers it was passed, but in the current
> implementation TaskSchedulerImpl.resourceOffers shuffles the offers to avoid
> assigning the tasks always to the same executors. The result is that the
> tasks are launched on the wrong executors.
[jira] [Commented] (SPARK-2204) Scheduler for Mesos in fine-grained mode launches tasks on wrong executors
[ https://issues.apache.org/jira/browse/SPARK-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092376#comment-14092376 ]

Xu Zhongxing commented on SPARK-2204:
-

I encountered this issue again using Spark 1.0.2, Mesos 0.18.1, and the spark-cassandra-connector master branch.

I run Spark in coarse-grained mode. Some exceptions are thrown at the executors, but the Spark driver just waits, repeatedly printing:

{code}
TRACE [spark-akka.actor.default-dispatcher-17] 2014-08-11 10:57:32,998 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster.
{code}

The Mesos master WARNING log:

{code}
W0811 10:32:58.172175 1646 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-2 on slave 20140808-113811-858302656-5050-1645-2 (ndb9)
W0811 10:32:58.181217 1649 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-5 on slave 20140808-113811-858302656-5050-1645-5 (ndb5)
W0811 10:32:58.277014 1650 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-3 on slave 20140808-113811-858302656-5050-1645-3 (ndb6)
W0811 10:32:58.344130 1648 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-0 on slave 20140808-113811-858302656-5050-1645-0 (ndb0)
W0811 10:32:58.354117 1651 master.cpp:2103] Ignoring unknown exited executor 20140804-095254-505981120-5050-20258-11 on slave 20140804-095254-505981120-5050-20258-11 (ndb2)
W0811 10:32:58.550233 1647 master.cpp:2103] Ignoring unknown exited executor 20140804-172212-505981120-5050-26571-2 on slave 20140804-172212-505981120-5050-26571-2 (ndb3)
W0811 10:32:58.793258 1653 master.cpp:2103] Ignoring unknown exited executor 20140804-095254-505981120-5050-20258-19 on slave 20140804-095254-505981120-5050-20258-19 (ndb1)
W0811 10:32:58.904842 1652 master.cpp:2103] Ignoring unknown exited executor 20140804-172212-505981120-5050-26571-0 on slave 20140804-172212-505981120-5050-26571-0 (ndb4)
{code}

Some other logs are at: https://github.com/datastax/spark-cassandra-connector/issues/134

> Scheduler for Mesos in fine-grained mode launches tasks on wrong executors
> --
>
> Key: SPARK-2204
> URL: https://issues.apache.org/jira/browse/SPARK-2204
> Project: Spark
> Issue Type: Bug
> Components: Mesos
> Affects Versions: 1.0.0
> Reporter: Sebastien Rainville
> Assignee: Sebastien Rainville
> Priority: Blocker
> Fix For: 1.0.1, 1.1.0
>
> MesosSchedulerBackend.resourceOffers(SchedulerDriver, List[Offer]) is
> assuming that TaskSchedulerImpl.resourceOffers(Seq[WorkerOffer]) is returning
> task lists in the same order as the offers it was passed, but in the current
> implementation TaskSchedulerImpl.resourceOffers shuffles the offers to avoid
> assigning the tasks always to the same executors. The result is that the
> tasks are launched on the wrong executors.
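The positional-pairing bug described in SPARK-2204 is easy to model in a few lines. This is an illustrative sketch, not Spark's actual code: `assign_tasks` stands in for TaskSchedulerImpl.resourceOffers (which shuffles the offers), and the two launch functions contrast pairing by list position with the fix of keying assignments by slave ID:

```python
import random

def assign_tasks(offers):
    """Model of TaskSchedulerImpl.resourceOffers: shuffles the offers,
    then returns (slave_id, task) assignments in the shuffled order."""
    shuffled = offers[:]
    random.shuffle(shuffled)
    return [(slave_id, f"task-for-{slave_id}") for slave_id in shuffled]

def launch_buggy(offers, assignments):
    # Positional pairing: assumes assignments come back in the original
    # offer order, so a shuffle can send tasks to the wrong executors.
    return {offer: task for offer, (_, task) in zip(offers, assignments)}

def launch_fixed(offers, assignments):
    # Fix: look each assignment up by slave ID instead of by position.
    by_slave = dict(assignments)
    return {offer: by_slave[offer] for offer in offers}

offers = [f"slave-{i}" for i in range(5)]
assignments = assign_tasks(offers)
buggy = launch_buggy(offers, assignments)
fixed = launch_fixed(offers, assignments)

# The fixed mapping always sends each task to its intended executor.
assert all(task == f"task-for-{offer}" for offer, task in fixed.items())
# The buggy mapping launches the same set of tasks, but possibly on the
# wrong executors whenever the shuffle reorders the offers.
assert sorted(buggy.values()) == sorted(fixed.values())
```

Matching tasks back to offers by slave ID (rather than assuming a stable order) is the same idea as the fix shipped in 1.0.1/1.1.0.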