[jira] [Commented] (SPARK-2204) Scheduler for Mesos in fine-grained mode launches tasks on wrong executors

2014-08-10 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092376#comment-14092376
 ] 

Xu Zhongxing commented on SPARK-2204:
-

I encountered this issue again using Spark 1.0.2, Mesos 0.18.1, and the 
spark-cassandra-connector master branch.

I run Spark in coarse-grained mode. Some exceptions are thrown at the 
executors, but the Spark driver keeps waiting and repeatedly prints:

TRACE [spark-akka.actor.default-dispatcher-17] 2014-08-11 10:57:32,998 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster.

The mesos master WARNING log:
W0811 10:32:58.172175 1646 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-2 on slave 20140808-113811-858302656-5050-1645-2 (ndb9)
W0811 10:32:58.181217 1649 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-5 on slave 20140808-113811-858302656-5050-1645-5 (ndb5)
W0811 10:32:58.277014 1650 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-3 on slave 20140808-113811-858302656-5050-1645-3 (ndb6)
W0811 10:32:58.344130 1648 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-0 on slave 20140808-113811-858302656-5050-1645-0 (ndb0)
W0811 10:32:58.354117 1651 master.cpp:2103] Ignoring unknown exited executor 20140804-095254-505981120-5050-20258-11 on slave 20140804-095254-505981120-5050-20258-11 (ndb2)
W0811 10:32:58.550233 1647 master.cpp:2103] Ignoring unknown exited executor 20140804-172212-505981120-5050-26571-2 on slave 20140804-172212-505981120-5050-26571-2 (ndb3)
W0811 10:32:58.793258 1653 master.cpp:2103] Ignoring unknown exited executor 20140804-095254-505981120-5050-20258-19 on slave 20140804-095254-505981120-5050-20258-19 (ndb1)
W0811 10:32:58.904842 1652 master.cpp:2103] Ignoring unknown exited executor 20140804-172212-505981120-5050-26571-0 on slave 20140804-172212-505981120-5050-26571-0 (ndb4)

Some other logs are at: 
https://github.com/datastax/spark-cassandra-connector/issues/134


> Scheduler for Mesos in fine-grained mode launches tasks on wrong executors
> --
>
> Key: SPARK-2204
> URL: https://issues.apache.org/jira/browse/SPARK-2204
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.0.0
>Reporter: Sebastien Rainville
>Assignee: Sebastien Rainville
>Priority: Blocker
> Fix For: 1.0.1, 1.1.0
>
>
> MesosSchedulerBackend.resourceOffers(SchedulerDriver, List[Offer]) is 
> assuming that TaskSchedulerImpl.resourceOffers(Seq[WorkerOffer]) is returning 
> task lists in the same order as the offers it was passed, but in the current 
> implementation TaskSchedulerImpl.resourceOffers shuffles the offers to avoid 
> assigning the tasks always to the same executors. The result is that the 
> tasks are launched on the wrong executors.
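
To make the ordering assumption above concrete, here is a minimal, self-contained Scala sketch (placeholder names such as Offer, Task and scheduleTasks; not the actual Spark sources) in which the scheduler shuffles the offers before assigning tasks while the caller pairs the returned task lists with the offers in their original order, so tasks end up aimed at the wrong executors:

{code}
// Sketch of the SPARK-2204 ordering bug, under simplified assumptions.
object OfferOrderingSketch {
  case class Offer(executorId: String, cores: Int)
  case class Task(desc: String)

  // Stand-in for TaskSchedulerImpl.resourceOffers: shuffles the offers before
  // assigning tasks, and returns the task lists in the shuffled order.
  def scheduleTasks(offers: Seq[Offer]): Seq[Seq[Task]] = {
    val shuffled = scala.util.Random.shuffle(offers)
    shuffled.map(o => Seq(Task(s"task meant for ${o.executorId}")))
  }

  def main(args: Array[String]): Unit = {
    val offers = Seq(Offer("exec-1", 4), Offer("exec-2", 4), Offer("exec-3", 4))
    val taskLists = scheduleTasks(offers)
    // Stand-in for MesosSchedulerBackend.resourceOffers: assumes taskLists(i)
    // belongs to offers(i), so a task can be launched on the wrong executor.
    offers.zip(taskLists).foreach { case (offer, tasks) =>
      println(s"launching ${tasks.map(_.desc)} on ${offer.executorId}")
    }
  }
}
{code}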



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-2204) Scheduler for Mesos in fine-grained mode launches tasks on wrong executors

2014-08-10 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092376#comment-14092376
 ] 

Xu Zhongxing edited comment on SPARK-2204 at 8/11/14 6:49 AM:
--

I encountered this issue again using Spark 1.0.2, Mesos 0.18.1, and the 
spark-cassandra-connector master branch.

Maybe this is not fixed on some failure/exception paths.

I run Spark in coarse-grained mode. Some exceptions are thrown at the 
executors, but the Spark driver keeps waiting and repeatedly prints:

TRACE [spark-akka.actor.default-dispatcher-17] 2014-08-11 10:57:32,998 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster.

The mesos master WARNING log:
W0811 10:32:58.172175 1646 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-2 on slave 20140808-113811-858302656-5050-1645-2 (ndb9)
W0811 10:32:58.181217 1649 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-5 on slave 20140808-113811-858302656-5050-1645-5 (ndb5)
W0811 10:32:58.277014 1650 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-3 on slave 20140808-113811-858302656-5050-1645-3 (ndb6)
W0811 10:32:58.344130 1648 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-0 on slave 20140808-113811-858302656-5050-1645-0 (ndb0)
W0811 10:32:58.354117 1651 master.cpp:2103] Ignoring unknown exited executor 20140804-095254-505981120-5050-20258-11 on slave 20140804-095254-505981120-5050-20258-11 (ndb2)
W0811 10:32:58.550233 1647 master.cpp:2103] Ignoring unknown exited executor 20140804-172212-505981120-5050-26571-2 on slave 20140804-172212-505981120-5050-26571-2 (ndb3)
W0811 10:32:58.793258 1653 master.cpp:2103] Ignoring unknown exited executor 20140804-095254-505981120-5050-20258-19 on slave 20140804-095254-505981120-5050-20258-19 (ndb1)
W0811 10:32:58.904842 1652 master.cpp:2103] Ignoring unknown exited executor 20140804-172212-505981120-5050-26571-0 on slave 20140804-172212-505981120-5050-26571-0 (ndb4)

Some other logs are at: 
https://github.com/datastax/spark-cassandra-connector/issues/134



was (Author: xuzhongxing):
I encountered this issue again using Spark 1.0.2, Mesos 0.18.1, and the 
spark-cassandra-connector master branch.

I run Spark in coarse-grained mode. Some exceptions are thrown at the 
executors, but the Spark driver keeps waiting and repeatedly prints:

TRACE [spark-akka.actor.default-dispatcher-17] 2014-08-11 10:57:32,998 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster.

The mesos master WARNING log:
W0811 10:32:58.172175 1646 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-2 on slave 20140808-113811-858302656-5050-1645-2 (ndb9)
W0811 10:32:58.181217 1649 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-5 on slave 20140808-113811-858302656-5050-1645-5 (ndb5)
W0811 10:32:58.277014 1650 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-3 on slave 20140808-113811-858302656-5050-1645-3 (ndb6)
W0811 10:32:58.344130 1648 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-0 on slave 20140808-113811-858302656-5050-1645-0 (ndb0)
W0811 10:32:58.354117 1651 master.cpp:2103] Ignoring unknown exited executor 20140804-095254-505981120-5050-20258-11 on slave 20140804-095254-505981120-5050-20258-11 (ndb2)
W0811 10:32:58.550233 1647 master.cpp:2103] Ignoring unknown exited executor 20140804-172212-505981120-5050-26571-2 on slave 20140804-172212-505981120-5050-26571-2 (ndb3)
W0811 10:32:58.793258 1653 master.cpp:2103] Ignoring unknown exited executor 20140804-095254-505981120-5050-20258-19 on slave 20140804-095254-505981120-5050-20258-19 (ndb1)
W0811 10:32:58.904842 1652 master.cpp:2103] Ignoring unknown exited executor 20140804-172212-505981120-5050-26571-0 on slave 20140804-172212-505981120-5050-26571-0 (ndb4)

Some other logs are at: 
https://github.com/datastax/spark-cassandra-connector/issues/134


> Scheduler for Mesos in fine-grained mode launches tasks on wrong executors
> --
>
> Key: SPARK-2204
> URL: https://issues.apache.org/jira/browse/SPARK-2204
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.0.0
>Reporter: Sebastien Rainville
>Assignee: Sebastien Rainville
>Priority: Blocker
> Fix For: 1.0.1, 1.1.0
>
>
> MesosSchedulerBackend.resourceOffers(SchedulerDriver, List[Offer]) is 
> assuming that TaskSchedulerImpl.resourceOffers(Seq[WorkerOffer]) is returning 
> task lists in the same order as the offers it was

[jira] [Created] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-13 Thread Xu Zhongxing (JIRA)
Xu Zhongxing created SPARK-3005:
---

 Summary: Spark with Mesos fine-grained mode throws 
UnsupportedOperationException in MesosSchedulerBackend.killTask()
 Key: SPARK-3005
 URL: https://issues.apache.org/jira/browse/SPARK-3005
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2
 Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
Reporter: Xu Zhongxing


I am using Spark, Mesos, and spark-cassandra-connector to do some work on a 
Cassandra cluster.

While the job was running, I killed the Cassandra daemon to simulate some failure 
cases. This results in task failures.

If I run the job in Mesos coarse-grained mode, the Spark driver program throws 
an exception and shuts down cleanly.

But when I run the job in Mesos fine-grained mode, the Spark driver program 
hangs.

The Spark log is: 

 INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
Logging.scala (line 58) Cancelling stage 1
 INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
Logging.scala (line 79) Could not cancel tasks for stage 1
java.lang.UnsupportedOperationException
at 
org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
at 
org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)






[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-13 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095285#comment-14095285
 ] 

Xu Zhongxing commented on SPARK-3005:
-

A related question: why do fine-grained mode and coarse-grained mode behave 
differently? Neither of them implements the killTask() method.

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
>
> I am using Spark, Mesos, and spark-cassandra-connector to do some work on a 
> Cassandra cluster.
> While the job was running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the Spark driver program 
> throws an exception and shuts down cleanly.
> But when I run the job in Mesos fine-grained mode, the Spark driver program 
> hangs.
> The Spark log is: 
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoi

[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-13 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095288#comment-14095288
 ] 

Xu Zhongxing commented on SPARK-3005:
-

Some additional logs during the spark driver hang:

TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,908 
Logging.scala (line 66) Checking for newly runnable parent stages
 INFO [Result resolver thread-1] 2014-08-13 15:58:15,908 Logging.scala (line 
58) Removed TaskSet 1.0, whose tasks have all completed, from pool 
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 66) running: Set(Stage 1, Stage 2)
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 66) waiting: Set(Stage 0)
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 66) failed: Set()
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 62) submitStage(Stage 0)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 
Logging.scala (line 62) missing: List(Stage 1, Stage 2)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 
Logging.scala (line 62) submitStage(Stage 1)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 
Logging.scala (line 62) submitStage(Stage 2)
TRACE [spark-akka.actor.default-dispatcher-3] 2014-08-13 15:58:56,643 
Logging.scala (line 66) Checking for hosts with no recent heart beats in 
BlockManagerMaster.
TRACE [spark-akka.actor.default-dispatcher-6] 2014-08-13 15:59:56,653 
Logging.scala (line 66) Checking for hosts with no recent heart beats in 
BlockManagerMaster.
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 16:00:56,652 
Logging.scala (line 66) Checking for hosts with no recent heart beats in 
BlockManagerMaster.

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
>
> I am using Spark, Mesos, and spark-cassandra-connector to do some work on a 
> Cassandra cluster.
> While the job was running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the Spark driver program 
> throws an exception and shuts down cleanly.
> But when I run the job in Mesos fine-grained mode, the Spark driver program 
> hangs.
> The Spark log is: 
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGSche

[jira] [Issue Comment Deleted] (SPARK-2204) Scheduler for Mesos in fine-grained mode launches tasks on wrong executors

2014-08-13 Thread Xu Zhongxing (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Zhongxing updated SPARK-2204:


Comment: was deleted

(was: I encountered this issue again using Spark 1.0.2, Mesos 0.18.1, and the 
spark-cassandra-connector master branch.

Maybe this is not fixed on some failure/exception paths.

I run Spark in coarse-grained mode. Some exceptions are thrown at the 
executors, but the Spark driver keeps waiting and repeatedly prints:

TRACE [spark-akka.actor.default-dispatcher-17] 2014-08-11 10:57:32,998 Logging.scala (line 66) Checking for hosts with no recent heart beats in BlockManagerMaster.

The mesos master WARNING log:
W0811 10:32:58.172175 1646 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-2 on slave 20140808-113811-858302656-5050-1645-2 (ndb9)
W0811 10:32:58.181217 1649 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-5 on slave 20140808-113811-858302656-5050-1645-5 (ndb5)
W0811 10:32:58.277014 1650 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-3 on slave 20140808-113811-858302656-5050-1645-3 (ndb6)
W0811 10:32:58.344130 1648 master.cpp:2103] Ignoring unknown exited executor 20140808-113811-858302656-5050-1645-0 on slave 20140808-113811-858302656-5050-1645-0 (ndb0)
W0811 10:32:58.354117 1651 master.cpp:2103] Ignoring unknown exited executor 20140804-095254-505981120-5050-20258-11 on slave 20140804-095254-505981120-5050-20258-11 (ndb2)
W0811 10:32:58.550233 1647 master.cpp:2103] Ignoring unknown exited executor 20140804-172212-505981120-5050-26571-2 on slave 20140804-172212-505981120-5050-26571-2 (ndb3)
W0811 10:32:58.793258 1653 master.cpp:2103] Ignoring unknown exited executor 20140804-095254-505981120-5050-20258-19 on slave 20140804-095254-505981120-5050-20258-19 (ndb1)
W0811 10:32:58.904842 1652 master.cpp:2103] Ignoring unknown exited executor 20140804-172212-505981120-5050-26571-0 on slave 20140804-172212-505981120-5050-26571-0 (ndb4)

Some other logs are at: 
https://github.com/datastax/spark-cassandra-connector/issues/134
)

> Scheduler for Mesos in fine-grained mode launches tasks on wrong executors
> --
>
> Key: SPARK-2204
> URL: https://issues.apache.org/jira/browse/SPARK-2204
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.0.0
>Reporter: Sebastien Rainville
>Assignee: Sebastien Rainville
>Priority: Blocker
> Fix For: 1.0.1, 1.1.0
>
>
> MesosSchedulerBackend.resourceOffers(SchedulerDriver, List[Offer]) is 
> assuming that TaskSchedulerImpl.resourceOffers(Seq[WorkerOffer]) is returning 
> task lists in the same order as the offers it was passed, but in the current 
> implementation TaskSchedulerImpl.resourceOffers shuffles the offers to avoid 
> assigning the tasks always to the same executors. The result is that the 
> tasks are launched on the wrong executors.






[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-13 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096539#comment-14096539
 ] 

Xu Zhongxing commented on SPARK-3005:
-

Could adding an empty killTask method to MesosSchedulerBackend fix this problem?

override def killTask(taskId: Long, executorId: String, interruptThread: Boolean) {}




> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, and spark-cassandra-connector to do some work on a 
> Cassandra cluster.
> While the job was running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the Spark driver program 
> throws an exception and shuts down cleanly.
> But when I run the job in Mesos fine-grained mode, the Spark driver program 
> hangs.
> The Spark log is: 
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>  

[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-13 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096539#comment-14096539
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/14/14 5:57 AM:
--

Could adding an empty killTask method to MesosSchedulerBackend fix this problem?

override def killTask(taskId: Long, executorId: String, interruptThread: Boolean) {}

This works for my tests.


was (Author: xuzhongxing):
Could adding an empty killTask method to MesosSchedulerBackend fix this problem?

override def killTask(taskId: Long, executorId: String, interruptThread: Boolean) {}




> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, and spark-cassandra-connector to do some work on a 
> Cassandra cluster.
> While the job was running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the Spark driver program 
> throws an exception and shuts down cleanly.
> But when I run the job in Mesos fine-grained mode, the Spark driver program 
> hangs.
> The Spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox

[jira] [Issue Comment Deleted] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-14 Thread Xu Zhongxing (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Zhongxing updated SPARK-3005:


Comment: was deleted

(was: A related question: why do fine-grained mode and coarse-grained mode 
behave differently? Neither of them implements the killTask() method.)

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, and spark-cassandra-connector to do some work on a 
> Cassandra cluster.
> While the job was running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the Spark driver program 
> throws an exception and shuts down cleanly.
> But when I run the job in Mesos fine-grained mode, the Spark driver program 
> hangs.
> The Spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPo

[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-14 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095288#comment-14095288
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/14/14 7:36 AM:
--

Some additional driver logs during the spark driver hang:
{code}
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,908 
Logging.scala (line 66) Checking for newly runnable parent stages
 INFO [Result resolver thread-1] 2014-08-13 15:58:15,908 Logging.scala (line 
58) Removed TaskSet 1.0, whose tasks have all completed, from pool 
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 66) running: Set(Stage 1, Stage 2)
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 66) waiting: Set(Stage 0)
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 66) failed: Set()
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 62) submitStage(Stage 0)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 
Logging.scala (line 62) missing: List(Stage 1, Stage 2)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 
Logging.scala (line 62) submitStage(Stage 1)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 
Logging.scala (line 62) submitStage(Stage 2)
TRACE [spark-akka.actor.default-dispatcher-3] 2014-08-13 15:58:56,643 
Logging.scala (line 66) Checking for hosts with no recent heart beats in 
BlockManagerMaster.
TRACE [spark-akka.actor.default-dispatcher-6] 2014-08-13 15:59:56,653 
Logging.scala (line 66) Checking for hosts with no recent heart beats in 
BlockManagerMaster.
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 16:00:56,652 
Logging.scala (line 66) Checking for hosts with no recent heart beats in 
BlockManagerMaster.
{code}


was (Author: xuzhongxing):
Some additional logs during the spark driver hang:

TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,908 
Logging.scala (line 66) Checking for newly runnable parent stages
 INFO [Result resolver thread-1] 2014-08-13 15:58:15,908 Logging.scala (line 
58) Removed TaskSet 1.0, whose tasks have all completed, from pool 
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 66) running: Set(Stage 1, Stage 2)
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 66) waiting: Set(Stage 0)
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 66) failed: Set()
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 62) submitStage(Stage 0)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 
Logging.scala (line 62) missing: List(Stage 1, Stage 2)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 
Logging.scala (line 62) submitStage(Stage 1)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 
Logging.scala (line 62) submitStage(Stage 2)
TRACE [spark-akka.actor.default-dispatcher-3] 2014-08-13 15:58:56,643 
Logging.scala (line 66) Checking for hosts with no recent heart beats in 
BlockManagerMaster.
TRACE [spark-akka.actor.default-dispatcher-6] 2014-08-13 15:59:56,653 
Logging.scala (line 66) Checking for hosts with no recent heart beats in 
BlockManagerMaster.
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 16:00:56,652 
Logging.scala (line 66) Checking for hosts with no recent heart beats in 
BlockManagerMaster.

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, and spark-cassandra-connector to do some work on a 
> Cassandra cluster.
> While the job was running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the Spark driver program 
> throws an exception and shuts down cleanly.
> But when I run the job in Mesos fine-grained mode, the Spark driver program 
> hangs.
> The Spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.Unsuppor

[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-14 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096539#comment-14096539
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/14/14 7:37 AM:
--

Could adding an empty killTask method to MesosSchedulerBackend fix this problem?
{code:title=MesosSchedulerBackend.scala}
override def killTask(taskId: Long, executorId: String, interruptThread: Boolean) {}
{code}
This works for my tests.


was (Author: xuzhongxing):
Could adding an empty killTask method to MesosSchedulerBackend fix this problem?
{code}
override def killTask(taskId: Long, executorId: String, interruptThread: Boolean) {}
{code}
This works for my tests.

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, and spark-cassandra-connector to do some work on a 
> Cassandra cluster.
> While the job was running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the Spark driver program 
> throws an exception and shuts down cleanly.
> But when I run the job in Mesos fine-grained mode, the Spark driver program 
> hangs.
> The Spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at 

[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-14 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096539#comment-14096539
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/14/14 7:37 AM:
--

Could adding an empty killTask method to MesosSchedulerBackend fix this problem?
{code}
override def killTask(taskId: Long, executorId: String, interruptThread: Boolean) {}
{code}
This works for my tests.


was (Author: xuzhongxing):
Could adding an empty killTask method to MesosSchedulerBackend fix this problem?

override def killTask(taskId: Long, executorId: String, interruptThread: Boolean) {}

This works for my tests.

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, and spark-cassandra-connector to do some work on a 
> Cassandra cluster.
> While the job was running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the Spark driver program 
> throws an exception and shuts down cleanly.
> But when I run the job in Mesos fine-grained mode, the Spark driver program 
> hangs.
> The Spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:45

[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-24 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108674#comment-14108674
 ] 

Xu Zhongxing commented on SPARK-3005:
-

I was using Spark to process data from Cassandra. When Cassandra is under 
heavy load, the executors on the slaves throw timeout exceptions and the tasks 
fail. Then the driver needs to cancel the job.

In coarse-grained mode, it works fine: the coarse-grained backend has the 
killTask() method.

But in fine-grained mode, the backend doesn't override the killTask() method, so 
it throws an exception.
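
For reference, here is a minimal Scala sketch (simplified stand-ins, not the actual Spark classes; FineGrainedBackend and PatchedBackend are placeholder names) of this difference: a backend that inherits the trait's default killTask fails exactly as in the stack trace above, while a no-op override, as proposed earlier in this thread, lets the cancel path complete:

{code}
// Sketch of the SPARK-3005 failure mode, under simplified assumptions.
object KillTaskSketch {
  trait SchedulerBackend {
    // Mirrors the default behaviour shown by the stack trace above.
    def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit =
      throw new UnsupportedOperationException
  }

  // Stand-in for a backend that does not override killTask (the failing case).
  class FineGrainedBackend extends SchedulerBackend

  // Stand-in for a backend with the no-op override proposed in this thread.
  class PatchedBackend extends SchedulerBackend {
    override def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit = {
      // Intentionally empty: cancellation is ignored instead of failing the cancel path.
    }
  }

  def main(args: Array[String]): Unit = {
    new PatchedBackend().killTask(1L, "exec-1", interruptThread = false) // succeeds as a no-op
    try new FineGrainedBackend().killTask(1L, "exec-1", interruptThread = false)
    catch { case _: UnsupportedOperationException => println("default killTask throws") }
  }
}
{code}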

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, and spark-cassandra-connector to do some work on a 
> Cassandra cluster.
> While the job was running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the Spark driver program 
> throws an exception and shuts down cleanly.
> But when I run the job in Mesos fine-grained mode, the Spark driver program 
> hangs.
> The Spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   

[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-24 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108674#comment-14108674
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/25/14 2:04 AM:
--

I was using Spark to process data from Cassandra. When Cassandra is under heavy 
load, the executors on the slaves throw timeout exceptions and the tasks fail. 
The driver then needs to cancel the job.

In coarse-grained mode this works fine: the coarse-grained backend implements 
the killTask() method.

But in fine-grained mode the backend does not override killTask(), so it throws 
an UnsupportedOperationException.

I could tune Cassandra to avoid the read timeouts, but I just want Spark to exit 
when the job fails. Currently it throws that UnsupportedOperationException and 
hangs, which is unacceptable behaviour.


was (Author: xuzhongxing):
I was using spark to process data from Cassandra. And when Cassandra is under 
heavy load, the executors on the slaves throws timeout exception, and the tasks 
fail. Then the driver need to cancel the job.

In coarse-grained mode, it works fine. The coarse-grained backend has the 
killTask() method.

But in fine-grained mode, the backend didn't override the killTask() method. So 
it throws exception.

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, spark-cassandra-connector to do some work on a 
> cassandra cluster.
> During the job running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the spark driver program 
> throws an exception and shutdown cleanly.
> But when I run the job in Mesos fine-grained mode, the spark driver program 
> hangs.
> The spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> 

[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-24 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108677#comment-14108677
 ] 

Xu Zhongxing commented on SPARK-3005:
-

[SPARK-1749] didn't fix this problem completely. It just catches the 
UnsupportedOperationException and logs it, then sets ableToCancelStages = 
false. That is exactly what causes the hang.

The fact is that in the Mesos fine-grained case it is unnecessary to call 
killTask(). So throwing UnsupportedOperationException and setting 
ableToCancelStages = false is the wrong behaviour.

We should suppress the exception and let the code take the normal path. The 
scheduler actually is able to cancel later stages; it does not need to kill 
tasks, because the tasks themselves have already died and exited. We just need 
to tell the driver the truth.
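
For readers following along in the scheduler, the hang mechanism described above 
can be sketched as follows. This is a paraphrase of the Spark 1.0.x flow, not the 
actual DAGScheduler source (names are simplified): cancelTasks() ends up in the 
backend's killTask(), the UnsupportedOperationException is caught and 
ableToCancelStages is set to false, and the job-failed cleanup is guarded by that 
flag, so the driver never finishes failing the job.
{code}
// Paraphrased, self-contained sketch of the Spark 1.0.x cancellation path
// described above (simplified names; not the actual DAGScheduler source).
object AbortStageSketch {

  // Stand-in for TaskSchedulerImpl.cancelTasks -> backend.killTask().
  def cancelTasks(stageId: Int, killTaskSupported: Boolean): Unit =
    if (!killTaskSupported) throw new UnsupportedOperationException

  def failJobAndIndependentStages(stages: Seq[Int], killTaskSupported: Boolean): Unit = {
    var ableToCancelStages = true
    for (stageId <- stages) {
      try {
        cancelTasks(stageId, killTaskSupported)
      } catch {
        // What SPARK-1749 added: log the failure and remember it.
        case _: UnsupportedOperationException =>
          println(s"Could not cancel tasks for stage $stageId")
          ableToCancelStages = false
      }
    }
    if (ableToCancelStages) {
      // In Spark this is where the job listener is told the job failed, the
      // job/stage state is cleaned up, and SparkListenerJobEnd is posted.
      // Skipping this block is what leaves the driver waiting forever.
      println("jobFailed + cleanup + SparkListenerJobEnd(JobFailed)")
    }
  }

  def main(args: Array[String]): Unit = {
    failJobAndIndependentStages(Seq(1), killTaskSupported = false) // hang path
    failJobAndIndependentStages(Seq(1), killTaskSupported = true)  // clean path
  }
}
{code}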

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, spark-cassandra-connector to do some work on a 
> cassandra cluster.
> During the job running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the spark driver program 
> throws an exception and shutdown cleanly.
> But when I run the job in Mesos fine-grained mode, the spark driver program 
> hangs.
> The spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGSc

[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-24 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108677#comment-14108677
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/25/14 2:19 AM:
--

[SPARK-1749] didn't fix this problem completely. It just catches the 
UnsupportedOperationException and logs it, then sets ableToCancelStages = 
false. That is exactly what causes the hang, because the code only does the 
cleanup when ableToCancelStages = true.

+ if (ableToCancelStages) {
+ job.listener.jobFailed(error)
+ cleanupStateForJobAndIndependentStages(job, resultStage)
+ listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))

The fact is that in the Mesos fine-grained case it is unnecessary to call 
killTask(). So throwing UnsupportedOperationException and setting 
ableToCancelStages = false is the wrong behaviour for this case. We just need 
to do nothing in killTask() and let the driver do the rest of the cleanup.

The scheduler actually is able to cancel later stages; it does not need to kill 
tasks, because the tasks themselves have already died and exited.


was (Author: xuzhongxing):
[SPARK-1749] didn't fix this problem completely. It just catch the 
UnsupportedOperationException and log it. Then it sets ableToCancelStages = 
false. This is the exactly the reason that causes the hang.

The fact is that in the mesos fine-grained case, it is unnecessary to 
killTask(). So throwing UnsupportedOperationException and set 
ableToCancelStages = false is wrong behaviour. 

We should suppress the exception and let the code go the normal path. It 
actually is able to cancel later stages. It does not need to kill tasks. The 
tasks themselves already died and exited. We just need to tell the driver the 
truth.

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, spark-cassandra-connector to do some work on a 
> cassandra cluster.
> During the job running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the spark driver program 
> throws an exception and shutdown cleanly.
> But when I run the job in Mesos fine-grained mode, the spark driver program 
> hangs.
> The spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>   at 
> org

[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-24 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108677#comment-14108677
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/25/14 2:30 AM:
--

[SPARK-1749] didn't fix this problem. It just catches the 
UnsupportedOperationException and logs it, then sets ableToCancelStages = 
false. That is exactly what causes the hang, because the code only does the 
cleanup when ableToCancelStages = true.

+ if (ableToCancelStages) {
+ job.listener.jobFailed(error)
+ cleanupStateForJobAndIndependentStages(job, resultStage)
+ listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))

The fact is that in the Mesos fine-grained case it is unnecessary to call 
killTask(). So throwing UnsupportedOperationException and setting 
ableToCancelStages = false is the wrong behaviour for this case. We just need 
to do nothing in killTask() and let the driver do the rest of the cleanup.

The problem here is in the MesosSchedulerBackend: it does not need to kill 
tasks, and should not throw UnsupportedOperationException. The tasks themselves 
have already died and exited.



was (Author: xuzhongxing):
[SPARK-1749] didn't fix this problem completely. It just catch the 
UnsupportedOperationException and log it. Then it sets ableToCancelStages = 
false. This is exactly the reason that causes the hang. Because the code only 
do cleanup when ableToCancelStages = true.

+ if (ableToCancelStages) {
+ job.listener.jobFailed(error)
+ cleanupStateForJobAndIndependentStages(job, resultStage)
+ listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))

The fact is that in the mesos fine-grained case, it is unnecessary to 
killTask(). So throwing UnsupportedOperationException and set 
ableToCancelStages = false is wrong behaviour for this case. We just need to do 
nothing in killTask() and let the driver do the rest of the cleanup.

It actually is able to cancel later stages. It does not need to kill tasks. The 
tasks themselves already died and exited.

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, spark-cassandra-connector to do some work on a 
> cassandra cluster.
> During the job running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the spark driver program 
> throws an exception and shutdown cleanly.
> But when I run the job in Mesos fine-grained mode, the spark driver program 
> hangs.
> The spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$schedul

[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-24 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108677#comment-14108677
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/25/14 2:32 AM:
--

[SPARK-1749] didn't fix this problem. It just catches the 
UnsupportedOperationException and logs it, then sets ableToCancelStages = 
false. That is exactly what causes the hang, because the code only does the 
cleanup when ableToCancelStages = true.
{code}
+ if (ableToCancelStages) {
+   job.listener.jobFailed(error)
+   cleanupStateForJobAndIndependentStages(job, resultStage)
+   listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))
{code}
The fact is that in the Mesos fine-grained case it is unnecessary to call 
killTask(). So throwing UnsupportedOperationException and setting 
ableToCancelStages = false is the wrong behaviour for this case. We just need 
to do nothing in killTask() and let the driver do the rest of the cleanup.

The problem here is in the MesosSchedulerBackend: it does not need to kill 
tasks, and should not throw UnsupportedOperationException. The tasks themselves 
have already died and exited.



was (Author: xuzhongxing):
[SPARK-1749] didn't fix this problem. It just catches the 
UnsupportedOperationException and logs it. Then it sets ableToCancelStages = 
false. This is exactly the reason that causes the hang. Because the code only 
do cleanup when ableToCancelStages = true.

+ if (ableToCancelStages) {
+ job.listener.jobFailed(error)
+ cleanupStateForJobAndIndependentStages(job, resultStage)
+ listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))

The fact is that in the mesos fine-grained case, it is unnecessary to 
killTask(). So throwing UnsupportedOperationException and set 
ableToCancelStages = false is wrong behaviour for this case. We just need to do 
nothing in killTask() and let the driver do the rest of the cleanup.

The problem here is in the MesosSchedulerBackend. The MesosSchedulerBackend 
does not need to kill tasks, and should not throw 
UnsupportedOperationException. The tasks themselves already died and exited.


> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, spark-cassandra-connector to do some work on a 
> cassandra cluster.
> During the job running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the spark driver program 
> throws an exception and shutdown cleanly.
> But when I run the job in Mesos fine-grained mode, the spark driver program 
> hangs.
> The spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.

[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-24 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108677#comment-14108677
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/25/14 2:33 AM:
--

[SPARK-1749] didn't fix this problem. It just catches the 
UnsupportedOperationException and logs it, then sets ableToCancelStages = 
false. That is exactly what causes the hang, because the code only does the 
cleanup when ableToCancelStages = true.
{code}
+ if (ableToCancelStages) {
+   job.listener.jobFailed(error)
+   cleanupStateForJobAndIndependentStages(job, resultStage)
+   listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))
{code}
The fact is that in the Mesos fine-grained case it is unnecessary to call 
killTask(). So throwing UnsupportedOperationException and setting 
ableToCancelStages = false is the wrong behaviour for this case. We just need 
to do nothing in killTask() and let the driver do the rest of the cleanup.

The problem here is in the MesosSchedulerBackend: it does not need to kill 
tasks, and should not throw UnsupportedOperationException. The tasks themselves 
have already died and exited.



was (Author: xuzhongxing):
[SPARK-1749] didn't fix this problem. It just catches the 
UnsupportedOperationException and logs it. Then it sets ableToCancelStages = 
false. This is exactly the reason that causes the hang. Because the code only 
does cleanup when ableToCancelStages = true.
{{code}}
+ if (ableToCancelStages) {
+   job.listener.jobFailed(error)
+   cleanupStateForJobAndIndependentStages(job, resultStage)
+   listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))
{{code}}
The fact is that in the mesos fine-grained case, it is unnecessary to 
killTask(). So throwing UnsupportedOperationException and set 
ableToCancelStages = false is wrong behaviour for this case. We just need to do 
nothing in killTask() and let the driver do the rest of the cleanup.

The problem here is in the MesosSchedulerBackend. The MesosSchedulerBackend 
does not need to kill tasks, and should not throw 
UnsupportedOperationException. The tasks themselves already died and exited.


> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, spark-cassandra-connector to do some work on a 
> cassandra cluster.
> During the job running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the spark driver program 
> throws an exception and shutdown cleanly.
> But when I run the job in Mesos fine-grained mode, the spark driver program 
> hangs.
> The spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1

[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-09-02 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118069#comment-14118069
 ] 

Xu Zhongxing commented on SPARK-3005:
-

By "tasks themselves already died and exited", I mean that even if we do 
nothing in killTasks(), there won't be any zombie tasks left on the slaves. 
This is what I get from testing. If I'm wrong, please correct me. But the logic 
here is incomplete or inconsistent, and needs to be fixed.
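
If one did want the backend to actively kill the Mesos tasks rather than rely on 
them having already exited, the override could forward the request to the 
registered Mesos SchedulerDriver instead of being a no-op. A rough sketch, under 
the assumptions that the backend keeps the driver in a field and that the Spark 
task ID's string form identifies the Mesos task (the class and field names here 
are illustrative, not the eventual patch):
{code}
import org.apache.mesos.Protos.TaskID
import org.apache.mesos.SchedulerDriver

// Sketch only: forward the kill to Mesos instead of doing nothing.
// Assumes `driver` is the SchedulerDriver obtained at registration time and
// that the Spark task ID's string form is used as the Mesos TaskID.
class ForwardingKill(driver: SchedulerDriver) {
  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit = {
    driver.killTask(TaskID.newBuilder().setValue(taskId.toString).build())
  }
}
{code}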

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, spark-cassandra-connector to do some work on a 
> cassandra cluster.
> During the job running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the spark driver program 
> throws an exception and shutdown cleanly.
> But when I run the job in Mesos fine-grained mode, the spark driver program 
> hangs.
> The spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaFork

[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-09-02 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118069#comment-14118069
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 9/2/14 9:59 AM:
-

By "tasks themselves already died and exited", I mean that even if we do 
nothing in killTasks(), there won't be any zombie tasks left on the slaves. 
This is what I get from testing the Mesos fine-grained mode. If I'm wrong, 
please correct me. But the logic here is incomplete or inconsistent, and needs 
to be fixed.


was (Author: xuzhongxing):
By "tasks themselves already died and exited", I mean that even if we do 
nothing in killTasks(), there won't be any zombie tasks left on the slaves. 
This is what I get from testing. If I'm wrong, please correct me. But the logic 
here is incomplete or inconsistent, and needs to be fixed.

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, spark-cassandra-connector to do some work on a 
> cassandra cluster.
> During the job running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the spark driver program 
> throws an exception and shutdown cleanly.
> But when I run the job in Mesos fine-grained mode, the spark driver program 
> hangs.
> The spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGSch

[jira] [Closed] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-10-07 Thread Xu Zhongxing (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Zhongxing closed SPARK-3005.
---
Resolution: Fixed

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, spark-cassandra-connector to do some work on a 
> cassandra cluster.
> During the job running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the spark driver program 
> throws an exception and shutdown cleanly.
> But when I run the job in Mesos fine-grained mode, the spark driver program 
> hangs.
> The spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.con

[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-10-07 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162913#comment-14162913
 ] 

Xu Zhongxing commented on SPARK-3005:
-

Resolved in https://github.com/apache/spark/pull/2453
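
A rough way to exercise this path end to end, assuming a reachable Mesos master 
(the master URL below is a placeholder and the timings are arbitrary): run a slow 
job in fine-grained mode and cancel it from another thread. Before the fix the 
driver logged "Could not cancel tasks for stage ..." and kept waiting; with 
killTask() implemented the cancelled job fails promptly.
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Reproduction/verification sketch (placeholder master URL, arbitrary timings).
object CancelOnMesosSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("SPARK-3005 check")
      .setMaster("mesos://host:5050")      // placeholder Mesos master
      .set("spark.mesos.coarse", "false")  // fine-grained mode
    val sc = new SparkContext(conf)

    // Tag the work so it can be cancelled from another thread; cancellation
    // drives DAGScheduler -> TaskSchedulerImpl.cancelTasks -> backend.killTask().
    sc.setJobGroup("slow-job", "SPARK-3005 reproduction")
    new Thread(new Runnable {
      def run(): Unit = { Thread.sleep(5000); sc.cancelJobGroup("slow-job") }
    }).start()

    try {
      // A deliberately slow job so the cancellation arrives mid-flight.
      sc.parallelize(1 to 1000, 100).foreach(_ => Thread.sleep(1000))
    } finally {
      sc.stop()
    }
  }
}
{code}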

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, spark-cassandra-connector to do some work on a 
> cassandra cluster.
> During the job running, I killed the Cassandra daemon to simulate some 
> failure cases. This results in task failures.
> If I run the job in Mesos coarse-grained mode, the spark driver program 
> throws an exception and shutdown cleanly.
> But when I run the job in Mesos fine-grained mode, the spark driver program 
> hangs.
> The spark log is: 
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scal