[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-09-02 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118069#comment-14118069
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 9/2/14 9:59 AM:
-

By "tasks themselves already died and exited", I mean that even if we do 
nothing in killTasks(), there won't be any zombie tasks left on the slaves. 
This is what I get from testing the Mesos fine-grained mode. If I'm wrong, 
please correct me. But the logic here is incomplete or inconsistent, and needs 
to be fixed.


was (Author: xuzhongxing):
By "tasks themselves already died and exited", I mean that even if we do 
nothing in killTasks(), there won't be any zombie tasks left on the slaves. 
This is what I get from testing. If I'm wrong, please correct me. But the logic 
here is incomplete or inconsistent, and needs to be fixed.

> Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
> MesosSchedulerBackend.killTask()
> ---
>
> Key: SPARK-3005
> URL: https://issues.apache.org/jira/browse/SPARK-3005
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
> Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
>Reporter: Xu Zhongxing
> Attachments: SPARK-3005_1.diff
>
>
> I am using Spark, Mesos, and spark-cassandra-connector to do some work on a 
> Cassandra cluster.
> While the job was running, I killed the Cassandra daemon to simulate some 
> failure cases. This resulted in task failures.
> If I run the job in Mesos coarse-grained mode, the Spark driver program 
> throws an exception and shuts down cleanly.
> But when I run the job in Mesos fine-grained mode, the Spark driver program 
> hangs.
> The Spark log is:
> {code}
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
> Logging.scala (line 58) Cancelling stage 1
>  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
> Logging.scala (line 79) Could not cancel tasks for stage 1
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
>   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> {code}

[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-24 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108677#comment-14108677
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/25/14 2:33 AM:
--

[SPARK-1749] didn't fix this problem. It just catches the 
UnsupportedOperationException, logs it, and then sets ableToCancelStages = 
false. That is exactly what causes the hang, because the code only performs 
cleanup when ableToCancelStages = true:
{code}
+ if (ableToCancelStages) {
+   job.listener.jobFailed(error)
+   cleanupStateForJobAndIndependentStages(job, resultStage)
+   listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))
{code}
The fact is that in the Mesos fine-grained case, calling killTask() is 
unnecessary. So throwing UnsupportedOperationException and setting 
ableToCancelStages = false is the wrong behaviour for this case. We just need 
to do nothing in killTask() and let the driver do the rest of the cleanup.

The problem here is in MesosSchedulerBackend: it does not need to kill tasks 
and should not throw UnsupportedOperationException, because the tasks 
themselves have already died and exited.
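To make the hang mechanism concrete, here is a rough, self-contained sketch of 
the control flow described above. It is not the actual Spark source; the names 
ToyBackend, ToyHangDemo and toyFailJob are made up purely for illustration, and 
it only mirrors the pattern introduced by SPARK-1749: the exception from 
killTask() is caught, ableToCancelStages is flipped to false, and the cleanup 
that would let the driver finish is skipped.
{code}
// Toy illustration only: mirrors the control flow described above, not Spark's real classes.
trait ToyBackend {
  def killTask(taskId: Long): Unit
}

object ToyHangDemo {
  def toyFailJob(stageTaskIds: Map[Int, Seq[Long]], backend: ToyBackend): Unit = {
    var ableToCancelStages = true
    for ((stage, taskIds) <- stageTaskIds) {
      try {
        taskIds.foreach(backend.killTask)       // a Mesos fine-grained backend throws here
      } catch {
        case _: UnsupportedOperationException =>
          println(s"Could not cancel tasks for stage $stage")
          ableToCancelStages = false            // SPARK-1749 only records the failure
      }
    }
    if (ableToCancelStages) {
      println("cleanup runs and the job-end event is posted; the driver can exit")
    } else {
      println("cleanup skipped; the job is never marked finished, so the driver appears to hang")
    }
  }

  def main(args: Array[String]): Unit = {
    val throwingBackend = new ToyBackend {
      def killTask(taskId: Long): Unit = throw new UnsupportedOperationException
    }
    toyFailJob(Map(1 -> Seq(10L, 11L)), throwingBackend)
  }
}
{code}
With an empty (or otherwise non-throwing) killTask(), ableToCancelStages stays 
true and the cleanup branch runs, which is why the workaround unblocks the 
driver.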



was (Author: xuzhongxing):
[SPARK-1749] didn't fix this problem. It just catches the 
UnsupportedOperationException, logs it, and then sets ableToCancelStages = 
false. That is exactly what causes the hang, because the code only performs 
cleanup when ableToCancelStages = true:
{code}
+ if (ableToCancelStages) {
+   job.listener.jobFailed(error)
+   cleanupStateForJobAndIndependentStages(job, resultStage)
+   listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))
{code}
The fact is that in the Mesos fine-grained case, calling killTask() is 
unnecessary. So throwing UnsupportedOperationException and setting 
ableToCancelStages = false is the wrong behaviour for this case. We just need 
to do nothing in killTask() and let the driver do the rest of the cleanup.

The problem here is in MesosSchedulerBackend: it does not need to kill tasks 
and should not throw UnsupportedOperationException, because the tasks 
themselves have already died and exited.



[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-24 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108677#comment-14108677
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/25/14 2:32 AM:
--

[SPARK-1749] didn't fix this problem. It just catches the 
UnsupportedOperationException, logs it, and then sets ableToCancelStages = 
false. That is exactly what causes the hang, because the code only performs 
cleanup when ableToCancelStages = true:
{code}
+ if (ableToCancelStages) {
+   job.listener.jobFailed(error)
+   cleanupStateForJobAndIndependentStages(job, resultStage)
+   listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))
{code}
The fact is that in the Mesos fine-grained case, calling killTask() is 
unnecessary. So throwing UnsupportedOperationException and setting 
ableToCancelStages = false is the wrong behaviour for this case. We just need 
to do nothing in killTask() and let the driver do the rest of the cleanup.

The problem here is in MesosSchedulerBackend: it does not need to kill tasks 
and should not throw UnsupportedOperationException, because the tasks 
themselves have already died and exited.



was (Author: xuzhongxing):
[SPARK-1749] didn't fix this problem. It just catches the 
UnsupportedOperationException, logs it, and then sets ableToCancelStages = 
false. That is exactly what causes the hang, because the code only does 
cleanup when ableToCancelStages = true.

+ if (ableToCancelStages) {
+ job.listener.jobFailed(error)
+ cleanupStateForJobAndIndependentStages(job, resultStage)
+ listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))

The fact is that in the Mesos fine-grained case, calling killTask() is 
unnecessary. So throwing UnsupportedOperationException and setting 
ableToCancelStages = false is the wrong behaviour for this case. We just need 
to do nothing in killTask() and let the driver do the rest of the cleanup.

The problem here is in MesosSchedulerBackend: it does not need to kill tasks 
and should not throw UnsupportedOperationException, because the tasks 
themselves have already died and exited.



[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-24 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108677#comment-14108677
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/25/14 2:30 AM:
--

[SPARK-1749] didn't fix this problem. It just catches the 
UnsupportedOperationException, logs it, and then sets ableToCancelStages = 
false. That is exactly what causes the hang, because the code only does 
cleanup when ableToCancelStages = true.

+ if (ableToCancelStages) {
+ job.listener.jobFailed(error)
+ cleanupStateForJobAndIndependentStages(job, resultStage)
+ listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))

The fact is that in the Mesos fine-grained case, calling killTask() is 
unnecessary. So throwing UnsupportedOperationException and setting 
ableToCancelStages = false is the wrong behaviour for this case. We just need 
to do nothing in killTask() and let the driver do the rest of the cleanup.

The problem here is in MesosSchedulerBackend: it does not need to kill tasks 
and should not throw UnsupportedOperationException, because the tasks 
themselves have already died and exited.



was (Author: xuzhongxing):
[SPARK-1749] didn't fix this problem completely. It just catches the 
UnsupportedOperationException and logs it, and then sets ableToCancelStages = 
false. That is exactly what causes the hang, because the code only does 
cleanup when ableToCancelStages = true.

+ if (ableToCancelStages) {
+ job.listener.jobFailed(error)
+ cleanupStateForJobAndIndependentStages(job, resultStage)
+ listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))

The fact is that in the Mesos fine-grained case, calling killTask() is 
unnecessary. So throwing UnsupportedOperationException and setting 
ableToCancelStages = false is the wrong behaviour for this case. We just need 
to do nothing in killTask() and let the driver do the rest of the cleanup.

It is actually able to cancel the later stages. It does not need to kill tasks; 
the tasks themselves have already died and exited.


[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-24 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108677#comment-14108677
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/25/14 2:19 AM:
--

[SPARK-1749] didn't fix this problem completely. It just catches the 
UnsupportedOperationException and logs it, and then sets ableToCancelStages = 
false. That is exactly what causes the hang, because the code only does 
cleanup when ableToCancelStages = true.

+ if (ableToCancelStages) {
+ job.listener.jobFailed(error)
+ cleanupStateForJobAndIndependentStages(job, resultStage)
+ listenerBus.post(SparkListenerJobEnd(job.jobId, JobFailed(error)))

The fact is that in the Mesos fine-grained case, calling killTask() is 
unnecessary. So throwing UnsupportedOperationException and setting 
ableToCancelStages = false is the wrong behaviour for this case. We just need 
to do nothing in killTask() and let the driver do the rest of the cleanup.

It is actually able to cancel the later stages. It does not need to kill tasks; 
the tasks themselves have already died and exited.


was (Author: xuzhongxing):
[SPARK-1749] didn't fix this problem completely. It just catches the 
UnsupportedOperationException and logs it, and then sets ableToCancelStages = 
false. That is exactly what causes the hang.

The fact is that in the Mesos fine-grained case, calling killTask() is 
unnecessary. So throwing UnsupportedOperationException and setting 
ableToCancelStages = false is the wrong behaviour.

We should suppress the exception and let the code take the normal path. It is 
actually able to cancel the later stages. It does not need to kill tasks; the 
tasks themselves have already died and exited. We just need to tell the driver 
the truth.


[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-24 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108674#comment-14108674
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/25/14 2:04 AM:
--

I was using Spark to process data from Cassandra. When Cassandra is under 
heavy load, the executors on the slaves throw timeout exceptions and the tasks 
fail. Then the driver needs to cancel the job.

In coarse-grained mode this works fine, because the coarse-grained backend 
implements the killTask() method.

But in fine-grained mode, the backend does not override killTask(), so it 
throws an exception.

I could tune Cassandra to avoid the read timeouts, but I just want Spark to 
exit when the job fails. Currently it just throws that operation-not-supported 
exception and hangs there, which is unacceptable behaviour.
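For context, the default killTask in the SchedulerBackend trait (the frame at 
SchedulerBackend.scala:32 in the stack trace) behaves roughly like the sketch 
below; this is a paraphrase from memory of the 1.0.x code, not a verbatim copy:
{code}
// Rough paraphrase of the SchedulerBackend trait's default in Spark 1.0.x:
// any backend that does not override killTask() inherits this and therefore
// cannot cancel running tasks when a stage is aborted.
def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit =
  throw new UnsupportedOperationException
{code}
The coarse-grained backend overrides this, which is why the same failure lets 
the driver shut down cleanly there.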


was (Author: xuzhongxing):
I was using Spark to process data from Cassandra. When Cassandra is under 
heavy load, the executors on the slaves throw timeout exceptions and the tasks 
fail. Then the driver needs to cancel the job.

In coarse-grained mode this works fine, because the coarse-grained backend 
implements the killTask() method.

But in fine-grained mode, the backend does not override killTask(), so it 
throws an exception.


[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-14 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096539#comment-14096539
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/14/14 7:37 AM:
--

Could adding an empty killTask method to  MesosSchedulerBackend fix this 
problem?
{code}
override def killTask(taskId: Long, executorId: String, interruptThread: 
Boolean) {}
{code}
This works for my tests.
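If simply swallowing the kill turns out to be too lax, another option (untested 
and only a sketch; it assumes the backend keeps a reference to the registered 
Mesos SchedulerDriver in a field called driver) would be to forward the request 
to Mesos instead of ignoring it:
{code}
import org.apache.mesos.Protos.TaskID

// Hypothetical alternative to the empty override: ask Mesos to kill the task.
// `driver` is assumed to be this backend's org.apache.mesos.SchedulerDriver.
override def killTask(taskId: Long, executorId: String, interruptThread: Boolean) {
  driver.killTask(TaskID.newBuilder().setValue(taskId.toString).build())
}
{code}
Either way, the point is that killTask() must not throw, so that 
ableToCancelStages stays true and the driver can finish failing the job.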


was (Author: xuzhongxing):
Could adding an empty killTask method to  MesosSchedulerBackend fix this 
problem?

override def killTask(taskId: Long, executorId: String, interruptThread: 
Boolean) {}

This works for my tests.


[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-14 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096539#comment-14096539
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/14/14 7:37 AM:
--

Could adding an empty killTask method to  MesosSchedulerBackend fix this 
problem?
{code:title=MesosSchedulerBackend.scala}
override def killTask(taskId: Long, executorId: String, interruptThread: 
Boolean) {}
{code}
This works for my tests.


was (Author: xuzhongxing):
Could adding an empty killTask method to  MesosSchedulerBackend fix this 
problem?
{code}
override def killTask(taskId: Long, executorId: String, interruptThread: 
Boolean) {}
{code}
This works for my tests.


[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-14 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095288#comment-14095288
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/14/14 7:36 AM:
--

Some additional driver logs during the Spark driver hang:
{code}
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,908 
Logging.scala (line 66) Checking for newly runnable parent stages
 INFO [Result resolver thread-1] 2014-08-13 15:58:15,908 Logging.scala (line 
58) Removed TaskSet 1.0, whose tasks have all completed, from pool 
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 66) running: Set(Stage 1, Stage 2)
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 66) waiting: Set(Stage 0)
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 66) failed: Set()
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 62) submitStage(Stage 0)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 
Logging.scala (line 62) missing: List(Stage 1, Stage 2)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 
Logging.scala (line 62) submitStage(Stage 1)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 
Logging.scala (line 62) submitStage(Stage 2)
TRACE [spark-akka.actor.default-dispatcher-3] 2014-08-13 15:58:56,643 
Logging.scala (line 66) Checking for hosts with no recent heart beats in 
BlockManagerMaster.
TRACE [spark-akka.actor.default-dispatcher-6] 2014-08-13 15:59:56,653 
Logging.scala (line 66) Checking for hosts with no recent heart beats in 
BlockManagerMaster.
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 16:00:56,652 
Logging.scala (line 66) Checking for hosts with no recent heart beats in 
BlockManagerMaster.
{code}


was (Author: xuzhongxing):
Some additional logs during the spark driver hang:

TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,908 
Logging.scala (line 66) Checking for newly runnable parent stages
 INFO [Result resolver thread-1] 2014-08-13 15:58:15,908 Logging.scala (line 
58) Removed TaskSet 1.0, whose tasks have all completed, from pool 
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 66) running: Set(Stage 1, Stage 2)
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 66) waiting: Set(Stage 0)
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 66) failed: Set()
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,909 
Logging.scala (line 62) submitStage(Stage 0)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 
Logging.scala (line 62) missing: List(Stage 1, Stage 2)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 
Logging.scala (line 62) submitStage(Stage 1)
DEBUG [spark-akka.actor.default-dispatcher-2] 2014-08-13 15:58:15,910 
Logging.scala (line 62) submitStage(Stage 2)
TRACE [spark-akka.actor.default-dispatcher-3] 2014-08-13 15:58:56,643 
Logging.scala (line 66) Checking for hosts with no recent heart beats in 
BlockManagerMaster.
TRACE [spark-akka.actor.default-dispatcher-6] 2014-08-13 15:59:56,653 
Logging.scala (line 66) Checking for hosts with no recent heart beats in 
BlockManagerMaster.
TRACE [spark-akka.actor.default-dispatcher-2] 2014-08-13 16:00:56,652 
Logging.scala (line 66) Checking for hosts with no recent heart beats in 
BlockManagerMaster.


[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-13 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096539#comment-14096539
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/14/14 5:57 AM:
--

Could adding an empty killTask method to  MesosSchedulerBackend fix this 
problem?

override def killTask(taskId: Long, executorId: String, interruptThread: 
Boolean) {}

This works for my tests.


was (Author: xuzhongxing):
Could adding an empty killTask method to  MesosSchedulerBackend fix this 
problem?

override def killTask(taskId: Long, executorId: String, interruptThread: 
Boolean) {}



