[ https://issues.apache.org/jira/browse/SPARK-14658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kay Ousterhout updated SPARK-14658:
-----------------------------------
    Fix Version/s: 2.2.0

> when executor lost DagScheduer may submit one stage twice even if the first running taskset for this stage is not finished
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14658
>                 URL: https://issues.apache.org/jira/browse/SPARK-14658
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.6.1, 2.0.0, 2.1.0, 2.2.0
>         Environment: spark1.6.1 hadoop-2.6.0-cdh5.4.2
>            Reporter: yixiaohua
>             Fix For: 2.2.0
>
>
> {code}
> 16/04/14 15:35:22 ERROR DAGSchedulerEventProcessLoop: DAGSchedulerEventProcessLoop failed; shutting down SparkContext
> java.lang.IllegalStateException: more than one active taskSet for stage 57: 57.2,57.1
>         at org.apache.spark.scheduler.TaskSchedulerImpl.submitTasks(TaskSchedulerImpl.scala:173)
>         at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1052)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:921)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1214)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1637)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> First Time:
> {code}
> 16/04/14 15:35:20 INFO DAGScheduler: Resubmitting ShuffleMapStage 57 (run at AccessController.java:-2) because some of its tasks had failed: 5, 8, 9, 12, 13, 16, 17, 18, 19, 23, 26, 27, 28, 29, 30, 31, 40, 42, 43, 48, 49, 50, 51, 52, 53, 55, 56, 57, 59, 60, 61, 67, 70, 71, 84, 85, 86, 87, 98, 99, 100, 101, 108, 109, 110, 111, 112, 113, 114, 115, 126, 127, 134, 136, 137, 146, 147, 150, 151, 154, 155, 158, 159, 162, 163, 164, 165, 166, 167, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 188, 189, 190, 191, 198, 199, 204, 206, 207, 208, 218, 219, 222, 223, 230, 231, 236, 238, 239
> 16/04/14 15:35:20 DEBUG DAGScheduler: submitStage(ShuffleMapStage 57)
> 16/04/14 15:35:20 DEBUG DAGScheduler: missing: List()
> 16/04/14 15:35:20 INFO DAGScheduler: Submitting ShuffleMapStage 57 (MapPartitionsRDD[7887] at run at AccessController.java:-2), which has no missing parents
> 16/04/14 15:35:20 DEBUG DAGScheduler: submitMissingTasks(ShuffleMapStage 57)
> 16/04/14 15:35:20 INFO DAGScheduler: Submitting 100 missing tasks from ShuffleMapStage 57 (MapPartitionsRDD[7887] at run at AccessController.java:-2)
> 16/04/14 15:35:20 DEBUG DAGScheduler: New pending partitions: Set(206, 177, 127, 98, 48, 27, 23, 163, 238, 188, 159, 28, 109, 59, 9, 176, 126, 207, 174, 43, 170, 208, 158, 108, 29, 8, 204, 154, 223, 173, 219, 190, 111, 61, 40, 136, 115, 86, 57, 155, 55, 230, 222, 180, 172, 151, 101, 18, 166, 56, 137, 87, 52, 171, 71, 42, 167, 198, 67, 17, 236, 165, 13, 5, 53, 178, 99, 70, 49, 218, 147, 164, 114, 85, 60, 31, 179, 150, 19, 100, 50, 175, 146, 134, 113, 84, 51, 30, 199, 26, 16, 191, 162, 112, 12, 239, 231, 189, 181, 110)
> {code}
> Second Time:
> {code}
> 16/04/14 15:35:22 INFO DAGScheduler: Resubmitting ShuffleMapStage 57 (run at AccessController.java:-2) because some of its tasks had failed: 26
> 16/04/14 15:35:22 DEBUG DAGScheduler: submitStage(ShuffleMapStage 57)
> 16/04/14 15:35:22 DEBUG DAGScheduler: missing: List()
> 16/04/14 15:35:22 INFO DAGScheduler: Submitting ShuffleMapStage 57 (MapPartitionsRDD[7887] at run at AccessController.java:-2), which has no missing parents
> 16/04/14 15:35:22 DEBUG DAGScheduler: submitMissingTasks(ShuffleMapStage 57)
> 16/04/14 15:35:22 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 57 (MapPartitionsRDD[7887] at run at AccessController.java:-2)
> 16/04/14 15:35:22 DEBUG DAGScheduler: New pending partitions: Set(26)
> {code}
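
The failure in the logs above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Spark's code: the names `TaskSet`, `ToyTaskScheduler`, `submit_tasks` and the `is_zombie` flag are hypothetical stand-ins, Python's `RuntimeError` stands in for the `java.lang.IllegalStateException` in the trace, and the attempt numbers 1 and 2 are taken from the `57.2,57.1` ids in the error message. The model assumes the scheduler refuses a new task set for a stage while another task set for the same stage is still active, which is what the exception text suggests.

```python
class TaskSet:
    """Hypothetical stand-in for one attempt of a stage."""

    def __init__(self, stage_id, attempt, partitions):
        self.stage_id = stage_id
        self.attempt = attempt
        self.partitions = set(partitions)
        # Assumption: an attempt marked as a zombie no longer counts as active.
        self.is_zombie = False

    @property
    def id(self):
        return f"{self.stage_id}.{self.attempt}"


class ToyTaskScheduler:
    """Toy bookkeeping of task sets per stage; not Spark's real class."""

    def __init__(self):
        self.task_sets_by_stage = {}  # stage_id -> {attempt: TaskSet}

    def submit_tasks(self, task_set):
        stage_sets = self.task_sets_by_stage.setdefault(task_set.stage_id, {})
        stage_sets[task_set.attempt] = task_set
        # Reject the submission if any *other* attempt of this stage is still active.
        conflicting = any(
            ts is not task_set and not ts.is_zombie for ts in stage_sets.values()
        )
        if conflicting:
            ids = ",".join(ts.id for ts in stage_sets.values())
            raise RuntimeError(
                f"more than one active taskSet for stage {task_set.stage_id}: {ids}"
            )


scheduler = ToyTaskScheduler()

# "First Time": stage 57 is resubmitted (only a subset of the 100 partitions is shown).
first = TaskSet(57, 1, [5, 8, 9, 26])
scheduler.submit_tasks(first)

# "Second Time": two seconds later partition 26 is resubmitted while the
# first resubmission is still running, so the check fires.
second = TaskSet(57, 2, [26])
try:
    scheduler.submit_tasks(second)
    error = None
except RuntimeError as exc:
    error = str(exc)

print(error)
```

In this model the second submission would be accepted if the earlier attempt had first been marked as no longer active (`first.is_zombie = True`). Whether that matches the fix that went into 2.2.0 is not stated in this message.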