[jira] [Commented] (SPARK-6880) Spark Shutdowns with NoSuchElementException when running parallel collect on cachedRDD

2015-09-17 Thread Mark Hamstra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803261#comment-14803261
 ] 

Mark Hamstra commented on SPARK-6880:
-

see SPARK-10666

> Spark Shutdowns with NoSuchElementException when running parallel collect on 
> cachedRDD
> --
>
> Key: SPARK-6880
> URL: https://issues.apache.org/jira/browse/SPARK-6880
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: CentOs6.0, java7
>Reporter: pankaj arora
>Assignee: pankaj arora
> Fix For: 1.4.0
>
>
> Spark Shutdowns with NoSuchElementException when running parallel collect on 
> cachedRDDs
> Below is the stack trace
> 15/03/27 11:12:43 ERROR DAGSchedulerActorSupervisor: eventProcesserActor 
> failed; shutting down SparkContext
> java.util.NoSuchElementException: key not found: 28
> at scala.collection.MapLike$class.default(MapLike.scala:228)
> at scala.collection.AbstractMap.default(Map.scala:58)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:808)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:778)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:762)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1389)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> at akka.actor.ActorCell.invoke(ActorCell.scala:487)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6880) Spark Shutdowns with NoSuchElementException when running parallel collect on cachedRDD

2015-09-02 Thread Imran Rashid (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727617#comment-14727617
 ] 

Imran Rashid commented on SPARK-6880:
-

Explanation of the remaining issue from [~markhamstra] on why the previous fix 
was an improvement, and did completely remove the NPE, but wasn't quite the 
right fix, because it left some minor issues:

bq. Tasks for a Stage that was previously part of a Job that is no longer 
active would be re-submitted as though they were part of the prior Job and with 
no properties set. Since properties are what are used to set an 
other-than-default scheduling pool, this would affect FAIR scheduler usage, but 
it would also affect anything else that depends on the settings of the 
properties (which would be just user code at this point, since Spark itself 
doesn't really use the properties for anything else other than Job Group and 
Description, which end up in the WebUI, can be used to kill by JobGroup, etc.) 
Even the default, FIFO scheduling would be affected, however, since the 
resubmission of the Tasks under the earlier jobId would effectively give them a 
higher priority/greater urgency than the ActiveJob that now actually needs 
them.  In any event, the Tasks would generate correct results.

> Spark Shutdowns with NoSuchElementException when running parallel collect on 
> cachedRDD
> --
>
> Key: SPARK-6880
> URL: https://issues.apache.org/jira/browse/SPARK-6880
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.1
> Environment: CentOs6.0, java7
>Reporter: pankaj arora
>Assignee: pankaj arora
> Fix For: 1.4.0
>
>
> Spark Shutdowns with NoSuchElementException when running parallel collect on 
> cachedRDDs
> Below is the stack trace
> 15/03/27 11:12:43 ERROR DAGSchedulerActorSupervisor: eventProcesserActor 
> failed; shutting down SparkContext
> java.util.NoSuchElementException: key not found: 28
> at scala.collection.MapLike$class.default(MapLike.scala:228)
> at scala.collection.AbstractMap.default(Map.scala:58)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:808)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:778)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:762)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1389)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> at akka.actor.ActorCell.invoke(ActorCell.scala:487)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6880) Spark Shutdowns with NoSuchElementException when running parallel collect on cachedRDD

2015-05-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553008#comment-14553008
 ] 

Apache Spark commented on SPARK-6880:
-

User 'markhamstra' has created a pull request for this issue:
https://github.com/apache/spark/pull/6291

 Spark Shutdowns with NoSuchElementException when running parallel collect on 
 cachedRDD
 --

 Key: SPARK-6880
 URL: https://issues.apache.org/jira/browse/SPARK-6880
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.1
 Environment: CentOs6.0, java7
Reporter: pankaj arora
Assignee: pankaj arora
 Fix For: 1.4.0


 Spark Shutdowns with NoSuchElementException when running parallel collect on 
 cachedRDDs
 Below is the stack trace
 15/03/27 11:12:43 ERROR DAGSchedulerActorSupervisor: eventProcesserActor 
 failed; shutting down SparkContext
 java.util.NoSuchElementException: key not found: 28
 at scala.collection.MapLike$class.default(MapLike.scala:228)
 at scala.collection.AbstractMap.default(Map.scala:58)
 at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:808)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:778)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
 at 
 org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:762)
 at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1389)
 at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
 at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
 at akka.actor.ActorCell.invoke(ActorCell.scala:487)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6880) Spark Shutdowns with NoSuchElementException when running parallel collect on cachedRDD

2015-05-19 Thread Mark Hamstra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551647#comment-14551647
 ] 

Mark Hamstra commented on SPARK-6880:
-

This fix should also be applied as far back as we care to go in the maintenance 
branches, since the bug also exists in branch-1.0, branch-1.1, branch-1.2 and 
branch-1.3.

 Spark Shutdowns with NoSuchElementException when running parallel collect on 
 cachedRDD
 --

 Key: SPARK-6880
 URL: https://issues.apache.org/jira/browse/SPARK-6880
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.1
 Environment: CentOs6.0, java7
Reporter: pankaj arora
Assignee: pankaj arora
 Fix For: 1.4.0


 Spark Shutdowns with NoSuchElementException when running parallel collect on 
 cachedRDDs
 Below is the stack trace
 15/03/27 11:12:43 ERROR DAGSchedulerActorSupervisor: eventProcesserActor 
 failed; shutting down SparkContext
 java.util.NoSuchElementException: key not found: 28
 at scala.collection.MapLike$class.default(MapLike.scala:228)
 at scala.collection.AbstractMap.default(Map.scala:58)
 at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:808)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:778)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
 at 
 org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:762)
 at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1389)
 at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
 at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
 at akka.actor.ActorCell.invoke(ActorCell.scala:487)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6880) Spark Shutdowns with NoSuchElementException when running parallel collect on cachedRDD

2015-04-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492622#comment-14492622
 ] 

Apache Spark commented on SPARK-6880:
-

User 'pankajarora12' has created a pull request for this issue:
https://github.com/apache/spark/pull/5494

 Spark Shutdowns with NoSuchElementException when running parallel collect on 
 cachedRDD
 --

 Key: SPARK-6880
 URL: https://issues.apache.org/jira/browse/SPARK-6880
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.1
 Environment: CentOs6.0, java7
Reporter: pankaj arora

 Spark Shutdowns with NoSuchElementException when running parallel collect on 
 cachedRDDs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6880) Spark Shutdowns with NoSuchElementException when running parallel collect on cachedRDD

2015-04-13 Thread pankaj arora (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492642#comment-14492642
 ] 

pankaj arora commented on SPARK-6880:
-

Sean,
Sorry for missing stack trace. Added that in description.

 Spark Shutdowns with NoSuchElementException when running parallel collect on 
 cachedRDD
 --

 Key: SPARK-6880
 URL: https://issues.apache.org/jira/browse/SPARK-6880
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.1
 Environment: CentOs6.0, java7
Reporter: pankaj arora

 Spark Shutdowns with NoSuchElementException when running parallel collect on 
 cachedRDDs
 Below is the stack trace
 15/03/27 11:12:43 ERROR DAGSchedulerActorSupervisor: eventProcesserActor 
 failed; shutting down SparkContext
 java.util.NoSuchElementException: key not found: 28
 at scala.collection.MapLike$class.default(MapLike.scala:228)
 at scala.collection.AbstractMap.default(Map.scala:58)
 at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:808)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:778)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
 at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
 at 
 org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:762)
 at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1389)
 at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
 at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
 at akka.actor.ActorCell.invoke(ActorCell.scala:487)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org