[ https://issues.apache.org/jira/browse/SPARK-13631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178980#comment-15178980 ]
Andy Sloane commented on SPARK-13631:
-------------------------------------

Hm. I see, it's just a coincidence that a thread tends to get scheduled from the thread pool while the shuffle task is running, and tries to schedule a new job based on the locations of the (currently in-process) shuffle tasks. There's no blocking wait on an RDD happening that's causing this job to kick off -- it's just trying to schedule it.

It seems like we'd want to defer finding preferred locations on an RDD which is currently being computed, somehow, but the job planning seems to happen completely up front and there aren't any indicators in an individual RDD that it's presently being computed.

Which means we are probably unnecessarily recomputing RDDs in this multithreaded scheme.

> getPreferredLocations race condition in spark 1.6.0?
> ----------------------------------------------------
>
>                 Key: SPARK-13631
>                 URL: https://issues.apache.org/jira/browse/SPARK-13631
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.6.0
>            Reporter: Andy Sloane
>
> We are seeing something that looks a lot like a regression from spark 1.2.
> When we run jobs with multiple threads, we have a crash somewhere inside
> getPreferredLocations, as was fixed in SPARK-4454. Except now it's inside
> org.apache.spark.MapOutputTrackerMaster.getLocationsWithLargestOutputs
> instead of DAGScheduler directly.
> I tried Spark 1.2 post-SPARK-4454 (before this patch it's only slightly
> flaky), 1.4.1, and 1.5.2 and all are fine. 1.6.0 immediately crashes on our
> threaded test case, though once in a while it passes.
> The stack trace is huge, but starts like this:
> Caused by: java.lang.NullPointerException: null
>       at org.apache.spark.MapOutputTrackerMaster.getLocationsWithLargestOutputs(MapOutputTracker.scala:406)
>       at org.apache.spark.MapOutputTrackerMaster.getPreferredLocationsForShuffle(MapOutputTracker.scala:366)
>       at org.apache.spark.rdd.ShuffledRDD.getPreferredLocations(ShuffledRDD.scala:92)
>       at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:257)
>       at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:257)
>       at scala.Option.getOrElse(Option.scala:120)
>       at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:256)
>       at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1545)
> The full trace is available here:
> https://gist.github.com/andy256/97611f19924bbf65cf49

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
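
For illustration only, here is a toy Java model of the interleaving described in the comment above: one thread is still filling in per-task shuffle output records while another thread reads them to pick a preferred host. None of these names (`PreferredLocationsSketch`, `OutputStatus`, `largestHostUnsafe`, `largestHostSafe`) come from Spark. The sketch also assumes that an output still being computed shows up as a null slot in a pre-sized array; that is a guess consistent with the NullPointerException in the trace, not something confirmed in this thread. The interleaving is simulated with a partially filled array rather than real threads so the result is deterministic, and the null guard is just one possible mitigation, not necessarily how the issue should be fixed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of the race; hypothetical names, not Spark's implementation. */
public class PreferredLocationsSketch {

    /** Hypothetical stand-in for one map task's registered output. */
    static final class OutputStatus {
        final String host;
        final long bytes;

        OutputStatus(String host, long bytes) {
            this.host = host;
            this.bytes = bytes;
        }
    }

    /** Assumes every slot is populated; throws NullPointerException on an unfilled slot. */
    static Optional<String> largestHostUnsafe(OutputStatus[] statuses) {
        Map<String, Long> bytesPerHost = new HashMap<>();
        for (OutputStatus status : statuses) {
            bytesPerHost.merge(status.host, status.bytes, Long::sum);
        }
        return pickLargest(bytesPerHost);
    }

    /** Skips slots that have not been filled in yet (assumed to be null). */
    static Optional<String> largestHostSafe(OutputStatus[] statuses) {
        Map<String, Long> bytesPerHost = new HashMap<>();
        for (OutputStatus status : statuses) {
            if (status != null) {
                bytesPerHost.merge(status.host, status.bytes, Long::sum);
            }
        }
        return pickLargest(bytesPerHost);
    }

    /** Returns the host with the most bytes, or empty if nothing is known yet. */
    static Optional<String> pickLargest(Map<String, Long> bytesPerHost) {
        return bytesPerHost.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // The shuffle has been registered with three slots, but only the
        // first map task has reported its output so far.
        OutputStatus[] statuses = new OutputStatus[3];
        statuses[0] = new OutputStatus("host-a", 10L);

        System.out.println("safe reader: " + largestHostSafe(statuses));
        try {
            largestHostUnsafe(statuses);
        } catch (NullPointerException e) {
            System.out.println("unsafe reader hit a NullPointerException");
        }
    }
}
```

Note that the guarded reader only avoids the crash. It says nothing about whether the second job ends up recomputing the RDD, which is the separate concern raised in the comment.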