[ https://issues.apache.org/jira/browse/SPARK-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Or updated SPARK-8707:
-----------------------------
    Assignee: Navis

> RDD#toDebugString fails if any cached RDD has invalid partitions
> ----------------------------------------------------------------
>
>                 Key: SPARK-8707
>                 URL: https://issues.apache.org/jira/browse/SPARK-8707
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.4.0, 1.4.1
>            Reporter: Aaron Davidson
>            Assignee: Navis
>              Labels: starter
>             Fix For: 1.6.0
>
>
> Repro:
> {code}
> sc.textFile("/ThisFileDoesNotExist").cache()
> sc.parallelize(0 until 100).toDebugString
> {code}
> Output:
> {code}
> java.io.IOException: Not a file: /ThisFileDoesNotExist
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:215)
>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>     at org.apache.spark.storage.RDDInfo$.fromRdd(RDDInfo.scala:59)
>     at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:1455)
>     at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:1455)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>     at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>     at org.apache.spark.SparkContext.getRDDStorageInfo(SparkContext.scala:1455)
>     at org.apache.spark.rdd.RDD.debugSelf$1(RDD.scala:1573)
>     at org.apache.spark.rdd.RDD.firstDebugString$1(RDD.scala:1607)
>     at org.apache.spark.rdd.RDD.toDebugString(RDD.scala:1637)
> {code}
> This is because toDebugString gets all the partitions from all RDDs, which
> fails (via SparkContext#getRDDStorageInfo). This pathway should definitely be
> resilient to other RDDs being invalid (and getRDDStorageInfo should probably
> also be).
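The following is a minimal, Spark-free sketch of the resilient pathway the report asks for. It relies on hypothetical stand-in types (`CachedRdd`, `StorageSummary`, `ResilientStorageInfo`), not on Spark's own `RDD` or `RDDInfo` classes. Storage info is built only for cached RDDs that match a predicate, and any RDD whose partition computation throws is skipped instead of failing the whole call. It illustrates the idea only and is not necessarily how the 1.6.0 fix was implemented.

```scala
import scala.util.Try

// Hypothetical stand-in for a cached RDD (NOT Spark's RDD class).
// computePartitions may throw, e.g. when the input path does not exist.
final case class CachedRdd(id: Int, name: String, computePartitions: () => Seq[Int])

// Hypothetical summary record, loosely analogous to per-RDD storage info.
final case class StorageSummary(id: Int, name: String, numPartitions: Int)

object ResilientStorageInfo {
  // Summarize every cached RDD accepted by `include`.
  // An RDD whose partitions cannot be computed is dropped from the result,
  // so one invalid RDD cannot break callers that describe a different RDD.
  def storageInfo(
      cached: Seq[CachedRdd],
      include: CachedRdd => Boolean = _ => true): Seq[StorageSummary] =
    cached.filter(include).flatMap { rdd =>
      // Try catches non-fatal exceptions such as java.io.IOException.
      Try(rdd.computePartitions()).toOption.map { parts =>
        StorageSummary(rdd.id, rdd.name, parts.size)
      }
    }
}
```

There are two separate defenses here. The predicate lets a `toDebugString`-style caller request info only for the RDD it is describing (for example `_.id == 2`), so unrelated cached RDDs are never touched. The `Try` wrapper keeps a full listing, the `getRDDStorageInfo`-style case, from failing just because one cached RDD is invalid.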