[ https://issues.apache.org/jira/browse/SPARK-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron Davidson updated SPARK-8707:
----------------------------------
Description:

Repro:
{code}
sc.textFile("/ThisFileDoesNotExist").cache()
sc.parallelize(0 until 100).toDebugString
{code}
Output:
{code}
java.io.IOException: Not a file: /ThisFileDoesNotExist
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:215)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.storage.RDDInfo$.fromRdd(RDDInfo.scala:59)
    at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:1455)
    at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:1455)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.SparkContext.getRDDStorageInfo(SparkContext.scala:1455)
    at org.apache.spark.rdd.RDD.debugSelf$1(RDD.scala:1573)
    at org.apache.spark.rdd.RDD.firstDebugString$1(RDD.scala:1607)
    at org.apache.spark.rdd.RDD.toDebugString(RDD.scala:1637)
{code}

This happens because toDebugString computes the partitions of every cached RDD (via SparkContext#getRDDStorageInfo), and that lookup fails if any cached RDD's partitions can no longer be resolved. This pathway should be resilient to other RDDs being invalid (and getRDDStorageInfo should probably be as well).
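A minimal sketch of the hardening the last paragraph suggests (a hypothetical helper, not the actual Spark patch): wrap the partition lookup in scala.util.Try so that one RDD whose partitions can no longer be computed does not break an unrelated toDebugString call.

{code}
import scala.util.Try

import org.apache.spark.rdd.RDD

// Hypothetical helper: rdd.partitions throws (here, an IOException out of
// HadoopRDD.getPartitions) when the underlying input no longer resolves.
// Returning None lets callers such as SparkContext#getRDDStorageInfo skip
// the broken RDD instead of propagating the exception.
def safePartitionCount(rdd: RDD[_]): Option[Int] =
  Try(rdd.partitions.length).toOption
{code}

With something along these lines, RDDInfo.fromRdd and getRDDStorageInfo could report a cached-but-invalid RDD with an unknown partition count rather than failing the whole storage scan.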
was:
Repro:
{code}
sc.parallelize(0 until 100).toDebugString
sc.textFile("/ThisFileDoesNotExist").cache()
sc.parallelize(0 until 100).toDebugString
{code}
Output:
{code}
java.io.IOException: Not a file: /ThisFileDoesNotExist
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:215)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.storage.RDDInfo$.fromRdd(RDDInfo.scala:59)
    at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:1455)
    at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:1455)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.SparkContext.getRDDStorageInfo(SparkContext.scala:1455)
    at org.apache.spark.rdd.RDD.debugSelf$1(RDD.scala:1573)
    at org.apache.spark.rdd.RDD.firstDebugString$1(RDD.scala:1607)
    at org.apache.spark.rdd.RDD.toDebugString(RDD.scala:1637)
{code}

This is because toDebugString gets all the partitions from all RDDs, which fails (via SparkContext#getRDDStorageInfo). This pathway should definitely be resilient to other RDDs being invalid (and getRDDStorageInfo should probably also be).
> RDD#toDebugString fails if any cached RDD has invalid partitions
> ----------------------------------------------------------------
>
>                 Key: SPARK-8707
>                 URL: https://issues.apache.org/jira/browse/SPARK-8707
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.4.0, 1.4.1
>            Reporter: Aaron Davidson
>              Labels: starter
>
> Repro:
> {code}
> sc.textFile("/ThisFileDoesNotExist").cache()
> sc.parallelize(0 until 100).toDebugString
> {code}
> Output:
> {code}
> java.io.IOException: Not a file: /ThisFileDoesNotExist
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:215)
>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>     at org.apache.spark.storage.RDDInfo$.fromRdd(RDDInfo.scala:59)
>     at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:1455)
>     at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:1455)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>     at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>     at org.apache.spark.SparkContext.getRDDStorageInfo(SparkContext.scala:1455)
>     at org.apache.spark.rdd.RDD.debugSelf$1(RDD.scala:1573)
>     at org.apache.spark.rdd.RDD.firstDebugString$1(RDD.scala:1607)
>     at org.apache.spark.rdd.RDD.toDebugString(RDD.scala:1637)
> {code}
> This happens because toDebugString computes the partitions of every cached
> RDD (via SparkContext#getRDDStorageInfo), and that lookup fails if any
> cached RDD's partitions can no longer be resolved. This pathway should be
> resilient to other RDDs being invalid (and getRDDStorageInfo should
> probably be as well).