Eunsu Yun created SPARK-1960:
--------------------------------

             Summary: EOFException when a 0-size file exists when using sc.sequenceFile[K,V]("path")
                 Key: SPARK-1960
                 URL: https://issues.apache.org/jira/browse/SPARK-1960
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.0.0
            Reporter: Eunsu Yun
java.io.EOFException is thrown when using sc.sequenceFile[K,V] if a file of size 0 exists in the input path. I also tested sc.textFile() under the same conditions, and it does not throw EOFException.

val text = sc.sequenceFile[Long, String]("data-gz/*.dat.gz")
val result = text.filter(filterValid)
result.saveAsTextFile("data-out/")

------------------
java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:197)
	at java.io.DataInputStream.readFully(DataInputStream.java:169)
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1845)
	at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
	at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
	at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
	at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:33)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
	..............

--
This message was sent by Atlassian JIRA
(v6.2#6252)
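Until the reader itself tolerates empty inputs, one possible workaround is to drop zero-byte files from the input directory before the job runs, so SequenceFile.Reader never attempts to read a header from an empty file. This is only a sketch against a local filesystem; the directory layout and file names below are illustrative, not taken from the original report:

```shell
# Create a scratch directory mimicking the failing input layout.
dir=$(mktemp -d)
touch "$dir/empty.dat.gz"                    # zero-byte file that would trigger the EOFException
printf 'payload' > "$dir/nonempty.dat.gz"    # normal, non-empty file

# Delete every zero-byte *.dat.gz before running the Spark job.
find "$dir" -name '*.dat.gz' -size 0 -delete

# Only nonempty.dat.gz remains.
ls "$dir"
```

For inputs on HDFS, the same filtering would have to be done with `hadoop fs` commands or the Hadoop FileSystem API instead of `find`.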