[ https://issues.apache.org/jira/browse/SPARK-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eunsu Yun updated SPARK-1960:
-----------------------------
    Description:
java.io.EOFException is thrown when using sc.sequenceFile[K,V] if a file of size 0 is present. I also tested sc.textFile() under the same conditions and it does not throw an EOFException.

val text = sc.sequenceFile[Long, String]("data-gz/*.dat.gz")
val result = text.filter(filterValid)
result.saveAsTextFile("data-out/")

------------------
java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:197)
        at java.io.DataInputStream.readFully(DataInputStream.java:169)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1845)
        at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:33)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        ..............
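A possible workaround, not part of the original report, is to filter out zero-length files with the Hadoop FileSystem API before handing the paths to sc.sequenceFile. This is a minimal sketch under that assumption; the name nonEmptyPaths is illustrative only, and it relies on sc.sequenceFile accepting a comma-separated list of paths (as Hadoop's FileInputFormat does).

import org.apache.hadoop.fs.{FileSystem, Path}

// List the matching files and drop the zero-byte ones that trigger the EOFException.
val fs = FileSystem.get(sc.hadoopConfiguration)
val nonEmptyPaths = fs.globStatus(new Path("data-gz/*.dat.gz"))
  .filter(_.getLen > 0)
  .map(_.getPath.toString)
  .mkString(",")

// Read only the non-empty SequenceFiles.
val text = sc.sequenceFile[Long, String](nonEmptyPaths)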
        Summary: EOFException when file size 0 exists when use sc.sequenceFile[K,V]("path")  (was: EOFException when 0 size file exists when use sc.sequenceFile[K,V]("path"))

> EOFException when file size 0 exists when use sc.sequenceFile[K,V]("path")
> --------------------------------------------------------------------------
>
>                 Key: SPARK-1960
>                 URL: https://issues.apache.org/jira/browse/SPARK-1960
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Eunsu Yun
>

--
This message was sent by Atlassian JIRA
(v6.2#6252)