[ 
https://issues.apache.org/jira/browse/SPARK-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eunsu Yun updated SPARK-1960:
-----------------------------

    Description: 
A java.io.EOFException is thrown when using sc.sequenceFile[K,V] if the input 
includes a file whose size is 0. 
I also tested sc.textFile() under the same condition, and it does not throw an 
EOFException.

val text = sc.sequenceFile[Long, String]("data-gz/*.dat.gz")
val result = text.filter(filterValid)
result.saveAsTextFile("data-out/")


------------------

java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:197)
        at java.io.DataInputStream.readFully(DataInputStream.java:169)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1845)
        at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:33)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
..............
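
The top frames of the trace show DataInputStream.readFully being called from 
SequenceFile$Reader.init, which suggests the reader tries to read a fixed-size 
header before any records. The sketch below reproduces that failure mode with 
plain java.io only, with no Spark or Hadoop involved. The helper name 
canReadHeader, the 4-byte header length, and the "SEQ" plus version-byte 
layout are assumptions for illustration, not taken from this report.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, EOFException}

object EmptyHeaderRead {
  // Hypothetical helper: tries to read `headerLen` bytes from `bytes` and
  // reports whether the whole header was available.
  def canReadHeader(bytes: Array[Byte], headerLen: Int): Boolean = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    try {
      // readFully throws EOFException if the stream ends before
      // `headerLen` bytes have been read.
      in.readFully(new Array[Byte](headerLen))
      true
    } catch {
      case _: EOFException => false
    } finally {
      in.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // A zero-length input cannot supply the header.
    println(canReadHeader(Array.empty[Byte], 4))
    // Assumed header layout: the bytes of "SEQ" followed by a version byte.
    println(canReadHeader("SEQ".getBytes("US-ASCII") :+ 6.toByte, 4))
  }
}
```

A line-oriented reader has no mandatory header to read, which may be why 
sc.textFile() tolerates the same empty file.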


        Summary: EOFException when file size 0 exists when use 
sc.sequenceFile[K,V]("path")  (was: EOFException when 0 size file exists when 
use sc.sequenceFile[K,V]("path"))

> EOFException when file size 0 exists when use sc.sequenceFile[K,V]("path")
> --------------------------------------------------------------------------
>
>                 Key: SPARK-1960
>                 URL: https://issues.apache.org/jira/browse/SPARK-1960
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Eunsu Yun
>
> A java.io.EOFException is thrown when using sc.sequenceFile[K,V] if the input 
> includes a file whose size is 0. 
> I also tested sc.textFile() under the same condition, and it does not throw an 
> EOFException.
> val text = sc.sequenceFile[Long, String]("data-gz/*.dat.gz")
> val result = text.filter(filterValid)
> result.saveAsTextFile("data-out/")
> ------------------
> java.io.EOFException
>       at java.io.DataInputStream.readFully(DataInputStream.java:197)
>       at java.io.DataInputStream.readFully(DataInputStream.java:169)
>       at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1845)
>       at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
>       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
>       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
>       at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
>       at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
>       at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
>       at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
>       at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>       at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>       at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:33)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
> ..............
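
Until the behaviour is changed in Spark itself, one possible workaround is to 
drop zero-length files from the input before calling sc.sequenceFile. The 
following is a minimal sketch, not a confirmed fix: NonEmptyInputs and 
nonEmptyPaths are hypothetical names, and it assumes that the input paths 
contain no commas, since the surviving paths are joined into one 
comma-separated string.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path}

object NonEmptyInputs {
  // Hypothetical helper (not part of Spark or Hadoop): expands a glob and
  // keeps only entries whose reported length is greater than zero.
  // Directories normally report length 0, so they are dropped as well.
  def nonEmptyPaths(glob: String, conf: Configuration): Seq[String] = {
    val pattern = new Path(glob)
    val fs = pattern.getFileSystem(conf)
    // globStatus may return null when nothing matches, so wrap it in Option.
    val matches = Option(fs.globStatus(pattern)).getOrElse(Array.empty[FileStatus])
    matches.filter(_.getLen > 0).map(_.getPath.toString).toSeq
  }
}

// Usage in the Spark shell, where `sc` and `filterValid` are the names
// used in the report above:
//
// val inputs = NonEmptyInputs.nonEmptyPaths("data-gz/*.dat.gz", sc.hadoopConfiguration)
// if (inputs.nonEmpty) {
//   val text = sc.sequenceFile[Long, String](inputs.mkString(","))
//   text.filter(filterValid).saveAsTextFile("data-out/")
// }
```

The length check runs on the driver before any task starts, so a file that 
is truncated rather than empty would still reach SequenceFile$Reader and 
could fail in the same way.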



--
This message was sent by Atlassian JIRA
(v6.2#6252)
