I have a bunch of gzip files that I am trying to process with a Hadoop task. The task fails with the following exception:

java.io.EOFException: Unexpected end of ZLIB input stream
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:223)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
    at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:92)
    at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.read(GzipCodec.java:124)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:136)
    at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:128)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:117)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:39)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:147)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2016)

I guess some of the files are invalid. However, I could not find the name of the file causing this exception anywhere in the logs. Because the dataset is huge, I would rather not extract the files from DFS and verify them with gzip one by one. Any suggestions? Thanks!

Sincerely,
Vadim
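
For illustration, the per-file check described in the question (decompress each file to the end and see whether the stream finishes cleanly) could look roughly like the sketch below. It is written in Python with the standard gzip module only, not against any Hadoop API. The function names gzip_stream_is_intact and find_corrupt are hypothetical, and how the byte streams are obtained from DFS is left open as an assumption.

```python
import gzip
import io
import zlib


def gzip_stream_is_intact(stream, chunk_size=1 << 20):
    """Return True if the binary stream decompresses cleanly to its end.

    A truncated stream raises EOFError, a bad header or CRC mismatch raises
    OSError (gzip.BadGzipFile), and corrupt deflate data raises zlib.error.
    """
    try:
        with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
            # Read and discard the decompressed data chunk by chunk so that
            # memory use stays bounded regardless of file size.
            while gz.read(chunk_size):
                pass
        return True
    except (EOFError, OSError, zlib.error):
        return False


def find_corrupt(named_streams):
    """Return the names whose streams fail the check.

    named_streams is an iterable of (name, binary_stream) pairs; where the
    streams come from (local files, a DFS client, a pipe) is up to the caller.
    """
    return [name for name, stream in named_streams
            if not gzip_stream_is_intact(stream)]
```

Because the check only reads from a stream, nothing has to be extracted to local disk first; any file-like object opened in binary mode will do.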
