Re: broken gzip file

Ted Dunning Tue, 29 Jan 2008 10:51:08 -0800

Vadim,

If you drill into the task using the job tracker's web interface, you can
get to the task's XML configuration.  That configuration will have the input
file split specification in it.


You may also be able to see the input file elsewhere, but the XML
configuration is definitive.
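
If you save a copy of that XML, something like the following could pull the
split details out for you. This is only a rough sketch: it assumes the usual
Hadoop configuration layout (<property> elements holding <name> and <value>),
and it assumes the split is recorded under map.input.file, map.input.start
and map.input.length, so check those names against what your job tracker
actually shows. The class name and the sample path are made up for
illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop.
class TaskConfigReader {

    // Collects every <property><name>/<value> pair from a Hadoop-style
    // configuration document into an ordered map.
    static Map<String, String> readProperties(InputStream in) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(in);
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                NodeList names = property.getElementsByTagName("name");
                NodeList values = property.getElementsByTagName("value");
                // Skip malformed entries that lack a name or a value.
                if (names.getLength() > 0 && values.getLength() > 0) {
                    props.put(names.item(0).getTextContent().trim(),
                              values.item(0).getTextContent().trim());
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse task configuration", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the XML saved from the web interface.
        String sample =
            "<configuration>"
            + "<property><name>map.input.file</name>"
            + "<value>hdfs://namenode/logs/part-0042.gz</value></property>"
            + "<property><name>map.input.start</name><value>0</value></property>"
            + "<property><name>map.input.length</name><value>1048576</value></property>"
            + "</configuration>";
        Map<String, String> props = readProperties(
            new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        // Property names below are assumptions; adjust to match your version.
        for (String key : new String[] {
                "map.input.file", "map.input.start", "map.input.length"}) {
            System.out.println(key + " = " + props.get(key));
        }
    }
}
```

Run it against the configuration of a failed task and the file named there is
the one to check with gzip, rather than the whole dataset.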


On 1/29/08 10:33 AM, "Vadim Zaliva" <[EMAIL PROTECTED]> wrote:

> I have a bunch of gzip files which I am trying to process with a Hadoop
> task. The task fails with this exception:
> java.io.EOFException: Unexpected end of ZLIB input stream
>     at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:223)
>     at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
>     at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:92)
>     at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.read(GzipCodec.java:124)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>     at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:136)
>     at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:128)
>     at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:117)
>     at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:39)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:147)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2016)
> I guess some of the files are invalid. However, I could not find the
> name of the file causing this exception anywhere in the logs. Because
> the dataset is huge, I would rather not extract the files from DFS and
> verify them with gzip one by one. Any suggestions? Thanks!
> Sincerely,
> Vadim
