Our change for this is mixed up with some other code we have; I will have
to separate it out.
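In the meantime, here is a minimal sketch of the idea. It assumes the 0.15-era mapred API, where the framework sets the map.input.file, map.input.start, and map.input.length properties in the JobConf for each map task; the SplitLogger class and describeSplit helper below are hypothetical names, not our actual code. In a real mapper you would extend org.apache.hadoop.mapred.MapReduceBase and call the helper from configure(JobConf job), passing job.get("map.input.file") and friends.

```java
// Hypothetical helper: builds the log line that configure() would emit.
// (Pure Java so it can be shown standalone; the Hadoop-specific part is
// just fetching the three map.input.* properties from the JobConf.)
public class SplitLogger {

    static String describeSplit(String file, String start, String length) {
        if (file == null) {
            return "input split: unknown";
        }
        if (start == null || length == null) {
            // gzip'd inputs are not split, so only the file name is available
            return "input file: " + file;
        }
        return "input file: " + file + " (start=" + start + ", length=" + length + ")";
    }

    public static void main(String[] args) {
        // A gzip'd input: no split offsets, just the file name.
        System.out.println(describeSplit("hdfs://nn/data/part-00003.gz", null, null));
        // A plain text input: file plus the byte range of this split.
        System.out.println(describeSplit("hdfs://nn/data/part-00001", "0", "67108864"));
    }
}
```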
Arun C Murthy wrote:
On Jan 29, 2008, at 1:30 PM, Jason Venner wrote:
We have subclassed org.apache.hadoop.mapred.MapReduceBase so that the
configure method logs the split's file name and byte range (or, in the
case of gzip'd files, just the file name).
We find it very helpful for tying job errors back to the section of the
input file causing the problem.
Maybe we should just log it by default? Want to submit that patch?
Arun
Vadim Zaliva wrote:
I have a bunch of gzip files which I am trying to process with a
Hadoop task. The task fails with this exception:
java.io.EOFException: Unexpected end of ZLIB input stream
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:223)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
    at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:92)
    at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.read(GzipCodec.java:124)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:136)
    at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:128)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:117)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:39)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:147)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2016)
I guess some of the files are invalid, but I could not find the name
of the file causing this exception anywhere in the logs. Given the
huge size of the dataset, I would rather not extract the files from DFS
and verify them with gzip one by one. Any suggestions? Thanks!
Sincerely,
Vadim
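One way to check the files without pulling them out of DFS by hand is to drain each one through a GZIPInputStream and see whether it hits the same EOFException the task does. The sketch below is pure java.util.zip and uses an in-memory stream for illustration; the GzipCheck class name is made up, and in practice you would feed isValidGzip the FSDataInputStream you get from FileSystem.open on each DFS path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCheck {

    // Returns true if the stream decompresses cleanly to EOF, false if it
    // is truncated or corrupt -- the same condition that raises the
    // "Unexpected end of ZLIB input stream" EOFException in the map task.
    static boolean isValidGzip(InputStream raw) {
        try (InputStream in = new GZIPInputStream(raw)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // drain; we only care whether decompression completes
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a valid gzip blob in memory, then truncate it to simulate
        // a corrupt file.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(bos)) {
            gz.write("hello hadoop".getBytes(StandardCharsets.UTF_8));
        }
        byte[] good = bos.toByteArray();
        byte[] bad = Arrays.copyOf(good, good.length - 5); // chop the trailer

        System.out.println(isValidGzip(new ByteArrayInputStream(good))); // prints true
        System.out.println(isValidGzip(new ByteArrayInputStream(bad)));  // prints false
    }
}
```

Running this over each input path and logging the bad file names avoids verifying files with gzip one at a time by hand.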
--
Jason Venner
Attributor - Publish with Confidence <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers, contact if interested