Vadim, if you drill into the task using the JobTracker's web interface, you can get to the task's XML configuration. That configuration contains the input file split specification, which identifies the file the failing task was reading.
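If you save that task configuration page to a file, a small program can pull out the input-related properties. Below is a minimal sketch. The class name `TaskConfInputs` is hypothetical, and it assumes the file uses Hadoop's usual `<configuration>/<property>/<name>/<value>` layout. The exact property that names the split's file (for example `map.input.file`) is an assumption and may differ between Hadoop versions, so the sketch lists every property whose name contains "input".

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: lists input-related properties in a saved task configuration XML. */
public class TaskConfInputs {

    // Parses <configuration><property><name/><value/></property>... into an ordered name -> value map.
    public static Map<String, String> parse(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            Map<String, String> props = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element prop = (Element) properties.item(i);
                String name = childText(prop, "name");
                if (name != null) {
                    props.put(name, childText(prop, "value"));
                }
            }
            return props;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }
    }

    // Returns the trimmed text of the first descendant element with the given tag, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Keeps only properties whose names mention "input" (an assumed naming convention).
    public static Map<String, String> inputProperties(Map<String, String> props) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().contains("input")) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TaskConfInputs <saved-task-conf.xml>");
            System.exit(1);
        }
        try (InputStream in = new FileInputStream(args[0])) {
            for (Map.Entry<String, String> e : inputProperties(parse(in)).entrySet()) {
                System.out.println(e.getKey() + " = " + e.getValue());
            }
        }
    }
}
```

Pass it the path of the saved configuration file. Filtering on a substring rather than one fixed property name keeps the sketch useful even if your Hadoop version records the split under a different key.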
You may also be able to see the input file name elsewhere, but the XML configuration is definitive.

On 1/29/08 10:33 AM, "Vadim Zaliva" <[EMAIL PROTECTED]> wrote:

> I have a bunch of gzip files which I am trying to process with a Hadoop
> task. The task fails with this exception:
>
> java.io.EOFException: Unexpected end of ZLIB input stream
>     at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:223)
>     at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
>     at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:92)
>     at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.read(GzipCodec.java:124)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>     at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:136)
>     at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:128)
>     at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:117)
>     at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:39)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:147)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2016)
>
> I guess some of the files are invalid. However, I could not find the
> name of the file causing this exception anywhere in the logs. Because
> the dataset is so large, I would rather not extract the files from DFS
> and verify them with gzip one by one. Any suggestions?
>
> Thanks!
>
> Sincerely,
> Vadim
