Doh, a couple more silly bugs in there. Don't use that version quite yet -
I'll put up a better patch later today. (Thanks to Kevin and Ted Yu for
pointing out the additional problems)

-Todd

On Wed, Apr 7, 2010 at 5:24 PM, Todd Lipcon <t...@cloudera.com> wrote:

> For Dmitriy and anyone else who has seen this error, I just committed a fix
> to my github repository:
>
>
> http://github.com/toddlipcon/hadoop-lzo/commit/f3bc3f8d003bb8e24f254b25bca2053f731cdd58
>
> The problem turned out to be an assumption that InputStream.read() would
> return all the bytes that were asked for. That is almost always true on
> local filesystems, but on HDFS it is not true when the read crosses a
> block boundary. So, every couple of TB of LZO-compressed data, one might
> see this error.
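>
> A minimal sketch of the read-until-full pattern behind the fix (the helper
> name and signature here are illustrative, not the exact hadoop-lzo change):
>
>     import java.io.IOException;
>     import java.io.InputStream;
>
>     final class ReadFullyExample {
>       // Keep calling read() until the requested number of bytes has
>       // arrived or EOF is reached; a single read() may legally return
>       // fewer bytes, e.g. when the read crosses an HDFS block boundary.
>       static int readFully(InputStream in, byte[] buf, int off, int len)
>           throws IOException {
>         int total = 0;
>         while (total < len) {
>           int n = in.read(buf, off + total, len - total);
>           if (n == -1) {
>             break; // hit EOF before the full request was satisfied
>           }
>           total += n;
>         }
>         return total; // may be less than len only at end of stream
>       }
>     }
>
> A caller that previously did in.read(buf, 0, len) and assumed len bytes
> came back would instead call readFully(in, buf, 0, len) and check the
> return value.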
>
> Big thanks to Alex Roetter who was able to provide a file that exhibited
> the bug!
>
> Thanks
> -Todd
>
>
> On Tue, Apr 6, 2010 at 10:35 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
>> Hi Alex,
>> Unfortunately I wasn't able to reproduce, and the data Dmitriy is
>> working with is sensitive.
>> Do you have some data you could upload (or send me off list) that
>> exhibits the issue?
>> -Todd
>>
>> On Tue, Apr 6, 2010 at 9:50 AM, Alex Roetter <aroet...@imageshack.net>
>> wrote:
>> >
>> > Todd Lipcon <t...@...> writes:
>> >
>> > >
>> > > Hey Dmitriy,
>> > >
>> > > This is very interesting (and worrisome in a way!) I'll try to take
>> > > a look this afternoon.
>> > >
>> > > -Todd
>> > >
>> >
>> > Hi Todd,
>> >
>> > I wanted to see if you made any progress on this front. I'm seeing a
>> > very similar error trying to run an MR job (Hadoop 0.20.1) over a bunch
>> > of LZOP-compressed / indexed files (using Kevin Weil's package), and I
>> > have one map task that always fails in what looks like the same place
>> > as described in the previous post. I haven't yet done the
>> > experimentation mentioned above (isolating the input file corresponding
>> > to the failed map task, decompressing and recompressing it, and testing
>> > against it directly on local disk instead of HDFS, etc.).
>> >
>> > However, since I am crashing in exactly the same place, it seems likely
>> > this is related, so I thought I'd check on your progress in the
>> > meantime.
>> >
>> > FYI, my stack trace is below:
>> >
>> > 2010-04-05 18:15:16,895 FATAL org.apache.hadoop.mapred.TaskTracker: Error
>> > running child : java.lang.InternalError: lzo1x_decompress_safe returned:
>> >        at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
>> >        at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:303)
>> >        at com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:104)
>> >        at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:223)
>> >        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
>> >        at java.io.InputStream.read(InputStream.java:85)
>> >        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
>> >        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:187)
>> >        at com.hadoop.mapreduce.LzoLineRecordReader.nextKeyValue(LzoLineRecordReader.java:126)
>> >        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>> >        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>> >        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>> >        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>> >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> >        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> >
>> >
>> > Any update much appreciated,
>> > Alex
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera
