OK, fixed, unit tests passing again. If anyone sees any more problems let one of us know!
Thanks,
-Todd

On Thu, Apr 8, 2010 at 10:39 AM, Todd Lipcon <t...@cloudera.com> wrote:
> Doh, a couple more silly bugs in there. Don't use that version quite yet -
> I'll put up a better patch later today. (Thanks to Kevin and Ted Yu for
> pointing out the additional problems.)
>
> -Todd
>
> On Wed, Apr 7, 2010 at 5:24 PM, Todd Lipcon <t...@cloudera.com> wrote:
>> For Dmitriy and anyone else who has seen this error, I just committed a
>> fix to my github repository:
>>
>> http://github.com/toddlipcon/hadoop-lzo/commit/f3bc3f8d003bb8e24f254b25bca2053f731cdd58
>>
>> The problem turned out to be an assumption that InputStream.read() would
>> return all the bytes that were asked for. That assumption almost always
>> holds on local filesystems, but on HDFS it breaks whenever a read crosses
>> a block boundary. So, every couple of TB of LZO-compressed data, one
>> might see this error.
>>
>> Big thanks to Alex Roetter, who was able to provide a file that exhibited
>> the bug!
>>
>> Thanks,
>> -Todd
>>
>> On Tue, Apr 6, 2010 at 10:35 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>> Hi Alex,
>>> Unfortunately I wasn't able to reproduce the problem, and the data
>>> Dmitriy is working with is sensitive.
>>> Do you have some data you could upload (or send me off-list) that
>>> exhibits the issue?
>>> -Todd
>>>
>>> On Tue, Apr 6, 2010 at 9:50 AM, Alex Roetter <aroet...@imageshack.net> wrote:
>>>> Todd Lipcon <t...@...> writes:
>>>>
>>>>> Hey Dmitriy,
>>>>>
>>>>> This is very interesting (and worrisome, in a way!). I'll try to take
>>>>> a look this afternoon.
>>>>>
>>>>> -Todd
>>>>
>>>> Hi Todd,
>>>>
>>>> I wanted to see if you made any progress on this front.
>>>> I'm seeing a very similar error trying to run an MR job (Hadoop 0.20.1)
>>>> over a bunch of LZOP-compressed / indexed files (using Kevin Weil's
>>>> package), and I have one map task that always fails in what looks like
>>>> the same place as described in the previous post. I haven't yet done
>>>> the experimentation mentioned above (isolating the input file
>>>> corresponding to the failed map task, decompressing / recompressing it,
>>>> testing it while operating directly on local disk instead of HDFS,
>>>> etc.).
>>>>
>>>> However, since I am crashing in exactly the same place, it seems likely
>>>> this is related, and I thought I'd check on your work in the meantime.
>>>>
>>>> FYI, my stack trace is below:
>>>>
>>>> 2010-04-05 18:15:16,895 FATAL org.apache.hadoop.mapred.TaskTracker:
>>>> Error running child : java.lang.InternalError: lzo1x_decompress_safe returned:
>>>>     at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
>>>>     at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:303)
>>>>     at com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:104)
>>>>     at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:223)
>>>>     at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
>>>>     at java.io.InputStream.read(InputStream.java:85)
>>>>     at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
>>>>     at org.apache.hadoop.util.LineReader.readLine(LineReader.java:187)
>>>>     at com.hadoop.mapreduce.LzoLineRecordReader.nextKeyValue(LzoLineRecordReader.java:126)
>>>>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>>>>     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>
>>>> Any update much appreciated,
>>>> Alex

--
Todd Lipcon
Software Engineer, Cloudera
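For readers hitting the same class of bug elsewhere: the fix Todd describes boils down to looping on read() until the requested byte count arrives, since InputStream.read() is only contracted to return *at least one* byte (or -1 at EOF), not all of them. Below is a minimal sketch of that pattern; the class names (ShortReadStream, ReadFully) are illustrative only and do not appear in the actual hadoop-lzo patch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Mimics an HDFS input stream that returns a short read at a block
// boundary: this stream hands back at most 4 bytes per read() call.
class ShortReadStream extends ByteArrayInputStream {
    ShortReadStream(byte[] buf) { super(buf); }

    @Override
    public int read(byte[] b, int off, int len) {
        return super.read(b, off, Math.min(len, 4));
    }
}

public class ReadFully {
    // Loop until 'len' bytes arrive or the stream ends. This is the shape
    // of the fix: never assume a single read() fills the buffer.
    static int readFully(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n == -1) break; // end of stream
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        byte[] buf = new byte[100];

        // A single read() call stops at the simulated block boundary.
        int single = new ShortReadStream(data).read(buf, 0, 100);
        System.out.println("single read() returned: " + single);   // 4

        // The loop drains the stream until all 100 bytes are in hand.
        int full = readFully(new ShortReadStream(data), buf, 0, 100);
        System.out.println("readFully() returned: " + full);       // 100
    }
}
```

Hadoop itself ships a helper with this behavior (org.apache.hadoop.io.IOUtils.readFully), which is generally preferable to hand-rolling the loop.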