Re: Errors reading lzo-compressed files from Hadoop
Doh, a couple more silly bugs in there. Don't use that version quite yet - I'll put up a better patch later today. (Thanks to Kevin and Ted Yu for pointing out the additional problems.)

-Todd

On Wed, Apr 7, 2010 at 5:24 PM, Todd Lipcon t...@cloudera.com wrote:
> For Dmitriy and anyone else who has seen this error, I just committed a fix to my github repository:
> http://github.com/toddlipcon/hadoop-lzo/commit/f3bc3f8d003bb8e24f254b25bca2053f731cdd58
>
> The problem turned out to be an assumption that InputStream.read() would return all the bytes that were asked for. That is almost always true on local filesystems, but on HDFS it is not true when the read crosses a block boundary - so one might see this error every couple of TB of lzo-compressed data.
>
> Big thanks to Alex Roetter, who was able to provide a file that exhibited the bug!
>
> Thanks
> -Todd
>
> On Tue, Apr 6, 2010 at 10:35 AM, Todd Lipcon t...@cloudera.com wrote:
>> Hi Alex,
>>
>> Unfortunately I wasn't able to reproduce, and the data Dmitriy is working with is sensitive. Do you have some data you could upload (or send me off list) that exhibits the issue?
>>
>> -Todd

--
Todd Lipcon
Software Engineer, Cloudera
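What makes the bug Todd describes so subtle is the InputStream.read(byte[], int, int) contract: it blocks only until at least one byte is available, not until the buffer is full, so a short read is always legal. Below is a minimal sketch of the buggy assumption versus the read-fully loop the fix amounts to; the class and method names are illustrative, not the actual hadoop-lzo code.

    import java.io.EOFException;
    import java.io.IOException;
    import java.io.InputStream;

    public class ShortReadSketch {

        // Buggy pattern: assumes a single read() fills the buffer. On a local
        // FileInputStream this almost always holds; on an HDFS stream, a read
        // that crosses a block boundary may legally return fewer bytes, leaving
        // the compressed block truncated and lzo1x_decompress_safe failing on it.
        static int readCompressedBlockBuggy(InputStream in, byte[] buf, int len)
                throws IOException {
            return in.read(buf, 0, len); // may return < len; caller ignored that
        }

        // Fixed pattern: loop until the requested number of bytes has arrived
        // or EOF is reached, which is all the InputStream contract guarantees.
        static void readFully(InputStream in, byte[] buf, int off, int len)
                throws IOException {
            while (len > 0) {
                int n = in.read(buf, off, len);
                if (n < 0) {
                    throw new EOFException("Premature EOF in compressed stream");
                }
                off += n;
                len -= n;
            }
        }
    }

Hadoop itself ships a helper for exactly this purpose, org.apache.hadoop.io.IOUtils.readFully, which is the idiomatic way to avoid the pitfall.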
Re: Errors reading lzo-compressed files from Hadoop
OK, fixed, unit tests passing again. If anyone sees any more problems, let one of us know!

Thanks
-Todd

On Thu, Apr 8, 2010 at 10:39 AM, Todd Lipcon t...@cloudera.com wrote:
> Doh, a couple more silly bugs in there. Don't use that version quite yet - I'll put up a better patch later today. (Thanks to Kevin and Ted Yu for pointing out the additional problems.)
>
> -Todd

--
Todd Lipcon
Software Engineer, Cloudera
Re: Errors reading lzo-compressed files from Hadoop
Both Kevin's and Todd's branches now pass my tests. Thanks again, Todd.

-D

On Thu, Apr 8, 2010 at 10:46 AM, Todd Lipcon t...@cloudera.com wrote:
> OK, fixed, unit tests passing again. If anyone sees any more problems, let one of us know!
>
> Thanks
> -Todd
Re: Errors reading lzo-compressed files from Hadoop
Todd Lipcon t...@... writes:
> Hey Dmitriy,
>
> This is very interesting (and worrisome in a way!). I'll try to take a look this afternoon.
>
> -Todd

Hi Todd,

I wanted to see if you made any progress on this front. I'm seeing a very similar error trying to run a MapReduce job (Hadoop 0.20.1) over a bunch of LZOP-compressed / indexed files (using Kevin Weil's package), and I have one map task that always fails in what looks like the same place as described in the previous post.

I haven't yet done the experimentation mentioned above (isolating the input file corresponding to the failed map task, decompressing and recompressing it, testing against local disk instead of HDFS, etc.). However, since I am crashing in exactly the same place, it seems likely this is related, and I thought I'd check on your work in the meantime.

FYI, my stack trace is below:

2010-04-05 18:15:16,895 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.InternalError: lzo1x_decompress_safe returned:
    at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
    at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:303)
    at com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:104)
    at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:223)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
    at java.io.InputStream.read(InputStream.java:85)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:187)
    at com.hadoop.mapreduce.LzoLineRecordReader.nextKeyValue(LzoLineRecordReader.java:126)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

Any update much appreciated,
Alex
Re: Errors reading lzo-compressed files from Hadoop
Hey Dmitriy,

This is very interesting (and worrisome in a way!). I'll try to take a look this afternoon.

-Todd

On Thu, Apr 1, 2010 at 12:16 AM, Dmitriy Ryaboy dmit...@twitter.com wrote:
> Hi folks,
>
> We write a lot of lzo-compressed files to HDFS -- some via scribe, some using internal tools. Occasionally, we discover that the created lzo files cannot be read from HDFS -- they get through some (often large) portion of the file, and then fail with the following stack trace:
>
> Exception in thread "main" java.lang.InternalError: lzo1x_decompress_safe returned:
>     at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
>     at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:303)
>     at com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:122)
>     at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:223)
>     at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
>     at java.io.InputStream.read(InputStream.java:85)
>     at com.twitter.twadoop.jobs.LzoReadTest.main(LzoReadTest.java:51)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> The initial thought is of course that the lzo file is corrupt -- however, plain-jane lzop is able to read these files. Moreover, if we pull the files out of hadoop, uncompress them, compress them again, and put them back into HDFS, we can usually read them from HDFS as well.
>
> We've been thinking that this strange behavior is caused by a bug in the hadoop-lzo libraries (we use the version with the Twitter and Cloudera fixes, on github: http://github.com/kevinweil/hadoop-lzo ). However, today I discovered that, using the exact same environment, codec, and InputStreams, we can successfully read from the local file system but cannot read from HDFS. This appears to point at possible issues in FSDataInputStream or further down the stack.
>
> Here's a small test class that tries to read the same file from HDFS and from the local FS, and the output of running it on our cluster. We are using the CDH2 distribution.
> https://gist.github.com/e1bf7e4327c7aef56303
>
> Any ideas on what could be going on?
>
> Thanks,
> -Dmitriy

--
Todd Lipcon
Software Engineer, Cloudera
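Dmitriy's linked gist is not reproduced here, but the kind of harness he describes - opening the same .lzo file through the same codec from both a file:// URI and an hdfs:// URI and comparing behavior - can be sketched roughly as below. This is a minimal sketch, not the actual gist: the class name, buffer size, and argument handling are invented for illustration; only the Hadoop codec APIs are real, and it assumes io.compression.codecs is configured to include com.hadoop.compression.lzo.LzopCodec, as hadoop-lzo requires.

    import java.io.InputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class LzoReadCheck {
        public static void main(String[] args) throws Exception {
            // Pass the same file twice, once per scheme, e.g.
            //   hdfs://namenode/logs/part-00000.lzo
            //   file:///tmp/part-00000.lzo
            Configuration conf = new Configuration();
            CompressionCodecFactory codecs = new CompressionCodecFactory(conf);

            for (String arg : args) {
                Path path = new Path(arg);
                // The filesystem (local vs HDFS) is chosen by the URI scheme;
                // everything downstream of fs.open() is identical.
                FileSystem fs = path.getFileSystem(conf);
                // Pick the codec from the file extension (.lzo -> LzopCodec),
                // the same way TextInputFormat would.
                CompressionCodec codec = codecs.getCodec(path);
                if (codec == null) {
                    System.err.println("No codec found for " + path);
                    continue;
                }
                long total = 0;
                byte[] buf = new byte[64 * 1024];
                InputStream in = codec.createInputStream(fs.open(path));
                try {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        total += n;
                    }
                } finally {
                    in.close();
                }
                System.out.println(path + ": decompressed " + total + " bytes");
            }
        }
    }

Running a harness like this against both URIs is what separated "corrupt file" from "stream-handling bug" in this thread: the local read succeeds while the HDFS read fails partway through, which is consistent with Todd's eventual diagnosis of an unhandled short read at an HDFS block boundary.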