Hi Chris,

I also couldn't decompress the output by simply running a map/reduce job with "cat" as the mapper and then doing dfs -get.
I will try using LzopCodec.

Thanks,
- Alex

On Fri, Sep 19, 2008 at 2:34 AM, Chris Douglas <[EMAIL PROTECTED]> wrote:
> It's probably not corrupted. If by "compressed lzo file" you mean something
> readable with lzop, you should use LzopCodec, not LzoCodec. LzoCodec doesn't
> write header information required by that tool.
>
> Guessing at the output format (length encoded blocks of data compressed by
> the lzo algorithm), it's probably readable by TextInputFormat, but YMMV. If
> you wanted to use the C tool, you'll have to add the appropriate header (see
> lzop source or LzopCodec) using a hex editor and four zero bytes to the end
> of the file. You can also use lzo compression in SequenceFiles. -C
>
> On Sep 18, 2008, at 9:15 PM, Alex Feinberg wrote:
>
>> Hello,
>>
>> I am running a custom crawler (written internally) using hadoop
>> streaming. I am attempting to compress the output using LZO, but
>> instead I am receiving corrupted output that is neither in the
>> format I am aiming for nor as a compressed lzo file. Is this a known
>> issue? Is there anything I am doing inherently wrong?
>>
>> Here is the command line I am using:
>>
>> ~/hadoop/bin/hadoop jar
>> /home/hadoop/hadoop/contrib/streaming/hadoop-0.17.2.1-streaming.jar
>> -inputformat org.apache.hadoop.mapred.SequenceFileAsTextInputFormat
>> -mapper /home/hadoop/crawl_map -reducer NONE -jobconf
>> mapred.output.compress=true -jobconf
>> mapred.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec
>> -input pages -output crawl.lzo -jobconf mapred.reduce.tasks=0
>>
>> The input is in the form of URLs stored as a SequenceFile.
>>
>> When running this without LZO compression, no such issue occurs.
>>
>> Is there any way for me to recover the corrupted data as to be able to
>> process it by other hadoop jobs or offline?
>>
>> Thanks,
>>
>> --
>> Alex Feinberg
>> Platform Engineer, SocialMedia Networks
>
>

--
Alex Feinberg
Platform Engineer, SocialMedia Networks
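
P.S. For anyone hitting the same thing, here is a rough sketch of how one might check whether a job's output file carries the header that lzop expects. It relies on the 9-byte magic number defined in the lzop source and on Chris's point that LzoCodec output lacks that header. The function name looks_like_lzop and the file name in the usage comment are made up for illustration; the check only inspects the first bytes and says nothing about whether the rest of the file is valid.

```python
# lzop's 9-byte file signature, as defined in the lzop source.
# Assumption (from the thread above): LzoCodec output does not start with it.
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"


def looks_like_lzop(path):
    """Return True if the file at `path` begins with the lzop magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(LZOP_MAGIC)) == LZOP_MAGIC


if __name__ == "__main__":
    import sys

    # Usage: python check_lzop.py part-00000.lzo   (hypothetical file name)
    for name in sys.argv[1:]:
        if looks_like_lzop(name):
            print(f"{name}: has an lzop header")
        else:
            print(f"{name}: no lzop header")
```

Run it against a part file pulled down with dfs -get. If it reports no lzop header, the file is probably raw LzoCodec output rather than corrupted data, and the lzop tool will refuse it until a header is added.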