LZO Compression Libraries don't appear to work properly with MultipleOutputs
Hello everyone,

I am having problems using MultipleOutputs with LZO compression (this could be a bug or something wrong in my own code). In my driver I set:

    MultipleOutputs.addNamedOutput(job, "test", TextOutputFormat.class, NullWritable.class, Text.class);

In my reducer I have:

    MultipleOutputs<NullWritable, Text> mOutput = new MultipleOutputs<NullWritable, Text>(context);

    public String generateFileName(Key key) {
        return custom_file_name;
    }

Then in the reduce() method I have:

    mOutput.write(mNullWritable, mValue, generateFileName(key));

This results in LZO files that do not decompress properly (lzop -d throws the error "lzop: unexpected end of file: outputFile.lzo"). If I switch back to the regular

    context.write(mNullWritable, mValue);

everything works fine. Am I forgetting a step needed when using MultipleOutputs, or is this a bug/non-feature of using LZO compression in Hadoop?

Thank you!

~Ed
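The "unexpected end of file" symptom is characteristic of a compressed stream that was never finished: the codec's final block and trailer are missing. A minimal, self-contained sketch (using java.util.zip's gzip rather than LZO, which needs native libraries; the class and method names here are mine, not from this thread) shows the effect:

```java
import java.io.*;
import java.util.zip.*;

public class UnclosedStreamDemo {
    // Compress with a properly closed stream: trailer gets written.
    static byte[] compressClosed(byte[] payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(payload);
        }
        return buf.toByteArray();
    }

    // Compress but never close/finish the stream: the final deflate block
    // and gzip trailer are missing, analogous to a writer that was never
    // closed before the task ended.
    static byte[] compressUnclosed(byte[] payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(buf);
        gz.write(payload);
        // deliberately no gz.close()
        return buf.toByteArray();
    }

    // Try to read the whole stream back; a truncated stream throws
    // "Unexpected end of ZLIB input stream" (the gzip analog of
    // lzop's "unexpected end of file").
    static boolean decompresses(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) { }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[20000];  // highly compressible zeros
        System.out.println("closed:   " + decompresses(compressClosed(payload)));
        System.out.println("unclosed: " + decompresses(compressUnclosed(payload)));
    }
}
```

Running this prints `closed: true` and `unclosed: false`: the bytes of the unclosed stream are unreadable even though data was written through it.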
Re: LZO Compression Libraries don't appear to work properly with MultipleOutputs
Hi Ed,

Sounds like this might be a bug, either in MultipleOutputs or in LZO. Does it work properly with gzip compression? Which LZO implementation are you using? The one from Google Code, or the more up-to-date one from GitHub (either kevinweil's or mine)? Any chance you could write a unit test that shows the issue?

Thanks
-Todd

On Thu, Oct 21, 2010 at 2:52 PM, ed hadoopn...@gmail.com wrote: [quoted text hidden]

--
Todd Lipcon
Software Engineer, Cloudera
Re: LZO Compression Libraries don't appear to work properly with MultipleOutputs
Hi Todd,

I don't have the code in front of me right now, but I was looking over the API docs and it looks like I forgot to call close() on the MultipleOutputs object. I'll post back if that fixes the problem. If not, I'll put together a unit test.

Thanks!

~Ed

On Thu, Oct 21, 2010 at 6:31 PM, Todd Lipcon t...@cloudera.com wrote: [quoted text hidden]
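The fix Ed suspects — closing the MultipleOutputs in the reducer's cleanup() so the compressed output streams are finished — might look like the sketch below. This is illustrative only (the class name, key/value types, and file-naming scheme are assumptions, not Ed's actual code); the essential line is the mOutput.close() in cleanup():

```java
import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class LzoReducer extends Reducer<Text, Text, NullWritable, Text> {
    private MultipleOutputs<NullWritable, Text> mOutput;

    @Override
    protected void setup(Context context) {
        mOutput = new MultipleOutputs<NullWritable, Text>(context);
    }

    // Illustrative naming scheme standing in for Ed's custom_file_name.
    private String generateFileName(Text key) {
        return "custom_" + key.toString();
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            mOutput.write(NullWritable.get(), value, generateFileName(key));
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Without this, the codec never writes its final block/trailer and
        // the output files end mid-stream -- hence lzop's
        // "unexpected end of file" on decompression.
        mOutput.close();
    }
}
```

The plain context.write() path works without any of this because the framework closes the default RecordWriter itself at the end of the task; side outputs opened through MultipleOutputs are the caller's responsibility.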