LZO Compression Libraries don't appear to work properly with MultipleOutputs

2010-10-21 Thread ed
Hello everyone,

I am having problems using MultipleOutputs with LZO compression (it could be
a bug or something wrong in my own code).

In my driver I set:

    MultipleOutputs.addNamedOutput(job, "test", TextOutputFormat.class,
        NullWritable.class, Text.class);
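
For reference, the rest of the driver setup looks roughly like this (a sketch
from memory, assuming the hadoop-lzo LzopCodec and the new mapreduce API; the
class name, job name, and paths are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import com.hadoop.compression.lzo.LzopCodec;

    public class LzoMultipleOutputsDriver {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "lzo-multiple-outputs");
            job.setJarByClass(LzoMultipleOutputsDriver.class);
            // ... set mapper/reducer classes and input path here ...
            job.setOutputFormatClass(TextOutputFormat.class);
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);

            // Compress the job output with LZO (LzopCodec writes .lzo
            // files that lzop -d can decompress).
            FileOutputFormat.setCompressOutput(job, true);
            FileOutputFormat.setOutputCompressorClass(job, LzopCodec.class);
            FileOutputFormat.setOutputPath(job, new Path("/tmp/out"));

            // Register the named output used from the reducer.
            MultipleOutputs.addNamedOutput(job, "test",
                TextOutputFormat.class, NullWritable.class, Text.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }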

In my reducer I have:

    MultipleOutputs<NullWritable, Text> mOutput =
        new MultipleOutputs<NullWritable, Text>(context);

    public String generateFileName(Key key) {
        return custom_file_name;
    }

Then in the reduce() method I have:

    mOutput.write(mNullWritable, mValue, generateFileName(key));

This results in LZO files that do not decompress properly (lzop -d fails
with "lzop: unexpected end of file: outputFile.lzo").

If I switch back to the regular context.write(mNullWritable, mValue),
everything works fine.

Am I forgetting a step needed when using MultipleOutputs, or is this a
bug/non-feature of using LZO compression in Hadoop?

Thank you!


~Ed


Re: LZO Compression Libraries don't appear to work properly with MultipleOutputs

2010-10-21 Thread Todd Lipcon
Hi Ed,

Sounds like this might be a bug, either in MultipleOutputs or in LZO.

Does it work properly with gzip compression? Which LZO implementation
are you using? The one from Google Code, or the more up-to-date one
from GitHub (either kevinweil's or mine)?
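
To rule out the LZO codec itself, you can swap in the built-in gzip codec
in your driver and re-run the job, for example:

    // Same driver as before, but with the built-in gzip codec instead
    // of LZO:
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job,
        org.apache.hadoop.io.compress.GzipCodec.class);

If the gzip output is also truncated, the problem is in MultipleOutputs
rather than in the LZO libraries.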

Any chance you could write a unit test that shows the issue?

Thanks
-Todd


-- 
Todd Lipcon
Software Engineer, Cloudera


Re: LZO Compression Libraries don't appear to work properly with MultipleOutputs

2010-10-21 Thread ed
Hi Todd,

I don't have the code in front of me right now, but I was looking over the
API docs and it looks like I forgot to call close() on the MultipleOutputs
object. I'll post back if that fixes the problem. If not, I'll put together
a unit test.
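
For anyone following along, the fix should look roughly like this (a sketch
from memory, with Text standing in for my actual key/value types and a
placeholder file name):

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class MyReducer extends Reducer<Text, Text, NullWritable, Text> {

        private MultipleOutputs<NullWritable, Text> mOutput;
        private final NullWritable mNullWritable = NullWritable.get();

        @Override
        protected void setup(Context context) {
            mOutput = new MultipleOutputs<NullWritable, Text>(context);
        }

        private String generateFileName(Text key) {
            return "custom_file_name";  // placeholder, as in my first mail
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values,
                Context context) throws IOException, InterruptedException {
            for (Text value : values) {
                mOutput.write(mNullWritable, value, generateFileName(key));
            }
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // This was the missing step: close() flushes and closes the
            // RecordWriters that MultipleOutputs creates. Without it the
            // LZO streams are never finished, which would explain the
            // "unexpected end of file" error from lzop -d.
            mOutput.close();
        }
    }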
Thanks!

~Ed
