Re: LZO Compression Libraries don't appear to work properly with MultipleOutputs

2010-10-21 Thread Todd Lipcon
Hi Ed,

Sounds like this might be a bug, either in MultipleOutputs or in LZO.

Does it work properly with gzip compression? Which LZO implementation
are you using? The one from google code or the more up to date one
from github (either kevinweil's or mine)?

Any chance you could write a unit test that shows the issue?

Thanks
-Todd

On Thu, Oct 21, 2010 at 2:52 PM, ed  wrote:
> Hello everyone,
>
> I am having problems using MultipleOutputs with LZO compression (could be a
> bug or something wrong in my own code).
>
> In my driver I set
>
>     MultipleOutputs.addNamedOutput(job, "test", TextOutputFormat.class,
> NullWritable.class, Text.class);
>
> In my reducer I have:
>
>     MultipleOutputs mOutput = new
> MultipleOutputs(context);
>
>     public String generateFileName(Key key){
>        return "custom_file_name";
>     }
>
> Then in the reduce() method I have:
>
>     mOutput.write(mNullWritable, mValue, generateFileName(key));
>
> This results in creating LZO files that do not decompress properly (lzop -d
> throws the error "lzop: unexpected end of file: outputFile.lzo")
>
> If I switch back to the regular context.write(mNullWritable, mValue);
> everything works fine.
>
> Am I forgetting a step needed when using MultipleOutputs, or is this a
> bug/non-feature of using LZO compression in Hadoop?
>
> Thank you!
>
>
> ~Ed
>



-- 
Todd Lipcon
Software Engineer, Cloudera
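
(Ed's message above does not show how output compression was enabled, so the
driver-side settings below are an assumption. With the hadoop-lzo library,
compressing job output in the lzop container format that "lzop -d" reads might
look roughly like this sketch; names other than the Hadoop and hadoop-lzo APIs
are illustrative.)

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class DriverConfigSketch {
        // Assumed driver-side settings: job-wide output compression with
        // hadoop-lzo's LzopCodec, plus the named output from Ed's driver.
        public static void configureLzoOutput(Job job) {
            FileOutputFormat.setCompressOutput(job, true);
            FileOutputFormat.setOutputCompressorClass(job,
                com.hadoop.compression.lzo.LzopCodec.class);

            MultipleOutputs.addNamedOutput(job, "test", TextOutputFormat.class,
                NullWritable.class, Text.class);
        }
    }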


Re: LZO Compression Libraries don't appear to work properly with MultipleOutputs

2010-10-21 Thread ed
Hi Todd,

I don't have the code in front of me right now, but I was looking over the API
docs and it looks like I forgot to call close() on the MultipleOutputs.  I'll
post back if that fixes the problem.  If not, I'll put together a unit test.
Thanks!

~Ed

On Thu, Oct 21, 2010 at 6:31 PM, Todd Lipcon  wrote:

> Hi Ed,
>
> Sounds like this might be a bug, either in MultipleOutputs or in LZO.
>
> Does it work properly with gzip compression? Which LZO implementation
> are you using? The one from google code or the more up to date one
> from github (either kevinweil's or mine)?
>
> Any chance you could write a unit test that shows the issue?
>
> Thanks
> -Todd
>
> On Thu, Oct 21, 2010 at 2:52 PM, ed  wrote:
> > Hello everyone,
> >
> > I am having problems using MultipleOutputs with LZO compression (could be
> a
> > bug or something wrong in my own code).
> >
> > In my driver I set
> >
> > MultipleOutputs.addNamedOutput(job, "test", TextOutputFormat.class,
> > NullWritable.class, Text.class);
> >
> > In my reducer I have:
> >
> > MultipleOutputs mOutput = new
> > MultipleOutputs(context);
> >
> > public String generateFileName(Key key){
> >return "custom_file_name";
> > }
> >
> > Then in the reduce() method I have:
> >
> > mOutput.write(mNullWritable, mValue, generateFileName(key));
> >
> > This results in creating LZO files that do not decompress properly (lzop
> -d
> > throws the error "lzop: unexpected end of file: outputFile.lzo")
> >
> > If I switch back to the regular context.write(mNullWritable, mValue);
> > everything works fine.
> >
> > Am I forgetting a step needed when using MultipleOutputs, or is this a
> > bug/non-feature of using LZO compression in Hadoop?
> >
> > Thank you!
> >
> >
> > ~Ed
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


Re: LZO Compression Libraries don't appear to work properly with MultipleOutputs

2010-10-26 Thread ed
Calling close() on the MultipleOutputs object in the cleanup() method of
the reducer fixed the LZO file problem.  Thanks!

~Ed
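
(A minimal sketch of a reducer that closes its MultipleOutputs in cleanup(),
as described above. A Text input key is assumed for concreteness; the class
and field names are illustrative rather than Ed's actual code.)

    import java.io.IOException;

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class LzoMultipleOutputsReducer
            extends Reducer<Text, Text, NullWritable, Text> {

        private MultipleOutputs<NullWritable, Text> mOutput;

        @Override
        protected void setup(Context context) {
            mOutput = new MultipleOutputs<NullWritable, Text>(context);
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                // Route each record to a custom base file name.
                mOutput.write(NullWritable.get(), value, generateFileName(key));
            }
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // Closing here flushes the underlying record writers so the
            // compressed streams are properly finished; skipping it produces
            // the truncated .lzo files described in this thread.
            mOutput.close();
        }

        private String generateFileName(Text key) {
            return "custom_file_name";
        }
    }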

On Thu, Oct 21, 2010 at 9:12 PM, ed  wrote:

> Hi Todd,
>
> I don't have the code in front of me right now, but I was looking over the API
> docs and it looks like I forgot to call close() on the MultipleOutputs.  I'll
> post back if that fixes the problem.  If not, I'll put together a unit test.
> Thanks!
>
> ~Ed
>
>
> On Thu, Oct 21, 2010 at 6:31 PM, Todd Lipcon  wrote:
>
>> Hi Ed,
>>
>> Sounds like this might be a bug, either in MultipleOutputs or in LZO.
>>
>> Does it work properly with gzip compression? Which LZO implementation
>> are you using? The one from google code or the more up to date one
>> from github (either kevinweil's or mine)?
>>
>> Any chance you could write a unit test that shows the issue?
>>
>> Thanks
>> -Todd
>>
>> On Thu, Oct 21, 2010 at 2:52 PM, ed  wrote:
>> > Hello everyone,
>> >
>> > I am having problems using MultipleOutputs with LZO compression (could
>> be a
>> > bug or something wrong in my own code).
>> >
>> > In my driver I set
>> >
>> > MultipleOutputs.addNamedOutput(job, "test", TextOutputFormat.class,
>> > NullWritable.class, Text.class);
>> >
>> > In my reducer I have:
>> >
>> > MultipleOutputs mOutput = new
>> > MultipleOutputs(context);
>> >
>> > public String generateFileName(Key key){
>> >return "custom_file_name";
>> > }
>> >
>> > Then in the reduce() method I have:
>> >
>> > mOutput.write(mNullWritable, mValue, generateFileName(key));
>> >
>> > This results in creating LZO files that do not decompress properly (lzop
>> -d
>> > throws the error "lzop: unexpected end of file: outputFile.lzo")
>> >
>> > If I switch back to the regular context.write(mNullWritable, mValue);
>> > everything works fine.
>> >
>> > Am I forgetting a step needed when using MultipleOutputs, or is this a
>> > bug/non-feature of using LZO compression in Hadoop?
>> >
>> > Thank you!
>> >
>> >
>> > ~Ed
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>