Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-08 Thread Todd Lipcon
On Sat, Aug 7, 2010 at 9:18 PM, Alex Luya  wrote:

> Does it (hadoop-lzo) only work with Hadoop 0.20, and not with 0.21 or 0.22?
>

I don't know that anyone has tested it against 0.21 or trunk, but I don't
see any reason it won't work just fine -- the APIs are pretty stable
from 0.20 onward.

-Todd


> On Friday, August 06, 2010 09:05:47 am Todd Lipcon wrote:
> > On Thu, Aug 5, 2010 at 4:52 PM, Bobby Dennett wrote:
> > > Hi Josh,
> > >
> > > No real pain points... just trying to investigate/research the "best"
> > > way to create the necessary libraries and jar files to support LZO
> > > compression in Hadoop. In particular, there are the 2 "repositories"
> > > to build from and I am trying to find out if one should be used over
> > > the other. For instance, in your previous posting, you refer to
> > > hadoop-gpl-compression while the Twitter blog post from last year
> > > mentions the Hadoop-LZO project. Briefly looking, it seems Hadoop-LZO
> > > is preferable but we're curious if there are any caveats/gotchas we
> > > should be aware of.
> >
> > Yes, definitely use the hadoop-lzo project from github -- either from my
> > repo or from kevinweil's (the two are kept in sync)
> >
> > The repo on Google Code has a number of known bugs, which is why we
> > forked it over to github last year.
> >
> > -Todd
> >
> > On Thu, Aug 5, 2010 at 15:59, Josh Patterson  wrote:
> > > > Bobby,
> > > >
> > > > We're working hard to make compression easier; the biggest hurdle
> > > > currently is the licensing issue around the LZO codec libs (GPL,
> > > > which is not compatible with the ASF's BSD-style license).
> > > >
> > > > Outside of making the changes to the mapred-site.xml file, what do
> > > > you view as the biggest pain point with your setup?
> > > >
> > > > Josh Patterson
> > > > Cloudera
> > > >
> > > > On Thu, Aug 5, 2010 at 6:52 PM, Bobby Dennett <bdennett%2bsoftw...@gmail.com> wrote:
> > > >> We are looking to enable LZO compression of the map outputs on our
> > > >> Cloudera 0.20.1 cluster. It seems there are various sets of
> > > >> instructions available and I am curious what your thoughts are
> > > >> regarding which one would be best for our Hadoop distribution and OS
> > > >> (Ubuntu 8.04 64-bit). In particular, hadoop-gpl-compression
> > > >> (http://code.google.com/p/hadoop-gpl-compression) vs. hadoop-lzo
> > > >> (http://github.com/kevinweil/hadoop-lzo).
> > > >>
> > > >> Some of what appear to be the better instructions/guides out there:
> > > >> * Josh Patterson's reply on June 25th to the "Newbie to HDFS
> > > >> compression" thread --
> > >
> > >
> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201006.mbox/%
> > > 3caanlktileo-q8useip8y3na9pdyhlyufippr0in0lk...@mail.gmail.com%3e
> > >
> > > >> * hadoop-gpl-compression FAQ --
> > > >> http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
> > > >> * "Hadoop at Twitter (part 1): Splittable LZO Compression" blog post
> > > >> --
> > >
> > >
> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-
> > > lzo-compression/
> > >
> > > >> Thanks in advance,
> > > >> -Bobby
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-07 Thread Alex Luya
Does it (hadoop-lzo) only work with Hadoop 0.20, and not with 0.21 or 0.22?
On Friday, August 06, 2010 09:05:47 am Todd Lipcon wrote:
> On Thu, Aug 5, 2010 at 4:52 PM, Bobby Dennett  wrote:
> > Hi Josh,
> > 
> > No real pain points... just trying to investigate/research the "best"
> > way to create the necessary libraries and jar files to support LZO
> > compression in Hadoop. In particular, there are the 2 "repositories"
> > to build from and I am trying to find out if one should be used over
> > the other. For instance, in your previous posting, you refer to
> > hadoop-gpl-compression while the Twitter blog post from last year
> > mentions the Hadoop-LZO project. Briefly looking, it seems Hadoop-LZO
> > is preferable but we're curious if there are any caveats/gotchas we
> > should be aware of.
> 
> Yes, definitely use the hadoop-lzo project from github -- either from my
> repo or from kevinweil's (the two are kept in sync)
> 
> The repo on Google Code has a number of known bugs, which is why we forked
> it over to github last year.
> 
> -Todd
> 
> On Thu, Aug 5, 2010 at 15:59, Josh Patterson  wrote:
> > > Bobby,
> > > 
> > > > We're working hard to make compression easier; the biggest hurdle
> > > > currently is the licensing issue around the LZO codec libs (GPL,
> > > > which is not compatible with the ASF's BSD-style license).
> > > >
> > > > Outside of making the changes to the mapred-site.xml file, what do
> > > > you view as the biggest pain point with your setup?
> > > 
> > > Josh Patterson
> > > Cloudera
> > > 
> > > On Thu, Aug 5, 2010 at 6:52 PM, Bobby Dennett wrote:
> > >> We are looking to enable LZO compression of the map outputs on our
> > >> Cloudera 0.20.1 cluster. It seems there are various sets of
> > >> instructions available and I am curious what your thoughts are
> > >> regarding which one would be best for our Hadoop distribution and OS
> > >> (Ubuntu 8.04 64-bit). In particular, hadoop-gpl-compression
> > >> (http://code.google.com/p/hadoop-gpl-compression) vs. hadoop-lzo
> > >> (http://github.com/kevinweil/hadoop-lzo).
> > >> 
> > >> Some of what appear to be the better instructions/guides out there:
> > >> * Josh Patterson's reply on June 25th to the "Newbie to HDFS
> > >> compression" thread --
> > >> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201006.mbox/%3caanlktileo-q8useip8y3na9pdyhlyufippr0in0lk...@mail.gmail.com%3e
> > >> * hadoop-gpl-compression FAQ --
> > >> http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
> > >> * "Hadoop at Twitter (part 1): Splittable LZO Compression" blog post
> > >> --
> > >> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
> > >>
> > >> Thanks in advance,
> > >> -Bobby


Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Todd Lipcon
On Thu, Aug 5, 2010 at 4:52 PM, Bobby Dennett  wrote:

> Hi Josh,
>
> No real pain points... just trying to investigate/research the "best"
> way to create the necessary libraries and jar files to support LZO
> compression in Hadoop. In particular, there are the 2 "repositories"
> to build from and I am trying to find out if one should be used over
> the other. For instance, in your previous posting, you refer to
> hadoop-gpl-compression while the Twitter blog post from last year
> mentions the Hadoop-LZO project. Briefly looking, it seems Hadoop-LZO
> is preferable but we're curious if there are any caveats/gotchas we
> should be aware of.
>

Yes, definitely use the hadoop-lzo project from github -- either from my
repo or from kevinweil's (the two are kept in sync)

The repo on Google Code has a number of known bugs, which is why we forked
it over to github last year.

-Todd

On Thu, Aug 5, 2010 at 15:59, Josh Patterson  wrote:
> > Bobby,
> >
> > We're working hard to make compression easier; the biggest hurdle
> > currently is the licensing issue around the LZO codec libs (GPL,
> > which is not compatible with the ASF's BSD-style license).
> >
> > Outside of making the changes to the mapred-site.xml file, what do
> > you view as the biggest pain point with your setup?
> >
> > Josh Patterson
> > Cloudera
> >
> > On Thu, Aug 5, 2010 at 6:52 PM, Bobby Dennett wrote:
> >> We are looking to enable LZO compression of the map outputs on our
> >> Cloudera 0.20.1 cluster. It seems there are various sets of
> >> instructions available and I am curious what your thoughts are
> >> regarding which one would be best for our Hadoop distribution and OS
> >> (Ubuntu 8.04 64-bit). In particular, hadoop-gpl-compression
> >> (http://code.google.com/p/hadoop-gpl-compression) vs. hadoop-lzo
> >> (http://github.com/kevinweil/hadoop-lzo).
> >>
> >> Some of what appear to be the better instructions/guides out there:
> >> * Josh Patterson's reply on June 25th to the "Newbie to HDFS
> >> compression" thread --
> >> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201006.mbox/%3caanlktileo-q8useip8y3na9pdyhlyufippr0in0lk...@mail.gmail.com%3e
> >> * hadoop-gpl-compression FAQ --
> >> http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
> >> * "Hadoop at Twitter (part 1): Splittable LZO Compression" blog post
> >> --
> >> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
> >>
> >> Thanks in advance,
> >> -Bobby
> >>
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Bobby Dennett
Hi Josh,

No real pain points... just trying to investigate/research the "best"
way to create the necessary libraries and jar files to support LZO
compression in Hadoop. In particular, there are the 2 "repositories"
to build from and I am trying to find out if one should be used over
the other. For instance, in your previous posting, you refer to
hadoop-gpl-compression while the Twitter blog post from last year
mentions the Hadoop-LZO project. Briefly looking, it seems Hadoop-LZO
is preferable but we're curious if there are any caveats/gotchas we
should be aware of.

Thanks,
-Bobby

On Thu, Aug 5, 2010 at 15:59, Josh Patterson  wrote:
> Bobby,
>
> We're working hard to make compression easier; the biggest hurdle
> currently is the licensing issue around the LZO codec libs (GPL,
> which is not compatible with the ASF's BSD-style license).
>
> Outside of making the changes to the mapred-site.xml file, what do
> you view as the biggest pain point with your setup?
>
> Josh Patterson
> Cloudera
>
> On Thu, Aug 5, 2010 at 6:52 PM, Bobby Dennett wrote:
>> We are looking to enable LZO compression of the map outputs on our
>> Cloudera 0.20.1 cluster. It seems there are various sets of
>> instructions available and I am curious what your thoughts are
>> regarding which one would be best for our Hadoop distribution and OS
>> (Ubuntu 8.04 64-bit). In particular, hadoop-gpl-compression
>> (http://code.google.com/p/hadoop-gpl-compression) vs. hadoop-lzo
>> (http://github.com/kevinweil/hadoop-lzo).
>>
>> Some of what appear to be the better instructions/guides out there:
>> * Josh Patterson's reply on June 25th to the "Newbie to HDFS
>> compression" thread --
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201006.mbox/%3caanlktileo-q8useip8y3na9pdyhlyufippr0in0lk...@mail.gmail.com%3e
>> * hadoop-gpl-compression FAQ --
>> http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
>> * "Hadoop at Twitter (part 1): Splittable LZO Compression" blog post
>> -- 
>> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
>>
>> Thanks in advance,
>> -Bobby
>>
>


Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Josh Patterson
Bobby,

We're working hard to make compression easier; the biggest hurdle
currently is the licensing issue around the LZO codec libs (GPL,
which is not compatible with the ASF's BSD-style license).

Outside of making the changes to the mapred-site.xml file, what do
you view as the biggest pain point with your setup?

Josh Patterson
Cloudera

On Thu, Aug 5, 2010 at 6:52 PM, Bobby Dennett wrote:
> We are looking to enable LZO compression of the map outputs on our
> Cloudera 0.20.1 cluster. It seems there are various sets of
> instructions available and I am curious what your thoughts are
> regarding which one would be best for our Hadoop distribution and OS
> (Ubuntu 8.04 64-bit). In particular, hadoop-gpl-compression
> (http://code.google.com/p/hadoop-gpl-compression) vs. hadoop-lzo
> (http://github.com/kevinweil/hadoop-lzo).
>
> Some of what appear to be the better instructions/guides out there:
> * Josh Patterson's reply on June 25th to the "Newbie to HDFS
> compression" thread --
> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201006.mbox/%3caanlktileo-q8useip8y3na9pdyhlyufippr0in0lk...@mail.gmail.com%3e
> * hadoop-gpl-compression FAQ --
> http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
> * "Hadoop at Twitter (part 1): Splittable LZO Compression" blog post
> -- 
> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
>
> Thanks in advance,
> -Bobby
>
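
For reference, the mapred-site.xml changes mentioned above might look
roughly like what the following sketch prints. It is not taken from this
thread. It assumes the 0.20-era property names
(mapred.compress.map.output, mapred.map.output.compression.codec) and the
codec class shipped by hadoop-lzo (com.hadoop.compression.lzo.LzoCodec).
The helper name render_properties is hypothetical. Check all of these
against your installed versions.

```python
import xml.etree.ElementTree as ET

# Assumed 0.20-era property names and the hadoop-lzo codec class.
# Verify both against the Hadoop and hadoop-lzo versions you actually run.
MAP_OUTPUT_LZO = {
    "mapred.compress.map.output": "true",
    "mapred.map.output.compression.codec": "com.hadoop.compression.lzo.LzoCodec",
}


def render_properties(props):
    """Return a <configuration> element, as a string, with one <property> per entry."""
    config = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    # Print the fragment; merge the <property> elements into mapred-site.xml by hand.
    print(render_properties(MAP_OUTPUT_LZO))
```

This only covers the configuration side. The native LZO libraries and the
hadoop-lzo jar still have to be built and installed on every node, and
setup guides typically also register the codec in core-site.xml via
io.compression.codecs.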


Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Arun C Murthy

Please take questions on Cloudera Distro to their internal lists.

On Aug 5, 2010, at 3:52 PM, Bobby Dennett wrote:


We are looking to enable LZO compression of the map outputs on our
Cloudera 0.20.1 cluster. It seems there are various sets of
instructions available and I am curious what your thoughts are
regarding which one would be best for our Hadoop distribution and OS
(Ubuntu 8.04 64-bit). In particular, hadoop-gpl-compression
(http://code.google.com/p/hadoop-gpl-compression) vs. hadoop-lzo
(http://github.com/kevinweil/hadoop-lzo).

Some of what appear to be the better instructions/guides out there:
* Josh Patterson's reply on June 25th to the "Newbie to HDFS
compression" thread --
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201006.mbox/%3caanlktileo-q8useip8y3na9pdyhlyufippr0in0lk...@mail.gmail.com%3e
* hadoop-gpl-compression FAQ --
http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
* "Hadoop at Twitter (part 1): Splittable LZO Compression" blog post
-- 
http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/

Thanks in advance,
-Bobby




Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Bobby Dennett
We are looking to enable LZO compression of the map outputs on our
Cloudera 0.20.1 cluster. It seems there are various sets of
instructions available and I am curious what your thoughts are
regarding which one would be best for our Hadoop distribution and OS
(Ubuntu 8.04 64-bit). In particular, hadoop-gpl-compression
(http://code.google.com/p/hadoop-gpl-compression) vs. hadoop-lzo
(http://github.com/kevinweil/hadoop-lzo).

Some of what appear to be the better instructions/guides out there:
* Josh Patterson's reply on June 25th to the "Newbie to HDFS
compression" thread --
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201006.mbox/%3caanlktileo-q8useip8y3na9pdyhlyufippr0in0lk...@mail.gmail.com%3e
* hadoop-gpl-compression FAQ --
http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
* "Hadoop at Twitter (part 1): Splittable LZO Compression" blog post
-- 
http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/

Thanks in advance,
-Bobby