Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-08 Thread Todd Lipcon
On Sat, Aug 7, 2010 at 9:18 PM, Alex Luya alexander.l...@gmail.com wrote:

 Does hadoop-lzo only work with Hadoop 0.20, or does it also work with 0.21 or 0.22?


I don't know that anyone has tested it against 0.21 or trunk, but I don't
see any reason it wouldn't work just fine -- the APIs are pretty stable
from 0.20 onward.

-Todd






-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-07 Thread Alex Luya
Does hadoop-lzo only work with Hadoop 0.20, or does it also work with 0.21 or 0.22?


Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Arun C Murthy

Please take questions on Cloudera Distro to their internal lists.

On Aug 5, 2010, at 3:52 PM, Bobby Dennett wrote:


We are looking to enable LZO compression of the map outputs on our
Cloudera 0.20.1 cluster. It seems there are various sets of
instructions available and I am curious what your thoughts are
regarding which one would be best for our Hadoop distribution and OS
(Ubuntu 8.04 64-bit). In particular, hadoop-gpl-compression
(http://code.google.com/p/hadoop-gpl-compression) vs. hadoop-lzo
(http://github.com/kevinweil/hadoop-lzo).

Some of what appear to be the better instructions/guides out there:
* Josh Patterson's reply on June 25th to the "Newbie to HDFS
  compression" thread --
  http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201006.mbox/%3caanlktileo-q8useip8y3na9pdyhlyufippr0in0lk...@mail.gmail.com%3e
* hadoop-gpl-compression FAQ --
  http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
* "Hadoop at Twitter (part 1): Splittable LZO Compression" blog post --
  http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/

Thanks in advance,
-Bobby




Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Josh Patterson
Bobby,

We're working hard to make compression easier. The biggest hurdle
currently is the licensing of the LZO codec libraries (GPL, which is
not compatible with the ASF's BSD-style license).

Outside of making the changes to the mapred-site.xml file, what do you
view as the biggest pain point with your setup?

Josh Patterson
Cloudera
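
For reference, the mapred-site.xml changes mentioned above usually come
down to two properties. The Python sketch below generates that fragment.
It is only an illustration under stated assumptions: the property names
mapred.compress.map.output and mapred.map.output.compression.codec are
the Hadoop 0.20-era names, and com.hadoop.compression.lzo.LzoCodec is the
codec class shipped by hadoop-lzo; none of these are spelled out in this
thread. The helper names build_map_output_lzo_config and render are
hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: Hadoop 0.20.x property names and the hadoop-lzo codec class.
# Neither is quoted in the thread; check them against your distribution.
MAP_OUTPUT_LZO_PROPERTIES = {
    "mapred.compress.map.output": "true",
    "mapred.map.output.compression.codec": "com.hadoop.compression.lzo.LzoCodec",
}


def build_map_output_lzo_config(properties=MAP_OUTPUT_LZO_PROPERTIES):
    """Build a <configuration> element with one <property> per entry."""
    configuration = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return configuration


def render(configuration):
    """Serialize the element tree to a string for pasting into mapred-site.xml."""
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    print(render(build_map_output_lzo_config()))
```

The settings only take effect if the hadoop-lzo jar and its native
libraries are already installed on every node; building those is the
part this thread is really about.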




Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Bobby Dennett
Hi Josh,

No real pain points... just trying to investigate/research the best
way to create the necessary libraries and jar files to support LZO
compression in Hadoop. In particular, there are the 2 repositories
to build from and I am trying to find out if one should be used over
the other. For instance, in your previous posting, you refer to
hadoop-gpl-compression while the Twitter blog post from last year
mentions the Hadoop-LZO project. Briefly looking, it seems Hadoop-LZO
is preferable but we're curious if there are any caveats/gotchas we
should be aware of.

Thanks,
-Bobby





Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1

2010-08-05 Thread Todd Lipcon
On Thu, Aug 5, 2010 at 4:52 PM, Bobby Dennett bdenn...@gmail.com wrote:

 Hi Josh,

 No real pain points... just trying to investigate/research the best
 way to create the necessary libraries and jar files to support LZO
 compression in Hadoop. In particular, there are the 2 repositories
 to build from and I am trying to find out if one should be used over
 the other. For instance, in your previous posting, you refer to
 hadoop-gpl-compression while the Twitter blog post from last year
 mentions the Hadoop-LZO project. Briefly looking, it seems Hadoop-LZO
 is preferable but we're curious if there are any caveats/gotchas we
 should be aware of.


Yes, definitely use the hadoop-lzo project from GitHub -- either from my
repo or from kevinweil's (the two are kept in sync).

The repo on Google Code has a number of known bugs, which is why we forked
it over to GitHub last year.

-Todd


-- 
Todd Lipcon
Software Engineer, Cloudera