Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1
On Sat, Aug 7, 2010 at 9:18 PM, Alex Luya wrote:
> Does it (hadoop-lzo) only work with Hadoop 0.20, and not with 0.21 or 0.22?

I don't know that anyone has tested it against 0.21 or trunk, but I don't see any reason it wouldn't work just fine -- the APIs are pretty stable from 0.20 onward.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1
Does it (hadoop-lzo) only work with Hadoop 0.20, and not with 0.21 or 0.22?

On Friday, August 06, 2010 09:05:47 am Todd Lipcon wrote:
> Yes, definitely use the hadoop-lzo project from github -- either from my
> repo or from kevinweil's (the two are kept in sync)
Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1
On Thu, Aug 5, 2010 at 4:52 PM, Bobby Dennett wrote:
> Briefly looking, it seems Hadoop-LZO is preferable but we're curious if
> there are any caveats/gotchas we should be aware of.

Yes, definitely use the hadoop-lzo project from github -- either from my repo or from kevinweil's (the two are kept in sync).

The repo on Google Code has a number of known bugs, which is why we forked it over to github last year.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
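Once hadoop-lzo is built, its codec classes are usually also listed in the io.compression.codecs property (core-site.xml), so that Hadoop can resolve them by class name and file extension; this follows the hadoop-lzo README as we understand it, so check the exact class list against the version you build. The Java sketch below is a hypothetical helper (not part of Hadoop or hadoop-lzo) that appends com.hadoop.compression.lzo.LzoCodec and LzopCodec to an existing comma-separated value without dropping or duplicating the codecs already configured.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical helper, not a Hadoop or hadoop-lzo API: merges the hadoop-lzo
 * codec classes into an existing io.compression.codecs value.
 */
public class LzoCodecList {

    // Class names as shipped by hadoop-lzo (assumed; verify against your build).
    public static final String[] LZO_CODECS = {
        "com.hadoop.compression.lzo.LzoCodec",
        "com.hadoop.compression.lzo.LzopCodec"
    };

    public static String withLzoCodecs(String existing) {
        // LinkedHashSet keeps the original order and skips duplicates.
        Set<String> codecs = new LinkedHashSet<>();
        if (existing != null) {
            for (String c : existing.split(",")) {
                String trimmed = c.trim();
                if (!trimmed.isEmpty()) {
                    codecs.add(trimmed);
                }
            }
        }
        codecs.addAll(Arrays.asList(LZO_CODECS));
        return String.join(",", codecs);
    }

    public static void main(String[] args) {
        // Illustrative starting value; your cluster's list may differ.
        String current = "org.apache.hadoop.io.compress.DefaultCodec,"
                + "org.apache.hadoop.io.compress.GzipCodec";
        System.out.println(withLzoCodecs(current));
    }
}
```

Running it a second time on its own output returns the same string, so the helper is safe to apply to a value that already contains the LZO codecs.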
Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1
Hi Josh,

No real pain points... just trying to investigate/research the "best" way to create the necessary libraries and jar files to support LZO compression in Hadoop. In particular, there are two "repositories" to build from, and I am trying to find out if one should be used over the other. For instance, in your previous posting, you refer to hadoop-gpl-compression, while the Twitter blog post from last year mentions the Hadoop-LZO project. Briefly looking, it seems Hadoop-LZO is preferable, but we're curious if there are any caveats/gotchas we should be aware of.

Thanks,
-Bobby

On Thu, Aug 5, 2010 at 15:59, Josh Patterson wrote:
> Outside of making the changes to the mapred-site.xml file, with your
> setup what do you view as the biggest pain point?
Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1
Bobby,

We're working hard to make compression easier. The biggest hurdle currently is licensing: the LZO codec libraries are GPL, which is not compatible with the ASF's BSD-style license.

Outside of making the changes to the mapred-site.xml file, with your setup what do you view as the biggest pain point?

Josh Patterson
Cloudera

On Thu, Aug 5, 2010 at 6:52 PM, Bobby Dennett wrote:
> We are looking to enable LZO compression of the map outputs on our
> Cloudera 0.20.1 cluster.
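For map outputs specifically, the mapred-site.xml changes usually come down to two Hadoop 0.20 properties: mapred.compress.map.output, set to true, and mapred.map.output.compression.codec, pointed at hadoop-lzo's com.hadoop.compression.lzo.LzoCodec. The Java sketch below only prints that fragment in Hadoop's `<property>` format so it can be pasted into mapred-site.xml. It assumes the hadoop-lzo jar and its native library are already installed on every TaskTracker, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch (hypothetical class): renders the Hadoop 0.20-era properties that
 * enable LZO compression of intermediate map outputs, in mapred-site.xml format.
 * Assumes hadoop-lzo's LzoCodec and native library are present on all nodes.
 */
public class MapOutputLzoConfig {

    // Property names as used by Hadoop 0.20; later releases renamed them.
    public static Map<String, String> mapOutputLzoSettings() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapred.compress.map.output", "true");
        props.put("mapred.map.output.compression.codec",
                  "com.hadoop.compression.lzo.LzoCodec");
        return props;
    }

    // Renders each entry as a <property> element; these values need no XML escaping.
    public static String toXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(e.getKey()).append("</name>\n")
              .append("    <value>").append(e.getValue()).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(toXml(mapOutputLzoSettings()));
    }
}
```

For a single job rather than the whole cluster, the old mapred API's JobConf.setCompressMapOutput(true) and JobConf.setMapOutputCompressorClass(LzoCodec.class) should have the same effect, again assuming hadoop-lzo is on the task classpath.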
Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1
Please take questions on the Cloudera distro to their own lists.

On Aug 5, 2010, at 3:52 PM, Bobby Dennett wrote:
> We are looking to enable LZO compression of the map outputs on our
> Cloudera 0.20.1 cluster.
Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1
We are looking to enable LZO compression of the map outputs on our Cloudera 0.20.1 cluster. It seems there are various sets of instructions available, and I am curious what your thoughts are regarding which one would be best for our Hadoop distribution and OS (Ubuntu 8.04 64-bit). In particular, hadoop-gpl-compression (http://code.google.com/p/hadoop-gpl-compression) vs. hadoop-lzo (http://github.com/kevinweil/hadoop-lzo).

Some of what appear to be the better instructions/guides out there:
* Josh Patterson's reply on June 25th to the "Newbie to HDFS compression" thread -- http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201006.mbox/%3caanlktileo-q8useip8y3na9pdyhlyufippr0in0lk...@mail.gmail.com%3e
* hadoop-gpl-compression FAQ -- http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
* "Hadoop at Twitter (part 1): Splittable LZO Compression" blog post -- http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/

Thanks in advance,
-Bobby