On Sat, Aug 7, 2010 at 9:18 PM, Alex Luya <alexander.l...@gmail.com> wrote:
> Does it (hadoop-lzo) only work for Hadoop 0.20, and not for 0.21 or 0.22?
>

I don't know that anyone has tested it against 0.21 or trunk, but I don't
see any reason it won't work just fine -- the APIs are pretty stable
between 0.20 and above.

-Todd

> On Friday, August 06, 2010 09:05:47 am Todd Lipcon wrote:
> > On Thu, Aug 5, 2010 at 4:52 PM, Bobby Dennett <bdenn...@gmail.com> wrote:
> > > Hi Josh,
> > >
> > > No real pain points... just trying to investigate/research the "best"
> > > way to create the necessary libraries and jar files to support LZO
> > > compression in Hadoop. In particular, there are the 2 "repositories"
> > > to build from and I am trying to find out if one should be used over
> > > the other. For instance, in your previous posting, you refer to
> > > hadoop-gpl-compression while the Twitter blog post from last year
> > > mentions the Hadoop-LZO project. Briefly looking, it seems Hadoop-LZO
> > > is preferable but we're curious if there are any caveats/gotchas we
> > > should be aware of.
> >
> > Yes, definitely use the hadoop-lzo project from github -- either from my
> > repo or from kevinweil's (the two are kept in sync).
> >
> > The repo on Google Code has a number of known bugs, which is why we
> > forked it over to github last year.
> >
> > -Todd
> >
> > > On Thu, Aug 5, 2010 at 15:59, Josh Patterson <j...@cloudera.com> wrote:
> > > > Bobby,
> > > >
> > > > We're working hard to make compression easier; the biggest hurdle
> > > > currently is the licensing issue around the LZO codec libs (GPL,
> > > > which is not compatible with the ASF's BSD-style license).
> > > >
> > > > Outside of making the changes to the mapred-site.xml file, with your
> > > > setup what do you view as the biggest pain point?
> > > >
> > > > Josh Patterson
> > > > Cloudera
> > > >
> > > > On Thu, Aug 5, 2010 at 6:52 PM, Bobby Dennett
> > > > <bdennett+softw...@gmail.com> wrote:
> > > >> We are looking to enable LZO compression of the map outputs on our
> > > >> Cloudera 0.20.1 cluster. It seems there are various sets of
> > > >> instructions available and I am curious what your thoughts are
> > > >> regarding which one would be best for our Hadoop distribution and OS
> > > >> (Ubuntu 8.04 64-bit). In particular, hadoop-gpl-compression
> > > >> (http://code.google.com/p/hadoop-gpl-compression) vs. hadoop-lzo
> > > >> (http://github.com/kevinweil/hadoop-lzo).
> > > >>
> > > >> Some of what appear to be the better instructions/guides out there:
> > > >> * Josh Patterson's reply on June 25th to the "Newbie to HDFS
> > > >> compression" thread --
> > > >> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201006.mbox/%3caanlktileo-q8useip8y3na9pdyhlyufippr0in0lk...@mail.gmail.com%3e
> > > >> * hadoop-gpl-compression FAQ --
> > > >> http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
> > > >> * "Hadoop at Twitter (part 1): Splittable LZO Compression" blog post
> > > >> --
> > > >> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
> > > >>
> > > >> Thanks in advance,
> > > >> -Bobby
>

--
Todd Lipcon
Software Engineer, Cloudera