Hi Ted,

You need to install liblzo from EPEL:
http://fr.rpmfind.net/linux/RPM/Extras_Packages_for_Enterprise_Linux.html

-Todd

On Mon, Jan 11, 2010 at 3:21 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Can someone tell me how I can install liblzo?
>
> [r...@tyu-linux lzo-2.03]# uname -a
> Linux tyu-linux 2.6.18-128.2.1.el5 #1 SMP Tue Jul 14 06:36:37 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
> [r...@tyu-linux lzo-2.03]# yum install liblzo-devel
> Loaded plugins: fastestmirror
> Loading mirror speeds from cached hostfile
>  * base: mirrors.usc.edu
>  * updates: mirror.san.fastserv.com
>  * addons: centos.promopeddler.com
>  * extras: mirrors.versaweb.com
> Setting up Install Process
> Parsing package install arguments
> No package liblzo-devel available.
> Nothing to do
>
> Thanks
>
> On Mon, Jan 11, 2010 at 12:45 PM, Steve Kuo <kuosen...@gmail.com> wrote:
>
> > Ted,
> >
> > You may want to consider LZO compression, which allows a compressed
> > file to be split across Map jobs. gzip, on the other hand, is not
> > splittable.
> >
> > Check out these links.
> >
> > http://www.cloudera.com/blog/2009/11/17/hadoop-at-twitter-part-1-splittable-lzo-compression/
> > http://wiki.apache.org/hadoop/UsingLzoCompression
> >
> > On Fri, Jan 8, 2010 at 1:13 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > The input file is in .gz format
> > > FYI
> > >
> > > On Fri, Jan 8, 2010 at 11:08 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > My current project processes an input file of size 333302161 bytes.
> > > > What I plan to do is to split the file into equal-size pieces (on
> > > > blank-line boundaries) to improve performance.
> > > >
> > > > I found 12 classes in the 0.20.1 source code which implement
> > > > InputSplit.
> > > >
> > > > If someone has written code similar to what I plan to do, please
> > > > share some hints.
> > > >
> > > > Thanks
> > > >
> > > > On Fri, Jan 8, 2010 at 2:27 AM, Amogh Vasekar <am...@yahoo-inc.com> wrote:
> > > >
> > > >> Hi,
> > > >> The deprecation is due to the new, evolving mapreduce
> > > >> (o.a.h.mapreduce) APIs. The old APIs are still supported in the
> > > >> available distributions. The equivalent of TextInputFormat is
> > > >> available in the new API:
> > > >>
> > > >> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html
> > > >>
> > > >> Thanks,
> > > >> Amogh
> > > >>
> > > >> On 1/8/10 3:47 AM, "Ted Yu" <yuzhih...@gmail.com> wrote:
> > > >>
> > > >> According to:
> > > >>
> > > >> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/TextInputFormat.html#isSplitable%28org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path%29
> > > >>
> > > >> isSplitable() is deprecated.
> > > >>
> > > >> Which method should I use to replace it?
> > > >>
> > > >> Thanks
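
On the splitting plan quoted above (roughly equal-size pieces, cut only on blank-line boundaries), here is a minimal sketch of the offset arithmetic. It is plain Python, not Hadoop code and not anyone's implementation from this thread. The function names are made up for illustration. It assumes the data has already been decompressed (gzip is not splittable, as noted above), that it fits in memory, and that it uses "\n" line endings.

```python
def blank_line_split_offsets(data: bytes, pieces: int) -> list[int]:
    """Return start offsets of at most `pieces` chunks.

    Every chunk after the first starts right after a run of blank lines.
    Hypothetical helper for illustration only.
    """
    if pieces < 1:
        raise ValueError("pieces must be >= 1")
    offsets = [0]
    target = len(data) / pieces  # ideal chunk size in bytes
    for i in range(1, pieces):
        # Start looking at the ideal cut point, but never before the last cut.
        start = max(int(i * target), offsets[-1])
        hit = data.find(b"\n\n", start)  # end of a line followed by a blank line
        if hit == -1:
            break  # no further blank line: the last chunk takes the rest
        cut = hit + 2
        # Skip any additional blank lines so the next chunk starts on content.
        while cut < len(data) and data[cut:cut + 1] == b"\n":
            cut += 1
        if cut >= len(data):
            break
        offsets.append(cut)
    return offsets


def split_on_blank_lines(data: bytes, pieces: int) -> list[bytes]:
    """Cut `data` at the offsets computed above (hypothetical helper)."""
    bounds = blank_line_split_offsets(data, pieces) + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    sample = b"a\nb\n\nc\nd\n\ne\nf\n"
    for chunk in split_on_blank_lines(sample, 2):
        print(repr(chunk))
```

Because each cut is moved forward to the next blank line, the pieces are only approximately equal, and fewer than `pieces` chunks come back when the data has too few blank lines.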