Try PIG_CLASSPATH.

Oh, and you might need to explicitly register them (the lib/ jars) in your
script... sorry, forgot.  We just have them on the hadoop classpath on the
nodes themselves, so we don't have to do that, but you might if you're
starting fresh.
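
For example, something along these lines (the paths are just an illustration
of a typical layout; point them at wherever elephant-bird, its lib/
directory, and the native lzo libs actually live on the box you run grunt
from):

  # make the pig/grunt client JVM see elephant-bird plus every jar under lib/
  export PIG_CLASSPATH=/usr/lib/elephant-bird/elephant-bird-1.0.jar:/usr/lib/elephant-bird/lib/*:$PIG_CLASSPATH

  # if grunt still complains about gplcompression, the client JVM may also
  # need the native libs on java.library.path
  export PIG_OPTS="-Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64-64 $PIG_OPTS"

Then REGISTER each of the jars under lib/ in the script, the same way you
already REGISTER elephant-bird-1.0.jar.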

-D

On Wed, Sep 22, 2010 at 12:01 PM, pig <hadoopn...@gmail.com> wrote:

> [foo]$ echo $CLASSPATH
> :/usr/lib/elephant-bird/lib/*
>
> This has been set for both the foo and hadoop users, but I still get the
> same error.  Is this the correct environment variable to be setting?
>
> Thank you!
>
> ~Ed
>
>
> On Wed, Sep 22, 2010 at 2:46 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
> wrote:
>
> > elephant-bird/lib/* (the * is important)
> >
> > On Wed, Sep 22, 2010 at 11:42 AM, pig <hadoopn...@gmail.com> wrote:
> >
> > > Well, I thought that would be a simple enough fix, but no luck so far.
> > >
> > > I've added the elephant-bird/lib directory (which I made world readable
> > > and executable) to the CLASSPATH, JAVA_LIBRARY_PATH and HADOOP_CLASSPATH
> > > as both the user running grunt and the hadoop user (sort of a shotgun
> > > approach).
> > >
> > > I still get the error where it complains about "no gplcompression", and
> > > in the log it has an error where it can't find
> > > com.google.common.collect.Maps.
> > >
> > > Are these two separate problems, or is it one problem that is causing
> > > two different errors?  Thank you for the help!
> > >
> > > ~Ed
> > >
> > > On Wed, Sep 22, 2010 at 1:57 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
> > > wrote:
> > >
> > > > You need the jars in elephant-bird's lib/ on your classpath to run
> > > > Elephant-Bird.
> > > >
> > > >
> > > > On Wed, Sep 22, 2010 at 10:35 AM, pig <hadoopn...@gmail.com> wrote:
> > > >
> > > > > Thank you for pointing out the 0.7 branch.  I'm giving the 0.7
> > > > > branch a shot and have run into a problem when trying to run the
> > > > > following test pig script:
> > > > >
> > > > > REGISTER elephant-bird-1.0.jar
> > > > > A = LOAD '/user/foo/input' USING
> > > > > com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
> > > > > B = LIMIT A 100;
> > > > > DUMP B;
> > > > >
> > > > > When I try to run this I get the following error:
> > > > >
> > > > > java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
> > > > > ....
> > > > > ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo
> > > > > without native-hadoop
> > > > > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected
> > > > > internal error.  could not instantiate
> > > > > 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with
> > > > > arguments '[ ]'
> > > > >
> > > > > Looking at the log file it gives the following:
> > > > >
> > > > > java.lang.RuntimeException: could not instantiate
> > > > > 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with
> > > > > arguments '[ ]'
> > > > > ...
> > > > > Caused by: java.lang.reflect.InvocationTargetException
> > > > > ...
> > > > > Caused by: java.lang.NoClassDefFoundError: com/google/common/collect/Maps
> > > > > ...
> > > > > Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Maps
> > > > >
> > > > > What is confusing me is that LZO compression and decompression work
> > > > > fine when I'm running a normal Java-based map-reduce program, so I
> > > > > feel as though the libraries have to be in the right place with the
> > > > > right settings for java.library.path.  Otherwise, how would normal
> > > > > Java map-reduce work?  Is there some other location where I need to
> > > > > set JAVA_LIBRARY_PATH for pig to pick it up?  My understanding was
> > > > > that it would get this from hadoop-env.sh.  Is the missing
> > > > > com.google.common.collect.Maps class the real problem here?  Thank
> > > > > you for any help!
> > > > >
> > > > > ~Ed
> > > > >
> > > > > On Tue, Sep 21, 2010 at 5:43 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Ed,
> > > > > > Elephant-bird only works with 0.6 at the moment. There's a branch
> > > > > > for 0.7 that I haven't tested:
> > > > > > http://github.com/hirohanin/elephant-bird/
> > > > > > Try it, let me know if it works.
> > > > > >
> > > > > > -D
> > > > > >
> > > > > > On Tue, Sep 21, 2010 at 2:22 PM, pig <hadoopn...@gmail.com> wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I have a small cluster up and running with LZO compressed files
> > > > > > > in it.  I'm using the lzo compression libraries available at
> > > > > > > http://github.com/kevinweil/hadoop-lzo (thank you for maintaining
> > > > > > > this!)
> > > > > > >
> > > > > > > So far everything works fine when I write regular map-reduce
> > > > > > > jobs.  I can read in lzo files and write out lzo files without
> > > > > > > any problem.
> > > > > > >
> > > > > > > I'm also using Pig 0.7, and it appears to be able to read LZO
> > > > > > > files out of the box using the default LoadFunc (PigStorage).
> > > > > > > However, I am currently testing a large LZO file (20GB) which I
> > > > > > > indexed using the LzoIndexer, and Pig does not appear to be
> > > > > > > making use of the indexes.  The pig scripts that I've run so far
> > > > > > > only have 3 mappers when processing the 20GB file.  My
> > > > > > > understanding was that there should be 1 map for each block
> > > > > > > (256MB blocks), so about 80 mappers when processing the 20GB lzo
> > > > > > > file.  Does Pig 0.7 support indexed lzo files with the default
> > > > > > > load function?
> > > > > > >
> > > > > > > If not, I was looking at elephant-bird and noticed it is only
> > > > > > > compatible with Pig 0.6 and not 0.7+.  Is that accurate?  What
> > > > > > > would be the recommended solution for processing indexed lzo
> > > > > > > files using Pig 0.7?
> > > > > > >
> > > > > > > Thank you for any assistance!
> > > > > > >
> > > > > > > ~Ed
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
