I added the jars to all my nodes in /usr/lib/elephant-pig/lib. I then modified hadoop-env.sh on all nodes so that it includes the entry:

export PIG_CLASSPATH=/usr/lib/elephant-pig/lib/*:$PIG_CLASSPATH

I start up the grunt shell and first paste the line:

REGISTER elephant-bird-1.0.jar

This has no problems. Then I add the line:

A = LOAD '/user/foo/input' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('|');

At this point the following error prints to screen:

--------------------
[main] ERROR com.hadoop.compression.lzo.GPLNativeCodeLoader - Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
...
[main] ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo without native-hadoop
--------------------

No log entry is generated and the grunt shell continues to work. (LZO works fine when I run java based map-reduce programs.)

I then add the final 2 lines of the pig script:

B = LIMIT A 100;
DUMP B;

The program starts to execute and fails. The nodes running the mappers give the error

java.lang.ClassNotFoundException: com.google.common.collect.Maps

and fail. (This is the same error I was getting before in my pig log files.) The ClassNotFoundException no longer shows up in my pig log file, though; in its place is a more generic RuntimeException.

On all nodes I also tried

export PIG_CLASSPATH=/usr/lib/elephant-pig/lib:$PIG_CLASSPATH

(without the *), and I also tried modifying JAVA_LIBRARY_PATH to include the location of the elephant-pig jar files.
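Rereading the UnsatisfiedLinkError, I'm now wondering whether java.library.path is supposed to point at the directory containing libgplcompression.so rather than at the jars. Is something like this in hadoop-env.sh closer to the right idea? (The native path below is just a guess at where the hadoop-lzo build put the .so files on my nodes, and I'm not sure the PIG_OPTS line is even needed for the grunt shell.)

# hypothetical paths -- use wherever libgplcompression.so / liblzo2.so actually live
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64:$JAVA_LIBRARY_PATH
# possibly also for the grunt shell JVM itself:
export PIG_OPTS="-Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64-64 $PIG_OPTS"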
I'm using the Cloudera distro of Hadoop 0.20.2, if that might somehow be causing problems.

When you said I might need to "register" the jar files, what does that mean exactly?
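My guess is that you mean adding explicit REGISTER lines for the dependency jars as well, so they get shipped with the job. Is it something like this? (The jar names below are only guesses at what is sitting in elephant-bird's lib/ directory; I believe com.google.common.collect.Maps comes from the google-collections jar.)

REGISTER /usr/lib/elephant-pig/lib/google-collections-1.0.jar;  -- guessing at the exact name/version
REGISTER /usr/lib/elephant-pig/lib/protobuf-java-2.3.0.jar;     -- likewise a guess
REGISTER elephant-bird-1.0.jar;

A = LOAD '/user/foo/input' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('|');
B = LIMIT A 100;
DUMP B;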
Thanks again for all your assistance and prompt responses.

~Ed

On Wed, Sep 22, 2010 at 3:46 PM, pig <hadoopn...@gmail.com> wrote:
> Ah,
>
> I didn't realize I need to put the jars on all the nodes, since the error is
> being thrown before the pig script actually executes (it's thrown in the
> parsing stage). I assumed that since the pig script hasn't executed yet, it
> wasn't doing anything with the Hadoop nodes.
>
> I will try adding PIG_CLASSPATH to my hadoop-env.sh and will then put the
> jar files on all the slave nodes. Hopefully that will solve the problem.
>
> ~Ed
>
> On Wed, Sep 22, 2010 at 3:28 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>
>> try PIG_CLASSPATH
>>
>> Oh and you might need to explicitly register them.. sorry, forgot. We just
>> have them on the hadoop classpath on the nodes themselves, so we don't
>> have to do that, but you might if you are starting fresh.
>>
>> -D
>>
>> On Wed, Sep 22, 2010 at 12:01 PM, pig <hadoopn...@gmail.com> wrote:
>>
>> > [foo]$ echo $CLASSPATH
>> > :/usr/lib/elephant-bird/lib/*
>> >
>> > This has been set for both user foo and hadoop but I still get the same
>> > error. Is this the correct environment variable to be setting?
>> >
>> > Thank you!
>> >
>> > ~Ed
>> >
>> > On Wed, Sep 22, 2010 at 2:46 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>> >
>> > > elephant-bird/lib/* (the * is important)
>> > >
>> > > On Wed, Sep 22, 2010 at 11:42 AM, pig <hadoopn...@gmail.com> wrote:
>> > >
>> > > > Well, I thought that would be a simple enough fix, but no luck so far.
>> > > >
>> > > > I've added the elephant-bird/lib directory (which I made world readable
>> > > > and executable) to the CLASSPATH, JAVA_LIBRARY_PATH and HADOOP_CLASSPATH
>> > > > as both the user running grunt and the hadoop user (sort of a shotgun
>> > > > approach).
>> > > >
>> > > > I still get the error where it complains about no gplcompression, and in
>> > > > the log it has an error where it can't find com.google.common.collect.Maps.
>> > > >
>> > > > Are these two separate problems, or is it one problem that is causing two
>> > > > different errors? Thank you for the help!
>> > > >
>> > > > ~Ed
>> > > >
>> > > > On Wed, Sep 22, 2010 at 1:57 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>> > > >
>> > > > > You need the jars in elephant-bird's lib/ on your classpath to run
>> > > > > Elephant-Bird.
>> > > > >
>> > > > > On Wed, Sep 22, 2010 at 10:35 AM, pig <hadoopn...@gmail.com> wrote:
>> > > > >
>> > > > > > Thank you for pointing out the 0.7 branch. I'm giving the 0.7 branch a
>> > > > > > shot and have run into a problem when trying to run the following test
>> > > > > > pig script:
>> > > > > >
>> > > > > > REGISTER elephant-bird-1.0.jar
>> > > > > > A = LOAD '/user/foo/input' USING
>> > > > > > com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
>> > > > > > B = LIMIT A 100;
>> > > > > > DUMP B;
>> > > > > >
>> > > > > > When I try to run this I get the following error:
>> > > > > >
>> > > > > > java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
>> > > > > > ....
>> > > > > > ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo
>> > > > > > without native-hadoop
>> > > > > > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal
>> > > > > > error. could not instantiate
>> > > > > > 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[ ]'
>> > > > > >
>> > > > > > Looking at the log file it gives the following:
>> > > > > >
>> > > > > > java.lang.RuntimeException: could not instantiate
>> > > > > > 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[ ]'
>> > > > > > ...
>> > > > > > Caused by: java.lang.reflect.InvocationTargetException
>> > > > > > ...
>> > > > > > Caused by: java.lang.NoClassDefFoundError: com/google/common/collect/Maps
>> > > > > > ...
>> > > > > > Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Maps
>> > > > > >
>> > > > > > What is confusing me is that LZO compression and decompression work fine
>> > > > > > when I'm running a normal java based map-reduce program, so I feel as
>> > > > > > though the libraries have to be in the right place with the right
>> > > > > > settings for java.library.path. Otherwise how would normal java
>> > > > > > map-reduce work? Is there some other location where I need to set
>> > > > > > JAVA_LIBRARY_PATH for pig to pick it up? My understanding was that it
>> > > > > > would get this from hadoop-env.sh. Is the missing
>> > > > > > com.google.common.collect.Maps the real problem here? Thank you for any
>> > > > > > help!
>> > > > > >
>> > > > > > ~Ed
>> > > > > >
>> > > > > > On Tue, Sep 21, 2010 at 5:43 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Hi Ed,
>> > > > > > > Elephant-bird only works with 0.6 at the moment. There's a branch for
>> > > > > > > 0.7 that I haven't tested: http://github.com/hirohanin/elephant-bird/
>> > > > > > > Try it, let me know if it works.
>> > > > > > >
>> > > > > > > -D
>> > > > > > >
>> > > > > > > On Tue, Sep 21, 2010 at 2:22 PM, pig <hadoopn...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Hello,
>> > > > > > > >
>> > > > > > > > I have a small cluster up and running with LZO compressed files in it.
>> > > > > > > > I'm using the lzo compression libraries available at
>> > > > > > > > http://github.com/kevinweil/hadoop-lzo (thank you for maintaining this!)
>> > > > > > > >
>> > > > > > > > So far everything works fine when I write regular map-reduce jobs. I
>> > > > > > > > can read in lzo files and write out lzo files without any problem.
>> > > > > > > >
>> > > > > > > > I'm also using Pig 0.7 and it appears to be able to read LZO files out
>> > > > > > > > of the box using the default LoadFunc (PigStorage). However, I am
>> > > > > > > > currently testing a large LZO file (20GB) which I indexed using the
>> > > > > > > > LzoIndexer, and Pig does not appear to be making use of the indexes.
>> > > > > > > > The pig scripts that I've run so far only have 3 mappers when
>> > > > > > > > processing the 20GB file. My understanding was that there should be 1
>> > > > > > > > map for each block (256MB blocks), so about 80 mappers when processing
>> > > > > > > > the 20GB lzo file. Does Pig 0.7 support indexed lzo files with the
>> > > > > > > > default load function?
>> > > > > > > >
>> > > > > > > > If not, I was looking at elephant-bird and noticed it is only
>> > > > > > > > compatible with Pig 0.6 and not 0.7+. Is that accurate? What would be
>> > > > > > > > the recommended solution for processing indexed lzo files using Pig 0.7?
>> > > > > > > >
>> > > > > > > > Thank you for any assistance!
>> > > > > > > >
>> > > > > > > > ~Ed
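P.S. For reference on the indexing question at the bottom of this thread: the invocation I mean by "indexed using the LzoIndexer" is roughly the following (the jar name/version and file name are approximate, not exact):

hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-0.4.4.jar com.hadoop.compression.lzo.LzoIndexer /user/foo/input/big_file.lzo

which writes a big_file.lzo.index file next to the data, so I believe the index files are in place when the Pig jobs run.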