I added the jars to all my nodes in /usr/lib/elephant-pig/lib. I then modified hadoop-env.sh on all nodes so that it includes the entry:

export PIG_CLASSPATH=/usr/lib/elephant-pig/lib/*:$PIG_CLASSPATH

I start up the grunt shell and first paste the line:

REGISTER elephant-bird-1.0.jar

This has no problems. Then I add the line:

A = LOAD '/user/foo/input' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('|');

At this point the following error prints to screen:

--------------------
[main] ERROR com.hadoop.compression.lzo.GPLNativeCodeLoader - Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
...
[main] ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo without native-hadoop
--------------------

No log entry is generated and the grunt shell continues to work. (LZO works fine when I run java based map-reduce programs.)

I then add the final 2 lines of the pig script:

B = LIMIT A 100;
DUMP B;

The program starts to execute and fails. The nodes running the mappers give the error

java.lang.ClassNotFoundException: com.google.common.collect.Maps

and fail. (This is the same error I was getting before in my pig log files.) The ClassNotFoundException no longer shows up in my pig log file, though; in its place is a more generic RuntimeException.

On all nodes I also tried

export PIG_CLASSPATH=/usr/lib/elephant-pig/lib:$PIG_CLASSPATH

(without the *), and I also tried modifying JAVA_LIBRARY_PATH to include the location of the elephant-pig jar files.
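Rereading the UnsatisfiedLinkError, I'm now wondering whether java.library.path is supposed to point at the directory containing libgplcompression.so rather than at the jars. Is something like this in hadoop-env.sh closer to the right idea? (The native path below is just a guess at where the hadoop-lzo build put the .so files on my nodes, and I'm not sure the PIG_OPTS line is even needed for the grunt shell.)

# hypothetical paths -- use wherever libgplcompression.so / liblzo2.so actually live
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64:$JAVA_LIBRARY_PATH
# possibly also for the grunt shell JVM itself:
export PIG_OPTS="-Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64-64 $PIG_OPTS"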
I'm using the Cloudera distro of Hadoop 0.20.2, if that might somehow be causing problems.

When you said I might need to "register" the jar files, what does that mean exactly?
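My guess is that you mean adding explicit REGISTER lines for the dependency jars as well, so they get shipped with the job. Is it something like this? (The jar names below are only guesses at what is sitting in elephant-bird's lib/ directory; I believe com.google.common.collect.Maps comes from the google-collections jar.)

REGISTER /usr/lib/elephant-pig/lib/google-collections-1.0.jar;  -- guessing at the exact name/version
REGISTER /usr/lib/elephant-pig/lib/protobuf-java-2.3.0.jar;     -- likewise a guess
REGISTER elephant-bird-1.0.jar;

A = LOAD '/user/foo/input' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('|');
B = LIMIT A 100;
DUMP B;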
Thanks again for all your assistance and prompt responses.

~Ed

On Wed, Sep 22, 2010 at 3:46 PM, pig <hadoopn...@gmail.com> wrote:
> Ah,
>
> I didn't realize I need to put the jars on all the nodes, since the error is
> being thrown before the pig script actually executes (it's thrown in the
> parsing stage). I assumed that since the pig script hasn't executed yet, it
> wasn't doing anything with the Hadoop nodes.
>
> I will try adding PIG_CLASSPATH to my hadoop-env.sh and will then put the
> jar files on all the slave nodes. Hopefully that will solve the problem.
>
> ~Ed
>
> On Wed, Sep 22, 2010 at 3:28 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>
>> try PIG_CLASSPATH
>>
>> Oh and you might need to explicitly register them.. sorry, forgot. We just
>> have them on the hadoop classpath on the nodes themselves, so we don't
>> have to do that, but you might if you are starting fresh.
>>
>> -D
>>
>> On Wed, Sep 22, 2010 at 12:01 PM, pig <hadoopn...@gmail.com> wrote:
>>
>> > [foo]$ echo $CLASSPATH
>> > :/usr/lib/elephant-bird/lib/*
>> >
>> > This has been set for both user foo and hadoop but I still get the same
>> > error. Is this the correct environment variable to be setting?
>> >
>> > Thank you!
>> >
>> > ~Ed
>> >
>> > On Wed, Sep 22, 2010 at 2:46 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>> >
>> > > elephant-bird/lib/* (the * is important)
>> > >
>> > > On Wed, Sep 22, 2010 at 11:42 AM, pig <hadoopn...@gmail.com> wrote:
>> > >
>> > > > Well, I thought that would be a simple enough fix, but no luck so far.
>> > > >
>> > > > I've added the elephant-bird/lib directory (which I made world readable
>> > > > and executable) to the CLASSPATH, JAVA_LIBRARY_PATH and HADOOP_CLASSPATH
>> > > > as both the user running grunt and the hadoop user (sort of a shotgun
>> > > > approach).
>> > > >
>> > > > I still get the error where it complains about no gplcompression, and in
>> > > > the log it has an error where it can't find com.google.common.collect.Maps.
>> > > >
>> > > > Are these two separate problems, or is it one problem that is causing two
>> > > > different errors? Thank you for the help!
>> > > >
>> > > > ~Ed
>> > > >
>> > > > On Wed, Sep 22, 2010 at 1:57 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>> > > >
>> > > > > You need the jars in elephant-bird's lib/ on your classpath to run
>> > > > > Elephant-Bird.
>> > > > >
>> > > > > On Wed, Sep 22, 2010 at 10:35 AM, pig <hadoopn...@gmail.com> wrote:
>> > > > >
>> > > > > > Thank you for pointing out the 0.7 branch. I'm giving the 0.7 branch a
>> > > > > > shot and have run into a problem when trying to run the following test
>> > > > > > pig script:
>> > > > > >
>> > > > > > REGISTER elephant-bird-1.0.jar
>> > > > > > A = LOAD '/user/foo/input' USING
>> > > > > > com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
>> > > > > > B = LIMIT A 100;
>> > > > > > DUMP B;
>> > > > > >
>> > > > > > When I try to run this I get the following error:
>> > > > > >
>> > > > > > java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
>> > > > > > ....
>> > > > > > ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo
>> > > > > > without native-hadoop
>> > > > > > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal
>> > > > > > error. could not instantiate
>> > > > > > 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[ ]'
>> > > > > >
>> > > > > > Looking at the log file it gives the following:
>> > > > > >
>> > > > > > java.lang.RuntimeException: could not instantiate
>> > > > > > 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[ ]'
>> > > > > > ...
>> > > > > > Caused by: java.lang.reflect.InvocationTargetException
>> > > > > > ...
>> > > > > > Caused by: java.lang.NoClassDefFoundError: com/google/common/collect/Maps
>> > > > > > ...
>> > > > > > Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Maps
>> > > > > >
>> > > > > > What is confusing me is that LZO compression and decompression work fine
>> > > > > > when I'm running a normal java based map-reduce program, so I feel as
>> > > > > > though the libraries have to be in the right place with the right
>> > > > > > settings for java.library.path. Otherwise how would normal java
>> > > > > > map-reduce work? Is there some other location where I need to set
>> > > > > > JAVA_LIBRARY_PATH for pig to pick it up? My understanding was that it
>> > > > > > would get this from hadoop-env.sh. Is the missing
>> > > > > > com.google.common.collect.Maps the real problem here? Thank you for any
>> > > > > > help!
>> > > > > >
>> > > > > > ~Ed
>> > > > > >
>> > > > > > On Tue, Sep 21, 2010 at 5:43 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Hi Ed,
>> > > > > > > Elephant-bird only works with 0.6 at the moment. There's a branch for
>> > > > > > > 0.7 that I haven't tested: http://github.com/hirohanin/elephant-bird/
>> > > > > > > Try it, let me know if it works.
>> > > > > > >
>> > > > > > > -D
>> > > > > > >
>> > > > > > > On Tue, Sep 21, 2010 at 2:22 PM, pig <hadoopn...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Hello,
>> > > > > > > >
>> > > > > > > > I have a small cluster up and running with LZO compressed files in it.
>> > > > > > > > I'm using the lzo compression libraries available at
>> > > > > > > > http://github.com/kevinweil/hadoop-lzo (thank you for maintaining this!)
>> > > > > > > >
>> > > > > > > > So far everything works fine when I write regular map-reduce jobs. I
>> > > > > > > > can read in lzo files and write out lzo files without any problem.
>> > > > > > > >
>> > > > > > > > I'm also using Pig 0.7 and it appears to be able to read LZO files out
>> > > > > > > > of the box using the default LoadFunc (PigStorage). However, I am
>> > > > > > > > currently testing a large LZO file (20GB) which I indexed using the
>> > > > > > > > LzoIndexer, and Pig does not appear to be making use of the indexes.
>> > > > > > > > The pig scripts that I've run so far only have 3 mappers when
>> > > > > > > > processing the 20GB file. My understanding was that there should be 1
>> > > > > > > > map for each block (256MB blocks), so about 80 mappers when processing
>> > > > > > > > the 20GB lzo file. Does Pig 0.7 support indexed lzo files with the
>> > > > > > > > default load function?
>> > > > > > > >
>> > > > > > > > If not, I was looking at elephant-bird and noticed it is only
>> > > > > > > > compatible with Pig 0.6 and not 0.7+. Is that accurate? What would be
>> > > > > > > > the recommended solution for processing indexed lzo files using Pig 0.7?
>> > > > > > > >
>> > > > > > > > Thank you for any assistance!
>> > > > > > > >
>> > > > > > > > ~Ed
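P.S. For reference on the indexing question at the bottom of this thread: the invocation I mean by "indexed using the LzoIndexer" is roughly the following (the jar name/version and file name are approximate, not exact):

hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-0.4.4.jar com.hadoop.compression.lzo.LzoIndexer /user/foo/input/big_file.lzo

which writes a big_file.lzo.index file next to the data, so I believe the index files are in place when the Pig jobs run.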