Hi Rohan,

The test file (test_input_chars.txt.lzo) is not indexed. I created it using
the command 'lzop test_input_chars.txt'. It's a really small file (only 6
lines), so I didn't think it needed to be indexed. Do all files, regardless
of size, need to be indexed for the LzoTokenizedLoader to work?

Thank you!
~Ed

On Mon, Sep 27, 2010 at 1:25 AM, Rohan Rai <rohan....@inmobi.com> wrote:
> Oh sorry, I am completely out of sync...
>
> Can you tell how you lzo'ed and indexed the file?
>
> Regards
> Rohan
>
> Rohan Rai wrote:
>> Oh sorry, I did not see this mail...
>>
>> It's not an official patch/release, but here is a fork of elephant-bird
>> which works with Pig 0.7 for normal LZO text loading etc. (not the
>> HBaseLoader).
>>
>> Regards
>> Rohan
>>
>> Dmitriy Ryaboy wrote:
>>> The 0.7 branch is not tested... it's quite likely it doesn't actually
>>> work :). Rohan Rai was working on it. Rohan, think you can take a look
>>> and help Ed out?
>>>
>>> Ed, you may want to check if the same input works when you use Pig 0.6
>>> (and the official elephant-bird, on Kevin Weil's github).
>>>
>>> -D
>>>
>>> On Thu, Sep 23, 2010 at 6:49 AM, pig <hadoopn...@gmail.com> wrote:
>>>> Hello,
>>>>
>>>> After getting all the errors to go away with LZO libraries not being
>>>> found and missing jar files for elephant-bird, I've run into a new
>>>> problem when using the elephant-bird branch for Pig 0.7.
>>>>
>>>> The following simple Pig script works as expected:
>>>>
>>>> REGISTER elephant-bird-1.0.jar
>>>> REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
>>>> A = load '/usr/foo/input/test_input_chars.txt';
>>>> DUMP A;
>>>>
>>>> This just dumps out the contents of the test_input_chars.txt file,
>>>> which is tab delimited.
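[Editor's sketch of the compress-and-verify round trip described in the
thread. This assumes lzop is installed locally; the file name matches the
one used above.]

```shell
# Compress the tab-delimited test file; lzop keeps the original and
# writes test_input_chars.txt.lzo alongside it.
lzop test_input_chars.txt

# Sanity-check the archive: decompress to stdout and diff against
# the original file.
lzop -dc test_input_chars.txt.lzo | diff - test_input_chars.txt \
    && echo "round trip OK"
```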
>>>> The output looks like:
>>>>
>>>> (1,a,a,a,a,a,a)
>>>> (2,b,b,b,b,b,b)
>>>> (3,c,c,c,c,c,c)
>>>> (4,d,d,d,d,d,d)
>>>> (5,e,e,e,e,e,e)
>>>>
>>>> I then lzop the test file to get test_input_chars.txt.lzo (I
>>>> decompressed this with lzop -d to make sure the compression worked
>>>> fine, and everything looks good). If I run the exact same script
>>>> provided above on the lzo file, it works fine. However, this file is
>>>> really small and doesn't need to use indexes. As a result, I wanted
>>>> to have LZO support that worked with indexes. Based on this I decided
>>>> to try out the elephant-bird branch for Pig 0.7 located here
>>>> (http://github.com/hirohanin/elephant-bird/), as recommended by
>>>> Dmitriy.
>>>>
>>>> I created the following Pig script that mirrors the above script but
>>>> should hopefully work on LZO files (including indexed ones):
>>>>
>>>> REGISTER elephant-bird-1.0.jar
>>>> REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
>>>> A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
>>>>     com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
>>>> DUMP A;
>>>>
>>>> When I run this script, which uses the LzoTokenizedLoader, there is
>>>> no output. The script appears to run without errors, but there are
>>>> zero Records Written and 0 Bytes Written.
>>>> Here is the exact output:
>>>>
>>>> grunt> DUMP A;
>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [ ]
>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [ ]
>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [ ]
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://master:9000/tmp/temp-2052828736/tmp-1533645117:org.apache.pig.builtin.BinStorage) - 1-4 Operator Key: 1-4
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>>>> [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> [Thread-12] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
>>>> [Thread-12] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201009101108_0151
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at http://master:50030/jobdetails.jsp?jobid=job_201009101108_0151
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Succesfully stored result in "hdfs://amb-hadoop-01:9000/tmp/temp-2052828736/tmp-1533645117
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written: 0
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written: 0
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
>>>> [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process: 1
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process: 1
>>>> grunt>
>>>>
>>>> I'm not sure if I'm doing something wrong in my use of
>>>> LzoTokenizedLoader or if there is a problem with the class itself
>>>> (most likely the problem is with my code, heh). Thank you for any
>>>> help!
>>>>
>>>> ~Ed
>>
>> The information contained in this communication is intended solely for the
>> use of the individual or entity to whom it is addressed and others
>> authorized to receive it. It may contain confidential or legally privileged
>> information. If you are not the intended recipient you are hereby notified
>> that any disclosure, copying, distribution or taking any action in reliance
>> on the contents of this information is strictly prohibited and may be
>> unlawful. If you have received this communication in error, please notify us
>> immediately by responding to this email and then delete it from your system.
>> The firm is neither liable for the proper and complete transmission of the
>> information contained in this communication nor for any delay in its
>> receipt.
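[Editor's note on the indexing question that closes the thread: with the
hadoop-lzo library that elephant-bird builds on, an LZO index is normally
created by running LzoIndexer over the file in HDFS. The sketch below is an
assumption about the local setup — the hadoop-lzo jar path is a guess and
the HDFS paths are the ones used in the thread — but the class name is the
standard one from hadoop-lzo.]

```shell
# Build an index for the compressed file already uploaded to HDFS.
# The hadoop-lzo jar location below is an assumption; point it at
# wherever hadoop-lzo is actually installed on the cluster.
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar \
    com.hadoop.compression.lzo.LzoIndexer \
    /usr/foo/input/test_input_chars.txt.lzo

# The indexer writes a companion file next to the .lzo; verify it exists.
hadoop fs -ls /usr/foo/input/test_input_chars.txt.lzo.index
```

Whether a tiny, unindexed file should still load through LzoTokenizedLoader
is the open question in the thread; checking for the .index companion file
at least rules out one variable.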