Hi Rohan,

The test file (test_input_chars.txt.lzo) is not indexed. I created it using
the command 'lzop test_input_chars.txt'. It's a really small file (only 6
lines), so I didn't think it needed to be indexed. Do all files, regardless
of size, need to be indexed for the LzoTokenizedLoader to work?

Thank you!
~Ed

On Mon, Sep 27, 2010 at 1:25 AM, Rohan Rai <rohan....@inmobi.com> wrote:
> Oh sorry, I am completely out of sync...
>
> Can you tell how you lzo'ed and indexed the file?
>
> Regards
> Rohan
>
> Rohan Rai wrote:
>> Oh sorry, I did not see this mail...
>>
>> It's not an official patch/release, but here is a fork of elephant-bird
>> which works with Pig 0.7 for normal LZO text loading etc. (not the
>> HBaseLoader).
>>
>> Regards
>> Rohan
>>
>> Dmitriy Ryaboy wrote:
>>> The 0.7 branch is not tested... it's quite likely it doesn't actually
>>> work :). Rohan Rai was working on it. Rohan, think you can take a look
>>> and help Ed out?
>>>
>>> Ed, you may want to check if the same input works when you use Pig 0.6
>>> (and the official elephant-bird, on Kevin Weil's github).
>>>
>>> -D
>>>
>>> On Thu, Sep 23, 2010 at 6:49 AM, pig <hadoopn...@gmail.com> wrote:
>>>> Hello,
>>>>
>>>> After getting all the errors to go away with LZO libraries not being
>>>> found and missing jar files for elephant-bird, I've run into a new
>>>> problem when using the elephant-bird branch for Pig 0.7.
>>>>
>>>> The following simple Pig script works as expected:
>>>>
>>>> REGISTER elephant-bird-1.0.jar
>>>> REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
>>>> A = load '/usr/foo/input/test_input_chars.txt';
>>>> DUMP A;
>>>>
>>>> This just dumps out the contents of the test_input_chars.txt file,
>>>> which is tab delimited.
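[Editor's sketch of the compress-and-verify round trip described in the
thread. This assumes lzop is installed locally; the file name matches the
one used above.]

```shell
# Compress the tab-delimited test file; lzop keeps the original and
# writes test_input_chars.txt.lzo alongside it.
lzop test_input_chars.txt

# Sanity-check the archive: decompress to stdout and diff against
# the original file.
lzop -dc test_input_chars.txt.lzo | diff - test_input_chars.txt \
    && echo "round trip OK"
```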
>>>> The output looks like:
>>>>
>>>> (1,a,a,a,a,a,a)
>>>> (2,b,b,b,b,b,b)
>>>> (3,c,c,c,c,c,c)
>>>> (4,d,d,d,d,d,d)
>>>> (5,e,e,e,e,e,e)
>>>>
>>>> I then lzop the test file to get test_input_chars.txt.lzo (I
>>>> decompressed this with lzop -d to make sure the compression worked
>>>> fine, and everything looks good). If I run the exact same script
>>>> provided above on the lzo file, it works fine. However, this file is
>>>> really small and doesn't need to use indexes. As a result, I wanted
>>>> to have LZO support that worked with indexes. Based on this I decided
>>>> to try out the elephant-bird branch for Pig 0.7 located here
>>>> (http://github.com/hirohanin/elephant-bird/), as recommended by
>>>> Dmitriy.
>>>>
>>>> I created the following Pig script that mirrors the above script but
>>>> should hopefully work on LZO files (including indexed ones):
>>>>
>>>> REGISTER elephant-bird-1.0.jar
>>>> REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
>>>> A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
>>>>     com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
>>>> DUMP A;
>>>>
>>>> When I run this script, which uses the LzoTokenizedLoader, there is
>>>> no output. The script appears to run without errors, but there are
>>>> zero Records Written and 0 Bytes Written.
>>>> Here is the exact output:
>>>>
>>>> grunt> DUMP A;
>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [ ]
>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [ ]
>>>> [main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [ ]
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://master:9000/tmp/temp-2052828736/tmp-1533645117:org.apache.pig.builtin.BinStorage) - 1-4 Operator Key: 1-4
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>>>> [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> [Thread-12] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
>>>> [Thread-12] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201009101108_0151
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at http://master:50030/jobdetails.jsp?jobid=job_201009101108_0151
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Succesfully stored result in "hdfs://amb-hadoop-01:9000/tmp/temp-2052828736/tmp-1533645117
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written: 0
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written: 0
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
>>>> [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process: 1
>>>> [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process: 1
>>>> grunt>
>>>>
>>>> I'm not sure if I'm doing something wrong in my use of
>>>> LzoTokenizedLoader or if there is a problem with the class itself
>>>> (most likely the problem is with my code, heh). Thank you for any
>>>> help!
>>>>
>>>> ~Ed
>>
>> The information contained in this communication is intended solely for the
>> use of the individual or entity to whom it is addressed and others
>> authorized to receive it. It may contain confidential or legally privileged
>> information. If you are not the intended recipient you are hereby notified
>> that any disclosure, copying, distribution or taking any action in reliance
>> on the contents of this information is strictly prohibited and may be
>> unlawful. If you have received this communication in error, please notify us
>> immediately by responding to this email and then delete it from your system.
>> The firm is neither liable for the proper and complete transmission of the
>> information contained in this communication nor for any delay in its
>> receipt.
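[Editor's note on the indexing question that closes the thread: with the
hadoop-lzo library that elephant-bird builds on, an LZO index is normally
created by running LzoIndexer over the file in HDFS. The sketch below is an
assumption about the local setup — the hadoop-lzo jar path is a guess and
the HDFS paths are the ones used in the thread — but the class name is the
standard one from hadoop-lzo.]

```shell
# Build an index for the compressed file already uploaded to HDFS.
# The hadoop-lzo jar location below is an assumption; point it at
# wherever hadoop-lzo is actually installed on the cluster.
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar \
    com.hadoop.compression.lzo.LzoIndexer \
    /usr/foo/input/test_input_chars.txt.lzo

# The indexer writes a companion file next to the .lzo; verify it exists.
hadoop fs -ls /usr/foo/input/test_input_chars.txt.lzo.index
```

Whether a tiny, unindexed file should still load through LzoTokenizedLoader
is the open question in the thread; checking for the .index companion file
at least rules out one variable.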