I validated it on Apache Hadoop 0.20.2 and Apache Pig 0.7.
I haven't tried it with Cloudera's version.
Can we verify that it doesn't work with them too?
Regards
Rohan
ed wrote:
Hello,
I'm using Cloudera's Hadoop CDH3B2--Hadoop-0.20.2+320 (based on Apache
Hadoop 0.20.2) with Pig 0.7 (from Cloudera's distro).
Thank you!
~Ed
On Wed, Sep 29, 2010 at 11:56 PM, Rohan Rai <rohan....@inmobi.com> wrote:
Hi
Which Hadoop/Pig version are you using?
Regards
Rohan
ed wrote:
Hello,
I tested the newest push to the hirohanin elephant-bird branch (for Pig 0.7)
and had an error when trying to use LzoTokenizedLoader with the following
Pig script:
REGISTER elephant-bird-1.0.jar
REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
DUMP A;
The error I get is in the mapper logs and is as follows:
INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
INFO com.hadoop.compression.lzo.LzoCodec: Succesfully loaded & initialized native-lzo library
INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader: LzoTokenizedLoader with given delimiter [ ]
INFO com.twitter.elephantbird.mapreduce.input.LzoRecordReader: Seeking to split start at pos 0
FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.NoSuchMethodError: org.apache.pig.backend.executionengine.mapReduceLayer.PigHadoopLogger.getTaskIOCContext()Lorg/apache/hadoop/mapreduce/TaskInputOutputContext;
at com.twitter.elephantbird.pig.util.PigCounterHelper.getTIOC(Unknown Source)
at com.twitter.elephantbird.pig.util.PigCounterHelper.incrCounter(Unknown Source)
at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.incrCounter(Unknown Source)
at com.twitter.elephantbird.pig.load.LzoTokenizedLoader.getNext(Unknown Source)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:142)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Do you think I'm forgetting some required library?
Thank you!
~Ed
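For anyone hitting the same error: a NoSuchMethodError generally means the class loaded at runtime lacks a method that the calling jar was compiled against, so it is worth checking which methods the Pig jar on the cluster actually exposes. The sketch below is only an illustration of such a check using reflection. The class MethodProbe and its helper are hypothetical (not part of Pig or elephant-bird), and the class and method names to probe should be copied from your own stack trace.

```java
import java.lang.reflect.Method;

// Hypothetical helper, not part of Pig or elephant-bird.
// Reports whether a class on the current classpath exposes a public
// method with the given name (declared or inherited).
final class MethodProbe {

    static boolean hasMethod(String className, String methodName) {
        try {
            // Load without running static initializers; we only inspect the class.
            Class<?> cls = Class.forName(className, false,
                    MethodProbe.class.getClassLoader());
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class is missing or cannot be linked from this classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java MethodProbe <class> <method>");
            return;
        }
        System.out.println(hasMethod(args[0], args[1]) ? "found" : "missing");
    }
}
```

Run it with the cluster's Pig jar on the classpath, passing the fully qualified class name and the method name as the two arguments. A "missing" result would suggest that the installed Pig build differs from the one the loader was compiled against.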
On Tue, Sep 28, 2010 at 2:10 PM, ed <hadoopn...@gmail.com> wrote:
Thank you Rohan, I really appreciate your help! I'll give it a shot and
post back if it works.
~Ed
On Mon, Sep 27, 2010 at 11:51 PM, Rohan Rai <rohan....@inmobi.com>
wrote:
I just corrected, tested, and pushed LzoTokenizedLoader to my personal fork.
Hopefully it works now.
Regards
Rohan
Dmitriy Ryaboy wrote:
lzop should work.
On Mon, Sep 27, 2010 at 10:59 AM, Rohan Rai <rohan....@inmobi.com>
wrote:
Well, I haven't tried (or rather, I don't remember trying) compressing via
lzop and then putting the file on the cluster, so I can't tell you about that.
Here is what works for me: I first put the file on the cluster and then do
stream compression.
And yes, it need not be indexed. (I guess it doesn't matter for a small test
file; otherwise it is unwise, since one loses the benefit of parallelism.)
Regards
Rohan
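To illustrate the parallelism point: an .lzo file without an index cannot be split, so the whole file goes to a single map task, while an indexed file can be divided into several splits. The sketch below is a deliberately simplified model of that difference, not the actual hadoop-lzo or elephant-bird split logic. The names SplitEstimate and estimateSplits, and the 64 MB split size, are assumptions chosen for illustration.

```java
// Simplified model (an assumption, not real Hadoop code): estimates how many
// map tasks could read one LZO file in parallel.
final class SplitEstimate {

    static long estimateSplits(long fileBytes, long splitBytes, boolean indexed) {
        if (splitBytes <= 0) {
            throw new IllegalArgumentException("splitBytes must be positive");
        }
        if (fileBytes <= 0) {
            return 0; // nothing to read
        }
        if (!indexed) {
            return 1; // unindexed LZO is read start to end by one task
        }
        // Ceiling division: an indexed file can be cut roughly every splitBytes.
        return (fileBytes + splitBytes - 1) / splitBytes;
    }

    public static void main(String[] args) {
        long splitBytes = 64L * 1024 * 1024;   // assumed split size
        long oneGiB = 1024L * 1024 * 1024;
        System.out.println(estimateSplits(oneGiB, splitBytes, false)); // prints 1
        System.out.println(estimateSplits(oneGiB, splitBytes, true));  // prints 16
    }
}
```

For a six-line test file both cases come out to one split, which is why the index makes no practical difference there.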
pig wrote:
Hi Rohan,
The test file (test_input_chars.txt.lzo) is not indexed. I created it using
the command
'lzop test_input_chars.txt'
It's a really small file (only 6 lines) so I didn't think it needed to be
indexed. Do all files regardless of size need to be indexed for the
LzoTokenizedLoader to work?
Thank you!
~Ed
On Mon, Sep 27, 2010 at 1:25 AM, Rohan Rai <rohan....@inmobi.com>
wrote:
Oh, sorry, I am completely out of sync...
Can you tell me how you LZO-compressed and indexed the file?
Regards
Rohan
Rohan Rai wrote:
Oh, sorry, I did not see this mail...
It's not an official patch/release,
but here is a fork of elephant-bird which works with Pig 0.7
for normal LZO text loading etc.
(not HBaseLoader)
Regards
Rohan
Dmitriy Ryaboy wrote:
The 0.7 branch is not tested.. it's quite likely it doesn't actually work :).
Rohan Rai was working on it.. Rohan, think you can take a look and help Ed
out?
Ed, you may want to check if the same input works when you use Pig 0.6 (and
the official elephant-bird, on Kevin Weil's github).
-D
On Thu, Sep 23, 2010 at 6:49 AM, pig <hadoopn...@gmail.com>
wrote:
Hello,
After getting all the errors to go away with LZO libraries not being found
and missing jar files for elephant-bird, I've run into a new problem when
using the elephant-bird branch for Pig 0.7.
The following simple Pig script works as expected:
REGISTER elephant-bird-1.0.jar
REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
A = load '/usr/foo/input/test_input_chars.txt';
DUMP A;
This just dumps out the contents of the test_input_chars.txt file, which is
tab delimited. The output looks like:
(1,a,a,a,a,a,a)
(2,b,b,b,b,b,b)
(3,c,c,c,c,c,c)
(4,d,d,d,d,d,d)
(5,e,e,e,e,e,e)
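As a point of reference, each tuple above corresponds to one input line split on the tab character. The sketch below shows that tokenization in plain Java. It is only an illustration of the idea, not how Pig or elephant-bird implement it, and the names TabTokenizer, tokenize, and toTuple are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: splits one line on a delimiter character and
// renders the fields in the tuple style that DUMP prints.
final class TabTokenizer {

    static List<String> tokenize(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        fields.add(line.substring(start)); // last field (may be empty)
        return fields;
    }

    static String toTuple(List<String> fields) {
        return "(" + String.join(",", fields) + ")";
    }

    public static void main(String[] args) {
        String line = "1\ta\ta\ta\ta\ta\ta";
        System.out.println(toTuple(tokenize(line, '\t'))); // prints (1,a,a,a,a,a,a)
    }
}
```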
I then lzop the test file to get test_input_chars.txt.lzo (I decompressed
this with lzop -d to make sure the compression worked fine and everything
looks good).
If I run the exact same script provided above on the lzo file it works fine.
However, this file is really small and doesn't need indexes, and I want LZO
support that also works with indexed files. Based on this I decided to try
out the elephant-bird branch for Pig 0.7 located here
(http://github.com/hirohanin/elephant-bird/) as recommended by Dmitriy.
I created the following Pig script that mirrors the above script but should
hopefully work on LZO files (including indexed ones):
REGISTER elephant-bird-1.0.jar
REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
DUMP A;
When I run this script, which uses the LzoTokenizedLoader, there is no
output. The script appears to run without errors, but there are zero records
written and 0 bytes written.
Here is the exact output:
grunt > DUMP A;
[main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [ ]
[main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [ ]
[main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [ ]
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://master:9000/tmp/temp-2052828736/tmp-1533645117:org.apache.pig.builtin.BinStorage) - 1-4 Operator Key: 1-4
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
[Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
[Thread-12] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
[Thread-12] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201009101108_0151
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at http://master:50030/jobdetails.jsp?jobid=job_201009101108_0151
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Succesfully stored result in "hdfs://amb-hadoop-01:9000/tmp/temp-2052828736/tmp-1533645117
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written: 0
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written: 0
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
[main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process: 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process: 1
grunt >
I'm not sure if I'm doing something wrong in my use of LzoTokenizedLoader or
if there is a problem with the class itself (most likely the problem is with
my code, heh). Thank you for any help!
~Ed
The information contained in this communication is intended solely for the use
of the individual or entity to whom it is addressed and others authorized to
receive it. It may contain confidential or legally privileged information. If
you are not the intended recipient you are hereby notified that any disclosure,
copying, distribution or taking any action in reliance on the contents of this
information is strictly prohibited and may be unlawful. If you have received
this communication in error, please notify us immediately by responding to this
email and then delete it from your system. The firm is neither liable for the
proper and complete transmission of the information contained in this
communication nor for any delay in its receipt.