Re: MR2 Job over LZO data

2014-03-09 Thread Gordon Wang
Can you run plain MR jobs (not Pig jobs) that take LZO files as input?

If you cannot run MR jobs either, you may want to check the LZO compression
configuration in core-site.xml. Also make sure the dynamic library is in
HADOOP_HOME/lib/native/.

Here is a FAQ about how to configure LZO:
https://code.google.com/a/apache-extras.org/p/hadoop-gpl-compression/wiki/FAQ?redir=1
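
As a rough illustration of the core-site.xml check, the sketch below parses a
configuration file and reports which LZO codec classes are not listed under
io.compression.codecs. The class name LzoConfigCheck and its methods are
hypothetical helpers, not part of Hadoop or hadoop-lzo. Whether you need both
codec classes depends on your files, so treat the list as an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: inspects a core-site.xml for LZO codec registration.
class LzoConfigCheck {
    // Codec classes shipped by hadoop-lzo (assumed to be the ones wanted here).
    static final List<String> LZO_CODECS = Arrays.asList(
            "com.hadoop.compression.lzo.LzoCodec",
            "com.hadoop.compression.lzo.LzopCodec");

    // Returns the value of the named property, or "" when it is absent.
    static String property(String coreSiteXml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            coreSiteXml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                if (name.equals(text(p, "name"))) {
                    return text(p, "value");
                }
            }
            return "";
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable core-site.xml", e);
        }
    }

    // Trimmed text of the first child element with the given tag, or "".
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    // Lists the LZO codec classes that io.compression.codecs does not mention.
    static List<String> missingCodecs(String coreSiteXml) {
        List<String> configured = new ArrayList<>();
        for (String c : property(coreSiteXml, "io.compression.codecs").split(",")) {
            configured.add(c.trim());
        }
        List<String> missing = new ArrayList<>(LZO_CODECS);
        missing.removeAll(configured);
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to core-site.xml
        String xml = new String(Files.readAllBytes(Paths.get(args[0])),
                StandardCharsets.UTF_8);
        System.out.println("LZO codecs not listed: " + missingCodecs(xml));
    }
}
```

Run it with the path to your core-site.xml as the only argument. An empty
list means both classes are registered.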






-- 
Regards
Gordon Wang


RE: MR2 Job over LZO data

2014-03-07 Thread German Florez-Larrahondo
King

Here is my raw log of installing Hadoop LZO. This works on 2.2.0 and 2.3.0

 

I hope this helps

 

./g

 

 

Where to get Hadoop LZO

https://github.com/twitter/hadoop-lzo

 

http://asmarterplanet.com/studentsfor/blog/2013/11/hadoop-cluster-module-lzo-compression.html

 

Requirements

On CentOS:

sudo yum install lzo*    (provides /usr/lib64/liblzo2.so.2)

On Ubuntu:

sudo apt-get install liblzo    (on x86: /usr/lib64/liblzo2.so.2; the
package is usually named liblzo2-2, with headers in liblzo2-dev)

 

Clone:

git clone https://github.com/twitter/hadoop-lzo.git

 

Follow the instructions in README.md on the GitHub site. Basically:

 cd hadoop-lzo

 mvn clean package test

 

To enable this at run time:

a.   Copy the jar to hadoop/share/hadoop/common/ (if you don't want to
modify classpaths by putting the library somewhere else)

cp lzo././target/hadoop-lzo-0.4.20-SNAPSHOT.jar  .. hadoop/share/hadoop/common/

b.   Copy /usr/lib64/liblzo2.so.2 to  .. hadoop/lib/native/
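
The two copy steps above can be sanity-checked with a small sketch like the
one below. It only looks for a hadoop-lzo jar under share/hadoop/common and
for liblzo2.so.2 under lib/native, relative to a Hadoop home directory passed
on the command line. The class LzoInstallCheck is a hypothetical helper, and
the directory layout is taken from the steps above, so adjust it if your
installation differs.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: checks that the files from steps a and b are in place.
class LzoInstallCheck {
    // True for names such as hadoop-lzo-0.4.20-SNAPSHOT.jar.
    static boolean isLzoJar(String fileName) {
        return fileName.startsWith("hadoop-lzo-") && fileName.endsWith(".jar");
    }

    // Returns one message per missing artifact; an empty list means both were found.
    static List<String> missing(Path hadoopHome) {
        List<String> problems = new ArrayList<>();

        Path common = hadoopHome.resolve("share/hadoop/common");
        boolean jarFound = false;
        if (Files.isDirectory(common)) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(common)) {
                for (Path entry : entries) {
                    if (isLzoJar(entry.getFileName().toString())) {
                        jarFound = true;
                    }
                }
            } catch (IOException e) {
                // An unreadable directory is treated like a missing jar.
            }
        }
        if (!jarFound) {
            problems.add("no hadoop-lzo jar in " + common);
        }

        Path nativeLib = hadoopHome.resolve("lib/native/liblzo2.so.2");
        if (!Files.exists(nativeLib)) {
            problems.add("missing " + nativeLib);
        }
        return problems;
    }

    public static void main(String[] args) {
        // args[0]: Hadoop home directory
        for (String problem : missing(Paths.get(args[0]))) {
            System.out.println(problem);
        }
    }
}
```

No output means both files were found. The check has to pass on every node
that runs tasks, not only on the machine that submits the job.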

 

 




Re: MR2 Job over LZO data

2014-03-07 Thread Viswanathan J
Hi,

I am getting the error below while running a Pig job on hadoop-2.x:

Caused by: java.io.IOException: No codec for file found
    at com.twitter.elephantbird.mapreduce.input.MultiInputFormat.determineFileFormat(MultiInputFormat.java:176)
    at com.twitter.elephantbird.mapreduce.input.MultiInputFormat.createRecordReader(MultiInputFormat.java:88)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:256)

I have copied the respective LZO jars to the lib folders, but I still hit
this issue.

Please help.
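
For context, the message suggests that no registered compression codec
matched the input file name. Hadoop's CompressionCodecFactory resolves codecs
by file-name suffix, so a .lzo file finds a codec only if a codec claiming
that suffix is registered in the job configuration. The sketch below is a
simplified stand-in for that lookup, not the Hadoop or elephant-bird code.
The class CodecLookupSketch and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for suffix-based codec resolution (not Hadoop code).
class CodecLookupSketch {
    // Maps a file-name suffix to the codec class that handles it.
    private final Map<String, String> codecBySuffix = new LinkedHashMap<>();

    // Registers a codec for a suffix, e.g. ".lzo".
    void register(String suffix, String codecClass) {
        codecBySuffix.put(suffix, codecClass);
    }

    // Returns the codec class for the file, or null when no suffix matches.
    String codecFor(String fileName) {
        for (Map.Entry<String, String> e : codecBySuffix.entrySet()) {
            if (fileName.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        CodecLookupSketch codecs = new CodecLookupSketch();
        codecs.register(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        // Without an LZO codec registered, the lookup finds nothing.
        System.out.println(codecs.codecFor("part-00000.lzo"));
        codecs.register(".lzo", "com.hadoop.compression.lzo.LzopCodec");
        // With the codec registered, the same file name resolves.
        System.out.println(codecs.codecFor("part-00000.lzo"));
    }
}
```

If this is what happens here, copying the jars alone is not enough: the
codec classes also have to be listed in io.compression.codecs in the
configuration that the Pig job actually loads.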




-- 
Regards,
Viswa.J


Re: MR2 Job over LZO data

2014-03-06 Thread Stanley Shi
Maybe you can try downloading the LZO source and rebuilding it against
Hadoop 2.2.0.
If the build succeeds, you should be good to go;
if it fails, you may need to wait for the LZO maintainers to update their
code.

Regards,
Stanley Shi






Re: MR2 Job over LZO data

2014-03-06 Thread Gordon Wang
You can try getting the source code from
https://github.com/twitter/hadoop-lzo and compiling it against
Hadoop 2.2.0.

As far as I remember, once it is rebuilt, LZO should work with Hadoop 2.2.0.
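
Some background on why a rebuild is likely to help:
org.apache.hadoop.mapreduce.JobContext is a class in Hadoop 1 and an
interface in Hadoop 2. A jar compiled against one form fails with
IncompatibleClassChangeError when it runs against the other. The sketch below
uses plain reflection to report which form is on the current classpath. The
class JobContextKind is a hypothetical helper, and it prints "not on
classpath" when run without the Hadoop jars.

```java
// Hypothetical diagnostic: reports whether a type is a class or an interface.
class JobContextKind {
    // Returns "interface", "class", or "not on classpath" for the given type name.
    static String kindOf(String className) {
        try {
            return Class.forName(className).isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // "interface" indicates the Hadoop 2 API; "class" indicates Hadoop 1.
        System.out.println(kindOf("org.apache.hadoop.mapreduce.JobContext"));
    }
}
```

Run it with the same classpath as the failing job. If it prints "interface",
the hadoop-lzo jar has to be one that was compiled against Hadoop 2.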


On Thu, Mar 6, 2014 at 6:29 PM, KingDavies kingdav...@gmail.com wrote:

 Running on Hadoop 2.2.0

 The Java MR2 job works as expected on an uncompressed data source using
 the TextInputFormat.class.
 But when using the LZO format the job fails:
 import com.hadoop.mapreduce.LzoTextInputFormat;
 job.setInputFormatClass(LzoTextInputFormat.class);

 Dependencies from the maven repository:
 http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.19/
 Also tried with elephant-bird-core 4.4

 The same data can be queried without problems from Hive (0.12) on the same
 cluster.


 The exception:
 Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
     at com.hadoop.mapreduce.LzoTextInputFormat.listStatus(LzoTextInputFormat.java:62)
     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
     at com.hadoop.mapreduce.LzoTextInputFormat.getSplits(LzoTextInputFormat.java:101)
     at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:491)
     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:508)
     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:415)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
     at com.cloudreach.DataQuality.Main.main(Main.java:42)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:606)
     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

 I believe the issue is related to the API changes in Hadoop 2, but where
 can I find a Hadoop 2 compatible version?

 Thanks




-- 
Regards
Gordon Wang