java.lang.NoClassDefFoundError: org/apache/hadoop/contrib/utils/join/DataJoinMapperBase
Hi, when I try to run a Hadoop reduce-side join, I get the following:

java.lang.NoClassDefFoundError: org/apache/hadoop/contrib/utils/join/DataJoinMapperBase
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at DataJoin.run(DataJoin.java:105)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at DataJoin.main(DataJoin.java:119)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.contrib.utils.join.DataJoinMapperBase
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        ... 23 more

What's the problem? The command I use:

hadoop jar JoinHadoop.jar DataJoin /group/asciaa/fst/input_test_join /group/asciaa/fst/out_test_join

The source is from Hadoop in Action, chapter 5, listing 5.3. I used Eclipse to export it as a jar. My Hadoop version is 0.19.2. Thanks!
The source code:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
//import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
//import org.apache.hadoop.mapred.KeyValueTextInputFormat;
//import org.apache.hadoop.mapred.MapReduceBase;
//import org.apache.hadoop.mapred.Mapper;
//import org.apache.hadoop.mapred.OutputCollector;
//import org.apache.hadoop.mapred.Reducer;
//import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.contrib.utils.join.DataJoinMapperBase;
import org.apache.hadoop.contrib.utils.join.DataJoinReducerBase;
import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;

public class DataJoin extends Configured implements Tool {

    public static class MapClass extends DataJoinMapperBase {

        protected Text generateInputTag(String inputFile) {
            return new Text(inputFile);
        }

        protected Text generateGroupKey(TaggedMapOutput aRecord) {
            String line = ((Text) aRecord.getData()).toString();
            String[] tokens = line.split(",");
            String groupKey = tokens[0];
            return new Text(groupKey);
        }

        protected TaggedMapOutput generateTaggedMapOutput(Object value) {
            TaggedWritable retv = new TaggedWritable((Text) value);
            retv.setTag(this.inputTag);
            return retv;
        }
    }

    public static class Reduce extends DataJoinReducerBase {

        protected TaggedMapOutput combine(Object[] tags, Object[] values) {
            if (tags.length < 2) return null;
            String joinedStr = "";
            for (int i = 0; i < values.length; i++) {
                if (i > 0) joinedStr += ",";
                TaggedWritable tw = (TaggedWritable) values[i];
                String line = ((Text) tw.getData()).toString();
                String[] tokens = line.split(",", 2);
                joinedStr += tokens[1];
            }
            TaggedWritable retv = new TaggedWritable(new
Re: Splitting data input to Distcp
On 3 May 2012, at 23:47, Himanshu Vijay wrote:

Pedro, thanks for the response. Unfortunately I am running it on an in-house cluster, and from there I need to upload to S3.

Hi,

Last night I was thinking about this... what happens if you copy s3://region.elasticmapreduce/libs/s3distcp/1.0.1/s3distcp.jar to your cluster and run

hadoop jar s3distcp.jar --src hdfs:///path/to/files --dest s3://bucket/path --outputCodec lzo

(or whichever codec you prefer)?

Alternatively, you could run the following Pig or Hive jobs (using output compression):

--- pig ---
local_data = load '/path/to/files' as ( ... );
store local_data into 's3://bucket/path' using ...;

--- hive ---
create external table foo ( ... ) [row format ... | serde] location '/path/to/files';
create external table s3_foo ( ... ) [row format ... | serde] location 's3://bucket/path';
insert overwrite table s3_foo select * from foo;

Obviously an equivalent native or Streaming job is trivial to write, too.

Cheers,
Pedro Figueiredo
Skype: pfig.89clouds
http://89clouds.com/ - Big Data Consulting
Re: java.lang.NoClassDefFoundError: org/apache/hadoop/contrib/utils/join/DataJoinMapperBase
Is there any other error in the log? Check:

1. that JoinHadoop.jar was correctly submitted to Hadoop;
2. that DataJoinMapperBase is really inside JoinHadoop.jar.

2012/5/4 唐方爽 fstang...@gmail.com:
Hi, when I try to run a Hadoop reduce-side join, I get the following: java.lang.NoClassDefFoundError: org/apache/hadoop/contrib/utils/join/DataJoinMapperBase ...
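A quick way to verify point 2, using the jar name from the command in the original post, is to list the jar's contents and grep for the missing class:

    jar tf JoinHadoop.jar | grep DataJoinMapperBase

If nothing is printed, the contrib class was never packaged into the jar, which would explain the NoClassDefFoundError.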
Re: java.lang.NoClassDefFoundError: org/apache/hadoop/contrib/utils/join/DataJoinMapperBase
DataJoinMapperBase was not in JoinHadoop.jar. When I added it and the related classes to JoinHadoop.jar, it works! (Although I now get an IOException at the reduce stage... maybe I should check the code or the input files.)

Thanks!

2012/5/4 JunYong Li lij...@gmail.com:
Is there any other error in the log? Check: 1. that JoinHadoop.jar was correctly submitted to Hadoop; 2. that DataJoinMapperBase is really inside JoinHadoop.jar. ...
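For readers hitting the same problem: repackaging the classes into the job jar is one fix. A sketch of two common alternatives, assuming the datajoin contrib jar shipped with the 0.19.2 distribution (the exact jar name and path may differ on your install):

    # ship the contrib jar at submit time; this works because DataJoin
    # runs through ToolRunner, which parses the -libjars generic option
    hadoop jar JoinHadoop.jar DataJoin -libjars /path/to/hadoop-0.19.2-datajoin.jar \
        /group/asciaa/fst/input_test_join /group/asciaa/fst/out_test_join

    # or bundle it under lib/ inside the job jar, which Hadoop adds to
    # the task classpath (assumes the contrib jar was first copied into
    # a local lib/ directory)
    jar uf JoinHadoop.jar lib/hadoop-0.19.2-datajoin.jar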
How to create an archive-file in Java to distribute a MapFile via Distributed Cache
Hello, I have written a chain of map-reduce jobs which creates a MapFile. I want to use the MapFile in a subsequent map-reduce job via the distributed cache. Therefore I have to create an archive file of the folder that holds the /data and /index files. In the documentation and in the book Hadoop: The Definitive Guide there are only examples of how this is done on the command line. Is this possible via the Hadoop Java API, too? P.S.: Distributing the files separately is not a solution; they would end up in different temporary folders. Thanks in advance, Christian
Re: How to create an archive-file in Java to distribute a MapFile via Distributed Cache
Hi,

The Java API offers a DistributedCache class which lets you do this. The usage is detailed at http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html

On Fri, May 4, 2012 at 5:11 PM, i...@christianherta.de wrote:
Hello, I have written a chain of map-reduce jobs which creates a MapFile. I want to use the MapFile in a subsequent map-reduce job via the distributed cache. ...

--
Harsh J
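To answer the archive-creation part in Java: a minimal sketch, assuming the MapFile sits at /user/me/mapfile on HDFS (the paths and the "mapfile" symlink name are made up for illustration). It zips the MapFile's data and index files through HDFS streams with plain java.util.zip, then registers the archive so each task sees it unpacked under one directory:

    import java.net.URI;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class CacheMapFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path mapFileDir = new Path("/user/me/mapfile");   // holds data and index
            Path archive = new Path("/user/me/mapfile.zip");

            // Zip the MapFile's data and index files into one HDFS archive.
            ZipOutputStream zip = new ZipOutputStream(fs.create(archive, true));
            for (FileStatus status : fs.listStatus(mapFileDir)) {
                zip.putNextEntry(new ZipEntry(status.getPath().getName()));
                FSDataInputStream in = fs.open(status.getPath());
                IOUtils.copyBytes(in, zip, conf, false);
                in.close();
                zip.closeEntry();
            }
            zip.close();

            // Register the archive; tasks see it unpacked as ./mapfile
            DistributedCache.addCacheArchive(new URI(archive.toUri() + "#mapfile"), conf);
            DistributedCache.createSymlink(conf);
            // ... set up and submit the follow-on job with this conf ...
        }
    }

Since the distributed cache unpacks zip/jar/tar archives on each node, the follow-on job's tasks should then be able to open the MapFile via the relative path "mapfile".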
Re: How to create an archive-file in Java to distribute a MapFile via Distributed Cache
My humble experience: I would prefer specifying the files on the command line using the -files option, then opening them explicitly in the Mapper's configure or setup function using

File f1 = new File(file1name);
File f2 = new File(file2name);

because I am not 100% sure how the distributed cache determines the order of the paths (archives) stored in the array. I once got burned at this point, so since then I have stuck with the old method.
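To make that concrete (file names here are placeholders): -files ships the listed files with the job and symlinks each one into the task's working directory under its base name, so the mapper can open them by bare relative path.

    hadoop jar myjob.jar MyDriver -files /local/lookup1.txt,/local/lookup2.txt input output

and in an old-API mapper:

    import java.io.File;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    public class MyMapper extends MapReduceBase {
        private File lookup1;
        private File lookup2;

        public void configure(JobConf job) {
            // -files symlinked the shipped files into the task's
            // working directory under their base names
            lookup1 = new File("lookup1.txt");
            lookup2 = new File("lookup2.txt");
        }
    }

(-files is a generic option, so MyDriver must run through ToolRunner for it to be parsed.)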
Bad connect ack with firstBadLink
Hi,

We are running a three-node cluster. For the past two days, whenever we copy a file to HDFS it throws java.io.IOException: Bad connect ack with firstBadLink. I searched the net, but was not able to resolve the issue. The following is the stack trace from the datanode log:

2012-05-04 18:08:08,868 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-7520371350112346377_50118 received exception java.net.SocketException: Connection reset
2012-05-04 18:08:08,869 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.23.208.17:50010, storageID=DS-1340171424-172.23.208.17-50010-1334672673051, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at java.io.DataInputStream.read(DataInputStream.java:132)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:262)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:309)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:373)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:525)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:662)

It will be great if someone can point me in the direction of how to solve this problem.

--
https://github.com/zinnia-phatak-dev/Nectar
Re: Bad connect ack with firstBadLink
Please see: http://hbase.apache.org/book.html#dfs.datanode.max.xcievers

On Fri, May 4, 2012 at 5:46 AM, madhu phatak phatak@gmail.com wrote:
Hi, we are running a three-node cluster. For the past two days, whenever we copy a file to HDFS it throws java.io.IOException: Bad connect ack with firstBadLink. ...
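For reference, the fix described at that link amounts to raising the datanode's transceiver limit in hdfs-site.xml on every datanode and restarting them. A typical snippet (4096 is a commonly suggested value, not a universal one, and yes, the property name really is spelled "xcievers"):

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>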
Re: Reduce Hangs at 66%
Well, that was one of the things I had asked. ulimit -a says it all. But you have to do this for the users... hdfs, mapred, and hadoop (which is why I asked about which flavor).

On May 3, 2012, at 7:03 PM, Raj Vishwanathan wrote:

Keith, what is the output of ulimit -n? Your value for the number of open files is probably too low.

Raj

From: Keith Thompson kthom...@binghamton.edu
To: common-user@hadoop.apache.org
Sent: Thursday, May 3, 2012 4:33 PM
Subject: Re: Reduce Hangs at 66%

I am not sure about ulimits, but I can answer the rest. It's a Cloudera distribution of Hadoop 0.20.2. The HDFS has 9 TB free. In the reduce step, I am taking keys in the form of (gridID, date), each with a value of 1. The reduce step just sums the 1's as the final output value for the key (it's counting how many people were in the gridID on a certain day). I have been running other, more complicated jobs with no problem, so I'm not sure why this one is being peculiar.

This is the command I used to execute the program from the command line (the source is a file on the HDFS):

hadoop jar jarfile driver source /thompson/outputDensity/density1

The program then executes the map and gets to 66% of the reduce, then stops responding. The cause of the error seems to be:

Error from attempt_201202240659_6432_r_00_1: java.io.IOException: The temporary job-output directory hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary doesn't exist!

I don't understand what the _temporary is. I am assuming it's something Hadoop creates automatically.

On Thu, May 3, 2012 at 5:02 AM, Michel Segel michael_se...@hotmail.com wrote:

Well... Lots of information but still missing some of the basics... Which release and version? What are your ulimits set to? How much free disk space do you have? What are you attempting to do? Stuff like that.

Sent from a remote device. Please excuse any typos...

Mike Segel

On May 2, 2012, at 4:49 PM, Keith Thompson kthom...@binghamton.edu wrote:

I am running a task which gets to 66% of the reduce step and then hangs indefinitely. Here is the log file (I apologize if I am putting too much here, but I am not exactly sure what is relevant):

2012-05-02 16:42:52,975 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201202240659_6433_r_00_0' to tip task_201202240659_6433_r_00, for tracker 'tracker_analytix7:localhost.localdomain/127.0.0.1:56515'
2012-05-02 16:42:53,584 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201202240659_6433_m_01_0' has completed task_201202240659_6433_m_01 successfully.
2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201202240659_6432_r_00_0: Task attempt_201202240659_6432_r_00_0 failed to report status for 1800 seconds. Killing!
2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201202240659_6432_r_00_0'
2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201202240659_6432_r_00_0' to tip task_201202240659_6432_r_00, for tracker 'tracker_analytix4:localhost.localdomain/127.0.0.1:44204'
2012-05-02 17:00:48,763 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201202240659_6432_r_00_0'
2012-05-02 17:00:48,957 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201202240659_6432_r_00_1' to tip task_201202240659_6432_r_00, for tracker 'tracker_analytix5:localhost.localdomain/127.0.0.1:59117'
2012-05-02 17:00:56,559 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201202240659_6432_r_00_1: java.io.IOException: The temporary job-output directory hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary doesn't exist!
        at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
        at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
        at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
2012-05-02 17:00:59,903 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201202240659_6432_r_00_1'
2012-05-02 17:00:59,906 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201202240659_6432_r_00_2' to tip task_201202240659_6432_r_00, for tracker
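For readers who want to act on the ulimit advice above: a sketch of how the open-file limit is usually checked and raised on Linux (32768 is a common recommendation for Hadoop nodes, not a requirement; adjust the user names to your install):

    # check, as each user that runs the daemons
    ulimit -n

    # raise persistently by adding lines like these to /etc/security/limits.conf
    hdfs    -    nofile    32768
    mapred  -    nofile    32768

A re-login (and a daemon restart) is needed for the new limits to take effect.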
Re: Bad connect ack with firstBadLink
Check your number of blocks in the cluster. This also indicates that your datanodes are fuller than they should be. Try deleting unnecessary blocks.

On Fri, May 4, 2012 at 7:40 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
Please see: http://hbase.apache.org/book.html#dfs.datanode.max.xcievers ...
Re: Reduce Hangs at 66%
Thanks everyone for your help. It is running fine now.

On Fri, May 4, 2012 at 11:22 AM, Michael Segel michael_se...@hotmail.com wrote:
Well, that was one of the things I had asked. ulimit -a says it all. But you have to do this for the users... hdfs, mapred, and hadoop (which is why I asked about which flavor). ...
Fwd: May 21 talk at Pasadena JUG
(apologies for cross posting)

Hey folks in the SoCal area -- if you're around on May 21st, I'll be speaking at the Pasadena JUG on Apache OODT, Big Data, and likely Apache Hadoop (in prep for my upcoming Hadoop Summit talk). Info is below; thanks to David Noble for setting this up!

Cheers,
Chris

Begin forwarded message:

The announcement is up on the Meetup site and the Pasadena JUG website, and has been sent to mailing lists for the Pasadena JUG, LA JUG, and OC JUG. If you invite people, please do encourage them to RSVP on the Meetup site. It's useful to make sure we have enough food, but also to make sure we set up the right room. Last month's talk on Mule & MongoDB had 55 people RSVP (and probably more attend) and we had to bump up to a larger room than usual. Fortunately Idealab is equipped for that size group :-)

http://www.meetup.com/pasadenajug/
http://www.pasadenajug.org/

I'll follow up with the Apache lists in the next day or so, unless you beat me to it.

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory, Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++
Need to improve documentation for v 0.23.x (& v 2.x)
Hello All,

As the Apache Hadoop community is ready to release the next 2.0 alpha version of Hadoop, I would like to draw attention to the need for better documentation of its tutorials and examples.

Just one short example: see the Single Node Setup tutorials for v 1.x (http://hadoop.apache.org/common/docs/r1.0.2/single_node_setup.html) and v 0.23 (http://hadoop.apache.org/common/docs/r0.23.1/hadoop-yarn/hadoop-yarn-site/SingleCluster.html). You could say the 0.23 author was in a hurry, assuming the reader already knows what to do and where to do it.

We should spend some time on documentation. With so many beautiful features coming, it would be great if you planned some special hackathon meetings to improve the documentation and code examples so that people can understand how to use them effectively. At present only two people can understand 0.23: the one who wrote the code, and the java compiler who is verifying it :)

Tom White, if you are reading this message, I request that you pick up your pen again and write a 4th edition of Hadoop: The Definitive Guide dedicated to the next release, for the greater benefit of the community.

Thanks
Re: How to add debugging to map- red code
Hi Harsh,

Could you show one sample of how to do this? I have not seen/written any mapper code where people use a log4j logger or a log4j properties file to set the log level.

Thanks in advance,
-JJ

On Thu, May 3, 2012 at 4:32 PM, Harsh J ha...@cloudera.com wrote:

Doing (ii) would be an isolated app-level config and wouldn't get affected by the toggling of (i). The feature from (i) is already available in CDH 4.0.0-b2, btw.

On Fri, May 4, 2012 at 4:58 AM, Mapred Learn mapred.le...@gmail.com wrote:

Hi Harsh, does doing (ii) mess with the Hadoop (i) level? Or does it happen in both options anyway? Thanks, -JJ

On Fri, Apr 20, 2012 at 8:28 AM, Harsh J ha...@cloudera.com wrote:

Yes, this is possible, and there are two ways to do it.

1. Use a distro/release that carries the https://issues.apache.org/jira/browse/MAPREDUCE-336 fix. This will let you avoid work (see 2, which is the same as your idea).

2. Configure your implementation's logger object's level in the setup/setConf methods of the task, by looking at some conf prop to decide the level. This will work just as well - and will also avoid changing Hadoop's own Child log levels, unlike method (1).

On Fri, Apr 20, 2012 at 8:47 PM, Mapred Learn mapred.le...@gmail.com wrote:

Hi, I am trying to find the best way to add debugging to map-red code. I have System.out.println() statements that I keep commenting and uncommenting so as not to increase the stdout size. But the problem is that any time I need to debug, I have to re-compile. Is there a way I can define log levels using log4j in map-red code and set the log level as a conf option? Thanks, JJ

Sent from my iPhone

--
Harsh J
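A minimal sketch of Harsh's option (2) above for an old-API mapper; the property name my.job.log.level is made up for illustration:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    public class DebuggableMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        private static final Logger LOG = Logger.getLogger(DebuggableMapper.class);

        public void configure(JobConf job) {
            // Pick the level up from a job property instead of recompiling;
            // defaults to INFO when the property is unset.
            LOG.setLevel(Level.toLevel(job.get("my.job.log.level", "INFO")));
        }

        public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            LOG.debug("map input: " + value); // emitted only at DEBUG level
            // ... real map logic goes here ...
        }
    }

Submitting with hadoop jar myjob.jar MyDriver -Dmy.job.log.level=DEBUG input output then turns on debug output for the tasks, provided the driver goes through ToolRunner so the -D generic option is parsed.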
Re: How to add debugging to map- red code
Here is sample code from the log4j documentation. If you want to specify a specific file where you want to write the log, you can have a log4j properties file and add it to the classpath.

import com.foo.Bar;

// Import log4j classes.
import org.apache.log4j.Logger;
import org.apache.log4j.BasicConfigurator;

public class MyApp {

    // Define a static logger variable so that it references the
    // Logger instance named "MyApp".
    static Logger logger = Logger.getLogger(MyApp.class);

    public static void main(String[] args) {
        // Set up a simple configuration that logs on the console.
        BasicConfigurator.configure();

        logger.info("Entering application.");
        Bar bar = new Bar();
        bar.doIt();
        logger.info("Exiting application.");
    }
}

On Sat, May 5, 2012 at 3:40 AM, Mapred Learn mapred.le...@gmail.com wrote:
Hi Harsh, could you show one sample of how to do this? I have not seen/written any mapper code where people use a log4j logger or a log4j properties file to set the log level. ...

--
Nitin Pawar
Re: How to add debugging to map- red code
Thanks Nitin, but I was asking in the context of mapper code..

Sent from my iPhone

On May 4, 2012, at 9:06 PM, Nitin Pawar nitinpawar...@gmail.com wrote:
Here is sample code from the log4j documentation. If you want to specify a specific file where you want to write the log, you can have a log4j properties file and add it to the classpath. ...