Re: Hadoop cluster monitoring
Lots of folks use Apache Ambari (http://ambari.apache.org/) to deploy and monitor their Hadoop clusters. Ambari uses Ganglia/Nagios as the underlying technology and has a much better UI. Hope that helps, Arun

On Mon, Apr 14, 2014 at 9:08 PM, Shashidhar Rao raoshashidhar...@gmail.com wrote: Hi, can somebody please help me clarify how a Hadoop cluster is monitored and profiled in a real production environment? What are the tools, and links if any? I have heard of Ganglia and HPROF. For HPROF, can somebody share some experience of how to configure HPROF for use with Hadoop? Regards, Shashi

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
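On the HPROF part of the question: Hadoop can profile a sample of tasks for you. A minimal, hedged sketch using the Hadoop 1.x property names (the jar and class names are illustrative, and the job must go through ToolRunner/GenericOptionsParser for the -D options to apply):

    hadoop jar myjob.jar MyJob \
      -D mapred.task.profile=true \
      -D 'mapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s' \
      -D mapred.task.profile.maps=0-2 \
      -D mapred.task.profile.reduces=0-2 \
      input output

With these ranges only the first three map and reduce tasks are profiled; Hadoop substitutes %s with a per-task output file that lands alongside the task logs.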
Re: Hadoop cluster monitoring
Thanks Arun Murthy. On Tue, Apr 15, 2014 at 11:32 AM, Arun Murthy a...@hortonworks.com wrote: Lots of folks use Apache Ambari (http://ambari.apache.org/) to deploy and monitor their Hadoop clusters.
Hadoop NoClassDefFoundError
Hello everyone: I am new to Hadoop, and I am reading Hadoop in Action. When I tried to run a demo from this book, I got a problem and could not find an answer on the net. Can you help me with this? Below is the error info:

    $ hadoop jar myjob.jar MyJob input output
    Exception in thread "main" java.lang.NoClassDefFoundError: MyJob (wrong name: myjob/MyJob)
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

The attached screenshot shows the command I used to compile the .java; I compiled on Win7 and ran on Ubuntu. Below is MyJob.java:

    package myjob;

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyJob extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            Configuration conf = getConf();
            JobConf job = new JobConf(conf, MyJob.class);

            Path in = new Path(args[0]);
            Path out = new Path(args[1]);
            FileInputFormat.setInputPaths(job, in);
            FileOutputFormat.setOutputPath(job, out);

            job.setJobName("MyJob");
            job.setJarByClass(MyJob.class);
            job.setMapperClass(MapClass.class);
            job.setReducerClass(Reduce.class);
            job.setInputFormat(KeyValueTextInputFormat.class);
            job.setOutputFormat(TextOutputFormat.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.set("key.value.separator.in.input.line", ",");

            JobClient.runJob(job);
            return 0;
        }

        public static class MapClass extends MapReduceBase
                implements Mapper<Text, Text, Text, Text> {
            @Override
            public void map(Text key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
                    throws IOException {
                // invert key and value
                output.collect(value, key);
            }
        }

        public static class Reduce extends MapReduceBase
                implements Reducer<Text, Text, Text, Text> {
            @Override
            public void reduce(Text key, Iterator<Text> values,
                    OutputCollector<Text, Text> output, Reporter reporter)
                    throws IOException {
                // join all values for this key into a comma-separated string
                String csv = "";
                while (values.hasNext()) {
                    if (csv.length() > 0) csv += ",";
                    csv += values.next().toString();
                }
                output.collect(key, new Text(csv));
            }
        }

        public static void main(String[] args) throws Exception {
            int res = ToolRunner.run(new Configuration(), new MyJob(), args);
            System.exit(res);
        }
    }

Thank you for your kind help!
[Attachment: 2014-04-15_150135.png]
Re: Offline image viewer - account for edits ?
I think you are right: the offline image viewer only takes the fsimage file as input. On Tue, Apr 15, 2014 at 9:29 AM, Manoj Samel manojsamelt...@gmail.com wrote: Hi, is it correct to say that the offline image viewer does not account for any edits that are not yet merged into the fsimage? Thanks, -- Cheers -MJ
Re: Hadoop NoClassDefFoundError
Please use:

    hadoop jar myjob.jar myjob.MyJob input output

On Tue, Apr 15, 2014 at 3:06 PM, laozh...@sina.cn laozh...@sina.cn wrote: Hello everyone: I am new to Hadoop, and I am reading Hadoop in Action. When I tried to run a demo from this book, I got a NoClassDefFoundError (wrong name: myjob/MyJob) and could not find an answer on the net.
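For reference, a minimal sketch of compiling and packaging so the class file ends up at myjob/MyJob.class inside the jar (directory names are illustrative):

    mkdir -p classes
    javac -classpath "$(hadoop classpath)" -d classes MyJob.java
    jar cf myjob.jar -C classes .
    hadoop jar myjob.jar myjob.MyJob input output

Because the source declares "package myjob;", javac -d creates the classes/myjob/ directory automatically, and the class must then be named by its fully qualified name on the command line.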
About Could not find the main class: org.apache.hadoop.hdfs.server.namenode.NameNode
Hi, I'm trying to set up Hadoop (version 2.2.0) on Windows (32-bit) with Cygwin (version 1.7.5). I export JAVA_HOME=/cygdrive/c/Java/jdk1.7.0_51 in hadoop-env.sh, and the classpath is:

    /home/Administrator/hadoop-2.2.0/etc/hadoop:
    /home/Administrator/hadoop-2.2.0/share/hadoop/common/lib/*:
    /home/Administrator/hadoop-2.2.0/share/hadoop/common/*:
    /home/Administrator/hadoop-2.2.0/share/hadoop/hdfs:
    /home/Administrator/hadoop-2.2.0/share/hadoop/hdfs/lib/*:
    /home/Administrator/hadoop-2.2.0/share/hadoop/hdfs/*:
    /home/Administrator/hadoop-2.2.0/share/hadoop/yarn/lib/*:
    /home/Administrator/hadoop-2.2.0/share/hadoop/yarn/*:
    /home/Administrator/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:
    /home/Administrator/hadoop-2.2.0/share/hadoop/mapreduce/*:
    /contrib/capacity-scheduler/*.jar

When I execute bin/hdfs namenode -format, I get "Could not find the main class: org.apache.hadoop.hdfs.server.namenode.NameNode". Anybody know why? Thanks!
Re: About Could not find the main class: org.apache.hadoop.hdfs.server.namenode.NameNode
Try bin/hadoop classpath to check whether the classpath is what you set. On Tue, Apr 15, 2014 at 4:16 PM, Anacristing 99403...@qq.com wrote: Hi, I'm trying to set up Hadoop (version 2.2.0) on Windows (32-bit) with Cygwin (version 1.7.5). When I execute bin/hdfs namenode -format, I get "Could not find the main class: org.apache.hadoop.hdfs.server.namenode.NameNode". -- Regards Shengjun
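One more thing worth checking under Cygwin (an assumption about this setup, not something confirmed in the thread): java.exe is a Windows binary and cannot read Unix-style, colon-separated paths like /home/Administrator/..., so a classpath that looks correct in the shell may still be unreadable to Java. Cygwin ships a converter:

    # convert a Cygwin-style path list to a Windows-style, ';'-separated one
    cygpath -wp "$(bin/hadoop classpath)"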
hadoop eclipse plugin compile path
Trying to use the command below to generate the Hadoop Eclipse plugin, but the directory /usr/local/hadoop-2.2.0 does not seem correct. I installed Hadoop with Ambari.

    $ ant jar -Dversion=2.2.0 -Declipse.home=/usr/local/eclipse -Dhadoop.home=/usr/local/hadoop-2.2.0

Error log:

    BUILD FAILED
    /usr/local/hadoop2x-eclipse-plugin/src/contrib/eclipse-plugin/build.xml:76: /usr/local/hadoop-2.2.0/share/hadoop/mapreduce does not exist.

May I know where the install path for Hadoop is? Any suggestion, thanks.
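As a hedged suggestion: an Ambari-managed, package-based install usually does not live under /usr/local, and its layout differs from the tarball layout the plugin build expects. A few ways to locate it (package names and paths vary by distribution):

    readlink -f "$(which hadoop)"              # follow the launcher script to the real install
    rpm -ql hadoop-client 2>/dev/null | head   # on RPM-based systems
    ls -d /usr/lib/hadoop* 2>/dev/null         # a common location for package installs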
Re: Setting debug log level for individual daemons
Put the following line in the log4j settings file:

    log4j.logger.org.apache.hadoop.yarn.server.resourcemanager=DEBUG,console

On Tue, Apr 15, 2014 at 8:33 AM, Ashwin Shankar ashwinshanka...@gmail.com wrote: Hi, how do we set the log level to debug for, let's say, only the ResourceManager and not the other Hadoop daemons? -- Thanks, Ashwin -- Regards Gordon Wang
Re: Re: Hadoop NoClassDefFoundError
Thank you for your advice. When I use your command, I get the error below:

    $ hadoop jar myjob.jar myjob.MyJob input output
    Exception in thread "main" java.lang.ClassNotFoundException: myjob.MyJob
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

From: Azuryy Yu
Date: 2014-04-15 16:14
To: user@hadoop.apache.org
Subject: Re: Hadoop NoClassDefFoundError
Please use: hadoop jar myjob.jar myjob.MyJob input output
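The ClassNotFoundException suggests the jar does not contain the class at its package path. A hedged check (the expected entries assume the package declaration shown earlier in the thread):

    jar tf myjob.jar | grep MyJob
    # expected output:
    #   myjob/MyJob.class
    #   myjob/MyJob$MapClass.class
    #   myjob/MyJob$Reduce.class
    # if MyJob.class sits at the jar root instead, repackage from the directory
    # that contains the myjob/ folder (see the packaging sketch earlier in the thread)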
java.lang.OutOfMemoryError related to number of reducers?
I can fix this by changing the heap size. But what confuses me is that when I change the reducer number from 24 to 84, this error goes away. Any insight on this? Thanks, Lei

    Failed to merge in memory
    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2786)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
        at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
        at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
        at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
        at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
        at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
        at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
        at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
        at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
        at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
        at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
        at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
        at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)

leiwang...@gmail.com
unsubscribe
Pls unsubscribe me. Thx.

On 2013-03-16, at 3:03 AM, kishore raju hadoop1...@gmail.com wrote: Hi, we are having an issue where multiple TaskTrackers are running out of memory. I have collected heap dumps on those TaskTrackers to analyze further. They are currently running with a 1 GB heap; we are planning to bump it to 2 GB. Is there a way we can find which job is causing this OOM on the TaskTrackers? Any help is appreciated. -Thanks, kishore
Re: java.lang.OutOfMemoryError related to number of reducers?
When you increase the number of reducers, each has less to work with, provided the data is distributed evenly between them - in this case roughly a third of the original work. It is essentially the same thing as increasing the heap size - the load is just distributed between more reducers. /th

On Tue, 2014-04-15 at 20:41 +0800, leiwang...@gmail.com wrote: I can fix this by changing the heap size. But what confuses me is that when I change the reducer number from 24 to 84, this error goes away. Any insight on this? Thanks, Lei
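Beyond heap size and reducer count, the reduce-side in-memory merge has its own knobs. A hedged sketch with the Hadoop 1.x property names, passed here on the pig command line since the trace is from a Pig job (the values are illustrative, not recommendations):

    pig -Dmapred.job.shuffle.input.buffer.percent=0.50 \
        -Dmapred.job.shuffle.merge.percent=0.50 \
        -Dmapred.inmem.merge.threshold=500 \
        myscript.pig
    # shuffle.input.buffer.percent: fraction of reducer heap used to buffer map
    #   outputs during shuffle (default 0.70); lowering it spills to disk earlier
    # shuffle.merge.percent / inmem.merge.threshold: thresholds that trigger the
    #   in-memory merge that is failing in the trace above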
Re: HDFS file system size issue
Hi Rahman, these are a few lines from hadoop fsck / -blocks -files -locations:

    /mnt/hadoop/hive/warehouse/user.db/table1/000255_0 44323326 bytes, 1 block(s): OK
    0. blk_-7919979022650423857_446500 len=44323326 repl=3 [ip1:50010, ip2:50010, ip3:50010]
    /mnt/hadoop/hive/warehouse/user.db/table1/000256_0 44566965 bytes, 1 block(s): OK
    0. blk_-576894812882540_446288 len=44566965 repl=3 [ip1:50010, ip2:50010, ip4:50010]

Biswa may have guessed the replication factor from the fsck summary that I posted earlier. I am posting it again for today's run:

    Status: HEALTHY
    Total size:    58143055251 B
    Total dirs:    307
    Total files:   5093
    Total blocks (validated):      3903 (avg. block size 14897016 B)
    Minimally replicated blocks:   3903 (100.0 %)
    Over-replicated blocks:        0 (0.0 %)
    Under-replicated blocks:       92 (2.357161 %)
    Mis-replicated blocks:         0 (0.0 %)
    Default replication factor:    2
    Average block replication:     3.1401486
    Corrupt blocks:                0
    Missing replicas:              92 (0.75065273 %)
    Number of data-nodes:          9
    Number of racks:               1
    FSCK ended at Tue Apr 15 13:20:25 UTC 2014 in 655 milliseconds
    The filesystem under path '/' is HEALTHY

I have not overridden dfs.datanode.du.reserved. It defaults to 0:

    $ less $HADOOP_HOME/conf/hdfs-site.xml | grep -A3 'dfs.datanode.du.reserved'
    $ less $HADOOP_HOME/src/hdfs/hdfs-default.xml | grep -A3 'dfs.datanode.du.reserved'
    <name>dfs.datanode.du.reserved</name>
    <value>0</value>
    <description>Reserved space in bytes per volume. Always leave this much
    space free for non dfs use.</description>

Below is du -h on every node. FYI, my dfs.data.dir is /mnt/hadoop/dfs/data and all Hadoop/Hive logs are dumped in various directories under /mnt/logs. All machines have 400GB for /mnt.

    $ for i in `echo $dfs_slaves`; do ssh $i 'du -sh /mnt/hadoop; du -sh /mnt/hadoop/dfs/data; du -sh /mnt/logs;'; done
    225G /mnt/hadoop    224G /mnt/hadoop/dfs/data    61M /mnt/logs
    281G /mnt/hadoop    281G /mnt/hadoop/dfs/data    63M /mnt/logs
    139G /mnt/hadoop    139G /mnt/hadoop/dfs/data    68M /mnt/logs
    135G /mnt/hadoop    134G /mnt/hadoop/dfs/data    92M /mnt/logs
    165G /mnt/hadoop    164G /mnt/hadoop/dfs/data    75M /mnt/logs
    137G /mnt/hadoop    137G /mnt/hadoop/dfs/data    95M /mnt/logs
    160G /mnt/hadoop    160G /mnt/hadoop/dfs/data    74M /mnt/logs
    180G /mnt/hadoop    122G /mnt/hadoop/dfs/data    23M /mnt/logs
    139G /mnt/hadoop    138G /mnt/hadoop/dfs/data    76M /mnt/logs

All these numbers are for today and may differ a bit from yesterday's. Today hadoop dfs -dus is 58GB and the namenode is reporting DFS Used as 1.46TB. Pardon me for making the mail dirty with lots of copy-pastes; hope it's still readable. -- Saumitra S. Shahapure

On Tue, Apr 15, 2014 at 2:57 AM, Abdelrahman Shettia ashet...@hortonworks.com wrote: Hi Biswa, are you sure that the replication factor of the files is three? Please run a 'hadoop fsck / -blocks -files -locations' and see the replication factor for each file. Also, post the configuration of dfs.datanode.du.reserved, and please check the real space presented by a DataNode by running 'du -h'. Thanks, Rahman

On Apr 14, 2014, at 2:07 PM, Saumitra saumitra.offic...@gmail.com wrote: Hello, Biswanath, looks like we have confusion in the calculation; 1TB would be equal to 1024GB, not 114GB. Sandeep, I checked the log directory size as well. Log directories are hardly a few GBs; I have configured the log4j properties so that logs won't be too large. On our slave machines we have a 450GB disk partition for Hadoop logs and DFS. There, the logs directory is 10GB and the rest of the space is occupied by DFS. A 10GB partition is for /. Let me quote my confusion point once again: basically I wanted to point out a discrepancy between the name node status page and hadoop dfs -dus. In my case, the former reports DFS usage as 1TB and the latter reports it to be 35GB. What are the factors that can cause this difference? And why is just 35GB of data causing DFS to hit its limits? I am talking about the name node status page on port 50070. Here is the screenshot of my name node status page: Screen Shot 2014-04-15 at 2.07.19 am.png. As I understand, 'DFS used' is the space taken by DFS, and non-DFS used is the space taken by non-DFS data like logs or other local files from users. The namenode shows that DFS used is ~1TB but hadoop dfs -dus shows it to be ~38GB.

On 14-Apr-2014, at 12:33 pm, Sandeep Nemuri nhsande...@gmail.com wrote: Please check your logs directory usage.

On Mon, Apr 14, 2014 at 12:08 PM, Biswajit Nayak biswajit.na...@inmobi.com wrote: What's the replication factor you have? I believe it should be 3. hadoop dus shows the disk usage without replication, while the name node UI page gives it with replication. 38gb * 3 = 114gb ~ 1TB. ~Biswa -oThe important thing is not to stop questioning o-
Re: Offline image viewer - account for edits ?
If you want to parse the edits, please use the Offline Edits Viewer: http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html Thanks, Akira

(2014/04/15 16:41), Mingjiang Shi wrote: I think you are right: the offline image viewer only takes the fsimage file as input. On Tue, Apr 15, 2014 at 9:29 AM, Manoj Samel manojsamelt...@gmail.com wrote: Hi, is it correct to say that the offline image viewer does not account for any edits that are not yet merged into the fsimage? Thanks, -- Cheers -MJ
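A hedged usage sketch of the edits viewer (the input file name is illustrative; actual edits files live under the NameNode's name directory):

    hdfs oev -i /data/dfs/name/current/edits_0000000000000000001-0000000000000000042 \
             -o edits.xml -p XML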
Re: Re: java.lang.OutOfMemoryError related to number of reducers?
Thanks Thomas. Another question: I have no idea what "Failed to merge in memory" means. Is this 'merge' part of the shuffle phase on the reducer side? Why is it in memory? Besides the two methods (increasing the reducer number and increasing the heap size), are there any other alternatives for fixing this issue? Thanks a lot. leiwang...@gmail.com

From: Thomas Bentsen
Date: 2014-04-15 21:53
To: user
Subject: Re: java.lang.OutOfMemoryError related to number of reducers?
When you increase the number of reducers, each has less to work with, provided the data is distributed evenly between them. /th
Re: Update interval of default counters
Moved to user@hadoop.apache.org. You can configure the interval by setting the mapreduce.client.progressmonitor.pollinterval parameter. The default value is 1000 ms. For more details, please see http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml. Regards, Akira

(2014/04/15 15:29), Dharmesh Kakadia wrote: Hi, what is the update interval of the built-in framework counters? Is it configurable? I am trying to collect very fine-grained information about job execution and am using counters for that. It would be great if someone could point me to the documentation/code for it. Thanks in advance. Thanks, Dharmesh
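A hedged example of overriding the interval per job (assumes the job goes through ToolRunner/GenericOptionsParser; the jar and class names are illustrative):

    hadoop jar myjob.jar MyJob \
      -D mapreduce.client.progressmonitor.pollinterval=200 \
      input output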
RE: Re: java.lang.OutOfMemoryError related to number of reducers?
Lei, a good explanation of this can be found in "Hadoop: The Definitive Guide" by Tom White. Here is an excerpt that explains the behavior on the reduce side and some possible tweaks to control it: https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort

From: leiwang...@gmail.com [mailto:leiwang...@gmail.com]
Sent: Tuesday, April 15, 2014 9:29 AM
To: user; th
Subject: Re: Re: java.lang.OutOfMemoryError related to number of reducers?
Thanks Thomas. Another question: I have no idea what "Failed to merge in memory" means. Besides the two methods (increasing the reducer number and increasing the heap size), are there any other alternatives for fixing this issue?
Re: RE: java.lang.OutOfMemoryError related to number of reducers?
Thanks, let me take a careful look at it. leiwang...@gmail.com

From: German Florez-Larrahondo
Date: 2014-04-15 23:27
To: user; 'th'
Subject: RE: Re: java.lang.OutOfMemoryError related to number of reducers?
Lei, a good explanation of this can be found in "Hadoop: The Definitive Guide" by Tom White: https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
Re: Offline image viewer - account for edits ?
So, is it correct to say that if one wants the latest state of the NameNode, the information from the image viewer and the edits viewer has to be combined somehow? Thanks,

On Tue, Apr 15, 2014 at 7:26 AM, Akira AJISAKA ajisa...@oss.nttdata.co.jp wrote: If you want to parse the edits, please use the Offline Edits Viewer. http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html Thanks, Akira
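As a hedged alternative to combining the two outputs, a checkpoint can be forced so that the fsimage itself reflects the latest state before running the image viewer:

    hdfs dfsadmin -safemode enter     # saveNamespace requires safe mode
    hdfs dfsadmin -saveNamespace      # merge outstanding edits into a fresh fsimage
    hdfs dfsadmin -safemode leave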
Re: Setting debug log level for individual daemons
Thanks Gordon and Stanley, but this would require us to bounce the process. Is there a way to change log levels without bouncing the process?

On Tue, Apr 15, 2014 at 3:23 AM, Gordon Wang gw...@gopivotal.com wrote: Put the following line in the log4j settings file: log4j.logger.org.apache.hadoop.yarn.server.resourcemanager=DEBUG,console -- Thanks, Ashwin
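One option worth trying here (a hedged suggestion, not confirmed in this thread): the daemonlog command changes a running daemon's log level over its HTTP port with no restart; the change lasts only until the daemon restarts. The host below is illustrative (8088 is the default ResourceManager web port):

    hadoop daemonlog -setlevel rm-host:8088 org.apache.hadoop.yarn.server.resourcemanager DEBUG
    hadoop daemonlog -getlevel rm-host:8088 org.apache.hadoop.yarn.server.resourcemanager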
Find the task and its datanode which is taking the most time in a cluster
Hi, can somebody please help me with how to find the task, and the datanode it ran on, that has failed or is taking the most time to execute in a large cluster, considering thousands of mappers and reducers are running. Regards, Shashi
Compiling from Source
I’m using the guide at http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SingleCluster.html to try to compile the native Hadoop libraries because I’m running a 64 bit OS and it keeps complaining that the native libraries can’t be found. After running the third command (mvn clean install assembly:assembly -Pnative) I get the output shown in the gist at https://gist.github.com/anonymous/dd8e1833d09b48bdb813 I’m installing Hadoop 2.4.0 on CentOS 6.5 64-bit. The operating system is a clean install and is running nothing but Hadoop. Where should I go from here? There are so many packages used in Hadoop that I have no idea where to begin, and Maven gives no indication of what the actual error is.
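A hedged sketch of getting past this on CentOS 6: the native profile needs a C toolchain, cmake, the zlib/openssl headers, and protobuf 2.5.0 (protoc on the PATH); the mvn invocation below is the native build documented in Hadoop's BUILDING.txt, and -e/-X make Maven print the error it is otherwise hiding:

    sudo yum install -y gcc gcc-c++ make cmake zlib-devel openssl-devel
    # protobuf 2.5.0 must be built/installed separately; verify with: protoc --version
    mvn package -Pdist,native -DskipTests -Dtar
    mvn -e -X package -Pdist,native -DskipTests   # rerun with full error output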
Warning: $HADOOP_HOME is deprecated
Hello All, I have configured Apache Hadoop 1.2.0 and set the $HADOOP_HOME env. variable, and I keep getting: Warning: $HADOOP_HOME is deprecated. Solution (after googling): I replaced HADOOP_HOME with HADOOP_PREFIX and the warning disappeared. Does that mean HADOOP_HOME has been replaced by HADOOP_PREFIX? If yes, in which version did this change? I tried googling but could not find the release number. Is HADOOP_PREFIX the correct env. variable that should be used for all recent Apache Hadoop releases, including Apache Hadoop 2 (YARN)? Thanks, -RR
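For what it's worth, a hedged sketch: HADOOP_PREFIX is the name the 1.x and 2.x launcher scripts prefer, and in 1.x the warning can also be silenced while keeping HADOOP_HOME set (the install path below is illustrative):

    export HADOOP_PREFIX=/usr/local/hadoop-1.2.0   # illustrative path
    export HADOOP_HOME_WARN_SUPPRESS=true          # 1.x only: keep HADOOP_HOME, drop the warning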
Re: HDFS file system size issue
Hi Saumitra, it looks like over-replicated blocks are not the root cause of the issue the cluster is experiencing. I can only think of a misconfigured dfs.data.dir parameter. Can you ensure that each of the data directories is using only one partition (mount) and that no other data directory shares the same partition (mount)? The rule should be one data directory per partition (mount). Also, please check inside dfs.data.dir for third-party files/directories. Hope this helps. Thanks, -Rahman

On Tue, Apr 15, 2014 at 6:54 AM, Saumitra Shahapure saumitra.offic...@gmail.com wrote: Hi Rahman, these are a few lines from hadoop fsck / -blocks -files -locations. Today hadoop dfs -dus is 58GB and the namenode is reporting DFS Used as 1.46TB.
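A few hedged checks that would help narrow this down (commands assume Hadoop 1.x; the paths are the ones given in the thread):

    hadoop dfsadmin -report | head -n 20        # cluster-wide DFS Used vs. what fsck/dus report
    grep -B1 -A3 'dfs.data.dir' $HADOOP_HOME/conf/hdfs-site.xml   # check for multiple dirs on one mount
    du -sh /mnt/hadoop/dfs/data/current         # actual block data, excluding anything stray in the data dir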
Re: Find the task and its datanode which is taking the most time in a cluster
Hi Shashi, I am assuming that you are running Hadoop 1.x. There is an option to see the failed tasks on the JobTracker UI. Please replace the jobtracker host with the actual host, open the following link, and look for the task failures: http://[jobtrackerhost]:50030/machines.jsp?type=active Thanks, -Rahman

On Tue, Apr 15, 2014 at 11:11 AM, Shashidhar Rao raoshashidhar...@gmail.com wrote: Hi, can somebody please help me with how to find the task, and the datanode it ran on, that has failed or is taking the most time to execute in a large cluster, considering thousands of mappers and reducers are running. Regards, Shashi
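The same information is also reachable from the command line; a hedged sketch with Hadoop 1.x commands (the job ID and output path are illustrative):

    hadoop job -list                                                  # running jobs and their IDs
    hadoop job -status job_201404150001_0042                          # per-job progress and counters
    hadoop job -list-attempt-ids job_201404150001_0042 map running    # attempts still running (often the stragglers)
    hadoop job -history all /path/to/job/output                       # per-task timings after the job ends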
Apache Hadoop 2.x installation *environment variables*
Hello All, For an Apache Hadoop 2.x (YARN) installation, which *environment variables* are REALLY needed? By referring to various blogs I am getting a mix:

    HADOOP_COMMON_HOME, HADOOP_CONF_DIR, HADOOP_HDFS_HOME, HADOOP_HOME, HADOOP_MAPRED_HOME, HADOOP_PREFIX, YARN_HOME

    HADOOP_COMMON_HOME, HADOOP_CONF_DIR, HADOOP_HDFS_HOME, HADOOP_PREFIX, HADOOP_MAPRED_HOME, HADOOP_YARN_HOME, YARN_CONF_DIR, MAPRED_CONF_DIR, YARN_CLASSPATH

    HADOOP_COMMON_HOME, HADOOP_CONF_DIR, HADOOP_HDFS_HOME, HADOOP_HOME, HADOOP_MAPRED_HOME, HADOOP_YARN_HOME

    HADOOP_COMMON_HOME, HADOOP_CONF_DIR, HADOOP_HDFS_HOME, HADOOP_HOME, HADOOP_MAPRED_HOME, YARN_HOME

From the Apache Hadoop site: $HADOOP_COMMON_HOME, $HADOOP_CONF_DIR, $HADOOP_HDFS_HOME, $HADOOP_MAPRED_HOME, $HADOOP_YARN_HOME, $YARN_CONF_DIR (the same as $HADOOP_CONF_DIR)

Thanks, -RR
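For a plain 2.x tarball install, a hedged minimal set that usually suffices, since the launcher scripts derive the rest from the prefix (the install path is illustrative):

    export HADOOP_PREFIX=/opt/hadoop-2.2.0
    export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
    export HADOOP_COMMON_HOME=$HADOOP_PREFIX
    export HADOOP_HDFS_HOME=$HADOOP_PREFIX
    export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
    export HADOOP_YARN_HOME=$HADOOP_PREFIX
    export YARN_CONF_DIR=$HADOOP_CONF_DIR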
RE: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop
Thanks John for your comments. I believe MRv2 has support for both the old *mapred* APIs and the new *mapreduce* APIs. I see it this way: [1.] One may have the binaries, i.e. the jar file, of the M\R program that used the old *mapred* APIs. This will work directly on MRv2 (YARN). [2.] One may have the source code, i.e. the Java programs, of the M\R program that used the old *mapred* APIs. For this I need to recompile and generate the binaries, i.e. the jar file. Do I have to change the old *org.apache.hadoop.mapred* APIs to the new *org.apache.hadoop.mapreduce* APIs, or are no code changes needed? -RR

Date: Mon, 14 Apr 2014 10:37:56 -0400 Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop From: john.meag...@gmail.com To: user@hadoop.apache.org

Also, Source Compatibility also means ONLY a recompile is needed. No code changes should be needed.

On Mon, Apr 14, 2014 at 10:37 AM, John Meagher john.meag...@gmail.com wrote: Source Compatibility = you need to recompile and use the new version as part of the compilation. Binary Compatibility = you can take something compiled against the old version and run it on the new version.

On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe radhe.krishna.ra...@live.com wrote: Hello People, As per the Apache site http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html

Binary Compatibility: First, we ensure binary compatibility to the applications that use old mapred APIs. This means that applications which were built against MRv1 mapred APIs can run directly on YARN without recompilation, merely by pointing them to an Apache Hadoop 2.x cluster via configuration.

Source Compatibility: We cannot ensure complete binary compatibility with the applications that use mapreduce APIs, as these APIs have evolved a lot since MRv1. However, we ensure source compatibility for mapreduce APIs that break binary compatibility. In other words, users should recompile their applications that use mapreduce APIs against MRv2 jars. One notable binary incompatibility break is Counter and CounterGroup.

For Binary Compatibility I understand that if I had built an MR job with the old *mapred* APIs then it can be run directly on YARN without any changes. Can anybody explain what we mean by Source Compatibility here, and also a use case where one will need it? Does that mean that if I already have MR job source code written with the old *mapred* APIs and I need to make some changes to it, then I need to use the new *mapreduce* APIs and generate new binaries? Thanks, -RR
Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop
1. If you have the binaries that were compiled against the MRv1 *mapred* libs, they should just work with MRv2. 2. If you have the source code that refers to the MRv1 *mapred* libs, it should be compilable without code changes. Of course, you're free to change your code. 3. If you have the binaries that were compiled against the MRv1 *mapreduce* libs, they may not be executable directly with MRv2, but you should be able to compile the source against the MRv2 *mapreduce* libs without code changes, and execute it. - Zhijie

-- Zhijie Shen Hortonworks Inc. http://hortonworks.com/
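For case 2 above (source compatibility), the recompile itself is the whole story. A minimal sketch, where MyMRv1Job.java is a hypothetical job written against the old *mapred* API and `hadoop` is a 2.x client on the build box:

$ mkdir -p classes
$ javac -classpath "$(hadoop classpath)" -d classes MyMRv1Job.java
$ jar cf myjob-mrv2.jar -C classes .

No source changes are involved; the only difference from the MRv1 build is that `hadoop classpath` now resolves to the MRv2 jars.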
RE: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop
Thanks Zhijie for the explanation. Regarding #3, if I have ONLY the binaries, i.e. the jar file (compiled/built against the old MRv1 *mapred* APIs), then how can I compile it, since I don't have the source code, i.e. the Java files? All I can do with the binaries, i.e. the jar file, is execute them. -RR
Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop
bq. Regarding #3, if I have ONLY the binaries, i.e. the jar file (compiled/built against the old MRv1 *mapred* APIs)

Which APIs are you talking about, *mapred* or *mapreduce*? In #3, I was talking about *mapreduce*. If that is the case, you may unfortunately be in trouble, because MRv2 has evolved so much in the *mapreduce* APIs that it's difficult to ensure binary compatibility. Anyway, you should still try your luck, as your binaries may not use the incompatible APIs. On the other hand, if you meant the *mapred* APIs instead, your binaries should just work. - Zhijie
Re: About Could not find the main class: org.apache.hadoop.hdfs.server.namenode.NameNode
It's the same.

------ Original ------ From: Shengjun Xin s...@gopivotal.com Date: Tue, Apr 15, 2014 04:43 PM To: user user@hadoop.apache.org Subject: Re: About Could not find the main class: org.apache.hadoop.hdfs.server.namenode.NameNode

Try 'bin/hadoop classpath' to check whether the classpath is what you set.

On Tue, Apr 15, 2014 at 4:16 PM, Anacristing 99403...@qq.com wrote: Hi, I'm trying to set up Hadoop (version 2.2.0) on Windows (32-bit) with Cygwin (version 1.7.5). I export JAVA_HOME=/cygdrive/c/Java/jdk1.7.0_51 in hadoop-env.sh and the classpath is

/home/Administrator/hadoop-2.2.0/etc/hadoop:
/home/Administrator/hadoop-2.2.0/share/hadoop/common/lib/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/common/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/hdfs:
/home/Administrator/hadoop-2.2.0/share/hadoop/hdfs/lib/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/hdfs/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/yarn/lib/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/yarn/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/mapreduce/*:
/contrib/capacity-scheduler/*.jar

When I execute bin/hdfs namenode -format, I get: Could not find the main class: org.apache.hadoop.hdfs.server.namenode.NameNode Anybody know why? Thanks! -- Regards Shengjun
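One thing worth checking on Cygwin specifically: the JVM is a native Windows process, so it expects a Windows-style, semicolon-separated classpath, while the entries above are Unix-style and colon-separated. That mismatch alone can make the NameNode class unresolvable. A hedged diagnostic sketch (cygpath is Cygwin's path converter; this is an aid for narrowing down the problem, not a confirmed fix for this report):

$ bin/hadoop classpath
$ cygpath -wp "$(bin/hadoop classpath)"

If the scripts never convert the classpath to the second, Windows-style form before launching java, the classpath the JVM sees will not match what was set.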
Re: Compiling from Source
I think you can use the command 'mvn package -Pnative,dist -DskipTests' in the source code root directory to build the binaries.

On Wed, Apr 16, 2014 at 2:31 AM, Justin Mrkva m...@justinmrkva.com wrote: I'm using the guide at http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SingleCluster.html to try to compile the native Hadoop libraries, because I'm running a 64-bit OS and it keeps complaining that the native libraries can't be found. After running the third command (mvn clean install assembly:assembly -Pnative) I get the output shown in the gist at https://gist.github.com/anonymous/dd8e1833d09b48bdb813 I'm installing Hadoop 2.4.0 on CentOS 6.5 64-bit. The operating system is a clean install and is running nothing but Hadoop. Where should I go from here? There are so many packages used in Hadoop that I have no idea where to begin, and Maven gives no indication of what the actual error is. -- Regards Shengjun
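On a clean CentOS box the native profile needs a toolchain and protobuf before Maven can succeed; BUILDING.txt in the source tree is the authoritative list, but a sketch of the usual prerequisites for a Hadoop 2.4 native build:

$ sudo yum install -y gcc gcc-c++ make cmake zlib-devel openssl-devel
$ protoc --version    # Hadoop 2.4 expects protoc 2.5.0; build protobuf 2.5.0 from source if your distro does not package it
$ mvn package -Pdist,native -DskipTests -Dtar

If the build succeeds, the distribution tarball with the native libraries lands under hadoop-dist/target/.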
Re: Offline image viewer - account for edits?
Yes, I think you are right. (2014/04/16 1:20), Manoj Samel wrote: So, is it correct to say that if one wants to get the latest state of the name node, the information from the image viewer and from the edits viewer has to be combined somehow? Thanks,

On Tue, Apr 15, 2014 at 7:26 AM, Akira AJISAKA ajisa...@oss.nttdata.co.jp wrote: If you want to parse the edits, please use the Offline Edits Viewer. http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html Thanks, Akira

(2014/04/15 16:41), Mingjiang Shi wrote: I think you are right, because the offline image viewer only takes the fsimage file as input.

On Tue, Apr 15, 2014 at 9:29 AM, Manoj Samel manojsamelt...@gmail.com wrote: Hi, Is it correct to say that the offline image viewer does not account for any edits that are not yet merged into the fsimage? Thanks, -- Cheers -MJ
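Putting the two viewers together, a sketch of the usual flow; the file names under the name directory are examples, and yours will carry different transaction IDs:

$ hdfs oiv -p XML -i fsimage_0000000000000000042 -o fsimage.xml
$ hdfs oev -i edits_0000000000000000043-0000000000000000099 -o edits.xml

fsimage.xml gives the namespace as of the last checkpoint and edits.xml the operations since, so the latest state is the former with the latter replayed on top. A simpler alternative, if you can afford it, is to force a fresh checkpoint first (enter safemode, then hdfs dfsadmin -saveNamespace) so the fsimage alone is current.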
Re: Re: Hadoop NoClassDefFoundError
Can you do an 'unzip -l myjob.jar' to see if your jar file has the correct hierarchy? Regards, *Stanley Shi,*

On Tue, Apr 15, 2014 at 6:53 PM, laozh...@sina.cn laozh...@sina.cn wrote: Thank you for your advice. When I use your command, I get the error below:

$ hadoop jar myjob.jar myjob.MyJob input output
Exception in thread main java.lang.ClassNotFoundException: myjob.MyJob
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

From: Azuryy Yu azury...@gmail.com Date: 2014-04-15 16:14 To: user@hadoop.apache.org Subject: Re: Hadoop NoClassDefFoundError

Please use: hadoop jar myjob.jar myjob.MyJob input output
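The ClassNotFoundException above usually means the jar was built without the package directory structure, e.g. from inside the myjob directory, so MyJob.class sits at the jar root instead of under myjob/. A sketch of a compile-and-package sequence that keeps the hierarchy intact, assuming the Hadoop 1.x client is on the PATH:

$ mkdir -p classes
$ javac -classpath "$(hadoop classpath)" -d classes MyJob.java
$ jar cf myjob.jar -C classes .
$ unzip -l myjob.jar    # should list myjob/MyJob.class, not MyJob.class at the root
$ hadoop jar myjob.jar myjob.MyJob input output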
Re: Setting debug log level for individual daemons
Is this what you are looking for? http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/CommandsManual.html#daemonlog Regards, *Stanley Shi,*

On Wed, Apr 16, 2014 at 2:06 AM, Ashwin Shankar ashwinshanka...@gmail.com wrote: Thanks Gordon and Stanley, but this would require us to bounce the process. Is there a way to change log levels without bouncing the process?

On Tue, Apr 15, 2014 at 3:23 AM, Gordon Wang gw...@gopivotal.com wrote: Put the following line in the log4j settings file: log4j.logger.org.apache.hadoop.yarn.server.resourcemanager=DEBUG,console

On Tue, Apr 15, 2014 at 8:33 AM, Ashwin Shankar ashwinshanka...@gmail.com wrote: Hi, How do we set the log level to debug for, let's say, only the ResourceManager and not the other Hadoop daemons? -- Thanks, Ashwin -- Regards Gordon Wang -- Thanks, Ashwin
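Since daemonlog talks to the daemon's HTTP servlet, it changes the level in the running process without a restart; the change reverts on restart because log4j.properties is untouched. A sketch against a ResourceManager, assuming the default web port 8088 and a hypothetical host rm-host:

$ hadoop daemonlog -getlevel rm-host:8088 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
$ hadoop daemonlog -setlevel rm-host:8088 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager DEBUG

The same two commands work against any Hadoop daemon if you swap in its web port and the logger name you want to adjust.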
Re: Re: java.lang.OutOfMemoryError related with number of reducer?
Hi German, Thomas, It seems I found the data that causes the error, but I still don't know the exact reason. I just do a GROUP in Pig Latin:

domain_device_group = GROUP data_filter BY (custid, domain, level, device);
domain_device = FOREACH domain_device_group {
    distinct_ip = DISTINCT data_filter.ip;
    distinct_userid = DISTINCT data_filter.userid;
    GENERATE group.custid, group.domain, group.level, group.device, COUNT_STAR(data_filter), COUNT_STAR(distinct_ip), COUNT_STAR(distinct_userid);
}
STORE domain_device INTO '$outputdir/$batchdate/data/domain_device' USING PigStorage('\t');

The group key (custid, domain, level, device) is significantly skewed; about 42% (58,621,533 / 138,455,355) of the records have the same key, and only the reducer which handles this key failed. But from https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort , I still have no idea why it causes an OOM. It doesn't tell how a skewed key will be handled, nor how different keys in the same reducer will be merged. leiwang...@gmail.com

From: leiwang...@gmail.com Date: 2014-04-15 23:35 To: user; th; german.fl Subject: Re: RE: java.lang.OutOfMemoryError related with number of reducer? Thanks, let me take a careful look at it. leiwang...@gmail.com

From: German Florez-Larrahondo Date: 2014-04-15 23:27 To: user; 'th' Subject: RE: Re: java.lang.OutOfMemoryError related with number of reducer? Lei, A good explanation of this can be found in Hadoop: The Definitive Guide by Tom White. Here is an excerpt that explains a bit of the behavior at the reduce side and some possible tweaks to control it. https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort

From: leiwang...@gmail.com [mailto:leiwang...@gmail.com] Sent: Tuesday, April 15, 2014 9:29 AM To: user; th Subject: Re: Re: java.lang.OutOfMemoryError related with number of reducer? Thanks Thomas. Another question: I have no idea what 'Failed to merge in memory' means. Is the 'merge' the shuffle phase on the reducer side? Why is it in memory? Apart from the two methods (increasing the reducer number and increasing the heap size), are there any other alternatives to fix this issue? Thanks a lot. leiwang...@gmail.com

From: Thomas Bentsen Date: 2014-04-15 21:53 To: user Subject: Re: java.lang.OutOfMemoryError related with number of reducer? When you increase the number of reducers they each have less to work with, provided the data is distributed evenly between them - in this case about one third of the original work. It is essentially the same thing as increasing the heap size - it's just distributed between more reducers. /th

On Tue, 2014-04-15 at 20:41 +0800, leiwang...@gmail.com wrote: I can fix this by changing the heap size. But what confuses me is that when I change the reducer number from 24 to 84, this error does not occur. Any insight on this?
Thanks, Lei

Failed to merge in memory
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
at
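For the record, the reduce-side knobs discussed in this thread can be passed straight to Pig without editing the cluster config. A hedged sketch with Hadoop-1-era property names; skewed_group.pig is a stand-in for the script above, and 0.50 halves the default 0.70 share of reducer heap used for in-memory shuffle segments, trading some speed for fewer in-memory merges:

$ pig -Dmapred.child.java.opts=-Xmx2048m \
      -Dmapred.job.shuffle.input.buffer.percent=0.50 \
      skewed_group.pig

Depending on your Pig version, the same properties can also be set inside the script with SET statements instead of on the command line.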