How will Hadoop handle a datanode server with a total hardware failure?
Hi,

If each of my datanode servers has 8 hard disks (a 10-node cluster) and I use the default replication factor of 3, how will Hadoop handle it when a datanode suddenly suffers a total hardware failure?

Regards
Arthur
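For context, a rough sketch of what to expect and how to watch it: with default settings the namenode marks a datanode dead after roughly 10 minutes without heartbeats, then re-replicates every block that node held from the surviving replicas, so with 3x replication the data stays readable throughout. Both commands below are standard in Hadoop 2.x; the path / is just an example.

  hdfs dfsadmin -report    # live/dead datanode list and per-node capacity
  hdfs fsck / -blocks      # reports under-replicated and missing blocks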
Generating Test Data in HDFS (PDGF)
Hi,

I need to generate a large amount of test data (4 TB) in Hadoop. Has anyone used PDGF to do so? Could you share your cookbook for PDGF on Hadoop (or HBase)?

Many Thanks
Arthur
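If no PDGF recipe turns up, one fallback (an assumption on my part, not a PDGF answer) is the bundled teragen example, which writes synthetic 100-byte rows, so 4 TB is 40,000,000,000 rows. A sketch, using the examples jar path that appears elsewhere in this archive; the map count is optional tuning:

  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    teragen -Dmapreduce.job.maps=400 40000000000 /tmp/teragen4tb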
Re: Hadoop 2.4.1 Compilation, How to specify HadoopBuildVersion and RMBuildVersion
Hi,

Thank you very much for your reply! I tried versions:set -DnewVersion=NEWVERSION; it changed the strings before “from”, but the strings after “from” are still set as “Unknown”. In the source, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-version-info.properties, I found the following. Any idea how or where to set version-info.scm.commit?

version=${pom.version}
revision=${version-info.scm.commit}
branch=${version-info.scm.branch}
user=${user.name}
date=${version-info.build.time}
url=${version-info.scm.uri}
srcChecksum=${version-info.source.md5}

Regards
Arthur

On 14 Sep, 2014, at 4:17 pm, Liu, Yi A yi.a@intel.com wrote:

Change Hadoop version: mvn versions:set -DnewVersion=NEWVERSION

Regards,
Yi Liu

From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com]
Sent: Sunday, September 14, 2014 1:51 PM
To: user@hadoop.apache.org
Cc: arthur.hk.c...@gmail.com
Subject: Re: Hadoop 2.4.1 Compilation, How to specify HadoopBuildVersion and RMBuildVersion

(attached print screen) image001.png

On 14 Sep, 2014, at 1:25 pm, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi,

To compile Hadoop 2.4.1, any idea how to specify “hadoop.build.version”? By modifying pom.xml, by adding -Dhadoop.build.version=mybuild, or by specifying it on the compile command line?

Regards
Arthur
Re: Hadoop 2.4.1 Compilation, How to specify HadoopBuildVersion and RMBuildVersion
Hi,

Is there any document that lists all possible -D parameters used in the Hadoop compilation? Or any ideas about version-info.scm.commit?

Regards
Arthur

On 14 Sep, 2014, at 7:07 pm, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi,

Thank you very much for your reply! I tried versions:set -DnewVersion=NEWVERSION; it changed the strings before “from”, but the strings after “from” are still set as “Unknown”. (Screen Shot 2014-09-14 at 1.48.51 pm.png) In the source, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-version-info.properties, I found the following. Any idea how or where to set version-info.scm.commit?

version=${pom.version}
revision=${version-info.scm.commit}
branch=${version-info.scm.branch}
user=${user.name}
date=${version-info.build.time}
url=${version-info.scm.uri}
srcChecksum=${version-info.source.md5}

Regards
Arthur

On 14 Sep, 2014, at 4:17 pm, Liu, Yi A yi.a@intel.com wrote:

Change Hadoop version: mvn versions:set -DnewVersion=NEWVERSION

Regards,
Yi Liu

From: arthur.hk.c...@gmail.com [mailto:arthur.hk.c...@gmail.com]
Sent: Sunday, September 14, 2014 1:51 PM
To: user@hadoop.apache.org
Cc: arthur.hk.c...@gmail.com
Subject: Re: Hadoop 2.4.1 Compilation, How to specify HadoopBuildVersion and RMBuildVersion

(attached print screen) image001.png

On 14 Sep, 2014, at 1:25 pm, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi,

To compile Hadoop 2.4.1, any idea how to specify “hadoop.build.version”? By modifying pom.xml, by adding -Dhadoop.build.version=mybuild, or by specifying it on the compile command line?

Regards
Arthur
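One likely lead on version-info.scm.commit: these placeholders are filled at build time by the version-info goal of hadoop-maven-plugins, which reads the commit from the SCM metadata of the source tree; a plain source tarball has no .git/.svn directory, which is the usual reason the fields come out as “Unknown”. Under that assumption, a sketch of building 2.4.1 from a git checkout so the commit is picked up automatically:

  git clone https://github.com/apache/hadoop.git
  cd hadoop
  git checkout release-2.4.1     # release tag, assuming the standard tag name
  mvn package -Pdist,native -DskipTests -Dtar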
Hadoop 2.4.1 Compilation, How to specify HadoopBuildVersion and RMBuildVersion
Hi,

To compile Hadoop 2.4.1, any idea how to specify “hadoop.build.version”? By modifying pom.xml, by adding -Dhadoop.build.version=mybuild, or by specifying it on the compile command line?

Regards
Arthur
Re: Hadoop 2.4.1 Compilation, How to specify HadoopBuildVersion and RMBuildVersion
(attached print screen)

On 14 Sep, 2014, at 1:25 pm, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi,

To compile Hadoop 2.4.1, any idea how to specify “hadoop.build.version”? By modifying pom.xml, by adding -Dhadoop.build.version=mybuild, or by specifying it on the compile command line?

Regards
Arthur
Hadoop Smoke Test
Hi,

I am trying the smoke test for Hadoop, “terasort”. During the Map phase I found “Container killed by the ApplicationMaster”. Should I stop this job and try to run it again, or just let it continue?

14/09/11 21:27:53 INFO mapreduce.Job: map 22% reduce 0%
14/09/11 21:31:33 INFO mapreduce.Job: map 23% reduce 0%
14/09/11 21:33:45 INFO mapreduce.Job: Task Id : attempt_1409876705457_0003_m_005139_0, Status : FAILED
Error: unable to create new native thread
Container killed by the ApplicationMaster.
Exception when trying to cleanup container container_1409876705457_0003_01_005146:
java.io.IOException: Cannot run program "kill": error=11, Resource temporarily unavailable
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:448)
	at org.apache.hadoop.util.Shell.run(Shell.java:418)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.containerIsAlive(DefaultContainerExecutor.java:342)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.signalContainer(DefaultContainerExecutor.java:319)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:400)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:138)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:61)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error=11, Resource temporarily unavailable
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.init(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
	... 11 more
14/09/11 21:33:47 INFO mapreduce.Job: Task Id : attempt_1409876705457_0003_m_005146_0, Status : FAILED
Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
	at org.apache.hadoop.util.Shell.run(Shell.java:418)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
14/09/11 21:33:50 INFO mapreduce.Job: Task Id : attempt_1409876705457_0003_m_005147_0, Status : FAILED
14/09/11 21:37:17 INFO mapreduce.Job: map 24% reduce 0%
14/09/11 21:43:45 INFO mapreduce.Job: map 25% reduce 0%

Regards
Arthur
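For what it's worth, “unable to create new native thread” together with “error=11, Resource temporarily unavailable” usually points at the per-user process/thread limit on the node rather than at Hadoop itself, and failed map attempts are retried automatically, so the job can normally be left running. A hedged sketch of checking and raising the limit (the user name and the 65536 values are examples):

  ulimit -u          # current max user processes for this shell's user
  # raise it persistently in /etc/security/limits.conf on each worker, e.g.:
  #   hadoop  soft  nproc  65536
  #   hadoop  hard  nproc  65536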
Hadoop Smoke Test: TERASORT
Hi,

I am trying the smoke test for Hadoop (2.4.1). About “terasort”: below is my test command. The Map part completed very fast because it was split into many subtasks, but the Reduce part takes a very long time and there is only 1 running Reduce task. Is there a way to speed up the reduce phase by splitting the large reduce job into many smaller ones and running them across the cluster like the Map part?

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort /tmp/teragenout /tmp/terasortout

Job ID                  Name      State    Maps Total  Maps Completed  Reduce Total  Reduce Completed
job_1409876705457_0002  TeraSort  RUNNING  22352       22352           1             0

Regards
Arthur
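TeraSort is launched through ToolRunner, so the number of reducers can be raised on the command line with mapreduce.job.reduces; TeraSort's TotalOrderPartitioner then range-partitions the keys across the reducers, keeping the output globally sorted. A sketch (the count 64 is just an example to size against your cluster):

  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    terasort -Dmapreduce.job.reduces=64 /tmp/teragenout /tmp/terasortout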
org.apache.hadoop.io.compress.SnappyCodec not found
Hi,

I use Hadoop 2.4.1 and I got an “org.apache.hadoop.io.compress.SnappyCodec not found” error.

hadoop checknative
14/08/29 02:54:51 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
14/08/29 02:54:51 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64/libhadoop.so
zlib:   true /lib64/libz.so.1
snappy: true /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64/libsnappy.so.1
lz4:    true revision:99
bzip2:  false

(The smoke test is OK:)

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 30 /tmp/teragenout
14/08/29 07:40:41 INFO mapreduce.Job: Running job: job_1409253811850_0002
14/08/29 07:40:53 INFO mapreduce.Job: Job job_1409253811850_0002 running in uber mode : false
14/08/29 07:40:53 INFO mapreduce.Job: map 0% reduce 0%
14/08/29 07:41:00 INFO mapreduce.Job: map 50% reduce 0%
14/08/29 07:41:01 INFO mapreduce.Job: map 100% reduce 0%
14/08/29 07:41:02 INFO mapreduce.Job: Job job_1409253811850_0002 completed successfully
14/08/29 07:41:02 INFO mapreduce.Job: Counters: 31
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=197312
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=167
		HDFS: Number of bytes written=3000
		HDFS: Number of read operations=8
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Job Counters
		Launched map tasks=2
		Other local map tasks=2
		Total time spent by all maps in occupied slots (ms)=11925
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=11925
		Total vcore-seconds taken by all map tasks=11925
		Total megabyte-seconds taken by all map tasks=109900800
	Map-Reduce Framework
		Map input records=30
		Map output records=30
		Input split bytes=167
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=22
		CPU time spent (ms)=1910
		Physical memory (bytes) snapshot=357318656
		Virtual memory (bytes) snapshot=1691631616
		Total committed heap usage (bytes)=401997824
	org.apache.hadoop.examples.terasort.TeraGen$Counters
		CHECKSUM=644086318705578
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=3000
14/08/29 07:41:03 INFO terasort.TeraSort: starting
14/08/29 07:41:03 INFO input.FileInputFormat: Total input paths to process : 2

However, I got “org.apache.hadoop.io.compress.SnappyCodec not found” when running the Spark smoke test program:

scala> inFILE.first()
java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:158)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:171)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
	at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
	at org.apache.spark.rdd.RDD.take(RDD.scala:983)
	at org.apache.spark.rdd.RDD.first(RDD.scala:1015)
	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
	at $iwC$$iwC$$iwC.<init>(<console>:20)
	at $iwC$$iwC.<init>(<console>:22)
	at $iwC.<init>(<console>:24)
	at <init>(<console>:26)
	at .<init>(<console>:30)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at
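Since checknative shows the native Snappy library loading fine on the Hadoop side, the Spark failure is most likely that the Spark driver/executors cannot see Hadoop's native library directory. A sketch, assuming Spark 1.x and the install path shown above:

  # spark-env.sh
  export LD_LIBRARY_PATH=/mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH

  # or spark-defaults.conf
  spark.executor.extraLibraryPath /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64
  spark.driver.extraLibraryPath   /mnt/hadoop/hadoop-2.4.1_snappy/lib/native/Linux-amd64-64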
Hadoop 2.4.1 How to clear usercache
Hi,

I use Hadoop 2.4.1. In my cluster, Non DFS Used is 2.09 TB, and I found that these files are all under tmp/nm-local-dir/usercache. Is there any Hadoop command to remove these unused user cache files under tmp/nm-local-dir/usercache?

Regards
Arthur
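As far as I know there is no dedicated “clear usercache” command, but the NodeManager prunes its local resource cache automatically once it grows past a target size. A yarn-site.xml sketch (the values shown are the 2.x defaults; note this covers localized resources, not files still in use by running applications):

  <property>
    <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
    <value>10240</value>
  </property>
  <property>
    <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
    <value>600000</value>
  </property>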
Re: Hadoop 2.4.1 Snappy Smoke Test failed
Thanks for your reply. However I think it is not a 32-bit version issue, because my Hadoop is 64-bit; I compiled it from source. I think my way of installing Snappy must be wrong.

Arthur

On 19 Aug, 2014, at 11:53 pm, Andre Kelpe ake...@concurrentinc.com wrote:

Could this be caused by the fact that hadoop no longer ships with 64bit libs? https://issues.apache.org/jira/browse/HADOOP-9911

- André

On Tue, Aug 19, 2014 at 5:40 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi,

I am trying Snappy in Hadoop 2.4.1. Here are my steps (CentOS 64-bit):

1) yum install snappy snappy-devel

2) Added the following to core-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

3) mapred-site.xml:

<property>
  <name>mapreduce.admin.map.child.java.opts</name>
  <value>-server -XX:NewRatio=8 -Djava.library.path=/usr/lib/hadoop/lib/native/ -Djava.net.preferIPv4Stack=true</value>
  <final>true</final>
</property>
<property>
  <name>mapreduce.admin.reduce.child.java.opts</name>
  <value>-server -XX:NewRatio=8 -Djava.library.path=/usr/lib/hadoop/lib/native/ -Djava.net.preferIPv4Stack=true</value>
  <final>true</final>
</property>

4) Smoke test:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 10 /tmp/teragenout

I got the following warnings, and actually no test file is created in HDFS:

14/08/19 22:50:10 WARN mapred.YARNRunner: Usage of -Djava.library.path in mapreduce.admin.map.child.java.opts can cause programs to no longer function if hadoop native libraries are used. These values should be set as part of the LD_LIBRARY_PATH in the map JVM env using mapreduce.admin.user.env config settings.
14/08/19 22:50:10 WARN mapred.YARNRunner: Usage of -Djava.library.path in mapreduce.admin.reduce.child.java.opts can cause programs to no longer function if hadoop native libraries are used. These values should be set as part of the LD_LIBRARY_PATH in the reduce JVM env using mapreduce.admin.user.env config settings.

Can anyone please advise how to install and enable Snappy in Hadoop 2.4.1? What would be wrong? Is my new change in mapred-site.xml incorrect?

Regards
Arthur

--
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com
Hadoop 2.4.1 Snappy Smoke Test failed
Hi,

I am trying Snappy in Hadoop 2.4.1. Here are my steps (CentOS 64-bit):

1) yum install snappy snappy-devel

2) Added the following to core-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

3) mapred-site.xml:

<property>
  <name>mapreduce.admin.map.child.java.opts</name>
  <value>-server -XX:NewRatio=8 -Djava.library.path=/usr/lib/hadoop/lib/native/ -Djava.net.preferIPv4Stack=true</value>
  <final>true</final>
</property>
<property>
  <name>mapreduce.admin.reduce.child.java.opts</name>
  <value>-server -XX:NewRatio=8 -Djava.library.path=/usr/lib/hadoop/lib/native/ -Djava.net.preferIPv4Stack=true</value>
  <final>true</final>
</property>

4) Smoke test:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 10 /tmp/teragenout

I got the following warnings, and actually no test file is created in HDFS:

14/08/19 22:50:10 WARN mapred.YARNRunner: Usage of -Djava.library.path in mapreduce.admin.map.child.java.opts can cause programs to no longer function if hadoop native libraries are used. These values should be set as part of the LD_LIBRARY_PATH in the map JVM env using mapreduce.admin.user.env config settings.
14/08/19 22:50:10 WARN mapred.YARNRunner: Usage of -Djava.library.path in mapreduce.admin.reduce.child.java.opts can cause programs to no longer function if hadoop native libraries are used. These values should be set as part of the LD_LIBRARY_PATH in the reduce JVM env using mapreduce.admin.user.env config settings.

Can anyone please advise how to install and enable Snappy in Hadoop 2.4.1? What would be wrong? Is my new change in mapred-site.xml incorrect?

Regards
Arthur
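The warning itself points at the fix: instead of -Djava.library.path in the admin JVM opts, pass the native directory through the task environment with mapreduce.admin.user.env. A mapred-site.xml sketch (the path is the one used in step 3 above):

  <property>
    <name>mapreduce.admin.user.env</name>
    <value>LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native</value>
  </property>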
Hadoop 2.4.1 Verifying Automatic Failover Failed: ResourceManager
Hi,

I am running Hadoop 2.4.1 with YARN HA enabled (two name nodes, NM1 and NM2). When verifying ResourceManager failover, I used “kill -9” to terminate the ResourceManager on name node 1 (NM1). When I run the test job, it seems that the failover of the ResourceManager keeps trying NM1 and NM2 non-stop. Does anyone have an idea what could be wrong here? Thanks.

Regards
Arthur

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 5 101000
Number of Maps  = 5
Samples per Map = 101000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Starting Job
14/08/11 22:35:23 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm2
14/08/11 22:35:24 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm1
14/08/11 22:35:25 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm2
14/08/11 22:35:28 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm1
14/08/11 22:35:30 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm2
14/08/11 22:35:32 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm1
14/08/11 22:35:34 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm2
14/08/11 22:35:37 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm1
14/08/11 22:35:39 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm2
14/08/11 22:35:40 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm1
….
Re: Hadoop 2.4.1 Verifying Automatic Failover Failed: ResourceManager
Hi,

If I have TWO nodes for ResourceManager HA, what should be the correct steps and commands to start and stop the ResourceManager in a ResourceManager HA cluster? Unlike ./sbin/start-dfs.sh (which can start all NNs from one NN), it seems that ./sbin/start-yarn.sh can only start YARN on one node at a time.

Regards
Arthur

On 11 Aug, 2014, at 11:04 pm, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi,

I am running Hadoop 2.4.1 with YARN HA enabled (two name nodes, NM1 and NM2). When verifying ResourceManager failover, I used “kill -9” to terminate the ResourceManager on name node 1 (NM1). When I run the test job, it seems that the failover of the ResourceManager keeps trying NM1 and NM2 non-stop. Does anyone have an idea what could be wrong here? Thanks.

Regards
Arthur

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 5 101000
Number of Maps  = 5
Samples per Map = 101000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Starting Job
14/08/11 22:35:23 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm2
14/08/11 22:35:24 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm1
14/08/11 22:35:25 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm2
14/08/11 22:35:28 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm1
14/08/11 22:35:30 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm2
14/08/11 22:35:32 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm1
14/08/11 22:35:34 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm2
14/08/11 22:35:37 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm1
14/08/11 22:35:39 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm2
14/08/11 22:35:40 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to nm1
….
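For the start/stop question: in 2.4.x, start-yarn.sh only launches a ResourceManager on the machine it runs on, so in an RM HA pair each RM is managed with the daemon script on its own host. A minimal sketch:

  # run on rm1, and again on rm2
  ./sbin/yarn-daemon.sh start resourcemanager
  ./sbin/yarn-daemon.sh stop resourcemanager

  # check which one is active
  yarn rmadmin -getServiceState rm1
  yarn rmadmin -getServiceState rm2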
Re: Hadoop 2.4.1 Verifying Automatic Failover Failed: ResourceManager
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>192.168.1.2:23188</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
  <value>192.168.1.2:23125</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address.rm2</name>
  <value>192.168.1.2:23142</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/edh/hadoop_logs/hadoop/</value>
</property>
</configuration>

On 12 Aug, 2014, at 1:49 am, Xuan Gong xg...@hortonworks.com wrote:

Hey, Arthur:

Did you use a single-node cluster or a multiple-node cluster? Could you share your configuration file (yarn-site.xml)? This looks like a configuration issue.

Thanks
Xuan Gong

On Mon, Aug 11, 2014 at 9:45 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi,

If I have TWO nodes for ResourceManager HA, what should be the correct steps and commands to start and stop the ResourceManager in a ResourceManager HA cluster? Unlike ./sbin/start-dfs.sh (which can start all NNs from one NN), it seems that ./sbin/start-yarn.sh can only start YARN on one node at a time.

Regards
Arthur
Re: Hadoop 2.4.1 Verifying Automatic Failover Failed: ResourceManager
Hi,

Thank you very much! At the moment, if I run ./sbin/start-yarn.sh on rm1, the STANDBY ResourceManager on rm2 is not started accordingly. Please advise what could be wrong. Thanks.

Regards
Arthur

On 12 Aug, 2014, at 1:13 pm, Xuan Gong xg...@hortonworks.com wrote:

Some questions:

Q1) I need to start yarn on EACH master separately, is this normal? Is there a way that I can just run ./sbin/start-yarn.sh on rm1 and get the STANDBY ResourceManager on rm2 started as well?

No, you need to start multiple RMs separately.

Q2) How to get alerts (e.g. by email) if the ACTIVE ResourceManager is down in an auto-failover env? Or how do you monitor the status of the ACTIVE/STANDBY ResourceManager?

Interesting question. But one of the design goals for auto-failover is that the downtime of the RM is invisible to end users. End users can submit applications normally even if a failover happens. We can monitor the status of the RMs by using the command line (as you did previously) or from the webUI/webService (rm_address:portnumber/cluster/cluster). We can get the current status from there.

Thanks
Xuan Gong

On Mon, Aug 11, 2014 at 5:12 PM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi,

It is a multiple-node cluster with two master nodes (rm1 and rm2); below is my yarn-site.xml. At the moment, the ResourceManager HA works if:

1) At rm1, run ./sbin/start-yarn.sh

yarn rmadmin -getServiceState rm1
active
yarn rmadmin -getServiceState rm2
14/08/12 07:47:59 INFO ipc.Client: Retrying connect to server: rm1/192.168.1.1:23142. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From rm2/192.168.1.2 to rm2:23142 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

2) At rm2, run ./sbin/start-yarn.sh

yarn rmadmin -getServiceState rm1
standby

Some questions:
Q1) I need to start yarn on EACH master separately, is this normal? Is there a way that I can just run ./sbin/start-yarn.sh on rm1 and get the STANDBY ResourceManager on rm2 started as well?
Q2) How to get alerts (e.g. by email) if the ACTIVE ResourceManager is down in an auto-failover env? Or how do you monitor the status of the ACTIVE/STANDBY ResourceManager?

Regards
Arthur

<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>192.168.1.1:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>192.168.1.1:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>192.168.1.1:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>192.168.1.1:8030</value>
</property>
<property>
  <name>yarn.nodemanager.loacl-dirs</name>
  <value>/edh/hadoop_data/mapred/nodemanager</value>
  <final>true</final>
</property>
<property>
  <name>yarn.web-proxy.address</name>
  <value>192.168.1.1:</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>18432</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>9216</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>18432</value>
</property>
<property>
  <name>yarn.resourcemanager.connect.retry-interval.ms</name>
  <value>2000</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>cluster_rm</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>192.168.1.1</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>192.168.1.2</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
Hadoop 2.4.1 Verifying Automatic Failover Failed: ResourceManager and JobHistoryServer do not auto-failover to Standby Node
Hi,

I have set up Hadoop 2.4.1 with HDFS High Availability using the Quorum Journal Manager, and I am verifying automatic failover. I manually used the “kill -9” command to disable all running Hadoop services on the active node (NN-1). I can see that the standby node (NN-2) now becomes ACTIVE, which is good; however, the “ResourceManager” service cannot be found on NN-2. Please advise how to make the ResourceManager and JobHistoryServer fail over automatically. Or did I miss some important setup, e.g. some settings in hdfs-site.xml or core-site.xml? Please help!

Regards
Arthur

BEFORE TESTING:

NN-1: jps
9564 NameNode
10176 JobHistoryServer
21215 Jps
17636 QuorumPeerMain
20838 NodeManager
9678 DataNode
9933 JournalNode
10085 DFSZKFailoverController
20724 ResourceManager

NN-2 (standby name node): jps
14064 Jps
32046 NameNode
13765 NodeManager
32126 DataNode
32271 DFSZKFailoverController

AFTER:

NN-1: jps
17636 QuorumPeerMain
21508 Jps

NN-2: jps
32046 NameNode
13765 NodeManager
32126 DataNode
32271 DFSZKFailoverController
14165 Jps
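As far as I know, the JobHistoryServer has no automatic failover in 2.4.1; it is a single daemon that must be started by hand on a surviving node. Likewise, RM failover can only happen if a ResourceManager process is already running on the other host. A sketch of bringing both up on NN-2 after the crash:

  # on NN-2
  ./sbin/yarn-daemon.sh start resourcemanager
  ./sbin/mr-jobhistory-daemon.sh start historyserver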
Hadoop 2.4.1 Verifying Automatic Failover Failed: Unable to trigger a roll of the active NN
Hi,

I have set up a Hadoop 2.4.1 HA cluster using Quorum Journal. I am verifying automatic failover: after killing the namenode process on the active node, the namenode did not fail over to the standby node. Please advise.

Regards
Arthur

2014-08-04 18:54:40,453 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unable to trigger a roll of the active NN
java.net.ConnectException: Call From standbynode to activenode:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
	at org.apache.hadoop.ipc.Client.call(Client.java:1414)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy16.rollEditLog(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:139)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:271)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:313)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
	at org.apache.hadoop.ipc.Client.call(Client.java:1381)
	... 11 more
2014-08-04 18:55:03,458 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing from activenode:54571 Call#17 Retry#1: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby
2014-08-04 18:55:06,683 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing from activenode:54571 Call#17 Retry#3: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby
2014-08-04 18:55:16,643 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from activenode:54602 Call#0 Retry#1: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby
2014-08-04 18:55:19,530 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing from activenode:54610 Call#17 Retry#5: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby
2014-08-04 18:55:20,756 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from activenode:54602 Call#0 Retry#3: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby
Re: Hadoop 2.4.1 Verifying Automatic Failover Failed: Unable to trigger a roll of the active NN
Hi,

Thanks for your reply. It was about the standby Namenode not being promoted to Active. Can you please advise the path of the ZKFC logs?

Also, similar to the Namenode status web page, “a Cluster Web Console is added in federation to monitor the federated cluster at http://any_nn_host:port/dfsclusterhealth.jsp. Any Namenode in the cluster can be used to access this web page.” What is the default port for the cluster console? I tried 8088 but no luck. Please advise.

Regards
Arthur

On 4 Aug, 2014, at 7:22 pm, Brahma Reddy Battula brahmareddy.batt...@huawei.com wrote:

Hi,

Do you mean that the Active Namenode which was killed did not transition to STANDBY? The Namenode will not start as standby after you kill it; you need to start it again manually. Automatic failover means that when the Active goes down, the Standby node transitions to Active automatically; it does not mean restarting the killed process and making it Active (or Standby) again. Please refer to the following doc (Section: Verifying automatic failover): http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html

Or do you mean the Standby Namenode did not transition to ACTIVE? Then please check the ZKFC logs, though from the logs you pasted this mostly did not happen.

Thanks & Regards
Brahma Reddy Battula

From: arthur.hk.c...@gmail.com [arthur.hk.c...@gmail.com]
Sent: Monday, August 04, 2014 4:38 PM
To: user@hadoop.apache.org
Cc: arthur.hk.c...@gmail.com
Subject: Hadoop 2.4.1 Verifying Automatic Failover Failed: Unable to trigger a roll of the active NN

Hi,

I have set up a Hadoop 2.4.1 HA cluster using Quorum Journal. I am verifying automatic failover: after killing the namenode process on the active node, the namenode did not fail over to the standby node. Please advise.

Regards
Arthur

2014-08-04 18:54:40,453 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unable to trigger a roll of the active NN
java.net.ConnectException: Call From standbynode to activenode:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
	at org.apache.hadoop.ipc.Client.call(Client.java:1414)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy16.rollEditLog(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:139)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:271)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:313)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
	at org.apache.hadoop.ipc.Client.call(Client.java:1381)
	... 11 more
2014-08-04 18:55:03,458 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing from activenode:54571 Call#17 Retry#1: org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby
2014-08-04 18:55:06,683 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 8020, call
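On the ZKFC log path question: the ZKFC logs land in the standard Hadoop log directory next to the namenode log, named after the daemon user and host (the user and host names below are examples):

  ls $HADOOP_HOME/logs/
  # hadoop-hdfs-zkfc-nn1.log       <- DFSZKFailoverController
  # hadoop-hdfs-namenode-nn1.log   <- NameNode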
Compile Hadoop 2.4.1 (with Tests and Without Tests)
Hi,

I am trying to compile Hadoop 2.4.1. If I run “mvn clean install -DskipTests”, the compilation is GOOD. However, if I run “mvn clean install”, i.e. without skipping the tests, it returns “Failures”.

Can anyone please advise what should be prepared before running the unit tests during compilation? From the error log I found it used 192.168.12.37, but this is not one of my local IPs; should I change some configuration file? Any ideas? On the other hand, can I use the compiled code from the GOOD compilation and just ignore the failed tests? Please advise!

Regards
Arthur

Compilation results (mvn clean install -DskipTests, the compilation is GOOD):

[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [1.756s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [0.586s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [1.282s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [0.257s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.136s]
[INFO] Apache Hadoop Maven Plugins ....................... SUCCESS [1.189s]
[INFO] Apache Hadoop MiniKDC ............................. SUCCESS [0.837s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [0.835s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [0.614s]
[INFO] Apache Hadoop Common .............................. SUCCESS [9.020s]
[INFO] Apache Hadoop NFS ................................. SUCCESS [9.341s]
[INFO] Apache Hadoop Common Project ...................... SUCCESS [0.013s]
[INFO] Apache Hadoop HDFS ................................ SUCCESS [1:11.329s]
[INFO] Apache Hadoop HttpFS .............................. SUCCESS [1.943s]
[INFO] Apache Hadoop HDFS BookKeeper Journal ............. SUCCESS [8.236s]
[INFO] Apache Hadoop HDFS-NFS ............................ SUCCESS [0.181s]
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [0.014s]
[INFO] hadoop-yarn ....................................... SUCCESS [0.045s]
[INFO] hadoop-yarn-api ................................... SUCCESS [3.080s]
[INFO] hadoop-yarn-common ................................ SUCCESS [3.995s]
[INFO] hadoop-yarn-server ................................ SUCCESS [0.036s]
[INFO] hadoop-yarn-server-common ......................... SUCCESS [0.406s]
[INFO] hadoop-yarn-server-nodemanager .................... SUCCESS [7.874s]
[INFO] hadoop-yarn-server-web-proxy ...................... SUCCESS [0.185s]
[INFO] hadoop-yarn-server-applicationhistoryservice ...... SUCCESS [2.766s]
[INFO] hadoop-yarn-server-resourcemanager ................ SUCCESS [0.975s]
[INFO] hadoop-yarn-server-tests .......................... SUCCESS [0.260s]
[INFO] hadoop-yarn-client ................................ SUCCESS [0.401s]
[INFO] hadoop-yarn-applications .......................... SUCCESS [0.012s]
[INFO] hadoop-yarn-applications-distributedshell ......... SUCCESS [0.194s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS [0.157s]
[INFO] hadoop-yarn-site .................................. SUCCESS [0.028s]
[INFO] hadoop-yarn-project ............................... SUCCESS [0.030s]
[INFO] hadoop-mapreduce-client ........................... SUCCESS [0.027s]
[INFO] hadoop-mapreduce-client-core ...................... SUCCESS [1.384s]
[INFO] hadoop-mapreduce-client-common .................... SUCCESS [1.167s]
[INFO] hadoop-mapreduce-client-shuffle ................... SUCCESS [0.151s]
[INFO] hadoop-mapreduce-client-app ....................... SUCCESS [0.692s]
[INFO] hadoop-mapreduce-client-hs ........................ SUCCESS [0.521s]
[INFO] hadoop-mapreduce-client-jobclient ................. SUCCESS [9.581s]
[INFO] hadoop-mapreduce-client-hs-plugins ................ SUCCESS [0.105s]
[INFO] Apache Hadoop MapReduce Examples .................. SUCCESS [0.288s]
[INFO] hadoop-mapreduce .................................. SUCCESS [0.031s]
[INFO] Apache Hadoop MapReduce Streaming ................. SUCCESS [2.485s]
[INFO] Apache Hadoop Distributed Copy .................... SUCCESS [14.204s]
[INFO] Apache Hadoop Archives ............................ SUCCESS [0.147s]
[INFO] Apache Hadoop Rumen ............................... SUCCESS [0.283s]
[INFO] Apache Hadoop Gridmix ............................. SUCCESS [0.266s]
[INFO] Apache Hadoop Data Join ........................... SUCCESS [0.109s]
[INFO] Apache Hadoop Extras .............................. SUCCESS [0.173s]
[INFO] Apache Hadoop Pipes ............................... SUCCESS [0.013s]
[INFO] Apache Hadoop OpenStack support ................... SUCCESS [0.292s]
[INFO] Apache Hadoop Client .............................. SUCCESS [0.093s]
[INFO] Apache Hadoop Mini-Cluster ........................ SUCCESS [0.052s]
[INFO] Apache Hadoop Scheduler Load Simulator ............ SUCCESS [1.123s]
[INFO] Apache Hadoop Tools Dist
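On the last question: the skipTests build produces the same artifacts as a release build, so it is generally fine to deploy those and chase the test failures separately. To keep a test run going past failures, standard Maven/Surefire flags help (TestXYZ below is a placeholder name):

  mvn clean install -fn                                 # "fail never": continue past failing modules
  mvn clean install -Dmaven.test.failure.ignore=true    # run tests but do not fail the build
  mvn install -pl hadoop-hdfs-project/hadoop-hdfs -Dtest=TestXYZ   # re-run a single module/test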
ResourceManager version and Hadoop version
Hi,

I am running an Apache Hadoop 2.4.1 cluster. I have two questions about the Hadoop HTML link http://test_namenode:8088/cluster/cluster:

1) If I click “Server metrics” to go to the page http://test_namenode:8088/metrics, it is blank. Can anyone please advise whether this is normal, or have I not yet set up some monitoring tool properly, e.g. nagios?

2) On the page http://test_namenode:8088/cluster/cluster, I can see “version 2.4.1 from Unknown”. Is there a way to change the word “Unknown” to a more meaningful word by myself?

ResourceManager version: 2.4.1 from Unknown by hadoop source checksum f74…
Hadoop version: 2.4.1 from Unknown by hadoop source checksum bb7…

Many thanks!

Regards
Arthur
Re: Hadoop 2.4.0 How to change Configured Capacity
Hi,

Neither “dfs.name.data.dir” nor “dfs.datanode.data.dir” is set in my cluster. By the way, I have searched around for these two parameters, and I cannot find them on the Hadoop defaults page: http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Can you please advise where and how to set them? In hdfs-site.xml, in core-site.xml, or in another configuration file?

Many thanks
Arthur

On 29 Jul, 2014, at 1:27 am, hadoop hive hadooph...@gmail.com wrote:

You need to add each disk inside the dfs.name.data.dir parameter.

On Jul 28, 2014 5:14 AM, arthur.hk.c...@gmail.com arthur.hk.c...@gmail.com wrote:

Hi,

I have installed Hadoop 2.4.0 with 5 nodes; each node physically has a 4 TB hard disk. When checking the configured capacity, I found it is about 49.22 GB per node. Can anyone advise how to set a bigger “configured capacity”, e.g. 2 TB or more per node?

Name node Configured Capacity: 264223436800 (246.08 GB)
Each Datanode Configured Capacity: 52844687360 (49.22 GB)

Regards
Arthur
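For the record, the datanode storage key in Hadoop 2.x is dfs.datanode.data.dir (dfs.name.data.dir does not exist; namenode metadata uses dfs.namenode.name.dir), and it belongs in hdfs-site.xml as a comma-separated list with one entry per mounted disk. When it is unset, storage defaults to a directory under hadoop.tmp.dir, which would explain the small configured capacity. A sketch with example mount points:

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data1/hdfs/dn,/data2/hdfs/dn,/data3/hdfs/dn,/data4/hdfs/dn</value>
  </property>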