Migrating to Hortonworks or Cloudera
I have a CDH 4.1 cluster with 30 TB of HDFS space across 12 nodes. I now want to uninstall CDH and move the cluster to HDP. There is nothing wrong with CDH; I just want to try moving between distributions without losing the data on the datanodes. Is it possible to re-map the same datanodes and the pre-populated HDFS data disks on the existing datanodes to HDP? The goal is to uninstall CDH and install HDP on the same datanodes without migrating the existing HDFS data. Any leads appreciated.
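[Editor's note] The block files on the datanode disks are laid out by Apache HDFS itself, not by the distribution, so in principle the disks can be re-mapped as long as the new DataNodes point at the same directories, the NameNode metadata (fsimage/edits) is carried over, and the HDFS layout versions of the two distributions are compatible. A sketch of the relevant hdfs-site.xml entries on the HDP side, assuming the existing disks are mounted at /data/1 and /data/2 (illustrative paths; treat this as a starting point, not a tested recipe):

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/1/dfs/nn,/data/2/dfs/nn</value>
</property>

(On Hadoop 1.x-based versions the equivalent keys are dfs.data.dir and dfs.name.dir.)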
Re: I am about to lose all my data please help
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/project/hadoop-data</value>
</property>

On Tue, Mar 18, 2014 at 2:06 PM, Azuryy Yu azury...@gmail.com wrote:

I don't think this is the case, because there is:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/project/hadoop-data</value>
</property>

On Tue, Mar 18, 2014 at 1:55 PM, Stanley Shi s...@gopivotal.com wrote:

One possible reason is that you didn't set the NameNode working directory; by default it is under the /tmp folder, and the /tmp folder may get cleaned by the OS without any notification. If this is the case, I am afraid you have lost all your NameNode data.

<property>
  <name>dfs.name.dir</name>
  <value>${hadoop.tmp.dir}/dfs/name</value>
  <description>Determines where on the local filesystem the DFS name node
  should store the name table (fsimage). If this is a comma-delimited list
  of directories then the name table is replicated in all of the
  directories, for redundancy.</description>
</property>

Regards, Stanley Shi

On Sun, Mar 16, 2014 at 5:29 PM, Mirko Kämpf mirko.kae...@gmail.com wrote:

Hi, what is the location of the NameNode's fsimage and edit logs, and how much memory does the NameNode have? Did you work with a Secondary NameNode or a Standby NameNode for checkpointing? Where are your HDFS blocks located, and are those still safe? With this information at hand, one might be able to fix your setup, but do not format the old NameNode before everything is working with a fresh one. Grab a copy of the maintenance guide: http://shop.oreilly.com/product/0636920025085.do?sortby=publicationDate which helps with solving this type of problem as well. Best wishes, Mirko

2014-03-16 9:07 GMT+00:00 Fatih Haltas fatih.hal...@nyu.edu:

Dear All, I have just restarted the machines of my Hadoop cluster. Now I am trying to start the cluster again, but I am getting an error on NameNode restart. I am afraid of losing my data, as the cluster ran properly for more than 3 months. Currently, I believe that if I format the NameNode it will work again; however, the data will be lost. Is there any way to solve this without losing the data? I will really appreciate any help. Thanks.

Here are the logs:

2014-02-26 16:02:39,698 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ADUAE042-LAP-V/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
************************************************************/
2014-02-26 16:02:40,005 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-02-26 16:02:40,019 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2014-02-26 16:02:40,021 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-02-26 16:02:40,021 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
2014-02-26 16:02:40,169 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2014-02-26 16:02:40,193 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2014-02-26 16:02:40,194 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source NameNode registered.
2014-02-26 16:02:40,242 INFO org.apache.hadoop.hdfs.util.GSet: VM type = 64-bit
2014-02-26 16:02:40,242 INFO org.apache.hadoop.hdfs.util.GSet: 2% max memory = 17.77875 MB
2014-02-26 16:02:40,242 INFO org.apache.hadoop.hdfs.util.GSet: capacity = 2^21 = 2097152 entries
2014-02-26 16:02:40,242 INFO org.apache.hadoop.hdfs.util.GSet: recommended=2097152, actual=2097152
2014-02-26 16:02:40,273 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop
2014-02-26 16:02:40,273 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2014-02-26 16:02:40,274 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2014-02-26 16:02:40,279 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.block.invalidate.limit=100
2014-02-26 16:02:40,279 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
2014-02-26 16:02:40,724 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean
2014-02-26 16:02:40,749 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2014-02-26 16:02:40,780 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. java.io.IOException: NameNode is not formatted.
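[Editor's note] The failure mode Stanley describes in the quoted reply above (metadata defaulting to a /tmp-derived path and being wiped by the OS) is avoided by pointing dfs.name.dir at explicit, durable, redundant locations instead of deriving it from hadoop.tmp.dir. A sketch for hdfs-site.xml on Hadoop 1.x; the mount points are illustrative:

<property>
  <name>dfs.name.dir</name>
  <value>/disk1/dfs/name,/disk2/dfs/name</value>
  <description>Comma-delimited list; the name table (fsimage) is replicated
  into every listed directory, so losing a single disk does not lose the
  namespace.</description>
</property>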
Re: I am about to lose all my data please help
I don't think this is the case, because there is:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/project/hadoop-data</value>
</property>

On Tue, Mar 18, 2014 at 1:55 PM, Stanley Shi s...@gopivotal.com wrote: [...]
Re: I am about to lose all my data please help
Ah yes, I overlooked this. Then please check whether the files are there or not:

ls /home/hadoop/project/hadoop-data/dfs/name

Regards, Stanley Shi

On Tue, Mar 18, 2014 at 2:06 PM, Azuryy Yu azury...@gmail.com wrote: [...]
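[Editor's note] If the metadata survived, a listing along these lines should show it (the path follows the hadoop.tmp.dir value quoted above; the contents listed in the comment are the Hadoop 1.x name-directory layout):

ls -l /home/hadoop/project/hadoop-data/dfs/name/current/
# A healthy Hadoop 1.x name directory contains fsimage, edits, fstime,
# and VERSION. If these files exist, do not run "hadoop namenode -format":
# formatting wipes the namespace and orphans every block on the datanodes.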
issue of Log aggregation has not completed or is not enabled.
Hi, maillist:

I tried to look at an application's logs using the following process:

# yarn application -list
Application-Id                  Application-Name                                        User  Queue  State     Final-State  Tracking-URL
application_1395126130647_0014  select user_id as userid, adverti...stattime(Stage-1)  hive  hive   FINISHED  SUCCEEDED    ch18:19888/jobhistory/job/job_1395126130647_0014

# yarn logs -applicationId application_1395126130647_0014
Logs not available at /var/log/hadoop-yarn/apps/root/logs/application_1395126130647_0014
Log aggregation has not completed or is not enabled.

But I did enable log aggregation. Here is my yarn-site.xml configuration for it:

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <description>Where to aggregate logs to.</description>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/var/log/hadoop-yarn/apps</value>
</property>

The application logs are not being put on HDFS successfully. Why?

# hadoop fs -ls /var/log/hadoop-yarn/apps/root/logs/application_1395126130647_0014
ls: `/var/log/hadoop-yarn/apps/root/logs/application_1395126130647_0014': No such file or directory
RE: issue of Log aggregation has not completed or is not enabled.
Just for confirmation:

1. Was the NodeManager restarted after enabling log aggregation? If yes, check the NodeManager startup logs to confirm that the log aggregation service started successfully.

Thanks & Regards,
Rohith Sharma K S

From: ch huang [mailto:justlo...@gmail.com]
Sent: 18 March 2014 13:09
To: user@hadoop.apache.org
Subject: issue of Log aggregation has not completed or is not enabled. [...]
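[Editor's note] One way to carry out Rohith's check (the log file name and location vary by distribution and by how YARN is started; the path here is illustrative):

grep -i LogAggregationService /var/log/hadoop-yarn/yarn-*-nodemanager-*.log
# Expect the service to be inited and started during NodeManager startup;
# errors here often point at permissions on yarn.nodemanager.remote-app-log-dir.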
history viewer issue
Hi, how do I solve this problem?

[cloudera@localhost ~]$ hadoop job -history ~/1
DEPRECATED: Use of this script to execute mapred command is deprecated. Instead use the mapred command for it.
Exception in thread "main" java.io.IOException: Not able to initialize History viewer
        at org.apache.hadoop.mapred.HistoryViewer.<init>(HistoryViewer.java:95)
        at org.apache.hadoop.mapred.JobClient.viewHistory(JobClient.java:1945)
        at org.apache.hadoop.mapred.JobClient.run(JobClient.java:1894)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.mapred.JobClient.main(JobClient.java:2162)
Caused by: java.io.IOException: History directory /home/cloudera/1/_logs/history does not exist
        at org.apache.hadoop.mapred.HistoryViewer.<init>(HistoryViewer.java:76)
        ... 5 more

Regards, Avinash
Re: history viewer issue
Check the error:

Caused by: java.io.IOException: History directory /home/cloudera/1/_logs/history does not exist

Create that directory and change its ownership to the user running the history server.

On Tue, Mar 18, 2014 at 5:16 PM, Avinash Kujur avin...@gmail.com wrote: [...]

--
Nitin Pawar
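[Editor's note] A sketch of that suggestion, assuming ~/1 is a job output directory on HDFS and the history server runs as user "mapred" (both are assumptions; adjust to your setup):

hadoop fs -mkdir -p /home/cloudera/1/_logs/history
hadoop fs -chown -R mapred /home/cloudera/1/_logs/history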
RE: history viewer issue
You need to give the location of the history file. Please find the following example:

user@host-10-18-40-132:~> mapred job -history /home/USER/staging-dir/history/done/2014/03/07/00/job_1394167487634_0004-1394189877843-User-word+count-1394189895136-1-1-SUCCEEDED-a-1394189882729.jhist

Hadoop job: job_1394167487634_0004
=====================================
User: User
JobName: word count
JobConf: hdfs://hacluster/home/USER/staging-dir/User/.staging/job_1394167487634_0004/job.xml
Submitted At: 7-Mar-2014 16:27:57
Launched At: 7-Mar-2014 16:28:02 (4sec)
Finished At: 7-Mar-2014 16:28:15 (12sec)
Status: SUCCEEDED
Counters:
|Group Name |Counter name |Map Value |Reduce Value |Total Value|

Thanks & Regards,
Brahma Reddy Battula

From: Nitin Pawar [nitinpawar...@gmail.com]
Sent: Tuesday, March 18, 2014 5:35 PM
To: user@hadoop.apache.org
Subject: Re: history viewer issue [...]
unsubscribe
Please remove me from the user distribution list. Thanks.
Re: unsubscribe
Please send an email to user-unsubscr...@hadoop.apache.org. On Tue, Mar 18, 2014 at 6:57 PM, Rananavare, Sunil sunil.rananav...@unify.com wrote: Please remove me from the user distribution list. Thanks. -- Thanks and Regards, Vimal Jain
RE: HA NN Failover question
I think I found the issue. The ZKFC on the standby NN server tried, and failed, to connect to the standby NN when I shut down the network on the active NN server. I'm getting an exception from the HealthMonitor in the ZKFC log:

WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at host/ip:port.
INFO org.apache.hadoop.ipc.Client: Retrying connect to server host/ip:port. Already tried 0 time(s); retry policy is (the default)

Is it significant that it thinks the address is host/ip, instead of just the host or the ip?

From: azury...@gmail.com
Subject: Re: HA NN Failover question
Date: Sat, 15 Mar 2014 11:35:20 +0800
To: user@hadoop.apache.org

I suppose NN2 is standby; please check that ZKFC2 is alive before stopping the network on NN1.

Sent from my iPhone 5s

On March 15, 2014, at 10:53, dlmarion dlmar...@hotmail.com wrote:

Apache Hadoop 2.3.0

Sent via the Samsung GALAXY S®4, an AT&T 4G LTE smartphone

-------- Original message --------
From: Azuryy
Date: 03/14/2014 10:45 PM (GMT-05:00)
To: user@hadoop.apache.org
Subject: Re: HA NN Failover question

Which Hadoop version did you use?

Sent from my iPhone 5s

On March 15, 2014, at 9:29, dlmarion dlmar...@hotmail.com wrote:

Server 1: NN1 and ZKFC1
Server 2: NN2 and ZKFC2
Server 3: Journal1 and ZK1
Server 4: Journal2 and ZK2
Server 5: Journal3 and ZK3
Server 6+: Datanodes

All in the same rack. I would expect the ZKFC on the active namenode server to lose its lock and the other ZKFC to tell the standby namenode that it should become active (I'm assuming that's how it works). - Dave

From: Juan Carlos [mailto:juc...@gmail.com]
Sent: Friday, March 14, 2014 9:12 PM
To: user@hadoop.apache.org
Subject: Re: HA NN Failover question

Hi Dave, how many ZooKeeper servers do you have, and where are they?

Juan Carlos Fernández Rodríguez

On 15/03/2014, at 01:21, dlmarion dlmar...@hotmail.com wrote:

I was doing some testing with HA NN today. I set up two NNs with automatic failover (ZKFC) using sshfence. I verified that it works on both NNs by doing 'kill -9 pid' on the active NN: when I did this on the active node, the standby became the active and everything seemed to work. Next, I logged onto the active NN and did a 'service network stop' to simulate a NIC/network failure. The standby did not become the active in this scenario. In fact, it remained in standby mode and complained in the log that it could not communicate with (what was) the active NN. I was unable to find anything relevant via searches in Google and Jira. Does anyone have experience successfully testing this? I'm hoping that it is just a configuration problem. FWIW, when the network was restarted on the active NN, it failed over almost immediately.

Thanks, Dave
RE: HA NN Failover question
Found this: http://grokbase.com/t/cloudera/cdh-user/12anhyr8ht/cdh4-failover-controllers

Then I configured dfs.ha.fencing.methods to contain both sshfence and shell(/bin/true). Note that the docs for core-default.xml say that the value is a list: I tried a comma with no luck, and had to look in the source to find that the entries are separated by a newline. Adding shell(/bin/true) allowed the failover to work successfully.

From: dlmar...@hotmail.com
To: user@hadoop.apache.org
Subject: RE: HA NN Failover question
Date: Tue, 18 Mar 2014 14:51:25 + [...]
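[Editor's note] For reference, a sketch of the resulting hdfs-site.xml fencing configuration described above (the newline inside the value is the separator; the key file path is illustrative):

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>

With this ordering, sshfence is attempted first; if the peer host is unreachable, shell(/bin/true) succeeds and allows the failover to proceed.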
Re: Problem of installing HDFS-385 and the usage
Hi Stanley,

Thanks for your response, but I still have some problems. Could you give me further instructions? I am now using Hadoop 1.0.3. Does that mean I have to upgrade to 1.2.0, or can I directly override the original code, and with what command? Another question: you said I can refer to HDFS-3601, but which version of the patch? I notice that there are 6 versions: v1 and v2 modify one file, while v3 through v6 modify another, with v3/v4 showing one connection and v5/v6 showing another. That confused me. There is also a branch-2 file, which confused me again. The last problem is how to start reading the code to learn which policy is in use. If I want to control block placement, do I have to write code rather than type shell commands? The code base is so big that I do not know where to start. Could you give me some hints? Since I am a new user, I am sorry if I have asked a stupid question, but I really did not mean to.

Thanks, Eric

2014-03-18 13:43 GMT+08:00 Stanley Shi s...@gopivotal.com:

This JIRA is included in the Apache code since versions 0.21.0 (https://issues.apache.org/jira/browse/HDFS/fixforversion/12314046), 1.2.0 (https://issues.apache.org/jira/browse/HDFS/fixforversion/12321657), and 1-win (https://issues.apache.org/jira/browse/HDFS/fixforversion/12320362). If you want to use it, you need to write your own policy; please see this JIRA for an example: https://issues.apache.org/jira/browse/HDFS-3601

Regards, Stanley Shi

On Mon, Mar 17, 2014 at 11:31 AM, Eric Chiu ericchiu0...@gmail.com wrote:

Hi all, could anyone tell me how to install and use this Hadoop plug-in? https://issues.apache.org/jira/browse/HDFS-385 I read the code but do not know where to install it and what command to use to install it all. Another problem is that there are .txt and .patch files; which one should be applied? Some of the .patch files have "-win"; does that mean the file is for Windows Hadoop users? (I am using Ubuntu.) Thank you very much.
Re: issue of Log aggregation has not completed or is not enabled.
OTOH, if the application is still running, the logs have not yet been uploaded, so you certainly cannot see them yet.

Jian

On Tue, Mar 18, 2014 at 1:57 AM, Rohith Sharma K S rohithsharm...@huawei.com wrote: [...]
show task failure count by node in Yarn 2.2.0
Hi,

In older releases of Hadoop, the web admin page was able to show me the number of failed tasks for each node, so I had a clue that a certain node with a higher number was not healthy (a disk issue, for example). After upgrading to the 2.2.0 release, I cannot find an equivalent page. What is the replacement for this? I think it is critical from an administration perspective.

--
--Anfernee
Cluster creation and adding Services through Apache Ambari Restful APIs
Hi,

I want to create a cluster and add services through the Apache Ambari REST APIs. I am unable to call POST, PUT, and DELETE web services successfully. I am using a REST client and trying the URL below with a POST request, but it is not working.

POST REQUEST
http://AmbariServerIP:8080/api/v1/clusters/c1

Best Regards,
Asif Sajjad
How to call POST,PUT and Delete Restful Web Services of Apache Ambari
Hi,

I want to create a cluster by calling the Apache Ambari REST APIs. I am using a REST client named Postman. GET requests are working fine, but POST, PUT, and DELETE are not working. Please help me in this regard.

Example: I want to add a cluster with the URL below, but it is not working.

POST REQUEST
http://AmbariServerIP:8080/api/v1/clusters/c1

Best Regards,
Asif Sajjad
subscribe
Re: Cluster creation and adding Services through Apache Ambari Restful APIs
Hi Asif,

What is the exact call that you are trying (with all the headers and parameters), and what response are you getting back from the server?

Yusaku

On Tue, Mar 18, 2014 at 2:46 AM, asif sajjad asif.sajjad@gmail.com wrote: [...]
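[Editor's note] One frequent cause of exactly this symptom (an assumption here, since the server's response was not posted): Ambari's CSRF protection rejects POST/PUT/DELETE requests that lack an X-Requested-By header, while GET requests are unaffected. A sketch of the create-cluster call with curl, assuming the default admin/admin credentials and an illustrative stack version:

curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
  -d '{"Clusters": {"version": "HDP-2.0"}}' \
  http://AmbariServerIP:8080/api/v1/clusters/c1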
RE: HA NN Failover question
Correct, David. sshfence does not handle network unavailability. Since the JournalNodes ensure that only one NN can write, fencing of the old active is handled automatically, so configuring the fence method as shell(/bin/true) should be fine.

Regards,
Vinayakumar B

From: david marion [mailto:dlmar...@hotmail.com]
Sent: 18 March 2014 20:53
To: user@hadoop.apache.org
Subject: RE: HA NN Failover question

Found this: http://grokbase.com/t/cloudera/cdh-user/12anhyr8ht/cdh4-failover-controllers Then configured dfs.ha.fencing.methods to contain both sshfence and shell(/bin/true). [...]
How to configure nodemanager.health-checker.script.path
Hello,

I'm running MR with the 2.2.0 release. I noticed that we can configure yarn.nodemanager.health-checker.script.path in yarn-site.xml to customize the NM health check, so I added the properties below to yarn-site.xml:

<property>
  <name>yarn.nodemanager.health-checker.script.path</name>
  <value>/scratch/software/hadoop2/hadoop-dc/node_health.sh</value>
</property>
<property>
  <name>yarn.nodemanager.health-checker.interval-ms</name>
  <value>1</value>
</property>

To get a feel for this, /scratch/software/hadoop2/hadoop-dc/node_health.sh simply prints an ERROR message, as below:

#!/bin/bash
echo ERROR disk full
exit -1

But it seems not to be working; the node is still in the healthy state. Did I miss something? Thanks for your help.

--
--Anfernee
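[Editor's note] A likely explanation, based on a reading of YARN's NodeHealthScriptRunner (worth verifying against your exact 2.2.0 sources): a health script that exits with a non-zero status is treated as a failed script execution rather than an unhealthy node, so its ERROR output is never evaluated. The node is marked unhealthy only when the script exits 0 and prints a line beginning with ERROR. A sketch of the test script with that change:

#!/bin/bash
# Print a line starting with ERROR so the NodeManager flags the node,
# and exit 0 so the output is actually inspected.
echo "ERROR disk full"
exit 0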
Re: Problem of installing HDFS-385 and the usage
Hi Eric,

Are you running a production environment on Hadoop 1.0.3? If yes, then you have to upgrade to Hadoop 1.2.0 or Hadoop 2.2.0. If you don't want to change to another Hadoop version, you need to backport the patch to your code base (I'm not sure the patch provided in HDFS-385 can be applied to Hadoop 1.0.3 smoothly; if not, you have to resolve the conflicts yourself).

HDFS-3601 is not a good example for you; it is for Hadoop 2.x. You can read the implementation of this class instead: https://github.com/apache/hadoop-common/blob/release-1.2.0/src/hdfs/org/apache/hadoop/hdfs/server/namenode/BlockPlacementPolicyWithNodeGroup.java

"If I want to control the block placement then I have to write codes rather than type shell commands?" If you want to implement your own logic for block placement, you have to write code.

Regards, Stanley Shi

On Wed, Mar 19, 2014 at 3:07 AM, Eric Chiu ericchiu0...@gmail.com wrote: [...]
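[Editor's note] For orientation, a custom policy is a subclass of BlockPlacementPolicy that the NameNode loads from hdfs-site.xml at startup; there is no shell command that switches policies at runtime. A sketch with a hypothetical class name, assuming Hadoop 1.2.0, where HDFS-385 introduced the dfs.block.replicator.classname key:

<property>
  <name>dfs.block.replicator.classname</name>
  <value>com.example.MyBlockPlacementPolicy</value>
</property>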
Multiple errors and messages
Hello,

When I run the following command on Mahout 0.9 and Hadoop 1.2.1, I get multiple errors and I cannot figure out what the problem is. Sorry for the long post.

[hadoop@solaris ~]$ mahout wikipediaDataSetCreator -i wikipedia/chunks -o wikipediainput -c ~/categories.txt
Running on hadoop, using /export/home/hadoop/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar
14/03/18 20:28:28 WARN driver.MahoutDriver: No wikipediaDataSetCreator.props found on classpath, will use command-line arguments only
14/03/18 20:28:29 INFO wikipedia.WikipediaDatasetCreatorDriver: Input: wikipedia/chunks Out: wikipediainput Categories: /export/home/hadoop/categories.txt
14/03/18 20:28:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/18 20:28:32 INFO input.FileInputFormat: Total input paths to process : 699
14/03/18 20:28:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/18 20:28:32 WARN snappy.LoadSnappy: Snappy native library not loaded
14/03/18 20:28:37 INFO mapred.JobClient: Running job: job_201403181916_0001
14/03/18 20:28:38 INFO mapred.JobClient:  map 0% reduce 0%
14/03/18 20:41:44 INFO mapred.JobClient:  map 1% reduce 0%
14/03/18 20:52:57 INFO mapred.JobClient:  map 2% reduce 0%
14/03/18 21:04:02 INFO mapred.JobClient:  map 3% reduce 0%
14/03/18 21:15:13 INFO mapred.JobClient:  map 4% reduce 0%
14/03/18 21:26:30 INFO mapred.JobClient:  map 5% reduce 0%
14/03/18 21:29:07 INFO mapred.JobClient:  map 5% reduce 1%
14/03/18 21:34:45 INFO mapred.JobClient: Task Id : attempt_201403181916_0001_m_40_0, Status : FAILED
14/03/18 21:34:46 WARN mapred.JobClient: Error reading task output http://solaris:50060/tasklog?plaintext=true&attemptid=attempt_201403181916_0001_m_40_0&filter=stdout
14/03/18 21:34:46 WARN mapred.JobClient: Error reading task output http://solaris:50060/tasklog?plaintext=true&attemptid=attempt_201403181916_0001_m_40_0&filter=stderr
14/03/18 21:38:29 INFO mapred.JobClient:  map 6% reduce 1%
14/03/18 21:41:48 INFO mapred.JobClient:  map 6% reduce 2%
14/03/18 21:50:05 INFO mapred.JobClient:  map 7% reduce 2%
14/03/18 22:00:59 INFO mapred.JobClient:  map 8% reduce 2%
14/03/18 22:12:38 INFO mapred.JobClient:  map 9% reduce 2%
14/03/18 22:14:53 INFO mapred.JobClient:  map 9% reduce 3%
14/03/18 22:24:30 INFO mapred.JobClient:  map 10% reduce 3%
14/03/18 22:35:49 INFO mapred.JobClient:  map 11% reduce 3%
14/03/18 22:47:41 INFO mapred.JobClient:  map 12% reduce 3%
14/03/18 22:48:18 INFO mapred.JobClient:  map 12% reduce 4%
14/03/18 22:59:26 INFO mapred.JobClient:  map 13% reduce 4%
14/03/18 23:10:39 INFO mapred.JobClient:  map 14% reduce 4%
14/03/18 23:21:32 INFO mapred.JobClient:  map 15% reduce 4%
14/03/18 23:24:54 INFO mapred.JobClient:  map 15% reduce 5%
14/03/18 23:32:48 INFO mapred.JobClient:  map 16% reduce 5%
14/03/18 23:43:53 INFO mapred.JobClient:  map 17% reduce 5%
14/03/18 23:54:57 INFO mapred.JobClient:  map 18% reduce 5%
14/03/18 23:58:59 INFO mapred.JobClient:  map 18% reduce 6%
14/03/19 00:05:59 INFO mapred.JobClient:  map 19% reduce 6%
14/03/19 00:16:43 INFO mapred.JobClient:  map 20% reduce 6%
14/03/19 00:17:30 INFO mapred.JobClient: Task Id : attempt_201403181916_0001_m_000137_0, Status : FAILED
Map output lost, rescheduling: getMapOutput(attempt_201403181916_0001_m_000137_0,0) failed :
java.io.IOException: Error Reading IndexFile
        at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:113)
        at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:66)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:4070)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
        at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:914)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at