Re: Decommissioning a data node and problems bringing it back online
Thanks for the reply. I am using Hadoop-0.20, installed from Apache rather than Cloudera, if that makes a difference. Right now I really need to know how to get the data that was replicated away during decommissioning back onto my two data nodes.

On Thursday, July 24, 2014, Stanley Shi s...@gopivotal.com wrote: Which distribution are you using? Regards, Stanley Shi

On Thu, Jul 24, 2014 at 4:38 AM, andrew touchet adt...@latech.edu wrote: I should have added this in my first email, but I do see this in the data node's log file: '2014-07-12 19:39:58,027 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got processed in 1 msecs'

On Wed, Jul 23, 2014 at 3:18 PM, andrew touchet adt...@latech.edu wrote: Hello, I am decommissioning data nodes for an OS upgrade on an HPC cluster. Currently, users can run jobs that use data stored on /hdfs; they are able to access all datanodes/compute nodes except the one being decommissioned. Is this safe to do? Will edited files affect the decommissioning node? I've been adding the nodes to /usr/lib/hadoop-0.20/conf/hosts_exclude and running 'hadoop dfsadmin -refreshNodes' on the namenode, then waiting for the log files to report completion. After the upgrade, I simply remove the node from hosts_exclude and start Hadoop again on the datanode.

Also: under the namenode web interface I just noticed that the node I decommissioned previously now shows 0 for Configured Capacity, Used, and Remaining, and is now 100% Used. I used the same /etc/sysconfig/hadoop file from before the upgrade, removed the node from hosts_exclude, and ran '-refreshNodes' afterwards. What steps have I missed in the decommissioning process or while bringing the data node back online?
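For reference, the exclude-file bookkeeping described in the quoted message can be sketched as two shell helpers. These are hypothetical functions, not something from the thread; the exclude-file path is the one named above, and the `hadoop dfsadmin -refreshNodes` step is left as a comment because it needs a live namenode:

```shell
# Hypothetical helpers for decommission/recommission bookkeeping.
# EXCLUDE_FILE defaults to the path used in this thread; override it for testing.
EXCLUDE_FILE="${EXCLUDE_FILE:-/usr/lib/hadoop-0.20/conf/hosts_exclude}"

decommission_node() {
    # Add the node to the exclude list (once), then ask the namenode to
    # re-read its include/exclude lists:
    #   hadoop dfsadmin -refreshNodes
    local node="$1"
    grep -qx "$node" "$EXCLUDE_FILE" 2>/dev/null || echo "$node" >> "$EXCLUDE_FILE"
}

recommission_node() {
    # Remove the node from the exclude list, then refresh again with
    # 'hadoop dfsadmin -refreshNodes'.
    local node="$1"
    grep -vx "$node" "$EXCLUDE_FILE" > "$EXCLUDE_FILE.tmp" || true
    mv "$EXCLUDE_FILE.tmp" "$EXCLUDE_FILE"
}
```

The refresh must run after each edit to the file; editing hosts_exclude alone changes nothing until the namenode re-reads it.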
Re: Decommissioning a data node and problems bringing it back online
After you add the nodes back to your cluster, you can run the balancer tool, but it will not bring back exactly the same blocks as before. Cheers, Mirko
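A sketch of the balancer step Mirko mentions, using the Hadoop 0.20 command names; the utilization figures below are made up purely to illustrate what the threshold parameter means:

```shell
# On a live cluster (Hadoop 0.20):
#   hadoop balancer -threshold 10
# or as a background daemon:
#   start-balancer.sh -threshold 10
# The threshold is the allowed deviation, in percent, of each datanode's
# disk usage from the cluster average. Illustrated with made-up numbers:
avg=60        # hypothetical cluster-average usage, percent
node=85       # hypothetical datanode usage, percent
threshold=10
dev=$(( node > avg ? node - avg : avg - node ))
if [ "$dev" -gt "$threshold" ]; then
    echo "node deviates by ${dev}%, balancer would move blocks onto/off it"
fi
```

A freshly recommissioned, empty node sits far below the average, so the balancer will stream blocks onto it until it falls within the threshold.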
Re: Decommissioning a data node and problems bringing it back online
You should not face any data loss. The replicas were simply moved from that node to other nodes in the cluster during decommissioning. Once you recommission the node and rebalance your cluster, HDFS will redistribute replicas evenly across the nodes, and the recommissioned node will receive replicas from other nodes; however, there is no guarantee that exactly the same replicas that were stored on the node before it was decommissioned will be assigned to it again after recommissioning and rebalancing. Cheers, Wellington.
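One way to confirm Wellington's point that no data was lost is to check block health after recommissioning. This is only a sketch: on a live cluster you would run `hadoop fsck /` directly; the summary text below is a hypothetical capture of the kind of lines fsck prints, used here just to illustrate the check:

```shell
# Live-cluster equivalent:  hadoop fsck / | grep 'Under-replicated'
# Here we parse a hypothetical captured summary instead.
fsck_summary='Total blocks (validated):      1024
Minimally replicated blocks:   1024 (100.0 %)
Under-replicated blocks:       0 (0.0 %)
Mis-replicated blocks:         0 (0.0 %)'

# Extract the under-replicated count (the number between ':' and '(').
under=$(printf '%s\n' "$fsck_summary" | awk -F'[:(]' '/Under-replicated/ {gsub(/ /, "", $2); print $2}')
echo "under-replicated blocks: $under"
```

A zero under-replicated count after the node rejoins means every block is back at its target replication, regardless of which node holds which replica.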
Re: Decommissioning a data node and problems bringing it back online
Hi Mirko, Thanks for the reply!

"...it will not bring in exactly the same blocks like before"

Is that what usually happens when adding nodes back in? Should I expect any data loss from starting the data node process before running the balancer tool? Best Regards, Andrew Touchet
Re: Decommissioning a data node and problems bringing it back online
Hello Wellington, That sounds wonderful! I appreciate everyone's help. Best Regards, Andrew Touchet
Re: Decommissioning a data node and problems bringing it back online
I should have added this in my first email, but I do see this in the data node's log file: '2014-07-12 19:39:58,027 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got processed in 1 msecs'
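As an aside, the block count is easy to pull out of such lines when watching whether successive reports ever rise above zero. The log line is the one quoted in this message; the log path in the comment is a hypothetical example, not one from the thread:

```shell
# On the datanode you might scan the log (hypothetical path):
#   grep 'BlockReport of' /var/log/hadoop/hadoop-datanode-*.log
log='2014-07-12 19:39:58,027 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got processed in 1 msecs'
blocks=$(printf '%s\n' "$log" | sed -n 's/.*BlockReport of \([0-9][0-9]*\) blocks.*/\1/p')
echo "blocks reported: $blocks"
```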
Re: Decommissioning a data node and problems bringing it back online
Which distribution are you using? Regards, Stanley Shi
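For completeness, the version half of this question can be answered from the shell with `hadoop version`, which prints the build string. The sample line parsed below is a hypothetical stand-in consistent with the 0.20 release named in this thread, not captured output:

```shell
# On a node with Hadoop installed:  hadoop version
# Hypothetical first line of that output:
version_line='Hadoop 0.20.2'
release=${version_line#Hadoop }   # strip the leading "Hadoop " prefix
echo "release: $release"
```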