Re: Decommissioning a data node and problems bringing it back online

2014-07-24 Thread andrew touchet
Thanks for the reply,

I am using Hadoop-0.20. We installed from Apache, not Cloudera, if that
makes a difference.

Currently I really need to know how to get the data that was replicated
during decommissioning back onto my two data nodes.





On Thursday, July 24, 2014, Stanley Shi s...@gopivotal.com wrote:

 Which distribution are you using?

 Regards,
 Stanley Shi,




Re: Decommissioning a data node and problems bringing it back online

2014-07-24 Thread Mirko Kämpf
After you add the nodes back to your cluster, run the balancer tool; it will
redistribute blocks, but it will not bring back exactly the same blocks as
before.
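
For example, with the stock 0.20 scripts the balancer can be run from the
namenode like this (a minimal sketch; the 10% threshold is only an
illustrative value):

  # Move blocks between datanodes until each node's utilization is
  # within 10 percentage points of the cluster-wide average.
  hadoop balancer -threshold 10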

Cheers,
Mirko



2014-07-24 17:34 GMT+01:00 andrew touchet adt...@latech.edu:

 Thanks for the reply,

 I am using Hadoop-0.20. We installed from Apache, not Cloudera, if that
 makes a difference.

 Currently I really need to know how to get the data that was replicated
 during decommissioning back onto my two data nodes.





Re: Decommissioning a data node and problems bringing it back online

2014-07-24 Thread Wellington Chevreuil
You should not face any data loss. The replicas were simply moved away from
that node to other nodes in the cluster during decommissioning. Once you
recommission the node and re-balance your cluster, HDFS will redistribute
replicas evenly between the nodes, and the recommissioned node will receive
replicas from other nodes; however, there is no guarantee that exactly the
same replicas stored on this node before it was decommissioned will be
assigned to it again after recommission and rebalance.
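
A minimal sketch of that recommission-and-rebalance sequence, assuming the
stock 0.20 scripts and that dfs.hosts.exclude in hdfs-site.xml points at
your hosts_exclude file:

  # On the namenode, after removing the host from hosts_exclude:
  # make the namenode re-read its include/exclude lists.
  hadoop dfsadmin -refreshNodes

  # On the recommissioned machine: start the datanode daemon again.
  hadoop-daemon.sh start datanode

  # Back on the namenode: rebalance so the node receives replicas.
  hadoop balancer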

Cheers,
Wellington. 


On 24 Jul 2014, at 17:55, andrew touchet adt...@latech.edu wrote:

 Hi Mirko,
 
 Thanks for the reply!
 
 ...it will not bring back exactly the same blocks as before
 Is that what usually happens when adding nodes back in? Should I expect any 
 data loss due to starting the data node process before running the balancing 
 tool?
 
 Best Regards,
 
 Andrew Touchet
 
 
 



Re: Decommissioning a data node and problems bringing it back online

2014-07-24 Thread andrew touchet
Hi Mirko,

Thanks for the reply!

...it will not bring back exactly the same blocks as before
Is that what usually happens when adding nodes back in? Should I expect any
data loss due to starting the data node process before running the
balancing tool?

Best Regards,

Andrew Touchet



On Thu, Jul 24, 2014 at 11:37 AM, Mirko Kämpf mirko.kae...@gmail.com
wrote:

 After you add the nodes back to your cluster, run the balancer tool;
 it will redistribute blocks, but it will not bring back exactly the same
 blocks as before.


 Cheers,
 Mirko




Re: Decommissioning a data node and problems bringing it back online

2014-07-24 Thread andrew touchet
Hello Wellington,

That sounds wonderful!  I appreciate everyone's help.

Best Regards,

Andrew Touchet


On Thu, Jul 24, 2014 at 12:01 PM, Wellington Chevreuil 
wellington.chevre...@gmail.com wrote:

 You should not face any data loss. The replicas were simply moved away from
 that node to other nodes in the cluster during decommissioning. Once you
 recommission the node and re-balance your cluster, HDFS will redistribute
 replicas evenly between the nodes, and the recommissioned node will receive
 replicas from other nodes; however, there is no guarantee that exactly the
 same replicas stored on this node before it was decommissioned will be
 assigned to it again after recommission and rebalance.

 Cheers,
 Wellington.



Re: Decommissioning a data node and problems bringing it back online

2014-07-23 Thread andrew touchet
I should have added this in my first email, but I do get this message in the
data node's log file:

'2014-07-12 19:39:58,027 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks
got processed in 1 msecs'
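
For what it's worth, one way to cross-check block health and replica
placement from the namenode is HDFS's fsck tool (a sketch, not specific
to this node):

  # Report filesystem health; -blocks/-locations also list each block
  # and the datanodes holding its replicas.
  hadoop fsck / -blocks -locations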



On Wed, Jul 23, 2014 at 3:18 PM, andrew touchet adt...@latech.edu wrote:

 Hello,

 I am decommissioning data nodes for an OS upgrade on an HPC cluster.
 Currently, users can run jobs that use data stored on /hdfs. They are able
 to access all datanodes/compute nodes except the one being decommissioned.

 Is this safe to do? Will edited files affect the decommissioning node?

 I've been adding the nodes to /usr/lib/hadoop-0.20/conf/hosts_exclude and
 running 'hadoop dfsadmin -refreshNodes' on the name node. Then I wait for
 the log files to report completion. After the upgrade, I remove the node
 from hosts_exclude and start Hadoop again on the datanode.

 Also: under the namenode web interface I just noticed that the node I
 previously decommissioned now shows 0 for Configured Capacity, Used, and
 Remaining, and is now 100% Used.

 I used the same /etc/sysconfig/hadoop file from before the upgrade,
 removed the node from hosts_exclude, and ran '-refreshNodes' afterwards.

 What steps have I missed in the decommissioning process or while bringing
 the data node back online?
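
For reference, the decommission procedure described above amounts to roughly
this sketch (assuming dfs.hosts.exclude in hdfs-site.xml points at the
hosts_exclude file; the hostname below is only a placeholder):

  # On the namenode: add the host to the exclude file...
  echo datanode05.example.com >> /usr/lib/hadoop-0.20/conf/hosts_exclude

  # ...tell the namenode to re-read its include/exclude lists...
  hadoop dfsadmin -refreshNodes

  # ...then watch the node's status until it reports Decommissioned.
  hadoop dfsadmin -report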






Re: Decommissioning a data node and problems bringing it back online

2014-07-23 Thread Stanley Shi
Which distribution are you using?

Regards,
Stanley Shi,



On Thu, Jul 24, 2014 at 4:38 AM, andrew touchet adt...@latech.edu wrote:

 I should have added this in my first email, but I do get this message in the
 data node's log file:

 '2014-07-12 19:39:58,027 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks
 got processed in 1 msecs'


