I have been looking into this some more by examining the output of dfsadmin -report during the decommissioning process. After a node has been decommissioned, dfsadmin -report shows the node in the Decommissioned state, while the web interface (dfshealth.jsp) shows it as a dead node. After I remove the decommissioned node from the exclude file and run the refreshNodes command, the web interface continues to show it as a dead node, but dfsadmin -report shows the node as in service. After I restart HDFS, dfsadmin -report shows the correct information again.
If I restart HDFS leaving the decommissioned node in the exclude file, the web interface shows it as a dead node and dfsadmin -report shows it as in service. But after I remove it from the exclude file and run the refreshNodes command, both the web interface and dfsadmin -report show the correct information. It looks to me like I should only remove the decommissioned node from the exclude file after restarting HDFS.

I would still like to see the web interface report any decommissioned node as decommissioned rather than dead, as dfsadmin -report does. I am willing to work on a patch for this. Before I start, does anyone know if this is already in the works?

Bill

On Mon, Feb 2, 2009 at 5:00 PM, Bill Au <bill.w...@gmail.com> wrote:
> It looks like the behavior is the same with 0.18.2 and 0.19.0. Even though
> I removed the decommissioned node from the exclude file and ran the
> refreshNodes command, the decommissioned node still shows up as a dead node.
> What I did notice is that if I leave the decommissioned node in the exclude
> file and restart HDFS, the node will show up as a dead node after the restart.
> But then if I remove it from the exclude file and run the refreshNodes
> command, it will disappear from the status page (dfshealth.jsp).
>
> So it looks like I will have to stop and start the entire cluster in order
> to get what I want.
>
> Bill
>
>
> On Thu, Jan 29, 2009 at 5:40 PM, Bill Au <bill.w...@gmail.com> wrote:
>
>> Not sure why, but this does not work for me. I am running 0.18.2. I ran
>> hadoop dfsadmin -refreshNodes after removing the decommissioned node from
>> the exclude file. It still shows up as a dead node. I also removed it from
>> the slaves file and ran the refreshNodes command again. It still shows up
>> as a dead node after that.
>>
>> I am going to upgrade to 0.19.0 to see if it makes any difference.
>>
>> Bill
>>
>>
>> On Tue, Jan 27, 2009 at 7:01 PM, paul <paulg...@gmail.com> wrote:
>>
>>> Once the nodes are listed as dead, if you still have the host names in
>>> your conf/exclude file, remove the entries and then run hadoop dfsadmin
>>> -refreshNodes.
>>>
>>> This works for us on our cluster.
>>>
>>> -paul
>>>
>>>
>>> On Tue, Jan 27, 2009 at 5:08 PM, Bill Au <bill.w...@gmail.com> wrote:
>>>
>>> > I was able to decommission a datanode successfully without having to
>>> > stop my cluster. But I noticed that after a node has been
>>> > decommissioned, it shows up as a dead node in the web-based interface
>>> > to the namenode (i.e. dfshealth.jsp). My cluster is relatively small
>>> > and losing a datanode will have a performance impact. So I have a need
>>> > to monitor the health of my cluster and take steps to revive any dead
>>> > datanode in a timely fashion. So is there any way to altogether "get
>>> > rid of" any decommissioned datanode from the web interface of the
>>> > namenode? Or is there a better way to monitor the health of the
>>> > cluster?
>>> >
>>> > Bill
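For anyone following along, here is a rough sketch of the order of operations that gave me consistent reporting. The install path, exclude-file location, and hostname below are assumptions for illustration, not taken from any real cluster, and DRY_RUN=1 only prints each command instead of executing it:

```shell
#!/bin/sh
# Sketch of the decommission/cleanup order that left dfshealth.jsp and
# dfsadmin -report consistent in my tests. With DRY_RUN=1 (the default)
# each command is printed rather than run; set DRY_RUN=0 on a live cluster.
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}   # assumed install path
EXCLUDE_FILE="$HADOOP_HOME/conf/exclude"        # assumed dfs.hosts.exclude target
NODE=datanode1.example.com                      # hypothetical decommissioned host
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Decommission: add the host to the exclude file, then refresh.
run sh -c "echo $NODE >> $EXCLUDE_FILE"
run "$HADOOP_HOME/bin/hadoop" dfsadmin -refreshNodes

# 2. Restart HDFS with the node still listed in the exclude file.
run "$HADOOP_HOME/bin/stop-dfs.sh"
run "$HADOOP_HOME/bin/start-dfs.sh"

# 3. Only after the restart, remove the host from the exclude file and
#    refresh again; at this point both the web interface and
#    dfsadmin -report agreed for me.
run sh -c "grep -v $NODE $EXCLUDE_FILE > $EXCLUDE_FILE.tmp && mv $EXCLUDE_FILE.tmp $EXCLUDE_FILE"
run "$HADOOP_HOME/bin/hadoop" dfsadmin -refreshNodes
```

Removing the host from the exclude file before the restart (the reverse order) is what left the node stuck as "dead" in dfshealth.jsp in my tests.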