Just a note: if you simply shut the node off, the blocks will replicate faster.
James.

On 2011-03-18, at 10:03 AM, Ted Dunning wrote:

> If nobody else more qualified is willing to jump in, I can at least provide
> some pointers.
>
> What you describe is a bit surprising. I have zero experience with any 0.21
> version, but decommissioning was working well in much older versions, so
> this would be a surprising regression.
>
> The observations you have aren't all inconsistent with how decommissioning
> should work. The fact that your nodes look up after starting the
> decommissioning isn't so strange. The idea is that no new data will be put
> on the node, nor should it be counted as a replica, but it will help in
> reading data.
>
> So that isn't such a big worry.
>
> The fact that it takes forever and a day, however, is a big worry. I cannot
> provide any help there just off hand.
>
> What happens when a datanode goes down? Do you see under-replicated files?
> Does the number of such files decrease over time?
>
> On Fri, Mar 18, 2011 at 4:23 AM, Rita <rmorgan...@gmail.com> wrote:
>
>> Any help?
>>
>> On Wed, Mar 16, 2011 at 9:36 PM, Rita <rmorgan...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have been struggling with decommissioning data nodes. I have a 50+ data
>>> node cluster (no MR) with each server holding about 2TB of storage. I
>>> split the nodes into 2 racks.
>>>
>>> I edit the 'exclude' file and then do a -refreshNodes. I see the node
>>> immediately in 'Decommissioned node' and I also see it as a 'live' node!
>>> Even though I wait 24+ hours it's still like this. I suspect it's a bug
>>> in my version. The data node process is still running on the node I am
>>> trying to decommission. So, sometimes I kill -9 the process and I see the
>>> 'under replicated' blocks... this can't be the normal procedure.
>>>
>>> There were even times that I had corrupt blocks because I was impatient --
>>> waited 24-34 hours
>>>
>>> I am using the 23 August, 2010: release 0.21.0
>>> <http://hadoop.apache.org/hdfs/releases.html#23+August%2C+2010%3A+release+0.21.0+available>
>>> version.
>>>
>>> Is this a known bug? Is there anything else I need to do to decommission a
>>> node?
>>>
>>> --
>>> --- Get your facts first, then you can distort them as you please.--
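
One way to answer the question in the thread about whether the number of under-replicated files decreases over time is to capture the dfsadmin report at intervals and compare the readings. The following is only a rough Python sketch. It assumes the report contains a line of the form "Under replicated blocks: N", which may differ between versions. The sample text and both function names are made up for illustration and are not part of Hadoop.

```python
import re

# Illustrative report text only; NOT captured from a real cluster.
SAMPLE_REPORT = """\
Under replicated blocks: 1234
Blocks with corrupt replicas: 0
Missing blocks: 0
"""


def under_replicated(report_text):
    """Return the under-replicated block count found in the text, or None.

    Assumes a line starting with 'Under replicated blocks:' followed by
    an integer; returns None if no such line is present.
    """
    match = re.search(
        r"^Under replicated blocks:\s*(\d+)", report_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def is_draining(counts):
    """Return True if the samples never rise and end lower than they began.

    `counts` is a list of under-replicated block counts, oldest first.
    Fewer than two samples cannot show a trend, so that returns False.
    """
    if len(counts) < 2:
        return False
    never_rises = all(later <= earlier
                      for earlier, later in zip(counts, counts[1:]))
    return never_rises and counts[-1] < counts[0]
```

To use it, save the report output every few minutes, pass each capture through under_replicated(), and hand the resulting list to is_draining(). A list that stays flat for hours would match the "takes forever" symptom described above.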