-----Original Message-----
From: ext David B. Ritch [mailto:david.ri...@gmail.com]
Sent: Friday, September 11, 2009 11:07
To: common-user@hadoop.apache.org
Subject: Re: Decommissioning Individual Disks

Thank you both. That's what we did today. It seems fairly reasonable
when a node has a few disks, say 3-5. However, at some sites, with
larger nodes, it seems more awkward. When a node has a dozen or more
disks (as used in the larger terasort benchmarks), migrating the data
off all the disks is likely to be more of an issue. I hope that there
is a better solution to this before my client moves to much larger
nodes! ;-)

dbr

On 9/10/2009 10:07 PM, Amandeep Khurana wrote:
> I think decommissioning the node and replacing the disk is a cleaner
> approach. That's what I'd recommend doing as well..
>
> On 9/10/09, Alex Loddengaard <a...@cloudera.com> wrote:
>> Hi David,
>>
>> Unfortunately there's really no way to do what you're hoping to do in
>> an automatic way. You can move the block files (including their
>> .meta files) from one disk to another. Do this when the datanode
>> daemon is stopped. Then, when you start the datanode daemon, it will
>> scan dfs.data.dir and be totally happy if blocks have moved hard
>> drives. I've never tried to do this myself, but others on the list
>> have suggested this technique for "balancing disks."
>>
>> You could also change your process around a little. It's not too
>> crazy to decommission an entire node, replace one of its disks, then
>> bring it back into the cluster. Seems to me that this is a much
>> saner approach: your ops team will tell you which disk needs
>> replacing. You decommission the node, they replace the disk, you add
>> the node back to the pool. Your call I guess, though.
>>
>> Hope this was helpful.
>>
>> Alex
>>
>> On Thu, Sep 10, 2009 at 6:30 PM, David B. Ritch
>> <david.ri...@gmail.com> wrote:
>>> What do you do with the data on a failing disk when you replace it?
>>>
>>> Our support person comes in occasionally, and often replaces several
>>> disks when he does. These are disks that have not yet failed, but
>>> firmware indicates that failure is imminent. We need to be able to
>>> migrate our data off these disks before replacing them. If we were
>>> replacing entire servers, we would decommission them - but we have 3
>>> data disks per server. If we were replacing one disk at a time, we
>>> wouldn't worry about it (because of redundancy). We can
>>> decommission the servers, but moving all the data off of all their
>>> disks is a waste.
>>>
>>> What's the best way to handle this?
>>>
>>> Thanks!
>>>
>>> David
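
For anyone wanting to try the manual technique Alex describes (moving block files together with their .meta files from one disk to another while the datanode daemon is stopped), here is a minimal Python sketch. It has not been tested against a real cluster. The function name move_block_files and the dry_run flag are hypothetical, introduced only for illustration. It assumes block files and their .meta companions share a blk_ name prefix and sit directly in the directory you point it at; it does not recurse into subdirectories, and it refuses to overwrite anything already present on the destination disk.

```python
import shutil
from pathlib import Path


def move_block_files(src_dir, dst_dir, dry_run=True):
    """Move blk_* files (block data and .meta files) from src_dir to dst_dir.

    Hypothetical helper: run it only while the datanode daemon is stopped.
    Returns the list of (source, destination) pairs that were, or would be, moved.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    moves = []
    # Assumption: a block and its .meta file both start with "blk_", so one
    # glob picks up the pair and they always travel together.
    for path in sorted(src.glob("blk_*")):
        if not path.is_file():
            continue
        target = dst / path.name
        if target.exists():
            # Check every target before moving anything, so a name clash
            # cannot leave the directory half migrated.
            raise FileExistsError(f"refusing to overwrite {target}")
        moves.append((path, target))
    if not dry_run:
        dst.mkdir(parents=True, exist_ok=True)
        for path, target in moves:
            shutil.move(str(path), str(target))
    return moves
```

Calling it with dry_run=True first only lists the planned moves, which gives a chance to check the source and destination directories before anything is touched. Both directories would need to be locations that the datanode scans via dfs.data.dir, as Alex notes.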