Will "hadoop fs -rm -rf" move everything to the /trash directory, or will it delete that as well?
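For the trash question, here is a rough Python model of the outcomes of a shell delete under the default trash policy, as I understand it for releases with trash enabled (fs.trash.interval > 0). It is an assumption-laden sketch, not Hadoop code. It assumes the home directory is /user/<user>, treats `trash_enabled` as a stand-in for the config setting, compares whole path components (the real check may differ in edge cases), and ignores checkpoint expiry. The cases it encodes are these. A normal delete moves the path under the user's .Trash/Current. Deleting something already in the trash, or passing -skipTrash, removes it for good. Deleting a directory that contains your own trash, such as / or /user/<user>, is refused unless -skipTrash is given. The recursive flag is also spelled differently by release (-rmr in 1.x, -rm -r later).

```python
import posixpath


def _inside(path: str, root: str) -> bool:
    """True if `path` equals `root` or lies underneath it (component-wise)."""
    return path == root or path.startswith(root.rstrip("/") + "/")


def rm_outcome(path: str, user: str, skip_trash: bool = False,
               trash_enabled: bool = True) -> str:
    """Simplified model of what a shell delete does to an absolute HDFS path.

    Assumptions (not Hadoop code): home directory is /user/<user>, and
    trash_enabled stands in for fs.trash.interval > 0.
    """
    if not path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    path = posixpath.normpath(path)
    home = posixpath.join("/user", user)
    trash_root = posixpath.join(home, ".Trash")

    if skip_trash or not trash_enabled:
        # No trash involved: the data is removed immediately.
        return "deleted permanently"
    if _inside(path, trash_root):
        # Deleting something that is already in the trash really deletes it.
        return "deleted permanently"
    if _inside(home, path):
        # The target contains the trash itself, so it cannot be moved into it.
        return "refused: path contains the trash"
    # Otherwise the full original path is recreated under .Trash/Current.
    return "moved to " + posixpath.join(trash_root, "Current") + path


if __name__ == "__main__":
    print(rm_outcome("/user/alice/logs", "alice"))
    print(rm_outcome("/", "alice"))
    print(rm_outcome("/", "alice", skip_trash=True))
```

In other words, the trash behaves like a short-term undo (entries are purged after fs.trash.interval minutes) rather than a backup, and anyone who adds -skipTrash bypasses it entirely.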
I was thinking along the lines of what you suggest: keep the original source of the data somewhere else and then reprocess it all in the event of a problem. What do other people do? Do you run another cluster? Do you back up specific parts of the cluster? Some form of offsite SAN?

On Tue, May 29, 2012 at 6:02 PM, Robert Evans <ev...@yahoo-inc.com> wrote:
> Yes, you will have redundancy, so no single point of hardware failure can
> wipe out your data, short of a major catastrophe. But you can still have
> an errant or malicious "hadoop fs -rm -rf" shut you down. If you still
> have the original source of your data somewhere else, you may be able to
> recover by reprocessing the data, but if this cluster is your single
> repository for all your data, you may have a problem.
>
> --Bobby Evans
>
> On 5/29/12 11:40 AM, "Michael Segel" <michael_se...@hotmail.com> wrote:
>
> Hi,
> That's not a backup strategy.
> You could still have joe luser take out a key file or directory. What do
> you do then?
>
> On May 29, 2012, at 11:19 AM, Darrell Taylor wrote:
>
> > Hi,
> >
> > We are about to build a 10-machine cluster with 40TB of storage.
> > Obviously, as this gets full, actually trying to create an offsite
> > backup becomes a problem unless we build another 10-machine cluster
> > (too expensive right now). Not sure if it will help, but we have
> > planned the cabinet into an upper and lower half with separate
> > redundant power, and we plan to put half of the cluster in the top and
> > half in the bottom, effectively 2 racks. So in theory we could lose
> > half the cluster and still have copies of all the blocks with a
> > replication factor of 3? Apart from the data centre burning down or
> > some other disaster that would render the machines totally
> > unrecoverable, is this approach good enough?
> >
> > I realise this is a very open question and everyone's circumstances are
> > different, but I'm wondering what other people's experiences/opinions
> > are for backing up cluster data?
> >
> > Thanks
> > Darrell.
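On the top/bottom split: HDFS only treats the two halves as separate racks if rack awareness is configured, meaning a topology script maps each datanode to a rack (topology.script.file.name in 1.x, net.topology.script.file.name in 2.x). Otherwise every node ends up in /default-rack. Below is a rough Python simulation, not HDFS code, that loosely follows the documented default placement for replication factor 3: one replica on one rack, the other two on different nodes of a second rack. It assumes the writer is outside the cluster, so the first replica goes to a random node. The node names dn00..dn09 and the racks /top and /bottom are hypothetical.

```python
import random


def place_replicas(nodes, rack_of, rng, rack_aware=True):
    """Pick 3 distinct nodes for one block, loosely following HDFS's default policy.

    Assumes the writer is outside the cluster, so the first replica is random.
    With rack_aware=False every node is treated as being in one rack.
    """
    first = rng.choice(nodes)
    if not rack_aware:
        others = [n for n in nodes if n != first]
        return [first] + rng.sample(others, 2)
    # Second replica: a node on a different rack from the first.
    remote = [n for n in nodes if rack_of[n] != rack_of[first]]
    second = rng.choice(remote)
    # Third replica: a different node on the same rack as the second.
    same_rack = [n for n in nodes if rack_of[n] == rack_of[second] and n != second]
    third = rng.choice(same_rack)
    return [first, second, third]


def lost_blocks(placements, rack_of, dead_rack):
    """Count blocks whose every replica sits in the failed half of the cabinet."""
    return sum(all(rack_of[n] == dead_rack for n in reps) for reps in placements)


# Hypothetical layout: 10 datanodes, five in the top half and five in the bottom.
nodes = [f"dn{i:02d}" for i in range(10)]
rack_of = {n: ("/top" if i < 5 else "/bottom") for i, n in enumerate(nodes)}

rng = random.Random(42)
aware = [place_replicas(nodes, rack_of, rng, rack_aware=True) for _ in range(1000)]
naive = [place_replicas(nodes, rack_of, rng, rack_aware=False) for _ in range(1000)]

for half in ("/top", "/bottom"):
    print(half, "fails:",
          lost_blocks(aware, rack_of, half), "blocks lost with rack awareness,",
          lost_blocks(naive, rack_of, half), "without")
```

With rack awareness no block keeps all three replicas in one half, so losing either half leaves a copy; without it, some blocks do. Either way, this only covers hardware loss. As Michael and Bobby point out, replication does nothing to protect against a mistaken delete.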