Will "hadoop fs -rm -rf" move everything to the /trash directory or
will it delete that as well?

I was thinking along the lines of what you suggest: keep the original
source of the data somewhere and then reprocess it all in the event of a
problem.

What do other people do?  Do you run another cluster?  Do you back up
specific parts of the cluster?  Some form of offsite SAN?
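
To sanity-check the two-half cabinet idea from my original message, here is
a rough Python sketch. It is only a simplified model of rack-aware placement
with a replication factor of 3 (one replica on one rack, the other two on
different nodes of the other rack), not HDFS code. The rack names, node
names and helper functions are made up for illustration, and it assumes rack
awareness is actually configured so the two halves are seen as separate
racks.

```python
import random

def place_block(racks, rng):
    """Simplified rack-aware placement for replication factor 3 (a model,
    not the real HDFS implementation): replica 1 on a random node,
    replicas 2 and 3 on two different nodes of another rack."""
    first_rack = rng.choice(sorted(racks))
    first = rng.choice(racks[first_rack])
    other_rack = rng.choice([r for r in sorted(racks) if r != first_rack])
    second, third = rng.sample(racks[other_rack], 2)
    return [first, second, third]

def survives_rack_loss(placements, racks, lost_rack):
    """True if every block keeps at least one replica outside lost_rack."""
    lost = set(racks[lost_rack])
    return all(any(node not in lost for node in replicas)
               for replicas in placements)

# Hypothetical layout: 10 machines split across the two cabinet halves.
racks = {
    "upper": [f"node{i}" for i in range(1, 6)],
    "lower": [f"node{i}" for i in range(6, 11)],
}

rng = random.Random(0)  # fixed seed so the run is repeatable
placements = [place_block(racks, rng) for _ in range(1000)]

for rack in racks:
    print(rack, survives_rack_loss(placements, racks, rack))
```

Under those assumptions every block has a replica in each half, so losing
one half still leaves a copy of everything. It says nothing about deletes,
which is the part I am still worried about.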

On Tue, May 29, 2012 at 6:02 PM, Robert Evans <ev...@yahoo-inc.com> wrote:

> Yes you will have redundancy, so no single point of hardware failure can
> wipe out your data, short of a major catastrophe.  But you can still have
> an errant or malicious "hadoop fs -rm -rf" shut you down.  If you still
> have the original source of your data somewhere else you may be able to
> recover, by reprocessing the data, but if this cluster is your single
> repository for all your data you may have a problem.
>
> --Bobby Evans
>
> On 5/29/12 11:40 AM, "Michael Segel" <michael_se...@hotmail.com> wrote:
>
> Hi,
> That's not a backup strategy.
> You could still have joe luser take out a key file or directory. What do
> you do then?
>
> On May 29, 2012, at 11:19 AM, Darrell Taylor wrote:
>
> > Hi,
> >
> > We are about to build a 10-machine cluster with 40TB of storage. Obviously,
> > as this gets full, actually trying to create an offsite backup becomes a
> > problem unless we build another 10-machine cluster (too expensive right
> > now).  Not sure if it will help, but we have planned the cabinet into an
> > upper and lower half with separate redundant power, then we plan to put
> > half of the cluster in the top, half in the bottom, effectively 2 racks, so
> > in theory we could lose half the cluster and still have copies of all
> > the blocks with a replication factor of 3?  Apart from the data centre
> > burning down or some other disaster that would render the machines totally
> > unrecoverable, is this approach good enough?
> >
> > I realise this is a very open question and everyone's circumstances are
> > different, but I'm wondering what other people's experiences/opinions are
> > for backing up cluster data?
> >
> > Thanks
> > Darrell.
>
>
>
