You can also distcp to AWS S3 (http://wiki.apache.org/hadoop/AmazonS3), which
you can do as frequently as you like. Even after the map/reduce job is done,
just ship the data over.
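A minimal sketch of that distcp-to-S3 step, assuming the s3n:// scheme from
the wiki page above; the NameNode address, bucket name, and paths are
placeholders, not real infrastructure:

```shell
# Copy a finished job's output from HDFS to S3. AWS credentials can be set
# in core-site.xml (fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey)
# rather than embedded in the URI.
hadoop distcp \
  hdfs://namenode:8020/user/data/job-output \
  s3n://my-backup-bucket/job-output
```

Since distcp runs as a map/reduce job itself, this can be scheduled (e.g. via
cron) as often as the data warrants.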

On Tue, Jan 3, 2012 at 4:31 PM, Mac Noland <mcdonaldnol...@yahoo.com> wrote:

>
>
> Thanks for the reply Alex.  To make sure I understand:
>
> 1) "park" the data by sending it over to a different cluster on a
> schedule (e.g. nightly, which is what we offer today on most things).
> 2) Then, from this secondary cluster, which is sitting idle after the
> distcp, do a copy-local to an NFS mount pointed at the SAN or NAS.
> 3) Then, with some type of coordination (so you're not copying local when
> the backup happens), have the SAN or NAS device snap the data for backup.
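The three steps above might be sketched roughly as follows; every hostname,
path, and the flag-file coordination mechanism are placeholders and
assumptions, not a tested recipe:

```shell
# Step 1: park the data on the backup cluster on a schedule (e.g. nightly).
hadoop distcp \
  hdfs://prod-namenode:8020/user/data \
  hdfs://backup-namenode:8020/user/data

# Step 2: from the otherwise-idle backup cluster, copy to an NFS mount
# backed by the SAN or NAS device.
hadoop fs -copyToLocal \
  hdfs://backup-namenode:8020/user/data /mnt/nfs/hdfs-backup

# Step 3: simple coordination, e.g. a flag file the snapshot job waits on,
# so the SAN/NAS snap never runs while the copy is still in flight.
touch /mnt/nfs/hdfs-backup/.copy-complete
```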
>
> A simple restore process would then be to allow users read access to the
> NFS-mounted storage so they can pick and choose what they want to recover
> via the SAN or NAS's snapshot feature - or, on our older systems, after a
> "restore" to the local file system has been completed by the support folks.
>
>
> Is that about right?
>
> Mac
>
>
>
> ________________________________
> From: alo alt <wget.n...@googlemail.com>
> To: "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>; Mac
> Noland <mcdonaldnol...@yahoo.com>
> Sent: Tuesday, January 3, 2012 3:10 PM
> Subject: Re: Hadoop HDFS Backup/Restore Solutions
>
>
> Hi Mac,
>
> HDFS currently has no complete, formalized backup and restore process in
> the sense of ITIL or ISO 9000. One strategy could be to "park" the HDFS
> data you want to back up: copy it with "distcp" to a separate backup
> cluster, then snapshot it from there with the SAN's mechanisms (or write
> it out to tape). For the snapshot approach, the DataNode storage has to be
> located on the SAN box.
>
> - Alex
>
> On Tuesday, January 3, 2012, Mac Noland <mcdonaldnol...@yahoo.com> wrote:
> > Good day,
> >
> > I’m guessing this question has been asked a myriad of times, but we’re
> > about to get serious with some of our Hadoop implementations, so I wanted
> > to re-ask to see if I’m missing anything, or if others happen to know
> > whether this might be on a future road map.
> >
> > For our current storage offerings (e.g. NAS or SAN), we give businesses
> > the opportunity to choose 7-, 14-, or 45-day “backups” for their storage.
> > The purpose of the backup isn’t so much that they’re worried about losing
> > their current data (we’re RAID’ed and have some data mirrored to remote
> > datacenters), but rather that if they delete some data today, they can
> > recover it from yesterday’s backup, or the day before’s, or the day
> > before that, etc. And to be honest, business units buy a good portion of
> > their backups to make people feel better and to fulfill customer
> > contracts.
> >
> >
> > So far with HDFS we haven’t found many formalized offerings for this
> > specific feature. While I haven’t done a ton of research, the best
> > solution I’ve found is an idea where we’d schedule a job to pull the data
> > locally to a mount that is backed up via our traditional methods. See
> > Michael Segel’s first post on this site:
> > http://lucene.472066.n3.nabble.com/Backing-up-HDFS-td1019184.html
> >
> > Though we’d have to work through the details of what this would look
> > like for our support folks, it looks like something that could
> > potentially fit into our current model. We’d basically need to allocate
> > the same amount of SAN or NAS disk as we have for HDFS, then coordinate a
> > snap on the SAN or NAS via our traditional methods. I’m not sure what a
> > restore would look like, other than that we could give the end users read
> > access to the NAS or SAN mounts so they can pick through what they need
> > to recover and let them figure out how to get it back into HDFS.
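For the "let users get it back into HDFS themselves" part, the mechanics
could be as simple as the following; the mount point, file names, and
NameNode address here are hypothetical:

```shell
# A user browses the read-only NFS mount of the SAN/NAS snapshot, picks the
# file they deleted, and pushes it back into the production cluster.
hadoop fs -put \
  /mnt/nfs/hdfs-backup/user/data/part-00042 \
  hdfs://prod-namenode:8020/user/data/part-00042
```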
> >
> > For use cases like ours, where we’d need multi-day backups to fulfill
> > business needs, is this roughly what people are thinking or doing?
> > Moreover, is there anything on the Hadoop HDFS road map for providing,
> > for lack of a better word, an “enterprise” backup/restore solution?
> >
> > Thanks in advance,
> >
> > Mac Noland – Thomson Reuters
> >
>
> --
>
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> Think of the environment: please don't print this email unless you
> really need to.
>



-- 

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop <http://twitter.com/#!/allthingshadoop>
*/
