Yes, that's what I've done to fit ITIL. You can also export the data directory you backed up over Samba or NFS, so people can restore their files more easily (FUSE HDFS). For SMB I wrote an article on my blog.
Copying to another cluster has the advantage of fast restores of lost files as the first step of your backup concept.

- Alex

sent via my mobile device

On Jan 3, 2012, at 1:31 PM, Mac Noland <mcdonaldnol...@yahoo.com> wrote:

> Thanks for the reply Alex. To make sure I understand:
>
> 1) "park" the data by sending it over to a different cluster on a schedule (e.g. nightly is what we offer today on most things).
> 2) then from this secondary cluster, which is sitting idle after the distcp, do a copy local to an NFS mount pointed at SAN or NAS.
> 3) Then with some type of coordination (so you're not copying local when the backup happens), have the SAN or NAS device snap the data for backup.
>
> A simple restore process would then be to allow users read access to the NFS-mounted storage so they can pick and choose what they want to recover via the SAN or NAS's snapshot feature - or after a "restore" to the local file system is completed by the support folks if they are using one of our older systems.
>
> Is that about right?
>
> Mac
>
> ________________________________
> From: alo alt <wget.n...@googlemail.com>
> To: "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>; Mac Noland <mcdonaldnol...@yahoo.com>
> Sent: Tuesday, January 3, 2012 3:10 PM
> Subject: Re: Hadoop HDFS Backup/Restore Solutions
>
> Hi Mac,
>
> HDFS currently has no solution for a complete backup and restore process like ITIL or ISO9000. A strategy could be to "park" the HDFS data that you want to back up to tape on another backup cluster with "distcp", and to snapshot it from there with the SAN mechanism. In this case the DN store has to be located on the SAN box.
>
> - Alex
>
> On Tuesday, January 3, 2012, Mac Noland <mcdonaldnol...@yahoo.com> wrote:
>> Good day,
>>
>> I’m guessing this question has been asked a myriad of times, but we’re about to get serious with some of our Hadoop implementations so I wanted to re-ask to see if I’m missing anything, or if others happen to know if this might be on a future road map.
>>
>> For our current storage offerings (e.g. NAS or SAN), we give businesses the opportunity to choose 7, 14, or 45 day “backups” for their storage. The purpose of the backup isn’t so much that they are worried about losing their current data (we’re RAID’ed and have some stuff mirrored to remote datacenters), but more so that if they were to delete some data today, they can recover from yesterday’s backup. Or the day before’s backup, or the day before that, etc. And to be honest, business units buy a good portion of their backups to make people feel better and fulfill custom contracts.
>>
>> So far with HDFS we haven’t found too many formalized offerings for this specific feature. While I haven’t done a ton of research, the best solution I’ve found is an idea where we’d schedule a job to pull the data locally to a mount that is backed up via our traditional methods. See Michael Segel’s first post on this site http://lucene.472066.n3.nabble.com/Backing-up-HDFS-td1019184.html
>>
>> Though we’d have to work through the details of what this would look like for our support folks, it looks like something that could potentially fit into our current model. We’d basically need to allocate the same amount of SAN or NAS disk as we have for HDFS, then coordinate a snap on the SAN or NAS via our traditional methods. Not sure what a restore would look like, other than we could give the end users read access to the NAS or SAN mounts so they can pick through what they need to recover and let them figure out how to get it back into HDFS.
>>
>> For use cases like ours where we’d need multi-day backups to fulfill business needs, is this kind of what people are thinking or doing? Moreover, are there any things in the Hadoop HDFS road map for providing, for lack of a better word, an “enterprise” backup/restore solution?
>>
>> Thanks in advance,
>>
>> Mac Noland – Thomson Reuters
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
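
The first two steps discussed in this thread (distcp to a second cluster, then a copy to an NFS mount) might be scripted roughly as below. This is only a sketch under assumptions: the NameNode URIs, the HDFS path and the mount point are made-up placeholders, the dated target directory is one possible layout rather than anything the thread prescribes, and the SAN/NAS snapshot in step 3 is vendor specific, so it is left out. The function only builds the command lines and does not run anything.

```python
import shlex
from datetime import date


def build_backup_commands(src_uri, backup_uri, hdfs_path, nfs_mount, day=None):
    """Build the Hadoop CLI calls for steps 1 and 2 as argument lists.

    All arguments are hypothetical examples supplied by the caller;
    nothing is executed here.
    """
    day = day or date.today()
    # One dated directory per run on the NFS mount (an assumed layout).
    local_target = f"{nfs_mount.rstrip('/')}/{day.isoformat()}"

    # Step 1: "park" the data on the backup cluster.
    distcp = ["hadoop", "distcp",
              f"{src_uri}{hdfs_path}", f"{backup_uri}{hdfs_path}"]

    # Step 2: copy from the backup cluster to the NFS-mounted SAN/NAS.
    copy_local = ["hadoop", "fs", "-copyToLocal",
                  f"{backup_uri}{hdfs_path}", local_target]

    return [distcp, copy_local]


if __name__ == "__main__":
    # Placeholder host names and paths, for illustration only.
    for cmd in build_backup_commands("hdfs://prod-nn:8020",
                                     "hdfs://backup-nn:8020",
                                     "/data/projects",
                                     "/mnt/backup"):
        print(shlex.join(cmd))
```

A nightly scheduler job could run the two printed commands in order and only then trigger the storage snapshot, which would give the coordination mentioned in step 3.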