I'm the new guy on the sysadmin team at the company. Of course I get anointed to be King of Backups. Nobody else wanted to do it, and I'm a sucker for always being willing to do the dirty and un-fun work. I've never worked with AFS before, let alone configured a backup strategy for it, so this is a case of the blind leading the blind.

I just got in a copy of Richard Campbell's "Managing AFS" book because it had a chapter on backups. The backup tool I've been reading about reminds me a lot of bacula. But bacula, like most UNIX/Linux backup tools, keeps a database of what gets backed up by hostname, file system, date stamp and such. That's fine for backing up traditional machines, but not necessarily sufficient for our situation.
Like I said, I'm a total NOOB with regards to AFS. From what I understand, we have something a bit over 200TB spread across a number of physical machines in 3 geographically distant data centers. Furthermore, this volume of data will continue to grow indefinitely. Chances are we'll have to keep at least 3 to 5 years of data on-line, and unlike e-mails at the White House, everything will have to be permanently archived and retrievable.

I get the impression that AFS is this amorphous cloud of data storage, so when you back up stuff, it's not as if it's organized by machine and file system. With this much data, it's not as if you can do a full dump on the 1st of the month, do daily incrementals for the rest of the month, and then start over again the next month. And that's where the backup paradigm sort of breaks down for me: effectively you have to keep everything forever, but you really only want to back up a given file once.

Does anyone Out There have a similar problem, and if so, what strategy did you use?

Thanks!
Russ
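P.S. Just so it's concrete, here's roughly how I picture the traditional full-plus-incremental cycle if I had to script it per volume. This is only a sketch of my current understanding, not something we run: the volume names and dump directory are made up, the real volume list would come from something like vos listvldb, and I'm assuming "vos dump -time 0" does a full dump while "-time mm/dd/yyyy" dumps only what changed since that date.

    #!/usr/bin/env python
    # Sketch: full dump on the 1st of the month, incrementals the rest
    # of the month, one dump file per AFS volume per day.
    import subprocess
    from datetime import date

    DUMP_DIR = "/backups/afs"                 # hypothetical staging area
    VOLUMES = ["user.rsmith", "proj.data"]    # would really come from vos listvldb

    def dump_volume(volume, today):
        if today.day == 1:
            since = "0"        # assuming -time 0 means a full dump
            kind = "full"
        else:
            # incremental: only changes since the full on the 1st
            first = today.replace(day=1)
            since = first.strftime("%m/%d/%Y")
            kind = "incr"
        outfile = "%s/%s.%s.%s.dump" % (DUMP_DIR, volume, today.isoformat(), kind)
        subprocess.check_call([
            "vos", "dump",
            "-id", volume,
            "-time", since,
            "-file", outfile,
        ])

    if __name__ == "__main__":
        for vol in VOLUMES:
            dump_volume(vol, date.today())

Even if that's roughly right, it's the "keep everything forever" part that I don't see fitting into this kind of monthly cycle, which is why I'm asking.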