Clinton, I'm not sure which comments are Adam's and which are yours. But in summary, am I right in reading that you see HCoop continuing to use rsync.net as our offsite backup storage, and that you think obnam holds great promise as the management system for those backups?
Thanks,
-- Jesse Shumway <layline AT hcoop.net>

On Sep 7, 2012, at 5:00 PM, Clinton Ebadi wrote:

> Adam Chlipala <[email protected]> writes:
>
>> On 09/03/2012 12:23 PM, Steve Killen wrote:
>>
>> So we currently do backups with rsync.net for ~$60/mo. I just ran across
>> Amazon Glacier:
>>
>> http://aws.amazon.com/glacier/
>>
>> It's $0.01/GB a month.
>>
>> I'm just spitballing to get the conversation started, but off the
>> cuff it seems worth looking into to reduce our backup costs -- how
>> much data are we maintaining with rsync?
>
> Transfer costs additional money, reading files before N days costs
> additional money, deleting files before N days incurs a cost for those N
> days, and you have to wait 2-3 hours for data.
>
> Basically, it's not really useful for the sort of backups we're
> making. We keep mostly ephemeral backups (I'd like to keep more, but the
> current backup scripts suck) that, if ever needed, need to be accessed
> more or less at will.
>
> Additionally, rsync.net supports Free Software development (we get a
> discount, and so do any open source developers who ask for one) *and*
> uses standard Free technologies, so we're not beholden to them. Amazon,
> OTOH, pushes DRM and proprietary web APIs and is really unfriendly
> toward Free Software.
>
> It's all a moot point because the off-the-shelf backup solution we're
> transitioning to requires SFTP, and Amazon doesn't offer that:
>
>> I don't even know if a working, reasonable back-up regime is in place
>> at this point. It wouldn't surprise me if that slipped by the wayside
>> during various upgrades.
>>
>> A regular process for testing the integrity of back-up data would be
>> great; I don't think we ever had one.
>
> Amazingly, we do have a vaguely working backup regime. AFS volumes and
> databases are well backed up, and in theory deleuze gets backed up. The
> other machines... not so lucky.
> It's also pretty terrible in that it does a complete volume dump every
> single run, so it takes nearly 72 hours and is responsible for about
> 80% of HCoop's data use (putting us dangerously close to 5 Mbit/s).
>
> The justifications for doing full dumps vaguely made sense when they
> were first implemented (we basically need to encrypt them), but ...
> it's still untenable.
>
> Luckily, obnam <http://liw.fi/obnam/> exists now and can give us
> incremental and secure backups. I'm experimenting with it locally
> using my laptops + workstation (I need to back my laptop up to RAID1ed
> storage anyway) and expect to get it into production at HCoop once I
> finish getting this new Apache machine up. The general idea of the new
> backup regime:
>
> - Each machine has its own repository that a daily cron job pushes to
> - Repository for database dumps (+ daily cron)
> - Repository for AFS backup dumps (+ daily cron)
>   - Unfortunately, to preserve AFS attributes we have to do a local
>     `vos dump' of the (near-zero-disk-space) backup volumes. You win
>     some, you lose some.
>
> Then let obnam handle the rest, initially keeping ~30 days of backups
> and seeing how much space that uses. Thankfully obnam does the hard
> parts; all I really need to do is manage the repository keyring and
> set up a few cron jobs and we're good to go.
>
> Verification that backups (aside from AFS volume dumps, which are easy)
> actually work is a bit more challenging... but now that we're moving to
> having virtualization servers with the real stuff running inside VMs,
> it will at least be possible to do a disaster-recovery test without
> affecting other operations.

_______________________________________________
HCoop-Discuss mailing list
[email protected]
https://lists.hcoop.net/listinfo/hcoop-discuss
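For concreteness, the Glacier pricing discussed in the thread works out as follows. This is a rough sketch of the storage math only; the transfer, early-retrieval, and early-deletion fees Clinton mentions are deliberately ignored, and the rsync.net figure is the approximate ~$60/mo from the thread, not a quoted rate.

```python
# Rough break-even arithmetic for the storage costs discussed above.
# Illustrative only: Glacier also bills for transfer, early retrieval,
# and early deletion, none of which this sketch accounts for.

RSYNC_NET_MONTHLY = 60.00   # ~$60/mo for the current rsync.net plan
GLACIER_PER_GB = 0.01       # $0.01 per GB-month of Glacier storage

def glacier_monthly(gb):
    """Glacier storage-only cost, in dollars, for one month."""
    return gb * GLACIER_PER_GB

# Data size at which Glacier's storage fee alone matches rsync.net:
break_even_gb = round(RSYNC_NET_MONTHLY / GLACIER_PER_GB)
print(break_even_gb)  # 6000 -- i.e. ~6 TB stored before storage alone reaches $60/mo
```

So on raw storage price alone Glacier looks attractive well below the terabyte range, which is why the real objections in the thread are the access-pattern fees and the lack of SFTP, not the per-GB rate.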
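The per-machine daily cron job Clinton sketches might look roughly like this. This is a hypothetical sketch, not HCoop's actual configuration: the repository URL, GPG key id, and backed-up paths are placeholders, and it uses obnam's standard `backup`/`forget` subcommands with its `--repository`, `--encrypt-with`, and `--keep` options.

```shell
#!/bin/sh
# Hypothetical daily backup job for one machine -- repository URL,
# key id, and paths are made-up placeholders for illustration.

REPO="sftp://[email protected]/this-machine"  # one repository per machine
KEYID="DEADBEEF"                                  # entry in the repository keyring

# Push an incremental, encrypted generation to the SFTP repository.
obnam backup --repository "$REPO" --encrypt-with "$KEYID" /etc /home /var

# Keep roughly 30 days of daily generations, per the plan above.
obnam forget --repository "$REPO" --encrypt-with "$KEYID" --keep 30d
```

The database and AFS repositories would get similar jobs, with the AFS one first writing local `vos dump' output from the backup volumes and then pointing obnam at those dump files.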
