Adam Chlipala <[email protected]> writes:

> On 09/03/2012 12:23 PM, Steve Killen wrote:
>> So we currently do backups with rsync.net for ~$60/mo. I just ran
>> across Amazon Glacier:
>>
>> http://aws.amazon.com/glacier/
>>
>> It's $0.01/GB a month.
>>
>> I'm just spitballing to get the conversation started, but off the
>> cuff it seems worth looking into to reduce our backup costs--how
>> much data are we maintaining with rsync?
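For scale, here's the back-of-the-envelope storage-only comparison, using just the two prices quoted above (and deliberately ignoring Glacier's transfer, retrieval, and early-deletion fees, which change the picture a lot):

```python
# Storage-only cost comparison using the figures from the thread:
# rsync.net ~$60/mo flat, Glacier $0.01 per GB-month. Retrieval,
# transfer, and early-deletion fees are deliberately ignored here.

RSYNC_NET_MONTHLY_USD = 60.00
GLACIER_USD_PER_GB_MONTH = 0.01

# Storage volume at which Glacier's storage-only cost matches rsync.net:
break_even_gb = RSYNC_NET_MONTHLY_USD / GLACIER_USD_PER_GB_MONTH
print(f"Glacier is cheaper (on storage alone) below {break_even_gb:.0f} GB")
# prints: Glacier is cheaper (on storage alone) below 6000 GB
```

So on storage price alone Glacier wins for anything under ~6 TB; the catch is everything that isn't storage price.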
Transfer costs additional money, reading files before N days costs additional money, deleting files before N days incurs a cost for those N days, and you have to wait 2-3 hours for your data. Basically, it's not really useful for the sort of backups we're making. We keep mostly ephemeral backups (I'd like to keep more, but the current backup scripts suck) that, if ever needed, need to be accessible more or less at will.

Additionally, rsync.net supports Free Software development (we get a discount, and so do any open source developers who ask for one) *and* uses standard Free technologies, so we're not beholden to them. Amazon, OTOH, pushes DRM and proprietary web APIs and is really unfriendly toward Free Software.

It's all a moot point anyway: the off-the-shelf backup solution we're transitioning to requires sftp, and Amazon doesn't offer that.

> I don't even know if a working, reasonable back-up regime is in place
> at this point. It wouldn't surprise me if that slipped by the wayside
> during various upgrades.
>
> A regular process for testing the integrity of back-up data would be
> great; I don't think we ever had one.

Amazingly, we do have a vaguely working backup regime. AFS volumes and databases are well backed up, and in theory deleuze gets backed up. The other machines... not so lucky.

It's also pretty terrible in that it does a complete volume dump every single run, so it takes nearly 72 hours and is responsible for about 80% of HCoop's data use (putting us dangerously close to 5Mbit/s). The justifications for doing full dumps vaguely made sense when they were first implemented (we basically need to encrypt them), but it's still untenable.

Luckily, obnam <http://liw.fi/obnam/> exists now and can give us incremental and secure backups. I'm experimenting with it locally using my laptops + workstation (I need to back my laptop up to RAID1ed storage anyway) and expect to get it into production at HCoop once I finish getting this new Apache machine up.
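Roughly, the per-machine daily cron job could look something like this. To be clear, the repository location, GPG key ID, paths, and retention policy below are placeholders I made up for illustration, not a finished configuration:

```shell
#!/bin/sh
# Hypothetical daily obnam wrapper for one machine. The sftp URL,
# key ID, and directory list are placeholders, not HCoop's real setup.
set -e

REPO="sftp://[email protected]/~/repos/$(hostname)"

# Incremental, GPG-encrypted backup of the important local trees.
obnam backup --repository "$REPO" --encrypt-with DEADBEEF /etc /home /var/lib

# Prune old generations; start by keeping ~30 daily backups and
# see how much space that uses.
obnam forget --repository "$REPO" --keep 30d
```

Since obnam deduplicates and only uploads changed chunks, the daily run should be a tiny fraction of the current full-dump traffic.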
The general idea of the new backup regime:

- Each machine has its own repository that a daily cron job pushes to
- A repository for database dumps (+ daily cron)
- A repository for AFS backup dumps (+ daily cron)
  - Unfortunately, to preserve AFS attributes we have to do a local
    `vos dump' of the (near zero disk space using) backup volumes.
    You win some, you lose some.

Then let obnam handle the rest, initially keeping ~30 days of backups and seeing how much space that uses. Thankfully obnam does the hard parts; all I really need to do is manage the repository keyring and set up a few cron jobs, and we're good to go.

Verification that backups actually work (aside from AFS volume dumps, which are easy) is a bit more challenging... but now that we're moving to having virtualization servers with the real stuff going on inside VMs, it will at least be possible to do a disaster-recovery test without affecting other operations.
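The AFS `vos dump' step above might be sketched like this. The volume names, prefix, and dump directory are hypothetical; the actual volume list would come from whatever drives the existing dumps:

```shell
#!/bin/sh
# Hypothetical sketch of the AFS half of the regime: refresh the
# copy-on-write .backup clones, then dump each one to a local file
# (preserving AFS attributes) for obnam to pick up. Volume names,
# the prefix, and paths are placeholders.
set -e

DUMPDIR=/var/backups/afs

# Refresh the (nearly free) .backup clones of the user volumes.
vos backupsys -prefix user -localauth

# Full local dump of each backup volume; obnam then handles
# incrementality and encryption on top of these files.
for vol in user.alice user.bob; do
    vos dump -id "$vol.backup" -file "$DUMPDIR/$vol.dump" -localauth
done
```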
_______________________________________________
HCoop-Discuss mailing list
[email protected]
https://lists.hcoop.net/listinfo/hcoop-discuss
