On Mon, Dec 14, 2009 at 02:08:31PM -0500, Jeffrey J. Kosowsky wrote:
> Robin Lee Powell wrote at about 10:10:17 -0800 on Monday, December 14, 2009:
> > Do you actually see a *problem* with it, or are you just
> > assuming it won't work because it seems too easy?
>
> The problem I see is that backuppc won't be able to back up hard
> links on any interrupted or sub-divided backup unless you are
> careful to make sure that no hard links span multiple restarts.
> And once you mess up hard links for a file, all subsequent
> incrementals will be unlinked too.
>
> If you are just using BackupPC to back up data then that might not
> be important. On the other hand, if you are using backuppc to
> back up entire systems with the goal of having (close to a) bare
> metal restore, then this method won't work.
Agreed on both counts; I'm only interested in backing up data.
Obviously such a system would have to be optional.

> Personally, I haven't seen a major memory sink using rsync 3.0+.
> Perhaps you could provide some real-world data on the potential
> savings so that people can understand the tradeoffs.
>
> That being said, memory is pretty cheap, while reliable backups
> are hard.

I'm *far* more worried about reliability than RAM usage; the memory
savings were just a side effect. I'm routinely losing 10+ hour backups
to SIGPIPE, to rsync dying on the remote end, and so on; *that* is what
the idea was designed to fix. The whole point is to, optionally, make
rsync more reliable at the expense of losing hardlink support and,
tangentially, save some RAM.

> As an aside, if anything, myself and others have been pushing to
> get more reliable backup of filesystem details such as extended
> attributes, ACLs, ntfs stuff etc., and removing the ability to
> back up hard links would be a step backwards from that perspective.

Understood.

> Finally, the problem with interrupted backups that I see mentioned
> most on this group is the interruption of large transfers that
> have to be restarted and then retransferred over a slow link.
> Rsync itself is pretty fast when it just has to check file
> attributes to determine what needs to be backed up.

Not with large trees it isn't. I have 3.5 million files, and more
than 300 GiB of data, in one file system. The last incremental took
*twenty-one hours*. I have another backup that's 4.5 million files,
also more than 300 GiB of data, also in one file system. The full
took 20 hours; it hasn't succeeded at an incremental yet.

That's over full 100BaseT, if not better (I'm not the networking
person). Asking rsync, and ssh, and a pair of firewalls and load
balancers (it's complicated) to stay up for almost a full day is
really asking a whole hell of a lot. For data sets this large, rsync
simply isn't robust enough by itself.
Losing 15 hours' worth of (BackupPC's) work because the ssh connection
goes down is *really* frustrating.

In both cases, the client-side rsync uses more than 300 MiB of RAM,
with --hard-links *removed* from the rsync option list. Not
devastating, but not trivial either.

> So, I think the best way for improvement that would be consistent
> with BackupPC design would be to store partial file transfers so
> that they could be resumed on interruption. Also, people have
> suggested tweaks to the algorithm for storing partial backups.

Partial file transfers won't help in the slightest: the cost is the
time it takes to walk the file tree, and avoiding a re-walk of the
tree on resumption is exactly what my idea was designed for. Having
said that, if incrementals could be resumed instead of just thrown
away, that would at least be marginally less frustrating when a minor
network glitch loses a 15+ hour transfer.

In the incremental I mentioned above, rsync's reported transfer rate
is 0.08 MB/sec. Over 100BaseT. Seriously: the problem is that walking
file trees of that size, while they are actively serving production
traffic, takes a *really* long time. I don't see any way to avoid that
besides keeping track of where you've been.

-Robin

-- 
They say: "The first AIs will be built by the military as weapons."
And I'm thinking: "Does it even occur to you to try for something
other than the default outcome?" See http://shrunklink.com/cdiz
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/

_______________________________________________
BackupPC-users mailing list
[email protected]
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
