I've heard that there's a new rsyncd client in the works, which might alleviate the frequent and severe problems with rsync's high memory consumption on clients with many files. The current "solution" seems to be to split the backup into numerous rsync sets, each consuming less memory.
I've got another suggestion for dealing with the problem, one that would work with the stock rsync client (and could potentially apply to tar and samba backups as well). The idea is to dynamically build "include" and "exclude" lists to pass to rsync, and to make multiple (serial) calls to rsync to back up all the files.

In detail: before doing the backup, build a data structure representing all the directories in the filesystem and the number of files per directory. In building the data structure, any directories specified in the configuration file "exclude" list would be skipped.

Then apply an algorithm to that data, taking into account:

  - the amount of memory on the server
  - the amount of free memory on the server
  - the number of directories
  - the number of files

to roughly split the directories into sets, sized according to the amount of memory on the server.

The algorithm should be weighted to group directories as high "up" in the tree as possible. For example, it's better to back up all of "/var" than to combine "/var/spool/cron" and "/usr/local/src" in one set and the remainder of "/var" in another, even if doing all of "/var" means slightly more files (and more memory usage) than the alternative. In addition, the algorithm could give a small preference to combining directory trees from different physical devices, in order to improve performance by reducing I/O wait times.

The algorithm for determining the ideal number of files per backup (and, by implication, which directories get grouped together) doesn't need to be very sophisticated; a rough sketch is at the end of this message. There's no need to turn this into a true "knapsack problem" and chase optimal backup sets, as long as there's a real improvement over backing up all files in a single rsync session. I think that adding fairly simple logging to BackupPC (recording available memory before a backup begins, the number of files backed up, and the time the backup takes, possibly scaled for the speed of the network interface) would generate enough data, across the diverse configurations of clients where BackupPC is installed, to derive some very good empirical constants for the algorithm.

The time saved by doing smaller backups, which also put less of a load on both the backup client and the server, should be far greater than the time needed to gather the filesystem data and build the individual rsync commands (excludes and includes). And since BackupPC already does an excellent job of "filling" individual backups into a single "virtual full" backup that can easily be browsed for restores, it shouldn't matter to users that the backup sets are dynamically generated, as long as the user-specified includes and excludes are obeyed.

This scheme offers several advantages:

  - It's dynamic: it automatically adjusts for changes in filesystem layout or the number of files, for the amount of physical memory, and even for the load (free memory) on the backup client and server.
  - It's maintenance-free for users. There's no need to create multiple rsync "targets", make sure they're identically named on the client and server, and try to balance backup sizes per target.
  - It works with existing rsync implementations, so any issues with the future rsync daemon that's supposed to be embedded in BackupPC are avoided. And the same dynamic backup-set partitioning could be applied to the embedded rsync daemon once it's ready.

I'm very interested in hearing your feedback about this proposal.
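To make the partitioning idea concrete, here is the rough Python sketch mentioned above. Everything in it is a placeholder: the per-file memory constant, the memory budget, and the function names are assumptions for illustration only, and BackupPC itself is Perl, so treat this as pseudocode rather than an implementation. The second stage (turning each set into the actual --include/--exclude arguments for a serial rsync run, and the weighting toward different physical devices) is left out, but would work from the same per-directory data.

    import os
    from collections import defaultdict

    # Hypothetical constants: real values would come from the logging proposed
    # above, not from these guesses.
    BYTES_PER_FILE = 300            # assumed rsync memory cost per file-list entry
    MEMORY_BUDGET = 256 * 2**20     # assumed memory one rsync run is allowed to use


    def scan(root, excludes):
        """Walk the filesystem once, honouring the configured exclude list, and
        record the file count of every directory subtree plus each directory's
        immediate subdirectories."""
        totals, children = defaultdict(int), defaultdict(list)
        for dirpath, dirnames, filenames in os.walk(root):
            # skip anything in the user's exclude list
            dirnames[:] = [d for d in dirnames
                           if os.path.join(dirpath, d) not in excludes]
            children[dirpath] = [os.path.join(dirpath, d) for d in dirnames]
            count = len(filenames)
            path = dirpath
            while True:                     # charge these files to every ancestor
                totals[path] += count
                if path == root:
                    break
                path = os.path.dirname(path)
        return totals, children


    def partition(root, totals, children, max_files):
        """Split the tree into backup sets that each fit the per-set file budget,
        keeping whole subtrees together as high up the tree as possible.
        Each set is (path, recursive); recursive=False means "this directory's
        own files only", used when a directory had to be split further."""
        sets, queue = [], [root]
        while queue:
            d = queue.pop()
            if totals[d] <= max_files or not children[d]:
                sets.append((d, True))      # whole subtree fits in one rsync run
            else:
                sets.append((d, False))     # still pick up files directly in d
                queue.extend(children[d])   # reconsider each subdirectory on its own
        return sets


    if __name__ == "__main__":
        root = "/var"                       # the share being backed up
        excludes = set()                    # would come from the per-host config
        totals, children = scan(root, excludes)
        max_files = MEMORY_BUDGET // BYTES_PER_FILE
        for i, (path, recursive) in enumerate(
                partition(root, totals, children, max_files), 1):
            own = totals[path] - sum(totals[c] for c in children[path])
            files = totals[path] if recursive else own
            note = "" if recursive else " (own files only; subdirectories split out)"
            print("set %d: %s%s, ~%d files" % (i, path, note, files))

The partitioning here is deliberately crude: a subtree either fits within the per-set file budget or gets split one level down. That matches the point above that a true knapsack-style optimum isn't needed, just something clearly better than one giant rsync session.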
Mark

----
Mark Bergman
[EMAIL PROTECTED]
Seeking a Unix or Linux sysadmin position local to Philadelphia or via telecommuting
http://wwwkeys.pgp.net:11371/pks/lookup?op=get&search=bergman%40merctech.com