Hello all,

Please consider this patch for inclusion in the next major rdiff-backup release. This is my first patch submission to rdiff-backup, so please offer constructive comments if this patch needs adjusting.
Many people have been interested in sparse file support, and so have I, so here's the patch!

 + Blocks of "Globals.blocksize" length that consist entirely of \x00 bytes are made sparse automatically. (Globals.blocksize is currently 128k.)
 + This feature has been requested a few times, and I have added documentation to the SparseFiles wiki page:
   http://wiki.rdiff-backup.org/wiki/index.php/SparseFiles
 + Any filesystem that can f.seek() beyond EOF and f.write() to generate sparse files is supported.
 + This works for both local-copy and remote backups.
 + Works in conjunction with BlockFuse to back up sparse LVM snapshots:
   http://www.globallinuxsecurity.pro/blog.php?q=rdiff-backup-lvm-snapshot
 + I am using this patch in production with ~600GB sparse LVM snapshots.
 + Backups of sparse files can be 2x faster, since the filesystem returns a zero-filled buffer on f.read() rather than hitting the disk and causing unnecessary IO.

Feedback and comments are appreciated!

Cheers,

-- 
Eric Wheeler
President
eWheeler, Inc. dba Global Linux Security
www.GlobalLinuxSecurity.pro
503-330-4277
PO Box 14707
Portland, OR 97293
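For anyone unfamiliar with the mechanism the patch relies on, here is a minimal standalone sketch of the seek-past-EOF-then-write trick. It is not part of the patch; the helper name make_sparse_file and the file name hole.img are made up for illustration, and whether the skipped range really stays unallocated depends on the filesystem.

```python
import os
import tempfile


def make_sparse_file(path, hole_size, tail=b"\x00"):
    """Create a file whose first hole_size bytes are skipped, never written.

    Hypothetical helper for illustration only; not part of rdiff-backup.
    """
    with open(path, "wb") as f:
        # Move the file position past EOF without writing anything.
        f.seek(hole_size, os.SEEK_SET)
        # The write is what actually extends the file to its final size.
        f.write(tail)
    return os.stat(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        st = make_sparse_file(os.path.join(d, "hole.img"), 1024 * 1024)
        print("apparent size:", st.st_size)
        # st_blocks is only available on POSIX-like systems (512-byte units).
        if hasattr(st, "st_blocks"):
            print("allocated bytes:", st.st_blocks * 512)
```

On a filesystem with sparse file support, the allocated size reported should be much smaller than the apparent size; on one without it, the file is simply zero-filled and the copy is still correct.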
Index: rpath.py
===================================================================
RCS file: /sources/rdiff-backup/rdiff-backup/rdiff_backup/rpath.py,v
retrieving revision 1.142
diff -u -r1.142 rpath.py
--- rpath.py	23 Jun 2009 23:56:30 -0000	1.142
+++ rpath.py	3 Jan 2011 03:27:04 -0000
@@ -58,10 +58,44 @@
 def copyfileobj(inputfp, outputfp):
 	"""Copies file inputfp to outputfp in blocksize intervals"""
 	blocksize = Globals.blocksize
+
+	sparse = False
+	buf = None
 	while 1:
 		inbuf = inputfp.read(blocksize)
 		if not inbuf: break
-		outputfp.write(inbuf)
+
+		if not buf:
+			buf = inbuf
+		else:
+			buf += inbuf
+
+		# Combine "short" reads
+		if (len(buf) < blocksize):
+			continue
+
+		buflen = len(buf)
+		if buf == "\x00" * buflen:
+			outputfp.seek(buflen, os.SEEK_CUR)
+			buf = None
+			# flag sparse=True, that we seek()ed, but have not written yet
+			# The filesize is wrong until we write
+			sparse = True
+		else:
+			outputfp.write(buf)
+			buf = None
+
+			# We wrote, so clear sparse.
+			sparse = False
+
+
+	if buf:
+		outputfp.write(buf)
+		buf = None
+
+	elif sparse:
+		outputfp.seek(-1, os.SEEK_CUR)
+		outputfp.write("\x00")
 
 def cmpfileobj(fp1, fp2):
 	"""True if file objects fp1 and fp2 contain same data"""
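To make the logic of the patch easier to try outside rdiff-backup, here is a rough standalone sketch of the same idea in Python 3. It is an approximation, not the patch itself: the names copy_sparse and pending_hole are made up, it works on bytes rather than Python 2 strings, takes the block size as a parameter instead of reading Globals.blocksize, and cuts the combined buffer into exact blocksize pieces rather than testing the whole accumulated buffer at once.

```python
import io
import os


def copy_sparse(inputfp, outputfp, blocksize=128 * 1024):
    """Copy inputfp to outputfp, seeking over blocks that are all zero bytes.

    Hypothetical standalone sketch; outputfp must be seekable.
    """
    zero_block = bytes(blocksize)
    pending_hole = False  # True if the last action was a seek, not a write
    buf = b""
    while True:
        chunk = inputfp.read(blocksize)
        if not chunk:
            break
        buf += chunk  # combine short reads until a full block is available
        while len(buf) >= blocksize:
            block, buf = buf[:blocksize], buf[blocksize:]
            if block == zero_block:
                # Skip the block; the file size is wrong until we write.
                outputfp.seek(blocksize, os.SEEK_CUR)
                pending_hole = True
            else:
                outputfp.write(block)
                pending_hole = False
    if buf:
        # Trailing partial block: always written, even if it is all zeros.
        outputfp.write(buf)
    elif pending_hole:
        # The file ends in a hole: write the final zero byte to fix the size.
        outputfp.seek(-1, os.SEEK_CUR)
        outputfp.write(b"\x00")


if __name__ == "__main__":
    data = b"x" * 16 + bytes(32)
    out = io.BytesIO()
    copy_sparse(io.BytesIO(data), out, blocksize=16)
    print(out.getvalue() == data)
```

The final seek(-1) plus one-byte write matters because a seek alone does not extend the file; without it, a source that ends in zero blocks would come out shorter than the original.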
_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki