Hello all,

Please consider this patch for inclusion in the next major rdiff-backup
release.  This is my first patch submission to rdiff-backup, so please
offer constructive comments if this patch needs adjusting.

Many people have been interested in sparse file support, and so have
I---so here's the patch! 

+ Any block of Globals.blocksize bytes that consists entirely of \x00
is made sparse automatically. (Globals.blocksize is currently 128k.)

+ This feature has been requested a few times, and I have added
documentation to the SparseFiles wiki page:
  http://wiki.rdiff-backup.org/wiki/index.php/SparseFiles

+ Any filesystem that produces a sparse file when you f.seek() beyond
EOF and then f.write() is supported.

+ This works for both local-copy and remote backups.

+ Works in conjunction with BlockFuse to back up sparse LVM snapshots:
http://www.globallinuxsecurity.pro/blog.php?q=rdiff-backup-lvm-snapshot

+ I am using this patch in production with ~600GB sparse LVM snapshots.

+ Backups of sparse files can be 2x faster, since the filesystem returns
a zero-filled buffer on f.read() for the holes rather than hitting the
disk and causing unnecessary IO.
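
For anyone who wants to try the idea outside of rdiff-backup, the
points above can be sketched roughly as follows.  This is a minimal
Python 3 illustration of the zero-block technique, not the patch
itself: copy_sparse is a hypothetical name, the 128k default only
mirrors Globals.blocksize, and unlike the patch it does not combine
short reads, so it assumes read() returns full blocks until EOF.

```python
import io
import os


def copy_sparse(inputfp, outputfp, blocksize=128 * 1024):
    """Copy inputfp to outputfp, seeking over all-zero blocks.

    Hypothetical helper for illustration only; both arguments must be
    binary file objects, and outputfp must be seekable.
    """
    zero_block = b"\x00" * blocksize
    # True while the most recent action was a seek() with no write()
    # after it.  In that state the output file is still too short.
    pending_hole = False

    while True:
        buf = inputfp.read(blocksize)
        if not buf:
            break
        if buf == zero_block[:len(buf)]:
            # All zeros: skip ahead instead of writing, leaving a hole.
            outputfp.seek(len(buf), os.SEEK_CUR)
            pending_hole = True
        else:
            outputfp.write(buf)
            pending_hole = False

    if pending_hole:
        # The data ended inside a hole.  Step back one byte and write a
        # single zero so the file gets its full length.
        outputfp.seek(-1, os.SEEK_CUR)
        outputfp.write(b"\x00")
```

With real files opened in binary mode, comparing os.stat(path).st_size
against os.stat(path).st_blocks * 512 on Linux should indicate whether
holes were actually created; the result depends on the filesystem.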

Feedback and comments are appreciated!

Cheers,


-- 
Eric Wheeler
President
eWheeler, Inc.
  dba Global Linux Security

www.GlobalLinuxSecurity.pro
503-330-4277
PO Box 14707
Portland, OR 97293


Index: rpath.py
===================================================================
RCS file: /sources/rdiff-backup/rdiff-backup/rdiff_backup/rpath.py,v
retrieving revision 1.142
diff -u -r1.142 rpath.py
--- rpath.py	23 Jun 2009 23:56:30 -0000	1.142
+++ rpath.py	3 Jan 2011 03:27:04 -0000
@@ -58,10 +58,44 @@
 def copyfileobj(inputfp, outputfp):
 	"""Copies file inputfp to outputfp in blocksize intervals"""
 	blocksize = Globals.blocksize
+
+	sparse = False
+	buf = None
 	while 1:
 		inbuf = inputfp.read(blocksize)
 		if not inbuf: break
-		outputfp.write(inbuf)
+
+		if not buf: 
+			buf = inbuf
+		else:
+			buf += inbuf
+
+		# Combine "short" reads
+		if (len(buf) < blocksize):
+			continue
+
+		buflen = len(buf)
+		if buf == "\x00" * buflen:
+			outputfp.seek(buflen, os.SEEK_CUR)
+			buf = None
+			# flag sparse=True, that we seek()ed, but have not written yet
+			# The filesize is wrong until we write
+			sparse = True 
+		else:
+			outputfp.write(buf)
+			buf = None
+
+			# We wrote, so clear sparse.
+			sparse = False
+
+	
+	if buf:
+		outputfp.write(buf)
+		buf = None
+
+	elif sparse:
+		outputfp.seek(-1, os.SEEK_CUR)
+		outputfp.write("\x00")
 
 def cmpfileobj(fp1, fp2):
 	"""True if file objects fp1 and fp2 contain same data"""
_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki
