[rdiff-backup-users] Re: feedback to blog entry rdiff-backup-lvm-snapshot

Eric Wheeler Fri, 28 Jan 2011 00:50:48 -0800

> Hi Eric,

Hi Sebastian,


I'm cc'ing the rdiff-backup-users list too, they may have some insight
as well.

> on LVM snapshots and came across your blog and your articles in that regard:
> 
> http://www.globallinuxsecurity.pro/blog.php?q=rdiff-backup-lvm-snapshot
> 
> I'm very impressed both with your rdiff-backup patch and the block-fuse
> application.

I'm glad you find it useful!  Unfortunately, I have found that the
sparse-destination patch for rdiff-backup is sometimes slow.  I'm
running without sparse files until I can figure out a faster way to
detect blocks of zero bytes.  If you or someone on the list knows Python
better than I do, please take a look!
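
For anyone who wants to experiment, here is a minimal sketch of one
possible approach.  It is not the code from my patch: it is plain
Python 3, and BLOCK_SIZE, is_zero_block() and copy_sparse() are
hypothetical names chosen for illustration.  The idea is to compare each
block against a preallocated all-zero buffer, so the comparison runs in
C rather than in a per-byte Python loop, and to seek over zero blocks
instead of writing them.

```python
import os

BLOCK_SIZE = 64 * 1024            # hypothetical block size, tune as needed
_ZERO_BLOCK = bytes(BLOCK_SIZE)   # preallocated all-zero reference block


def is_zero_block(block):
    """Return True if every byte in block is zero."""
    if len(block) == len(_ZERO_BLOCK):
        # Common case: a single C-level comparison, no new allocation.
        return block == _ZERO_BLOCK
    # Short (or odd-sized) block: compare against zeros of equal length.
    return block == bytes(len(block))


def copy_sparse(src, dst, block_size=BLOCK_SIZE):
    """Copy binary file object src to dst, leaving holes for zero blocks.

    Returns the number of blocks that were skipped rather than written.
    """
    holes = 0
    while True:
        block = src.read(block_size)
        if not block:
            break
        if is_zero_block(block):
            # Seek forward instead of writing, leaving a hole in dst.
            dst.seek(len(block), os.SEEK_CUR)
            holes += 1
        else:
            dst.write(block)
    # If the file ends in a hole, this sets the final size correctly.
    dst.truncate()
    return holes
```

Usage would be along the lines of opening the source with "rb" and the
destination with "wb" and calling copy_sparse(src, dst).  Whether the
skipped ranges actually become holes on disk depends on the destination
filesystem.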

> Since you mentioned that you use this combination to back up images of
> up to 350GB, I am interested to find out whether you have encountered
> problems with I/O-Wait.

I'm using blockfuse+rdiff-backup after business hours, so if the VM
slows down, nobody (or very few) notices.  The server runs 4x 1TB drives
in RAID-10, and block-IO peaks at ~225MB/sec.  That 350GB volume was
recently extended to 600GB.

> There is a Linux Kernel bug that causes I/O-Wait to skyrocket when
> copying large files, especially when those files are larger than the
> available memory.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=12309

Good to know, I was unaware of this bug.  See comment#128, it looks like
using ext4 works a little better for writing, possibly because of
delayed allocation ("delalloc").  Since I'm using ext4 as my destination
backup filesystem, this could be the reason I am not experiencing the
same issue.  I suppose it could be my RAID controller (LSI 9240)
buffering the IO overhead from the host CPU, too.

What disk hardware are you using for source and destination?

> In our case, a quad-core server running rdiff-backup on a block-fuse
> directory, having 8GB ram, is basically made unavailable by the symptoms
> I described above. All the virtual machines on it become unreachable.

I have a feeling that this is due to backup-destination contention
rather than backup-source contention.  BlockFuse mmaps the source
device, and I'm not certain if mmap'ed IO is cached or not.  To
guarantee you are missing the source's disk cache, you could patch
blockfuse to use direct-IO (O_DIRECT), or backup from a "/dev/raw/rawX"
device.  (Bypassing the disk cache is important for backups, because
backups tend to be read-once.  Thus, thrashing the cache affects the
"good stuff" in the cache.)

For large files, rdiff-backup may benefit from writing with the O_DIRECT
flag (a hint from comment#128).  Again, this would help miss the disk
cache.
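
O_DIRECT needs aligned buffers, which is awkward from Python, so here is
a sketch of a different technique with a similar goal: it copies through
the page cache as usual, but periodically asks the kernel to drop the
pages it just touched, using posix_fadvise() with POSIX_FADV_DONTNEED.
This is not what blockfuse or rdiff-backup do today.  The names
copy_without_caching(), CHUNK and DROP_EVERY are hypothetical, and the
advice is only a hint that the kernel may ignore.

```python
import os

CHUNK = 1024 * 1024        # hypothetical copy chunk size
DROP_EVERY = 32 * CHUNK    # hypothetical: drop cached pages this often


def _drop_cache(fd, offset, length):
    """Hint that cached pages in this range are no longer needed."""
    # posix_fadvise is not available on every platform, so guard it.
    if hasattr(os, "posix_fadvise"):
        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)


def copy_without_caching(src_path, dst_path, chunk=CHUNK,
                         drop_every=DROP_EVERY):
    """Copy src_path to dst_path, returning the number of bytes copied."""
    copied = 0
    last_drop = 0
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            while True:
                buf = os.read(src, chunk)
                if not buf:
                    break
                view = memoryview(buf)
                while view:
                    # os.write may write fewer bytes than requested.
                    view = view[os.write(dst, view):]
                copied += len(buf)
                if copied - last_drop >= drop_every:
                    # Dirty pages cannot be dropped, so flush them first.
                    os.fsync(dst)
                    _drop_cache(src, last_drop, copied - last_drop)
                    _drop_cache(dst, last_drop, copied - last_drop)
                    last_drop = copied
            os.fsync(dst)
            # A length of 0 means "from offset to the end of the file".
            _drop_cache(src, 0, 0)
            _drop_cache(dst, 0, 0)
        finally:
            os.close(dst)
    finally:
        os.close(src)
    return copied
```

The frequent fsync() calls cost some throughput; the trade-off is that
the backup data should not linger in the cache at the expense of what
the VMs are using.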

I'm backing up local-to-local; the source is a RAID-10 array, and the
destination is a slow 5400rpm 2TB single disk used as tertiary storage.
Do you back up local-to-local, or over a network?

> If you have any experience with this in your backup scenarios, I would
> love to hear back from you.

So far it works great on my side.  I'm deploying this to back up LVM
snapshots of Windows VMs under KVM in about 2 weeks on different
hardware.  I might have better insight then if I run into new issues.

> 
> Cheers,
> Sebastian

-- 
Eric Wheeler
President
eWheeler, Inc.
  dba Global Linux Security

www.GlobalLinuxSecurity.pro
503-330-4277
PO Box 14707
Portland, OR 97293



_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki
