Uwe Schuerkamp <uwe.schuerk...@nionex.net> 2009-03-13 10:42:
> Hi folks,
>
> I was wondering what a good backup strategy is for ocfs2-based
> clusters.
>
> Background: We're running a cluster of 8 SLES 10 SP2 machines sharing
> a common SAN-based FS (/shared), currently about 350 GB in size.
> We've already taken care of the usual optimizations concerning mount
> options on the cluster nodes (noatime and so on), but our backup
> software (Bacula 2.2.8) slows to a crawl when it encounters
> directories in this filesystem that contain many small files. Data
> rates usually average in the tens of MB/sec for "normal" backups of
> local filesystems on remote machines in the same LAN, but on the
> ocfs2 fs Bacula is hard pressed not to fall below 1 MB/sec sustained
> throughput, which obviously isn't enough to back up 350 GB of data
> in a sensible timeframe.
>
> I've already tried disabling compression, rsync'ing to another
> server and so on, but so far nothing has improved the data rates.
>
> How would reducing the number of cluster nodes help with backups? Is
> there a "dirty read" option in ocfs2 that would allow reading files
> without locking them first, or something similar? I don't think
> Bacula is the culprit, as it easily manages larger backups in the
> same environment; even reading off SMB shares is orders of magnitude
> faster in this case. So my guess is that I'm missing some non-obvious
> optimization that would improve ocfs2 cluster performance.
>
> Thanks in advance for any pointers & all the best,
>
> Uwe
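(For context, the "usual optimizations" Uwe mentions would typically be fstab mount options along these lines; the device path and the data=writeback choice here are illustrative examples, not his actual configuration:)

    # /etc/fstab -- illustrative entry only; device path and options are examples
    /dev/mapper/san-shared  /shared  ocfs2  _netdev,noatime,data=writeback  0 0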
This clearly may not work for all cases, and I'm sure it's totally unsupported, but our SAN (EqualLogic) has the ability to take RW snapshots, which is where we do our backups from. There was a thread a while back about the proper way to do this. Basically, after taking the snapshot you need to fix up the filesystem in a couple of different ways (fsck, relabel, re-UUID, etc.) so that one machine can mount several of these at once. If anyone's interested I can post these scripts.

Since there's only one machine handling the snapshots and it's outside of the real ocfs2 cluster, while we're doing the fixups we also convert the snapshot to a local fs and finally remount it read-only. This prevents all network locking from happening (since it's unnecessary) while the backups run. We're doing this with a 2 TB mail volume (~700 GB of _many_ small files) and haven't noticed any problems with it.

I think you could probably achieve something similar by taking the number of active nodes in the cluster down to 1 during your backup window, but that has its own problems to be concerned with. I think a simple umount /shared on all but that one node would do it.

Brian
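For anyone wanting a concrete starting point, here is a minimal sketch of the fixup sequence Brian describes, assuming the snapshot is presented to the backup host as /dev/sdx (a hypothetical device node) and stock ocfs2-tools; exact flags vary between ocfs2-tools versions, and Brian's actual scripts may differ:

    #!/bin/sh
    # Hypothetical sketch of the snapshot-fixup procedure described above.
    set -e

    DEV=/dev/sdx            # hypothetical device node for the SAN snapshot
    MNT=/backup/shared      # where the backup job will read from

    # 1. Make sure the snapshot is clean before touching it.
    fsck.ocfs2 -fy "$DEV"

    # 2. Give the copy its own identity so it can coexist with the
    #    original volume (and other snapshots) on the same machine.
    tunefs.ocfs2 -U "$DEV"                  # reset to a new UUID
    tunefs.ocfs2 -L backup-snap "$DEV"      # new label (name is arbitrary)

    # 3. Convert the copy to a local (non-clustered) filesystem so the
    #    backup involves no cluster stack or network locking.
    tunefs.ocfs2 -M local "$DEV"

    # 4. Mount read-only; point the backup software at $MNT.
    mkdir -p "$MNT"
    mount -t ocfs2 -o ro,noatime "$DEV" "$MNT"

The key design point is step 3: once the snapshot is a local, read-only filesystem on a host outside the cluster, every read bypasses the DLM entirely, which is what makes the small-file backup fast again.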