Re: ffs snapshots patch
On Fri, Apr 29, 2011 at 01:56:01PM +0200, Juergen Hannken-Illjes wrote:
On Fri, Apr 29, 2011 at 01:48:39PM +0200, Manuel Bouyer wrote:

With your last changes, things are much better now:

/usr/bin/time fssconfig fss0 /home /home/snaps/snap0
149.85 real 0.00 user 1.16 sys
/home: suspended 0.040 sec, redo 0 of 2556
/usr/bin/time fssconfig fss1 /home /home/snaps/snap1
227.49 real 0.00 user 1.90 sys
/home: suspended 0.040 sec, redo 0 of 2556
/usr/bin/time fssconfig fss2 /home /home/snaps/snap2
263.58 real 0.00 user 2.97 sys
/home: suspended 0.040 sec, redo 0 of 2556
/usr/bin/time fssconfig fss3 /home /home/snaps/snap3
353.23 real 0.00 user 3.88 sys
/home: suspended 0.040 sec, redo 0 of 2556

Taking a snapshot will still probably require a lot of time on large filesystems with a dozen snapshots, but at least the server won't hang for a long time. Thanks!

Not really. Any thread ending up in ffs_copyonwrite() or ffs_snapblkfree() will block. If this server runs NFS it could be possible that all NFS server threads block.

Oh - I might have seen this on Monday - 5.99.47 on sparc64. All I saw was [tstile], and the quickest way out after a couple of minutes was to hard reboot the machine and let wapbl / fsck sort it out - and to move back to the pre-snapshot rsync script. Sorry, no core dump.

Regards,
-is
Re: ffs snapshots patch
On Fri, Apr 29, 2011 at 01:56:01PM +0200, Juergen Hannken-Illjes wrote:
On Fri, Apr 29, 2011 at 01:48:39PM +0200, Manuel Bouyer wrote:

With your last changes, things are much better now:

/usr/bin/time fssconfig fss0 /home /home/snaps/snap0
149.85 real 0.00 user 1.16 sys
/home: suspended 0.040 sec, redo 0 of 2556
/usr/bin/time fssconfig fss1 /home /home/snaps/snap1
227.49 real 0.00 user 1.90 sys
/home: suspended 0.040 sec, redo 0 of 2556
/usr/bin/time fssconfig fss2 /home /home/snaps/snap2
263.58 real 0.00 user 2.97 sys
/home: suspended 0.040 sec, redo 0 of 2556
/usr/bin/time fssconfig fss3 /home /home/snaps/snap3
353.23 real 0.00 user 3.88 sys
/home: suspended 0.040 sec, redo 0 of 2556

Taking a snapshot will still probably require a lot of time on large filesystems with a dozen snapshots, but at least the server won't hang for a long time. Thanks!

Not really. Any thread ending up in ffs_copyonwrite() or ffs_snapblkfree() will block. If this server runs NFS it could be possible that all NFS server threads block.

Hum, this is bad then, because an NFS server with home directories is seeing mostly writes ...

--
Manuel Bouyer bou...@antioche.eu.org
NetBSD: 26 years of experience will always make the difference
--
Re: ffs snapshots patch
On Thu, Apr 28, 2011 at 11:48:55AM +0200, Juergen Hannken-Illjes wrote:
On Wed, Apr 27, 2011 at 10:43:59AM +0200, Manuel Bouyer wrote:
On Mon, Apr 18, 2011 at 09:36:25AM +0200, Juergen Hannken-Illjes wrote:
[...]

Fixing 2) is trickier. To avoid the heavy writes to the snapshot file with the fs suspended, the snapshot appears with its real length and blocks at the time of creation, but is marked invalid (only the inode block needs to be copied, and this can be done before suspending the fs). Now BLK_SNAP should never be seen as a block number, and we skip ffs_copyonwrite() if the write is to a snapshot inode.

I strongly object here. There are good reasons to expunge old snapshots. Even if it were done right, without deadlocks and locking-against-self, the resulting snapshot loses at least two properties:

- A snapshot is considered stable. Whenever you read a block you get the same contents. Allowing old snapshots to exist but not running copy-on-write means these blocks will change their contents.
- A snapshot will fsck clean. It is impossible to change fsck_ffs to check a snapshot as these old snapshots' indirect blocks now will contain garbage.

Maybe we should relax these constraints then.

No. We use snapshots (with -X) for fsck and dump. This makes no sense if we cannot fsck a snapshot any more.

AFAIK dump will ignore snapshot files (or at least it should), so it's not a problem if the snapshot's blocks change while we're working on a snapshot. Also AFAIK, the above issue will only cause fsck to report missing blocks in group maps and summary information. It's not a big deal either.

In their current form, snapshots are not usable even for this, because it's not acceptable to suspend a file server for several tens of seconds (if not minutes) to start a dump or fsck.

/home: suspended 170.733 sec, redo 0 of 2556

Even a 14s hang is still a long time for an NFS server (workstations will be frozen during this time).
Even if we can make it shorter with some filesystem tuning, it still doesn't scale with the size of the filesystem and the number of snapshots (having 12 persistent snapshots on a filesystem is not an unreasonable number). Other OSes can do it with almost no freeze, so it should be possible (the snapshot may not be fsck-able, but I'm not sure it's the most important property of FS snapshots).

The only other OS with ffs+snapshots is FreeBSD, which should behave similarly. Other file systems like ZFS, NilFS etc. will be faster and scale better as they are designed with instant snapshots in mind.

What about ext3fs?

--
Manuel Bouyer bou...@antioche.eu.org
NetBSD: 26 years of experience will always make the difference
--
Re: ffs snapshots patch
On Sat, Apr 16, 2011 at 09:29:26PM +0200, Manuel Bouyer wrote:

Hello,
attached is a work in progress on ffs snapshots (as it's work in progress, some debug and instrumentation code is still present in the patch, no need to comment on this part :).

The start of this work is that when working on quota, I noticed that taking a snapshot on a 500GB filesystem needs several minutes, and is O(n) in the number of persistent snapshots. Here are some timings on an otherwise idle 500GB filesystem (it's some brand of SATA2 3.5" drive attached to an AHCI controller, so it's a reasonable test bed for today):

java# /usr/bin/time fssconfig fss0 /home /home/snaps/snap0
260.53 real 0.00 user 1.15 sys
/home: suspended 77.873 sec, redo 1184 of 2556
java# /usr/bin/time fssconfig fss1 /home /home/snaps/snap1
377.87 real 0.00 user 2.53 sys
/home: suspended 206.078 sec, redo 1184 of 2556
java# /usr/bin/time fssconfig fss2 /home /home/snaps/snap2
508.23 real 0.00 user 4.28 sys
/home: suspended 338.534 sec, redo 1184 of 2556
java# /usr/bin/time fssconfig fss3 /home /home/snaps/snap3
621.40 real 0.00 user 5.50 sys
/home: suspended 431.154 sec, redo 1183 of 2556

Suspending a filesystem for more than 7 minutes to take a snapshot makes persistent snapshots quite useless to me. I wonder how it would behave on a multi-terabyte filesystem.

I looked at where the time is spent and found 2 major issues:

1) cgaccount() works in 2 passes: first it copies the cgs before suspending the filesystem; then it is called again to copy only the cgs that have been modified between the copy and the filesystem suspend. The problem is that to copy a cg we need to allocate blocks for the snapshot file, which may be in a cg we just copied. This is the cause of the high number of cg copies (almost half of them) with the filesystem suspended.
2) While the filesystem is suspended, we want to expunge the snapshot files from the snapshot view (make them appear as 0-length files). With ~500GB sparse files this is a lot of work.
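
Issue 1) can be illustrated with a toy model. This is only a sketch, not the kernel code: cylinder groups are plain integers, and `two_pass_copy` and `alloc_target` are hypothetical names invented for the illustration. The assumption is that copying a cg allocates one snapshot block in some cg, and that this makes an earlier copy of that cg stale.

```python
def two_pass_copy(ncg, alloc_target, preallocate=False):
    """Return how many cgs must be re-copied while the fs is 'suspended'.

    alloc_target(cg) is a hypothetical function naming the cg in which
    the snapshot block that holds the copy of `cg` gets allocated.
    """
    copied = set()   # cgs already copied in pass 1
    dirty = set()    # cgs whose copy went stale after being taken

    if preallocate:
        # Do every allocation up front, before any cg is copied,
        # so that pass 1 itself no longer modifies any cg.
        for cg in range(ncg):
            alloc_target(cg)

    # Pass 1: file system still active.
    for cg in range(ncg):
        if not preallocate:
            target = alloc_target(cg)
            if target in copied:
                dirty.add(target)   # the earlier copy of `target` is stale
        copied.add(cg)

    # Pass 2 (file system suspended) has to redo exactly the stale copies.
    return len(dirty)


# Assumed allocation policy for the demo: blocks land in the lower half.
lower_half = lambda cg: cg // 2
redo_without = two_pass_copy(8, lower_half)
redo_with = two_pass_copy(8, lower_half, preallocate=True)
```

In this model the allocation policy alone decides how many copies go stale; moving the allocations ahead of pass 1 removes the self-inflicted redo work, whatever the policy is.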
I fixed 1) by preallocating the needed blocks in snapshot_setup().

Good catch. Committed.

Fixing 2) is trickier. To avoid the heavy writes to the snapshot file with the fs suspended, the snapshot appears with its real length and blocks at the time of creation, but is marked invalid (only the inode block needs to be copied, and this can be done before suspending the fs). Now BLK_SNAP should never be seen as a block number, and we skip ffs_copyonwrite() if the write is to a snapshot inode.

I strongly object here. There are good reasons to expunge old snapshots. Even if it were done right, without deadlocks and locking-against-self, the resulting snapshot loses at least two properties:

- A snapshot is considered stable. Whenever you read a block you get the same contents. Allowing old snapshots to exist but not running copy-on-write means these blocks will change their contents.
- A snapshot will fsck clean. It is impossible to change fsck_ffs to check a snapshot as these old snapshots' indirect blocks now will contain garbage.

You cannot copy blocks before suspension without rewriting them once the file system is suspended. The check in ffs_copyonwrite() will only work as long as the old snapshot exists. As soon as it gets removed we will run COW on the blocks used by the old snapshot.
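
The stability objection can be shown with a small model of copy-on-write. This is a sketch under loud assumptions: `ToyDisk`, `no_cow` and `snap_read` are hypothetical names, a "block" is a string, and nothing here mirrors the real ffs_copyonwrite() logic beyond the basic idea of saving old contents before an overwrite.

```python
class ToyDisk:
    """A disk with one snapshot, taken at construction time."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live file system contents
        self.saved = {}             # snapshot copies: block number -> old data
        self.no_cow = set()         # blocks for which copy-on-write is skipped

    def write(self, n, data):
        # Copy-on-write: keep the old contents the first time a block
        # is overwritten after the snapshot was taken.
        if n not in self.no_cow and n not in self.saved:
            self.saved[n] = self.blocks[n]
        self.blocks[n] = data

    def snap_read(self, n):
        # The snapshot returns its saved copy if there is one,
        # otherwise it falls through to the live block.
        return self.saved.get(n, self.blocks[n])


disk = ToyDisk(["a", "b"])
disk.no_cow.add(1)            # pretend block 1 belongs to an old snapshot file
disk.write(0, "A")
disk.write(1, "B")
stable = disk.snap_read(0)    # block 0 went through copy-on-write
changed = disk.snap_read(1)   # block 1 did not
```

Block 0 still reads as it did when the snapshot was taken, while block 1 now shows the new data through the snapshot, which is the loss of stability described above.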
With these changes the times are much more reasonable:

/usr/bin/time fssconfig fss0 /home /home/snaps/snap0
299.68 real 0.00 user 1.10 sys
/home: suspended 0.310 sec, redo 0 of 2556
/usr/bin/time fssconfig fss1 /home /home/snaps/snap1
188.10 real 0.00 user 0.86 sys
/home: suspended 0.270 sec, redo 0 of 2556
/usr/bin/time fssconfig fss2 /home /home/snaps/snap2
169.78 real 0.00 user 0.95 sys
/home: suspended 0.450 sec, redo 0 of 2556
/usr/bin/time fssconfig fss3 /home /home/snaps/snap3
172.39 real 0.00 user 0.99 sys
/home: suspended 0.300 sec, redo 0 of 2556

This seems to work; one issue with this patch is that the block count for the snapshot inode and the block summary information (the second being probably a consequence of the first) appear wrong when running fsck against a snapshot. I believe this is fixable, but I've not yet found where the information mismatch is coming from.

Comments?

PS: I'm away from computers for one week, so don't expect replies to your comments before next Sunday.

--
Manuel Bouyer bou...@antioche.eu.org
NetBSD: 26 years of experience will always make the difference
--

--
Juergen Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
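
For readers who want to compare the two sets of timings in this message, here is a small sketch that parses the "suspended ... redo ..." lines quoted above. The helper name `parse_suspend` and the regular expression are assumptions made for this illustration; only the message strings themselves come from the thread.

```python
import re

# Matches the message printed after each fssconfig run quoted in the thread.
SUSPEND_RE = re.compile(r"suspended ([0-9.]+) sec, redo (\d+) of (\d+)")


def parse_suspend(line):
    """Return (seconds suspended, cgs redone, total cgs) for one message."""
    m = SUSPEND_RE.search(line)
    if m is None:
        raise ValueError("not a suspend message: %r" % line)
    return float(m.group(1)), int(m.group(2)), int(m.group(3))


# Messages from the original run and from the run with the patch applied.
before = [
    "/home: suspended 77.873 sec, redo 1184 of 2556",
    "/home: suspended 206.078 sec, redo 1184 of 2556",
    "/home: suspended 338.534 sec, redo 1184 of 2556",
    "/home: suspended 431.154 sec, redo 1183 of 2556",
]
after = [
    "/home: suspended 0.310 sec, redo 0 of 2556",
    "/home: suspended 0.270 sec, redo 0 of 2556",
    "/home: suspended 0.450 sec, redo 0 of 2556",
    "/home: suspended 0.300 sec, redo 0 of 2556",
]

worst_before = max(parse_suspend(line)[0] for line in before)
worst_after = max(parse_suspend(line)[0] for line in after)
_, redo, total = parse_suspend(before[0])
redo_fraction = redo / total

print("worst suspension before: %.3f s" % worst_before)
print("worst suspension after:  %.3f s" % worst_after)
print("cgs redone before:       %.1f%%" % (100 * redo_fraction))
```

The redo fraction comes out a little under one half, which matches the "almost half of them" remark, and the worst suspension drops from minutes to well under a second while the total "real" time stays in the range of minutes.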