Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
On Tue, Nov 01, 2011 at 09:54:45AM -0400, Greg Troxel wrote:
>
> My man page on 5.1 matches Matthew's.
>
> But, does sync do cache flushes on all disks as well?
>
> Does SUS require this?

I believe fsync_range with FDATASYNC is required to. Note that since
it's guaranteed to sync "sufficient metadata to.." it should force
directory updates, file size updates, etc. out to disk as well --
without flushing the entire kernel page or metadata cache.

Unfortunately, since *disk* cache flushes are rather a blunt instrument
this will still harm performance more than it ought. It'd be nice to
have a concept of ordered tags as barriers like Linux does (see my old
B_BARRIER proposal) but that'd still need tagged queueing support for
ATA disks to be really useful today.

--
Thor Lancelot Simon    t...@panix.com
"All of my opinions are consistent, but I cannot present them all
 at once." -Jean-Jacques Rousseau, On The Social Contract
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
chris...@astron.com (Christos Zoulas) writes:

> In article <20010318.pa13ihod001...@ginseng.pulsar-zone.net>,
> Matthew Mondor wrote:
>>On Mon, 31 Oct 2011 19:58:27 -0400
>>Greg Troxel wrote:
>>
>>> Obligatory actual netbsd tech-kern content: It seems like we really need
>>> a sync_synchronous(2) system call that guarantees that all file system
>>> operations that have completed (syscall returned) before the issuance of
>>> the sync_synchronous call are on disk before sync_synchronous returns.
>>> It seems odd that for sync, there is no waiting, fsync seems to wait,
>>> and fsync_range can flush or not flush caches, more or less.
>>
>>Hmm since in sync(2), the non-synchronous issue is noted as a bug:
>>
>>BUGS
>>     sync() may return before the buffers are completely flushed.
>>
>>Does this mean that sync(2) should normally be synchronous and fixed to
>>be, such that sync_synchronous(2) not be necessary?
>
> Which sync man page are you reading? Ours has:
>
> Historically, sync() would schedule buffers for writing but not actually
> wait for the writes to finish. It was necessary to issue a second or
> sometimes a third call to ensure that all buffers had in fact been
> written out. In NetBSD, sync() does not return until all buffers have
> been written.

My man page on 5.1 matches Matthew's.

But, does sync do cache flushes on all disks as well?

Does SUS require this?
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
In article <20010318.pa13ihod001...@ginseng.pulsar-zone.net>,
Matthew Mondor wrote:
>On Mon, 31 Oct 2011 19:58:27 -0400
>Greg Troxel wrote:
>
>> Obligatory actual netbsd tech-kern content: It seems like we really need
>> a sync_synchronous(2) system call that guarantees that all file system
>> operations that have completed (syscall returned) before the issuance of
>> the sync_synchronous call are on disk before sync_synchronous returns.
>> It seems odd that for sync, there is no waiting, fsync seems to wait,
>> and fsync_range can flush or not flush caches, more or less.
>
>Hmm since in sync(2), the non-synchronous issue is noted as a bug:
>
>BUGS
>     sync() may return before the buffers are completely flushed.
>
>Does this mean that sync(2) should normally be synchronous and fixed to
>be, such that sync_synchronous(2) not be necessary?

Which sync man page are you reading? Ours has:

     Historically, sync() would schedule buffers for writing but not
     actually wait for the writes to finish. It was necessary to issue
     a second or sometimes a third call to ensure that all buffers had
     in fact been written out. In NetBSD, sync() does not return until
     all buffers have been written.

christos
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
On Mon, 31 Oct 2011 19:58:27 -0400
Greg Troxel wrote:

> Obligatory actual netbsd tech-kern content: It seems like we really need
> a sync_synchronous(2) system call that guarantees that all file system
> operations that have completed (syscall returned) before the issuance of
> the sync_synchronous call are on disk before sync_synchronous returns.
> It seems odd that for sync, there is no waiting, fsync seems to wait,
> and fsync_range can flush or not flush caches, more or less.

Hmm since in sync(2), the non-synchronous issue is noted as a bug:

BUGS
     sync() may return before the buffers are completely flushed.

Does this mean that sync(2) should normally be synchronous and fixed to
be, such that sync_synchronous(2) not be necessary?

--
Matt
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
Thanks for the comments.

This is rdiff-backup, not rsync, and it has the notion of considering
the modified mirror dirty until it finishes, and it will roll back on
restart. I am not clear how well it does at verifying contents (or
timestamps before the last full-backup timestamp?). I am also not clear
if it's fsyncing each file before putting it in the log.

That's interesting about working around ext4 issues. The code also has
(bizarre) calls to fsync the directory that a file is in, after
fsyncing the file.

I think what's really killing my performance is that cache flush on
these disks is expensive, and that's part of fsync. So probably we need
a way to call sync(2) and guarantee that everything that was dirty at
call time is written before return, like fsync, and to do that after
writing the data and before writing the commit file. The real issue is
ordering and making sure all the data and per-file metadata is on disk
before writing the file that says the backup succeeded, and I don't see
that we/posix have a good way to express that, other than sync(2) and
wait 30s, which isn't so bad.

With a remote-over-ssh target, there are fsync calls on files opened
but not written to, and with a non-WAPBL disk these are fast.

I've brought this up on the rdiff-backup list; it appears the
maintainer has gone missing.

Obligatory actual netbsd tech-kern content: It seems like we really need
a sync_synchronous(2) system call that guarantees that all file system
operations that have completed (syscall returned) before the issuance of
the sync_synchronous call are on disk before sync_synchronous returns.
It seems odd that for sync, there is no waiting, fsync seems to wait,
and fsync_range can flush or not flush caches, more or less.
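For concreteness, the ordering described above (data first, then a
global flush, then the commit file) might look roughly like the
following in Python, the language rdiff-backup is written in. This is
only a sketch under assumptions: the marker name "backup.complete" and
both function names are made up for illustration and are not
rdiff-backup's, and os.sync() gives only whatever guarantee the
platform's sync(2) gives, which is exactly the open question in this
thread.

```python
import os


def fsync_path(path):
    """Open path read-only, fsync it, and close it again."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def commit_backup(target_dir, marker_name="backup.complete"):
    """Flush everything, then write the file that declares success.

    marker_name is a hypothetical name, not rdiff-backup's real file.
    """
    # Ask the kernel to write out all dirty buffers. Whether this waits
    # for completion, and whether it flushes the drive's write cache,
    # depends on the operating system.
    os.sync()

    marker = os.path.join(target_dir, marker_name)
    with open(marker, "w") as f:
        f.write("ok\n")
        f.flush()               # Python's buffer -> kernel
        os.fsync(f.fileno())    # kernel -> disk, for the marker itself

    # Also fsync the directory so the marker's directory entry is stable.
    fsync_path(target_dir)
    return marker
```

The directory fsync at the end is the same file-then-directory pattern
noted above in rdiff-backup's code; here it is done once for the marker
rather than once per file.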
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
On Sat, Oct 29, 2011 at 12:26:03PM +, David Holland wrote:
> However, a tool that really supports commit/abort semantics (unlike
> rsync) shouldn't need to sync at all until it's done.

Actually, rsync could easily do it more intelligently without risk too.
Before setting the mtime to the correct value, it has to f(data)sync.
That's OK -- it just doesn't have to do it after every file, but can
aggregate them.

Joerg
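A rough Python sketch of that aggregation, under assumptions: the
function name copy_batch and the (name, data, mtime) tuple layout are
invented for illustration, and nothing here is taken from rsync's
source. The point is only the order of the three passes: write a batch,
sync the batch, and only then set the final mtimes.

```python
import os

# fdatasync is not available everywhere; fall back to fsync if missing.
_datasync = getattr(os, "fdatasync", os.fsync)


def copy_batch(items, dest_dir):
    """items is a list of (name, data_bytes, mtime_seconds) tuples."""
    written = []

    # Pass 1: write all the data, with no syncing yet.
    for name, data, mtime in items:
        path = os.path.join(dest_dir, name)
        with open(path, "wb") as f:
            f.write(data)
        written.append((path, mtime))

    # Pass 2: sync the data of every file in the batch.
    for path, _ in written:
        fd = os.open(path, os.O_WRONLY)
        try:
            _datasync(fd)
        finally:
            os.close(fd)

    # Pass 3: only now give each file its final mtime, so a file with
    # the "correct" mtime always has its data on disk already.
    for path, mtime in written:
        os.utime(path, (mtime, mtime))
    return [path for path, _ in written]
```

How much this saves over syncing after every file depends on the file
system; replacing pass 2 with a single sync(2) call is another option
where that call is known to wait.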
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
On Sat, Oct 29, 2011 at 07:54:44AM +0200, Alan Barrett wrote:
> The current state of ffs+wapbl is that, if the system crashes and
> the log is replayed, then files that had been written shortly
> before the crash end up with whatever old data happened to be in
> the underlying disk blocks, but new metadata indicating that the
> size and timestamps are all up to date. I think that this violates
> traditional unix file system semantics, but the people who worked
> on wapbl don't seem to think it's a problem.

It doesn't violate traditional semantics because ffs has always had
this bug, but it is a bug. However, it's gotten more noticeable; I
think this is mostly because wapbl is much faster creating files, so
bulk copies/untars can accumulate much more unflushed data in the same
amount of time. Also it gets worse as memory sizes increase.

fsyncing everything is inherently expensive. fsyncing every file
(especially if many of them are small) is inherently *very* expensive.
Try hacking it to fsync in batches, or even (if the machine isn't doing
much else at the same time) to just call sync every half second or
something. This should make it markedly faster without much loss of
integrity.

However, a tool that really supports commit/abort semantics (unlike
rsync) shouldn't need to sync at all until it's done.

--
David A. Holland
dholl...@netbsd.org
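The "call sync every half second" hack could be sketched in Python
along these lines. The class name PeriodicSync and its tick() method
are hypothetical, made up for this example; the clock and sync function
are parameters only so the logic can be exercised without touching real
disks.

```python
import os
import time


class PeriodicSync:
    """Call sync() at most once per `interval` seconds of work."""

    def __init__(self, interval=0.5, clock=time.monotonic, sync=os.sync):
        self.interval = interval
        self.clock = clock
        self.sync = sync
        self.last = clock()
        self.count = 0          # number of syncs issued so far

    def tick(self):
        """Call after each file; returns True if a sync was issued."""
        now = self.clock()
        if now - self.last >= self.interval:
            self.sync()
            self.last = now
            self.count += 1
            return True
        return False
```

A copy loop would call tick() after finishing each file in place of a
per-file fsync, plus one final sync when the whole run is done.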
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
Matthew Mondor wrote:
> Greg Troxel wrote:
>> So, I'm inclined to patch rdiff-backup not to fsync, since it seems
>> excessive, and the backup is toast if the machine crashes before it is
>> finished -- in that case rdiff-backup just rolls back. Opinions?
>
> I also wonder why fsync would be used for every file, especially if you
> consider a whole run a single "transaction", even more so if using
> snapshots (although you don't mention using them).

If rdiff-backup was easily able to roll back after a crash, then I'd
probably agree with the above. But it's expensive to roll back (you
have to compare the actual data in the files, without assuming that
{same size, same mtime} implies same data).

The current state of ffs+wapbl is that, if the system crashes and
the log is replayed, then files that had been written shortly
before the crash end up with whatever old data happened to be in
the underlying disk blocks, but new metadata indicating that the
size and timestamps are all up to date. I think that this violates
traditional unix file system semantics, but the people who worked
on wapbl don't seem to think it's a problem.

Anyway, the new metadata with old data tends to make rsync (and
probably rdiff-backup) think that the file is up to date, and so not
copy it again next time (unless you perform an expensive comparison of
all the data, not just the metadata).

I have patched rsync to issue fdatasync(2) calls frequently, to
mitigate this problem in my own usage. It does slow it down, but
nowhere near as dramatically as you report. (I use NetBSD-current.)

--apb (Alan Barrett)
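To illustrate the difference between the cheap check and the expensive
one, here is a small Python sketch. The function names are invented for
this example, and SHA-256 is an arbitrary choice of digest, not what
rsync or rdiff-backup actually use.

```python
import hashlib
import os


def same_metadata(a, b):
    """Cheap check: same size and same mtime (whole seconds)."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_size == sb.st_size
            and int(sa.st_mtime) == int(sb.st_mtime))


def file_digest(path, chunk=1 << 20):
    """SHA-256 of the file contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def same_content(a, b):
    """Expensive check: reads every byte of both files."""
    return file_digest(a) == file_digest(b)
```

After the kind of crash described above, same_metadata() can return
True for a file whose blocks still hold old data; only same_content()
notices, at the cost of reading both files in full.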
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
On Fri, 28 Oct 2011 20:33:29 -0400
Greg Troxel wrote:

> So, I'm inclined to patch rdiff-backup not to fsync, since it seems
> excessive, and the backup is toast if the machine crashes before it is
> finished -- in that case rdiff-backup just rolls back. Opinions?

I also wonder why fsync would be used for every file, especially if you
consider a whole run a single "transaction", even more so if using
snapshots (although you don't mention using them).

In which case it simply should report failure and abort on any
open/write/rename/close error, and at the end, fsync once, also
checking for error. If at that point everything was successful, the
"transaction" is committed (as far as software is concerned, of course,
hardware buffers might still need flushing), otherwise everything
should be rolled back, unless an inconsistent state is allowed (where
the next full backup might fix that).

I'm however wondering if the excessive fsync(2)s weren't eventually
added because of issues with ext4, as I somehow remember unix semantic
exceptions with it, and know that some have lost files using it as
they'd normally safely use other file systems (and I haven't followed
progress to know if it's since fixed).

But if rdiff-backup cannot optionally avoid those, adding an option to
tell it not to fsync at every file as you suggested would be very sane
IMO (it still could default to sync mode, in case there's upstream
resistance)...

I can understand the need for some transaction-logging applications to
call fdatasync(2) regularly, but that's another matter (and even then
it's usually configurable after how many bytes or seconds to call it to
allow the administrator to tweak performance).

--
Matt
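As an illustration of the "configurable after how many bytes" style of
syncing mentioned for transaction-logging applications, a minimal
Python sketch follows. The class name SyncedLog and the sync_bytes
parameter are assumptions made up for this example and are not taken
from any particular application.

```python
import os

# fdatasync is not available everywhere; fall back to fsync if missing.
_datasync = getattr(os, "fdatasync", os.fsync)


class SyncedLog:
    """Append-only log that syncs after every `sync_bytes` bytes."""

    def __init__(self, path, sync_bytes=1 << 20):
        flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
        self.fd = os.open(path, flags, 0o644)
        self.sync_bytes = sync_bytes   # the administrator's tunable
        self.pending = 0               # bytes written since last sync
        self.syncs = 0                 # number of syncs issued

    def append(self, record):
        view = memoryview(record)
        while view:                    # os.write may write partially
            n = os.write(self.fd, view)
            view = view[n:]
        self.pending += len(record)
        if self.pending >= self.sync_bytes:
            self.flush()

    def flush(self):
        _datasync(self.fd)
        self.pending = 0
        self.syncs += 1

    def close(self):
        if self.pending:
            self.flush()
        os.close(self.fd)
```

A larger sync_bytes means fewer cache flushes and more data at risk
after a crash, which is the trade-off the tunable exposes.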
fsync, rdiff-backup, wapbl, and WD Elements 1T drive
netbsd-5, i386, 2 x 400G SATA in rf RAID1, external USB2 WD Elements 1T

I have a UFS2+WAPBL filesystem on the above RAID1 with ~900K files in
~320GB. I'm backing it up with rdiff-backup to a USB2 external disk.
The external disk has a single large UFS2+WAPBL partition.

I found that backups took crazy long -- as in most of a week -- when
not that many of the files had changed. The rdiff-backup process was
often in tstile or xscmd, but continued to apparently make progress.
The disk busy % on both the RAID1 and external USB2 drive was only
around 30%, but any use of disks was very very slow. If I suspended
rdiff-backup, all seemed well, and I saw ~30 MB/s bulk read with dd and
several hundred tps with du.

With ktrace, I found that rdiff-backup seems to do fsync on every file,
and each fsync takes most of a second. Other than that, ktrace events
happen in quick succession.

With (yes, I know I'm living dangerously, and the machine is on a UPS):

  sysctl -w vfs.wapbl.flush_disk_cache=0

I am seeing about 11 transactions flushed per second (via
sysctl -w vfs.wapbl.verbose_commit=1), and typically 300 tps and 3 MB/s
on the USB2 disk. In 15 minutes the processed-file count has gone up
10K, vs 25K in about 4 hours. The previous backup of a smaller fs did
230K files in 18h. So things are still slow, but much better.

I have run rdiff-backup from a remote machine (with
host::/mnt/rdiff-backup/foo/bar as target), and that has seemed to be
ok, processing a 300G filesystem (on which not much changes) in about
an hour, via a 10 Mb/s network connection to a net5501 (lame USB) with
a similar disk. I haven't looked yet, but it seems that rdiff-backup
must not do fsync when using the ssh backend.

So, I'm inclined to patch rdiff-backup not to fsync, since it seems
excessive, and the backup is toast if the machine crashes before it is
finished -- in that case rdiff-backup just rolls back. Opinions?

I also wonder if we should have a vfs.wapbl.honor_fsync sysctl. But it
seems the real issue here is fsync when there shouldn't be fsync, and
such a sysctl seems a bit scary and otherwise unnecessary.
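For anyone wanting to check per-fsync cost on their own disk without
ktrace, a small probe along the following lines might do. It is only a
sketch: the function name time_fsyncs and the probe file names are made
up, and it measures wall-clock time around os.fsync() on small scratch
files, which is not the same workload as rdiff-backup.

```python
import os
import time


def time_fsyncs(directory, count=10, size=4096):
    """Write `count` small files, fsync each, return seconds per fsync."""
    timings = []
    payload = b"\0" * size
    for i in range(count):
        path = os.path.join(directory, "fsync-probe-%d" % i)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, payload)
            start = time.monotonic()
            os.fsync(fd)                      # the call being measured
            timings.append(time.monotonic() - start)
        finally:
            os.close(fd)
            os.unlink(path)                   # leave no probe files behind
    return timings
```

Running it against a directory on the USB disk, once with the default
settings and once with vfs.wapbl.flush_disk_cache=0, would show how
much of each fsync is the disk cache flush.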