Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
chris...@astron.com (Christos Zoulas) writes:

> In article 20010318.pa13ihod001...@ginseng.pulsar-zone.net,
> Matthew Mondor mm_li...@pulsar-zone.net wrote:
>
>> On Mon, 31 Oct 2011 19:58:27 -0400 Greg Troxel g...@ir.bbn.com wrote:
>>
>>> Obligatory actual netbsd tech-kern content: It seems like we really need a sync_synchronous(2) system call that guarantees that all file system operations that have completed (syscall returned) before the issuance of the sync_synchronous call are on disk before sync_synchronous returns. It seems odd that for sync, there is no waiting, fsync seems to wait, and fsync_range can flush or not flush caches, more or less.
>>
>> Hmm since in sync(2), the non-synchronous issue is noted as a bug:
>>
>>     BUGS
>>     sync() may return before the buffers are completely flushed.
>>
>> Does this mean that sync(2) should normally be synchronous and fixed to be, such that sync_synchronous(2) not be necessary?
>
> Which sync man page are you reading? Ours has:
>
>     Historically, sync() would schedule buffers for writing but not
>     actually wait for the writes to finish. It was necessary to issue a
>     second or sometimes a third call to ensure that all buffers had in
>     fact been written out. In NetBSD, sync() does not return until all
>     buffers have been written.

My man page on 5.1 matches Matthew's. But, does sync do cache flushes on all disks as well? Does SUS require this?
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
On Tue, Nov 01, 2011 at 09:54:45AM -0400, Greg Troxel wrote:

> My man page on 5.1 matches Matthew's. But, does sync do cache flushes on all disks as well? Does SUS require this?

I believe fsync_range with FDATASYNC is required to. Note that since it's guaranteed to sync sufficient metadata to.. it should force directory updates, file size updates, etc. out to disk as well -- without flushing the entire kernel page or metadata cache.

Unfortunately, since *disk* cache flushes are rather a blunt instrument this will still harm performance more than it ought. It'd be nice to have a concept of ordered tags as barriers like Linux does (see my old B_BARRIER proposal) but that'd still need tagged queueing support for ATA disks to be really useful today.

--
Thor Lancelot Simon t...@panix.com
All of my opinions are consistent, but I cannot present them all at once. -Jean-Jacques Rousseau, On The Social Contract
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
Thanks for the comments.

This is rdiff-backup, not rsync, and it has the notion of considering the modified mirror dirty until it finishes, and it will roll back on restart. I am not clear how well it verifies contents (or timestamps before the last full-backup timestamp?). I am also not clear whether it's fsyncing each file before putting it in the log.

That's interesting about working around ext4 issues. The code also has (bizarre) calls to fsync the directory that a file is in, after fsyncing the file.

I think what's really killing my performance is that cache flush on these disks is expensive, and that's part of fsync. So probably we need a way to call sync(2) and guarantee that everything that was dirty at call time is written before return, like fsync, and to do that after writing the data and before writing the commit file. The real issue is ordering and making sure all the data and per-file metadata is on disk before writing the file that says the backup succeeded, and I don't see that we/POSIX have a good way to express that, other than sync(2) and wait 30s, which isn't so bad.

With a remote-over-ssh target, there are fsync calls on files opened but not written to, and with a non-WAPBL disk these are fast.

I've brought this up on the rdiff-backup list; it appears the maintainer has gone missing.

Obligatory actual netbsd tech-kern content: It seems like we really need a sync_synchronous(2) system call that guarantees that all file system operations that have completed (syscall returned) before the issuance of the sync_synchronous call are on disk before sync_synchronous returns. It seems odd that for sync, there is no waiting, fsync seems to wait, and fsync_range can flush or not flush caches, more or less.
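The ordering requirement described above (all data and per-file metadata on disk before the file that says the backup succeeded) can be sketched in userland roughly as follows. This is only an illustration, not rdiff-backup's code: `write_backup`, `fsync_path`, and the marker name `backup.complete` are hypothetical names, and the sketch assumes a Unix-like system where a directory can be opened read-only and fsynced. It shows the ordering only; whether deferring the fsyncs to the end is actually cheaper than interleaving them depends on the file system and the drive.

```python
import os


def fsync_path(path):
    """Open a file or directory read-only and fsync it (Unix-like systems)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def write_backup(dest_dir, files, marker_name="backup.complete"):
    """Write all files, make them durable, and only then write the marker.

    `files` maps a file name to its bytes. The marker name is hypothetical.
    """
    written = []
    # Phase 1: write everything without syncing.
    for name, data in files.items():
        path = os.path.join(dest_dir, name)
        with open(path, "wb") as f:
            f.write(data)
        written.append(path)

    # Phase 2: one pass of fsyncs over the data, then the directory
    # so that the new directory entries are durable as well.
    for path in written:
        fsync_path(path)
    fsync_path(dest_dir)

    # Phase 3: the commit marker is written only after phase 2 returned.
    marker = os.path.join(dest_dir, marker_name)
    with open(marker, "wb") as f:
        f.write(b"ok\n")
        f.flush()
        os.fsync(f.fileno())
    fsync_path(dest_dir)
    return marker
```

A synchronous sync(2), if one could rely on it, would replace the whole of phase 2 with a single call; that is the gap the proposed sync_synchronous(2) is meant to fill.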
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
On Mon, 31 Oct 2011 19:58:27 -0400 Greg Troxel g...@ir.bbn.com wrote:

> Obligatory actual netbsd tech-kern content: It seems like we really need a sync_synchronous(2) system call that guarantees that all file system operations that have completed (syscall returned) before the issuance of the sync_synchronous call are on disk before sync_synchronous returns. It seems odd that for sync, there is no waiting, fsync seems to wait, and fsync_range can flush or not flush caches, more or less.

Hmm since in sync(2), the non-synchronous issue is noted as a bug:

    BUGS
    sync() may return before the buffers are completely flushed.

Does this mean that sync(2) should normally be synchronous and fixed to be, such that sync_synchronous(2) not be necessary?

--
Matt
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
On Sat, Oct 29, 2011 at 12:26:03PM +, David Holland wrote:

> However, a tool that really supports commit/abort semantics (unlike rsync) shouldn't need to sync at all until it's done.

Actually, rsync could easily do it more intelligently without risk too. Before setting the mtime to the correct value, it has to f(data)sync. That's OK -- it just doesn't have to do it after every file, but can aggregate them.

Joerg
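A rough sketch of the aggregation suggested above, under stated assumptions: this is not rsync's code, `copy_batch` is a hypothetical helper, and `os.fdatasync` is used where the platform provides it, falling back to `os.fsync` otherwise. The point is only that the correct mtime is applied after the data sync for that file has returned, while the syncs themselves are done in one pass at the end of the batch.

```python
import os

# fdatasync is not available on every platform; fall back to fsync.
_datasync = getattr(os, "fdatasync", os.fsync)


def copy_batch(dest_dir, items):
    """Write a batch of files, then sync them and set mtimes in one pass.

    `items` is a list of (name, data, mtime) tuples; mtime is in seconds.
    """
    pending = []
    # Pass 1: write the data only. The files keep a "now" mtime, so a
    # later run that compares mtimes will not mistake them for complete.
    for name, data, mtime in items:
        path = os.path.join(dest_dir, name)
        with open(path, "wb") as f:
            f.write(data)
        pending.append((path, mtime))

    # Pass 2: sync each file's data, and only then set its final mtime.
    for path, mtime in pending:
        fd = os.open(path, os.O_RDONLY)
        try:
            _datasync(fd)
        finally:
            os.close(fd)
        os.utime(path, (mtime, mtime))
    return [path for path, _ in pending]
```

How large a batch to aggregate is a tuning choice; the sketch syncs the whole batch at once for brevity.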
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
On Fri, 28 Oct 2011 20:33:29 -0400 Greg Troxel g...@ir.bbn.com wrote:

> So, I'm inclined to patch rdiff-backup not to fsync, since it seems excessive, and the backup is toast if the machine crashes before it is finished -- in that case rdiff-backup just rolls back. Opinions?

I also wonder why fsync would be used for every file, especially if you consider a whole run a single transaction, even more so if using snapshots (although you don't mention using them). In which case it simply should report failure and abort on any open/write/rename/close error, and at the end, fsync once, also checking for error. If at that point everything was successful, the transaction is committed (as far as software is concerned, of course, hardware buffers might still need flushing), otherwise everything should be rolled back, unless an inconsistent state is allowed (where the next full backup might fix that).

I'm however wondering if the excessive fsync(2)s weren't eventually added because of issues with ext4, as I somehow remember unix semantic exceptions with it, and know that some have lost files using it in ways that would normally be safe on other file systems (and I haven't followed progress to know if it's since fixed).

But if rdiff-backup cannot optionally avoid those, adding an option to tell it not to fsync at every file as you suggested would be very sane IMO (it still could default to sync mode, in case there's upstream resistance)...

I can understand the need for some transaction-logging applications to call fdatasync(2) regularly, but that's another matter (and even then it's usually configurable after how many bytes or seconds to call it to allow the administrator to tweak performance).

--
Matt
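The "configurable after how many bytes" idea mentioned above might look roughly like this. It is a minimal sketch, not taken from any particular application: `SyncingWriter` and its `sync_every_bytes` parameter are hypothetical, and `os.fdatasync` is replaced by `os.fsync` on platforms that lack it.

```python
import os

# fdatasync is not available on every platform; fall back to fsync.
_datasync = getattr(os, "fdatasync", os.fsync)


class SyncingWriter:
    """Append-only writer that syncs after a configurable number of bytes."""

    def __init__(self, path, sync_every_bytes):
        self._f = open(path, "ab")
        self._limit = sync_every_bytes
        self._unsynced = 0
        self.sync_count = 0  # exposed so the tuning effect can be observed

    def write(self, data):
        self._f.write(data)
        self._unsynced += len(data)
        if self._unsynced >= self._limit:
            self.sync()

    def sync(self):
        # Push Python's buffer to the kernel, then the kernel's to the disk.
        self._f.flush()
        _datasync(self._f.fileno())
        self._unsynced = 0
        self.sync_count += 1

    def close(self):
        # Anything still unsynced is flushed once before closing.
        if self._unsynced:
            self.sync()
        self._f.close()
```

Raising `sync_every_bytes` trades a larger window of possibly lost data after a crash for fewer sync calls, which is the knob an administrator would tweak.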
Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive
Matthew Mondor wrote:

> Greg Troxel g...@ir.bbn.com wrote:
>
>> So, I'm inclined to patch rdiff-backup not to fsync, since it seems excessive, and the backup is toast if the machine crashes before it is finished -- in that case rdiff-backup just rolls back. Opinions?
>
> I also wonder why fsync would be used for every file, especially if you consider a whole run a single transaction, even more so if using snapshots (although you don't mention using them).

If rdiff-backup was easily able to roll back after a crash, then I'd probably agree with the above. But it's expensive to roll back (you have to compare the actual data in the files, without assuming that {same size, same mtime} implies same data).

The current state of ffs+wapbl is that, if the system crashes and the log is replayed, then files that had been written shortly before the crash end up with whatever old data happened to be in the underlying disk blocks, but new metadata indicating that the size and timestamps are all up to date. I think that this violates traditional unix file system semantics, but the people who worked on wapbl don't seem to think it's a problem.

Anyway, the new metadata with old data tends to make rsync (and probably rdiff-backup) think that the file is up to date, and so not copy it again next time (unless you perform an expensive comparison of all the data, not just the metadata).

I have patched rsync to issue fdatasync(2) calls frequently, to mitigate this problem in my own usage. It does slow it down, but nowhere near as dramatically as you report. (I use NetBSD-current.)

--apb (Alan Barrett)
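To illustrate why {same size, same mtime} is not enough after such a crash, here is a small sketch contrasting a metadata-only check with a full content comparison. The helper names `same_metadata` and `same_content` are hypothetical, and SHA-256 is an arbitrary choice for the comparison; neither rsync nor rdiff-backup is claimed to work this way.

```python
import hashlib
import os


def same_metadata(a, b):
    """Cheap check: equal size and equal mtime (whole seconds)."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def _digest(path, chunk_size=1 << 20):
    """Hash the file contents in chunks so large files are not read at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(a, b):
    """Expensive check: every byte of both files has to be read."""
    return _digest(a) == _digest(b)
```

A file whose blocks hold stale data but whose size and timestamps were replayed from the log would pass `same_metadata` and fail `same_content`, which is the case described above.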