Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-11-01 Thread Greg Troxel

chris...@astron.com (Christos Zoulas) writes:

 In article 20010318.pa13ihod001...@ginseng.pulsar-zone.net,
 Matthew Mondor  mm_li...@pulsar-zone.net wrote:
On Mon, 31 Oct 2011 19:58:27 -0400
Greg Troxel g...@ir.bbn.com wrote:

 Obligatory actual netbsd tech-kern content: It seems like we really need
 a sync_synchronous(2) system call that guarantees that all file system
 operations that have completed (syscall returned) before the issuance of
 the sync_synchronous call are on disk before sync_synchronous returns.
 It seems odd that for sync, there is no waiting, fsync seems to wait,
 and fsync_range can flush or not flush caches, more or less.

Hmm since in sync(2), the non-synchronous issue is noted as a bug:

BUGS
 sync() may return before the buffers are completely flushed.

Does this mean that sync(2) should normally be synchronous and fixed to
be, such that sync_synchronous(2) not be necessary?

 Which sync man page are you reading? Ours has:

  Historically, sync() would schedule buffers for writing but not actually
  wait for the writes to finish.  It was necessary to issue a second or
  sometimes a third call to ensure that all buffers had in fact been writ-
  ten out.  In NetBSD, sync() does not return until all buffers have been
  written.

My man page on 5.1 matches Matthew's.

But, does sync do cache flushes on all disks as well?

Does SUS require this?




Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-11-01 Thread Thor Lancelot Simon
On Tue, Nov 01, 2011 at 09:54:45AM -0400, Greg Troxel wrote:
 
 My man page on 5.1 matches Matthew's.
 
 But, does sync do cache flushes on all disks as well?
 
 Does SUS require this?

I believe fsync_range with FDATASYNC is required to.  Note that
since it's guaranteed to sync sufficient metadata to.. it should
force directory updates, file size updates, etc. out to disk as
well -- without flushing the entire kernel page or metadata cache.

Unfortunately, since *disk* cache flushes are rather a blunt
instrument this will still harm performance more than it ought.
It'd be nice to have a concept of ordered tags as barriers like
Linux does (see my old B_BARRIER proposal) but that'd still need
tagged queueing support for ATA disks to be really useful today.

-- 
Thor Lancelot Simon    t...@panix.com
  All of my opinions are consistent, but I cannot present them all
   at once.-Jean-Jacques Rousseau, On The Social Contract


Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-31 Thread Greg Troxel

Thanks for the comments.

This is rdiff-backup, not rsync, and it has the notion of considering
the modified mirror dirty until it finishes, and it will roll back on
restart.  I am not clear how well it verifies contents (or
timestamps before the last full-backup timestamp?).  I am also not clear
if it's fsyncing each file before putting it in the log.

That's interesting about working around ext4 issues.  The code also has
(bizarre) calls to fsync the directory that a file is in, after fsyncing
the file.
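
For reference, the file-then-directory fsync pattern usually looks something
like the sketch below.  It is only an illustration in Python, not
rdiff-backup's actual code, and the helper name durable_write is made up.
The usual reason given for the second fsync is that the file's fsync is not
guaranteed on every file system to push out the new directory entry.

```python
import os

def durable_write(path, data):
    """Write data to path, then fsync the file and its parent directory.

    Hypothetical helper for illustration; not taken from rdiff-backup.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to write the file out

    # Second fsync: the directory that holds the (possibly new) entry.
    parent = os.path.dirname(os.path.abspath(path))
    dirfd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

On a disk where every fsync implies a cache flush, this is two flushes per
file, which would fit the slowdown described here.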

I think what's really killing my performance is that cache flush on
these disks is expensive, and that's part of fsync.

So probably we need a way to call sync(2) and guarantee that everything
that was dirty at call time is written before return, like fsync, and to
do that after writing the data and before writing the commit file.  The
real issue is ordering and making sure all the data and per-file
metadata is on disk before writing the file that says the backup
succeeded, and I don't see that we/posix have a good way to express
that, other than sync(2) and wait 30s, which isn't so bad.
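
A minimal sketch of that ordering, assuming sync(2) really does wait (which
is the open question in this thread): write all the data, call sync once,
and only then create the commit file.  The function name commit_backup and
the marker name backup.complete are invented for the example.

```python
import os

def commit_backup(backup_dir, marker_name="backup.complete"):
    """Flush all dirty data, then write and fsync a commit marker.

    Assumption: os.sync() does not return until earlier writes are on
    disk.  If sync(2) only schedules the writes, this ordering is not
    actually guaranteed.
    """
    os.sync()  # everything written so far, on every file system

    marker = os.path.join(backup_dir, marker_name)
    with open(marker, "w") as f:
        f.write("ok\n")
        f.flush()
        os.fsync(f.fileno())  # the marker itself must also reach disk

    # Make the marker's directory entry durable as well.
    dirfd = os.open(backup_dir, os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
    return marker
```

This costs one sync plus two fsyncs per backup run instead of one or two
fsyncs per file.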

With a remote-over-ssh target, there are fsync calls on files opened but
not written to, and with a non-WAPBL disk these are fast.

I've brought this up on the rdiff-backup list; it appears the maintainer
has gone missing.


Obligatory actual netbsd tech-kern content: It seems like we really need
a sync_synchronous(2) system call that guarantees that all file system
operations that have completed (syscall returned) before the issuance of
the sync_synchronous call are on disk before sync_synchronous returns.
It seems odd that for sync, there is no waiting, fsync seems to wait,
and fsync_range can flush or not flush caches, more or less.




Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-31 Thread Matthew Mondor
On Mon, 31 Oct 2011 19:58:27 -0400
Greg Troxel g...@ir.bbn.com wrote:

 Obligatory actual netbsd tech-kern content: It seems like we really need
 a sync_synchronous(2) system call that guarantees that all file system
 operations that have completed (syscall returned) before the issuance of
 the sync_synchronous call are on disk before sync_synchronous returns.
 It seems odd that for sync, there is no waiting, fsync seems to wait,
 and fsync_range can flush or not flush caches, more or less.

Hmm since in sync(2), the non-synchronous issue is noted as a bug:

BUGS
 sync() may return before the buffers are completely flushed.

Does this mean that sync(2) should normally be synchronous and fixed to
be, such that sync_synchronous(2) not be necessary?
-- 
Matt


Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-29 Thread Joerg Sonnenberger
On Sat, Oct 29, 2011 at 12:26:03PM +, David Holland wrote:
 However, a tool that really supports commit/abort semantics (unlike
 rsync) shouldn't need to sync at all until it's done.

Actually, rsync could easily do it more intelligently without risk too.
Before setting the mtime to the correct value, it has to f(data)sync.
That's OK -- it just doesn't have to do it after every file, but can
aggregate them.
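
A rough sketch of that aggregation in Python (not rsync's code; copy_batch
is a made-up name): copy a batch of files, sync the whole batch, and only
afterwards set the mtimes, so a destination file never carries the "correct"
mtime before its data has been flushed.

```python
import os
import shutil

def copy_batch(pairs):
    """Copy (src, dst) pairs, flush them as a group, then set mtimes last.

    Illustrative only.  Whether batching the fsync calls saves disk cache
    flushes depends on the file system and the drive.
    """
    # Pass 1: copy data only; dst gets a fresh "now" mtime at this point.
    for src, dst in pairs:
        shutil.copyfile(src, dst)

    # Pass 2: flush every destination file in the batch.
    for _, dst in pairs:
        fd = os.open(dst, os.O_RDWR)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)

    # Pass 3: only now mark the files as up to date.
    for src, dst in pairs:
        st = os.stat(src)
        os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
```

If the machine crashes before pass 3, the destination mtimes do not match
the source, so the next run copies those files again.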

Joerg


Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-28 Thread Matthew Mondor
On Fri, 28 Oct 2011 20:33:29 -0400
Greg Troxel g...@ir.bbn.com wrote:

 So, I'm inclined to patch rdiff-backup not to fsync, since it seems
 excessive, and the backup is toast if the machine crashes before it is
 finished -- in that case rdiff-backup just rolls back.  Opinions?

I also wonder why fsync would be used for every file, especially if you
consider a whole run a single transaction, even more so if using
snapshots (although you don't mention using them).  In which case it
simply should report failure and abort on any open/write/rename/close
error, and at the end, fsync once, also checking for error.  If at
that point everything was successful, the transaction is committed (as
far as software is concerned, of course, hardware buffers might still
need flushing), otherwise everything should be rolled back, unless an
inconsistent state is allowed (where the next full backup might fix
that).

I'm however wondering if the excessive fsync(2)s weren't eventually
added because of issues with ext4, as I somehow remember unix semantic
exceptions with it, and know that some have lost files using it as
they'd normally safely use other file systems (and I haven't followed
progress to know if it's since fixed).

But if rdiff-backup cannot optionally avoid those, adding an option to
tell it not to fsync at every file as you suggested would be very sane
IMO (it still could default to sync mode, in case there's upstream
resistance)...

I can understand the need for some transaction-logging applications to
call fdatasync(2) regularly, but that's another matter (and even then
it's usually configurable after how many bytes or seconds to call it to
allow the administrator to tweak performance).
-- 
Matt


Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-28 Thread Alan Barrett

Matthew Mondor wrote:

Greg Troxel g...@ir.bbn.com wrote:
So, I'm inclined to patch rdiff-backup not to fsync, since it 
seems excessive, and the backup is toast if the machine crashes 
before it is finished -- in that case rdiff-backup just rolls 
back.  Opinions?


I also wonder why fsync would be used for every file, especially 
if you consider a whole run a single transaction, even more so 
if using snapshots (although you don't mention using them).


If rdiff-backup was easily able to roll back after a crash, then 
I'd probably agree with the above.  But it's expensive to roll 
back (you have to compare the actual data in the files, without 
assuming that {same size, same mtime} implies same data).


The current state of ffs+wapbl is that, if the system crashes and 
the log is replayed, then files that had been written shortly 
before the crash end up with whatever old data happened to be 
in the underlying disk blocks, but new metadata indicating that 
the size and timestamps are all up to date.  I think that this 
violates traditional unix file system semantics, but the people 
who worked on wapbl don't seem to think it's a problem.


Anyway, the new metadata with old data tends to make rsync (and 
probably rdiff-backup) think that the file is up to date, and 
so not copy it again next time (unless you perform an expensive 
comparison of all the data, not just the metadata).
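
To make the difference concrete, here is a small Python sketch of the two 
kinds of check.  The function names are invented for the example and are 
not taken from rsync or rdiff-backup.

```python
import hashlib
import os

def same_metadata(a, b):
    """Cheap check: same size and same mtime."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and sa.st_mtime_ns == sb.st_mtime_ns

def file_digest(path, chunk=1 << 20):
    """SHA-256 of the file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def same_content(a, b):
    """Expensive check: reads every byte of both files."""
    return file_digest(a) == file_digest(b)
```

After a crash of the kind described, same_metadata() can return True for a 
file whose blocks still hold old data; only same_content() would notice, at 
the cost of reading both files in full.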


I have patched rsync to issue fdatasync(2) calls frequently, 
to mitigate this problem in my own usage.  It does slow it 
down, but nowhere near as dramatically as you report.  (I use 
NetBSD-current.)


--apb (Alan Barrett)