Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-11-01 Thread Thor Lancelot Simon
On Tue, Nov 01, 2011 at 09:54:45AM -0400, Greg Troxel wrote:
> 
> My man page on 5.1 matches Matthew's.
> 
> But, does sync do cache flushes on all disks as well?
> 
> Does SUS require this?

I believe fsync_range with FDATASYNC is required to.  Note that
since it's guaranteed to sync "sufficient metadata to.." it should
force directory updates, file size updates, etc. out to disk as
well -- without flushing the entire kernel page or metadata cache.

Unfortunately, since *disk* cache flushes are rather a blunt
instrument this will still harm performance more than it ought.
It'd be nice to have a concept of ordered tags as barriers like
Linux does (see my old B_BARRIER proposal) but that'd still need
tagged queueing support for ATA disks to be really useful today.

-- 
Thor Lancelot Simon  t...@panix.com
  "All of my opinions are consistent, but I cannot present them all
   at once." -Jean-Jacques Rousseau, On The Social Contract


Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-11-01 Thread Greg Troxel

chris...@astron.com (Christos Zoulas) writes:

> In article <20010318.pa13ihod001...@ginseng.pulsar-zone.net>,
> Matthew Mondor   wrote:
>>On Mon, 31 Oct 2011 19:58:27 -0400
>>Greg Troxel  wrote:
>>
>>> Obligatory actual netbsd tech-kern content: It seems like we really need
>>> a sync_synchronous(2) system call that guarantees that all file system
>>> operations that have completed (syscall returned) before the issuance of
>>> the sync_synchronous call are on disk before sync_synchronous returns.
>>> It seems odd that for sync, there is no waiting, fsync seems to wait,
>>> and fsync_range can flush or not flush caches, more or less.
>>
>>Hmm since in sync(2), the non-synchronous issue is noted as a bug:
>>
>>BUGS
>> sync() may return before the buffers are completely flushed.
>>
>>Does this mean that sync(2) should normally be synchronous and fixed to
>>be, such that sync_synchronous(2) not be necessary?
>
> Which sync man page are you reading? Ours has:
>
>  Historically, sync() would schedule buffers for writing but not actually
>  wait for the writes to finish.  It was necessary to issue a second or
>  sometimes a third call to ensure that all buffers had in fact been writ-
>  ten out.  In NetBSD, sync() does not return until all buffers have been
>  written.

My man page on 5.1 matches Matthew's.

But, does sync do cache flushes on all disks as well?

Does SUS require this?




Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-11-01 Thread Christos Zoulas
In article <20010318.pa13ihod001...@ginseng.pulsar-zone.net>,
Matthew Mondor   wrote:
>On Mon, 31 Oct 2011 19:58:27 -0400
>Greg Troxel  wrote:
>
>> Obligatory actual netbsd tech-kern content: It seems like we really need
>> a sync_synchronous(2) system call that guarantees that all file system
>> operations that have completed (syscall returned) before the issuance of
>> the sync_synchronous call are on disk before sync_synchronous returns.
>> It seems odd that for sync, there is no waiting, fsync seems to wait,
>> and fsync_range can flush or not flush caches, more or less.
>
>Hmm since in sync(2), the non-synchronous issue is noted as a bug:
>
>BUGS
> sync() may return before the buffers are completely flushed.
>
>Does this mean that sync(2) should normally be synchronous and fixed to
>be, such that sync_synchronous(2) not be necessary?

Which sync man page are you reading? Ours has:

 Historically, sync() would schedule buffers for writing but not actually
 wait for the writes to finish.  It was necessary to issue a second or
 sometimes a third call to ensure that all buffers had in fact been writ-
 ten out.  In NetBSD, sync() does not return until all buffers have been
 written.

christos



Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-31 Thread Matthew Mondor
On Mon, 31 Oct 2011 19:58:27 -0400
Greg Troxel  wrote:

> Obligatory actual netbsd tech-kern content: It seems like we really need
> a sync_synchronous(2) system call that guarantees that all file system
> operations that have completed (syscall returned) before the issuance of
> the sync_synchronous call are on disk before sync_synchronous returns.
> It seems odd that for sync, there is no waiting, fsync seems to wait,
> and fsync_range can flush or not flush caches, more or less.

Hmm since in sync(2), the non-synchronous issue is noted as a bug:

BUGS
 sync() may return before the buffers are completely flushed.

Does this mean that sync(2) should normally be synchronous and fixed to
be, such that sync_synchronous(2) not be necessary?
-- 
Matt


Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-31 Thread Greg Troxel

Thanks for the comments.

This is rdiff-backup, not rsync, and it has the notion of considering
the modified mirror dirty until it finishes, and it will roll back on
restart.  I am not clear how well it does about verifying contents (or
timestamps before the last full-backup timestamp?).  I am also not clear
if it's fsyncing each file before putting it in the log.

That's interesting about working around ext4 issues.  The code also has
(bizarre) calls to fsync the directory that a file is in, after fsyncing
the file.
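
The file-then-directory pattern described above can be sketched as follows. This is a minimal illustration, not rdiff-backup's actual code: the helper name `write_durably` is hypothetical, and whether the directory fsync is needed at all depends on the file system. The usual motive is that fsync on a file does not necessarily make its directory entry durable.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data to path, fsync the file, then fsync its parent directory.

    Hypothetical helper for illustration only.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # move Python's userspace buffer into the kernel
        os.fsync(f.fileno())   # ask the kernel to push the file to stable storage

    # Also fsync the containing directory so the new directory entry
    # is pushed out, which is the second call the thread finds odd.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Each call costs two flushes, which is why doing this per file is so expensive on a disk where a cache flush is slow.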

I think what's really killing my performance is that cache flush on
these disks is expensive, and that's part of fsync.

So probably we need a way to call sync(2) and guarantee that everything
that was dirty at call time is written before return, like fsync, and to
do that after writing the data and before writing the commit file.  The
real issue is ordering and making sure all the data and per-file
metadata is on disk before writing the file that says the backup
succeeded, and I don't see that we/posix have a good way to express
that, other than sync(2) and wait 30s, which isn't so bad.
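
A rough sketch of that ordering, assuming a Unix-like system where Python's `os.sync()` is available: write all the data, issue one system-wide sync, and only then write the file that marks the backup as complete. The function name `run_backup` and the marker name `backup.complete` are hypothetical. As discussed in this thread, whether sync(2) actually waits for the writes (and whether it flushes disk caches) varies by system, so this shows the intended ordering rather than a guarantee.

```python
import os


def run_backup(dest_dir, files, marker_name="backup.complete"):
    """Write every file, sync once, then write the commit marker.

    files maps a file name to its bytes. Names here are illustrative only.
    """
    for name, data in files.items():
        with open(os.path.join(dest_dir, name), "wb") as f:
            f.write(data)      # no per-file fsync

    # One whole-system flush after all data is written.
    # Caveat: on some systems sync(2) may return before the writes finish.
    os.sync()

    # Only now record success, and make the marker itself durable.
    marker = os.path.join(dest_dir, marker_name)
    with open(marker, "w") as f:
        f.write("ok\n")
        f.flush()
        os.fsync(f.fileno())
    return marker
```

If the machine crashes before the marker exists, the run is treated as failed and rolled back, so no individual file needs its own fsync.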

With a remote-over-ssh target, there are fsync calls on files opened but
not written to, and with a non-WAPBL disk these are fast.

I've brought this up on the rdiff-backup list; it appears the maintainer
has gone missing.


Obligatory actual netbsd tech-kern content: It seems like we really need
a sync_synchronous(2) system call that guarantees that all file system
operations that have completed (syscall returned) before the issuance of
the sync_synchronous call are on disk before sync_synchronous returns.
It seems odd that for sync, there is no waiting, fsync seems to wait,
and fsync_range can flush or not flush caches, more or less.




Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-29 Thread Joerg Sonnenberger
On Sat, Oct 29, 2011 at 12:26:03PM +, David Holland wrote:
> However, a tool that really supports commit/abort semantics (unlike
> rsync) shouldn't need to sync at all until it's done.

Actually, rsync could easily do it more intelligently without risk too.
Before setting the mtime to the correct value, it has to f(data)sync.
That's OK -- it just doesn't have to do it after every file, but can
aggregate them.
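
A minimal sketch of that aggregation, not rsync's actual code: write a group of files, sync their data in one pass, and only afterwards set each mtime to its final value. The name `copy_then_stamp` is hypothetical, and `os.fdatasync` is not available on every platform, so the sketch falls back to `os.fsync` where it is missing.

```python
import os

# fdatasync is missing on some platforms; fall back to fsync there.
_datasync = getattr(os, "fdatasync", os.fsync)


def copy_then_stamp(items, dest_dir):
    """items is a list of (name, data, mtime) tuples (illustrative shape).

    Data is synced for the whole group before any final mtime is set,
    so a file never carries its final mtime before its data was synced.
    """
    pending = []
    for name, data, mtime in items:
        path = os.path.join(dest_dir, name)
        with open(path, "wb") as f:
            f.write(data)
        pending.append((path, mtime))

    # One aggregated pass of data syncs instead of one per write.
    for path, _ in pending:
        fd = os.open(path, os.O_WRONLY)
        try:
            _datasync(fd)
        finally:
            os.close(fd)

    # Only now mark the files as up to date.
    for path, mtime in pending:
        os.utime(path, (mtime, mtime))
```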

Joerg


Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-29 Thread David Holland
On Sat, Oct 29, 2011 at 07:54:44AM +0200, Alan Barrett wrote:
 > The current state of ffs+wapbl is that, if the system crashes and
 > the log is replayed, then files that had been written shortly
 > before the crash end up with whatever old data happened to be in
 > the underlying disk blocks, but new metadata indicating that the
 > size and timestamps are all up to date.  I think that this violates
 > traditional unix file system semantics, but the people who worked
 > on wapbl don't seem to think it's a problem.

It doesn't violate traditional semantics because ffs has always had
this bug, but it is a bug. However, it's gotten more noticeable; I
think this is mostly because wapbl is much faster creating files, so
bulk copies/untars can accumulate much more unflushed data in the same
amount of time. Also it gets worse as memory sizes increase.

fsyncing everything is inherently expensive. fsyncing every file
(especially if many of them are small) is inherently *very* expensive.

Try hacking it to fsync in batches, or even (if the machine isn't
doing much else at the same time) to just call sync every half second
or something. This should make it markedly faster without much loss of
integrity.
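
The "sync every half second" variant could look roughly like this. It is a sketch under stated assumptions: the class name `PeriodicSync` is hypothetical, `os.sync()` requires a Unix-like system, and the `sync` and `clock` parameters exist only so the behavior can be tested without touching the disk.

```python
import os
import time


class PeriodicSync:
    """Call sync() at most once per interval; tick() is called after each file."""

    def __init__(self, interval=0.5, sync=os.sync, clock=time.monotonic):
        self.interval = interval
        self._sync = sync
        self._clock = clock
        self._last = clock()
        self.calls = 0

    def tick(self):
        """Sync if at least `interval` seconds have passed since the last sync."""
        now = self._clock()
        if now - self._last >= self.interval:
            self._sync()
            self._last = now
            self.calls += 1
            return True
        return False
```

A backup loop would call `tick()` after writing each file, so many small files share one flush instead of paying for one each.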

However, a tool that really supports commit/abort semantics (unlike
rsync) shouldn't need to sync at all until it's done.

-- 
David A. Holland
dholl...@netbsd.org


Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-28 Thread Alan Barrett

Matthew Mondor wrote:
> Greg Troxel wrote:
>> So, I'm inclined to patch rdiff-backup not to fsync, since it
>> seems excessive, and the backup is toast if the machine crashes
>> before it is finished -- in that case rdiff-backup just rolls
>> back.  Opinions?
>
> I also wonder why fsync would be used for every file, especially
> if you consider a whole run a single "transaction", even more so
> if using snapshots (although you don't mention using them).


If rdiff-backup was easily able to roll back after a crash, then 
I'd probably agree with the above.  But it's expensive to roll 
back (you have to compare the actual data in the files, without 
assuming that {same size, same mtime} implies same data).


The current state of ffs+wapbl is that, if the system crashes and 
the log is replayed, then files that had been written shortly 
before the crash end up with whatever old data happened to be 
in the underlying disk blocks, but new metadata indicating that 
the size and timestamps are all up to date.  I think that this 
violates traditional unix file system semantics, but the people 
who worked on wapbl don't seem to think it's a problem.


Anyway, the new metadata with old data tends to make rsync (and 
probably rdiff-backup) think that the file is up to date, and 
so not copy it again next time (unless you perform an expensive 
comparison of all the data, not just the metadata).


I have patched rsync to issue fdatasync(2) calls frequently, 
to mitigate this problem in my own usage.  It does slow it 
down, but nowhere near as dramatically as you report.  (I use 
NetBSD-current.)


--apb (Alan Barrett)


Re: fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-28 Thread Matthew Mondor
On Fri, 28 Oct 2011 20:33:29 -0400
Greg Troxel  wrote:

> So, I'm inclined to patch rdiff-backup not to fsync, since it seems
> excessive, and the backup is toast if the machine crashes before it is
> finished -- in that case rdiff-backup just rolls back.  Opinions?

I also wonder why fsync would be used for every file, especially if you
consider a whole run a single "transaction", even more so if using
snapshots (although you don't mention using them).  In which case it
simply should report failure and abort on any open/write/rename/close
error, and at the end, fsync once, also checking for error.  If at
that point everything was successful, the "transaction" is committed (as
far as software is concerned, of course, hardware buffers might still
need flushing), otherwise everything should be rolled back, unless an
inconsistent state is allowed (where the next full backup might fix
that).

I'm however wondering if the excessive fsync(2)s weren't eventually
added because of issues with ext4, as I somehow remember unix semantic
exceptions with it, and know that some have lost files using it as
they'd normally safely use other file systems (and I haven't followed
progress to know if it's since fixed).

But if rdiff-backup cannot optionally avoid those, adding an option to
tell it not to fsync at every file as you suggested would be very sane
IMO (it still could default to sync mode, in case there's upstream
resistance)...

I can understand the need for some transaction-logging applications to
call fdatasync(2) regularly, but that's another matter (and even then
it's usually configurable after how many bytes or seconds to call it to
allow the administrator to tweak performance).
-- 
Matt


fsync, rdiff-backup, wapbl, and WD Elements 1T drive

2011-10-28 Thread Greg Troxel

netbsd-5, i386, 2 x 400G SATA in rf RAID1, external USB2 WD Elements 1T

I have a UFS2+WAPBL filesystem on the above RAID1 with ~900K files in
~320GB.  I'm backing it up with rdiff-backup to a USB2 external disk.
The external disk has a single large UFS2+WAPBL partition.

I found that backups took crazy long -- as in most of a week -- when not
that many of the files had changed.  The rdiff-backup process was often
in tstile or xscmd, but apparently continued to make progress.  The disk
busy % on both the RAID1 and external USB2 drive was only around 30%,
but any use of disks was very very slow.  If I suspended rdiff-backup,
all seemed well, and I saw ~30 MB/s bulk read with dd and several
hundred tps with du.

With ktrace, I found that rdiff-backup seems to do fsync on every file,
and each fsync takes most of a second.  Other than that ktrace events
happen in quick succession.
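
For anyone wanting to reproduce the measurement without ktrace, a small sketch along these lines times a single fsync on a given file system. The helper name `time_fsync` is hypothetical, and the result depends heavily on the disk, the file system, and settings such as the cache-flush behavior discussed below.

```python
import os
import time


def time_fsync(path, data=b"x"):
    """Write data to path and return how long the fsync call took, in seconds."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()                      # hand the data to the kernel first
        start = time.monotonic()
        os.fsync(f.fileno())           # the call being measured
        return time.monotonic() - start
```

Running it against a scratch file on the external disk and on the RAID1 would show whether the per-fsync cost is specific to one device.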

With (yes, I know I'm living dangerously, and the machine is on a UPS):
  sysctl -w vfs.wapbl.flush_disk_cache=0
I am seeing about 11 transactions flushed per second (via
  sysctl -w vfs.wapbl.verbose_commit=1
), and typically 300 tps and 3 MB/s on the USB2 disk.  In 15 minutes the
processed-file count has gone up 10K, vs 25K in about 4 hours.
The previous backup of a smaller fs did 230K files in 18h.

So things are still slow, but much better.

I have run rdiff-backup from a remote machine (with
host::/mnt/rdiff-backup/foo/bar as target), and that has seemed to be
ok, processing a 300G filesystem (on which not much changes) in about an
hour, via a 10 Mb/s network connection to a net5501 (lame USB) with a
similar disk.  I haven't looked yet, but it seems that rdiff-backup must
not do fsync when using the ssh backend.

So, I'm inclined to patch rdiff-backup not to fsync, since it seems
excessive, and the backup is toast if the machine crashes before it is
finished -- in that case rdiff-backup just rolls back.  Opinions?

I also wonder if we should have a vfs.wapbl.honor_fsync sysctl.  But it
seems the real issue here is fsync when there shouldn't be fsync, and
such a sysctl seems a bit scary and otherwise unnecessary.

