Re: Proposal to enable WAPBL by default for 10.0

2020-09-03 Thread Reinoud Zandijk
On Thu, Jul 23, 2020 at 11:06:27AM +0200, Hauke Fath wrote:
> On Thu, 23 Jul 2020 07:45:08 +0200, Michał Górny wrote:
> > How does that compare to the level of damage non-journaled FFS takes?
> 
> From then on, the sandbox was easily recoverable after every panic. It 
> turned out fsck times on a moderately-sized SSD were bearable - 
> certainly shorter than writing cleanup scripts, throwing out half of 
> the sandbox, and 'cvs update'ing the rest.

I disabled WAPBL on my SSD too; it is `only' 232 GB and fsck-ing indeed takes
a very short time, especially when comparing it to my rotational discs :)

Reinoud



Re: Proposal to enable WAPBL by default for 10.0

2020-07-27 Thread David Holland
On Sun, Jul 26, 2020 at 11:20:37PM +, m...@netbsd.org wrote:
 > > To be explicit:
 > > 
 > > It is the same underlying problem either way, and it is worse in practice
 > > with WAPBL than without because WAPBL ffs runs faster than non-WAPBL
 > > ffs and thus accumulates more unwritten blocks.
 > 
 > It looks like this difference is because FFS doesn't flush the disk
 > cache often, but if WAPBL is enabled, it does on every log write.

That would cause WAPBL to generate fewer unwritten blocks, which isn't
consistent with the observed results. (Or maybe it is, and without
this effect WAPBL would be even worse.)

But this is unlikely to be an issue in most cases, because data that
makes it to the disk-level cache is not lost just because the kernel
panics. You have to turn off the power for that.

-- 
David A. Holland
dholl...@netbsd.org


Re: Proposal to enable WAPBL by default for 10.0

2020-07-27 Thread Roy Marples

On 27/07/2020 11:58, nia wrote:

> Of course, it would also be nice to have the option of more filesystems
> in sysinst (ZFS, LFS in 10 assuming the remaining deadlocks mainly affect
> removable media - taylor?), and a noatime option for flash media.


Until we get bootloader support for ZFS we might need to consider teaching 
sysinst to create a small FFS root and pivot to the real ZFS root.

https://wiki.netbsd.org/wiki/RootOnZFS/

That would be nice for 10 at least.

Roy


Re: Proposal to enable WAPBL by default for 10.0

2020-07-27 Thread nia
It feels like we could avoid the controversy of whether it should be
enabled by default by making it an option in sysinst.

Of course, it would also be nice to have the option of more filesystems
in sysinst (ZFS, LFS in 10 assuming the remaining deadlocks mainly affect
removable media - taylor?), and a noatime option for flash media.


Re: Proposal to enable WAPBL by default for 10.0

2020-07-26 Thread Jason Thorpe



> On Jul 26, 2020, at 4:20 PM, m...@netbsd.org  wrote:
> 
> It looks like this difference is because FFS doesn't flush the disk
> cache often, but if WAPBL is enabled, it does on every log write.

Do you mean by issuing a SYNCHRONIZE_CACHE or similar command?  Be aware that 
there are a lot of USB->SATA bridges out there that happily ignore this command.

-- thorpej



Re: Proposal to enable WAPBL by default for 10.0

2020-07-26 Thread maya
On Thu, Jul 23, 2020 at 08:56:14PM +, David Holland wrote:
> On Thu, Jul 23, 2020 at 07:45:08AM +0200, Michał Górny wrote:
>  > > > Rationale: the default filesystem (FFS) without WAPBL is more prone to
>  > > > data loss.
>  > > 
>  > > It is not, unfortunately. We had WAPBL on by default some time back
>  > > and eventually switched it off.
>  > > 
>  > > The problem is that because it still doesn't do anything about
>  > > journaling or preserving file contents, but runs a lot faster, it
>  > > loses more data when interrupted.
>  > 
>  > How does that compare to the level of damage non-journaled FFS takes?
> 
> To be explicit:
> 
> It is the same underlying problem either way, and it is worse in practice
> with WAPBL than without because WAPBL ffs runs faster than non-WAPBL
> ffs and thus accumulates more unwritten blocks.

It looks like this difference is because FFS doesn't flush the disk
cache often, but if WAPBL is enabled, it does on every log write.


Re: Proposal to enable WAPBL by default for 10.0

2020-07-23 Thread David Holland
On Thu, Jul 23, 2020 at 07:45:08AM +0200, Michał Górny wrote:
 > > > Rationale: the default filesystem (FFS) without WAPBL is more prone to
 > > > data loss.
 > > 
 > > It is not, unfortunately. We had WAPBL on by default some time back
 > > and eventually switched it off.
 > > 
 > > The problem is that because it still doesn't do anything about
 > > journaling or preserving file contents, but runs a lot faster, it
 > > loses more data when interrupted.
 > 
 > How does that compare to the level of damage non-journaled FFS takes?

To be explicit:

It is the same underlying problem either way, and it is worse in practice
with WAPBL than without because WAPBL ffs runs faster than non-WAPBL
ffs and thus accumulates more unwritten blocks.

-- 
David A. Holland
dholl...@netbsd.org


Re: Proposal to enable WAPBL by default for 10.0

2020-07-23 Thread Jonathan A. Kollasch
On Wed, Jul 22, 2020 at 11:24:16PM +0200, Kamil Rytarowski wrote:
> I propose to enable WAPBL ("log" in fstab(5)) by default for 10.0 and newer.

I oppose such a move.  I will not be able to support any such change
until https://gnats.netbsd.org/47231 is satisfactorily resolved.

Jonathan


Re: Proposal to enable WAPBL by default for 10.0

2020-07-23 Thread Taylor R Campbell
> Date: Thu, 23 Jul 2020 08:43:04 -0400
> From: Greg Troxel 
> 
> Taylor R Campbell  writes:
> 
> [lots of good points, no disagreement]
> 
> If /etc/master.passwd is ending up with junk, that's a clue that code
> that updates it isn't doing the write secondary file, fsync it, rename,
> approach.  As I understand it with POSIX filesystems you have to do that
> because there is no guarantee on open/write/close that you'll have one
> or the other.  Even with zfs, you could have done write on the first
> half and not the second, so I think you still need this.

Yes, that sounds right and we should fix whatever writes
/etc/master.passwd.  There's no way it's performance-critical so it
should be fine to add fsync.
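
A minimal sketch of the write-temporary-file, fsync, rename approach
discussed above, in Python for brevity. The helper name atomic_write
is hypothetical, and this is not the code that currently updates
/etc/master.passwd; it only illustrates the ordering of the steps.
Note that mkstemp creates the file with mode 0600, so a real tool
would also have to set the ownership and permissions it wants.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` so that an interruption leaves either
    the complete old file or the complete new file (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory, so the rename
    # never has to cross a file system boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # force the data blocks out before renaming
        os.replace(tmp_path, path)   # rename(2): swap the new file into place
    except BaseException:
        # Best-effort cleanup of the temporary file on any failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory as well, so the rename itself is on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The fsync before the rename is the step that matters here: without it
the new name can point at blocks whose contents were never written.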

> > work...which is why I used to use ffs+sync on my laptop, and these
> > days I avoid ffs altogether in favour of zfs and lfs, except on
> > install images written to USB media.)
> 
> Do you find that lfs is 100% solid now (in 9-stable, or current)?  I
> have seen fixes and never really been sure.

No -- but the main point is that when anything goes wrong and the
system crashes or power fails, it doesn't leave garbage data at the
end of files like ffs does, leading, e.g., to spectacularly corrupted
CVS trees.

I fixed a lot of deadlocks in lfs in HEAD earlier this year.  Haven't
yet pulled them up to 9 because there was one major deadlock remaining
that I had an unsatisfactory fix for, and I haven't gotten around to
testing a better fix yet.


Re: Proposal to enable WAPBL by default for 10.0

2020-07-23 Thread Greg Troxel
Taylor R Campbell  writes:

[lots of good points, no disagreement]

If /etc/master.passwd is ending up with junk, that's a clue that code
that updates it isn't doing the write secondary file, fsync it, rename,
approach.  As I understand it with POSIX filesystems you have to do that
because there is no guarantee on open/write/close that you'll have one
or the other.  Even with zfs, you could have done write on the first
half and not the second, so I think you still need this.

> work...which is why I used to use ffs+sync on my laptop, and these
> days I avoid ffs altogether in favour of zfs and lfs, except on
> install images written to USB media.)

Do you find that lfs is 100% solid now (in 9-stable, or current)?  I
have seen fixes and never really been sure.


Re: Proposal to enable WAPBL by default for 10.0

2020-07-23 Thread Hauke Fath
On Thu, 23 Jul 2020 07:45:08 +0200, Michał Górny wrote:
>> The problem is that because it still doesn't do anything about
>> journaling or preserving file contents, but runs a lot faster, it
>> loses more data when interrupted.
> 
> How does that compare to the level of damage non-journaled FFS takes?

To give anecdotal experience: My home server experienced frequent
ntp-related panics two or three years ago, in particular during high 
network activity. I obliterated my cvs src sandbox several times over 
(with cvs not recognizing hundreds of files, and asking to move them 
out of the way), until I disabled wapbl.

From then on, the sandbox was easily recoverable after every panic. It 
turned out fsck times on a moderately-sized SSD were bearable - 
certainly shorter than writing cleanup scripts, throwing out half of 
the sandbox, and 'cvs update'ing the rest.

I have been thinking twice before mounting important partitions with 
'log' ever since.

Cheerio,
hauke

-- 
Hauke Fath
Grabengasse 57
64372 Ober-Ramstadt
Germany

Re: Proposal to enable WAPBL by default for 10.0

2020-07-23 Thread Taylor R Campbell
> Date: Thu, 23 Jul 2020 07:45:08 +0200
> From: Michał_Górny 
> 
> On Thu, 2020-07-23 at 05:17 +, David Holland wrote:
> > The problem is that because it still doesn't do anything about
> > journaling or preserving file contents, but runs a lot faster, it
> > loses more data when interrupted.
> 
> How does that compare to the level of damage non-journaled FFS takes?
> My VM was just bricked a second time because /etc/passwd was turned to
> junk.  I dare say that a proper metadata journaling + proper writes
> (i.e. using rename() -- haven't verified whether that's done correctly)
> should prevent that from happening again.

Metadata journaling doesn't do anything about that, and it never has.

It is a common misconception that metadata journaling has anything to
do with making a system more robust against data corruption.

Metadata journaling is primarily about making it _faster_ to pick up
after interruption such as crash or power failure, and faster to issue
writes in the first place at the cost of doubling the number of
metadata writes.

- In traditional ffs, every operation issues metadata writes
  synchronously in a particular order.

  This way, if an operation is interrupted, then on reboot, `fsck -p'
  can reliably identify what state the file system was in, and either
  roll back to undo the operation or roll forward to complete it.

  Of course, identifying that state requires doing a global analysis
  of the file system structure, so it's slow, and the larger the file
  system is the slower it gets.

  (Note: `fsck -p' is part of the file system design; fsck _without_
  `-p' is pray-to-recover from `unexpected inconsistencies' arising
  either from bugs or from hardware failures.)

- With wapbl, every operation issues metadata writes in order _twice_:
  first to a sequential log and then -- once all the writes to the log
  for the operation have been committed to disk -- to the locations
  where the metadata blocks actually live.

  This way, if an operation is interrupted, then on reboot, log replay
  can reliably roll forward operations whose metadata writes were
  committed in the log, and discard the rest to roll back operations
  whose metadata writes were not committed.

  Log replay takes time proportional roughly to the number of
  in-flight operations rather than to the size of the file system, so
  it's much cheaper than the global analysis of `fsck -p' for large
  disks.

  wapbl only requires the metadata writes to be serialized -- not
  synchronous -- so even though it issues every metadata write twice,
  it tends to have much higher write throughput (especially on
  spinning rust) since metadata writes don't happen in lock-step with
  the disk write latency.
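
As a toy illustration of the replay rule described above (roll forward
what was committed in the log, discard the rest), here is a small
Python model. The record layout and the replay function are
hypothetical and are not wapbl's on-disk format or code; the point is
only that replay walks the log, not the whole file system.

```python
def replay(log, disk):
    """Apply the metadata writes of committed transactions to `disk`.

    `log` is a list of records, either ("write", txid, block, value)
    or ("commit", txid).  `disk` maps block number -> contents.
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, block, value = rec
            disk[block] = value      # roll forward
        # Writes belonging to uncommitted transactions are skipped,
        # which rolls those operations back.
    return disk


# Transaction 1 was committed before the interruption; transaction 2
# had only started writing its log records.
log = [
    ("write", 1, 10, "inode A v2"),
    ("write", 1, 11, "dir entry A"),
    ("commit", 1),
    ("write", 2, 12, "inode B v2"),
]
disk = {10: "inode A v1", 12: "inode B v1"}
replay(log, disk)
```

After replay, blocks 10 and 11 carry transaction 1's metadata and
block 12 is untouched. The work done is proportional to the length of
the log, independent of how many blocks the disk has.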

Of course, the devil is in the details, and wapbl is actually more
complicated than that, and we screwed up the on-disk format ages ago.
So wapbl has various shortcomings, like crashes when the number of
metadata writes needed to atomically truncate a large file exceeds the
free space left in the log on disk because we failed to guarantee
every operation runs in (small) constant log space and preallocate
enough space up front.

ffs also has a long-standing bug I call the `garbage data appended
after crash' bug: when you append data to a file, ffs will
_synchronously_ allocate data blocks and update the inode length, and
_asynchronously_ write the data to the new blocks.  If interrupted,
the new blocks may be allocated and the inode length updated, but the
new blocks may contain garbage because the asynchronous data writes
haven't completed yet.  The result is that it's as if you appended
garbage data to the end of the file.  You can work around it by
writing to a temporary file, fsyncing the temporary file, and renaming
to the permanent location, but it's a bug nevertheless.
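
A toy Python model of the sequence just described, for readers who
want to see the ordering problem in isolation. The ToyDisk class and
its block numbers are made up for illustration; it models only the
fact that the metadata update takes effect immediately while the data
write is merely queued.

```python
class ToyDisk:
    """Blocks keep their stale contents until a data write lands."""

    def __init__(self):
        # Block 0 holds the existing file data; block 1 is free but
        # still contains leftovers from some earlier use.
        self.blocks = {0: b"old!", 1: b"\xde\xad\xbe\xef"}
        self.inode_blocks = [0]    # blocks that belong to the file
        self.pending = []          # data writes not yet on disk

    def append(self, data):
        # Metadata: the block is allocated and the file grows right away.
        self.inode_blocks.append(1)
        # Data: the write is only queued for later.
        self.pending.append((1, data))

    def flush(self):
        for block, data in self.pending:
            self.blocks[block] = data
        self.pending.clear()

    def crash(self):
        self.pending.clear()       # queued data writes are lost

    def read(self):
        return b"".join(self.blocks[b] for b in self.inode_blocks)


interrupted = ToyDisk()
interrupted.append(b"new!")
interrupted.crash()
print(interrupted.read())    # b'old!\xde\xad\xbe\xef'

completed = ToyDisk()
completed.append(b"new!")
completed.flush()
print(completed.read())      # b'old!new!'
```

In the interrupted case the file has grown, but its tail is whatever
happened to be in the newly allocated block.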

wapbl makes this bug _worse_ by issuing the metadata writes much
faster -- since they only need to be serialized, not synchronous -- so
the bug can apply to many more files and much more data.

All of this is to say: wapbl -- and journaling generally -- doesn't do
anything more than ffs to change the `level of damage' in any
qualitative way; but both traditional ffs and ffs+wapbl have something
that you might call a `data loss' bug (more accurately, file
corruption), and it's quantitatively _worse_ for wapbl.

So I'm not clear on where kamil gets the idea that wapbl is less prone
to data loss, and the symptom you (mgorny) described is consistent
with the bug that wapbl makes worse.


(There are various ways we _could_ approach the shortcomings of ffs
and wapbl: impose ordering constraints on data writes to fix the
garbage data appended after crash bug (`soft updates'), for example;
create new types of logical log entries to atomically truncate inodes
so that truncation can run in constant log space; do bookkeeping for
wapbl transactions better so we never run out of space.  But some of
these require changes to the on-disk format, and overall it's a 

Re: Proposal to enable WAPBL by default for 10.0

2020-07-22 Thread Michał Górny
On Thu, 2020-07-23 at 05:17 +, David Holland wrote:
> On Wed, Jul 22, 2020 at 11:24:16PM +0200, Kamil Rytarowski wrote:
>  > I propose to enable WAPBL ("log" in fstab(5)) by default for 10.0 and newer.
>  > [...]
>  >
>  > Rationale: the default filesystem (FFS) without WAPBL is more prone to
>  > data loss.
> 
> It is not, unfortunately. We had WAPBL on by default some time back
> and eventually switched it off.
> 
> The problem is that because it still doesn't do anything about
> journaling or preserving file contents, but runs a lot faster, it
> loses more data when interrupted.

How does that compare to the level of damage non-journaled FFS takes?
My VM was just bricked a second time because /etc/passwd was turned to
junk.  I dare say that a proper metadata journaling + proper writes
(i.e. using rename() -- haven't verified whether that's done correctly)
should prevent that from happening again.

-- 
Best regards,
Michał Górny





Re: Proposal to enable WAPBL by default for 10.0

2020-07-22 Thread David Holland
On Thu, Jul 23, 2020 at 05:17:33AM +, David Holland wrote:
 > The problem is that because it still doesn't do anything about
 > journaling or preserving file contents, but runs a lot faster, it
 > loses more data when interrupted.

Note since someone already asked: that should be read as "(journaling
or preserving) file contents".

-- 
David A. Holland
dholl...@netbsd.org


Re: Proposal to enable WAPBL by default for 10.0

2020-07-22 Thread David Holland
On Wed, Jul 22, 2020 at 11:24:16PM +0200, Kamil Rytarowski wrote:
 > I propose to enable WAPBL ("log" in fstab(5)) by default for 10.0 and newer.
 > [...]
 >
 > Rationale: the default filesystem (FFS) without WAPBL is more prone to
 > data loss.

It is not, unfortunately. We had WAPBL on by default some time back
and eventually switched it off.

The problem is that because it still doesn't do anything about
journaling or preserving file contents, but runs a lot faster, it
loses more data when interrupted.

-- 
David A. Holland
dholl...@netbsd.org