Re: Qmail is *NOT* reliable with ReiserFS

2000-07-17 Thread Bruce Guenter

On Sun, Jul 16, 2000 at 06:55:21PM +0200, Jedi/Sector One wrote:
>   ReiserFS does not commit link() synchronously (mounting with "sync"
> doesn't change anything). Therefore, if there is a power outage during
> the Maildir delivery or if qmail-smtpd answered the final "queued"
> message without actually committing the link in queue/todo, the message
> will not be processed by qmail-send.
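
As a rough illustration of the delivery sequence being described, here is
a minimal Python sketch of a Maildir-style delivery. The helper name
deliver(), the file name, and the message body are made up for the
example, and this is not qmail's actual code. It only shows where the
link() falls relative to the fsync() of the message file.

```python
import os
import tempfile


def deliver(maildir, unique, message):
    """Hypothetical helper: Maildir-style delivery of `message` (bytes)."""
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    # 1. Write the message under tmp/ and force the file data to disk.
    #    Mode "xb" fails if the name already exists.
    with open(tmp_path, "xb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())

    # 2. Publish the message by linking it into new/.  The complaint in
    #    the thread is about this step: if the filesystem has not yet
    #    committed the new directory entry when the power fails, the
    #    message can vanish even though success was already reported.
    os.link(tmp_path, new_path)

    # 3. Remove the temporary name.
    os.unlink(tmp_path)
    return new_path


# Small demonstration in a throwaway directory (names are illustrative).
with tempfile.TemporaryDirectory() as maildir:
    for sub in ("tmp", "new", "cur"):
        os.mkdir(os.path.join(maildir, sub))
    delivered = deliver(maildir, "962812521.12345.example",
                        b"Subject: test\n\nhello\n")
    with open(delivered, "rb") as f:
        delivered_body = f.read()
    leftover_tmp = os.listdir(os.path.join(maildir, "tmp"))
```

Nothing in this sketch makes the link() itself durable; whether that
needs an extra step depends on the filesystem, which is the subject of
the rest of the thread.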

Actually, qmail is not "reliable" on any Linux FS.  This was discussed
to death a while back.  It is DJB's view that all directory operations
(creating, removing, linking, etc.) should be synchronous, just like BSD
does.  It is Linus' view that this is a significant performance penalty
with little gain, since applications that require synchronous directory
operations also tend to require synchronous file operations and other
special file handling.  I agree.

There is also the discussion of ordered meta-data updates (OMDU) vs
unordered (UMDU).  Linux (with the exception of newer journalled file
systems) does UMDU.  With OMDU, the file meta-data (inode, indirect
blocks, etc) is written in an ordered fashion, typically before the
data.  This means FWIR that you can have good meta-data pointing to bad
data in the case of a crash.  With UMDU, you can have bad meta-data but
good data, which is something that a fsck will detect.

Since crashes are so rare, and journalling file systems are becoming
more common, this is rapidly becoming a non-issue.

I wrote a source file that replaces libc's open, link, rename, and
unlink routines with my own that sync the appropriate directory after
executing the syscall but before completing.  Simply linking with it
causes all directory operations executed by the program to become
synchronous.
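
The idea can be sketched in a few lines of Python. This is not the
syncdir source (which replaces the libc routines); it is only an
illustration of "do the directory operation, then fsync the directory
before returning". The names fsync_dir, sync_link, sync_rename and
sync_unlink are invented for the example, and the open() case is left
out for brevity.

```python
import os
import tempfile


def fsync_dir(path):
    """Flush the directory at `path` itself (not the files inside it)."""
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_DIRECTORY", 0))
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def _parent(path):
    # Directory that holds the entry named by `path`.
    return os.path.dirname(os.path.abspath(path))


def sync_link(src, dst):
    os.link(src, dst)
    fsync_dir(_parent(dst))


def sync_rename(src, dst):
    os.rename(src, dst)
    fsync_dir(_parent(dst))
    if _parent(src) != _parent(dst):
        # The old name was removed from a different directory.
        fsync_dir(_parent(src))


def sync_unlink(path):
    os.unlink(path)
    fsync_dir(_parent(path))


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    a, b, c = (os.path.join(d, n) for n in ("a", "b", "c"))
    with open(a, "wb") as f:
        f.write(b"x")
    sync_link(a, b)
    sync_rename(b, c)
    sync_unlink(a)
    names = sorted(os.listdir(d))
```

Each wrapper returns only after the directory fsync, so a caller sees
the same ordering it would get from synchronous directory operations,
at the cost of one extra open/fsync/close per call.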

It is available at http://em.ca/~bruceg/syncdir/ and I include it in my
patched qmail RPMs.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/



Re: Qmail is *NOT* reliable with ReiserFS

2000-07-17 Thread Greg Hudson

> It is DJB's view that all directory operations (creating, removing,
> linking, etc.) should be synchronous, just like BSD does.

For the record, FFS with soft-updates does not guarantee synchronous
directory operations; you have to open and fsync() the file you just
moved to be sure the operation has been committed to disk.  See
http://mail-index.netbsd.org/current-users/2000/06/19/0011.html for a
little more information.

Based on the patch, it sounds like ReiserFS agrees with
FFS+softupdates in semantics; that is, if you want to ensure that a
directory operation has completed, you open and fsync the directory
entry you care about.  This behavior is different from ext2fs, where
you have to open and fsync the directory containing the entry you care
about.
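
A program that has to run on both kinds of filesystem can simply do
both. The following Python sketch assumes the two models are as
described above; the helper names fsync_path and commit_entry are
hypothetical.

```python
import os
import tempfile


def fsync_path(path):
    """Open `path` (a file or a directory) read-only and fsync it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def commit_entry(path):
    """Try to make the directory entry for `path` durable either way."""
    # Model described above for FFS+softupdates: fsync the entry itself.
    fsync_path(path)
    # Model described above for ext2fs: fsync the containing directory.
    fsync_path(os.path.dirname(os.path.abspath(path)))


# Small demonstration: rename a file, then commit the new name.
with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "old")
    new = os.path.join(d, "new")
    with open(old, "wb") as f:
        f.write(b"payload")
    os.rename(old, new)
    commit_entry(new)
    committed = os.path.exists(new) and not os.path.exists(old)
```

The second fsync is redundant on a filesystem that follows the first
model, and the first may be redundant on one that follows the second,
but doing both avoids having to know which one is underneath.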



Re: Qmail is *NOT* reliable with ReiserFS

2000-07-17 Thread Bruce Guenter

On Mon, Jul 17, 2000 at 03:59:00PM -0400, Greg Hudson wrote:
> > It is DJB's view that all directory operations (creating, removing,
> > linking, etc.) should be synchronous, just like BSD does.
> 
> For the record, FFS with soft-updates does not guarantee synchronous
> directory operations; you have to open and fsync() the file you just
> moved to be sure the operation has been committed to disk.  See
> http://mail-index.netbsd.org/current-users/2000/06/19/0011.html for a
> little more information.

Then I was confused.  I assumed FFS was like UFS on Solaris, where you
can "feel" the synchronous directory operations by doing a "rm -rf" of
anything larger than a few files.

> Based on the patch, it sounds like ReiserFS agrees with
> FFS+softupdates in semantics; that is, if you want to ensure that a
> directory operation has completed, you open and fsync the directory
> entry you care about.

But qmail already does this.  In fact, it is very careful to do this in
all the places it is necessary.  If ReiserFS behaved identically to
FFS+softupdates, it would not need any qmail patches.  (I have deleted
the original message which we are discussing, and I don't remember what
exactly it patched.)

> This behavior is different from ext2fs, where
> you have to open and fsync the directory containing the entry you care
> about.

Which to me seems to be a more logical mode of operations: if you want
the file data sync'd to disk, call fsync on the file; if you want the
directory, fsync the directory.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/



Re: Qmail is *NOT* reliable with ReiserFS

2000-07-17 Thread Greg Hudson

Apologies for not catching this in my first reply to Bruce's message.

> There is also the discussion of ordered meta-data updates (OMDU) vs
> unordered (UMDU).  Linux (with the exception of newer journalled
> file systems) does UMDU.  With OMDU, the file meta-data (inode,
> indirect blocks, etc) is written in an ordered fashion, typically
> before the data.  This means FWIR that you can have good meta-data
> pointing to bad data in the case of a crash.  With UMDU, you can
> have bad meta-data but good data, which is something that a fsck
> will detect.

You have OMDU backwards.  Any sane ordered write scheme will write out
a block X before writing out a block (inode or directory entry) which
points to block X.  FFS, with or without soft updates, should never
encounter a case where an inode points to bad data.  (Of course, if
your disk controller reorders write operations you'll lose no matter
what.  Unfortunately, you have to choose both your hardware and your
software somewhat carefully if you really care about filesystem
consistency.)

Linux ext2fs has no write ordering whatsoever.  If the system goes
down uncleanly, you can get metadata pointing to bad data or data not
pointed to by metadata.  A recently created file might exist but
contain blocks from an old copy of /etc/shadow instead of the data you
wrote to it.  It's really ugly.  fsck cannot correct all of the
possible problems which can arise, no matter how clever or thorough it
is.  People have tried to justify this state of affairs in lots of
ways, but the only potentially correct and convincing justification
is, "who cares?"  Which is great unless you're one of the (admittedly,
relatively few) people who does care.
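
The difference can be shown with a toy model. The Python sketch below
is a deliberate simplification, not a description of any real
filesystem: it treats creating a file as exactly two disk writes (the
data block and the inode that points to it), lets a crash happen after
any number of them, and lists the states that can result.

```python
from itertools import permutations

# Toy model: creating a file takes two writes.
WRITES = ("data", "inode")


def classify(on_disk):
    """Describe the on-disk state given the set of completed writes."""
    if "inode" in on_disk and "data" not in on_disk:
        return "inode points to bad data"
    if "data" in on_disk and "inode" not in on_disk:
        return "data not pointed to by metadata"
    return "consistent"


def outcomes(orders):
    """All states reachable by crashing at any point in any given order."""
    results = set()
    for order in orders:
        for completed in range(len(order) + 1):
            results.add(classify(set(order[:completed])))
    return results


# Ordered scheme: the data block always goes out before its inode.
ordered = outcomes([("data", "inode")])

# No ordering at all: either write may reach the disk first.
unordered = outcomes(permutations(WRITES))
```

In this model the ordered scheme can leave a data block that nothing
points to, but never an inode pointing at a block that was not written;
with no ordering, both failure states are reachable.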

Note that write ordering is different from synchronous
vs. asynchronous operations.  Write ordering is about filesystem
consistency, which is mostly irrelevant to qmail's operation because
of the way qmail works.  ext2fs is also a little odd with respect to
synchronous operations (as discussed in my last piece of mail), but
it's certainly possible to work around that.



Re: Qmail is *NOT* reliable with ReiserFS

2000-07-17 Thread Michael Babcock

The 'sane' response would be to buy high-end power protection equipment
and use redundant drive configurations (RAID) and only worry about whether
the kernel writes out data consistently or not (and good journalling takes
care of many performance issues here).

Reliability of gigabytes per minute of data is not just a case of
demanding synchronous writes (which is silly in many cases).  Power faults
can be protected against as can many cases of hardware failure.
Performance should not be sacrificed for the cases where someone has a
high potential for failure (high being relative).

Greg Hudson wrote:

> (Of course, if
> your disk controller reorders write operations you'll lose no matter
> what.  Unfortunately, you have to choose both your hardware and your
> software somewhat carefully if you really care about filesystem
> consistency.)
>
> Linux ext2fs has no write ordering whatsoever.  If the system goes
> down uncleanly, you can get metadata pointing to bad data or data not
> pointed to by metadata.  A recently created file might exist but
> contain blocks from an old copy of /etc/shadow instead of the data you
> wrote to it.  It's really ugly.  fsck cannot correct all of the
> possible problems which can arise, no matter how clever or thorough it
> is.  People have tried to justify this state of affairs in lots of
> ways, but the only potentially correct and convincing justification
> is, "who cares?"  Which is great unless you're one of the (admittedly,
> relatively few) people who does care.
>
> Note that write ordering is different from synchronous
> vs. asynchronous operations.  Write ordering is about filesystem
> consistency, which is mostly irrelevant to qmail's operation because
> of the way qmail works.  ext2fs is also a little odd with respect to
> synchronous operations (as discussed in my last piece of mail), but
> it's certainly possible to work around that.




Re: Qmail is *NOT* reliable with ReiserFS

2000-07-17 Thread Greg Hudson

>> For the record, FFS with soft-updates does not guarantee synchronous
>> directory operations; you have to open and fsync() the file you just
>> moved to be sure the operation has been committed to disk.  See
>> http://mail-index.netbsd.org/current-users/2000/06/19/0011.html for a
>> little more information.

> Then I was confused.  I assumed FFS was like UFS on Solaris, where
> you can "feel" the synchronous directory operations by doing a "rm
> -rf" of anything larger than a few files.

Soft updates are a recent thing.  UFS on Solaris does not have them.
Without soft updates, FFS does have synchronous directory operations,
and yes, you will feel the resultant performance limitations.

> If ReiserFS behaved identically to FFS+softupdates, it would not
> need any qmail patches.

I can't really address this issue; I don't know qmail well enough.

> Which to me seems to be a more logical mode of operations: if you
> want the file data sync'd to disk, call fsync on the file; if you
> want the directory, fsync the directory.

Perhaps.  There are arguments for either model being simplest, and
history should not be ignored when picking between the two.  The
Single Unix Spec v2 also appears to mandate the FFS model, for those
who care about that standard:

    The fsync() function forces all currently queued I/O
    operations associated with the file indicated by file
    descriptor fildes to the synchronised I/O completion
    state. All I/O operations are completed as defined for
    synchronised I/O file integrity completion.

[and:]

    synchronised I/O file integrity completion - Identical to a
    synchronised I/O data integrity completion with the addition
    that all file attributes relative to the I/O operation
    (including access time, modification time, status change time)
    will be successfully transferred prior to returning to the
    calling process.

[and:]

    synchronised I/O data integrity completion - [...] The write
    is complete only when the data specified in the write request
    is successfully transferred and all file system information
    required to retrieve the data is successfully transferred.
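
For reference, POSIX pairs these two completion levels with two calls:
fsync() for file integrity completion and fdatasync() for data
integrity completion. A minimal Python sketch of choosing between them
follows; the helper name write_durably and its metadata_too parameter
are invented for the example, and the fallback assumes a platform where
os.fdatasync may be missing.

```python
import os
import tempfile


def write_durably(path, data, metadata_too=True):
    """Write `data` (bytes) to `path` and wait for it to reach the disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        if metadata_too or not hasattr(os, "fdatasync"):
            # File integrity completion: data plus file attributes.
            os.fsync(fd)
        else:
            # Data integrity completion: data plus whatever is needed
            # to retrieve it, without waiting for the other attributes.
            os.fdatasync(fd)
    finally:
        os.close(fd)


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    full = os.path.join(d, "full")
    data_only = os.path.join(d, "data_only")
    write_durably(full, b"one")
    write_durably(data_only, b"two", metadata_too=False)
    with open(full, "rb") as f:
        full_body = f.read()
    with open(data_only, "rb") as f:
        data_only_body = f.read()
```

Both calls act on the open file descriptor only. Whether "all file
system information required to retrieve the data" extends to the
directory entry is the point on which the two models in this thread
differ.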



Re: Qmail is *NOT* reliable with ReiserFS

2000-07-17 Thread Bruce Guenter

On Mon, Jul 17, 2000 at 04:39:01PM -0400, Greg Hudson wrote:
> > Which to me seems to be a more logical mode of operations: if you
> > want the file data sync'd to disk, call fsync on the file; if you
> > want the directory, fsync the directory.
> 
> Perhaps.  There are arguments for either model being simplest,

I didn't say simplest.  It's a little more complicated to have to
remember to sync the directory as well as the file.

> and history should not be ignored when picking between the two.

Exactly the point that Linus has made about this (and many other issues)
before.
-- 
Bruce Guenter <[EMAIL PROTECTED]>   http://em.ca/~bruceg/
