Re: [HACKERS] New Linux xfs/reiser file systems

2001-05-05 Thread thomas graichen

Bruce Momjian <[EMAIL PROTECTED]> wrote:
>> > Yes, this double-writing is a problem.  Suppose you have your WAL on a
>> > separate drive.  You can fsync() WAL with zero head movement.  With a
>> > log based file system, you need two head movements, so you have gone
>> > from zero movements to two.
>> 
>> It may be worse depending on how the filesystem actually does
>> journalling.  I wonder if an fsync() may cause ALL pending
>> meta-data to be updated (even metadata not related to the 
>> postgresql files).
>> 
>> Do you know if reiser or xfs have this problem?

> I don't know, but the Linux user reported xfs was really slow.

I think this should be tested in more detail: I once tried this
lightly (running pgbench against postgresql 7.1beta4) with
different filesystems: ext2, reiserfs and XFS, and I reproducibly
got about 15% better results running on XFS ... ok - it's
not a very big test, but I think it might be worth really
doing an A/B test before accepting as fact that postgresql is
slow on XFS (and maybe reiserfs too ... but reiserfs has had
performance problems in certain situations anyway)
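
The A/B test suggested above can start from something much smaller than pgbench. The sketch below is hypothetical: its function names, the 8 KB block size and the write count are all assumptions. It times repeated overwrite-plus-sync cycles of one preallocated block in a scratch file, so running it once per mount point gives a crude per-filesystem number for synchronous-write cost and nothing more.

```python
import os
import tempfile
import time


def time_sync_writes(directory, sync=os.fsync, n_writes=200, block_size=8192):
    """Return the seconds spent on n_writes overwrite-and-sync cycles.

    The scratch file is created in `directory`, so pointing this at mount
    points on different filesystems (ext2, reiserfs, XFS, ...) gives a
    crude A/B comparison of synchronous-write cost.
    """
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Allocate the block once, outside the timed loop.
        os.write(fd, block)
        os.fsync(fd)
        start = time.perf_counter()
        for _ in range(n_writes):
            os.pwrite(fd, block, 0)  # rewrite the same block in place
            sync(fd)                 # and force it to stable storage
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)


def compare(directories, **kwargs):
    """Hypothetical A/B driver: one timing per directory (per filesystem)."""
    return {d: time_sync_writes(d, **kwargs) for d in directories}
```

For example, `compare(["/mnt/ext2", "/mnt/xfs"])` with placeholder mount points. Running it once with `sync=os.fsync` and once with `sync=os.fdatasync` also lets you compare the two flush calls on the same filesystem.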

XFS is a journaling fs, but it does all its work in a very
clever way (delayed allocation etc.) - so under normal conditions
you should usually get decent performance out of it -
otherwise it might be worth sending a mail to the XFS
mailing list (reiserfs maybe ditto)

t

-- 
thomas graichen <[EMAIL PROTECTED]> ... perfection is reached, not
when there is no longer anything to add, but when there is no
longer anything to take away. --- antoine de saint-exupery

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [HACKERS] New Linux xfs/reiser file systems

2001-05-04 Thread Bruce Momjian

> I got some information from Stephen Tweedie on this - please keep him
> "Cc:" as he's not on this list
> 
> 
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> 
> > I was talking to a Linux user yesterday, and he said that performance
> > using the xfs file system is pretty bad.  He believes it has to do with
> > the fact that fsync() on log-based file systems requires more writes.
> 
> 
> Performance doing what?  XFS has known performance problems doing
> unlinks and truncates, but not synchronous IO.  The user should be
> using fdatasync() for databases, btw, not fsync().

This is hugely helpful.  In PostgreSQL 7.1, we do use fdatasync() by
default if it is available on a platform.


> First, XFS, ext3 and reiserfs are *NOT* log-based filesystems.  They
> are journaling filesystems.  They have a log, but they are not
> log-based because they do not store data permanently in a log
> structure.  Berkeley LFS, Sprite and Spiralog are log-based
> filesystems.

Sorry, I get those mixed up.

> > With a standard BSD/ext2 file system, WAL writes can stay on the same
> > cylinder to perform fsync.  Is that true of log-based file systems?
> 
> Not true on ext2 or BSD.  Write-aheads are _usually_ close to the
> inode, but not always.  For true log-based filesystems, writes are
> always completely sequential, so the issue just goes away.  For
> journaling filesystems, depending on the setup there may be a seek to
> the journal involved, but some journaling filesystems can use a
> separate disk for the journal so no seek is required.
> 
> > I know xfs and reiser are both log based.  Do we need to be concerned
> > about PostgreSQL performance on these file systems?  I use BSD FFS with
> > soft updates here, so it doesn't affect me.
> 
> A database normally preallocates its data files and then performs most
> of its writes using update-in-place.  In such cases, fsync() is almost
> always the wrong thing to be doing --- the data writes have changed
> nothing in the inode except for the timestamps, and there's no need to
> flush the timestamps to disk for every write.  fdatasync() is
> designed for this --- if the only inode change is timestamps,
> fdatasync() will skip the seek to the inode and will only update the
> data.  If any significant inode fields have been changed, then a full
> flush is done.

We do pre-allocate our log file space in chunks to avoid inode/block
index writes.

> Using fdatasync, most filesystems will incur no seeks for data flush,
> regardless of whether the filesystem is journaling or not.

Thanks.  That is a big help.  I wonder if people reporting performance
problems were using 7.0.3.  We only added fdatasync() in 7.1.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] New Linux xfs/reiser file systems

2001-05-04 Thread Bruce Momjian

> Hi,
> 
> On Fri, May 04, 2001 at 01:49:54PM -0400, Bruce Momjian wrote:
> > > 
> > > Performance doing what?  XFS has known performance problems doing
> > > unlinks and truncates, but not synchronous IO.  The user should be
> > > using fdatasync() for databases, btw, not fsync().
> > 
> > This is hugely helpful.  In PostgreSQL 7.1, we do use fdatasync() by
> > default if it is available on a platform.
> 
> Good --- fdatasync is defined in SingleUnix, so it's probably safe to
> probe for it and use it by default if it is there.
> 
> The 2.2 Linux kernel does not have fdatasync implemented, but glibc
> will fall back to fsync if that's all that the kernel supports.  2.4
> implements both with the required semantics.

OK, that is something we found too: fdatasync() was there on some
platforms, but was really just an fsync().  I believe some HP-UX
platforms had that.

OK, so they need a 2.4 kernel to properly test performance of Reiser/xfs
with fdatasync().
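
The probe-and-fall-back approach described above looks roughly like this at run time. This is a sketch in Python, whose os module wraps the same system calls. A C program would normally make this choice at build time, and the helper name is hypothetical. As noted above, a successful probe cannot detect a C library that silently implements fdatasync() as fsync(), which is the Linux 2.2 and HP-UX case.

```python
import os


def pick_data_sync():
    """Return (name, function) for the best available data-flush call.

    Prefers fdatasync() and falls back to fsync() when the platform does
    not provide it.  A successful probe only proves the call exists: it
    cannot tell whether the C library quietly implements it as fsync().
    """
    fdatasync = getattr(os, "fdatasync", None)
    if fdatasync is not None:
        return "fdatasync", fdatasync
    return "fsync", os.fsync
```

Callers then flush with the returned function, e.g. `name, flush = pick_data_sync()` followed by `flush(fd)` after each write.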




Re: [HACKERS] New Linux xfs/reiser file systems

2001-05-04 Thread Trond Eivind Glomsrød

I got some information from Stephen Tweedie on this - please keep him
"Cc:" as he's not on this list


Bruce Momjian <[EMAIL PROTECTED]> writes:

> I was talking to a Linux user yesterday, and he said that performance
> using the xfs file system is pretty bad.  He believes it has to do with
> the fact that fsync() on log-based file systems requires more writes.


Performance doing what?  XFS has known performance problems doing
unlinks and truncates, but not synchronous IO.  The user should be
using fdatasync() for databases, btw, not fsync().

First, XFS, ext3 and reiserfs are *NOT* log-based filesystems.  They
are journaling filesystems.  They have a log, but they are not
log-based because they do not store data permanently in a log
structure.  Berkeley LFS, Sprite and Spiralog are log-based
filesystems.

> With a standard BSD/ext2 file system, WAL writes can stay on the same
> cylinder to perform fsync.  Is that true of log-based file systems?

Not true on ext2 or BSD.  Write-aheads are _usually_ close to the
inode, but not always.  For true log-based filesystems, writes are
always completely sequential, so the issue just goes away.  For
journaling filesystems, depending on the setup there may be a seek to
the journal involved, but some journaling filesystems can use a
separate disk for the journal so no seek is required.

> I know xfs and reiser are both log based.  Do we need to be concerned
> about PostgreSQL performance on these file systems?  I use BSD FFS with
> soft updates here, so it doesn't affect me.

A database normally preallocates its data files and then performs most
of its writes using update-in-place.  In such cases, fsync() is almost
always the wrong thing to be doing --- the data writes have changed
nothing in the inode except for the timestamps, and there's no need to
flush the timestamps to disk for every write.  fdatasync() is
designed for this --- if the only inode change is timestamps,
fdatasync() will skip the seek to the inode and will only update the
data.  If any significant inode fields have been changed, then a full
flush is done.

Using fdatasync, most filesystems will incur no seeks for data flush,
regardless of whether the filesystem is journaling or not.
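
The preallocate-then-update-in-place pattern described above can be sketched with Python's os module. The 8 KB block size and both helper names are assumptions for illustration. Whether the inode flush is actually skipped depends on the kernel and filesystem: the 2.2 kernel, for instance, falls back to a full fsync().

```python
import os

BLOCK = 8192  # assumed page size for this sketch


def preallocate(path, n_blocks):
    """Create the file at its final size by writing real zero blocks.

    ftruncate() would leave a sparse file whose blocks get allocated on
    first write, which is exactly the metadata change we want to avoid.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        zero = b"\0" * BLOCK
        for _ in range(n_blocks):
            os.write(fd, zero)
        os.fsync(fd)  # size and block map changed here: one full fsync()
    finally:
        os.close(fd)


def update_in_place(path, block_no, data):
    """Overwrite one existing block, then flush only the data.

    The file size and block map are unchanged, so fdatasync() need not
    write the inode just to record new timestamps.
    """
    if len(data) != BLOCK:
        raise ValueError("data must be exactly one block")
    sync = getattr(os, "fdatasync", os.fsync)
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, data, block_no * BLOCK)
        sync(fd)
    finally:
        os.close(fd)
```

The one-time fsync() in `preallocate` pays for the metadata change up front. After that, every `update_in_place` call touches only data blocks.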

Cheers,
 Stephen


-- 
Trond Eivind Glomsrød
Red Hat, Inc.




Re: [HACKERS] New Linux xfs/reiser file systems

2001-05-02 Thread Bruce Momjian

> > Yes, this double-writing is a problem.  Suppose you have your WAL on a
> > separate drive.  You can fsync() WAL with zero head movement.  With a
> > log based file system, you need two head movements, so you have gone
> > from zero movements to two.
> 
> It may be worse depending on how the filesystem actually does
> journalling.  I wonder if an fsync() may cause ALL pending
> meta-data to be updated (even metadata not related to the 
> postgresql files).
> 
> Do you know if reiser or xfs have this problem?

I don't know, but the Linux user reported xfs was really slow.




Re: [HACKERS] New Linux xfs/reiser file systems

2001-05-02 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010502 15:20] wrote:
> > The "problem" with log based filesystems is that they most likely
> > do not know the consequences of a write so an fsync on a file may
> > require double writing to both the log and the "real" portion of
> > the disk.  They can also exhibit the problem that an fsync may
> > cause all pending writes to require scheduling unless the log is
> > constructed on the fly rather than incrementally.
> 
> Yes, this double-writing is a problem.  Suppose you have your WAL on a
> separate drive.  You can fsync() WAL with zero head movement.  With a
> log based file system, you need two head movements, so you have gone
> from zero movements to two.

It may be worse depending on how the filesystem actually does
journalling.  I wonder if an fsync() may cause ALL pending
meta-data to be updated (even metadata not related to the 
postgresql files).

Do you know if reiser or xfs have this problem?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/




Re: [HACKERS] New Linux xfs/reiser file systems

2001-05-02 Thread Alfred Perlstein

* Bruce Momjian <[EMAIL PROTECTED]> [010502 14:01] wrote:
> I was talking to a Linux user yesterday, and he said that performance
> using the xfs file system is pretty bad.  He believes it has to do with
> the fact that fsync() on log-based file systems requires more writes.
> 
> With a standard BSD/ext2 file system, WAL writes can stay on the same
> cylinder to perform fsync.  Is that true of log-based file systems?
> 
> I know xfs and reiser are both log based.  Do we need to be concerned
> about PostgreSQL performance on these file systems?  I use BSD FFS with
> soft updates here, so it doesn't affect me.

The "problem" with log based filesystems is that they most likely
do not know the consequences of a write so an fsync on a file may
require double writing to both the log and the "real" portion of
the disk.  They can also exhibit the problem that an fsync may
cause all pending writes to require scheduling unless the log is
constructed on the fly rather than incrementally.

There was also the problem, brought up recently, that
certain versions (maybe all?) of Linux perform fsync() in a very
non-optimal manner; if the user is able to use the O_FSYNC option
rather than fsync(), he may see a performance increase.
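
The O_FSYNC suggestion replaces write()-then-fsync() with a file descriptor opened for synchronous writes. This minimal sketch uses Python's os module as a thin wrapper over the same calls, and the helper names are hypothetical. POSIX also defines O_DSYNC, the data-only counterpart playing the role fdatasync() does, where the platform supports it.

```python
import os

# O_FSYNC is the BSD spelling; POSIX names the flag O_SYNC, and in glibc
# and FreeBSD the two names have the same value.  Use whichever exists.
SYNC_FLAG = getattr(os, "O_FSYNC", os.O_SYNC)


def open_log_sync(path):
    """Open a log file so each write() returns only once the data is on disk."""
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND | SYNC_FLAG, 0o600)


def append_record(fd, record):
    """Append one record; O_SYNC makes a separate fsync() call unnecessary."""
    written = os.write(fd, record)
    if written != len(record):
        raise OSError("short write to log file")
    return written
```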

But his guess is probably nearly as good as mine. :)


-- 
-Alfred Perlstein - [[EMAIL PROTECTED]]
http://www.egr.unlv.edu/~slumos/on-netbsd.html




Re: [HACKERS] New Linux xfs/reiser file systems

2001-05-02 Thread Bruce Momjian

> The "problem" with log based filesystems is that they most likely
> do not know the consequences of a write so an fsync on a file may
> require double writing to both the log and the "real" portion of
> the disk.  They can also exhibit the problem that an fsync may
> cause all pending writes to require scheduling unless the log is
> constructed on the fly rather than incrementally.

Yes, this double-writing is a problem.  Suppose you have your WAL on a
separate drive.  You can fsync() WAL with zero head movement.  With a
log based file system, you need two head movements, so you have gone
from zero movements to two.




[HACKERS] New Linux xfs/reiser file systems

2001-05-02 Thread Bruce Momjian

I was talking to a Linux user yesterday, and he said that performance
using the xfs file system is pretty bad.  He believes it has to do with
the fact that fsync() on log-based file systems requires more writes.

With a standard BSD/ext2 file system, WAL writes can stay on the same
cylinder to perform fsync.  Is that true of log-based file systems?

I know xfs and reiser are both log based.  Do we need to be concerned
about PostgreSQL performance on these file systems?  I use BSD FFS with
soft updates here, so it doesn't affect me.
