Re: [HACKERS] New Linux xfs/reiser file systems
Bruce Momjian <[EMAIL PROTECTED]> wrote:
>> > Yes, this double-writing is a problem. Suppose you have your WAL on a
>> > separate drive. You can fsync() WAL with zero head movement. With a
>> > log based file system, you need two head movements, so you have gone
>> > from zero movements to two.
>>
>> It may be worse depending on how the filesystem actually does
>> journalling. I wonder if an fsync() may cause ALL pending
>> meta-data to be updated (even metadata not related to the
>> postgresql files).
>>
>> Do you know if reiser or xfs have this problem?

> I don't know, but the Linux user reported xfs was really slow.

i think this should be tested in more detail: i once tried this lightly
(running pgbench against postgresql 7.1beta4) with different filesystems
(ext2, reiserfs and XFS) and reproducibly got about 15% better results
running on XFS ... ok - it's not a very big test, but i think it might be
worth really doing an a/b test before taking it as a fact that postgresql
is slow on XFS (and maybe reiserfs too ... but reiserfs has had
performance problems in certain situations anyway)

XFS is a journaling fs, but it does all its work in a very clever way
(delayed allocation etc.) - so under normal conditions you should usually
get decent performance out of it - otherwise it might be worth sending a
mail to the XFS mailing list (reiserfs maybe ditto)

t

--
thomas graichen <[EMAIL PROTECTED]> ... perfection is reached, not when
there is no longer anything to add, but when there is no longer anything
to take away. --- antoine de saint-exupery

---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://www.postgresql.org/search.mpl
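A rough way to start the a/b comparison suggested above, before running a full pgbench, is to time synced writes in a directory on each filesystem. This is only a sketch under stated assumptions: the function name, block size and write count are hypothetical, and it measures raw sync latency, not PostgreSQL throughput.

```python
import os
import tempfile
import time


def time_synced_writes(directory, count=50, size=8192, sync=os.fsync):
    """Return the seconds taken to append `count` blocks, syncing after each."""
    payload = b"\0" * size  # hypothetical block size, not taken from the thread
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            sync(fd)  # force the block to stable storage before continuing
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling it once per mount point (for example a directory on ext2 and one on XFS), and once each with sync=os.fsync and sync=os.fdatasync where available, gives numbers that can be compared side by side.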
Re: [HACKERS] New Linux xfs/reiser file systems
> I got some information from Stephen Tweedie on this - please keep him
> "Cc:" as he's not on this list
>
> Bruce Momjian <[EMAIL PROTECTED]> writes:
>
> > I was talking to a Linux user yesterday, and he said that performance
> > using the xfs file system is pretty bad. He believes it has to do with
> > the fact that fsync() on log-based file systems requires more writes.
>
> Performance doing what? XFS has known performance problems doing
> unlinks and truncates, but not synchronous IO. The user should be
> using fdatasync() for databases, btw, not fsync().

This is hugely helpful. In PostgreSQL 7.1, we do use fdatasync() by
default if it is available on a platform.

> First, XFS, ext3 and reiserfs are *NOT* log-based filesystems. They
> are journaling filesystems. They have a log, but they are not
> log-based because they do not store data permanently in a log
> structure. Berkeley LFS, Sprite and Spiralog are log-based
> filesystems.

Sorry, I get those mixed up.

> > With a standard BSD/ext2 file system, WAL writes can stay on the same
> > cylinder to perform fsync. Is that true of log-based file systems?
>
> Not true on ext2 or BSD. Write-aheads are _usually_ close to the
> inode, but not always. For true log-based filesystems, writes are
> always completely sequential, so the issue just goes away. For
> journaling filesystems, depending on the setup there may be a seek to
> the journal involved, but some journaling filesystems can use a
> separate disk for the journal so no seek is required.
>
> > I know xfs and reiser are both log based. Do we need to be concerned
> > about PostgreSQL performance on these file systems? I use BSD FFS with
> > soft updates here, so it doesn't affect me.
>
> A database normally preallocates its data files and then performs most
> of its writes using update-in-place. In such cases, fsync() is almost
> always the wrong thing to be doing --- the data writes have changed
> nothing in the inode except for the timestamps, and there's no need to
> flush the timestamps to disk for every write. fdatasync() is
> designed for this --- if the only inode change is timestamps,
> fdatasync() will skip the seek to the inode and will only update the
> data. If any significant inode fields have been changed, then a full
> flush is done.

We do pre-allocate our log file space in chunks to avoid inode/block
index writes.

> Using fdatasync, most filesystems will incur no seeks for data flush,
> regardless of whether the filesystem is journaling or not.

Thanks. That is a big help. I wonder if people reporting performance
problems were using 7.0.3. We only added fdatasync() in 7.1.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
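The preallocate-then-update-in-place pattern described in this message can be sketched as follows. This is an illustration only, not PostgreSQL's WAL code: the block size, file name and helper names are hypothetical, and it assumes a platform where Python exposes os.fdatasync (Linux does).

```python
import os
import tempfile

BLOCK_SIZE = 8192  # hypothetical block size chosen for this sketch


def write_all(fd, payload):
    """Write the whole payload, looping in case os.write() is partial."""
    view = memoryview(payload)
    while view:
        view = view[os.write(fd, view):]


def preallocate(path, blocks):
    """Create `path` with `blocks` zero-filled blocks, then flush data and inode."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(blocks):
            write_all(fd, bytes(BLOCK_SIZE))
        os.fsync(fd)  # size and block pointers changed, so do one full flush
    finally:
        os.close(fd)


def overwrite_block(path, block_no, payload):
    """Update one block in place; the file size does not change."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.lseek(fd, block_no * BLOCK_SIZE, os.SEEK_SET)
        write_all(fd, payload)
        os.fdatasync(fd)  # data-only flush; a timestamp-only inode change can wait
    finally:
        os.close(fd)


def demo():
    """Preallocate four blocks, rewrite one, and return the final file size."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "segment")
        preallocate(path, 4)
        overwrite_block(path, 2, b"x" * BLOCK_SIZE)
        return os.path.getsize(path)
```

The expensive full flush happens once, at allocation time. Every later write leaves the file length and block map untouched, which is the case where fdatasync() is described above as avoiding the seek to the inode.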
Re: [HACKERS] New Linux xfs/reiser file systems
> Hi,
>
> On Fri, May 04, 2001 at 01:49:54PM -0400, Bruce Momjian wrote:
> > >
> > > Performance doing what? XFS has known performance problems doing
> > > unlinks and truncates, but not synchronous IO. The user should be
> > > using fdatasync() for databases, btw, not fsync().
> >
> > This is hugely helpful. In PostgreSQL 7.1, we do use fdatasync() by
> > default if it is available on a platform.
>
> Good --- fdatasync is defined in SingleUnix, so it's probably safe to
> probe for it and use it by default if it is there.
>
> The 2.2 Linux kernel does not have fdatasync implemented, but glibc
> will fall back to fsync if that's all that the kernel supports. 2.4
> implements both with the required semantics.

OK, that is something we found too, that fdatasync() was there on some
platforms, but was really just an fsync(). I believe some HPUX platforms
had that.

OK, so they need a 2.4 kernel to properly test performance of
Reiser/xfs with fdatasync().

--
Bruce Momjian
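The "probe for it and use it by default if it is there" approach might look like the sketch below. The helper names are hypothetical, and the probe only shows that the call exists; as noted above, some platforms provide an fdatasync() that behaves exactly like fsync(), so its presence alone does not prove a cheaper flush.

```python
import os
import tempfile


def choose_sync():
    """Return (name, function) for the data-flush call this platform exposes."""
    if hasattr(os, "fdatasync"):
        return "fdatasync", os.fdatasync
    return "fsync", os.fsync  # fallback when fdatasync is not provided


def durable_write(fd, payload):
    """Write the payload, flush it with the probed call, and return the call's name."""
    name, sync = choose_sync()
    view = memoryview(payload)
    while view:
        view = view[os.write(fd, view):]  # loop in case of a partial write
    sync(fd)
    return name


def demo():
    """Write a short record to a temporary file and report which call was used."""
    with tempfile.TemporaryFile() as handle:
        return durable_write(handle.fileno(), b"commit record")
```

Reporting which call was chosen makes it easier to tell, when comparing benchmark results, whether a given run actually exercised fdatasync() or silently used fsync().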
Re: [HACKERS] New Linux xfs/reiser file systems
I got some information from Stephen Tweedie on this - please keep him
"Cc:" as he's not on this list

Bruce Momjian <[EMAIL PROTECTED]> writes:

> I was talking to a Linux user yesterday, and he said that performance
> using the xfs file system is pretty bad. He believes it has to do with
> the fact that fsync() on log-based file systems requires more writes.

Performance doing what? XFS has known performance problems doing
unlinks and truncates, but not synchronous IO. The user should be
using fdatasync() for databases, btw, not fsync().

First, XFS, ext3 and reiserfs are *NOT* log-based filesystems. They
are journaling filesystems. They have a log, but they are not
log-based because they do not store data permanently in a log
structure. Berkeley LFS, Sprite and Spiralog are log-based
filesystems.

> With a standard BSD/ext2 file system, WAL writes can stay on the same
> cylinder to perform fsync. Is that true of log-based file systems?

Not true on ext2 or BSD. Write-aheads are _usually_ close to the
inode, but not always. For true log-based filesystems, writes are
always completely sequential, so the issue just goes away. For
journaling filesystems, depending on the setup there may be a seek to
the journal involved, but some journaling filesystems can use a
separate disk for the journal so no seek is required.

> I know xfs and reiser are both log based. Do we need to be concerned
> about PostgreSQL performance on these file systems? I use BSD FFS with
> soft updates here, so it doesn't affect me.

A database normally preallocates its data files and then performs most
of its writes using update-in-place. In such cases, fsync() is almost
always the wrong thing to be doing --- the data writes have changed
nothing in the inode except for the timestamps, and there's no need to
flush the timestamps to disk for every write. fdatasync() is
designed for this --- if the only inode change is timestamps,
fdatasync() will skip the seek to the inode and will only update the
data. If any significant inode fields have been changed, then a full
flush is done.

Using fdatasync, most filesystems will incur no seeks for data flush,
regardless of whether the filesystem is journaling or not.

Cheers,
 Stephen

--
Trond Eivind Glomsrød
Red Hat, Inc.
Re: [HACKERS] New Linux xfs/reiser file systems
> > Yes, this double-writing is a problem. Suppose you have your WAL on a
> > separate drive. You can fsync() WAL with zero head movement. With a
> > log based file system, you need two head movements, so you have gone
> > from zero movements to two.
>
> It may be worse depending on how the filesystem actually does
> journalling. I wonder if an fsync() may cause ALL pending
> meta-data to be updated (even metadata not related to the
> postgresql files).
>
> Do you know if reiser or xfs have this problem?

I don't know, but the Linux user reported xfs was really slow.

--
Bruce Momjian
Re: [HACKERS] New Linux xfs/reiser file systems
* Bruce Momjian <[EMAIL PROTECTED]> [010502 15:20] wrote:
> > The "problem" with log based filesystems is that they most likely
> > do not know the consequences of a write so an fsync on a file may
> > require double writing to both the log and the "real" portion of
> > the disk. They can also exhibit the problem that an fsync may
> > cause all pending writes to require scheduling unless the log is
> > constructed on the fly rather than incrementally.
>
> Yes, this double-writing is a problem. Suppose you have your WAL on a
> separate drive. You can fsync() WAL with zero head movement. With a
> log based file system, you need two head movements, so you have gone
> from zero movements to two.

It may be worse depending on how the filesystem actually does
journalling. I wonder if an fsync() may cause ALL pending
meta-data to be updated (even metadata not related to the
postgresql files).

Do you know if reiser or xfs have this problem?

--
-Alfred Perlstein - [[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/
Re: [HACKERS] New Linux xfs/reiser file systems
* Bruce Momjian <[EMAIL PROTECTED]> [010502 14:01] wrote:
> I was talking to a Linux user yesterday, and he said that performance
> using the xfs file system is pretty bad. He believes it has to do with
> the fact that fsync() on log-based file systems requires more writes.
>
> With a standard BSD/ext2 file system, WAL writes can stay on the same
> cylinder to perform fsync. Is that true of log-based file systems?
>
> I know xfs and reiser are both log based. Do we need to be concerned
> about PostgreSQL performance on these file systems? I use BSD FFS with
> soft updates here, so it doesn't affect me.

The "problem" with log based filesystems is that they most likely
do not know the consequences of a write, so an fsync on a file may
require double writing to both the log and the "real" portion of
the disk. They can also exhibit the problem that an fsync may
cause all pending writes to require scheduling unless the log is
constructed on the fly rather than incrementally.

There was also the problem that was brought up recently that certain
versions (maybe all?) of Linux perform fsync() in a very non-optimal
manner. If the user is able to use the O_FSYNC option rather than
fsync() he may see a performance increase.

But his guess is probably nearly as good as mine. :)

--
-Alfred Perlstein - [[EMAIL PROTECTED]]
http://www.egr.unlv.edu/~slumos/on-netbsd.html
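The difference between calling fsync() after each write and opening the file with a synchronous-write flag can be sketched as below. One assumption to flag: the message names the BSD flag O_FSYNC, while this sketch uses os.O_SYNC, the POSIX name that Python exposes on Unix. The function and file names are hypothetical, and whether either variant is faster depends on the kernel and filesystem.

```python
import os
import tempfile


def append_records(path, records, synchronous_open):
    """Append records durably, using either O_SYNC or an fsync() per record."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    if synchronous_open:
        flags |= os.O_SYNC  # each write() returns only after the data is flushed
    fd = os.open(path, flags, 0o600)
    try:
        for record in records:
            os.write(fd, record)
            if not synchronous_open:
                os.fsync(fd)  # explicit flush after every record
    finally:
        os.close(fd)


def demo():
    """Write the same records both ways and return the resulting file contents."""
    records = [b"one\n", b"two\n"]
    results = []
    with tempfile.TemporaryDirectory() as directory:
        for synchronous_open in (False, True):
            path = os.path.join(directory, f"log-{synchronous_open}")
            append_records(path, records, synchronous_open)
            with open(path, "rb") as handle:
                results.append(handle.read())
    return results
```

Both variants leave identical data on disk; what changes is that the flush becomes part of each write() call, so the kernel sees one system call per record instead of two.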
Re: [HACKERS] New Linux xfs/reiser file systems
> The "problem" with log based filesystems is that they most likely
> do not know the consequences of a write so an fsync on a file may
> require double writing to both the log and the "real" portion of
> the disk. They can also exhibit the problem that an fsync may
> cause all pending writes to require scheduling unless the log is
> constructed on the fly rather than incrementally.

Yes, this double-writing is a problem. Suppose you have your WAL on a
separate drive. You can fsync() WAL with zero head movement. With a
log based file system, you need two head movements, so you have gone
from zero movements to two.

--
Bruce Momjian
[HACKERS] New Linux xfs/reiser file systems
I was talking to a Linux user yesterday, and he said that performance
using the xfs file system is pretty bad. He believes it has to do with
the fact that fsync() on log-based file systems requires more writes.

With a standard BSD/ext2 file system, WAL writes can stay on the same
cylinder to perform fsync. Is that true of log-based file systems?

I know xfs and reiser are both log based. Do we need to be concerned
about PostgreSQL performance on these file systems? I use BSD FFS with
soft updates here, so it doesn't affect me.

--
Bruce Momjian