Re: [PERFORM] O_DIRECT setting
Sorry about the belated reply, its been busy around here. Incidentally, postgres heap files suffer really, really bad fragmentation, which affects sequential scan operations (VACUUM, ANALYZE, REINDEX ...) quite drastically. We have in-house patches that somewhat alleiviate this, but they are not release quality. Has anybody else suffered this? Any chance I could give those patches a try? I'm interested in seeing how they may affect our DBT-3 workload, which execute DSS type queries. Like I said, the patches are not release quality... if you run them on a metadata journalling filesystem, without an 'ordered write' mode, its possible to end up with corrupt heaps after a crash because of garbage data in the extended files. If/when we move to postgres 8 I'll try to ensure the patches get re-done with releasable quality Guy Thornley ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] O_DIRECT setting
On Thu, Sep 30, 2004 at 07:02:32PM +1200, Guy Thornley wrote: Sorry about the belated reply, its been busy around here. Incidentally, postgres heap files suffer really, really bad fragmentation, which affects sequential scan operations (VACUUM, ANALYZE, REINDEX ...) quite drastically. We have in-house patches that somewhat alleiviate this, but they are not release quality. Has anybody else suffered this? Any chance I could give those patches a try? I'm interested in seeing how they may affect our DBT-3 workload, which execute DSS type queries. Like I said, the patches are not release quality... if you run them on a metadata journalling filesystem, without an 'ordered write' mode, its possible to end up with corrupt heaps after a crash because of garbage data in the extended files. If/when we move to postgres 8 I'll try to ensure the patches get re-done with releasable quality Guy Thornley That's ok, we like to help test and proof things, we don't need patches to be release quality. Mark ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] O_DIRECT setting
On Thu, Sep 23, 2004 at 10:57:41AM -0400, Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: TODO has: * Consider use of open/fcntl(O_DIRECT) to minimize OS caching Should the item be removed? I think it's fine ;-) ... it says consider it, not do it. The point is that we could do with more research in this area, even if O_DIRECT per se is not useful. Maybe you could generalize the entry to investigate ways of fine-tuning OS caching behavior. regards, tom lane I talked to Jan a little about this during OSCon since Linux filesystems (ext2, ext3, etc) let you use O_DIRECT. He felt the only place where PostgreSQL may benefit from this now, without managing its own buffer first, would be with the log writer. I'm probably going to get this wrong, but he thought it would be interesting to try an experiment by taking X number of pages to be flushed, sort them (by age? where they go on disk?) and write them out. He thought this would be a relatively easy thing to try, a day or two of work. We'd really love to experiment with it. Mark ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] O_DIRECT setting
Mark Wong [EMAIL PROTECTED] writes: I talked to Jan a little about this during OSCon since Linux filesystems (ext2, ext3, etc) let you use O_DIRECT. He felt the only place where PostgreSQL may benefit from this now, without managing its own buffer first, would be with the log writer. I'm probably going to get this wrong, but he thought it would be interesting to try an experiment by taking X number of pages to be flushed, sort them (by age? where they go on disk?) and write them out. Hmm. Most of the time the log writer has little choice about page write order --- certainly if all your transactions are small it's not going to have any choice. I think this would mainly be equivalent to O_SYNC with the extra feature of stopping the kernel from buffering the WAL data in its own buffer cache. Which is probably useful, but I doubt it's going to make a huge difference. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] O_DIRECT setting
On Mon, Sep 20, 2004 at 07:57:34PM +1200, Guy Thornley wrote: [snip] Incidentally, postgres heap files suffer really, really bad fragmentation, which affects sequential scan operations (VACUUM, ANALYZE, REINDEX ...) quite drastically. We have in-house patches that somewhat alleiviate this, but they are not release quality. Has anybody else suffered this? Any chance I could give those patches a try? I'm interested in seeing how they may affect our DBT-3 workload, which execute DSS type queries. Thanks, Mark ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] O_DIRECT setting
TODO has: * Consider use of open/fcntl(O_DIRECT) to minimize OS caching Should the item be removed? --- Neil Conway wrote: On Mon, 2004-09-20 at 17:57, Guy Thornley wrote: According to the manpage, O_DIRECT implies O_SYNC: File I/O is done directly to/from user space buffers. The I/O is synchronous, i.e., at the completion of the read(2) or write(2) system call, data is guaranteed to have been transferred. This seems like it would be a rather large net loss. PostgreSQL already structures writes so that the writes we need to hit disk immediately (WAL records) are fsync()'ed -- the kernel is given more freedom to schedule how other writes are flushed from the cache. Also, my recollection is that O_DIRECT also disables readahead -- if that's correct, that's not what we want either. BTW, using O_DIRECT has been discussed a few times in the past. Have you checked the list archives? (for both -performance and -hackers) Would people be interested in a performance benchmark? Sure -- I'd definitely be curious, although as I said I'm skeptical it's a win. I need some benchmark tips :) Some people have noted that it can be difficult to use contrib/pgbench to get reproducible results -- you might want to look at Jan's TPC-W implementation or the OSDL database benchmarks: http://pgfoundry.org/projects/tpc-w-php/ http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/ Incidentally, postgres heap files suffer really, really bad fragmentation, which affects sequential scan operations (VACUUM, ANALYZE, REINDEX ...) quite drastically. We have in-house patches that somewhat alleiviate this, but they are not release quality. Can you elaborate on these in-house patches? -Neil ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] O_DIRECT setting
Bruce Momjian [EMAIL PROTECTED] writes: TODO has: * Consider use of open/fcntl(O_DIRECT) to minimize OS caching Should the item be removed? I think it's fine ;-) ... it says consider it, not do it. The point is that we could do with more research in this area, even if O_DIRECT per se is not useful. Maybe you could generalize the entry to investigate ways of fine-tuning OS caching behavior. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] O_DIRECT setting
On Mon, 2004-09-20 at 17:57, Guy Thornley wrote: According to the manpage, O_DIRECT implies O_SYNC: File I/O is done directly to/from user space buffers. The I/O is synchronous, i.e., at the completion of the read(2) or write(2) system call, data is guaranteed to have been transferred. This seems like it would be a rather large net loss. PostgreSQL already structures writes so that the writes we need to hit disk immediately (WAL records) are fsync()'ed -- the kernel is given more freedom to schedule how other writes are flushed from the cache. Also, my recollection is that O_DIRECT also disables readahead -- if that's correct, that's not what we want either. BTW, using O_DIRECT has been discussed a few times in the past. Have you checked the list archives? (for both -performance and -hackers) Would people be interested in a performance benchmark? Sure -- I'd definitely be curious, although as I said I'm skeptical it's a win. I need some benchmark tips :) Some people have noted that it can be difficult to use contrib/pgbench to get reproducible results -- you might want to look at Jan's TPC-W implementation or the OSDL database benchmarks: http://pgfoundry.org/projects/tpc-w-php/ http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/ Incidentally, postgres heap files suffer really, really bad fragmentation, which affects sequential scan operations (VACUUM, ANALYZE, REINDEX ...) quite drastically. We have in-house patches that somewhat alleiviate this, but they are not release quality. Can you elaborate on these in-house patches? -Neil ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[PERFORM] O_DIRECT setting
A recent comment on this (or perhaps another?) mailing list about Sun boxen and the directio mount option has prompted me to read about O_DIRECT on the open() manpage. Has anybody tried this option? Ever taken any performance measurements? I assume the way postgres manages its buffer memory (dealing with 8kB pages) would be compatible with the restrictions: Under Linux 2.4 transfer sizes, and the alignment of user buffer and file offset must all be multiples of the logical block size of the file system. According to the manpage, O_DIRECT implies O_SYNC: File I/O is done directly to/from user space buffers. The I/O is synchronous, i.e., at the completion of the read(2) or write(2) system call, data is guaranteed to have been transferred. At the moment I am fairly interested in trying this, and I would spend some time with it, but I have my hands full with other projects. I'd imagine this is more use with the revamped buffer manager in PG8.0 than the 7.x line, but we are not using PG8.0 here yet. Would people be interested in a performance benchmark? I need some benchmark tips :) Incidentally, postgres heap files suffer really, really bad fragmentation, which affects sequential scan operations (VACUUM, ANALYZE, REINDEX ...) quite drastically. We have in-house patches that somewhat alleiviate this, but they are not release quality. Has anybody else suffered this? Guy Thornley ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match