Re: [PERFORM] O_DIRECT setting

2004-09-30 Thread Guy Thornley
Sorry about the belated reply, its been busy around here.

  Incidentally, postgres heap files suffer really, really bad fragmentation,
  which affects sequential scan operations (VACUUM, ANALYZE, REINDEX ...)
  quite drastically. We have in-house patches that somewhat alleiviate this,
  but they are not release quality. Has anybody else suffered this?
  
 
 Any chance I could give those patches a try?  I'm interested in seeing
 how they may affect our DBT-3 workload, which execute DSS type queries.

Like I said, the patches are not release quality... if you run them on a
metadata journalling filesystem, without an 'ordered write' mode, its
possible to end up with corrupt heaps after a crash because of garbage data
in the extended files.

If/when we move to postgres 8 I'll try to ensure the patches get re-done
with releasable quality

Guy Thornley

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] O_DIRECT setting

2004-09-30 Thread Mark Wong
On Thu, Sep 30, 2004 at 07:02:32PM +1200, Guy Thornley wrote:
 Sorry about the belated reply, its been busy around here.
 
   Incidentally, postgres heap files suffer really, really bad fragmentation,
   which affects sequential scan operations (VACUUM, ANALYZE, REINDEX ...)
   quite drastically. We have in-house patches that somewhat alleiviate this,
   but they are not release quality. Has anybody else suffered this?
   
  
  Any chance I could give those patches a try?  I'm interested in seeing
  how they may affect our DBT-3 workload, which execute DSS type queries.
 
 Like I said, the patches are not release quality... if you run them on a
 metadata journalling filesystem, without an 'ordered write' mode, its
 possible to end up with corrupt heaps after a crash because of garbage data
 in the extended files.
 
 If/when we move to postgres 8 I'll try to ensure the patches get re-done
 with releasable quality
 
 Guy Thornley

That's ok, we like to help test and proof things, we don't need patches to be
release quality.

Mark

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] O_DIRECT setting

2004-09-29 Thread Mark Wong
On Thu, Sep 23, 2004 at 10:57:41AM -0400, Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  TODO has:
  * Consider use of open/fcntl(O_DIRECT) to minimize OS caching
  Should the item be removed?
 
 I think it's fine ;-) ... it says consider it, not do it.  The point
 is that we could do with more research in this area, even if O_DIRECT
 per se is not useful.  Maybe you could generalize the entry to
 investigate ways of fine-tuning OS caching behavior.
 
   regards, tom lane
 

I talked to Jan a little about this during OSCon since Linux filesystems
(ext2, ext3, etc) let you use O_DIRECT.  He felt the only place where
PostgreSQL may benefit from this now, without managing its own buffer first,
would be with the log writer.  I'm probably going to get this wrong, but
he thought it would be interesting to try an experiment by taking X number
of pages to be flushed, sort them (by age? where they go on disk?) and
write them out.  He thought this would be a relatively easy thing to try,
a day or two of work.  We'd really love to experiment with it.

Mark

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] O_DIRECT setting

2004-09-29 Thread Tom Lane
Mark Wong [EMAIL PROTECTED] writes:
 I talked to Jan a little about this during OSCon since Linux filesystems
 (ext2, ext3, etc) let you use O_DIRECT.  He felt the only place where
 PostgreSQL may benefit from this now, without managing its own buffer first,
 would be with the log writer.  I'm probably going to get this wrong, but
 he thought it would be interesting to try an experiment by taking X number
 of pages to be flushed, sort them (by age? where they go on disk?) and
 write them out.

Hmm.  Most of the time the log writer has little choice about page write
order --- certainly if all your transactions are small it's not going to
have any choice.  I think this would mainly be equivalent to O_SYNC with
the extra feature of stopping the kernel from buffering the WAL data in
its own buffer cache.  Which is probably useful, but I doubt it's going
to make a huge difference.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] O_DIRECT setting

2004-09-24 Thread Mark Wong
On Mon, Sep 20, 2004 at 07:57:34PM +1200, Guy Thornley wrote:
[snip]
 
 Incidentally, postgres heap files suffer really, really bad fragmentation,
 which affects sequential scan operations (VACUUM, ANALYZE, REINDEX ...)
 quite drastically. We have in-house patches that somewhat alleiviate this,
 but they are not release quality. Has anybody else suffered this?
 

Any chance I could give those patches a try?  I'm interested in seeing
how they may affect our DBT-3 workload, which execute DSS type queries.

Thanks,
Mark

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] O_DIRECT setting

2004-09-23 Thread Bruce Momjian

TODO has:

* Consider use of open/fcntl(O_DIRECT) to minimize OS caching

Should the item be removed?

---

Neil Conway wrote:
 On Mon, 2004-09-20 at 17:57, Guy Thornley wrote:
  According to the manpage, O_DIRECT implies O_SYNC:
  
  File I/O is done directly to/from user space buffers.  The I/O is
  synchronous, i.e., at the completion of the read(2) or write(2)
  system call, data is guaranteed to have been transferred.
 
 This seems like it would be a rather large net loss. PostgreSQL already
 structures writes so that the writes we need to hit disk immediately
 (WAL records) are fsync()'ed -- the kernel is given more freedom to
 schedule how other writes are flushed from the cache. Also, my
 recollection is that O_DIRECT also disables readahead -- if that's
 correct, that's not what we want either.
 
 BTW, using O_DIRECT has been discussed a few times in the past. Have you
 checked the list archives? (for both -performance and -hackers)
 
  Would people be interested in a performance benchmark?
 
 Sure -- I'd definitely be curious, although as I said I'm skeptical it's
 a win.
 
  I need some benchmark tips :)
 
 Some people have noted that it can be difficult to use contrib/pgbench
 to get reproducible results -- you might want to look at Jan's TPC-W
 implementation or the OSDL database benchmarks:
 
 http://pgfoundry.org/projects/tpc-w-php/
 http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/
 
  Incidentally, postgres heap files suffer really, really bad fragmentation,
  which affects sequential scan operations (VACUUM, ANALYZE, REINDEX ...)
  quite drastically. We have in-house patches that somewhat alleiviate this,
  but they are not release quality.
 
 Can you elaborate on these in-house patches?
 
 -Neil
 
 
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] O_DIRECT setting

2004-09-23 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 TODO has:
   * Consider use of open/fcntl(O_DIRECT) to minimize OS caching
 Should the item be removed?

I think it's fine ;-) ... it says consider it, not do it.  The point
is that we could do with more research in this area, even if O_DIRECT
per se is not useful.  Maybe you could generalize the entry to
investigate ways of fine-tuning OS caching behavior.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] O_DIRECT setting

2004-09-22 Thread Neil Conway
On Mon, 2004-09-20 at 17:57, Guy Thornley wrote:
 According to the manpage, O_DIRECT implies O_SYNC:
 
 File I/O is done directly to/from user space buffers.  The I/O is
 synchronous, i.e., at the completion of the read(2) or write(2)
 system call, data is guaranteed to have been transferred.

This seems like it would be a rather large net loss. PostgreSQL already
structures writes so that the writes we need to hit disk immediately
(WAL records) are fsync()'ed -- the kernel is given more freedom to
schedule how other writes are flushed from the cache. Also, my
recollection is that O_DIRECT also disables readahead -- if that's
correct, that's not what we want either.

BTW, using O_DIRECT has been discussed a few times in the past. Have you
checked the list archives? (for both -performance and -hackers)

 Would people be interested in a performance benchmark?

Sure -- I'd definitely be curious, although as I said I'm skeptical it's
a win.

 I need some benchmark tips :)

Some people have noted that it can be difficult to use contrib/pgbench
to get reproducible results -- you might want to look at Jan's TPC-W
implementation or the OSDL database benchmarks:

http://pgfoundry.org/projects/tpc-w-php/
http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/

 Incidentally, postgres heap files suffer really, really bad fragmentation,
 which affects sequential scan operations (VACUUM, ANALYZE, REINDEX ...)
 quite drastically. We have in-house patches that somewhat alleiviate this,
 but they are not release quality.

Can you elaborate on these in-house patches?

-Neil



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] O_DIRECT setting

2004-09-20 Thread Guy Thornley
A recent comment on this (or perhaps another?) mailing list about Sun boxen
and the directio mount option has prompted me to read about O_DIRECT on the
open() manpage.

Has anybody tried this option? Ever taken any performance measurements?
I assume the way postgres manages its buffer memory (dealing with 8kB pages)
would be compatible with the restrictions:

Under  Linux  2.4 transfer  sizes,  and the alignment of user buffer
and file offset must all be multiples of the logical block size of
the file system.

According to the manpage, O_DIRECT implies O_SYNC:

File I/O is done directly to/from user space buffers.  The I/O is
synchronous, i.e., at the completion of the read(2) or write(2)
system call, data is guaranteed to have been transferred.

At the moment I am fairly interested in trying this, and I would spend some
time with it, but I have my hands full with other projects. I'd imagine this
is more use with the revamped buffer manager in PG8.0 than the 7.x line, but
we are not using PG8.0 here yet.

Would people be interested in a performance benchmark? I need some benchmark
tips :)

Incidentally, postgres heap files suffer really, really bad fragmentation,
which affects sequential scan operations (VACUUM, ANALYZE, REINDEX ...)
quite drastically. We have in-house patches that somewhat alleiviate this,
but they are not release quality. Has anybody else suffered this?

Guy Thornley


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match