Re: [PERFORM] [HACKERS] fsync method checking

2004-03-26 Thread markw
On 26 Mar, Bruce Momjian wrote:
 [EMAIL PROTECTED] wrote:
 On 26 Mar, Manfred Spraul wrote:
  [EMAIL PROTECTED] wrote:
  
 Compare file sync methods with one 8k write:
 (o_dsync unavailable)
 open o_sync, write    6.270724
 write, fdatasync     13.275225
 write, fsync,        13.359847
   
 
  Odd. Which filesystem, which kernel? It seems fdatasync is broken and 
  syncs the inode, too.
 
 It's linux-2.6.5-rc1 with ext2 filesystems.
 
 Would you benchmark open_sync for wal_sync_method too?

Oh yeah.  Will try to get results later today.

Mark 


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] [HACKERS] fsync method checking

2004-03-26 Thread Bruce Momjian
[EMAIL PROTECTED] wrote:
 On 26 Mar, Manfred Spraul wrote:
  [EMAIL PROTECTED] wrote:
  
 Compare file sync methods with one 8k write:
 (o_dsync unavailable)
 open o_sync, write    6.270724
 write, fdatasync     13.275225
 write, fsync,        13.359847
   
 
  Odd. Which filesystem, which kernel? It seems fdatasync is broken and 
  syncs the inode, too.
 
 It's linux-2.6.5-rc1 with ext2 filesystems.

Would you benchmark open_sync for wal_sync_method too?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073



Re: [PERFORM] [HACKERS] fsync method checking

2004-03-26 Thread markw
On 26 Mar, Manfred Spraul wrote:
 [EMAIL PROTECTED] wrote:
 
Compare file sync methods with one 8k write:
(o_dsync unavailable)
open o_sync, write    6.270724
write, fdatasync     13.275225
write, fsync,        13.359847
  

 Odd. Which filesystem, which kernel? It seems fdatasync is broken and 
 syncs the inode, too.

It's linux-2.6.5-rc1 with ext2 filesystems.



Re: [PERFORM] [HACKERS] fsync method checking

2004-03-26 Thread Steve Atkins
On Fri, Mar 26, 2004 at 07:25:53AM +0100, Manfred Spraul wrote:

 Compare file sync methods with one 8k write:
 (o_dsync unavailable)
 open o_sync, write    6.270724
 write, fdatasync     13.275225
 write, fsync,        13.359847
  
 
 Odd. Which filesystem, which kernel? It seems fdatasync is broken and 
 syncs the inode, too.

This may be relevant.

From the man page for fdatasync on a moderately recent RedHat installation:

  BUGS
   Currently (Linux 2.2) fdatasync is equivalent to fsync.

Cheers,
  Steve



Re: [PERFORM] [HACKERS] fsync method checking

2004-03-25 Thread markw
On 25 Mar, Manfred Spraul wrote:
 Tom Lane wrote:
 
[EMAIL PROTECTED] writes:
  

I could certainly do some testing if you want to see how DBT-2 does.
Just tell me what to do. ;)



Just do some runs that are identical except for the wal_sync_method
setting.  Note that this should not have any impact on SELECT
performance, only insert/update/delete performance.
  

 I've made a test run that compares fsync and fdatasync: The performance 
 was identical:
 - with fdatasync:
 
 http://khack.osdl.org/stp/290607/
 
 - with fsync:
 http://khack.osdl.org/stp/290483/
 
 I don't understand why. Mark - is there a battery backed write cache in 
 the raid controller, or something similar that might skew the results? 
 The test generates quite a lot of wal traffic - around 1.5 MB/sec. 
 Perhaps the writes are so large that the added overhead of syncing the 
 inode is not noticeable?
 Is the pg_xlog directory on a separate drive?
 
 Btw, it's possible to request such tests through the web-interface, see
 http://www.osdl.org/lab_activities/kernel_testing/stp/script_param.html

We have 2 Adaptec 2200s controllers, without the battery backed add-on,
connected to four 10-disk arrays in those systems.  I can't think of
anything off hand that would skew the results.

The pg_xlog directory is not on a separate drive.  I haven't found the
best way to lay out the drives on those systems yet, so I just have
everything on a 28 drive lvm2 volume.

Mark



Re: [PERFORM] [HACKERS] fsync method checking

2004-03-25 Thread Bruce Momjian
[EMAIL PROTECTED] wrote:
  I've made a test run that compares fsync and fdatasync: The performance 
  was identical:
  - with fdatasync:
  
  http://khack.osdl.org/stp/290607/
  
  - with fsync:
  http://khack.osdl.org/stp/290483/
  
  I don't understand why. Mark - is there a battery backed write cache in 
  the raid controller, or something similar that might skew the results? 
  The test generates quite a lot of wal traffic - around 1.5 MB/sec. 
  Perhaps the writes are so large that the added overhead of syncing the 
  inode is not noticeable?
  Is the pg_xlog directory on a separate drive?
  
  Btw, it's possible to request such tests through the web-interface, see
  http://www.osdl.org/lab_activities/kernel_testing/stp/script_param.html
 
 We have 2 Adaptec 2200s controllers, without the battery backed add-on,
 connected to four 10-disk arrays in those systems.  I can't think of
 anything off hand that would skew the results.
 
 The pg_xlog directory is not on a separate drive.  I haven't found the
 best way to lay out the drives on those systems yet, so I just have
 everything on a 28 drive lvm2 volume.

We don't actually extend the WAL file during writes (preallocated), and
the access/modification timestamp is only in seconds, so I wonder if the
OS only updates the inode once a second.  What else would change in the
inode more frequently than once a second?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073



Re: [PERFORM] [HACKERS] fsync method checking

2004-03-25 Thread Josh Berkus
Bruce,

 We don't actually extend the WAL file during writes (preallocated), and
 the access/modification timestamp is only in seconds, so I wonder if the
 OS only updates the inode once a second.  What else would change in the
 inode more frequently than once a second?

What about really big writes, when WAL files are getting added/recycled?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] [HACKERS] fsync method checking

2004-03-25 Thread markw
On 22 Mar, Tom Lane wrote:
 [EMAIL PROTECTED] writes:
 I could certainly do some testing if you want to see how DBT-2 does.
 Just tell me what to do. ;)
 
 Just do some runs that are identical except for the wal_sync_method
 setting.  Note that this should not have any impact on SELECT
 performance, only insert/update/delete performance.

Ok, here are the results I have from my 4-way xeon system, a 14 disk
volume for the log and a 52 disk volume for everything else:
http://developer.osdl.org/markw/pgsql/wal_sync_method.html

7.5devel-200403222  

wal_sync_method        metric
default (fdatasync)    1935.28
fsync                  1613.92
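
To put the two DBT-2 figures above in relative terms, here is a small worked calculation in Python. It assumes that a higher metric is better, which the message does not state explicitly; the helper name percent_faster is made up for illustration.

```python
def percent_faster(new, base):
    """Relative gain of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


if __name__ == "__main__":
    # DBT-2 metrics quoted above; "higher is better" is an assumption.
    fdatasync_metric = 1935.28
    fsync_metric = 1613.92
    print(f"{percent_faster(fdatasync_metric, fsync_metric):.1f}%")
```

On these numbers the default (fdatasync) run comes out roughly 20% ahead of the fsync run.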

# ./test_fsync -f /opt/pgdb/dbt2/pg_xlog/test.out
Simple write timing:
        write                    0.018787

Compare fsync times on write() and non-write() descriptor:
(If the times are similar, fsync() can sync data written
 on a different descriptor.)
        write, fsync, close     13.057781
        write, close, fsync     13.311313

Compare one o_sync write to two:
        one 16k o_sync write     6.515122
        two 8k o_sync writes    12.455124

Compare file sync methods with one 8k write:
(o_dsync unavailable)
        open o_sync, write       6.270724
        write, fdatasync        13.275225
        write, fsync,           13.359847

Compare file sync methods with 2 8k writes:
(o_dsync unavailable)
        open o_sync, write      12.479563
        write, fdatasync        13.651709
        write, fsync,           14.000240
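
For readers who want to reproduce this kind of comparison without building the C tool, the following is a minimal Python sketch of a timing loop for the three methods shown above. It is not the src/tools/fsync source: the function names, the loop count, and the choice to rewrite the same 8k block at offset 0 are all assumptions made for illustration.

```python
import os
import tempfile
import time

BLOCK = b"\0" * 8192  # one 8k block, the write size used in the output above


def time_sync_method(path, method, loops=50):
    """Return seconds spent on `loops` rewrites of one 8k block using `method`."""
    flags = os.O_RDWR | os.O_CREAT
    if method == "open_sync":
        flags |= os.O_SYNC  # each write() returns only after the flush
    fd = os.open(path, flags, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(loops):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, BLOCK)
            if method == "fsync":
                os.fsync(fd)
            elif method == "fdatasync":
                os.fdatasync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare_sync_methods(directory, loops=50):
    """Time every method this platform offers against a scratch file."""
    methods = ["fsync"]
    if hasattr(os, "fdatasync"):
        methods.append("fdatasync")
    if hasattr(os, "O_SYNC"):
        methods.append("open_sync")
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        return {m: time_sync_method(path, m, loops) for m in methods}
    finally:
        os.unlink(path)


if __name__ == "__main__":
    for name, seconds in compare_sync_methods(".").items():
        print(f"{name:<12}{seconds:10.6f}")
```

Point `compare_sync_methods` at a directory on the filesystem that holds pg_xlog, since the results depend on the filesystem and the hardware underneath it.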



Re: [PERFORM] [HACKERS] fsync method checking

2004-03-25 Thread Manfred Spraul
[EMAIL PROTECTED] wrote:

Compare file sync methods with one 8k write:
   (o_dsync unavailable)
   open o_sync, write    6.270724
   write, fdatasync     13.275225
   write, fsync,        13.359847
 

Odd. Which filesystem, which kernel? It seems fdatasync is broken and 
syncs the inode, too.

--
   Manfred


Re: [PERFORM] [HACKERS] fsync method checking

2004-03-24 Thread Manfred Spraul
Tom Lane wrote:

[EMAIL PROTECTED] writes:
 

I could certainly do some testing if you want to see how DBT-2 does.
Just tell me what to do. ;)
   

Just do some runs that are identical except for the wal_sync_method
setting.  Note that this should not have any impact on SELECT
performance, only insert/update/delete performance.
 

I've made a test run that compares fsync and fdatasync: The performance 
was identical:
- with fdatasync:

http://khack.osdl.org/stp/290607/

- with fsync:
http://khack.osdl.org/stp/290483/

I don't understand why. Mark - is there a battery-backed write cache in
the RAID controller, or something similar that might skew the results?
The test generates quite a lot of WAL traffic - around 1.5 MB/sec.
Perhaps the writes are so large that the added overhead of syncing the
inode is not noticeable?
Is the pg_xlog directory on a separate drive?

Btw, it's possible to request such tests through the web-interface, see
http://www.osdl.org/lab_activities/kernel_testing/stp/script_param.html
--
   Manfred


Re: [PERFORM] [HACKERS] fsync method checking

2004-03-22 Thread markw
On 18 Mar, Tom Lane wrote:
 Josh Berkus [EMAIL PROTECTED] writes:
 1) This is an OSS project.   Why not just recruit a bunch of people on 
 PERFORMANCE and GENERAL to test the 4 different synch methods using real 
 databases?   No test like reality, I say 
 
 I agree --- that is likely to yield *far* more useful results than
 any standalone test program, for the purpose of finding out what
 wal_sync_method to use in real databases.  However, there's a second
 issue here: we would like to move sync/checkpoint responsibility into
 the bgwriter, and that requires knowing whether it's valid to let one
 process fsync on behalf of writes that were done by other processes.
 That's got nothing to do with WAL sync performance.  I think that it
 would be sensible to make a test program that focuses on this one
 specific question.  (There has been some handwaving to the effect that
 everybody knows this is safe on Unixen, but I question whether the
 handwavers have seen the internals of HPUX or AIX for instance; and
 besides we need to worry about Windows now.)

I could certainly do some testing if you want to see how DBT-2 does.
Just tell me what to do. ;)

Mark



Re: [PERFORM] [HACKERS] fsync method checking

2004-03-22 Thread Tom Lane
[EMAIL PROTECTED] writes:
 I could certainly do some testing if you want to see how DBT-2 does.
 Just tell me what to do. ;)

Just do some runs that are identical except for the wal_sync_method
setting.  Note that this should not have any impact on SELECT
performance, only insert/update/delete performance.

regards, tom lane



Re: [PERFORM] [HACKERS] fsync method checking

2004-03-22 Thread Bruce Momjian
[EMAIL PROTECTED] wrote:
 On 18 Mar, Tom Lane wrote:
  Josh Berkus [EMAIL PROTECTED] writes:
  1) This is an OSS project.   Why not just recruit a bunch of people on 
  PERFORMANCE and GENERAL to test the 4 different synch methods using real 
  databases?   No test like reality, I say 
  
  I agree --- that is likely to yield *far* more useful results than
  any standalone test program, for the purpose of finding out what
  wal_sync_method to use in real databases.  However, there's a second
  issue here: we would like to move sync/checkpoint responsibility into
  the bgwriter, and that requires knowing whether it's valid to let one
  process fsync on behalf of writes that were done by other processes.
  That's got nothing to do with WAL sync performance.  I think that it
  would be sensible to make a test program that focuses on this one
  specific question.  (There has been some handwaving to the effect that
  everybody knows this is safe on Unixen, but I question whether the
  handwavers have seen the internals of HPUX or AIX for instance; and
  besides we need to worry about Windows now.)
 
 I could certainly do some testing if you want to see how DBT-2 does.
 Just tell me what to do. ;)

To test, you would run src/tools/fsync from the CVS version, find the
fastest fsync method in the last group of outputs, then try the
wal_sync_method setting to see whether the one that tools/fsync says is
fastest actually is fastest.  However, it might be better to run your
tests first, get some indication of how frequently writes and fsyncs are
going to WAL, and modify tools/fsync to match what your DBT-2 test does.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073



Re: [PERFORM] [HACKERS] fsync method checking

2004-03-19 Thread Kevin Brown
I wrote:
 Note, too, that the preferred method isn't likely to depend just on the
 operating system, it's likely to depend also on the filesystem type
 being used.
 
 Linux provides quite a few of them: ext2, ext3, jfs, xfs, and reiserfs,
 and that's just off the top of my head.  I imagine the performance of
 the various syncing methods will vary significantly between them.

For what it's worth, my database throughput for transactions involving
a lot of inserts, updates, and deletes is about 12% faster using
fdatasync() than O_SYNC under Linux using JFS.

I'll run the test program and report my results with it as well, so
we'll be able to see if there's any consistency between it and the live
database.




-- 
Kevin Brown   [EMAIL PROTECTED]



Re: [PERFORM] [HACKERS] fsync method checking

2004-03-18 Thread Josh Berkus
Tom, Bruce,

 My previous point about checking different fsync spacings corresponds to
 different assumptions about average transaction size.  I think a useful
 tool for determining wal_sync_method has got to be able to reflect that
 range of possibilities.

Questions:
1) This is an OSS project.   Why not just recruit a bunch of people on 
PERFORMANCE and GENERAL to test the 4 different synch methods using real 
databases?   No test like reality, I say 

2) Won't Jan's work on 7.5 memory and I/O management mean that we have to 
re-evaluate synching anyway?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] [HACKERS] fsync method checking

2004-03-18 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 1) This is an OSS project.   Why not just recruit a bunch of people on 
 PERFORMANCE and GENERAL to test the 4 different synch methods using real 
 databases?   No test like reality, I say 

I agree --- that is likely to yield *far* more useful results than
any standalone test program, for the purpose of finding out what
wal_sync_method to use in real databases.  However, there's a second
issue here: we would like to move sync/checkpoint responsibility into
the bgwriter, and that requires knowing whether it's valid to let one
process fsync on behalf of writes that were done by other processes.
That's got nothing to do with WAL sync performance.  I think that it
would be sensible to make a test program that focuses on this one
specific question.  (There has been some handwaving to the effect that
everybody knows this is safe on Unixen, but I question whether the
handwavers have seen the internals of HPUX or AIX for instance; and
besides we need to worry about Windows now.)

A third reason for having a simple test program is to confirm whether
your drives are syncing at all (cf. hdparm discussion).
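
One rough way to run that third check is sketched below in Python. The heuristic is an assumption, not something stated in this thread: a single rotating disk with no battery-backed cache cannot commit a rewrite of the same block much more often than once per revolution, so a 7200 RPM drive should top out near 120 real flushes per second. The function names and the default RPM are made up for illustration.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, loops=100):
    """Rewrite one 8k block `loops` times, fsync each time, return the rate."""
    block = b"\0" * 8192
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(loops):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, block)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return loops / elapsed if elapsed > 0 else float("inf")


def looks_cached(rate, rpm=7200):
    """Heuristic (assumption): more syncs per second than disk revolutions
    per second suggests a write cache is acknowledging the flush early."""
    return rate > rpm / 60.0


if __name__ == "__main__":
    rate = fsyncs_per_second(".")
    print(f"{rate:.0f} fsyncs/sec; write cache suspected: {looks_cached(rate)}")
```

A rate far above the rotational limit does not prove data loss is possible, and RAID arrays or SSDs make the threshold meaningless, so treat the result as a prompt to inspect the drive's cache settings rather than as a verdict.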

 2) Won't Jan's work on 7.5 memory and I/O management mean that we have to 
 re-evaluate synching anyway?

So far nothing's been done that touches WAL writing.  However, I am
thinking about making the bgwriter process take some of the load of
writing WAL buffers (right now it only writes data-file buffers).
And you're right, after that happens we will need to re-measure.
The open flags will probably become considerably more attractive than
they are now, if the bgwriter handles most non-commit writes of WAL.
(We might also think of letting the bgwriter use a different sync method
than the backends do.)

regards, tom lane



Re: [PERFORM] [HACKERS] fsync method checking

2004-03-18 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Well, I wrote the program to allow testing.  I don't see a complex test
 as being that much better than a simple one.  We don't need accurate
 numbers.  We just need to know if fsync or O_SYNC is faster.

Faster than what?  The thing everyone is trying to point out here is
that it depends on context, and we have little faith that this test
program creates a context similar to a live Postgres database.

regards, tom lane



Re: [PERFORM] [HACKERS] fsync method checking

2004-03-18 Thread Kevin Brown
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Well, I wrote the program to allow testing.  I don't see a complex test
  as being that much better than a simple one.  We don't need accurate
  numbers.  We just need to know if fsync or O_SYNC is faster.
 
 Faster than what?  The thing everyone is trying to point out here is
 that it depends on context, and we have little faith that this test
 program creates a context similar to a live Postgres database.

Note, too, that the preferred method isn't likely to depend just on the
operating system, it's likely to depend also on the filesystem type
being used.

Linux provides quite a few of them: ext2, ext3, jfs, xfs, and reiserfs,
and that's just off the top of my head.  I imagine the performance of
the various syncing methods will vary significantly between them.


It seems reasonable to me that decisions such as which sync method to
use should initially be made at installation time: have the test program
run on the target filesystem as part of the installation process, and
build the initial postgresql.conf based on the results.  You might even
be able to do some additional testing such as measuring the difference
between random block access and sequential access, and again feed the
results into the postgresql.conf file.  This is no substitute for
experience with the platform, but I expect it's likely to get you closer
to something optimal than doing nothing.  The only question, of course,
is whether or not it's worth going to the effort when it may or may not
gain you a whole lot.  Answering that is going to require some
experimentation with such an automatic configuration system.
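
As a sketch of what the last step of such an installation-time probe might look like, the Python below picks the fastest method from a set of measured timings and formats it as a postgresql.conf line. No such installer step exists in the thread; the function names are hypothetical, and the sample timings are the one-8k-write figures posted elsewhere in this thread for linux-2.6.5-rc1 on ext2.

```python
def pick_wal_sync_method(timings):
    """Return the method name with the smallest elapsed time.

    `timings` maps a wal_sync_method name to seconds measured on the
    target filesystem; lower is better.
    """
    if not timings:
        raise ValueError("no timings supplied")
    return min(timings, key=timings.get)


def conf_line(timings):
    """Format the winning method as a postgresql.conf setting."""
    return f"wal_sync_method = {pick_wal_sync_method(timings)}"


if __name__ == "__main__":
    # Illustrative input taken from the test_fsync output in this thread.
    measured = {
        "open_sync": 6.270724,
        "fdatasync": 13.275225,
        "fsync": 13.359847,
    }
    print(conf_line(measured))
```

Choosing purely on a micro-benchmark is the weak point of this approach: the standalone winner may not be the fastest method under a real database workload, so the generated line is best treated as a starting value to verify.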



-- 
Kevin Brown   [EMAIL PROTECTED]



Re: [PERFORM] [HACKERS] fsync method checking

2004-03-18 Thread Bruce Momjian
Tom Lane wrote:
  It really just shows whether the fsync after the close has similar
  timing to the one before the close.  That was the best way I could think
  to test it.
 
 Sure, but where's the separate process part?  What this seems to test
 is whether a single process can sync its own writes through a different
 file descriptor; which is interesting but by no means the only thing we
 need to be sure of if we want to make the bgwriter handle syncing.

I am not sure how to easily test if a separate process can do the same. 
I am sure it can be done, but for me it was enough to see that it works
in a single process.  Unix isn't very process-centered for I/O, so I
don't think it would make much of a difference.  Now, Win32, that might
be an issue.
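
A separate-process variant of the test is not hard to sketch on Unix. The Python below forks a child that writes a block and exits without syncing, then has the parent open its own descriptor and call fsync on it. This is an illustration only, not part of tools/fsync; the function name is made up, and like the single-process test it can only report timing, not prove that the child's data reached the platter.

```python
import os
import tempfile
import time


def fsync_after_child_write(directory, size=8192):
    """Fork a child that writes `size` bytes, then fsync from the parent.

    Returns (bytes_visible_to_parent, seconds_spent_in_fsync). Unix only.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: write through its own descriptor, exit without syncing.
            status = 1
            try:
                cfd = os.open(path, os.O_WRONLY)
                os.write(cfd, b"\0" * size)
                os.close(cfd)
                status = 0
            finally:
                os._exit(status)
        _, wait_status = os.waitpid(pid, 0)
        if wait_status != 0:
            raise RuntimeError("child write failed")
        # Parent: a fresh descriptor that never wrote anything.
        pfd = os.open(path, os.O_RDWR)
        try:
            start = time.perf_counter()
            os.fsync(pfd)
            elapsed = time.perf_counter() - start
            visible = os.fstat(pfd).st_size
        finally:
            os.close(pfd)
        return visible, elapsed
    finally:
        os.unlink(path)


if __name__ == "__main__":
    visible, elapsed = fsync_after_child_write(".")
    print(f"parent sees {visible} bytes; fsync took {elapsed:.6f}s")
```

If the parent's fsync takes about as long as a write followed by fsync in one process, that suggests it flushed the child's dirty data. A near-zero time would suggest the opposite, or that the kernel had already written the data back on its own.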

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
