Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-19 Thread Jignesh Shah
On Tue, Nov 16, 2010 at 8:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Josh Berkus j...@agliodbs.com writes:
 Well, we're not going to increase the default to gigabytes, but we could
 very probably increase it by a factor of 10 or so without anyone
 squawking.  It's been awhile since I heard of anyone trying to run PG in
 4MB shmmax.  How much would a change of that size help?

 Last I checked, though, this comes out of the allocation available to
 shared_buffers.  And there definitely are several OSes (several linuxes,
 OSX) still limited to 32MB by default.

 Sure, but the current default is a measly 64kB.  We could increase that
 10x for a relatively small percentage hit in the size of shared_buffers,
 if you suppose that there's 32MB available.  The current default is set
 to still work if you've got only a couple of MB in SHMMAX.

 What we'd want is for initdb to adjust the setting as part of its
 probing to see what SHMMAX is set to.

                        regards, tom lane



In all the performance tests that I have done, generally I get a good
bang for the buck with wal_buffers set to 512kB in low memory cases
and mostly I set it to 1MB which is probably enough for most of the
cases even with high memory.

That 1/2 MB won't make a drastic change to shared_buffers anyway (except
for edge cases), but it will relieve quite a bit of the stress on the WAL
buffers.
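For a sense of scale, here is a small C sketch that expresses the sizes mentioned (the 64kB default, 512kB and 1MB) as WAL pages and as a share of a 32MB SHMMAX. It assumes the default 8kB WAL page size (XLOG_BLCKSZ); the function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: a build with the default 8 kB WAL page (XLOG_BLCKSZ). */
#define WAL_PAGE_BYTES (8L * 1024)

/* Number of WAL pages needed to hold `bytes` of wal_buffers (rounded up). */
long wal_pages(long bytes)
{
    assert(bytes > 0);
    return (bytes + WAL_PAGE_BYTES - 1) / WAL_PAGE_BYTES;
}

/* Percentage of a SysV shared memory limit that `bytes` would consume. */
double shm_share_percent(long bytes, long shmmax)
{
    assert(shmmax > 0);
    return 100.0 * (double)bytes / (double)shmmax;
}

/* Print the three sizes discussed in the thread against a 32 MB SHMMAX. */
void print_wal_buffer_table(void)
{
    const long shmmax = 32L * 1024 * 1024;
    const long sizes[] = {64L * 1024, 512L * 1024, 1024L * 1024};

    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
        printf("%5ld kB = %4ld WAL pages = %.2f%% of 32 MB\n",
               sizes[i] / 1024, wal_pages(sizes[i]),
               shm_share_percent(sizes[i], shmmax));
}
```

Even the 1MB setting uses only about 3% of a 32MB limit, which is consistent with the point that the change barely affects shared_buffers.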

Regards,
Jignesh

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-17 Thread Scott Carey

On Nov 16, 2010, at 4:05 PM, Mladen Gogala wrote:

 Josh Berkus wrote:
 On 11/16/10 12:39 PM, Greg Smith wrote:
 
 I want to next go through and replicate some of the actual database
 level tests before giving a full opinion on whether this data proves
 it's worth changing the wal_sync_method detection.  So far I'm torn
 between whether that's the right approach, or if we should just increase
 the default value for wal_buffers to something more reasonable.
 
 
 We'd love to, but wal_buffers uses sysV shmem.
 
 
 Speaking of the SYSV SHMEM, is it possible to use huge pages?

RHEL 6  and friends have transparent hugepage support.  I'm not sure if they 
yet transparently do it for SYSV SHMEM, but they do for most everything else.  
Sequential traversal of a process heap is several times faster with hugepages.  
Unfortunately, postgres doesn't organize its blocks in its shared_mem to be 
sequential for a relation.  So it might not matter much.
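Separately from transparent hugepages, Linux offers an explicit opt-in for SysV segments: the SHM_HUGETLB flag to shmget(). The C sketch below only illustrates that flag and is not how PostgreSQL allocates its shared memory. It assumes the size is a multiple of the huge page size (2MB on typical x86-64) and falls back to ordinary pages when the kernel refuses; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a private SysV shared memory segment of `size` bytes, trying huge
 * pages first.  On Linux, SHM_HUGETLB fails unless huge pages are available
 * (typically reserved via vm.nr_hugepages) and the process has CAP_IPC_LOCK
 * or belongs to the group in /proc/sys/vm/hugetlb_shm_group; in that case
 * fall back to ordinary pages.  Returns the segment id, or -1 on failure. */
int create_segment(size_t size, int *used_hugepages)
{
    assert(used_hugepages != NULL);
    int id = -1;

#ifdef SHM_HUGETLB
    /* size should be a multiple of the huge page size, e.g. 2 MB */
    id = shmget(IPC_PRIVATE, size, IPC_CREAT | SHM_HUGETLB | 0600);
#endif
    *used_hugepages = (id != -1);
    if (id == -1)
        id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    return id;
}

/* Mark the segment for destruction; it goes away once nothing is attached. */
int destroy_segment(int id)
{
    return shmctl(id, IPC_RMID, NULL);
}
```

Whether huge pages would actually help PostgreSQL's access pattern is the open question raised above; the flag only makes the experiment possible.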

 
 -- 
 
 Mladen Gogala 
 Sr. Oracle DBA
 1500 Broadway
 New York, NY 10036
 (212) 329-5251
 http://www.vmsinfo.com 
 The Leader in Integrated Media Intelligence Solutions
 
 
 
 


Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-17 Thread Scott Carey

On Nov 16, 2010, at 12:39 PM, Greg Smith wrote:
 
 $ ./test_fsync
 Loops = 1
 
 Simple write:
        8k write                         88476.784/second
 
 Compare file sync methods using one write:
        (unavailable: open_datasync)
        open_sync 8k write                1192.135/second
        8k write, fdatasync               1222.158/second
        8k write, fsync                   1097.980/second
 
 Compare file sync methods using two writes:
        (unavailable: open_datasync)
        2 open_sync 8k writes              527.361/second
        8k write, 8k write, fdatasync     1105.204/second
        8k write, 8k write, fsync         1084.050/second
 
 Compare open_sync with different sizes:
        open_sync 16k write                966.047/second
        2 open_sync 8k writes              529.565/second
 
 Test if fsync on non-write file descriptor is honored:
 (If the times are similar, fsync() can sync data written
 on a different descriptor.)
        8k write, fsync, close            1064.177/second
        8k write, close, fsync            1042.337/second
 
 Two notable things here.  One, there is no open_datasync defined in this
 older kernel.  Two, all methods of commit give equally inflated commit
 rates, far faster than the drive is capable of.  This proves this setup
 isn't flushing the drive's write cache after commit.

Nit: there is no open_sync, only open_dsync.  Prior to recent kernels, only 
(semantically) open_dsync exists, labeled as open_sync.  New kernels move that 
code to open_datasync and have a NEW open_sync that supposedly flushes metadata 
properly.

 
 You can get safe behavior out of the old kernel by disabling its write
 cache:
 
 $ sudo /sbin/hdparm -W0 /dev/sda
 
 /dev/sda:
 setting drive write-caching to 0 (off)
 write-caching =  0 (off)
 
 Loops = 1
 
 Simple write:
        8k write                         89023.413/second
 
 Compare file sync methods using one write:
        (unavailable: open_datasync)
        open_sync 8k write                 106.968/second
        8k write, fdatasync                108.106/second
        8k write, fsync                    104.238/second
 
 Compare file sync methods using two writes:
        (unavailable: open_datasync)
        2 open_sync 8k writes               51.637/second
        8k write, 8k write, fdatasync      109.256/second
        8k write, 8k write, fsync          103.952/second
 
 Compare open_sync with different sizes:
        open_sync 16k write                109.562/second
        2 open_sync 8k writes               52.752/second
 
 Test if fsync on non-write file descriptor is honored:
 (If the times are similar, fsync() can sync data written
 on a different descriptor.)
        8k write, fsync, close             107.179/second
        8k write, close, fsync             106.923/second
 
 And now results are as expected:  just under 120/second.
 
 Onto RHEL6.  Setup for this initial test was:
 
 $ uname -a
 Linux meddle 2.6.32-44.1.el6.x86_64 #1 SMP Wed Jul 14 18:51:29 EDT 2010
 x86_64 x86_64 x86_64 GNU/Linux
 $ cat /etc/redhat-release
 Red Hat Enterprise Linux Server release 6.0 Beta (Santiago)
 $ mount
 /dev/sda7 on / type ext4 (rw)
 
 And I started with the write cache off to see a straight comparison
 against the above:
 
 $ sudo hdparm -W0 /dev/sda
 
 /dev/sda:
 setting drive write-caching to 0 (off)
 write-caching =  0 (off)
 $ ./test_fsync
 Loops = 1
 
 Simple write:
        8k write                        104194.886/second
 
 Compare file sync methods using one write:
        open_datasync 8k write              97.828/second
        open_sync 8k write                 109.158/second
        8k write, fdatasync                109.838/second
        8k write, fsync                     20.872/second

fsync is working now!  Flushing metadata properly reduces performance.
However, shouldn't open_sync slow down relative to open_datasync too, and be
similar to fsync?

Did you recompile your test on the RHEL6 system?  
Code compiled on newer kernels will see O_DSYNC and O_SYNC as two separate 
sentinel values; let's call them 1 and 2 respectively.  Code compiled against 
earlier kernels will see both O_DSYNC and O_SYNC as the same value, 1.  So code 
compiled against older kernels that asks for O_SYNC on a newer kernel will 
actually get O_DSYNC behavior!  This was intentional.  I can't find the link to 
the mail, but it was Linus' idea to let old code that expected the 'faster but 
incorrect' behavior retain it on newer kernels.  Only a recompile against newer 
header files will trigger the new behavior and expose the 'correct' open_sync 
behavior.

This will be 'fun' for postgres packagers and users -- data reliability 
behavior differs based on what kernel it is compiled against.  Luckily, the 
xlogs only need open_datasync semantics.

 
 Compare file sync methods using two writes:
        2 open_datasync 8k writes           53.902/second
        2 open_sync 8k writes               53.721/second
        8k write, 8k write, fdatasync      109.731/second
        8k write, 8k write, fsync           20.918/second
 
 Compare open_sync with different sizes:
        open_sync 16k write                109.552/second
2 open_sync 8k writes

Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-17 Thread Greg Smith

Scott Carey wrote:
Did you recompile your test on the RHEL6 system? 


On both systems I showed, I checked out a fresh copy of the PostgreSQL 
9.1 HEAD from the git repo, and compiled that on the server, to make 
sure I was pulling in the appropriate kernel headers.  I wasn't aware of 
exactly how the kernel sync stuff was refactored though, thanks for the 
concise update on that.  I can do similar tests on a RHEL5 system, but 
not on the same hardware.  Can only make my laptop boot so many 
operating systems at a time usefully.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-17 Thread Jon Nelson
On Wed, Nov 17, 2010 at 3:24 PM, Greg Smith g...@2ndquadrant.com wrote:
 Scott Carey wrote:

 Did you recompile your test on the RHEL6 system?

 On both systems I showed, I checked out a fresh copy of the PostgreSQL 9.1
 HEAD from the git repo, and compiled that on the server, to make sure I was
 pulling in the appropriate kernel headers.  I wasn't aware of exactly how
 the kernel sync stuff was refactored though, thanks for the concise update
 on that.  I can do similar tests on a RHEL5 system, but not on the same
 hardware.  Can only make my laptop boot so many operating systems at a time
 usefully.

One thing to note is that where things sit on a disk can make a /huge/
difference: depending on whether Ubuntu is /here/ and RHEL is /there/,
you can see a factor of 2 or more.  The outside tracks
of most modern SATA disks can do around 120MB/s.  The inside tracks
aren't even half as fast.

-- 
Jon



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-17 Thread Greg Smith

Jon Nelson wrote:

One thing to note is that where things sit on a disk can make a /huge/
difference: depending on whether Ubuntu is /here/ and RHEL is /there/,
you can see a factor of 2 or more.  The outside tracks
of most modern SATA disks can do around 120MB/s.  The inside tracks
aren't even half as fast.
  


You're talking about changes in sequential read and write speed due to 
Zone Bit Recording (ZBR) AKA Zone Constant Angular Velocity (ZCAV).  
What I was measuring was commit latency time on small writes.  That 
doesn't change as you move around the disk, since it's tied to the raw 
rotation speed of the drive rather than density of storage in any zone.  
If I get to something that's impacted by sequential transfers rather 
than rotation time, I'll be sure to use the same section of disk for 
that.  It wasn't really necessary to get these initial gross numbers 
anyway.  What I was looking for is the roughly 10:1 speedup seen on this 
hardware when the write cache is used, which would be easy to see even 
if there were ZBR differences involved.
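The distinction can be made concrete with a little arithmetic: sequential throughput scales with how many bytes pass under the head per rotation, while the wait for a flushed commit depends only on the rotation rate. In the C sketch below the bytes-per-track figures used in the example are hypothetical, picked only so the outer zone lands near the ~120MB/s mentioned; they are not measurements of any real drive.

```c
#include <assert.h>

/* Sequential transfer rate (bytes/second) in one zone of a ZBR drive:
 * roughly one track's worth of data passes under the head per rotation. */
double seq_bytes_per_sec(double bytes_per_track, double rpm)
{
    assert(bytes_per_track > 0.0 && rpm > 0.0);
    return bytes_per_track * rpm / 60.0;
}

/* Time for one full rotation in milliseconds.  A flushed commit waits on
 * this, and it is the same in every zone because it depends only on the
 * spindle speed. */
double rotation_ms(double rpm)
{
    assert(rpm > 0.0);
    return 60000.0 / rpm;
}
```

With made-up tracks of 1,000,000 bytes (outer) and 450,000 bytes (inner) at 7200 RPM, the zones transfer 120 and 54 million bytes/second respectively, yet the rotation time is about 8.3 ms in both.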




Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-17 Thread Scott Carey

On Nov 17, 2010, at 1:24 PM, Greg Smith wrote:

 Scott Carey wrote:
 Did you recompile your test on the RHEL6 system? 
 
 On both systems I showed, I checked out a fresh copy of the PostgreSQL 
 9.1 HEAD from the git repo, and compiled that on the server, to make 
 sure I was pulling in the appropriate kernel headers.  I wasn't aware of 
 exactly how the kernel sync stuff was refactored though, thanks for the 
 concise update on that.  

Thanks!

So this could be another bug in Linux.  Not entirely surprising.
Since fsync/fdatasync relative performance isn't similar to 
open_sync/open_datasync relative performance on this test there is probably a 
bug that either hurts fsync, or one that is preventing open_sync from dealing 
with metadata properly.   Luckily for the xlog, both of those can be avoided -- 
the real choice is fdatasync vs open_datasync.  And both work in newer kernels 
or break in certain older ones.


 I can do similar tests on a RHEL5 system, but 
 not on the same hardware.  Can only make my laptop boot so many 
 operating systems at a time usefully.

Yeah, I understand.  I might throw this at a RHEL5 system if I get a chance but 
I need one without a RAID card that is not in use.  Hopefully it doesn't turn 
out that fdatasync is write-cache safe but open_sync/open_datasync isn't on 
that platform.  It could impact the choice of a default value.

 


Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-16 Thread Greg Smith
Time for a deeper look at what's going on here...I installed RHEL6 Beta 
2 yesterday, on the presumption that since the release version just came 
out this week it was likely the same version Marti tested against.  
Also, it was the one I already had a DVD to install for.  This was on a 
laptop with a 7200 RPM hard drive, already containing an Ubuntu 
installation for comparison's sake.


Initial testing was done with the PostgreSQL test_fsync utility, just to 
get a rough idea of the situations in which the drives involved were 
likely flushing data to disk correctly, and those in which that was 
impossible.  7200 RPM = 120 rotations/second, which puts an upper 
limit of 120 true fsync executions per second.  The test_fsync released 
with PostgreSQL 9.0 now reports its value in units you can compare 
directly against that (earlier versions reported seconds/commit, not 
commits/second).
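The same ceiling for other common spindle speeds, as a tiny C sketch. It is a simplification that ignores seek time and assumes the drive's write cache is either off or honored; the function names are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Rough ceiling on durable commits per second for one spinning disk:
 * each flushed commit waits for the platter to come around, so at most
 * one commit per rotation (seek time ignored). */
double max_true_fsyncs_per_sec(double rpm)
{
    assert(rpm > 0.0);
    return rpm / 60.0;
}

/* Print the ceiling for a few common spindle speeds. */
void print_fsync_ceilings(void)
{
    const double speeds[] = {5400.0, 7200.0, 10000.0, 15000.0};

    for (size_t i = 0; i < sizeof speeds / sizeof speeds[0]; i++)
        printf("%6.0f RPM: at most %6.1f commits/second\n",
               speeds[i], max_true_fsyncs_per_sec(speeds[i]));
}
```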


First I built test_fsync from inside of an existing PostgreSQL 9.1 HEAD 
checkout:


$ cd [PostgreSQL source code tree]
$ cd src/tools/fsync/
$ make

I started by looking at the Ubuntu system running ext3, which 
represents the status quo we've been seeing the past few years.  
Initially the drive write cache was turned on:


Linux meddle 2.6.28-19-generic #61-Ubuntu SMP Wed May 26 23:35:15 UTC 
2010 i686 GNU/Linux

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=9.04
DISTRIB_CODENAME=jaunty
DISTRIB_DESCRIPTION=Ubuntu 9.04

/dev/sda5 on / type ext3 (rw,relatime,errors=remount-ro)

$ ./test_fsync
Loops = 1

Simple write:
        8k write                         88476.784/second

Compare file sync methods using one write:
        (unavailable: open_datasync)
        open_sync 8k write                1192.135/second
        8k write, fdatasync               1222.158/second
        8k write, fsync                   1097.980/second

Compare file sync methods using two writes:
        (unavailable: open_datasync)
        2 open_sync 8k writes              527.361/second
        8k write, 8k write, fdatasync     1105.204/second
        8k write, 8k write, fsync         1084.050/second

Compare open_sync with different sizes:
        open_sync 16k write                966.047/second
        2 open_sync 8k writes              529.565/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        8k write, fsync, close            1064.177/second
        8k write, close, fsync            1042.337/second

Two notable things here.  One, there is no open_datasync defined in this 
older kernel.  Two, all methods of commit give equally inflated commit 
rates, far faster than the drive is capable of.  This proves this setup 
isn't flushing the drive's write cache after commit.
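That reasoning amounts to a sanity check that can be written down. In this C sketch the function name is hypothetical and the 10% slack for measurement noise is an arbitrary assumption; the idea is simply that a sync rate above one commit per rotation cannot come from a drive that is really flushing its cache.

```c
#include <assert.h>
#include <stdbool.h>

/* True if `measured_per_sec` exceeds the one-commit-per-rotation ceiling
 * by more than `slack` (0.10 = 10%), suggesting that some cache is
 * acknowledging writes before they reach the platter. */
bool write_cache_suspected(double measured_per_sec, double rpm, double slack)
{
    assert(rpm > 0.0 && slack >= 0.0);
    double ceiling = rpm / 60.0;   /* commits/second at one per rotation */
    return measured_per_sec > ceiling * (1.0 + slack);
}
```

Fed the fdatasync figures from this thread, 1222.158/second with the cache on is flagged, while 108.106/second with it off is not.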


You can get safe behavior out of the old kernel by disabling its write 
cache:


$ sudo /sbin/hdparm -W0 /dev/sda

/dev/sda:
setting drive write-caching to 0 (off)
write-caching =  0 (off)

Loops = 1

Simple write:
        8k write                         89023.413/second

Compare file sync methods using one write:
        (unavailable: open_datasync)
        open_sync 8k write                 106.968/second
        8k write, fdatasync                108.106/second
        8k write, fsync                    104.238/second

Compare file sync methods using two writes:
        (unavailable: open_datasync)
        2 open_sync 8k writes               51.637/second
        8k write, 8k write, fdatasync      109.256/second
        8k write, 8k write, fsync          103.952/second

Compare open_sync with different sizes:
        open_sync 16k write                109.562/second
        2 open_sync 8k writes               52.752/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        8k write, fsync, close             107.179/second
        8k write, close, fsync             106.923/second

And now results are as expected:  just under 120/second.

Onto RHEL6.  Setup for this initial test was:

$ uname -a
Linux meddle 2.6.32-44.1.el6.x86_64 #1 SMP Wed Jul 14 18:51:29 EDT 2010 
x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.0 Beta (Santiago)
$ mount
/dev/sda7 on / type ext4 (rw)

And I started with the write cache off to see a straight comparison 
against the above:


$ sudo hdparm -W0 /dev/sda

/dev/sda:
setting drive write-caching to 0 (off)
write-caching =  0 (off)
$ ./test_fsync
Loops = 1

Simple write:
        8k write                        104194.886/second

Compare file sync methods using one write:
        open_datasync 8k write              97.828/second
        open_sync 8k write                 109.158/second
        8k write, fdatasync                109.838/second
        8k write, fsync                     20.872/second

Compare file sync methods using two writes:
        2 open_datasync 8k writes           53.902/second
        2 open_sync 8k writes               53.721/second
        8k write, 8k write, fdatasync      109.731/second
        8k write, 8k write, fsync           20.918/second

Compare open_sync with different sizes:
   

Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-16 Thread Robert Haas
On Tue, Nov 16, 2010 at 3:39 PM, Greg Smith g...@2ndquadrant.com wrote:
 I want to next go through and replicate some of the actual database level
 tests before giving a full opinion on whether this data proves it's worth
 changing the wal_sync_method detection.  So far I'm torn between whether
 that's the right approach, or if we should just increase the default value
 for wal_buffers to something more reasonable.

How about both?

open_datasync seems problematic for a number of reasons - you get an
immediate write-through whether you need it or not, including, as you
point out, the case where you want to write several blocks at once
and then force them all out together.

And 64kB for a ring buffer just seems awfully small.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-16 Thread Josh Berkus
On 11/16/10 12:39 PM, Greg Smith wrote:
 I want to next go through and replicate some of the actual database
 level tests before giving a full opinion on whether this data proves
 it's worth changing the wal_sync_method detection.  So far I'm torn
 between whether that's the right approach, or if we should just increase
 the default value for wal_buffers to something more reasonable.

We'd love to, but wal_buffers uses sysV shmem.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-16 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 11/16/10 12:39 PM, Greg Smith wrote:
 I want to next go through and replicate some of the actual database
 level tests before giving a full opinion on whether this data proves
 it's worth changing the wal_sync_method detection.  So far I'm torn
 between whether that's the right approach, or if we should just increase
 the default value for wal_buffers to something more reasonable.

 We'd love to, but wal_buffers uses sysV shmem.

Well, we're not going to increase the default to gigabytes, but we could
very probably increase it by a factor of 10 or so without anyone
squawking.  It's been awhile since I heard of anyone trying to run PG in
4MB shmmax.  How much would a change of that size help?

regards, tom lane



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-16 Thread Marti Raudsepp
On Wed, Nov 17, 2010 at 01:31, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, we're not going to increase the default to gigabytes, but we could
 very probably increase it by a factor of 10 or so without anyone
 squawking.  It's been awhile since I heard of anyone trying to run PG in
 4MB shmmax.  How much would a change of that size help?

In my testing, when running a large bulk insert query with fdatasync
on ext4, changing wal_buffers has very little effect:
http://ompldr.org/vNjNiNQ/wal_sync_method1.png

(More details at
http://archives.postgresql.org/pgsql-performance/2010-11/msg00094.php
)

It would take some more testing to say this conclusively, but looking
at the raw data, there only seems to be an effect when moving from 8
to 16MB. Could be different on other file systems though.

Regards,
Marti



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-16 Thread Mladen Gogala

Josh Berkus wrote:

On 11/16/10 12:39 PM, Greg Smith wrote:
  

I want to next go through and replicate some of the actual database
level tests before giving a full opinion on whether this data proves
it's worth changing the wal_sync_method detection.  So far I'm torn
between whether that's the right approach, or if we should just increase
the default value for wal_buffers to something more reasonable.



We'd love to, but wal_buffers uses sysV shmem.

  

Speaking of the SYSV SHMEM, is it possible to use huge pages?




Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-16 Thread Andres Freund
On Wednesday 17 November 2010 00:31:34 Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
  On 11/16/10 12:39 PM, Greg Smith wrote:
  I want to next go through and replicate some of the actual database
  level tests before giving a full opinion on whether this data proves
  it's worth changing the wal_sync_method detection.  So far I'm torn
  between whether that's the right approach, or if we should just increase
  the default value for wal_buffers to something more reasonable.
  
  We'd love to, but wal_buffers uses sysV shmem.
 
 Well, we're not going to increase the default to gigabytes
Especially not as I don't think it will have any effect after wal_segment_size 
as that will force a write-out anyway. Or am I misremembering the 
implementation?

Andres



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-16 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On Wednesday 17 November 2010 00:31:34 Tom Lane wrote:
 Well, we're not going to increase the default to gigabytes

 Especially not as I don't think it will have any effect after 
 wal_segment_size 
 as that will force a write-out anyway. Or am I misremembering the 
 implementation?

Well, there's a forced fsync after writing the last page of an xlog
file, but I don't believe that proves that more than 16MB of xlog
buffers is useless.  Other processes could still be busy filling the
buffers.

regards, tom lane



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-16 Thread Andres Freund
On Wednesday 17 November 2010 01:51:28 Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  On Wednesday 17 November 2010 00:31:34 Tom Lane wrote:
  Well, we're not going to increase the default to gigabytes
  
  Especially not as I don't think it will have any effect after
  wal_segment_size as that will force a write-out anyway. Or am I
  misremembering the implementation?
 
 Well, there's a forced fsync after writing the last page of an xlog
 file, but I don't believe that proves that more than 16MB of xlog
 buffers is useless.  Other processes could still be busy filling the
 buffers.
Maybe I am missing something, but I think the relevant AdvanceXLInsertBuffer() 
is currently called with WALInsertLock held?

Andres




Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-16 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On Wednesday 17 November 2010 01:51:28 Tom Lane wrote:
 Well, there's a forced fsync after writing the last page of an xlog
 file, but I don't believe that proves that more than 16MB of xlog
 buffers is useless.  Other processes could still be busy filling the
 buffers.

 Maybe I am missing something, but I think the relevant 
 AdvanceXLInsertBuffer() 
 is currently called with WALInsertLock held?

The fsync is associated with the write, which is not done with insert
lock held.  We're not quite that dumb.

regards, tom lane



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-16 Thread Andres Freund
On Wednesday 17 November 2010 02:04:28 Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  On Wednesday 17 November 2010 01:51:28 Tom Lane wrote:
  Well, there's a forced fsync after writing the last page of an xlog
  file, but I don't believe that proves that more than 16MB of xlog
  buffers is useless.  Other processes could still be busy filling the
  buffers.
  
  Maybe I am missing something, but I think the relevant
  AdvanceXLInsertBuffer() is currently called with WALInsertLock held?
 
 The fsync is associated with the write, which is not done with insert
 lock held.  We're not quite that dumb.
Ah, I see. The XLogWrite in AdvanceXLInsertBuffer is only happening if the head 
of the buffer gets to the tail - which is more likely if the wal buffers are 
small...
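A toy model of the effect described, in C. It is not PostgreSQL's xlog.c: it only counts how often an inserter finds the ring full of unwritten pages and must write out the oldest page itself, assuming (hypothetically) that some other process writes out every pending page after a fixed number of insertions. Sizes are in 8kB pages, so 8 pages corresponds to the 64kB default.

```c
#include <assert.h>

/* Toy model (hypothetical, far simpler than PostgreSQL's xlog.c).
 * `capacity`        ring size in WAL pages (8 pages = 64 kB with 8 kB pages)
 * `pages_to_insert` total pages of WAL generated
 * `flush_interval`  after this many insertions, some other process writes
 *                   out every page not yet written (assumed behavior)
 * Returns how many times the inserter found the ring full of unwritten
 * pages, i.e. the head had caught up with the tail, and had to write the
 * oldest page itself before it could continue. */
long forced_writes(int capacity, long pages_to_insert, int flush_interval)
{
    assert(capacity > 0 && flush_interval > 0);
    long unwritten = 0;
    long forced = 0;

    for (long i = 1; i <= pages_to_insert; i++) {
        if (unwritten == capacity) {
            unwritten--;          /* synchronous write of the oldest page */
            forced++;
        }
        unwritten++;              /* the new page now occupies a slot */
        if (i % flush_interval == 0)
            unwritten = 0;        /* periodic flush empties the ring */
    }
    return forced;
}
```

With a flush every 32 pages, an 8-page ring forces 24 synchronous writes per cycle, while a 64-page (512kB) ring never fills. Real WAL traffic is far burstier, so this only pictures the mechanism.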

Andres




Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-16 Thread Josh Berkus

 Well, we're not going to increase the default to gigabytes, but we could
 very probably increase it by a factor of 10 or so without anyone
 squawking.  It's been awhile since I heard of anyone trying to run PG in
 4MB shmmax.  How much would a change of that size help?

Last I checked, though, this comes out of the allocation available to
shared_buffers.  And there definitely are several OSes (several linuxes,
OSX) still limited to 32MB by default.



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-16 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Well, we're not going to increase the default to gigabytes, but we could
 very probably increase it by a factor of 10 or so without anyone
 squawking.  It's been awhile since I heard of anyone trying to run PG in
 4MB shmmax.  How much would a change of that size help?

 Last I checked, though, this comes out of the allocation available to
 shared_buffers.  And there definitely are several OSes (several linuxes,
 OSX) still limited to 32MB by default.

Sure, but the current default is a measly 64kB.  We could increase that
10x for a relatively small percentage hit in the size of shared_buffers,
if you suppose that there's 32MB available.  The current default is set
to still work if you've got only a couple of MB in SHMMAX.

What we'd want is for initdb to adjust the setting as part of its
probing to see what SHMMAX is set to.
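As a sketch of what such probing might produce, here is a hypothetical C helper; it is not initdb code. The 1/64 ratio and the 16MB ceiling (one default-sized WAL segment) are illustrative assumptions. With a 32MB SHMMAX the ratio yields 512kB, an 8x increase over the current default.

```c
#include <assert.h>

#define KB 1024LL
#define MB (1024LL * KB)
#define WAL_PAGE (8 * KB)   /* assumption: default 8 kB WAL page */

/* Hypothetical policy, not actual initdb code: give wal_buffers 1/64 of the
 * probed SHMMAX, rounded down to whole WAL pages, but never less than the
 * current 64 kB default and never more than one 16 MB WAL segment. */
long long suggest_wal_buffers(long long shmmax_bytes)
{
    assert(shmmax_bytes > 0);
    long long bytes = shmmax_bytes / 64;

    bytes -= bytes % WAL_PAGE;      /* whole pages only */
    if (bytes < 64 * KB)
        bytes = 64 * KB;            /* still works with a tiny SHMMAX */
    if (bytes > 16 * MB)
        bytes = 16 * MB;            /* arbitrary cap for this sketch */
    return bytes;
}
```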

regards, tom lane



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-16 Thread Robert Haas
On Tue, Nov 16, 2010 at 6:25 PM, Josh Berkus j...@agliodbs.com wrote:
 On 11/16/10 12:39 PM, Greg Smith wrote:
 I want to next go through and replicate some of the actual database
 level tests before giving a full opinion on whether this data proves
 it's worth changing the wal_sync_method detection.  So far I'm torn
 between whether that's the right approach, or if we should just increase
 the default value for wal_buffers to something more reasonable.

 We'd love to, but wal_buffers uses sysV shmem.

places tongue firmly in cheek

Gee, too bad there's not some other shared-memory implementation we could use...



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-14 Thread Marti Raudsepp
On Sat, Nov 13, 2010 at 20:01, Tom Lane t...@sss.pgh.pa.us wrote:
 What's your basis for asserting he's uninterested?  Please have a little
 patience.

My apologies, I was under the impression that he hadn't answered your
request, but he did in the -hackers thread.

Regards,
Marti



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-13 Thread Marti Raudsepp
On Mon, Nov 8, 2010 at 20:40, Tom Lane t...@sss.pgh.pa.us wrote:
 The latter choice is the one that requires testing to prove that it is the 
 proper and preferred default from the performance and data reliability POV.

 And, in fact, the game plan is to do that testing and see which default
 we want.  I think it's premature to argue further about this until we
 have some test results.

Who will be doing that testing? You said you're relying on Greg Smith
to manage the testing, but he's obviously uninterested, so it seems
unlikely that this will go anywhere.

I posted my results with the simple INSERT test, but nobody cared. I
could do some pgbench runs, but I have no idea what parameters would
give useful results.

Meanwhile, PostgreSQL performance is regressing and there's still no
evidence that open_datasync is any safer.

Regards,
Marti



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-13 Thread Tom Lane
Marti Raudsepp ma...@juffo.org writes:
 On Mon, Nov 8, 2010 at 20:40, Tom Lane t...@sss.pgh.pa.us wrote:
 And, in fact, the game plan is to do that testing and see which default
 we want.  I think it's premature to argue further about this until we
 have some test results.

 Who will be doing that testing? You said you're relying on Greg Smith
 to manage the testing, but he's obviously uninterested, so it seems
 unlikely that this will go anywhere.

What's your basis for asserting he's uninterested?  Please have a little
patience.

regards, tom lane



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-08 Thread Marti Raudsepp
On Mon, Nov 8, 2010 at 02:05, Greg Smith g...@2ndquadrant.com wrote:
 Where's your benchmarks proving it then?  If you're right about this, and
 I'm not saying you aren't, it should be obvious in simple benchmarks by
 stepping through various sizes for wal_buffers and seeing the
 throughput/latency situation improve.

Since benchmarking is the easy part, I did that. I plotted the time
taken by inserting 2 million rows to a table with a single integer
column and no indexes (total 70MB). Entire script is attached. If you
don't agree with something in this benchmark, please suggest
improvements.

Chart: http://ompldr.org/vNjNiNQ/wal_sync_method1.png
Spreadsheet: http://ompldr.org/vNjNiNg/wal_sync_method1.ods (the 2nd
worksheet has exact measurements)

This is a different machine from the original post, but similar
configuration. One 1TB 7200RPM Seagate Barracuda, no disk controller
cache, 4G RAM, Phenom X4, Linux 2.6.36, PostgreSQL 9.0.1, Arch Linux.

This time I created a separate 20GB ext4 partition specially for
PostgreSQL, with all default settings (shared_buffers=32MB). The
partition is near the end of the disk, so hdparm gives a sequential
read throughput of ~72 MB/s. I'm getting frequent checkpoint warnings,
should I try larger checkpoint_segments too?

The partition is re-created and 'initdb' is re-run for each test, to
prevent file system allocation from affecting results. I did two runs
of all benchmarks. The points on the graph show a sum of INSERT time +
COMMIT time in seconds.

One surprising thing on the graph is a plateau, where open_datasync
performs almost equally with wal_buffers=128kB and 256kB.

Another noteworthy difference (not visible on the graph) is that with
open_datasync -- but not fdatasync -- and wal_buffers=128M, INSERT
time keeps shrinking, but COMMIT takes longer. The total INSERT+COMMIT
time remains the same, however.



I have a few expendable hard drives here so I can test reliability by
pulling the SATA cable as well. Is this kind of testing useful? What
workloads do you suggest?

Regards,
Marti


pgtest.sh
Description: Bourne shell script



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-08 Thread Scott Carey

On Nov 7, 2010, at 6:35 PM, Marti Raudsepp wrote:

 On Mon, Nov 8, 2010 at 01:35, Greg Smith g...@2ndquadrant.com wrote:
 Yes; it's supposed to, and that logic works fine on some other platforms.
 
 No, the logic was broken to begin with. Linux technically supported
 O_DSYNC all along. PostgreSQL used fdatasync as the default. Now,
 because Linux added proper O_SYNC support, PostgreSQL suddenly prefers
 O_DSYNC over fdatasync?
 
 Until you've
 quantified which of the cases do that--which is required for reliable
 operation of PostgreSQL--and which don't, you don't have any data that can
 be used to draw a conclusion from.  If some setups are faster because they
 write less reliably, that doesn't automatically make them the better choice.
 
 I don't see your point. If fdatasync worked on Linux, AS THE DEFAULT,
 all the time until recently, then how does it all of a sudden need
 proof NOW?
 
 If anything, the new open_datasync should be scrutinized because it
 WASN'T the default before and it hasn't gotten as much testing on
 Linux.
 

I agree.  In my opinion, the burden of proof lies with those contending that 
the default value should _change_ from fdatasync to O_DSYNC on linux.  If the 
default changes, all power-fail testing and other reliability tests done prior 
on a hardware configuration may become invalid without users even knowing.

Unfortunately, a code change in postgres is required to _prevent_ the default 
from changing when compiled and run against the latest kernels.

Summary:
Until recently, there was code with a code comment in the Linux kernel that 
said "For now, when the user asks for O_SYNC, we'll actually give O_DSYNC."
Linux has had O_DSYNC forever and ever, but not O_SYNC.
If O_DSYNC is preferred over fdatasync for Postgres xlog (as the code 
indicates), it should have been the preferred method for years on Linux as well.  If 
fdatasync has been the preferred method on Linux, and the O_SYNC = O_DSYNC test 
was for that, then the purpose behind the test has broken.

No matter how you slice it, the default on Linux is implicitly changing and the 
choice is to either:
 * Return the default to fdatasync
 * Let it implicitly change to O_DSYNC

The latter choice is the one that requires testing to prove that it is the 
proper and preferred default from the performance and data reliability POV.  
The former is the status quo -- but requires a code change.


Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-08 Thread Tom Lane
Scott Carey sc...@richrelevance.com writes:
 No matter how you slice it, the default on Linux is implicitly changing and 
 the choice is to either:
  * Return the default to fdatasync
  * Let it implicitly change to O_DSYNC

 The latter choice is the one that requires testing to prove that it is the 
 proper and preferred default from the performance and data reliability POV.

And, in fact, the game plan is to do that testing and see which default
we want.  I think it's premature to argue further about this until we
have some test results.

regards, tom lane



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-08 Thread Greg Smith

Scott Carey wrote:

In my opinion, the burden of proof lies with those contending that the default 
value should _change_ from fdatasync to O_DSYNC on linux.  If the default 
changes, all power-fail testing and other reliability tests done prior on a 
hardware configuration may become invalid without users even knowing.
  


This seems to be ignoring the fact that unless you either added a 
non-volatile cache or specifically turned off all write caching on your 
drives, the results of all power-fail testing done on earlier versions 
of Linux was that it failed.  The default configuration of PostgreSQL on 
Linux has been that any user who has a simple SATA drive gets unsafe 
writes, unless they go out of their way to prevent them.


Whatever newer kernels do by default cannot be worse.  The open question 
is whether it's still broken, in which case we might as well favor the 
known buggy behavior rather than the new one, or whether everything has 
improved enough to no longer be unsafe with the new defaults.


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support    www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-08 Thread Andres Freund
Hi,

On Monday 08 November 2010 23:12:57 Greg Smith wrote:
 This seems to be ignoring the fact that unless you either added a 
 non-volatile cache or specifically turned off all write caching on your 
 drives, the results of all power-fail testing done on earlier versions 
 of Linux was that it failed.  The default configuration of PostgreSQL on 
 Linux has been that any user who has a simple SATA drive gets unsafe 
 writes, unless they go out of their way to prevent them.
Which is about *no* argument in favor of any of the options, right?

 Whatever newer kernels do by default cannot be worse.  The open question 
 is whether it's still broken, in which case we might as well favor the 
 known buggy behavior rather than the new one, or whether everything has 
 improved enough to no longer be unsafe with the new defaults.
Either I majorly misunderstand you, or ... I don't know.

There simply *is* no new implementation relevant for this discussion. Full 
Stop. What changed is that O_DSYNC is defined differently from O_SYNC these 
days, and O_SYNC actually does what it should. This causes pg to move 
open_datasync to the top of the preference list, doing what the option with 
the lowest preference did up to now.

That does not *at all* change the earlier fdatasync() or fsync() 
implementations/tests. It simply makes open_datasync the default, doing what 
open_sync did earlier.
Note that open_sync was the method of *least* preference until now, 
and that fdatasync() was thus the default until now. Which it is not anymore.

I don't argue *at all* that we have to test the change moving fdatasync before 
open_datasync on the *other* operating systems. What I completely don't get is 
all that talk about data consistency on linux. It's simply irrelevant in 
that context.

Andres






Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-07 Thread Greg Smith

Marti Raudsepp wrote:

I will grant you that the details were wrong, but I stand by the conclusion.
I can state for a fact that PostgreSQL's default wal_sync_method
varies depending on the fcntl.h header.
  


Yes; it's supposed to, and that logic works fine on some other 
platforms.  The question is exactly what the new Linux O_DSYNC behavior 
is doing, in regards to whether it flushes drive caches out or not.  
Until you've quantified which of the cases do that--which is required 
for reliable operation of PostgreSQL--and which don't, you don't have 
any data that can be used to draw a conclusion from.  If some setups are 
faster because they write less reliably, that doesn't automatically make 
them the better choice.




Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-07 Thread Andres Freund
On Monday 08 November 2010 00:35:29 Greg Smith wrote:
 Marti Raudsepp wrote:
  I will grant you that the details were wrong, but I stand by the
  conclusion. I can state for a fact that PostgreSQL's default
  wal_sync_method varies depending on the fcntl.h header.
 
 Yes; it's supposed to, and that logic works fine on some other
 platforms.  The question is exactly what the new Linux O_DSYNC behavior
 is doing, in regards to whether it flushes drive caches out or not.
 Until you've quantified which of the cases do that--which is required
 for reliable operation of PostgreSQL--and which don't, you don't have
 any data that can be used to draw a conclusion from.  If some setups are
 faster because they write less reliably, that doesn't automatically make
 them the better choice.
I think that's FUD. Sorry.

Can you explain to me why fsync() may/should/could be *any* less reliable than 
O_DSYNC? On *any* platform. Or fdatasync() in the special way it's used with 
pg, namely completely preallocated files.

I think the reasons why O_DSYNC is, especially, but not only, in combination 
with a small wal_buffers setting, slow in most circumstances are pretty clear.

Making a setting which is only supported on a small range of systems highest 
in the preferences list is even more doubtful than the already strange choice 
of making O_DSYNC the default given the way it works (i.e. no reordering, 
synchronous writes in the bgwriter, synchronous writes on wal_buffers pressure 
etc).

Greetings,

Andres



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-07 Thread Greg Smith

Andres Freund wrote:

I think that's FUD. Sorry.
  


Yes, there's plenty of uncertainty and doubt here, but not from me.  The 
test reports given so far have been so riddled with errors I don't trust 
any of them. 

As a counterexample showing my expectations here, the "Testing 
Sandforce SSD" tests done by Yeb Havinga:
http://archives.postgresql.org/message-id/4c4a9452.9070...@gmail.com 
followed the right method for confirming both write integrity and 
performance including pull the plug situations.  Those I trusted.  What 
Marti had posted, and what Phoronix investigated, just aren't that thorough.


Can you explain to me why fsync() may/should/could be *any* less reliable than 
O_DSYNC? On *any* platform. Or fdatasync() in the special way it's used with 
pg, namely completely preallocated files.
  


If the Linux kernel has done extra work so that O_DSYNC writes are 
forced to disk including a cache flush, but that isn't done for just 
fdatasync() calls, there could be a difference here.  The database still 
wouldn't work right in that case, because checkpoint writes are still 
going to be using fdatasync.


I'm not sure what the actual behavior is supposed to be, but ultimately 
it doesn't matter.  The history of the Linux kernel developers in this 
area has been so completely full of bugs and incomplete implementations 
that I am working from the assumption that we know nothing about what 
actually works and what doesn't without doing careful real-world testing.


I think the reasons why O_DSYNC is, especially, but not only, in combination 
with a small wal_buffers setting, slow in most circumstances are pretty clear.
  


Where's your benchmarks proving it then?  If you're right about this, 
and I'm not saying you aren't, it should be obvious in simple benchmarks 
by stepping through various sizes for wal_buffers and seeing the 
throughput/latency situation improve.  But since I haven't seen that 
done, this one is still in the uncertainty and doubt bucket too.  You're 
assuming one of the observed problems corresponds to this theorized 
cause.  But you can't prove a performance change on theory.  You have to 
isolate it and then you'll know.  So long as there are multiple 
uncertainties going on here, I don't have any conclusion yet, just a 
list of things to investigate that's far longer than the list of what's 
been looked at so far.




Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-07 Thread Marti Raudsepp
On Mon, Nov 8, 2010 at 01:35, Greg Smith g...@2ndquadrant.com wrote:
 Yes; it's supposed to, and that logic works fine on some other platforms.

No, the logic was broken to begin with. Linux technically supported
O_DSYNC all along. PostgreSQL used fdatasync as the default. Now,
because Linux added proper O_SYNC support, PostgreSQL suddenly prefers
O_DSYNC over fdatasync?

 Until you've
 quantified which of the cases do that--which is required for reliable
 operation of PostgreSQL--and which don't, you don't have any data that can
 be used to draw a conclusion from.  If some setups are faster because they
 write less reliably, that doesn't automatically make them the better choice.

I don't see your point. If fdatasync worked on Linux, AS THE DEFAULT,
all the time until recently, then how does it all of a sudden need
proof NOW?

If anything, the new open_datasync should be scrutinized because it
WASN'T the default before and it hasn't gotten as much testing on
Linux.

Regards,
Marti



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-05 Thread Greg Smith

Andres Freund wrote:

On Sunday 31 October 2010 20:59:31 Greg Smith wrote:
  
Writes are only sync'd out when you do a commit, or the database does a 
checkpoint.

Hm?  WAL is written out to disk after the space provided by wal_buffers (def 
8) * XLOG_BLCKSZ (def 8192) is used. The default is 64kb which you reach 
pretty quickly - especially after a checkpoint.


Fair enough; I'm so used to bumping wal_buffers up to 16MB nowadays that 
I forget sometimes that people actually run with the default where this 
becomes an important consideration.



Not having a real O_DSYNC on linux until recently makes it even more dubious 
to have it as a default...
  


If Linux is now defining O_DSYNC, and it's buggy, that's going to break 
more software than just PostgreSQL.  It wasn't defined before because it 
didn't work.  If the kernel developers have made changes to claim it's 
working now, but it doesn't really, I would think they'd consider any 
reports of actual bugs here as important to fix.  There's only so much 
the database can do in the face of incorrect information reported by the 
operating system.


Anyway, I haven't actually seen reports that prove there's any problem 
here, I was just pointing out that we haven't seen any positive reports 
about database stress testing on these kernel versions yet either.  The 
changes here are theoretically the right ones, and defaulting to safe 
writes that flush out write caches is a long-term good thing.




Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-05 Thread Marti Raudsepp
On Fri, Nov 5, 2010 at 23:10, Greg Smith g...@2ndquadrant.com wrote:
 Not having a real O_DSYNC on linux until recently makes it even more
 dubious to have it as a default...


 If Linux is now defining O_DSYNC

Well, Linux always defined both O_SYNC and O_DSYNC, but they used to
have the same value. The defaults changed due to an unfortunate
heuristic in PostgreSQL, which boils down to:

#if O_DSYNC != O_SYNC
#define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN_DSYNC
#else
#define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
#endif

(see src/include/access/xlogdefs.h for details)

In fact, I was wrong in my earlier post. Linux always offered O_DSYNC
behavior. What's new is POSIX-compliant O_SYNC, and the fact that
these flags are now distinguished.

Here's the change in Linux:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6b2f3d1f769be5779b479c37800229d9a4809fc3

Regards,
Marti



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-05 Thread Andres Freund
On Friday 05 November 2010 22:10:36 Greg Smith wrote:
 Andres Freund wrote:
  On Sunday 31 October 2010 20:59:31 Greg Smith wrote:
  Writes are only sync'd out when you do a commit, or the database does a
  checkpoint.
  
  Hm?  WAL is written out to disk after the space provided by
  wal_buffers(def 8) * XLOG_BLCKSZ (def 8192) is used. The default is 64kb
  which you reach pretty quickly - especially after a checkpoint.
 Fair enough; I'm so used to bumping wal_buffers up to 16MB nowadays that
 I forget sometimes that people actually run with the default where this
 becomes an important consideration.
If you have relatively frequent checkpoints (quite sensible in some 
environments given the burstiness/response time problems you can get) even a 
16MB wal_buffers can cause significantly more synchronous writes with O_DSYNC 
because of the amount of wal traffic due to full_page_writes. For one, the 
background wal writer won't keep up, and for another, all its writes will be 
synchronous...

It's simply a pointless setting.

  Not having a real O_DSYNC on linux until recently makes it even more
  dubious to have it as a default...
 If Linux is now defining O_DSYNC, and it's buggy, that's going to break
 more software than just PostgreSQL.  It wasn't defined before because it
 didn't work.  If the kernel developers have made changes to claim it's
 working now, but it doesn't really, I would think they'd consider any
 reports of actual bugs here as important to fix.  There's only so much
 the database can do in the face of incorrect information reported by the
 operating system.
I don't see it being buggy so far. It's just doing what it should, which is 
simply a terrible thing for our implementation. Generally. Independent of 
linux.

 Anyway, I haven't actually seen reports that prove there's any problem
 here, I was just pointing out that we haven't seen any positive reports
 about database stress testing on these kernel versions yet either.  The
 changes here are theoretically the right ones, and defaulting to safe
 writes that flush out write caches is a long-term good thing.
I have seen several databases which run under 2.6.33 with moderate to high load 
for some time now, and two under 2.6.35.
Loads of problems, but none kernel related so far ;-)

Andres



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-05 Thread Greg Smith

Marti Raudsepp wrote:

In fact, I was wrong in my earlier post. Linux always offered O_DSYNC
behavior. What's new is POSIX-compliant O_SYNC, and the fact that
these flags are now distinguished.
  


While I appreciate that you're trying to help here, I'm unconvinced 
you've correctly diagnosed a couple of components of what's going on 
here yet.  Please refrain from making changes to popular 
documents like the tuning guide on the wiki based on speculation about 
what's happening.  There's definitely at least one mistake in what you 
wrote there, and I just reverted the whole set of changes you made 
accordingly until this is sorted out better.




Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-05 Thread Josh Berkus

 Fair enough; I'm so used to bumping wal_buffers up to 16MB nowadays that
 I forget sometimes that people actually run with the default where this
 becomes an important consideration.

Do you have any testing in favor of 16mb vs. lower/higher?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-05 Thread Marti Raudsepp
On Sat, Nov 6, 2010 at 00:06, Greg Smith g...@2ndquadrant.com wrote:
  Please refrain from making changes to popular documents like the
 tuning guide on the wiki based on speculation about what's happening.

I will grant you that the details were wrong, but I stand by the conclusion.

I can state for a fact that PostgreSQL's default wal_sync_method
varies depending on the fcntl.h header.
I have two PostgreSQL 9.0.1 builds, one with older
/usr/include/bits/fcntl.h and one with newer.

When I run "show wal_sync_method;" on one instance, I get "fdatasync".
On the other one I get "open_datasync".

So let's get down to code.

Older fcntl.h has:
#define O_SYNC        010000
# define O_DSYNC      O_SYNC  /* Synchronize data.  */

Newer has:
#define O_SYNC        04010000
# define O_DSYNC      010000  /* Synchronize data.  */

So you can see that in the older header, O_DSYNC and O_SYNC are equal.

src/include/access/xlogdefs.h does:

#if defined(O_SYNC)
#define OPEN_SYNC_FLAG  O_SYNC
...
#if defined(OPEN_SYNC_FLAG)
/* O_DSYNC is distinct? */
#if O_DSYNC != OPEN_SYNC_FLAG
#define OPEN_DATASYNC_FLAG  O_DSYNC

^ it's comparing O_DSYNC != O_SYNC

#if defined(OPEN_DATASYNC_FLAG)
#define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN_DSYNC
#elif defined(HAVE_FDATASYNC)
#define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC

^ depending on whether O_DSYNC and O_SYNC were equal, the default
wal_sync_method will change.

Regards,
Marti



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-05 Thread Pierre C



Fair enough; I'm so used to bumping wal_buffers up to 16MB nowadays that
I forget sometimes that people actually run with the default where this
becomes an important consideration.


Do you have any testing in favor of 16mb vs. lower/higher?


From some tests I had done some time ago, using separate spindles (RAID1)  
for xlog, no battery, on 8.4, with stuff that generates lots of xlog  
(INSERT INTO SELECT) :


When using a small wal_buffers, there was a problem when switching from  
one xlog file to the next. Basically a fsync was issued, but most of the  
previous log segment was still not written. So, postgres was waiting for  
the fsync to finish. Of course, the default 64 kB of wal_buffers is  
quickly filled up, and all writes wait for the end of this fsync. This  
caused hiccups in the xlog traffic, and xlog throughput wasn't nearly as
high as the disks would allow. Sticking a stethoscope on the xlog
hard drives revealed a lot more random accesses than I would have liked
(this is a much simpler solution than tracing the IOs, lol)


I set wal_writer_delay to a very low setting (I don't remember which,
perhaps 1 ms) so the walwriter was in effect constantly flushing the wal  
buffers to disk. I also used fdatasync instead of fsync. Then I set  
wal_buffers to a rather high value, like 32-64 MB. Throughput and  
performance were a lot better, and the xlog drives made a much more  
linear-access noise.


What happened is that, since wal_buffers was larger than what the drives  
can write in 1-2 rotations, it could absorb wal traffic during the time  
postgres waits for fdatasync / wal segment change, so the inserts would  
not have to wait. And lowering the walwriter delay made it write something  
on each disk rotation, so that when a COMMIT or segment switch came, most  
of the time, the WAL was already synced and there was no wait.


Just my 2 c ;)



[PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-10-31 Thread Marti Raudsepp
Hi pgsql-performance,

I was doing mass insertions on my desktop machine and getting at most
1 MB/s disk writes (apart from occasional bursts of 16MB). Inserting 1
million rows with a single integer (data+index 56 MB total) took over
2 MINUTES! The only tuning I had done was shared_buffers=256MB. So I
got around to tuning the WAL writer and found that wal_buffers=16MB
works MUCH better. wal_sync_method=fdatasync also got similar results.

First of all, I'm running PostgreSQL 9.0.1 on Arch Linux:
* Linux kernel 2.6.36 (also tested with 2.6.35)
* Quad-core Phenom II
* a single Seagate 7200RPM SATA drive (write caching on)
* ext4 FS over LVM, with noatime, data=writeback

I am creating a table like: create table foo(id integer primary key);
Then measuring performance with the query: insert into foo (id) select
generate_series(1, 1000000);

130438,011 ms  wal_buffers=64kB, wal_sync_method=open_datasync  (all defaults)
29306,847 ms   wal_buffers=1MB, wal_sync_method=open_datasync
4641,113 ms    wal_buffers=16MB, wal_sync_method=open_datasync
^ from 130s to 4.6 seconds by just changing wal_buffers.

5528,534 ms    wal_buffers=64kB, wal_sync_method=fdatasync
4856,712 ms    wal_buffers=16MB, wal_sync_method=fdatasync
^ fdatasync works well even with small wal_buffers

2911,265 ms    wal_buffers=16MB, fsync=off
^ Not bad, getting 60% of ideal throughput

These defaults are not just hurting bulk-insert performance, but also
everyone who uses synchronous_commit=off

Unless fdatasync is unsafe, I'd very much want to see it as the
default for 9.1 on Linux (I don't know about other platforms).  I
can't see any reasons why each write would need to be sync-ed if I
don't commit that often. Increasing wal_buffers probably has the same
effect wrt data safety.

Also, the tuning guide on wiki is understating the importance of these
tunables. Reading it I got the impression that some people change
wal_sync_method but it's dangerous, and it even literally claims about
wal_buffers that "1MB is enough for some large systems".

But the truth is that if you want any write throughput AT ALL on a
regular Linux desktop, you absolutely have to change one of these. If
the defaults were better, it would be enough to set
synchronous_commit=off to get all that your hardware has to offer.

I was reading mailing list archives and didn't find anything against
it either. Can anyone clarify the safety of wal_sync_method=fdatasync?
Are there any reasons why it shouldn't be the default?

Regards,
Marti



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-10-31 Thread Greg Smith

Marti Raudsepp wrote:

 Unless fdatasync is unsafe, I'd very much want to see it as the
 default for 9.1 on Linux (I don't know about other platforms).  I
 can't see any reasons why each write would need to be sync-ed if I
 don't commit that often. Increasing wal_buffers probably has the same
 effect wrt data safety.
  


Writes are only sync'd out when you do a commit, or when the database
does a checkpoint.


This issue is a performance difference introduced by a recent change to 
Linux.  open_datasync support was just added to Linux itself very 
recently.  It may be more safe than fdatasync on your platform.  As new 
code it may have bugs so that it doesn't really work at all under heavy 
load.  No one has really run those tests yet.  See 
http://wiki.postgresql.org/wiki/Reliable_Writes for some background, and 
welcome to the fun of being an early adopter.  The warnings in the 
tuning guide are there for a reason--you're in untested territory now.  
I haven't finished validating whether I consider 2.6.32 safe for 
production use or not yet, and 2.6.36 is a solid year away from being on 
my list for even considering it as a production database kernel.  You 
should proceed presuming that all writes are unreliable until proven 
otherwise.


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support    www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-10-31 Thread Andres Freund
On Sunday 31 October 2010 20:59:31 Greg Smith wrote:
 Writes only are sync'd out when you do a commit, or the database does a 
 checkpoint.
Hm?  WAL is written out to disk once the space provided by wal_buffers
(default 8) * XLOG_BLCKSZ (default 8192) is used up. The default is 64kB,
which you reach pretty quickly, especially after a checkpoint. With
O_DSYNC (or O_SYNC), that write happens synchronously during a normal
XLogInsert if it hits a page boundary.
*Additionally* it gets written out at commit if synchronous_commit is on.
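
The arithmetic above can be spelled out: 8 buffers of 8192 bytes is 65536 bytes, i.e. 64kB. The Python sketch below also counts how many times a buffer of a given size fills up during one large transaction. It is a deliberately simplified model, not PostgreSQL's actual XLogInsert logic: it assumes the whole buffer is written out each time it fills (synchronously, under open_datasync) and that the remainder is flushed at commit; the 100MB transaction size is hypothetical.

```python
XLOG_BLCKSZ = 8192  # default WAL page size, as stated above
KB = 1024
MB = 1024 * KB


def wal_buffer_bytes(wal_buffers: int, blcksz: int = XLOG_BLCKSZ) -> int:
    """Space available for WAL in shared memory: wal_buffers pages of blcksz bytes."""
    return wal_buffers * blcksz


def forced_writes(wal_bytes: int, buffer_bytes: int) -> int:
    """Toy model: how many times the buffer fills before commit.

    Assumes the full buffer is written out each time it fills up, and that
    whatever is left over is flushed at commit (not counted here).
    """
    if wal_bytes <= 0:
        return 0
    return (wal_bytes - 1) // buffer_bytes


if __name__ == "__main__":
    default = wal_buffer_bytes(8)                      # 8 * 8192 bytes = 64kB
    large = wal_buffer_bytes(16 * MB // XLOG_BLCKSZ)   # 2048 pages = 16MB
    txn = 100 * MB                                     # hypothetical bulk insert
    for label, size in (("64kB", default), ("16MB", large)):
        print(f"wal_buffers={label}: {forced_writes(txn, size)} mid-transaction writes")
```

Under this model the default buffer forces about 1600 mid-transaction writes for that transaction versus 6 with 16MB, which points in the same direction as Marti's wal_buffers measurements.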

Not having a real O_DSYNC on Linux until recently makes it even more dubious
to have it as the default...


Andres



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-10-31 Thread Marti Raudsepp
On Sun, Oct 31, 2010 at 21:59, Greg Smith g...@2ndquadrant.com wrote:
 open_datasync support was just added to Linux itself very recently.

Oh, I didn't realize it was a new feature. Indeed, O_DSYNC support was
added in 2.6.33.
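
Since the question turns on which kernel a server runs, a quick check against the 2.6.33 threshold mentioned here can be sketched in Python. This is only a heuristic under stated assumptions: it parses the numeric prefix of `platform.release()`, and distribution kernels may backport or patch features, so the version number alone does not prove how O_DSYNC behaves.

```python
import platform
import re

# Threshold taken from the message above: O_DSYNC support added in 2.6.33.
O_DSYNC_MIN_KERNEL = (2, 6, 33)


def parse_kernel_release(release: str) -> tuple:
    """Extract (major, minor, patch) from strings like '2.6.36-ARCH' or '3.0'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing patch level (e.g. '3.0') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def has_real_o_dsync(release: str) -> bool:
    """True if the version number is at least the threshold above."""
    return parse_kernel_release(release) >= O_DSYNC_MIN_KERNEL


if __name__ == "__main__":
    if platform.system() == "Linux":
        release = platform.release()
        status = "O_DSYNC" if has_real_o_dsync(release) else "no real O_DSYNC"
        print(release, "->", status)
    else:
        print("Not Linux; this check does not apply.")
```

The 2.6.35 and 2.6.36 kernels used in the benchmark both pass this check, which fits the thread's explanation that open_datasync started doing real O_DSYNC writes there.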

It seems like bad behavior on PostgreSQL's part to default to new,
untested features.

I have updated the tuning wiki page with my understanding of the problem:
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#wal_sync_method_wal_buffers

Regards,
Marti



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-10-31 Thread Mark Kirkwood

On 01/11/10 08:59, Greg Smith wrote:

 This issue is a performance difference introduced by a recent change 
 to Linux.  open_datasync support was just added to Linux itself very 
 recently.  It may be more safe than fdatasync on your platform.

Greg,

Your reply is possibly a bit confusingly worded - Marti was suggesting 
that fdatasync be the default - so he wouldn't be a new adopter, since 
this call has been implemented in the kernel for ages. I guess you were 
wanting to stress that *open_datasync* is the new kid, so watch out to 
see if he bites...


Cheers

Mark