Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Tue, Nov 16, 2010 at 8:22 PM, Tom Lane t...@sss.pgh.pa.us wrote: Josh Berkus j...@agliodbs.com writes: Well, we're not going to increase the default to gigabytes, but we could very probably increase it by a factor of 10 or so without anyone squawking. It's been awhile since I heard of anyone trying to run PG in 4MB shmmax. How much would a change of that size help? Last I checked, though, this comes out of the allocation available to shared_buffers. And there definitely are several OSes (several linuxes, OSX) still limited to 32MB by default. Sure, but the current default is a measly 64kB. We could increase that 10x for a relatively small percentage hit in the size of shared_buffers, if you suppose that there's 32MB available. The current default is set to still work if you've got only a couple of MB in SHMMAX. What we'd want is for initdb to adjust the setting as part of its probing to see what SHMMAX is set to. regards, tom lane In all the performance tests that I have done, generally I get a good bang for the buck with wal_buffers set to 512kB in low memory cases and mostly I set it to 1MB which is probably enough for most of the cases even with high memory. That 1/2 MB won't make a drastic change to shared_buffers anyway (except in edge cases) but will relieve quite a bit of the stress on the WAL buffers. Regards, Jignesh -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Nov 16, 2010, at 4:05 PM, Mladen Gogala wrote: Josh Berkus wrote: On 11/16/10 12:39 PM, Greg Smith wrote: I want to next go through and replicate some of the actual database level tests before giving a full opinion on whether this data proves it's worth changing the wal_sync_method detection. So far I'm torn between whether that's the right approach, or if we should just increase the default value for wal_buffers to something more reasonable. We'd love to, but wal_buffers uses sysV shmem. Speaking of the SYSV SHMEM, is it possible to use huge pages? RHEL 6 and friends have transparent hugepage support. I'm not sure if they yet transparently do it for SYSV SHMEM, but they do for most everything else. Sequential traversal of a process heap is several times faster with hugepages. Unfortunately, postgres doesn't organize its blocks in its shared_mem to be sequential for a relation. So it might not matter much. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
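On the transparent hugepage point, one quick way to see which THP mode a running kernel uses is to read its sysfs control file, where the active mode is shown in brackets (for example "[always] madvise never"). The Python sketch below assumes the mainline path /sys/kernel/mm/transparent_hugepage/enabled, with RHEL 6's redhat_transparent_hugepage directory as a fallback. It only reports the setting; it says nothing about whether SysV shared memory segments are actually backed by huge pages.

```python
import os
from typing import Optional

# Assumed sysfs locations: mainline kernels use the first path, while RHEL 6
# kernels exposed the feature under a "redhat_" prefixed directory.
THP_PATHS = (
    "/sys/kernel/mm/transparent_hugepage/enabled",
    "/sys/kernel/mm/redhat_transparent_hugepage/enabled",
)


def parse_thp_mode(text: str) -> Optional[str]:
    """Return the bracketed (active) mode from text like '[always] madvise never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def current_thp_mode() -> Optional[str]:
    """Read the first THP control file that exists; None if the kernel exposes none."""
    for path in THP_PATHS:
        if os.path.exists(path):
            with open(path) as f:
                return parse_thp_mode(f.read())
    return None


if __name__ == "__main__":
    print("transparent hugepage mode:", current_thp_mode())
```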
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Nov 16, 2010, at 12:39 PM, Greg Smith wrote: $ ./test_fsync Loops = 1 Simple write: 8k write 88476.784/second Compare file sync methods using one write: (unavailable: open_datasync) open_sync 8k write 1192.135/second 8k write, fdatasync 1222.158/second 8k write, fsync 1097.980/second Compare file sync methods using two writes: (unavailable: open_datasync) 2 open_sync 8k writes 527.361/second 8k write, 8k write, fdatasync 1105.204/second 8k write, 8k write, fsync 1084.050/second Compare open_sync with different sizes: open_sync 16k write 966.047/second 2 open_sync 8k writes 529.565/second Test if fsync on non-write file descriptor is honored: (If the times are similar, fsync() can sync data written on a different descriptor.) 8k write, fsync, close 1064.177/second 8k write, close, fsync 1042.337/second Two notable things here. One, there is no open_datasync defined in this older kernel. Two, all methods of commit give equally inflated commit rates, far faster than the drive is capable of. This proves this setup isn't flushing the drive's write cache after commit. Nit: there is no open_sync, only open_dsync. Prior to recent kernels, only (semantically) open_dsync exists, labeled as open_sync. New kernels move that code to open_datasync and have a NEW open_sync that supposedly flushes metadata properly.
You can get safe behavior out of the old kernel by disabling its write cache: $ sudo /sbin/hdparm -W0 /dev/sda /dev/sda: setting drive write-caching to 0 (off) write-caching = 0 (off) Loops = 1 Simple write: 8k write 89023.413/second Compare file sync methods using one write: (unavailable: open_datasync) open_sync 8k write 106.968/second 8k write, fdatasync 108.106/second 8k write, fsync 104.238/second Compare file sync methods using two writes: (unavailable: open_datasync) 2 open_sync 8k writes 51.637/second 8k write, 8k write, fdatasync 109.256/second 8k write, 8k write, fsync 103.952/second Compare open_sync with different sizes: open_sync 16k write 109.562/second 2 open_sync 8k writes 52.752/second Test if fsync on non-write file descriptor is honored: (If the times are similar, fsync() can sync data written on a different descriptor.) 8k write, fsync, close 107.179/second 8k write, close, fsync 106.923/second And now results are as expected: just under 120/second. Onto RHEL6. Setup for this initial test was: $ uname -a Linux meddle 2.6.32-44.1.el6.x86_64 #1 SMP Wed Jul 14 18:51:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux $ cat /etc/redhat-release Red Hat Enterprise Linux Server release 6.0 Beta (Santiago) $ mount /dev/sda7 on / type ext4 (rw) And I started with the write cache off to see a straight comparison against the above: $ sudo hdparm -W0 /dev/sda /dev/sda: setting drive write-caching to 0 (off) write-caching = 0 (off) $ ./test_fsync Loops = 1 Simple write: 8k write 104194.886/second Compare file sync methods using one write: open_datasync 8k write 97.828/second open_sync 8k write 109.158/second 8k write, fdatasync 109.838/second 8k write, fsync 20.872/second fsync is working now! Flushing metadata properly reduces performance. However, shouldn't open_sync slow down vs open_datasync too and be similar to fsync? Did you recompile your test on the RHEL6 system?
Code compiled on newer kernels will see O_DSYNC and O_SYNC as two separate sentinel values, let's call them 1 and 2 respectively. Code compiled against earlier kernels will see both O_DSYNC and O_SYNC as the same value, 1. So code compiled against older kernels, asking for O_SYNC on a newer kernel will actually get O_DSYNC behavior! This was intended. I can't find the link to the mail, but it was Linus' idea to make old code that expected the 'faster but incorrect' behavior to retain it on newer kernels. Only a recompile with newer header files will trigger the new behavior and expose the 'correct' open_sync behavior. This will be 'fun' for postgres packagers and users -- data reliability behavior differs based on what kernel it is compiled against. Luckily, the xlogs only need open_datasync semantics. Compare file sync methods using two writes: 2 open_datasync 8k writes 53.902/second 2 open_sync 8k writes 53.721/second 8k write, 8k write, fdatasync 109.731/second 8k write, 8k write, fsync 20.918/second Compare open_sync with different sizes: open_sync 16k write 109.552/second 2 open_sync 8k writes
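The header change described above can be modelled with the flag values from Linux's asm-generic/fcntl.h, which x86 uses (other architectures such as alpha, parisc, and sparc use different numbers, so treat these as an assumption for anything else). In the 2.6.33 refactoring the old O_SYNC bit was kept and renamed O_DSYNC, and the new O_SYNC became that bit plus an extra __O_SYNC bit. A binary built against the old headers therefore passes only the old bit, which a newer kernel reads as O_DSYNC. A small Python sketch:

```python
# Flag values as in Linux asm-generic/fcntl.h (x86); other architectures differ.
OLD_O_SYNC = 0o10000          # pre-2.6.33 headers: O_SYNC (glibc aliased O_DSYNC to it)
NEW_O_DSYNC = 0o10000         # 2.6.33+ headers: the old O_SYNC bit is now O_DSYNC
NEW___O_SYNC = 0o4000000      # 2.6.33+ headers: extra bit meaning "metadata too"
NEW_O_SYNC = NEW___O_SYNC | NEW_O_DSYNC


def new_kernel_semantics(open_flags: int) -> str:
    """How a 2.6.33+ kernel interprets the sync bits passed to open()."""
    if open_flags & NEW___O_SYNC:
        return "O_SYNC (data + metadata)"
    if open_flags & NEW_O_DSYNC:
        return "O_DSYNC (data only)"
    return "no synchronous write flag"


print("binary built on old headers asks for O_SYNC, gets:", new_kernel_semantics(OLD_O_SYNC))
print("binary built on new headers asks for O_SYNC, gets:", new_kernel_semantics(NEW_O_SYNC))
```

This is why the same PostgreSQL source can behave differently depending on which kernel headers it was compiled against.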
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Scott Carey wrote: Did you recompile your test on the RHEL6 system? On both systems I showed, I checked out a fresh copy of the PostgreSQL 9.1 HEAD from the git repo, and compiled that on the server, to make sure I was pulling in the appropriate kernel headers. I wasn't aware of exactly how the kernel sync stuff was refactored though, thanks for the concise update on that. I can do similar tests on a RHEL5 system, but not on the same hardware. Can only make my laptop boot so many operating systems at a time usefully. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Wed, Nov 17, 2010 at 3:24 PM, Greg Smith g...@2ndquadrant.com wrote: Scott Carey wrote: Did you recompile your test on the RHEL6 system? On both systems I showed, I checked out a fresh copy of the PostgreSQL 9.1 HEAD from the git repo, and compiled that on the server, to make sure I was pulling in the appropriate kernel headers. I wasn't aware of exactly how the kernel sync stuff was refactored though, thanks for the concise update on that. I can do similar tests on a RHEL5 system, but not on the same hardware. Can only make my laptop boot so many operating systems at a time usefully. One thing to note is that where on a disk things sit can make a /huge/ difference - depending on if Ubuntu is /here/ and RHEL is /there/ and so on can make a factor of 2 or more difference. The outside tracks of most modern SATA disks can do around 120MB/s. The inside tracks aren't even half of that. -- Jon
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Jon Nelson wrote: One thing to note is that where on a disk things sit can make a /huge/ difference - depending on if Ubuntu is /here/ and RHEL is /there/ and so on can make a factor of 2 or more difference. The outside tracks of most modern SATA disks can do around 120MB/s. The inside tracks aren't even half of that. You're talking about changes in sequential read and write speed due to Zone Bit Recording (ZBR) AKA Zone Constant Angular Velocity (ZCAV). What I was measuring was commit latency time on small writes. That doesn't change as you move around the disk, since it's tied to the raw rotation speed of the drive rather than density of storage in any zone. If I get to something that's impacted by sequential transfers rather than rotation time, I'll be sure to use the same section of disk for that. It wasn't really necessary to get these initial gross numbers anyway. What I was looking for is the roughly 10:1 speedup seen on this hardware when the write cache is used, which could easily be seen even were there ZBR differences involved. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
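The distinction between zone effects and rotation can be made concrete with a little arithmetic. Sequential throughput is roughly one track's worth of sectors per rotation, so it grows with sectors per track (larger on outer zones), while a "one synchronous commit per rotation" ceiling depends only on RPM. The sectors-per-track figures below are hypothetical, chosen only to show the ratio, and seek and track-switch overheads are ignored.

```python
SECTOR_BYTES = 512


def rotations_per_second(rpm: float) -> float:
    return rpm / 60.0


def sequential_mb_per_s(rpm: float, sectors_per_track: int) -> float:
    """Sequential rate: one full track passes under the head per rotation."""
    return rotations_per_second(rpm) * sectors_per_track * SECTOR_BYTES / 1e6


def max_commits_per_s(rpm: float) -> float:
    """At most one true flush per rotation; note there is no zone term here."""
    return rotations_per_second(rpm)


# Hypothetical zone geometries (not measured from any real drive).
OUTER_SECTORS, INNER_SECTORS = 1900, 900

for zone, spt in (("outer", OUTER_SECTORS), ("inner", INNER_SECTORS)):
    print(f"{zone}: {sequential_mb_per_s(7200, spt):.0f} MB/s sequential, "
          f"{max_commits_per_s(7200):.0f} commits/s ceiling")
```

Moving data between zones changes the first number by about 2x in this example, while the commit ceiling stays at 120/s.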
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Nov 17, 2010, at 1:24 PM, Greg Smith wrote: Scott Carey wrote: Did you recompile your test on the RHEL6 system? On both systems I showed, I checked out a fresh copy of the PostgreSQL 9.1 HEAD from the git repo, and compiled that on the server, to make sure I was pulling in the appropriate kernel headers. I wasn't aware of exactly how the kernel sync stuff was refactored though, thanks for the concise update on that. Thanks! So this could be another bug in Linux. Not entirely surprising. Since fsync/fdatasync relative performance isn't similar to open_sync/open_datasync relative performance on this test, there is probably a bug that either hurts fsync, or one that is preventing open_sync from dealing with metadata properly. Luckily for the xlog, both of those can be avoided -- the real choice is fdatasync vs open_datasync. And both work in newer kernels or break in certain older ones. I can do similar tests on a RHEL5 system, but not on the same hardware. Can only make my laptop boot so many operating systems at a time usefully. Yeah, I understand. I might throw this at a RHEL5 system if I get a chance but I need one without a RAID card that is not in use. Hopefully it doesn't turn out that fdatasync is write-cache safe but open_sync/open_datasync isn't on that platform. It could impact the choice of a default value. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Time for a deeper look at what's going on here... I installed RHEL6 Beta 2 yesterday, on the presumption that since the release version just came out this week it was likely the same version Marti tested against. Also, it was the one I already had a DVD to install for. This was on a laptop with a 7200 RPM hard drive, already containing an Ubuntu installation for comparison's sake. Initial testing was done with the PostgreSQL test_fsync utility, just to get a gross idea of the situations in which the drives involved were likely flushing data to disk correctly, and those in which that was impossible. 7200 RPM = 120 rotations/second, which puts an upper limit of 120 true fsync executions per second. The test_fsync released with PostgreSQL 9.0 now reports its results in units you can compare directly against that (earlier versions reported seconds/commit, not commits/second). First I built test_fsync from inside of an existing PostgreSQL 9.1 HEAD checkout: $ cd [PostgreSQL source code tree] $ cd src/tools/fsync/ $ make And I started with looking at the Ubuntu system running ext3, which represents the status quo we've been seeing the past few years.
Initially the drive write cache was turned on: Linux meddle 2.6.28-19-generic #61-Ubuntu SMP Wed May 26 23:35:15 UTC 2010 i686 GNU/Linux $ cat /etc/lsb-release DISTRIB_ID=Ubuntu DISTRIB_RELEASE=9.04 DISTRIB_CODENAME=jaunty DISTRIB_DESCRIPTION=Ubuntu 9.04 /dev/sda5 on / type ext3 (rw,relatime,errors=remount-ro) $ ./test_fsync Loops = 1 Simple write: 8k write 88476.784/second Compare file sync methods using one write: (unavailable: open_datasync) open_sync 8k write 1192.135/second 8k write, fdatasync 1222.158/second 8k write, fsync 1097.980/second Compare file sync methods using two writes: (unavailable: open_datasync) 2 open_sync 8k writes 527.361/second 8k write, 8k write, fdatasync 1105.204/second 8k write, 8k write, fsync 1084.050/second Compare open_sync with different sizes: open_sync 16k write 966.047/second 2 open_sync 8k writes 529.565/second Test if fsync on non-write file descriptor is honored: (If the times are similar, fsync() can sync data written on a different descriptor.) 8k write, fsync, close 1064.177/second 8k write, close, fsync 1042.337/second Two notable things here. One, there is no open_datasync defined in this older kernel. Two, all methods of commit give equally inflated commit rates, far faster than the drive is capable of. This proves this setup isn't flushing the drive's write cache after commit.
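The "faster than the drive is capable of" test can be written down directly: a 7200 RPM drive completes 7200/60 = 120 rotations per second, so a sustained rate far above that means flushes are being absorbed by a cache. The Python sketch below applies that rule to the one-write rates from the cache-on run above and the hdparm -W0 run that follows; the 1.5x slack factor is an arbitrary margin, not something test_fsync uses.

```python
def rotation_ceiling(rpm: float) -> float:
    """At most one true flush per platter rotation: 7200 RPM -> 120/second."""
    return rpm / 60.0


def looks_cached(rate_per_s: float, rpm: float, slack: float = 1.5) -> bool:
    """Flag rates well above the rotation ceiling; 'slack' is an arbitrary margin."""
    return rate_per_s > rotation_ceiling(rpm) * slack


# One-write rates from the Ubuntu/ext3 test_fsync runs in this thread (7200 RPM drive).
cache_on = {"open_sync": 1192.135, "fdatasync": 1222.158, "fsync": 1097.980}
cache_off = {"open_sync": 106.968, "fdatasync": 108.106, "fsync": 104.238}

for label, results in (("write cache on", cache_on), ("write cache off", cache_off)):
    for method, rate in results.items():
        verdict = "implausible: cache absorbing flushes" if looks_cached(rate, 7200) else "plausible"
        print(f"{label:16} {method:10} {rate:9.3f}/s  {verdict}")
```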
You can get safe behavior out of the old kernel by disabling its write cache: $ sudo /sbin/hdparm -W0 /dev/sda /dev/sda: setting drive write-caching to 0 (off) write-caching = 0 (off) Loops = 1 Simple write: 8k write 89023.413/second Compare file sync methods using one write: (unavailable: open_datasync) open_sync 8k write 106.968/second 8k write, fdatasync 108.106/second 8k write, fsync 104.238/second Compare file sync methods using two writes: (unavailable: open_datasync) 2 open_sync 8k writes 51.637/second 8k write, 8k write, fdatasync 109.256/second 8k write, 8k write, fsync 103.952/second Compare open_sync with different sizes: open_sync 16k write 109.562/second 2 open_sync 8k writes 52.752/second Test if fsync on non-write file descriptor is honored: (If the times are similar, fsync() can sync data written on a different descriptor.) 8k write, fsync, close 107.179/second 8k write, close, fsync 106.923/second And now results are as expected: just under 120/second. Onto RHEL6. Setup for this initial test was: $ uname -a Linux meddle 2.6.32-44.1.el6.x86_64 #1 SMP Wed Jul 14 18:51:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux $ cat /etc/redhat-release Red Hat Enterprise Linux Server release 6.0 Beta (Santiago) $ mount /dev/sda7 on / type ext4 (rw) And I started with the write cache off to see a straight comparison against the above: $ sudo hdparm -W0 /dev/sda /dev/sda: setting drive write-caching to 0 (off) write-caching = 0 (off) $ ./test_fsync Loops = 1 Simple write: 8k write 104194.886/second Compare file sync methods using one write: open_datasync 8k write 97.828/second open_sync 8k write 109.158/second 8k write, fdatasync 109.838/second 8k write, fsync 20.872/second Compare file sync methods using two writes: 2 open_datasync 8k writes 53.902/second 2 open_sync 8k writes 53.721/second 8k write, 8k write, fdatasync 109.731/second 8k write, 8k write, fsync 20.918/second Compare open_sync with different sizes:
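For readers who want to reproduce the shape of the "8k write, fdatasync" and "8k write, fsync" lines without building test_fsync, here is a minimal Python sketch of the same idea: rewrite one 8 kB block at offset 0, call the sync primitive after each write, and report operations per second. It is not test_fsync's C code, and interpreter overhead makes the numbers rougher. It assumes Linux for os.fdatasync and falls back to os.fsync where that call is missing (for example on macOS).

```python
import os
import tempfile
import time

BLOCK = b"\0" * 8192  # one 8 kB page, like test_fsync's "8k write"


def rate_write_then_sync(path, sync, loops=100):
    """Rewrite one 8 kB block at offset 0, call sync(fd) after each write,
    and return completed write+sync pairs per second."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(loops):
            os.pwrite(fd, BLOCK, 0)
            sync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return loops / max(elapsed, 1e-9)


if __name__ == "__main__":
    # os.fdatasync is not provided on macOS; fall back to fsync there.
    fdatasync = getattr(os, "fdatasync", os.fsync)
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sync_test")
        print(f"8k write, fdatasync {rate_write_then_sync(path, fdatasync):10.3f}/second")
        print(f"8k write, fsync     {rate_write_then_sync(path, os.fsync):10.3f}/second")
```

Because the block is rewritten in place, the file size never changes, so fdatasync can skip the inode update while fsync must also flush metadata such as the modification time. On ext4 that likely means an extra journal commit per call, which is one plausible reading of the roughly 20/second vs 110/second gap in the RHEL6 run.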
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Tue, Nov 16, 2010 at 3:39 PM, Greg Smith g...@2ndquadrant.com wrote: I want to next go through and replicate some of the actual database level tests before giving a full opinion on whether this data proves it's worth changing the wal_sync_method detection. So far I'm torn between whether that's the right approach, or if we should just increase the default value for wal_buffers to something more reasonable. How about both? open_datasync seems problematic for a number of reasons - you get an immediate write-through whether you need it or not, including, as you point out, the case where you want to write several blocks at once and then force them all out together. And 64kB for a ring buffer just seems awfully small. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
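The write-through cost is easy to see by writing the same group of blocks both ways. With O_DSYNC set at open time, each write() returns only once its data is stable, so N blocks mean N waits on the device. With plain writes followed by one fdatasync, the N blocks are flushed together in a single wait. That matches the two-write test_fsync lines quoted in this thread (about 54/second for "2 open_datasync 8k writes" vs about 110/second for "8k write, 8k write, fdatasync" on RHEL6). A Python sketch, assuming a Unix system with O_DSYNC; the returned count is the number of synchronous waits implied by each pattern, not something the kernel reports.

```python
import os
import tempfile

BLOCK = b"x" * 8192


def write_group_open_dsync(path: str, nblocks: int) -> int:
    """open_datasync pattern: every write() is itself synchronous."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
    waits = 0
    try:
        for i in range(nblocks):
            os.pwrite(fd, BLOCK, i * len(BLOCK))
            waits += 1  # with O_DSYNC this write waited for stable storage
    finally:
        os.close(fd)
    return waits


def write_group_fdatasync(path: str, nblocks: int) -> int:
    """fdatasync pattern: issue all writes, then force them out together."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for i in range(nblocks):
            os.pwrite(fd, BLOCK, i * len(BLOCK))
        getattr(os, "fdatasync", os.fsync)(fd)  # os.fdatasync is missing on macOS
    finally:
        os.close(fd)
    return 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for fn in (write_group_open_dsync, write_group_fdatasync):
            path = os.path.join(d, fn.__name__)
            print(fn.__name__, "synchronous waits:", fn(path, 4),
                  "bytes:", os.path.getsize(path))
```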
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On 11/16/10 12:39 PM, Greg Smith wrote: I want to next go through and replicate some of the actual database level tests before giving a full opinion on whether this data proves it's worth changing the wal_sync_method detection. So far I'm torn between whether that's the right approach, or if we should just increase the default value for wal_buffers to something more reasonable. We'd love to, but wal_buffers uses sysV shmem. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Josh Berkus j...@agliodbs.com writes: On 11/16/10 12:39 PM, Greg Smith wrote: I want to next go through and replicate some of the actual database level tests before giving a full opinion on whether this data proves it's worth changing the wal_sync_method detection. So far I'm torn between whether that's the right approach, or if we should just increase the default value for wal_buffers to something more reasonable. We'd love to, but wal_buffers uses sysV shmem. Well, we're not going to increase the default to gigabytes, but we could very probably increase it by a factor of 10 or so without anyone squawking. It's been awhile since I heard of anyone trying to run PG in 4MB shmmax. How much would a change of that size help? regards, tom lane
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Wed, Nov 17, 2010 at 01:31, Tom Lane t...@sss.pgh.pa.us wrote: Well, we're not going to increase the default to gigabytes, but we could very probably increase it by a factor of 10 or so without anyone squawking. It's been awhile since I heard of anyone trying to run PG in 4MB shmmax. How much would a change of that size help? In my testing, when running a large bulk insert query with fdatasync on ext4, changing wal_buffers has very little effect: http://ompldr.org/vNjNiNQ/wal_sync_method1.png (More details at http://archives.postgresql.org/pgsql-performance/2010-11/msg00094.php ) It would take some more testing to say this conclusively, but looking at the raw data, there only seems to be an effect when moving from 8 to 16MB. Could be different on other file systems though. Regards, Marti
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Josh Berkus wrote: On 11/16/10 12:39 PM, Greg Smith wrote: I want to next go through and replicate some of the actual database level tests before giving a full opinion on whether this data proves it's worth changing the wal_sync_method detection. So far I'm torn between whether that's the right approach, or if we should just increase the default value for wal_buffers to something more reasonable. We'd love to, but wal_buffers uses sysV shmem. Speaking of the SYSV SHMEM, is it possible to use huge pages? -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Wednesday 17 November 2010 00:31:34 Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: On 11/16/10 12:39 PM, Greg Smith wrote: I want to next go through and replicate some of the actual database level tests before giving a full opinion on whether this data proves it's worth changing the wal_sync_method detection. So far I'm torn between whether that's the right approach, or if we should just increase the default value for wal_buffers to something more reasonable. We'd love to, but wal_buffers uses sysV shmem. Well, we're not going to increase the default to gigabytes Especially not as I don't think it will have any effect after wal_segment_size as that will force a write-out anyway. Or am I misremembering the implementation? Andres
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Andres Freund and...@anarazel.de writes: On Wednesday 17 November 2010 00:31:34 Tom Lane wrote: Well, we're not going to increase the default to gigabytes Especially not as I don't think it will have any effect after wal_segment_size as that will force a write-out anyway. Or am I misremembering the implementation? Well, there's a forced fsync after writing the last page of an xlog file, but I don't believe that proves that more than 16MB of xlog buffers is useless. Other processes could still be busy filling the buffers. regards, tom lane
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Wednesday 17 November 2010 01:51:28 Tom Lane wrote: Andres Freund and...@anarazel.de writes: On Wednesday 17 November 2010 00:31:34 Tom Lane wrote: Well, we're not going to increase the default to gigabytes Especially not as I don't think it will have any effect after wal_segment_size as that will force a write-out anyway. Or am I misremembering the implementation? Well, there's a forced fsync after writing the last page of an xlog file, but I don't believe that proves that more than 16MB of xlog buffers is useless. Other processes could still be busy filling the buffers. Maybe I am missing something, but I think the relevant AdvanceXLInsertBuffer() is currently called with WALInsertLock held? Andres
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Andres Freund and...@anarazel.de writes: On Wednesday 17 November 2010 01:51:28 Tom Lane wrote: Well, there's a forced fsync after writing the last page of an xlog file, but I don't believe that proves that more than 16MB of xlog buffers is useless. Other processes could still be busy filling the buffers. Maybe I am missing something, but I think the relevant AdvanceXLInsertBuffer() is currently called with WALInsertLock held? The fsync is associated with the write, which is not done with insert lock held. We're not quite that dumb. regards, tom lane
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Wednesday 17 November 2010 02:04:28 Tom Lane wrote: Andres Freund and...@anarazel.de writes: On Wednesday 17 November 2010 01:51:28 Tom Lane wrote: Well, there's a forced fsync after writing the last page of an xlog file, but I don't believe that proves that more than 16MB of xlog buffers is useless. Other processes could still be busy filling the buffers. Maybe I am missing something, but I think the relevant AdvanceXLInsertBuffer() is currently called with WALInsertLock held? The fsync is associated with the write, which is not done with insert lock held. We're not quite that dumb. Ah, I see. The XLogWrite in AdvanceXLInsertBuffer is only happening if the head of the buffer gets to the tail - which is more likely if the wal buffers are small... Andres
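Andres' conclusion (the inserter only has to write a page itself when the head of the ring catches up with the unwritten tail) can be illustrated with a toy ring-buffer model. This is not PostgreSQL's AdvanceXLInsertBuffer, which tracks write and flush positions under locks; the "writer catches up every K pages" rule is a hypothetical stand-in for commits and the background WAL writer. It only shows why forced writes become more frequent as wal_buffers shrinks.

```python
from collections import deque


def forced_writes(nbuffers: int, pages_to_insert: int, writer_interval: int) -> int:
    """Toy WAL ring: count pages the inserter had to write out itself.

    nbuffers:        ring size in 8 kB pages (wal_buffers / 8 kB)
    pages_to_insert: total WAL pages generated by the workload
    writer_interval: hypothetical rule, some other process writes out every
                     dirty page after this many pages have been inserted
    """
    dirty = deque()  # pages filled but not yet written, oldest first
    forced = 0
    for page in range(pages_to_insert):
        if len(dirty) == nbuffers:
            # Head has caught up with the tail: the slot we need still holds
            # an unwritten page, so the inserter must write it out first.
            dirty.popleft()
            forced += 1
        dirty.append(page)
        if (page + 1) % writer_interval == 0:
            dirty.clear()  # the other writer catches up completely
    return forced


if __name__ == "__main__":
    # 16 MB of WAL (2048 pages of 8 kB), writer catching up every 32 pages.
    for kb in (64, 128, 256, 512, 1024):
        print(f"wal_buffers={kb:>4}kB  forced writes: {forced_writes(kb // 8, 2048, 32)}")
```

In this model, once the ring holds more pages than accumulate between catch-ups, forced writes drop to zero; past that point a larger ring buys nothing, which is one way to read the small effect Marti measured.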
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Well, we're not going to increase the default to gigabytes, but we could very probably increase it by a factor of 10 or so without anyone squawking. It's been awhile since I heard of anyone trying to run PG in 4MB shmmax. How much would a change of that size help? Last I checked, though, this comes out of the allocation available to shared_buffers. And there definitely are several OSes (several linuxes, OSX) still limited to 32MB by default. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Josh Berkus j...@agliodbs.com writes: Well, we're not going to increase the default to gigabytes, but we could very probably increase it by a factor of 10 or so without anyone squawking. It's been awhile since I heard of anyone trying to run PG in 4MB shmmax. How much would a change of that size help? Last I checked, though, this comes out of the allocation available to shared_buffers. And there definitely are several OSes (several linuxes, OSX) still limited to 32MB by default. Sure, but the current default is a measly 64kB. We could increase that 10x for a relatively small percentage hit in the size of shared_buffers, if you suppose that there's 32MB available. The current default is set to still work if you've got only a couple of MB in SHMMAX. What we'd want is for initdb to adjust the setting as part of its probing to see what SHMMAX is set to. regards, tom lane
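To put "a relatively small percentage hit" in numbers: against a 32MB SHMMAX, the 64kB default is about 0.2% of the segment, a 10x increase to 640kB is about 2%, and the 1MB setting mentioned elsewhere in this thread is about 3%. The sketch below only does that arithmetic. It ignores the other structures that share the same segment and is not how initdb probes SHMMAX.

```python
KB = 1024
MB = 1024 * KB


def share_of_shmmax(wal_buffers_bytes: int, shmmax_bytes: int = 32 * MB) -> float:
    """Fraction of a SysV segment limit consumed by wal_buffers alone."""
    return wal_buffers_bytes / shmmax_bytes


for label, size in (("default 64kB", 64 * KB), ("10x default", 640 * KB), ("1MB", 1 * MB)):
    print(f"{label:>13}: {share_of_shmmax(size):.2%} of a 32MB SHMMAX")
```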
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Tue, Nov 16, 2010 at 6:25 PM, Josh Berkus j...@agliodbs.com wrote: On 11/16/10 12:39 PM, Greg Smith wrote: I want to next go through and replicate some of the actual database level tests before giving a full opinion on whether this data proves it's worth changing the wal_sync_method detection. So far I'm torn between whether that's the right approach, or if we should just increase the default value for wal_buffers to something more reasonable. We'd love to, but wal_buffers uses sysV shmem. places tongue firmly in cheek Gee, too bad there's not some other shared-memory implementation we could use... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Sat, Nov 13, 2010 at 20:01, Tom Lane t...@sss.pgh.pa.us wrote: What's your basis for asserting he's uninterested? Please have a little patience. My apologies, I was under the impression that he hadn't answered your request, but he did in the -hackers thread. Regards, Marti
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Mon, Nov 8, 2010 at 20:40, Tom Lane t...@sss.pgh.pa.us wrote: The latter choice is the one that requires testing to prove that it is the proper and preferred default from the performance and data reliability POV. And, in fact, the game plan is to do that testing and see which default we want. I think it's premature to argue further about this until we have some test results. Who will be doing that testing? You said you're relying on Greg Smith to manage the testing, but he's obviously uninterested, so it seems unlikely that this will go anywhere. I posted my results with the simple INSERT test, but nobody cared. I could do some pgbench runs, but I have no idea what parameters would give useful results. Meanwhile, PostgreSQL performance is regressing and there's still no evidence that open_datasync is any safer. Regards, Marti
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Marti Raudsepp ma...@juffo.org writes: On Mon, Nov 8, 2010 at 20:40, Tom Lane t...@sss.pgh.pa.us wrote: And, in fact, the game plan is to do that testing and see which default we want. I think it's premature to argue further about this until we have some test results. Who will be doing that testing? You said you're relying on Greg Smith to manage the testing, but he's obviously uninterested, so it seems unlikely that this will go anywhere. What's your basis for asserting he's uninterested? Please have a little patience. regards, tom lane
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Mon, Nov 8, 2010 at 02:05, Greg Smith g...@2ndquadrant.com wrote: Where's your benchmarks proving it then? If you're right about this, and I'm not saying you aren't, it should be obvious in simple benchmarks by stepping through various sizes for wal_buffers and seeing the throughput/latency situation improve. Since benchmarking is the easy part, I did that. I plotted the time taken by inserting 2 million rows to a table with a single integer column and no indexes (total 70MB). Entire script is attached. If you don't agree with something in this benchmark, please suggest improvements. Chart: http://ompldr.org/vNjNiNQ/wal_sync_method1.png Spreadsheet: http://ompldr.org/vNjNiNg/wal_sync_method1.ods (the 2nd worksheet has exact measurements) This is a different machine from the original post, but similar configuration. One 1TB 7200RPM Seagate Barracuda, no disk controller cache, 4G RAM, Phenom X4, Linux 2.6.36, PostgreSQL 9.0.1, Arch Linux. This time I created a separate 20GB ext4 partition specially for PostgreSQL, with all default settings (shared_buffers=32MB). The partition is near the end of the disk, so hdparm gives a sequential read throughput of ~72 MB/s. I'm getting frequent checkpoint warnings; should I try larger checkpoint_segments too? The partition is re-created and 'initdb' is re-run for each test, to prevent file system allocation from affecting results. I did two runs of all benchmarks. The points on the graph show a sum of INSERT time + COMMIT time in seconds. One surprising thing on the graph is a plateau, where open_datasync performs almost equally with wal_buffers=128kB and 256kB. Another noteworthy difference (not visible on the graph) is that with open_datasync -- but not fdatasync -- and wal_buffers=128M, INSERT time keeps shrinking, but COMMIT takes longer. The total INSERT+COMMIT time remains the same, however. I have a few expendable hard drives here so I can test reliability by pulling the SATA cable as well.
Is this kind of testing useful? What workloads do you suggest? Regards, Marti pgtest.sh Description: Bourne shell script -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Nov 7, 2010, at 6:35 PM, Marti Raudsepp wrote:
> On Mon, Nov 8, 2010 at 01:35, Greg Smith g...@2ndquadrant.com wrote:
>> Yes; it's supposed to, and that logic works fine on some other platforms.
>
> No, the logic was broken to begin with. Linux technically supported O_DSYNC all along. PostgreSQL used fdatasync as the default. Now, because Linux added proper O_SYNC support, PostgreSQL suddenly prefers O_DSYNC over fdatasync?
>
>> Until you've quantified which of the cases do that--which is required for reliable operation of PostgreSQL--and which don't, you don't have any data that can be used to draw a conclusion from. If some setups are faster because they write less reliably, that doesn't automatically make them the better choice.
>
> I don't see your point. If fdatasync worked on Linux, AS THE DEFAULT, all the time until recently, then how does it all of a sudden need proof NOW? If anything, the new open_datasync should be scrutinized because it WASN'T the default before and it hasn't gotten as much testing on Linux.

I agree. In my opinion, the burden of proof lies with those contending that the default value should _change_ from fdatasync to O_DSYNC on Linux. If the default changes, all power-fail testing and other reliability tests previously done on a hardware configuration may become invalid without users even knowing.

Unfortunately, a code change in postgres is required to _prevent_ the default from changing when compiled and run against the latest kernels.

Summary:
Until recently, there was code in the Linux kernel with a comment that said "For now, when the user asks for O_SYNC, we'll actually give O_DSYNC." Linux has had O_DSYNC forever and ever, but not O_SYNC.
If O_DSYNC is preferred over fdatasync for the Postgres xlog (as the code indicates), it should have been the preferred method on Linux for years as well. If fdatasync has been the preferred method on Linux, and the O_SYNC = O_DSYNC test was there for that, then the purpose behind the test has been broken.

No matter how you slice it, the default on Linux is implicitly changing and the choice is to either:
* Return the default to fdatasync
* Let it implicitly change to O_DSYNC

The latter choice is the one that requires testing to prove that it is the proper and preferred default from the performance and data reliability POV. The former is the status quo, but requires a code change.

> Regards,
> Marti
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Scott Carey sc...@richrelevance.com writes:
> No matter how you slice it, the default on Linux is implicitly changing and the choice is to either:
> * Return the default to fdatasync
> * Let it implicitly change to O_DSYNC
> The latter choice is the one that requires testing to prove that it is the proper and preferred default from the performance and data reliability POV.

And, in fact, the game plan is to do that testing and see which default we want. I think it's premature to argue further about this until we have some test results.

regards, tom lane
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Scott Carey wrote:
> In my opinion, the burden of proof lies with those contending that the default value should _change_ from fdatasync to O_DSYNC on linux. If the default changes, all power-fail testing and other reliability tests done prior on a hardware configuration may become invalid without users even knowing.

This seems to be ignoring the fact that unless you either added a non-volatile cache or specifically turned off all write caching on your drives, the results of all power-fail testing done on earlier versions of Linux were that it failed. The default configuration of PostgreSQL on Linux has been that any user who has a simple SATA drive gets unsafe writes, unless they go out of their way to prevent them.

Whatever newer kernels do by default cannot be worse. The open question is whether it's still broken, in which case we might as well favor the known buggy behavior rather than the new one, or whether everything has improved enough to no longer be unsafe with the new defaults.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Hi,

On Monday 08 November 2010 23:12:57 Greg Smith wrote:
> This seems to be ignoring the fact that unless you either added a non-volatile cache or specifically turned off all write caching on your drives, the results of all power-fail testing done on earlier versions of Linux were that it failed. The default configuration of PostgreSQL on Linux has been that any user who has a simple SATA drive gets unsafe writes, unless they go out of their way to prevent them.

Which is *no* argument in favor of any of the options, right?

> Whatever newer kernels do by default cannot be worse. The open question is whether it's still broken, in which case we might as well favor the known buggy behavior rather than the new one, or whether everything has improved enough to no longer be unsafe with the new defaults.

Either I majorly misunderstand you, or ... I don't know.

There simply *is* no new implementation relevant for this discussion. Full stop. What changed is that O_DSYNC is defined differently from O_SYNC these days, and O_SYNC actually does what it should. That causes pg to move open_datasync first in the preference list, doing what the option with the lowest preference did up to now.

That does not *at all* change the earlier fdatasync() or fsync() implementations/tests. It simply makes open_datasync the default, doing what open_sync did earlier. Note that open_sync was the method of *least* preference until now... and that fdatasync() was thus the default until now. Which it is not anymore.

I don't argue *at all* that we have to test the change moving fdatasync before open_datasync on the *other* operating systems. What I completely don't get is all that talk about data consistency on Linux. It's simply irrelevant in that context.

Andres
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Marti Raudsepp wrote:
> I will grant you that the details were wrong, but I stand by the conclusion. I can state for a fact that PostgreSQL's default wal_sync_method varies depending on the fcntl.h header.

Yes; it's supposed to, and that logic works fine on some other platforms. The question is exactly what the new Linux O_DSYNC behavior is doing in regard to whether it flushes drive caches out or not. Until you've quantified which of the cases do that--which is required for reliable operation of PostgreSQL--and which don't, you don't have any data that can be used to draw a conclusion from. If some setups are faster because they write less reliably, that doesn't automatically make them the better choice.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Monday 08 November 2010 00:35:29 Greg Smith wrote:
> Marti Raudsepp wrote:
>> I will grant you that the details were wrong, but I stand by the conclusion. I can state for a fact that PostgreSQL's default wal_sync_method varies depending on the fcntl.h header.
>
> Yes; it's supposed to, and that logic works fine on some other platforms. The question is exactly what the new Linux O_DSYNC behavior is doing in regard to whether it flushes drive caches out or not. Until you've quantified which of the cases do that--which is required for reliable operation of PostgreSQL--and which don't, you don't have any data that can be used to draw a conclusion from. If some setups are faster because they write less reliably, that doesn't automatically make them the better choice.

I think that's FUD. Sorry.

Can you explain to me why fsync() may/should/could be *any* less reliable than O_DSYNC? On *any* platform. Or fdatasync() in the special way it's used with pg, namely with completely preallocated files.

I think the reasons why O_DSYNC is, especially but not only in combination with a small wal_buffers setting, slow in most circumstances are pretty clear.

Making a setting which is only supported on a small range of systems the highest in the preference list is even more doubtful than the already strange choice of making O_DSYNC the default, given the way it works (i.e. no reordering, synchronous writes in the bgwriter, synchronous writes on wal_buffers pressure, etc.).

Greetings,
Andres
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Andres Freund wrote:
> I think that's FUD. Sorry.

Yes, there's plenty of uncertainty and doubt here, but not from me. The test reports given so far have been so riddled with errors that I don't trust any of them. As a counterexample showing my expectations here, the "Testing Sandforce SSD" tests done by Yeb Havinga (http://archives.postgresql.org/message-id/4c4a9452.9070...@gmail.com) followed the right method for confirming both write integrity and performance, including pull-the-plug situations. Those I trusted. What Marti had posted, and what Phoronix investigated, just aren't that thorough.

> Can you explain to me why fsync() may/should/could be *any* less reliable than O_DSYNC? On *any* platform. Or fdatasync() in the special way it's used with pg, namely with completely preallocated files.

If the Linux kernel has done extra work so that O_DSYNC writes are forced to disk including a cache flush, but that isn't done for plain fdatasync() calls, there could be a difference here. The database still wouldn't work right in that case, because checkpoint writes are still going to be using fdatasync. I'm not sure what the actual behavior is supposed to be, but ultimately it doesn't matter. The history of the Linux kernel developers in this area has been so completely full of bugs and incomplete implementations that I am working from the assumption that we know nothing about what actually works and what doesn't without doing careful real-world testing.

> I think the reasons why O_DSYNC is, especially but not only in combination with a small wal_buffers setting, slow in most circumstances are pretty clear.

Where's your benchmarks proving it then? If you're right about this, and I'm not saying you aren't, it should be obvious in simple benchmarks by stepping through various sizes for wal_buffers and seeing the throughput/latency situation improve. But since I haven't seen that done, this one is still in the uncertainty and doubt bucket too.

You're assuming one of the observed problems corresponds to this theorized cause. But you can't prove a performance change on theory. You have to isolate it, and then you'll know. So long as there are multiple uncertainties going on here, I don't have any conclusion yet, just a list of things to investigate that's far longer than the list of what's been looked at so far.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Mon, Nov 8, 2010 at 01:35, Greg Smith g...@2ndquadrant.com wrote:
> Yes; it's supposed to, and that logic works fine on some other platforms.

No, the logic was broken to begin with. Linux technically supported O_DSYNC all along. PostgreSQL used fdatasync as the default. Now, because Linux added proper O_SYNC support, PostgreSQL suddenly prefers O_DSYNC over fdatasync?

> Until you've quantified which of the cases do that--which is required for reliable operation of PostgreSQL--and which don't, you don't have any data that can be used to draw a conclusion from. If some setups are faster because they write less reliably, that doesn't automatically make them the better choice.

I don't see your point. If fdatasync worked on Linux, AS THE DEFAULT, all the time until recently, then how does it all of a sudden need proof NOW? If anything, the new open_datasync should be scrutinized because it WASN'T the default before and it hasn't gotten as much testing on Linux.

Regards,
Marti
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Andres Freund wrote:
> On Sunday 31 October 2010 20:59:31 Greg Smith wrote:
>> Writes are only sync'd out when you do a commit, or the database does a checkpoint.
>
> Hm? WAL is written out to disk after the space provided by wal_buffers (def 8) * XLOG_BLCKSZ (def 8192) is used. The default is 64kB, which you reach pretty quickly, especially after a checkpoint.

Fair enough; I'm so used to bumping wal_buffers up to 16MB nowadays that I sometimes forget that people actually run with the default, where this becomes an important consideration.

> Not having a real O_DSYNC on linux until recently makes it even more dubious to have it as a default...

If Linux is now defining O_DSYNC, and it's buggy, that's going to break more software than just PostgreSQL. It wasn't defined before because it didn't work. If the kernel developers have made changes to claim it's working now, but it doesn't really, I would think they'd consider any reports of actual bugs here as important to fix. There's only so much the database can do in the face of incorrect information reported by the operating system.

Anyway, I haven't actually seen reports that prove there's any problem here; I was just pointing out that we haven't seen any positive reports about database stress testing on these kernel versions yet either. The changes here are theoretically the right ones, and defaulting to safe writes that flush out write caches is a long-term good thing.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Fri, Nov 5, 2010 at 23:10, Greg Smith g...@2ndquadrant.com wrote:
>> Not having a real O_DSYNC on linux until recently makes it even more dubious to have it as a default...
>
> If Linux is now defining O_DSYNC

Well, Linux always defined both O_SYNC and O_DSYNC, but they used to have the same value. The defaults changed due to an unfortunate heuristic in PostgreSQL, which boils down to:

  #if O_DSYNC != O_SYNC
  #define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN_DSYNC
  #else
  #define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
  #endif

(see src/include/access/xlogdefs.h for details)

In fact, I was wrong in my earlier post. Linux always offered O_DSYNC behavior. What's new is POSIX-compliant O_SYNC, and the fact that these flags are now distinguished.

Here's the change in Linux:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6b2f3d1f769be5779b479c37800229d9a4809fc3

Regards,
Marti
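To make the heuristic concrete, here is a minimal sketch (not PostgreSQL code) that reproduces the same decision at run time from a pair of flag values. The enum, the function name and the `have_fdatasync` parameter are hypothetical; the real choice is made by the preprocessor in xlogdefs.h. Only the open_datasync and fdatasync branches are modelled, and anything else is reported as not modelled rather than guessed.

```c
#include <assert.h>

/* Hypothetical names; PostgreSQL decides this at compile time with the
 * preprocessor, not at run time with a function like this one. */
enum default_sync_guess {
    GUESS_OPEN_DATASYNC,  /* O_DSYNC has a value distinct from O_SYNC */
    GUESS_FDATASYNC,      /* O_DSYNC is merely an alias of O_SYNC */
    GUESS_NOT_MODELLED    /* any remaining branches are not modelled */
};

/* o_sync / o_dsync: the flag values a given <fcntl.h> defines.
 * have_fdatasync: whether fdatasync() is available on the platform. */
static enum default_sync_guess
guess_default_sync_method(long o_sync, long o_dsync, int have_fdatasync)
{
    assert(o_sync != 0 && o_dsync != 0);  /* both flags must be defined */

    if (o_dsync != o_sync)
        return GUESS_OPEN_DATASYNC;  /* the "#if O_DSYNC != O_SYNC" branch */
    if (have_fdatasync)
        return GUESS_FDATASYNC;      /* the fdatasync fallback branch */
    return GUESS_NOT_MODELLED;
}
```

With illustrative octal values, `guess_default_sync_method(010000, 010000, 1)` (a header where O_DSYNC is just O_SYNC) gives `GUESS_FDATASYNC`, while `guess_default_sync_method(04010000, 010000, 1)` (a header where the two differ) gives `GUESS_OPEN_DATASYNC`: identical PostgreSQL source, different default, depending only on the header it was compiled against.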
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Friday 05 November 2010 22:10:36 Greg Smith wrote:
> Andres Freund wrote:
>> On Sunday 31 October 2010 20:59:31 Greg Smith wrote:
>>> Writes are only sync'd out when you do a commit, or the database does a checkpoint.
>>
>> Hm? WAL is written out to disk after the space provided by wal_buffers (def 8) * XLOG_BLCKSZ (def 8192) is used. The default is 64kB, which you reach pretty quickly, especially after a checkpoint.
>
> Fair enough; I'm so used to bumping wal_buffers up to 16MB nowadays that I sometimes forget that people actually run with the default, where this becomes an important consideration.

If you have relatively frequent checkpoints (quite sensible in some environments, given the burstiness/response-time problems you can get), even a 16MB wal_buffers can cause significantly more synchronous writes with O_DSYNC because of the amount of WAL traffic due to full_page_writes. For one, the background WAL writer won't keep up, and for another, all its writes will be synchronous... It's simply a pointless setting.

>> Not having a real O_DSYNC on linux until recently makes it even more dubious to have it as a default...
>
> If Linux is now defining O_DSYNC, and it's buggy, that's going to break more software than just PostgreSQL. It wasn't defined before because it didn't work. If the kernel developers have made changes to claim it's working now, but it doesn't really, I would think they'd consider any reports of actual bugs here as important to fix. There's only so much the database can do in the face of incorrect information reported by the operating system.

I don't see it being buggy so far. It's just doing what it should. Which is simply a terrible thing for our implementation. Generally. Independent of Linux.

> Anyway, I haven't actually seen reports that prove there's any problem here; I was just pointing out that we haven't seen any positive reports about database stress testing on these kernel versions yet either. The changes here are theoretically the right ones, and defaulting to safe writes that flush out write caches is a long-term good thing.

I have seen several databases which have been running under 2.6.33 with moderate to high load for some time now. And two on 2.6.35. Loads of problems, but none kernel-related so far ;-)

Andres
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Marti Raudsepp wrote:
> In fact, I was wrong in my earlier post. Linux always offered O_DSYNC behavior. What's new is POSIX-compliant O_SYNC, and the fact that these flags are now distinguished.

While I appreciate that you're trying to help here, I'm unconvinced you've correctly diagnosed a couple of the components of what's going on here yet. Please refrain from making changes to popular documents like the tuning guide on the wiki based on speculation about what's happening. There's definitely at least one mistake in what you wrote there, and I just reverted the whole set of changes you made accordingly until this is sorted out better.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
> Fair enough; I'm so used to bumping wal_buffers up to 16MB nowadays that I forget sometimes that people actually run with the default where this becomes an important consideration.

Do you have any testing in favor of 16MB vs. lower/higher?

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Sat, Nov 6, 2010 at 00:06, Greg Smith g...@2ndquadrant.com wrote:
> Please refrain from making changes to popular documents like the tuning guide on the wiki based on speculation about what's happening.

I will grant you that the details were wrong, but I stand by the conclusion. I can state for a fact that PostgreSQL's default wal_sync_method varies depending on the fcntl.h header.

I have two PostgreSQL 9.0.1 builds, one with an older /usr/include/bits/fcntl.h and one with a newer one. When I run "show wal_sync_method;" on one instance, I get fdatasync. On the other one I get open_datasync.

So let's get down to code.

Older fcntl.h has:
  #define O_SYNC        010000
  # define O_DSYNC      O_SYNC  /* Synchronize data.  */

Newer has:
  #define O_SYNC      04010000
  # define O_DSYNC      010000  /* Synchronize data.  */

So you can see that in the older header, O_DSYNC and O_SYNC are equal.

src/include/access/xlogdefs.h does:
  #if defined(O_SYNC)
  #define OPEN_SYNC_FLAG      O_SYNC
  ...
  #if defined(OPEN_SYNC_FLAG)
  /* O_DSYNC is distinct? */
  #if O_DSYNC != OPEN_SYNC_FLAG
  #define OPEN_DATASYNC_FLAG  O_DSYNC

^ it's comparing O_DSYNC != O_SYNC

  #if defined(OPEN_DATASYNC_FLAG)
  #define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN_DSYNC
  #elif defined(HAVE_FDATASYNC)
  #define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC

^ depending on whether O_DSYNC and O_SYNC were equal, the default wal_sync_method will change.

Regards,
Marti
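Anyone who wants to check which case their own build falls into can read the two flags from the header their compiler actually sees. The helper below is a hypothetical sketch, not part of PostgreSQL; it only reports the values defined by `<fcntl.h>` and says nothing about what the running kernel does with them. The `_POSIX_C_SOURCE` define is there on the assumption that some libc headers only expose O_DSYNC when POSIX features are requested.

```c
/* Ask for POSIX definitions so O_DSYNC is visible even under strict -std
 * modes (assumption: the libc gates O_DSYNC behind POSIX feature macros). */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/* Copies the O_SYNC and O_DSYNC values from the <fcntl.h> this file is
 * compiled against. Returns 0 on success, -1 if either flag is missing. */
static int sync_flag_values(long *o_sync, long *o_dsync)
{
    assert(o_sync != NULL && o_dsync != NULL);
#if defined(O_SYNC) && defined(O_DSYNC)
    *o_sync = O_SYNC;
    *o_dsync = O_DSYNC;
    return 0;
#else
    return -1;
#endif
}
```

Printing the two values in octal after a successful call shows whether they are equal (the older-header case, which leads to fdatasync) or different (the newer-header case, which leads to open_datasync).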
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
> Fair enough; I'm so used to bumping wal_buffers up to 16MB nowadays that I forget sometimes that people actually run with the default where this becomes an important consideration.
>
> Do you have any testing in favor of 16MB vs. lower/higher?

From some tests I had done some time ago, using separate spindles (RAID1) for xlog, no battery, on 8.4, with stuff that generates lots of xlog (INSERT INTO SELECT):

When using a small wal_buffers, there was a problem when switching from one xlog file to the next. Basically an fsync was issued, but most of the previous log segment was still not written, so postgres was waiting for the fsync to finish. Of course, the default 64kB of wal_buffers is quickly filled up, and all writes wait for the end of this fsync. This caused hiccups in the xlog traffic, and xlog throughput wasn't nearly as high as the disks would allow. Sticking a stethoscope on the xlog hard drives revealed a lot more random accesses than I would have liked (this is a much simpler solution than tracing the IOs, lol).

I set wal_writer_delay to a very low setting (I don't remember which, perhaps 1 ms), so the walwriter was in effect constantly flushing the WAL buffers to disk. I also used fdatasync instead of fsync. Then I set wal_buffers to a rather high value, like 32-64MB. Throughput and performance were a lot better, and the xlog drives made a much more linear-access noise.

What happened is that, since wal_buffers was larger than what the drives can write in 1-2 rotations, it could absorb WAL traffic during the time postgres waits for fdatasync / WAL segment change, so the inserts would not have to wait. And lowering the walwriter delay made it write something on each disk rotation, so that when a COMMIT or segment switch came, most of the time the WAL was already synced and there was no wait.

Just my 2c ;)
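The "larger than what the drives can write in 1-2 rotations" threshold can be estimated with simple arithmetic. The sketch below is hypothetical: the poster's drives are not described beyond RAID1 spindles, so it assumes a 7200RPM drive and borrows the ~72 MB/s sequential figure that hdparm reported elsewhere in this thread, treated as 72,000,000 bytes/s of write throughput (hdparm measures reads, so this is only a rough stand-in).

```c
#include <assert.h>

/* Duration of one platter revolution, in milliseconds. */
static double rotation_ms(double rpm)
{
    assert(rpm > 0.0);
    return 60000.0 / rpm;
}

/* Bytes a drive could stream sequentially during `rotations` revolutions,
 * given an assumed sequential write throughput in bytes per second. */
static double bytes_in_rotations(double throughput_bytes_per_s, double rpm,
                                 double rotations)
{
    assert(throughput_bytes_per_s > 0.0 && rotations > 0.0);
    return throughput_bytes_per_s * (rotations * rotation_ms(rpm) / 1000.0);
}
```

`rotation_ms(7200.0)` is about 8.3 ms, and `bytes_in_rotations(72e6, 7200.0, 2.0)` comes out at about 1.2 MB. Under these assumptions the default 64kB of wal_buffers is far below that threshold, while the 32-64MB used above is far above it.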
[PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Hi pgsql-performance,

I was doing mass insertions on my desktop machine and getting at most 1 MB/s disk writes (apart from occasional bursts of 16MB). Inserting 1 million rows with a single integer (data+index 56 MB total) took over 2 MINUTES! The only tuning I had done was shared_buffers=256MB. So I got around to tuning the WAL writer and found that wal_buffers=16MB works MUCH better. wal_sync_method=fdatasync also got similar results.

First of all, I'm running PostgreSQL 9.0.1 on Arch Linux:
* Linux kernel 2.6.36 (also tested with 2.6.35)
* Quad-core Phenom II
* a single Seagate 7200RPM SATA drive (write caching on)
* ext4 FS over LVM, with noatime, data=writeback

I am creating a table like:
  create table foo(id integer primary key);

Then measuring performance with the query:
  insert into foo (id) select generate_series(1, 1000000);

  130438,011 ms   wal_buffers=64kB, wal_sync_method=open_datasync (all defaults)
   29306,847 ms   wal_buffers=1MB,  wal_sync_method=open_datasync
    4641,113 ms   wal_buffers=16MB, wal_sync_method=open_datasync
^ from 130s to 4.6 seconds by just changing wal_buffers.

    5528,534 ms   wal_buffers=64kB, wal_sync_method=fdatasync
    4856,712 ms   wal_buffers=16MB, wal_sync_method=fdatasync
^ fdatasync works well even with small wal_buffers

    2911,265 ms   wal_buffers=16MB, fsync=off
^ Not bad, getting 60% of ideal throughput

These defaults are not just hurting bulk-insert performance, but also everyone who uses synchronous_commit=off.

Unless fdatasync is unsafe, I'd very much want to see it as the default for 9.1 on Linux (I don't know about other platforms). I can't see any reason why each write would need to be sync-ed if I don't commit that often. Increasing wal_buffers probably has the same effect wrt data safety.

Also, the tuning guide on the wiki is understating the importance of these tunables. Reading it, I got the impression that some people change wal_sync_method but it's dangerous, and it even literally claims about wal_buffers that "1MB is enough for some large systems".

But the truth is that if you want any write throughput AT ALL on a regular Linux desktop, you absolutely have to change one of these. If the defaults were better, it would be enough to set synchronous_commit=off to get all that your hardware has to offer.

I was reading the mailing list archives and didn't find anything against it either. Can anyone clarify the safety of wal_sync_method=fdatasync? Are there any reasons why it shouldn't be the default?

Regards,
Marti
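The reported timings translate into throughput and speedup figures with simple arithmetic. The helpers below are a hypothetical sketch, fed with the numbers from the post (1 million rows; the decimal commas written as points).

```c
#include <assert.h>

/* Rows inserted per second for a run that took elapsed_ms milliseconds. */
static double rows_per_second(double rows, double elapsed_ms)
{
    assert(rows > 0.0 && elapsed_ms > 0.0);
    return rows / (elapsed_ms / 1000.0);
}

/* How many times faster a run taking new_ms is than one taking old_ms. */
static double speedup(double old_ms, double new_ms)
{
    assert(old_ms > 0.0 && new_ms > 0.0);
    return old_ms / new_ms;
}
```

For the all-defaults run (130438.011 ms) and the wal_buffers=16MB open_datasync run (4641.113 ms), `rows_per_second(1e6, ...)` gives roughly 7,700 and 215,000 rows/s, and `speedup(130438.011, 4641.113)` is about 28x. The fdatasync run with the default 64kB of wal_buffers (5528.534 ms) is already about 24x faster than the defaults.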
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Marti Raudsepp wrote:
> Unless fdatasync is unsafe, I'd very much want to see it as the default for 9.1 on Linux (I don't know about other platforms). I can't see any reason why each write would need to be sync-ed if I don't commit that often. Increasing wal_buffers probably has the same effect wrt data safety.

Writes are only sync'd out when you do a commit, or the database does a checkpoint.

This issue is a performance difference introduced by a recent change to Linux. open_datasync support was just added to Linux itself very recently. It may be safer than fdatasync on your platform. As new code, it may have bugs such that it doesn't really work at all under heavy load. No one has really run those tests yet. See http://wiki.postgresql.org/wiki/Reliable_Writes for some background, and welcome to the fun of being an early adopter. The warnings in the tuning guide are there for a reason--you're in untested territory now. I haven't yet finished validating whether I consider 2.6.32 safe for production use, and 2.6.36 is a solid year away from being on my list for even considering it as a production database kernel. You should proceed presuming that all writes are unreliable until proven otherwise.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Sunday 31 October 2010 20:59:31 Greg Smith wrote:
> Writes are only sync'd out when you do a commit, or the database does a checkpoint.

Hm? WAL is written out to disk after the space provided by wal_buffers (def 8) * XLOG_BLCKSZ (def 8192) is used. The default is 64kB, which you reach pretty quickly, especially after a checkpoint. With O_D?SYNC that will synchronously get written out during a normal XLogInsert if it hits a page boundary. *Additionally* it gets written out at a commit if sync commit is not on.

Not having a real O_DSYNC on Linux until recently makes it even more dubious to have it as a default...

Andres
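The buffer arithmetic above can be written out as a small sketch. The function and constant names are hypothetical, the 70 MB of WAL used below is only an illustrative volume (borrowed from the table size in Marti's benchmark, not a measured amount of WAL), and the model deliberately ignores commits, checkpoints and the background WAL writer.

```c
#include <assert.h>

/* Default WAL page size as quoted in the thread (XLOG_BLCKSZ, 8192). */
#define DEFAULT_XLOG_BLCKSZ 8192L

/* Bytes of WAL buffer space for a wal_buffers setting given in pages. */
static long wal_buffer_bytes(long wal_buffers_pages, long xlog_blcksz)
{
    assert(wal_buffers_pages > 0 && xlog_blcksz > 0);
    return wal_buffers_pages * xlog_blcksz;
}

/* Toy model: how many times a buffer of buffer_bytes fills up while
 * wal_bytes of WAL is generated, i.e. roughly how many writes are forced
 * by buffer pressure alone. Uses ceiling division. */
static long buffer_fills(long wal_bytes, long buffer_bytes)
{
    assert(wal_bytes >= 0 && buffer_bytes > 0);
    return (wal_bytes + buffer_bytes - 1) / buffer_bytes;
}
```

`wal_buffer_bytes(8, DEFAULT_XLOG_BLCKSZ)` is 65536 bytes (the 64kB default), and 2048 pages give 16MB. In this toy model, 70 MB of WAL fills a 64kB buffer 1120 times but a 16MB buffer only 5 times; going by the description above, with open_datasync each of those buffer-pressure writes waits synchronously.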
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Sun, Oct 31, 2010 at 21:59, Greg Smith g...@2ndquadrant.com wrote:
> open_datasync support was just added to Linux itself very recently.

Oh, I didn't realize it was a new feature. Indeed, O_DSYNC support was added in 2.6.33.

It seems like bad behavior on PostgreSQL's part to default to new, untested features.

I have updated the tuning wiki page with my understanding of the problem:
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#wal_sync_method_wal_buffers

Regards,
Marti
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On 01/11/10 08:59, Greg Smith wrote:
> Marti Raudsepp wrote:
>> Unless fdatasync is unsafe, I'd very much want to see it as the default for 9.1 on Linux (I don't know about other platforms). I can't see any reason why each write would need to be sync-ed if I don't commit that often. Increasing wal_buffers probably has the same effect wrt data safety.
>
> Writes are only sync'd out when you do a commit, or the database does a checkpoint.
>
> This issue is a performance difference introduced by a recent change to Linux. open_datasync support was just added to Linux itself very recently. It may be safer than fdatasync on your platform. As new code, it may have bugs such that it doesn't really work at all under heavy load. No one has really run those tests yet. See http://wiki.postgresql.org/wiki/Reliable_Writes for some background, and welcome to the fun of being an early adopter. The warnings in the tuning guide are there for a reason--you're in untested territory now. I haven't yet finished validating whether I consider 2.6.32 safe for production use, and 2.6.36 is a solid year away from being on my list for even considering it as a production database kernel. You should proceed presuming that all writes are unreliable until proven otherwise.

Greg,

Your reply is possibly a bit confusingly worded: Marti was suggesting that fdatasync be the default, so he wouldn't be a new adopter, since this call has been implemented in the kernel for ages. I guess you wanted to stress that *open_datasync* is the new kid, so watch out to see if he bites...

Cheers

Mark