Re: [GENERAL] Maximum transaction rate

2009-03-30 Thread Marco Colombo
Markus Wanner wrote: Hi, Martijn van Oosterhout wrote: And fsync better do what you're asking (how fast is just a performance issue, just as long as it's done). Where are we on this issue? I've read all of this thread and the one on the lvm-linux mailing list as well, but still don't

Re: [GENERAL] Maximum transaction rate

2009-03-25 Thread Markus Wanner
Hi, Martijn van Oosterhout wrote: And fsync better do what you're asking (how fast is just a performance issue, just as long as it's done). Where are we on this issue? I've read all of this thread and the one on the lvm-linux mailing list as well, but still don't feel confident. In the

Re: [GENERAL] Maximum transaction rate

2009-03-20 Thread Martijn van Oosterhout
On Thu, Mar 19, 2009 at 12:49:52AM +0100, Marco Colombo wrote: It has to wait for I/O completion on write(), then, it has to go to sleep. If two different processes do a write(), you don't know which will be awakened first. Preallocation don't mean much here, since with O_SYNC you expect a

Re: [GENERAL] Maximum transaction rate

2009-03-20 Thread Marco Colombo
Martijn van Oosterhout wrote: True, but the relative wakeup order of two different processes is not important since by definition they are working on different transactions. As long as the WAL writes for a single transaction (in a single process) are not reordered you're fine. I'm not totally

Re: [GENERAL] Maximum transaction rate

2009-03-20 Thread Marco Colombo
Ron Mayer wrote: Marco Colombo wrote: Yes, but we knew it already, didn't we? It's always been like that, with IDE disks and write-back cache enabled, fsync just waits for the disk reporting completion and disks lie about I've looked hard, and I have yet to see a disk that lies. No, lie in

Re: [GENERAL] Maximum transaction rate

2009-03-19 Thread Joshua D. Drake
Hello, As a continued follow up to this thread, Tim Post replied on the LVM list to this effect: If a logical volume spans physical devices where write caching is enabled, the results of fsync() can not be trusted. This is an issue with device mapper, lvm is one of a few possible customers of

Re: [GENERAL] Maximum transaction rate

2009-03-19 Thread Ron Mayer
Marco Colombo wrote: Yes, but we knew it already, didn't we? It's always been like that, with IDE disks and write-back cache enabled, fsync just waits for the disk reporting completion and disks lie about I've looked hard, and I have yet to see a disk that lies. ext3, OTOH seems to lie. IDE

Re: [GENERAL] Maximum transaction rate

2009-03-19 Thread Baron Schwartz
I am jumping into this thread late, and maybe this has already been stated clearly, but from my experience benchmarking, LVM does *not* lie about fsync() on the servers I've configured. An fsync() goes to the physical device. You can see it clearly by setting the write cache on the RAID

Re: [GENERAL] Maximum transaction rate

2009-03-18 Thread Ron Mayer
Marco Colombo wrote: Ron Mayer wrote: Greg Smith wrote: There are some known limitations to Linux fsync that I remain somewhat concerned about, independently of LVM, like ext3 fsync() only does a journal commit when the inode has changed (see

Re: [GENERAL] Maximum transaction rate

2009-03-18 Thread Marco Colombo
Greg Smith wrote: On Wed, 18 Mar 2009, Marco Colombo wrote: If you fsync() after each write you want ordered, there can't be any subsequent I/O (unless there are many different processes concurrently writing to the file w/o synchronization). Inside PostgreSQL, each of the database backend

Re: [GENERAL] Maximum transaction rate

2009-03-18 Thread Martijn van Oosterhout
On Wed, Mar 18, 2009 at 10:58:39PM +0100, Marco Colombo wrote: I hope it's full defence. If you have two processes doing at the same time write(); fsync(); on the same file, either there are no order requirements, or it will boom sooner or later... fsync() works inside a single process, but

Re: [GENERAL] Maximum transaction rate

2009-03-18 Thread Marco Colombo
Ron Mayer wrote: Marco Colombo wrote: Ron Mayer wrote: Greg Smith wrote: There are some known limitations to Linux fsync that I remain somewhat concerned about, independently of LVM, like ext3 fsync() only does a journal commit when the inode has changed (see

Re: [GENERAL] Maximum transaction rate

2009-03-18 Thread Greg Smith
On Wed, 18 Mar 2009, Martijn van Oosterhout wrote: Generally PG uses O_SYNC on open Only if you change wal_sync_method=open_sync. That's the very last option PostgreSQL will try--only if none of the others are available will it use that. Last time I checked the default value for that

Re: [GENERAL] Maximum transaction rate

2009-03-18 Thread Marco Colombo
Martijn van Oosterhout wrote: Generally PG uses O_SYNC on open, so it's only one system call, not two. And the file it's writing to is generally preallocated (not always though). It has to wait for I/O completion on write(), then, it has to go to sleep. If two different processes do a write(),
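For readers following the "one system call, not two" point, here is a minimal sketch of the two approaches being contrasted: write() followed by fsync(), versus opening the file with O_SYNC so that each write() is itself synchronous. The function names and the file path are hypothetical, and this is not PostgreSQL's code. Note that another post in this thread says PostgreSQL only takes the O_SYNC route when wal_sync_method=open_sync is set.

```python
import os

def append_with_fsync(path, data):
    """Two system calls per record: write(), then fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)  # ask the kernel to push the data to stable storage
    finally:
        os.close(fd)

def append_with_o_sync(path, data):
    """One system call per record: O_SYNC makes write() wait for the I/O."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | os.O_SYNC
    fd = os.open(path, flags, 0o600)
    try:
        os.write(fd, data)  # returns only after the kernel reports completion
    finally:
        os.close(fd)
```

Either way, the guarantee is only as good as the layers below the filesystem, which is what the rest of the thread is about.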

Re: [GENERAL] Maximum transaction rate

2009-03-17 Thread Marco Colombo
John R Pierce wrote: Stefan Kaltenbrunner wrote: So in my understanding LVM is safe on disks that have write cache disabled or behave as one (like a controller with a battery backed cache). what about drive write caches on battery backed raid controllers? do the controllers ensure the

Re: [GENERAL] Maximum transaction rate

2009-03-17 Thread Greg Smith
On Tue, 17 Mar 2009, Marco Colombo wrote: If LVM/dm is lying about fsync(), all this is moot. There's no point talking about disk caches. I decided to run some tests to see what's going on there, and it looks like some of my quick criticism of LVM might not actually be valid--it's only the

Re: [GENERAL] Maximum transaction rate

2009-03-17 Thread Ron Mayer
Greg Smith wrote: There are some known limitations to Linux fsync that I remain somewhat concerned about, independently of LVM, like ext3 fsync() only does a journal commit when the inode has changed (see http://kerneltrap.org/mailarchive/linux-kernel/2008/2/26/990504 ). The way files are

Re: [GENERAL] Maximum transaction rate

2009-03-17 Thread Greg Smith
On Tue, 17 Mar 2009, Ron Mayer wrote: I wonder if there should be an optional fsync mode in postgres should turn fsync() into fchmod (fd, 0644); fchmod (fd, 0664); to work around this issue. The test I haven't had time to run yet is to turn the bug exposing program you were fiddling with
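A rough sketch of the workaround proposed above, assuming the idea is to dirty the inode just before fsync() so that ext3 performs a journal commit. The function name is hypothetical, and toggling the group-write bit and then restoring the original mode is a generalization of the quoted fchmod(fd, 0644); fchmod(fd, 0664); pair. Whether this actually avoids the ext3 behavior is not established in the thread.

```python
import os
import stat

def fsync_with_inode_touch(fd):
    """Change and restore the file mode, then fsync().

    Assumption: the two fchmod() calls mark the inode as changed, which is
    the condition under which ext3 is said to do a journal commit.
    """
    mode = stat.S_IMODE(os.fstat(fd).st_mode)
    os.fchmod(fd, mode ^ stat.S_IWGRP)  # flip the group-write bit
    os.fchmod(fd, mode)                 # restore the original permissions
    os.fsync(fd)
```

The file ends up with the same permissions it started with, so the only intended side effect is the inode update.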

Re: [GENERAL] Maximum transaction rate

2009-03-17 Thread Marco Colombo
Greg Smith wrote: On Tue, 17 Mar 2009, Marco Colombo wrote: If LVM/dm is lying about fsync(), all this is moot. There's no point talking about disk caches. I decided to run some tests to see what's going on there, and it looks like some of my quick criticism of LVM might not actually be

Re: [GENERAL] Maximum transaction rate

2009-03-17 Thread Marco Colombo
Ron Mayer wrote: Greg Smith wrote: There are some known limitations to Linux fsync that I remain somewhat concerned about, independently of LVM, like ext3 fsync() only does a journal commit when the inode has changed (see http://kerneltrap.org/mailarchive/linux-kernel/2008/2/26/990504 ). The

Re: [GENERAL] Maximum transaction rate

2009-03-17 Thread Greg Smith
On Wed, 18 Mar 2009, Marco Colombo wrote: If you fsync() after each write you want ordered, there can't be any subsequent I/O (unless there are many different processes concurrently writing to the file w/o synchronization). Inside PostgreSQL, each of the database backend processes ends up

Re: [GENERAL] Maximum transaction rate

2009-03-16 Thread Stefan Kaltenbrunner
Tom Lane wrote: Jack Orenstein jack.orenst...@hds.com writes: The transaction rates I'm getting seem way too high: 2800-2900 with one thread, 5000-7000 with ten threads. I'm guessing that writes aren't really reaching the disk. Can someone suggest how to figure out where, below postgres,

Re: [GENERAL] Maximum transaction rate

2009-03-16 Thread Scott Marlowe
On Mon, Mar 16, 2009 at 2:03 PM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: So in my understanding LVM is safe on disks that have write cache disabled or behave as one (like a controller with a battery backed cache). For storage with write caches it seems to be unsafe, even if the

Re: [GENERAL] Maximum transaction rate

2009-03-16 Thread John R Pierce
Stefan Kaltenbrunner wrote: So in my understanding LVM is safe on disks that have write cache disabled or behave as one (like a controller with a battery backed cache). what about drive write caches on battery backed raid controllers? do the controllers ensure the drive cache gets flushed

Re: [GENERAL] Maximum transaction rate

2009-03-16 Thread Stefan Kaltenbrunner
Scott Marlowe wrote: On Mon, Mar 16, 2009 at 2:03 PM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: So in my understanding LVM is safe on disks that have write cache disabled or behave as one (like a controller with a battery backed cache). For storage with write caches it seems to be

Re: [GENERAL] Maximum transaction rate

2009-03-15 Thread Marco Colombo
Joshua D. Drake wrote: I understand but disabling cache is not an option for anyone I know. So I need to know the other :) Joshua D. Drake Come on, how many people/organizations do you know who really need 30+ MB/s sustained write throughput in the disk subsystem but can't afford a

Re: [GENERAL] Maximum transaction rate

2009-03-14 Thread Joshua D. Drake
On Sat, 2009-03-14 at 05:25 +0100, Marco Colombo wrote: Scott Marlowe wrote: Also see: http://lkml.org/lkml/2008/2/26/41 but it seems to me that all this discussion is under the assumption that disks have write-back caches. The alternative is to disable the disk write cache. says it all. If

Re: [GENERAL] Maximum transaction rate

2009-03-14 Thread Marco Colombo
Joshua D. Drake wrote: On Sat, 2009-03-14 at 05:25 +0100, Marco Colombo wrote: Scott Marlowe wrote: Also see: http://lkml.org/lkml/2008/2/26/41 but it seems to me that all this discussion is under the assumption that disks have write-back caches. The alternative is to disable the disk

Re: [GENERAL] Maximum transaction rate

2009-03-14 Thread Joshua D. Drake
On Sun, 2009-03-15 at 01:48 +0100, Marco Colombo wrote: Joshua D. Drake wrote: On Sat, 2009-03-14 at 05:25 +0100, Marco Colombo wrote: Scott Marlowe wrote: Also see: http://lkml.org/lkml/2008/2/26/41 but it seems to me that all this discussion is under the assumption that disks have

Re: [GENERAL] Maximum transaction rate

2009-03-13 Thread Marco Colombo
Scott Marlowe wrote: On Fri, Mar 6, 2009 at 2:22 PM, Ben Chobot be...@silentmedia.com wrote: On Fri, 6 Mar 2009, Greg Smith wrote: On Fri, 6 Mar 2009, Tom Lane wrote: Otherwise you need to reconfigure your drive to not cache writes. I forget the incantation for that but it's in the PG

Re: [GENERAL] Maximum transaction rate

2009-03-13 Thread Tom Lane
Marco Colombo pg...@esiway.net writes: And I'm still wondering. The problem with LVM, AFAIK, is missing support for write barriers. Once you disable the write-back cache on the disk, you no longer need write barriers. So I'm missing something, what else does LVM do to break fsync()? I think

Re: [GENERAL] Maximum transaction rate

2009-03-13 Thread Marco Colombo
Tom Lane wrote: Marco Colombo pg...@esiway.net writes: And I'm still wondering. The problem with LVM, AFAIK, is missing support for write barriers. Once you disable the write-back cache on the disk, you no longer need write barriers. So I'm missing something, what else does LVM do to break

Re: [GENERAL] Maximum transaction rate

2009-03-13 Thread Tom Lane
Marco Colombo pg...@esiway.net writes: You mean some layer (LVM) is lying about the fsync()? Got it in one. regards, tom lane

Re: [GENERAL] Maximum transaction rate

2009-03-13 Thread Joshua D. Drake
On Fri, 2009-03-13 at 14:00 -0400, Tom Lane wrote: Marco Colombo pg...@esiway.net writes: You mean some layer (LVM) is lying about the fsync()? Got it in one. I wouldn't think this would be a problem with the proper battery backed raid controller correct? Joshua D. Drake

Re: [GENERAL] Maximum transaction rate

2009-03-13 Thread Ben Chobot
On Fri, 13 Mar 2009, Joshua D. Drake wrote: On Fri, 2009-03-13 at 14:00 -0400, Tom Lane wrote: Marco Colombo pg...@esiway.net writes: You mean some layer (LVM) is lying about the fsync()? Got it in one. I wouldn't think this would be a problem with the proper battery backed raid

Re: [GENERAL] Maximum transaction rate

2009-03-13 Thread Joshua D. Drake
On Fri, 2009-03-13 at 11:17 -0700, Ben Chobot wrote: On Fri, 13 Mar 2009, Joshua D. Drake wrote: On Fri, 2009-03-13 at 14:00 -0400, Tom Lane wrote: Marco Colombo pg...@esiway.net writes: You mean some layer (LVM) is lying about the fsync()? Got it in one. I wouldn't think this

Re: [GENERAL] Maximum transaction rate

2009-03-13 Thread Ben Chobot
On Fri, 13 Mar 2009, Joshua D. Drake wrote: It seems to me that all you get with a BBU-enabled card is the ability to get bursts of writes out of the OS faster. So you still have the problem, it's just less likely to be encountered. A BBU controller is about more than that. It is also supposed

Re: [GENERAL] Maximum transaction rate

2009-03-13 Thread Joshua D. Drake
On Fri, 2009-03-13 at 11:41 -0700, Ben Chobot wrote: On Fri, 13 Mar 2009, Joshua D. Drake wrote: Of course. But if you can't reliably flush the OS buffers (because, say, you're using LVM so fsync() doesn't work), then you can't say what actually has made it to the safety of the raid card.

Re: [GENERAL] Maximum transaction rate

2009-03-13 Thread Joshua D. Drake
On Fri, 2009-03-13 at 11:41 -0700, Ben Chobot wrote: On Fri, 13 Mar 2009, Joshua D. Drake wrote: It seems to me that all you get with a BBU-enabled card is the ability to get bursts of writes out of the OS faster. So you still have the problem, it's just less likely to be encountered. A

Re: [GENERAL] Maximum transaction rate

2009-03-13 Thread Christophe
On Mar 13, 2009, at 11:59 AM, Joshua D. Drake wrote: Wait, actually a good BBU RAID controller will disable the cache on the drives. So everything that is cached is already on the controller vs. the drives itself. Or am I missing something? Maybe I'm missing something, but a BBU controller

Re: [GENERAL] Maximum transaction rate

2009-03-13 Thread Scott Marlowe
On Fri, Mar 13, 2009 at 1:09 PM, Christophe x...@thebuild.com wrote: On Mar 13, 2009, at 11:59 AM, Joshua D. Drake wrote: Wait, actually a good BBU RAID controller will disable the cache on the drives. So everything that is cached is already on the controller vs. the drives itself. Or am I

Re: [GENERAL] Maximum transaction rate

2009-03-13 Thread Marco Colombo
Scott Marlowe wrote: On Fri, Mar 13, 2009 at 1:09 PM, Christophe x...@thebuild.com wrote: So, if the software calls fsync, but fsync doesn't actually push the data to the controller, you are still at risk... right? Ding! I've been doing some googling, now I'm not sure that not supporting

[GENERAL] Maximum transaction rate

2009-03-06 Thread Jack Orenstein
I'm using postgresql 8.3.6 through JDBC, and trying to measure the maximum transaction rate on a given Linux box. I wrote a test program that: - Creates a table with two int columns and no indexes, - loads the table through a configurable number of threads, with each transaction writing one

Re: [GENERAL] Maximum transaction rate

2009-03-06 Thread Tom Lane
Jack Orenstein jack.orenst...@hds.com writes: The transaction rates I'm getting seem way too high: 2800-2900 with one thread, 5000-7000 with ten threads. I'm guessing that writes aren't really reaching the disk. Can someone suggest how to figure out where, below postgres, someone is lying
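One way to probe the "writes aren't really reaching the disk" suspicion outside PostgreSQL is to time a loop of small write() + fsync() calls on the volume in question. The sketch below is only an illustration: the function name, the 512-byte payload, and the iteration count are assumptions. As a rough yardstick, a 7200 RPM drive completes 120 rotations per second, so a single-threaded rate far above that on one plain disk suggests that some cache is absorbing the flush.

```python
import os
import tempfile
import time

def measure_fsync_rate(directory, count=200, payload=b"x" * 512):
    """Return the number of write() + fsync() cycles per second in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            os.fsync(fd)  # should not return until the data is on stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed if elapsed > 0 else float("inf")

if __name__ == "__main__":
    # Point this at a directory on the volume under test; the system temp
    # directory may be memory-backed and would then tell you nothing.
    rate = measure_fsync_rate(tempfile.gettempdir())
    print(f"{rate:.0f} fsyncs/sec")
```

Running it once with the drive's write cache enabled and once with it disabled shows whether fsync() is being passed all the way down to the physical device.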

Re: [GENERAL] Maximum transaction rate

2009-03-06 Thread Greg Smith
On Fri, 6 Mar 2009, Tom Lane wrote: Otherwise you need to reconfigure your drive to not cache writes. I forget the incantation for that but it's in the PG list archives. There's a discussion of this in the docs now, http://www.postgresql.org/docs/8.3/interactive/wal-reliability.html hdparm

Re: [GENERAL] Maximum transaction rate

2009-03-06 Thread Ben Chobot
On Fri, 6 Mar 2009, Greg Smith wrote: On Fri, 6 Mar 2009, Tom Lane wrote: Otherwise you need to reconfigure your drive to not cache writes. I forget the incantation for that but it's in the PG list archives. There's a discussion of this in the docs now,

Re: [GENERAL] Maximum transaction rate

2009-03-06 Thread Scott Marlowe
On Fri, Mar 6, 2009 at 2:22 PM, Ben Chobot be...@silentmedia.com wrote: On Fri, 6 Mar 2009, Greg Smith wrote: On Fri, 6 Mar 2009, Tom Lane wrote: Otherwise you need to reconfigure your drive to not cache writes. I forget the incantation for that but it's in the PG list archives. There's

Re: [GENERAL] Maximum transaction rate

2009-03-06 Thread Greg Smith
On Fri, 6 Mar 2009, Ben Chobot wrote: How does turning off write caching on the disk stop the problem with LVM? It doesn't. Linux LVM is awful and broken, I was just suggesting more details on what you still need to check even when it's not involved.