On Sat, Aug 8, 2009 at 9:31 AM, Heikki Linnakangas<heikki.linnakan...@enterprisedb.com> wrote:
> Tom Lane wrote:
>> Michael Paquier <michael.paqu...@gmail.com> writes:
>>> Based on an idea of Heikki Linnakangas, here is a patch in order to improve 2PC
>>> by sending the state files of prepared transactions to shared memory instead
>>> of disk.
>>
>> I don't understand how this can possibly work. The entire point of
>> 2PC is that the state file is guaranteed to be on disk so it will
>> survive a crash. What good is it if it's in shared memory?
>
> The state files are not fsync'd when they're written, but a copy is
> written to WAL so that it can be replayed on crash. With this patch,
> it's still written to WAL, but the write to a file on disk is skipped,
> and it's stored in shared memory instead.
>
>> Quite aside from that, the fixed size of shared memory makes this seem
>> pretty impractical.
>
> Most state files are small. If one doesn't fit in the area reserved for
> this, it's written to disk as usual. It's just an optimization.
>
> I'm a bit disappointed by the performance gains. I would've expected
> more, given a decent battery-backed-up cache to buffer the WAL fsyncs.
> But it looks like they're still causing the most overhead, even with a
> battery-backed-up cache.

It doesn't seem that surprising to me that a write to shared memory
and a write to an un-fsync'd file would be about the same speed. The
file write will eventually generate some I/O when it goes to disk, but
at the time you make the system call it's basically just a memory
copy.

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
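
For anyone who wants to check the point about un-fsync'd writes on their own hardware, here is a rough sketch, not taken from the patch. It times three things: a copy into an in-process buffer (only a stand-in for shared memory), a write() to a new file with no fsync, and the same write followed by fsync. The payload size, iteration count, and all function names are assumptions made up for illustration. The file cases also pay for open/close, so the unsynced write will not match the pure memory copy exactly; the interesting comparison is how both relate to the fsync'd case.

```python
import os
import tempfile
import time

PAYLOAD_SIZE = 512   # assumed size of a "small" state file, in bytes
ITERATIONS = 1000    # assumed repetition count, chosen arbitrarily


def time_memory_copy(payload, iterations):
    """Time copying payload into a preallocated buffer (stand-in for shared memory)."""
    buf = bytearray(len(payload))
    start = time.perf_counter()
    for _ in range(iterations):
        buf[:] = payload
    return time.perf_counter() - start


def time_file_write(payload, iterations, sync):
    """Time writing payload to a new file per iteration; fsync only when sync is True."""
    with tempfile.TemporaryDirectory() as tmpdir:
        start = time.perf_counter()
        for i in range(iterations):
            path = os.path.join(tmpdir, f"state_{i}")
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
            try:
                os.write(fd, payload)   # lands in the OS page cache
                if sync:
                    os.fsync(fd)        # forces the data out to storage
            finally:
                os.close(fd)
        elapsed = time.perf_counter() - start
    return elapsed


def run_comparison(payload_size=PAYLOAD_SIZE, iterations=ITERATIONS):
    """Return elapsed seconds for each of the three cases, keyed by label."""
    payload = b"x" * payload_size
    return {
        "memory copy": time_memory_copy(payload, iterations),
        "write, no fsync": time_file_write(payload, iterations, sync=False),
        "write + fsync": time_file_write(payload, iterations, sync=True),
    }


if __name__ == "__main__":
    for label, seconds in run_comparison().items():
        print(f"{label:>16}: {seconds * 1e3:8.2f} ms")
```

The numbers depend heavily on the filesystem and on whether the temporary directory is memory-backed, so treat the output as a local data point rather than a general result.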