Re: [HACKERS] [PATCH] 2PC state files on shared memory

2009-08-10 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 After a lot of testing, the state file size is typically no more than 600B.
 In some cases it reached a maximum size of 712B, and I used such
 transactions in my tests.

I can only say that that demonstrates you didn't test very many cases.
It is trivial to generate enormous state files --- try something with
a lot of subtransactions, for example, or a lot of files created or
deleted.  I remain of the opinion that asking users to estimate the
amount of shared memory needed for this patch will cripple its
usability.  We learned that lesson the hard way for FSM; I see no
reason we have to fail to learn from experience.
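
To put rough numbers on the point above, here is a back-of-the-envelope
sketch in Python. It assumes a 4-byte TransactionId per subtransaction and
a 12-byte relfilenode entry per file created or deleted. The fixed overhead
is a made-up placeholder, not a measured value, and per-resource-manager
records are ignored, so treat the output as an order-of-magnitude guide only.

```python
# Rough size model for a 2PC state file. All constants are assumptions
# made for illustration; none is taken from the patch or measured.
XID_BYTES = 4           # assumed: one 32-bit TransactionId per subtransaction
RELFILENODE_BYTES = 12  # assumed: three 32-bit OIDs per created/deleted file
FIXED_OVERHEAD = 300    # hypothetical placeholder for header, GID, CRC, etc.


def estimate_state_file_bytes(n_subxacts, n_files):
    """Return an estimated state file size in bytes under the assumptions above."""
    return (FIXED_OVERHEAD
            + n_subxacts * XID_BYTES
            + n_files * RELFILENODE_BYTES)


if __name__ == "__main__":
    for subxacts, files in [(0, 0), (100, 10), (10_000, 0), (0, 5_000)]:
        size = estimate_state_file_bytes(subxacts, files)
        print(f"{subxacts:>6} subxacts, {files:>5} files -> {size} bytes")
```

Under these assumptions, 10,000 subtransactions alone account for about
40 kB, far beyond the 712B maximum reported.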

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] 2PC state files on shared memory

2009-08-09 Thread Michael Paquier
After a lot of testing, the state file size is typically no more than 600B.
In some cases it reached a maximum size of 712B, and I used such
transactions in my tests.

 I think setting the size parameter for this would be a frightfully
 difficult problem; the fact that average installations wouldn't use it
 doesn't make that any better for those who would.  After our bad
 experiences with fixed-size FSM, I'm pretty wary of introducing new
 fixed-size structures that the user is expected to figure out how to
 size.
The patch is designed so that if a state file is larger than the size
chosen by the user, it is written to disk instead of shared memory. So it
does not endanger the stability of the system.
The case of too many prepared transactions is also covered, thanks to
max_prepared_transactions.

Regards,

-- 
Michael Paquier

NTT OSSC


Re: [HACKERS] [PATCH] 2PC state files on shared memory

2009-08-08 Thread Heikki Linnakangas
Tom Lane wrote:
 Michael Paquier michael.paqu...@gmail.com writes:
 Based on an idea of Heikki Linnakangas, here is a patch in order to improve
 2PC by sending the state files of prepared transactions to shared memory
 instead of disk.
 
 I don't understand how this can possibly work.  The entire point of
 2PC is that the state file is guaranteed to be on disk so it will
 survive a crash.  What good is it if it's in shared memory?

The state files are not fsync'd when they're written, but a copy is
written to WAL so that it can be replayed on crash. With this patch,
it's still written to WAL, but the write to a file on disk is skipped,
and it's stored in shared memory instead.

 Quite aside from that, the fixed size of shared memory makes this seem
 pretty impractical.

Most state files are small. If one doesn't fit in the area reserved for
this, it's written to disk as usual. It's just an optimization.
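
The fallback behaviour described above can be sketched as a toy model in
Python. This is not the patch's code: the class name StateFileArena, its
methods, and the use of a plain dict in place of shared memory are all
hypothetical, and locking is left out entirely.

```python
import os
import tempfile


class StateFileArena:
    """Toy fixed-size arena; entries that do not fit are written to disk."""

    def __init__(self, capacity_bytes, spill_dir):
        self.capacity = capacity_bytes
        self.used = 0
        self.in_memory = {}       # gid -> state data held "in shared memory"
        self.spill_dir = spill_dir

    def store(self, gid, data):
        """Store state data; return 'memory' or 'disk' to show where it went."""
        if self.used + len(data) <= self.capacity:
            self.in_memory[gid] = data
            self.used += len(data)
            return "memory"
        # Fallback: write a plain file. No fsync here, on the assumption
        # that the WAL copy is what provides crash safety.
        with open(os.path.join(self.spill_dir, gid), "wb") as f:
            f.write(data)
        return "disk"

    def load(self, gid):
        """Return the state data from memory if present, else from disk."""
        if gid in self.in_memory:
            return self.in_memory[gid]
        with open(os.path.join(self.spill_dir, gid), "rb") as f:
            return f.read()

    def remove(self, gid):
        """Free the entry, whichever place it was stored in."""
        data = self.in_memory.pop(gid, None)
        if data is not None:
            self.used -= len(data)
        else:
            os.remove(os.path.join(self.spill_dir, gid))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        arena = StateFileArena(capacity_bytes=1000, spill_dir=d)
        print(arena.store("gx1", b"x" * 600))  # fits in the arena
        print(arena.store("gx2", b"y" * 600))  # does not fit, goes to disk
```

The sketch shows why correctness does not depend on the arena size, but it
also shows that the size still has to be chosen by someone.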

I'm a bit disappointed by the performance gains. I would've expected
more, given a decent battery-backed-up cache to buffer the WAL fsyncs.
But it looks like they're still causing the most overhead, even with a
battery-backed-up cache.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [PATCH] 2PC state files on shared memory

2009-08-08 Thread Robert Haas
On Sat, Aug 8, 2009 at 9:31 AM, Heikki
Linnakangas heikki.linnakan...@enterprisedb.com wrote:
 Tom Lane wrote:
 Michael Paquier michael.paqu...@gmail.com writes:
 Based on an idea of Heikki Linnakangas, here is a patch in order to improve
 2PC by sending the state files of prepared transactions to shared memory
 instead of disk.

 I don't understand how this can possibly work.  The entire point of
 2PC is that the state file is guaranteed to be on disk so it will
 survive a crash.  What good is it if it's in shared memory?

 The state files are not fsync'd when they're written, but a copy is
 written to WAL so that it can be replayed on crash. With this patch,
 it's still written to WAL, but the write to a file on disk is skipped,
 and it's stored in shared memory instead.

 Quite aside from that, the fixed size of shared memory makes this seem
 pretty impractical.

 Most state files are small. If one doesn't fit in the area reserved for
 this, it's written to disk as usual. It's just an optimization.

 I'm a bit disappointed by the performance gains. I would've expected
 more, given a decent battery-backed-up cache to buffer the WAL fsyncs.
 But it looks like they're still causing the most overhead, even with a
 battery-backed-up cache.

It doesn't seem that surprising to me that a write to shared memory
and a write to an un-fsync'd file would be about the same speed.  The
file write will eventually generate some I/O when it goes to disk, but
at the time you make the system call it's basically just a memory
copy.
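
One way to check this on a given machine is a small timing sketch like the
one below. It is not a PostgreSQL benchmark: the function name timed_write
and the 712-byte payload are chosen only for illustration, and the numbers
depend heavily on the OS, the filesystem, and the storage underneath.

```python
import os
import tempfile
import time


def timed_write(path, data, do_fsync):
    """Write data to path and return the elapsed time in seconds."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(data)
        if do_fsync:
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to push it to storage
    return time.perf_counter() - start


if __name__ == "__main__":
    payload = b"x" * 712  # illustrative size only
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "state")
        plain = sum(timed_write(path, payload, False) for _ in range(100))
        synced = sum(timed_write(path, payload, True) for _ in range(100))
    print(f"100 writes without fsync: {plain:.6f} s")
    print(f"100 writes with fsync:    {synced:.6f} s")
```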

...Robert



Re: [HACKERS] [PATCH] 2PC state files on shared memory

2009-08-08 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Tom Lane wrote:
 Quite aside from that, the fixed size of shared memory makes this seem
 pretty impractical.

 Most state files are small. If one doesn't fit in the area reserved for
 this, it's written to disk as usual. It's just an optimization.

What evidence do you have for that assumption?  And what's small anyway?
I think setting the size parameter for this would be a frightfully
difficult problem; the fact that average installations wouldn't use it
doesn't make that any better for those who would.  After our bad
experiences with fixed-size FSM, I'm pretty wary of introducing new
fixed-size structures that the user is expected to figure out how to
size.

 I'm a bit disappointed by the performance gains. I would've expected
 more, given a decent battery-backed-up cache to buffer the WAL fsyncs.
 But it looks like they're still causing the most overhead, even with a
 battery-backed-up cache.

If you can't demonstrate order-of-magnitude speedups, I think we
shouldn't touch this.

regards, tom lane



Re: [HACKERS] [PATCH] 2PC state files on shared memory

2009-08-08 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Aug 8, 2009 at 9:31 AM, Heikki
 Linnakangas heikki.linnakan...@enterprisedb.com wrote:
 I'm a bit disappointed by the performance gains. I would've expected
 more, given a decent battery-backed-up cache to buffer the WAL fsyncs.

 It doesn't seem that surprising to me that a write to shared memory
 and a write to an un-fsync'd file would be about the same speed.

I just had a second thought about this.  The idea is to avoid writing
the separate 2PC state file until/unless it has to be checkpointed.
(And, per the comments for CheckPointTwoPhase, that is an uncommon
case --- especially now with our time-extended checkpoints.)

What if PREPARE simply didn't write the 2PC file at all, except into WAL?
Then, make CheckPointTwoPhase write the 2PC file for any still-live
GXACT, by means of reaching into the WAL and pulling the data out.
All it would need for that is the LSN of the WAL record, which I think
the GXACT has already.  (It might have the end location rather than
the start, but in any case we could store both.)  Similarly, COMMIT
PREPARED could be taught to pull the data from WAL instead of a 2PC
file, in the typical case where the file didn't exist yet.  I think
there might be some synchronization issues against checkpoints --- you
couldn't recycle WAL until you were sure there was no COMMIT PREPARED
pulling from it.  But it seems possibly workable, and there's no tuning
knob needed.
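
The scheme above can be sketched as a toy, single-process model in Python.
Everything here is hypothetical: ToyWal and ToyTwoPhase are illustrative
names, a list index stands in for an LSN, a dict stands in for the on-disk
2PC files, and the synchronization against concurrent checkpoints is not
modeled at all.

```python
class ToyWal:
    """Append-only list standing in for WAL; the list index plays the LSN."""

    def __init__(self):
        self.records = []
        self.recycled_upto = 0  # records below this "LSN" are gone

    def append(self, payload):
        self.records.append(payload)
        return len(self.records) - 1

    def read(self, lsn):
        if lsn < self.recycled_upto:
            raise LookupError("WAL already recycled")
        return self.records[lsn]

    def recycle(self, upto):
        self.recycled_upto = max(self.recycled_upto, upto)


class ToyTwoPhase:
    def __init__(self):
        self.wal = ToyWal()
        self.gxacts = {}       # gid -> LSN of the PREPARE record
        self.state_files = {}  # gid -> data, standing in for 2PC files

    def prepare(self, gid, state):
        # PREPARE writes the state only into WAL and remembers its LSN.
        self.gxacts[gid] = self.wal.append(state)

    def checkpoint(self):
        # Write a file for every still-live GXACT by pulling the data out
        # of WAL; only after that is older WAL allowed to be recycled.
        for gid, lsn in self.gxacts.items():
            if gid not in self.state_files:
                self.state_files[gid] = self.wal.read(lsn)
        self.wal.recycle(len(self.wal.records))

    def commit_prepared(self, gid):
        lsn = self.gxacts.pop(gid)
        state = self.state_files.pop(gid, None)
        if state is None:
            # Typical case: no file was written yet, so read from WAL.
            state = self.wal.read(lsn)
        self.wal.append(("commit", gid))
        return state


if __name__ == "__main__":
    tp = ToyTwoPhase()
    tp.prepare("short", b"state-1")
    tp.prepare("long", b"state-2")
    tp.commit_prepared("short")  # served from WAL, no file ever written
    tp.checkpoint()              # "long" is still live, gets a file
    tp.commit_prepared("long")   # served from the file
```

In this model a transaction that is prepared and committed between two
checkpoints never touches a state file, and nothing needs to be sized.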

regards, tom lane



Re: [HACKERS] [PATCH] 2PC state files on shared memory

2009-08-08 Thread Heikki Linnakangas
Tom Lane wrote:
 What if PREPARE simply didn't write the 2PC file at all, except into WAL?
 Then, make CheckPointTwoPhase write the 2PC file for any still-live
 GXACT, by means of reaching into the WAL and pulling the data out.
 All it would need for that is the LSN of the WAL record, which I think
 the GXACT has already.  (It might have the end location rather than
 the start, but in any case we could store both.)  Similarly, COMMIT
 PREPARED could be taught to pull the data from WAL instead of a 2PC
 file, in the typical case where the file didn't exist yet.  I think
 there might be some synchronization issues against checkpoints --- you
 couldn't recycle WAL until you were sure there was no COMMIT PREPARED
 pulling from it.  But it seems possibly workable, and there's no tuning
 knob needed.

Interesting idea, might be worth performance testing. Peeking into the
WAL files during normal operation feels naughty, but it should work.
However, if the bottleneck is the WAL fsyncs, I doubt it's any faster
than Michael's current patch.

Actually, it would be interesting to performance test a stripped down
broken implementation that doesn't write the state files anywhere but
WAL, PREPARE releases all locks like regular COMMIT does, and COMMIT
PREPARED just writes the commit record and fsyncs. That would give an
upper bound on how much gain any of these patches can have. If that's
not much, we can throw in the towel.
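
The arithmetic behind that upper bound is simple enough to sketch in Python.
The function names and the timings below are placeholders invented for
illustration, not measurements; the 10x threshold reflects the
order-of-magnitude bar Tom asked for earlier in the thread.

```python
def max_speedup(baseline_seconds, stripped_seconds):
    """Upper bound on speedup: current 2PC time over stripped-down time."""
    if baseline_seconds <= 0 or stripped_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / stripped_seconds


def can_reach(bound, required_speedup=10.0):
    """True if the bound leaves room for the required speedup at all."""
    return bound >= required_speedup


if __name__ == "__main__":
    # Placeholder timings for the same workload, purely hypothetical.
    bound = max_speedup(baseline_seconds=10.0, stripped_seconds=8.0)
    print(f"upper bound on speedup: {bound:.2f}x")
    print("order of magnitude reachable:", can_reach(bound))
```

No real patch can beat the stripped-down run, so if the bound is close to 1
there is little point in refining any of the approaches.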

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [PATCH] 2PC state files on shared memory

2009-08-08 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Tom Lane wrote:
 What if PREPARE simply didn't write the 2PC file at all, except into WAL?

 Interesting idea, might be worth performance testing. Peeking into the
 WAL files during normal operation feels naughty, but it should work.
 However, if the bottleneck is the WAL fsyncs, I doubt it's any faster
 than Michael's current patch.

This isn't about faster, it's about not requiring users to estimate
a suitable size for a shared-memory arena.

 Actually, it would be interesting to performance test a stripped down
 broken implementation that doesn't write the state files anywhere but
 WAL, PREPARE releases all locks like regular COMMIT does, and COMMIT
 PREPARED just writes the commit record and fsyncs. That would give an
 upper bound on how much gain any of these patches can have. If that's
 not much, we can throw in the towel.

Good idea --- although I would think that the performance of 2PC would
be pretty context-dependent anyway.  What load would you test under?

regards, tom lane



Re: [HACKERS] [PATCH] 2PC state files on shared memory

2009-08-07 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 Based on an idea of Heikki Linnakangas, here is a patch in order to improve
 2PC by sending the state files of prepared transactions to shared memory
 instead of disk.

I don't understand how this can possibly work.  The entire point of
2PC is that the state file is guaranteed to be on disk so it will
survive a crash.  What good is it if it's in shared memory?

Quite aside from that, the fixed size of shared memory makes this seem
pretty impractical.

regards, tom lane
