Re: [HACKERS] WALInsertLock contention

2011-06-09 Thread Merlin Moncure
On Wed, Jun 8, 2011 at 11:30 PM, Merlin Moncure mmonc...@gmail.com wrote:
 The heap pages that have been marked this way may or may not have to
 be off limits to backends other than the one that did the
 marking, and if they have to be off limits logically, there may be no
 realistic path to make them so.

After some more thought, plus a bit of off-list coaching from Haas, I
see now the whole approach is basically a non-starter due to the
above.  Heap pages *are* off limits, because once deferred they can't
be scribbled on and committed by other transactions -- that would
violate the 'wal before data' rule.  To make it 'work', you'd have to
implement shared memory machinery to do cooperative flushing as
suggested upthread (complex, nasty) or simply block on deferred
pages...which would be a deadlock factory.

Oh well.  :(

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALInsertLock contention

2011-06-08 Thread Robert Haas
On Wed, Jun 8, 2011 at 1:59 AM, Merlin Moncure mmonc...@gmail.com wrote:
 There's probably an obvious explanation that I'm not seeing, ...

Yep.  :-)

 but if
 you're not delegating the work of writing the buffers out to someone
 else, why do you need to lock the per backend buffer at all?  That is,
 why does it have to be in shared memory?  Suppose the following are
 true:
 *) Writing qualifying data (non commit, non switch)
 *) There is room left in whatever you are copying to
 you could trylock WALInsertLock, and failing to get it, just copy the
 qualifying data into a private buffer and punt...otherwise just do
 the current behavior.

And here it is: Writing a buffer requires a write & fsync of WAL
through the buffer LSN.  If the WAL for the buffers were completely
inaccessible to other backends, then those buffers would be pinned in
shared memory.  Which would make things very difficult at buffer
eviction time, or for checkpoints.

At any rate, even if it were possible to make it work, it'd be a
misplaced optimization.  It isn't touching shared memory - or even
touching the LWLock - that's expensive; it's the LWLock contention
that kills you, either because stuff blocks, or just because the CPUs
burn a lot of cycles fighting over cache lines.  An LWLock that is
typically taken by only one backend at a time is pretty cheap.  I
suppose I couldn't afford to be so blasé if we were trying to scale to
2048-core systems where even inserting a memory barrier is expensive
enough to worry about, but we've got a ways to go before we need to
start worrying about that.

[...snip...]
 A further refinement would be to try to jigger things so that as a
 backend fills up per-backend WAL buffers, it somehow throws them over
 the fence to one of the background processes to write out.  For
 short-running transactions, that won't really make any difference,
 since the commit will force the per-backend buffers out to the main
 buffers anyway.  But for long-running transactions it seems like it
 could be quite useful; in essence, the task of assembling the final
 WAL stream from the WAL output of individual backends becomes a
 background activity, and ideally the background process doing the work
 is the only one touching the cache lines being shuffled around.  Of
 course, to make this work, backends would need a steady supply of
 available per-backend WAL buffers.  Maybe shared buffers could be used
 for this purpose, with the buffer header being marked in some special
 way to indicate that this is what the buffer's being used for.

 That seems complicated -- plus I think the key is to distribute as
 much of the work as possible. Why would the forward lateral to the
 background processor not require a similar lock to WalInsertLock?

Well, that's the problem.  It would.  Now, in an ideal world, you
might still hope to get some benefit: only the background writer would
typically be writing to the real WAL stream, so that's not contended.
And the contention between the background writer and the individual
backends is only two-way.  There's no single point where you have
every process on the system piling on to a single lock.

But I'm not sure we can really make it work well enough to do more
than nibble around at the edges of the problem.  Consider:

INSERT INTO foo VALUES (1,2,3);

This is going to generate XLOG_HEAP_INSERT followed by
XLOG_XACT_COMMIT.  And now it wants to flush WAL.  So now you're
pretty much forced to have it go perform the serialization operation
itself, and you're right back in contention soup.  Batching two
records together and inserting them in one operation is presumably
going to be more efficient than inserting them one at a time, but not
all that much more efficient; and there are bookkeeping and memory
bandwidth costs to get there.  If we are dealing with long-running
transactions, or asynchronous commit, then this approach might have
legs -- but I suspect that in real life most transactions are small,
and the default configuration is synchronous_commit=on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WALInsertLock contention

2011-06-08 Thread Merlin Moncure
On Wed, Jun 8, 2011 at 7:44 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 8, 2011 at 1:59 AM, Merlin Moncure mmonc...@gmail.com wrote:
 There's probably an obvious explanation that I'm not seeing, ...

 Yep.  :-)

 but if
 you're not delegating the work of writing the buffers out to someone
 else, why do you need to lock the per backend buffer at all?  That is,
 why does it have to be in shared memory?  Suppose the following are
 true:
 *) Writing qualifying data (non commit, non switch)
 *) There is room left in whatever you are copying to
 you could trylock WALInsertLock, and failing to get it, just copy the
 qualifying data into a private buffer and punt...otherwise just do
 the current behavior.

 And here it is: Writing a buffer requires a write & fsync of WAL
 through the buffer LSN.  If the WAL for the buffers were completely
 inaccessible to other backends, then those buffers would be pinned in
 shared memory.  Which would make things very difficult at buffer
 eviction time, or for checkpoints.

Well, (bear with me here) I'm not giving up that easy. Pinning a
judiciously small number of buffers in shared memory so you can reduce
congestion on the insert lock might be an acceptable trade-off in high
contention scenarios...in fact I assumed that was the whole point of
your original idea, which I still think has tremendous potential.
Obviously, you wouldn't want more than a very small percentage of
shared buffers overall (say 1-10% max) to be pinned in this way.  The
trylock is an attempt to cap the downside case so that you aren't
unnecessarily pinning buffers in, say, long running i/o bound
transactions where insert lock contention is low.  Maybe you could
experiment with very small private insert buffer sizes (say 64 kB)
that would hopefully provide some of the benefits (if there are in
fact any) and mitigate potential costs.  Another tweak you could make
is that, once you've trylocked and failed within a transaction, you
always punt from then on until you fill up or need to block per
ordering requirements.  Or maybe the whole thing doesn't help at
all...just trying to understand the problem better.

 At any rate, even if it were possible to make it work, it'd be a
 misplaced optimization.  It isn't touching shared memory - or even
 touching the LWLock - that's expensive; it's the LWLock contention
 that kills you, either because stuff blocks, or just because the CPUs
 burn a lot of cycles fighting over cache lines.  An LWLock that is
 typically taken by only one backend at a time is pretty cheap.  I
 suppose I couldn't afford to be so blasé if we were trying to scale to
 2048-core systems where even inserting a memory barrier is expensive
 enough to worry about, but we've got a ways to go before we need to
 start worrying about that.

Right -- although it isn't so much an optimization (although you
still want to do everything reasonable to keep work under the lock as
light as possible, and a shm->shm copy is going to be slower than
mem->shm) as a simplification trade-off.  You don't have to worry
about deadlocks while messing around with your per backend buffers
during your internal 'flush', and it's generally just easier messing
around with private memory (less code, less locking, less everything).

One point I'm missing though.  Getting back to your original idea, how
does writing to shmem prevent you from having to keep buffers pinned?
I'm reading your comment here:
Those buffers are stamped with a fake LSN that
points back to the per-backend WAL buffer, and they can't be written
until the WAL has been moved from the per-backend WAL buffers to the
main WAL buffers.

That suggests to me that you have to keep them pinned anyways.  I'm
still a bit fuzzy on how the per-backend buffers being in shm conveys
any advantage.  IOW, (trying not to be obtuse) under what
circumstances would backend A want to read from or (especially) write
to backend B's wal buffer?

merlin



Re: [HACKERS] WALInsertLock contention

2011-06-08 Thread Robert Haas
On Wed, Jun 8, 2011 at 10:18 AM, Merlin Moncure mmonc...@gmail.com wrote:
 One point I'm missing though.  Getting back to your original idea, how
 does writing to shmem prevent you from having to keep buffers pinned?
 I'm reading your comment here:
 Those buffers are stamped with a fake LSN that
 points back to the per-backend WAL buffer, and they can't be written
 until the WAL has been moved from the per-backend WAL buffers to the
 main WAL buffers.

 That suggests to me that you have to keep them pinned anyways.  I'm
 still a bit fuzzy on how the per-backend buffers being in shm conveys
 any advantage.  IOW, (trying not to be obtuse) under what
 circumstances would backend A want to read from or (especially) write
 to backend B's wal buffer?

If backend A needs to evict a buffer with a fake LSN, it can go find
the WAL that needs to be serialized, do that, flush WAL, and then
evict the buffer.

IOW, backend A's private WAL buffer will not be completely private.
Only A will write to the buffer, but we don't know who will remove WAL
from the buffer and insert it into the main stream.
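The eviction protocol Robert describes could be modeled roughly like
this -- a toy sketch, not PostgreSQL code; FAKE_LSN_BIT,
drain_backend_spool, and the Buffer struct are all made-up stand-ins
for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBACKENDS     4
#define FAKE_LSN_BIT  ((uint64_t)1 << 63)   /* high bit marks a fake LSN */

typedef struct {
    uint64_t lsn;      /* real LSN, or FAKE_LSN_BIT | owning backend id */
    bool     on_disk;
} Buffer;

static uint64_t next_real_lsn = 100;
static int      spooled_records[NBACKENDS];   /* per-backend WAL spools */

/* Move backend 'owner's spooled WAL into the main stream.  In reality
 * this would take WALInsertLock and restamp every buffer that spool
 * touched; here it just empties the spool and hands back a real LSN. */
static uint64_t drain_backend_spool(int owner)
{
    spooled_records[owner] = 0;
    return next_real_lsn++;
}

/* Any backend evicting a buffer: if the LSN is fake, first serialize
 * the owning backend's WAL, then (after a WAL flush through the real
 * LSN -- 'WAL before data') write the buffer out. */
static void evict_buffer(Buffer *buf)
{
    if (buf->lsn & FAKE_LSN_BIT) {
        int owner = (int)(buf->lsn & ~FAKE_LSN_BIT);
        buf->lsn = drain_backend_spool(owner);
    }
    buf->on_disk = true;
}
```

The key property is the last paragraph above: only the owner writes
into its spool, but any backend may drain it at eviction time.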

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WALInsertLock contention

2011-06-08 Thread Jim Nasby
On Jun 8, 2011, at 10:15 AM, Robert Haas wrote:
 That suggests to me that you have to keep them pinned anyways.  I'm
 still a bit fuzzy on how the per-backend buffers being in shm conveys
 any advantage.  IOW, (trying not to be obtuse) under what
 circumstances would backend A want to read from or (especially) write
 to backend B's wal buffer?
 
 If backend A needs to evict a buffer with a fake LSN, it can go find
 the WAL that needs to be serialized, do that, flush WAL, and then
 evict the buffer.

Isn't the only time that you'd need to evict if you ran out of buffers? If the 
buffer was truly private, would that still be an issue?

Perhaps the only way to make that work is multiple WAL streams, as was 
originally suggested...
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] WALInsertLock contention

2011-06-08 Thread Robert Haas
On Wed, Jun 8, 2011 at 6:49 PM, Jim Nasby j...@nasby.net wrote:
 If backend A needs to evict a buffer with a fake LSN, it can go find
 the WAL that needs to be serialized, do that, flush WAL, and then
 evict the buffer.

 Isn't the only time that you'd need to evict if you ran out of buffers?

Sure, but that happens all the time.  See pg_stat_bgwriter.buffers_backend.

 If the buffer was truly private, would that still be an issue?

I'm not sure if you mean make the buffer private or make the WAL
storage arena private, but I'm pretty well convinced that neither one
can work.

 Perhaps the only way to make that work is multiple WAL streams, as was 
 originally suggested...

Maybe...  but I hope not.  I just found an academic paper on this
subject, about which I will post shortly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WALInsertLock contention

2011-06-08 Thread Merlin Moncure
On Wed, Jun 8, 2011 at 10:21 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 8, 2011 at 6:49 PM, Jim Nasby j...@nasby.net wrote:
 If backend A needs to evict a buffer with a fake LSN, it can go find
 the WAL that needs to be serialized, do that, flush WAL, and then
 evict the buffer.

 Isn't the only time that you'd need to evict if you ran out of buffers?

 Sure, but that happens all the time.  See pg_stat_bgwriter.buffers_backend.

 If the buffer was truly private, would that still be an issue?

 I'm not sure if you mean make the buffer private or make the WAL
 storage arena private, but I'm pretty well convinced that neither one
 can work.

You're probably right.  I think though there is enough hypothetical
upside to the private buffer case that it should be attempted just to
see what breaks. The major tricky bit is dealing with the new
pin/unpin mechanics.  I'd like to give it the 'college try'. (being
typically vain and attention seeking, this is right up my alley) :-D.

 Perhaps the only way to make that work is multiple WAL streams, as was 
 originally suggested...

If this were an easy way out, all high performance file systems would
have multiple journals which you could write to concurrently (which
they don't, afaik).

 Maybe...  but I hope not.  I just found an academic paper on this
 subject, about which I will post shortly.

I'm thinking that as long as your transactions have to be rigidly
ordered you have a fundamental bottleneck you can't really work
around.  One way to maybe get around this is to try and work out on
the fly whether transaction 'A' is functionally independent of
transaction 'B' -- maybe then you could try and write them
concurrently to pre-allocated space on the log, or to separate logs
maintained for that purpose.  Good luck with that...even if you could
somehow get it to work, you would still have degenerate cases (like,
99% of real world cases) to contend with.

Another thing you could try is to keep separate logs for rigidly
ordered data (commits, xlog switch, etc) and non rigidly ordered data
(everything else). On the non rigidly ordered side, you can
pre-allocate log space and write to it.  This is more or less a third
potential route (#1 and #2 being the shared/private buffer approaches)
of leveraging the fact that some of the data does not have to be
rigidly ordered.  Ultimately though even that could only get you so
far, because it incurs other costs and even contention on the lock for
inserting the commit records could start to bottleneck you.
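The split-log idea above, reduced to a toy model -- the two logs, the
bump-allocator reservation, and all names here are illustrative
assumptions, not anything that exists in PostgreSQL:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: route rigidly ordered records (commit, xlog
 * switch) to a serialized log, and everything else to pre-allocated
 * space in a second log that needs no global ordering lock. */

static int ordered_log_len    = 0;   /* would be guarded by one lock */
static int unordered_reserved = 0;   /* bump allocator over the log */

/* Reserve space in the unordered log; in real code this could be a
 * single atomic fetch-and-add rather than a contended LWLock. */
static int reserve_unordered(int len)
{
    int off = unordered_reserved;
    unordered_reserved += len;
    return off;
}

/* Returns the offset at which the record lands in its log. */
static int route_record(bool rigidly_ordered, int len)
{
    if (rigidly_ordered) {
        /* commit/switch records still serialize on one lock */
        int off = ordered_log_len;
        ordered_log_len += len;
        return off;
    }
    return reserve_unordered(len);
}
```

As the paragraph notes, the commit-record log remains a serialization
point, so this only relocates the bottleneck rather than removing it.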

If something useful squirts out of academia, I'd sure like to see it :-).

merlin



Re: [HACKERS] WALInsertLock contention

2011-06-08 Thread Robert Haas
On Wed, Jun 8, 2011 at 11:20 PM, Merlin Moncure mmonc...@gmail.com wrote:
 You're probably right.  I think though there is enough hypothetical
 upside to the private buffer case that it should be attempted just to
 see what breaks. The major tricky bit is dealing with the new
 pin/unpin mechanics.  I'd like to give it the 'college try'. (being
 typically vain and attention seeking, this is right up my alley) :-D.

Well, I think it's fairly clear what will break:

- If you make the data-file buffer completely private, then what will
happen when some other backend needs to read or write that buffer?
- If you make the XLOG spool private, you will not be able to checkpoint.

But I just work here.  Feel free to hit your head on that brick wall
all you like.  If you manage to make a hole (in the wall, not your
head), I'll be as happy as anyone to climb through...!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WALInsertLock contention

2011-06-08 Thread Merlin Moncure
On Wed, Jun 8, 2011 at 11:27 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 8, 2011 at 11:20 PM, Merlin Moncure mmonc...@gmail.com wrote:
 You're probably right.  I think though there is enough hypothetical
 upside to the private buffer case that it should be attempted just to
 see what breaks. The major tricky bit is dealing with the new
 pin/unpin mechanics.  I'd like to give it the 'college try'. (being
 typically vain and attention seeking, this is right up my alley) :-D.

 Well, I think it's fairly clear what will break:

 - If you make the data-file buffer completely private, then what will
 happen when some other backend needs to read or write that buffer?

The private wal buffer?  The whole point (maybe impossible) is to try
and engineer it so that the other backends *never* have to read or
write it -- from their point of view, it hasn't happened yet (even
though it has been written into some heap buffers).

Since all data action on ongoing transactions can happen at any time,
moving wal inserts into the private buffer delays their entry into
the log so you can avoid taking locks for pre-commit heap activity.
Doing this allows those backends to pretend they actually did write
data out into the log without breaking the 'wal before data' rule,
which is enforced by keeping the pin on pages with your magic LSN
(which I'm starting to wonder if it should be a flag like
BM_DEFERRED_WAL).  We are essentially moving xlog activity as far
ahead in time as possible (although in a very limited time space) in
order to combine locks and hopefully gain efficiency. It all comes
down to which rules you can bend and which you can break.
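Merlin's deferred-marking idea as a toy model: BM_DEFERRED_WAL is his
hypothetical flag, and the fields below are simplified stand-ins for
PostgreSQL's real buffer header, not the actual BufferDesc:

```c
#include <assert.h>
#include <stdint.h>

#define BM_DEFERRED_WAL 0x01   /* hypothetical flag from the discussion */

typedef struct {
    uint32_t flags;
    int      pin_count;
    uint64_t lsn;
} BufferDesc;

/* Deferring WAL for a page: keep it pinned and flagged so no other
 * backend writes it out before its WAL reaches the shared stream. */
static void defer_wal_for_page(BufferDesc *buf)
{
    buf->pin_count++;
    buf->flags |= BM_DEFERRED_WAL;
}

/* Flushing the private spool into the main WAL: stamp a real LSN,
 * clear the flag, and drop the extra pin. */
static void resolve_deferred_page(BufferDesc *buf, uint64_t real_lsn)
{
    buf->lsn = real_lsn;
    buf->flags &= ~BM_DEFERRED_WAL;
    buf->pin_count--;
}
```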

The heap pages that have been marked this way may or may not have to
be off limits to backends other than the one that did the marking,
and if they have to be off limits logically, there may be no
realistic path to make them so.  I just don't know...I'm learning as I
go.  At the end of the day, it's all coming off as pretty fragile even
if it works, but it's fun to think about. Anyways, I'm inclined to
experiment.

 - If you make the XLOG spool private, you will not be able to checkpoint.

Correct -- but I don't think this problem is intractable, and it's
really a secondary issue vs. making sure the wal/heap/mvcc/backend
interactions 'work'.  The intent here is to spool only a relatively
small amount of uncommitted transaction data for a short period of
time, like 5-10 seconds.  Maybe you bite the bullet and tell everyone
to flush private WAL at checkpoint time via signal or something.
Maybe you bend some rules on checkpoints.

merlin



Re: [HACKERS] WALInsertLock contention

2011-06-07 Thread Merlin Moncure
On Wed, Feb 16, 2011 at 11:02 PM, Robert Haas robertmh...@gmail.com wrote:
 I've been thinking about the problem of $SUBJECT, and while I know
 it's too early to think seriously about any 9.2 development, I want to
 get my thoughts down in writing while they're fresh in my head.

 It seems to me that there are two basic approaches to this problem.
 We could either split up the WAL stream into several streams, say one
 per database or one per tablespace or something of that sort, or we
 could keep it as a single stream but try not to do so much locking
 whilst in the process of getting it out the door.  Or we could try to
 do both, and maybe ultimately we'll need to.  However, if the second
 one is practical, it's got two major advantages: it'll probably be a
 lot less invasive, and it won't add any extra fsync traffic.  In
 thinking about how we might accomplish the goal of reducing lock
 contention, it occurred to me there's probably no need for the final
 WAL stream to reflect the exact order in which WAL is generated.

 For example, suppose transaction T1 inserts a tuple into table A;
 transaction T2 inserts a tuple into table B; T1 commits; T2 commits.
 The commit records need to be in the right order, and all the actions
 that are part of a given transaction need to precede the associated
 commit record, but, for example, I don't think it would matter if you
 emitted the commit record for T1 before T2's insert into B.  Or you
 could switch the order in which you logged the inserts, since they're
 not touching the same buffers.

 So here's the basic idea.  Each backend, if it so desires, is
 permitted to maintain a per-backend WAL buffer.  Per-backend WAL
 buffers live in shared memory and can be accessed by any backend, but
 the idea is that most of the time only one backend will be accessing
 them, so that the locks won't be heavily contended.  Any WAL written
 to a per-backend WAL buffer will eventually be transferred into the
 main WAL buffers, and flushed.  When a process writes to a per-backend
 WAL buffer, it writes (1) the actual WAL data and (2) the list of
 buffers affected.  Those buffers are stamped with a fake LSN that
 points back to the per-backend WAL buffer, and they can't be written
 until the WAL has been moved from the per-backend WAL buffers to the
 main WAL buffers.

 So, if a buffer with a fake LSN needs to be (a) written back to the OS
 or (b) modified by a backend other than the one that owns the fake
 LSN, this triggers a flush of the per-backend WAL buffers to the main
 WAL buffers.  When this happens, all the affected buffers get stamped
 with a real LSN and the entries are discarded from the per-backend WAL
 buffers.  Such a flush would also be needed when a backend commits or
 otherwise needs an XLOG flush, or when there's no more per-backend
 buffer space.  In theory, all of this taken together should mean that
 WAL gets pushed out in larger chunks: a transaction that does three
 inserts and commits should only need to grab WALInsertLock once,
 instead of once per heap insert, once per index insert, and again for
 the commit, though it'll have to write a bigger chunk of data when it
 does get the lock.  It'll have to repeatedly grab the lock on its
 per-backend WAL buffer, but ideally that's uncontended.

There's probably an obvious explanation that I'm not seeing, but if
you're not delegating the work of writing the buffers out to someone
else, why do you need to lock the per backend buffer at all?  That is,
why does it have to be in shared memory?  Suppose the following are
true:
*) Writing qualifying data (non commit, non switch)
*) There is room left in whatever you are copying to
you could trylock WALInsertLock, and failing to get it, just copy the
qualifying data into a private buffer and punt...otherwise just do
the current behavior.

When you *do* get a lock, either because you got lucky or because you
had to wait anyways, you write out the data you previously staged,
fixing up the LSNs as you go.  Even if you do have to write it to
shared memory, I think your idea is a winner -- probably a fair amount
of work can get done before ultimately being forced to wait...maybe
enough to change the scaling dynamics.
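The trylock-or-punt decision described above could look roughly like
this -- a toy model only; the lock, the spool, and insert_wal_record
are stand-ins and none of it matches the actual PostgreSQL source:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for WALInsertLock; a real backend would use an LWLock. */
static bool wal_insert_locked = false;

static bool try_lock_wal_insert(void)
{
    if (wal_insert_locked)
        return false;
    wal_insert_locked = true;
    return true;
}

static void unlock_wal_insert(void)
{
    wal_insert_locked = false;
}

/* Hypothetical small private spool (64 kB was floated upthread). */
#define SPOOL_SIZE 65536
static char spool[SPOOL_SIZE];
static int  spool_used = 0;

/* Returns true if the record reached the shared WAL stream, false if
 * it was deferred into the private spool.  'ordered' marks records
 * that must be serialized immediately (commit, xlog switch). */
static bool insert_wal_record(const char *rec, int len, bool ordered)
{
    if (!ordered && spool_used + len <= SPOOL_SIZE) {
        if (try_lock_wal_insert()) {
            /* Lock was free: drain the spool and insert directly. */
            spool_used = 0;
            unlock_wal_insert();
            return true;
        }
        /* Contended: punt the record into the private spool. */
        memcpy(spool + spool_used, rec, len);
        spool_used += len;
        return false;
    }
    /* Must serialize: block (here: spin) until the lock is free. */
    while (!try_lock_wal_insert())
        ;                 /* a real backend would sleep on the LWLock */
    spool_used = 0;       /* spooled records enter the stream first */
    unlock_wal_insert();
    return true;
}
```

The "current behavior" branch is the final one: commit and switch
records always wait for the lock and carry any spooled data with them.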

 A further refinement would be to try to jigger things so that as a
 backend fills up per-backend WAL buffers, it somehow throws them over
 the fence to one of the background processes to write out.  For
 short-running transactions, that won't really make any difference,
 since the commit will force the per-backend buffers out to the main
 buffers anyway.  But for long-running transactions it seems like it
 could be quite useful; in essence, the task of assembling the final
 WAL stream from the WAL output of individual backends becomes a
 background activity, and ideally the background process doing the work
 is the only one touching the cache lines being shuffled around.  Of
 course, to make this work, backends would need a steady supply of
 available per-backend WAL buffers.

Re: [HACKERS] WALInsertLock contention

2011-02-16 Thread Tatsuo Ishii
 I've been thinking about the problem of $SUBJECT, and while I know
 it's too early to think seriously about any 9.2 development, I want to
 get my thoughts down in writing while they're fresh in my head.
 
 It seems to me that there are two basic approaches to this problem.
 We could either split up the WAL stream into several streams, say one
 per database or one per tablespace or something of that sort, or we
 could keep it as a single stream but try not to do so much locking
 whilst in the process of getting it out the door.  Or we could try to
 do both, and maybe ultimately we'll need to.  However, if the second
 one is practical, it's got two major advantages: it'll probably be a
 lot less invasive, and it won't add any extra fsync traffic.  In
 thinking about how we might accomplish the goal of reducing lock
 contention, it occurred to me there's probably no need for the final
 WAL stream to reflect the exact order in which WAL is generated.
 
 For example, suppose transaction T1 inserts a tuple into table A;
 transaction T2 inserts a tuple into table B; T1 commits; T2 commits.
 The commit records need to be in the right order, and all the actions
 that are part of a given transaction need to precede the associated
 commit record, but, for example, I don't think it would matter if you
 emitted the commit record for T1 before T2's insert into B.  Or you
 could switch the order in which you logged the inserts, since they're
 not touching the same buffers.
 
 So here's the basic idea.  Each backend, if it so desires, is
 permitted to maintain a per-backend WAL buffer.  Per-backend WAL
 buffers live in shared memory and can be accessed by any backend, but
 the idea is that most of the time only one backend will be accessing
 them, so that the locks won't be heavily contended.  Any WAL written
 to a per-backend WAL buffer will eventually be transferred into the
 main WAL buffers, and flushed.  When a process writes to a per-backend
 WAL buffer, it writes (1) the actual WAL data and (2) the list of
 buffers affected.  Those buffers are stamped with a fake LSN that
 points back to the per-backend WAL buffer, and they can't be written
 until the WAL has been moved from the per-backend WAL buffers to the
 main WAL buffers.
 
 So, if a buffer with a fake LSN needs to be (a) written back to the OS
 or (b) modified by a backend other than the one that owns the fake
 LSN, this triggers a flush of the per-backend WAL buffers to the main
 WAL buffers.  When this happens, all the affected buffers get stamped
 with a real LSN and the entries are discarded from the per-backend WAL
 buffers.  Such a flush would also be needed when a backend commits or
 otherwise needs an XLOG flush, or when there's no more per-backend
 buffer space.  In theory, all of this taken together should mean that
 WAL gets pushed out in larger chunks: a transaction that does three
 inserts and commits should only need to grab WALInsertLock once,
 instead of once per heap insert, once per index insert, and again for
 the commit, though it'll have to write a bigger chunk of data when it
 does get the lock.  It'll have to repeatedly grab the lock on its
 per-backend WAL buffer, but ideally that's uncontended.
 
 A further refinement would be to try to jigger things so that as a
 backend fills up per-backend WAL buffers, it somehow throws them over
 the fence to one of the background processes to write out.  For
 short-running transactions, that won't really make any difference,
 since the commit will force the per-backend buffers out to the main
 buffers anyway.  But for long-running transactions it seems like it
 could be quite useful; in essence, the task of assembling the final
 WAL stream from the WAL output of individual backends becomes a
 background activity, and ideally the background process doing the work
 is the only one touching the cache lines being shuffled around.  Of
 course, to make this work, backends would need a steady supply of
 available per-backend WAL buffers.  Maybe shared buffers could be used
 for this purpose, with the buffer header being marked in some special
 way to indicate that this is what the buffer's being used for.
 
 One not-so-good property of this algorithm is that the operation of
 moving per-backend WAL into the main WAL buffers requires relocking
 all the buffers whose fake LSNs now need to changed to real LSNs.
 That could possible be problematic from a performance standpoint, and
 there are deadlock risks to worry about too.
 
 Any thoughts?  Other ideas?

I vaguely recall that UNISYS used to present patches to reduce the WAL
buffer lock contention and enhanced the CPU scalability limit from 12
to 16 or so (if my memory serves). Is your second idea somewhat related
to those patches?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


Re: [HACKERS] WALInsertLock contention

2011-02-16 Thread Robert Haas
On Wed, Feb 16, 2011 at 11:13 PM, Tatsuo Ishii is...@postgresql.org wrote:
 I vaguely recall that UNISYS used to present patches to reduce the WAL
 buffer lock contention and enhanced the CPU scalability limit from 12
 to 16 or so(if my memory serves). Your second idea is somewhat related
 to the patches?

Not sure.  Do you have a link to the archives, or any idea when this
discussion occurred/what the subject line was?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WALInsertLock contention

2011-02-16 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Wed, Feb 16, 2011 at 11:13 PM, Tatsuo Ishii is...@postgresql.org wrote:
  I vaguely recall that UNISYS used to present patches to reduce the WAL
  buffer lock contention and enhanced the CPU scalability limit from 12
  to 16 or so(if my memory serves). Your second idea is somewhat related
  to the patches?
 
 Not sure.  Do you have a link to the archives, or any idea when this
 discussion occurred/what the subject line was?

They presented at PgCon a couple of years in a row, iirc..

http://www.pgcon.org/2007/schedule/events/16.en.html

I thought there was another one but I'm not finding it atm..

Thanks,

Stephen




Re: [HACKERS] WALInsertLock contention

2011-02-16 Thread Tatsuo Ishii
 Not sure.  Do you have a link to the archives, or any idea when this
 discussion occurred/what the subject line was?
 
 They presented at PgCon a couple of years in a row, iirc..
 
 http://www.pgcon.org/2007/schedule/events/16.en.html

Yes, this one. On page 18, they talked about their customized version
of PostgreSQL called Postgres 8.2.4-uis:

Change WALInsertLock access
  - Using SpinLockAcquire () as WALInsertLock locked most of time
  - Considering a queue mechanism for WALInsertLock

I'm not sure if they brought their patches to public or not though...
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
