Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Simon Riggs

On Sat, 2009-03-14 at 12:09 -0400, Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  WALInsertLock is also quite high on Jignesh's list. That I've seen 
  become the bottleneck on other tests too.
 
 Yeah, that's been seen to be an issue before.  I had the germ of an idea
 about how to fix that:
 
   ... with no lock, determine size of WAL record ...
   obtain WALInsertLock
   identify WAL start address of my record, advance insert pointer
   past record end
   *release* WALInsertLock
   without lock, copy record into the space just reserved
 
 The idea here is to allow parallelization of the copying of data into
 the buffers.  The hold time on WALInsertLock would be very short.  Maybe
 it could even become a spinlock, though I'm not sure, because the
 "advance insert pointer" bit is more complicated than it looks (you have
 to allow for the extra overhead when crossing a WAL page boundary).
 
 Now the fly in the ointment is that there would need to be some way to
 ensure that we didn't write data out to disk until it was valid; in
 particular how do we implement a request to flush WAL up to a particular
 LSN value, when maybe some of the records before that haven't been fully
 transferred into the buffers yet?  The best idea I've thought of so far
 is shared/exclusive locks on the individual WAL buffer pages, with the
 rather unusual behavior that writers of the page would take shared lock
 and only the reader (he who has to dump to disk) would take exclusive
 lock.  But maybe there's a better way.  Currently I don't believe that
 dumping a WAL buffer (WALWriteLock) blocks insertion of new WAL data,
 and it would be nice to preserve that property.
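As a toy model of the reserve-then-copy idea (illustrative Python, not PostgreSQL code; `WalBuffer` and its names are invented for the sketch): the lock stands in for WALInsertLock and is held only long enough to advance the insert pointer, while the copy into the reserved space happens with no lock held.

```python
import threading

class WalBuffer:
    """Toy model: hold the insert lock only to reserve space, copy unlocked."""
    def __init__(self, size):
        self.buf = bytearray(size)
        self.insert_ptr = 0
        self.lock = threading.Lock()   # stands in for WALInsertLock

    def insert(self, record):
        with self.lock:                        # very short hold time
            start = self.insert_ptr
            self.insert_ptr += len(record)     # reserve space for this record
        # copy outside the lock: concurrent writers touch disjoint regions
        self.buf[start:start + len(record)] = record
        return start

wal = WalBuffer(1024 * 1024)
recs = [bytes([i]) * 64 for i in range(1, 5)]
threads = [threading.Thread(target=lambda r=r: [wal.insert(r) for _ in range(100)])
           for r in recs]
for t in threads: t.start()
for t in threads: t.join()
assert wal.insert_ptr == 4 * 100 * 64                  # every reservation unique
assert all(b != 0 for b in wal.buf[:wal.insert_ptr])   # every reserved slot filled
```

The sketch sidesteps the hard parts Tom mentions (page-boundary overhead, and knowing when everything below a given offset is fully copied), which is exactly where the flush problem below comes in.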

Yeh, that's just what we'd discussed previously:
http://markmail.org/message/gectqy3yzvjs2hru#query:Reworking%20WAL%20locking+page:1+mid:gectqy3yzvjs2hru+state:results

Are you thinking of doing this for 8.4? :-)

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Simon Riggs

On Mon, 2009-03-16 at 16:26 +, Matthew Wakeling wrote:
 One possibility would be for the locks to alternate between exclusive
 and 
 shared - that is:
 
 1. Take a snapshot of all shared waits, and grant them all -
 thundering
  herd style.
 2. Wait until ALL of them have finished, granting no more.
 3. Take a snapshot of all exclusive waits, and grant them all, one by
 one.
 4. Wait until all of them have been finished, granting no more.
 5. Back to (1)

I agree with that, apart from the "granting no more" bit.

Currently we queue up exclusive locks, but there is no need to since for
ProcArrayLock commits are all changing different data.

The most useful behaviour is just to have two modes:
* exclusive-lock held - all other x locks welcome, s locks queue
* shared-lock held - all other s locks welcome, x locks queue

This *only* works for ProcArrayLock.
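A rough illustration of that two-mode behaviour (a Python sketch with invented names, not LWLock code): while the lock is held in one mode, further requests for the same mode are granted immediately and requests for the other mode queue until the holders drain. As the thread goes on to discuss, this can starve the queued mode.

```python
import threading

class TwoModeLock:
    """Two-mode lock: same-mode requests welcome, other-mode requests queue."""
    def __init__(self):
        self.mode = None      # 'S', 'X', or None when free
        self.holders = 0
        self.cv = threading.Condition()

    def acquire(self, mode):
        with self.cv:
            # wait only if the lock is held in the *other* mode
            while self.mode is not None and self.mode != mode:
                self.cv.wait()
            self.mode = mode
            self.holders += 1

    def release(self):
        with self.cv:
            self.holders -= 1
            if self.holders == 0:
                self.mode = None
                self.cv.notify_all()   # now the queued mode can get in

lk = TwoModeLock()
lk.acquire('S'); lk.acquire('S')       # shared holders stack up freely
assert lk.holders == 2 and lk.mode == 'S'
lk.release(); lk.release()
lk.acquire('X')                        # only after the drain does X get in
assert lk.mode == 'X'
lk.release()
```

Note that "all other x locks welcome" means multiple exclusive holders at once, which is only safe in the ProcArrayLock-style case where the exclusive holders all touch different data.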

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Performance of archive logging in a PITR restore

2009-03-18 Thread Heikki Linnakangas

Joshua D. Drake wrote:

On Mon, 2009-03-16 at 12:11 -0400, Mark Steben wrote:
The issue is that during a restore on a remote site (Postgres 8.2.5) 


8.2.5 is quite old. You should upgrade to the latest 8.2.X release.


archived logs are taking an average of 35 – 40 seconds apiece to
restore.  


Archive logs are restored in a serialized manner so they will be slower
to restore in general.


Yeah, if you have several concurrent processes on the primary doing I/O 
and generating log, at restore the I/O will be serialized.


Version 8.3 is significantly better with this (as long as you don't 
disable full_page_writes). In earlier versions, each page referenced in 
the WAL was read from the filesystem, only to be replaced with the full 
page image. In 8.3, we skip the read and just write over the page image. 
Depending on your application, that can make a very dramatic difference 
to restore time.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Gregory Stark

Jignesh K. Shah j.k.s...@sun.com writes:

 In next couple of weeks I plan to test the patch on a different x64 based
 system to do a sanity testing on lower number of cores and also try out other
 workloads ...

I'm actually more interested in the large number of cores but fewer processes
and lower max_connections. If you set max_connections to 64 and eliminate the
wait time you should, in theory, be able to get 100% cpu usage. It would be
very interesting to track down the contention which is preventing that.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Matthew Wakeling

On Wed, 18 Mar 2009, Simon Riggs wrote:

I agree with that, apart from the "granting no more" bit.

The most useful behaviour is just to have two modes:
* exclusive-lock held - all other x locks welcome, s locks queue
* shared-lock held - all other s locks welcome, x locks queue


The problem with making "all other locks welcome" is that there is a 
possibility of starvation. Imagine a case where there is a constant stream 
of shared locks - the exclusive locks may never actually get hold of the 
lock under the "all other shared locks welcome" strategy. Likewise with 
the reverse.


Taking a snapshot and queueing all newer locks forces fairness in the 
locking strategy, and avoids one of the sides getting starved.


Matthew

--
I've run DOOM more in the last few days than I have the last few
months.  I just love debugging ;-)  -- Linus Torvalds



Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Heikki Linnakangas

Matthew Wakeling wrote:

On Sat, 14 Mar 2009, Heikki Linnakangas wrote:
It's going to require some hard thinking to bust that bottleneck. I've 
sometimes thought about maintaining a pre-calculated array of 
in-progress XIDs in shared memory. GetSnapshotData would simply 
memcpy() that to private memory, instead of collecting the xids from 
ProcArray.


Shifting the contention from reading that data to altering it. But that 
would probably be quite a lot fewer times, so it would be a benefit.


It's true that it would shift work from reading (GetSnapshotData) to 
modifying (xact end) the ProcArray. Which could actually be much worse: 
when modifying, you hold an ExclusiveLock, but readers only hold a 
SharedLock. I don't think it's that bad in reality since at transaction 
end you would only need to remove your own xid from an array. That 
should be very fast, especially if you know exactly where in the array 
your own xid is.
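One hypothetical way to get that O(1) removal (a sketch of my own, not the actual ProcArray code) is to remember each xid's slot and fill the hole with the last array element on removal, so the array stays dense for the memcpy():

```python
class XidArray:
    """Dense array of in-progress xids with O(1) add/remove by slot index."""
    def __init__(self):
        self.xids = []   # dense array: snapshot() just copies it
        self.slot = {}   # xid -> index, so removal needs no search

    def add(self, xid):
        self.slot[xid] = len(self.xids)
        self.xids.append(xid)

    def remove(self, xid):
        i = self.slot.pop(xid)
        last = self.xids.pop()
        if last != xid:              # move the tail xid into the freed slot
            self.xids[i] = last
            self.slot[last] = i

    def snapshot(self):
        return list(self.xids)       # stands in for the shared-memory memcpy()

a = XidArray()
for x in (10, 11, 12):
    a.add(x)
a.remove(11)
assert sorted(a.snapshot()) == [10, 12]
a.remove(10); a.remove(12)
assert a.snapshot() == []
```

The point of the sketch is that transaction end touches only two array slots, however many backends are running, while snapshot readers pay a single bulk copy.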



On Sat, 14 Mar 2009, Tom Lane wrote:

Now the fly in the ointment is that there would need to be some way to
ensure that we didn't write data out to disk until it was valid; in
particular how do we implement a request to flush WAL up to a particular
LSN value, when maybe some of the records before that haven't been fully
transferred into the buffers yet?  The best idea I've thought of so far
is shared/exclusive locks on the individual WAL buffer pages, with the
rather unusual behavior that writers of the page would take shared lock
and only the reader (he who has to dump to disk) would take exclusive
lock.  But maybe there's a better way.  Currently I don't believe that
dumping a WAL buffer (WALWriteLock) blocks insertion of new WAL data,
and it would be nice to preserve that property.


The writers would need to take a shared lock on the page before 
releasing the lock that marshals access to the "how long is the log" 
data. Other than that, your idea would work.


An alternative would be to maintain a concurrent linked list of WAL 
writes in progress. An entry would be added to the tail every time a new 
writer is generated, marking the end of the log. When a writer finishes, 
it can remove the entry from the list very cheaply and with very little 
contention. The reader (who dumps the WAL to disc) need only look at the 
head of the list to find out how far the log is completed, because the 
list is guaranteed to be in order of position in the log.


A linked list or an array of in-progress writes was my first thought as 
well. But the real problem is: how does the reader wait until all WAL up 
to X have been written? It could poll, but that's inefficient.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Matthew Wakeling

On Wed, 18 Mar 2009, Heikki Linnakangas wrote:
A linked list or an array of in-progress writes was my first thought as well. 
But the real problem is: how does the reader wait until all WAL up to X have 
been written? It could poll, but that's inefficient.


Good point - waiting for an exclusive lock on a page is a pretty easy way 
to wake up at the right time.


However, is there not some way to wait for a notify? I'm no C expert, but 
in Java that's one of the most fundamental features of a lock.


Matthew

--
A bus station is where buses stop.
A train station is where trains stop.
On my desk, I have a workstation.



Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Simon Riggs

On Wed, 2009-03-18 at 11:45 +, Matthew Wakeling wrote:
 On Wed, 18 Mar 2009, Simon Riggs wrote:
  I agree with that, apart from the "granting no more" bit.
 
  The most useful behaviour is just to have two modes:
  * exclusive-lock held - all other x locks welcome, s locks queue
  * shared-lock held - all other s locks welcome, x locks queue
 
 The problem with making "all other locks welcome" is that there is a 
 possibility of starvation. Imagine a case where there is a constant stream 
 of shared locks - the exclusive locks may never actually get hold of the 
 lock under the "all other shared locks welcome" strategy. 

That's exactly what happens now. 

 Likewise with the reverse.

I think it depends upon how frequently requests arrive. Commits cause X
locks and we don't commit that often, so it's very unlikely that we'd see
a constant stream of X locks and prevent shared lockers.


Some comments from an earlier post on this topic (about 20 months ago):

Since shared locks are currently queued behind exclusive requests
when they cannot be immediately satisfied, it might be worth
reconsidering the way LWLockRelease works also. When we wake up the
queue we only wake the Shared requests that are adjacent to the head of
the queue. Instead we could wake *all* waiting Shared requestors.

e.g. with a lock queue like this:
(HEAD)  S-S-X-S-X-S-X-S
Currently we would wake the 1st and 2nd waiters only. 

If we were to wake the 3rd, 5th and 7th waiters also, then the queue
would reduce in length very quickly, if we assume generally uniform
service times. (If the head of the queue is X, then we wake only that
one process and I'm not proposing we change that). That would mean queue
jumping, right? Well, that's what already happens in other circumstances,
so there cannot be anything intrinsically wrong with allowing it; the
only question is: would it help? 

We need not wake the whole queue, there may be some generally more
beneficial heuristic. The reason for considering this is not to speed up
Shared requests but to reduce the queue length and thus the waiting time
for the exclusive requestors. Each time a Shared request is dequeued, we
effectively re-enable queue jumping, so a Shared request arriving during
that point will actually jump ahead of Shared requests that were unlucky
enough to arrive while an Exclusive lock was held. Worse than that, the
new incoming Shared requests exacerbate the starvation, so the more
non-adjacent groups of Shared lock requests there are in the queue, the
worse the starvation of the exclusive requestors becomes. We are
effectively randomly starving some shared locks as well as exclusive
locks in the current scheme, based upon the state of the lock when they
make their request. The situation is worst when the lock is heavily
contended and the workload has a 50/50 mix of shared/exclusive requests,
e.g. serializable transactions or transactions with lots of
subtransactions.
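The two wake-up policies discussed above can be sketched as simple list operations (a simulation of the queue behaviour, not the LWLock code itself):

```python
def wake_current(queue):
    """Current LWLockRelease behaviour: wake only the run of Shared
    requests adjacent to the head (or a single Exclusive at the head)."""
    if not queue:
        return []
    if queue[0] == 'X':
        return [0]
    woken = []
    for i, mode in enumerate(queue):
        if mode != 'S':
            break
        woken.append(i)
    return woken

def wake_all_shared(queue):
    """Proposed: wake *all* waiting Shared requestors, letting them jump
    past queued Exclusive requests (still only one X if X is at the head)."""
    if not queue:
        return []
    if queue[0] == 'X':
        return [0]
    return [i for i, mode in enumerate(queue) if mode == 'S']

q = list("SSXSXSXS")                     # the (HEAD) S-S-X-S-X-S-X-S example
assert wake_current(q) == [0, 1]         # only the two head-adjacent S's
assert wake_all_shared(q) == [0, 1, 3, 5, 7]   # every S in the queue
```

Under the wake-all policy the queue collapses to X-X-X after one release, which is the "queue reduces in length very quickly" effect described above.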

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Matthew Wakeling

On Wed, 18 Mar 2009, Simon Riggs wrote:

On Wed, 2009-03-18 at 11:45 +, Matthew Wakeling wrote:

The problem with making "all other locks welcome" is that there is a
possibility of starvation. Imagine a case where there is a constant stream
of shared locks - the exclusive locks may never actually get hold of the
lock under the "all other shared locks welcome" strategy.


That's exactly what happens now.


So the question becomes whether such shared starvation of exclusive locks 
is an issue or not. I would imagine that the greater the number of CPUs 
and backend processes in the system, the more likely this is to become an 
issue.



Likewise with the reverse.


I think it depends upon how frequently requests arrive. Commits cause X
locks and we don't commit that often, so it's very unlikely that we'd see
a constant stream of X locks and prevent shared lockers.


Well, on a very large system, and in the case where exclusive locks are 
actually exclusive (so, not ProcArrayLock), then processing can only 
happen one at a time rather than in parallel, so that offsets the reduced 
frequency of requests compared to shared. Again, it'd only become an issue 
with very large numbers of CPUs and backends.


Interesting comments from the previous thread - thanks for that. If the 
goal is to reduce the waiting time for exclusive, then some fairness would 
seem to be useful.


The problem is that under the current system where shared locks join in on 
the fun, you are relying on there being a time when there are no shared 
locks at all in the queue in order for exclusive locks to ever get a 
chance.


Statistically, if such a situation is likely to occur frequently, then the 
average queue length of shared locks is small. If that is the case, then 
there is little benefit in letting them join in, because the parallelism 
gain is small. However, if the average queue length is large, and you are 
seeing a decent amount of parallelism gain by allowing them to join in, 
then it is necessarily the case that times where there are no shared locks at 
all are few, and the exclusive locks are necessarily starved. The current 
implementation guarantees either one of these scenarios.


The advantage of queueing all shared requests while servicing all 
exclusive requests one by one is that a decent number of shared requests 
will be able to build up, allowing a good amount of parallelism to be 
released in the thundering herd when shared locks are favoured again. This 
method increases the parallelism as the number of parallel processes 
increases.


Matthew

--
Illiteracy - I don't know the meaning of the word!



[PERFORM] parallelizing slow queries for multiple cores (PostgreSQL + Gearman)

2009-03-18 Thread henk de wit

Hi,
Has anyone done similar work in the light of upcoming many-core CPUs/systems? 
Any better results than 2x improvement?

Yes, in fact I've done a very similar thing on a quad CPU box a while back. In 
my case the table in question had about 26 million rows. I did nothing special 
to the table (no cluster, no partitioning, nothing; of course the table did have 
the appropriate indexes). Queries on this table are analytic/reporting kind of 
queries. Basically they are just aggregations over a large number of rows. E.g. 
the sum of column1 and the sum of column2 where time is some time and columnA 
has some value and columnB has some other value, that kind of thing. From 
analysis the queries appeared to be nearly 100% CPU bound.
In my (Java) application I divided a reporting query for say the last 60 days 
in 2 equal portions: day 1 to 30 and day 31 to 60 and assigned these to two 
worker threads. The results of these worker threads were merged using a simple 
resultset merge (the end result is simply the total of all rows returned by 
thread1 and thread2). The speed up I measured on the quad box was a near 
perfect factor 2.  I then divided the workload in 4 equal portions: day 1 to 
15, 16 to 30, 31 to 45 and 46 till 60. The speed up I measured was only a 
little less than a factor of 4. In my situation too, the time I measured included 
dispatching the jobs to a thread pool and merging their results.
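A minimal Python analogue of that scheme (with a hypothetical in-memory table standing in for the real one): split the aggregate over equal date ranges, run the ranges in a thread pool, and merge the partial results with a trivial sum.

```python
from concurrent.futures import ThreadPoolExecutor

# rows: (day, value) pairs standing in for the large reporting table
rows = [(d, d * 2) for d in range(1, 61)]

def report(day_lo, day_hi):
    # stands in for one worker's aggregate query, e.g.
    # SELECT sum(value) FROM t WHERE day BETWEEN day_lo AND day_hi
    return sum(v for d, v in rows if day_lo <= d <= day_hi)

def parallel_report(n_workers, days=60):
    step = days // n_workers
    ranges = [(i * step + 1, (i + 1) * step) for i in range(n_workers)]
    with ThreadPoolExecutor(n_workers) as pool:
        partials = pool.map(lambda r: report(*r), ranges)
    return sum(partials)           # the trivial "resultset merge"

# splitting the work must not change the answer
assert parallel_report(2) == parallel_report(4) == report(1, 60)
```

This only works cleanly because the merge is a plain sum over disjoint ranges; as noted above, anything requiring post-processing of the rows duplicates query logic in the application, and each worker runs in its own transaction.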
Of course, such a scheme can only be easily used when all workers return 
individual rows that are directly part of the end result. If some further 
calculation has to be done on those rows, which happens to be the same 
calculation that is also done in the query you are parallelizing, then in 
effect you are duplicating logic. If you do that a lot in your code you can 
easily create a maintenance nightmare. Also, you have to be aware that without 
additional measures, every worker lives in its own transaction. Depending on 
the nature of the data this could potentially result in inconsistent data being 
returned. In your case, on tables generated once per day this wouldn't be the 
case, but as a general technique you have to be aware of this.
Anyway, it's very clear that computers are moving to many-core architectures. 
Simple entry level servers already come these days with 8 cores. I've asked a 
couple of times on this list whether PG is going to support using multiple 
cores for a single query anytime soon, but this appears to be very unlikely. 
Until then it seems the only way to utilize multiple cores for a single query 
is doing it at the application level or by using something like pgpool-II.

_
Express yourself instantly with MSN Messenger! Download today it's FREE!
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/
-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Jignesh K. Shah



On 03/18/09 08:06, Simon Riggs wrote:

On Wed, 2009-03-18 at 11:45 +, Matthew Wakeling wrote:
  

On Wed, 18 Mar 2009, Simon Riggs wrote:


I agree with that, apart from the "granting no more" bit.

The most useful behaviour is just to have two modes:
* exclusive-lock held - all other x locks welcome, s locks queue
* shared-lock held - all other s locks welcome, x locks queue
  
The problem with making "all other locks welcome" is that there is a 
possibility of starvation. Imagine a case where there is a constant stream 
of shared locks - the exclusive locks may never actually get hold of the 
lock under the "all other shared locks welcome" strategy. 



That's exactly what happens now. 

  

Likewise with the reverse.



I think it depends upon how frequently requests arrive. Commits cause X
locks and we don't commit that often, so it's very unlikely that we'd see
a constant stream of X locks and prevent shared lockers.


Some comments from an earlier post on this topic (about 20 months ago):

Since shared locks are currently queued behind exclusive requests
when they cannot be immediately satisfied, it might be worth
reconsidering the way LWLockRelease works also. When we wake up the
queue we only wake the Shared requests that are adjacent to the head of
the queue. Instead we could wake *all* waiting Shared requestors.

e.g. with a lock queue like this:
(HEAD)  S-S-X-S-X-S-X-S
Currently we would wake the 1st and 2nd waiters only. 


If we were to wake the 3rd, 5th and 7th waiters also, then the queue
would reduce in length very quickly, if we assume generally uniform
service times. (If the head of the queue is X, then we wake only that
one process and I'm not proposing we change that). That would mean queue
jumping, right? Well, that's what already happens in other circumstances,
so there cannot be anything intrinsically wrong with allowing it; the
only question is: would it help? 

  


I thought about that. Except that, without putting some restriction on it, 
a huge queue will cause a lot of time to be spent manipulating the lock 
list every time. Another option would be to maintain two lists, shared and 
exclusive, and round-robin through them every time you access the list, so 
manipulation stays cheap. But the best thing is to allow the flexibility to 
change the algorithm, since some workloads may work fine with one and 
others will NOT. That flexibility then lets those already reaching the 
limits tinker.


-Jignesh


We need not wake the whole queue, there may be some generally more
beneficial heuristic. The reason for considering this is not to speed up
Shared requests but to reduce the queue length and thus the waiting time
for the exclusive requestors. Each time a Shared request is dequeued, we
effectively re-enable queue jumping, so a Shared request arriving during
that point will actually jump ahead of Shared requests that were unlucky
enough to arrive while an Exclusive lock was held. Worse than that, the
new incoming Shared requests exacerbate the starvation, so the more
non-adjacent groups of Shared lock requests there are in the queue, the
worse the starvation of the exclusive requestors becomes. We are
effectively randomly starving some shared locks as well as exclusive
locks in the current scheme, based upon the state of the lock when they
make their request. The situation is worst when the lock is heavily
contended and the workload has a 50/50 mix of shared/exclusive requests,
e.g. serializable transactions or transactions with lots of
subtransactions.

  


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Matthew Wakeling

On Wed, 18 Mar 2009, Jignesh K. Shah wrote:

I thought about that. Except that, without putting some restriction on it, 
a huge queue will cause a lot of time to be spent manipulating the lock 
list every time. Another option would be to maintain two lists, shared and 
exclusive, and round-robin through them every time you access the list, so 
manipulation stays cheap. But the best thing is to allow the flexibility to 
change the algorithm, since some workloads may work fine with one and 
others will NOT. That flexibility then lets those already reaching the 
limits tinker.


Yeah, having two separate queues is the obvious way of doing this. It 
would make most operations really trivial. Just wake everything in the 
shared queue at once, and you can throw it away wholesale and allocate a 
new queue. It avoids a whole lot of queue manipulation.
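A sketch of the two-queue idea (illustrative Python, not LWLock code): shared and exclusive waiters live in separate queues, so the whole shared queue can be woken and discarded in one operation while exclusives are dequeued one at a time.

```python
from collections import deque

class TwoQueueLock:
    """Waiters kept in separate shared/exclusive queues for cheap batch wakes."""
    def __init__(self):
        self.shared_waiters = deque()
        self.exclusive_waiters = deque()

    def enqueue(self, mode, proc):
        if mode == 'S':
            self.shared_waiters.append(proc)
        else:
            self.exclusive_waiters.append(proc)

    def wake_shared_batch(self):
        # wake everything in the shared queue at once,
        # then simply start a fresh queue -- no per-node list surgery
        batch, self.shared_waiters = self.shared_waiters, deque()
        return list(batch)

    def wake_one_exclusive(self):
        # exclusives are still serviced one by one
        return self.exclusive_waiters.popleft() if self.exclusive_waiters else None

lk = TwoQueueLock()
for proc, mode in enumerate("SSXSXS"):
    lk.enqueue(mode, proc)
assert lk.wake_shared_batch() == [0, 1, 3, 5]   # thundering herd, wholesale
assert lk.wake_one_exclusive() == 2             # then exclusives, in order
assert lk.wake_shared_batch() == []             # queue already recycled
```

Alternating between `wake_shared_batch` and draining `wake_one_exclusive` gives the batch fairness discussed earlier in the thread, without scanning a mixed queue.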


Matthew

--
Software suppliers are trying to make their software packages more
'user-friendly'. Their best approach, so far, has been to take all
the old brochures, and stamp the words, 'user-friendly' on the cover.
-- Bill Gates



Re: [PERFORM] Extremely slow intarray index creation and inserts.

2009-03-18 Thread Ron Mayer
Tom Lane wrote:
 Ron Mayer rm...@cheapcomplexdevices.com writes:
 vm=# create index gist7 on tmp_intarray_test using GIST (my_int_array 
 gist__int_ops);
 CREATE INDEX
 Time: 2069836.856 ms
 
 Is that expected, or does it sound like a bug to take over
 half an hour to index 7 rows of mostly 5 and 6-element
 integer arrays?
 
 I poked at this example with oprofile.  It's entirely CPU-bound AFAICT,

Oleg pointed out to me (off-list I now see) that it's not totally
unexpected behavior and I should have been using gist__intbig_ops,
since the "big" refers to the cardinality of the entire set (which
was large, in my case) and not the length of the arrays.

Oleg Bartunov wrote:
OB: it's not about short or long arrays, it's about small or big
OB: cardinality of the whole set (the number of unique elements)

I'm re-reading the docs and it still wasn't obvious to me.  A
potential docs patch is attached below.

 and the CPU utilization is approximately
 
   55% g_int_compress
   35% memmove/memcpy (difficult to distinguish these)
1% pg_qsort
   1% anything else
 
 Probably need to look at reducing the number of calls to g_int_compress
 ... it must be getting called a whole lot more than once per new index
 entry, and I wonder why that should need to be.

Perhaps that's a separate issue, but we're working
fine with gist__intbig_ops for the time being.



Here's a proposed docs patch that makes this more obvious.

*** a/doc/src/sgml/intarray.sgml
--- b/doc/src/sgml/intarray.sgml
***************
*** 239,245 ****
     <literal>gist__int_ops</> (used by default) is suitable for
     small and medium-size arrays, while
     <literal>gist__intbig_ops</> uses a larger signature and is more
!    suitable for indexing large arrays.
    </para>

    <para>
--- 239,247 ----
     <literal>gist__int_ops</> (used by default) is suitable for
     small and medium-size arrays, while
     <literal>gist__intbig_ops</> uses a larger signature and is more
!    suitable for indexing high-cardinality data sets - where there
!    are a large number of unique elements across all rows being
!    indexed.
    </para>

    <para>




Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Scott Carey
On 3/12/09 6:29 PM, Robert Haas robertmh...@gmail.com wrote:

 It's worth ruling out given that even if the likelihood is small, the fix is
 easy.  However, I don't see the throughput drop from peak as more
 concurrency is added that is the hallmark of this problem, usually with a
 lot of context switching and a sudden increase in CPU use per transaction.
 
 The problem is that the proposed fix bears a strong resemblence to
 attempting to improve your gas mileage by removing a few non-critical
 parts from your car, like, say, the bumpers, muffler, turn signals,
 windshield wipers, and emergency brake.
 

The fix I was referring to as "easy" was using a connection pooler -- as a
reply to the previous post. Even if it's a low likelihood that the connection
pooler fixes this case, it's worth looking at.

 
 While it's true that the car
 might be drivable in that condition (as long as nothing unexpected
 happens), you're going to have a hard time convincing the manufacturer
 to offer that as an options package.
 

The original poster's request is for a config parameter, for experimentation
and testing by the brave. My own request was for that version of the lock to
prevent possible starvation but improve performance by unlocking all shared
at once, then doing all exclusives one at a time next, etc.

 
 I think that changing the locking behavior is attacking the problem at
 the wrong level anyway.  If someone want to look at optimizing
 PostgreSQL for very large numbers of concurrent connections without a
 connection pooler... at least IMO, it would be more worthwhile to
 study WHY there's so much locking contention, and, on a lock by lock
 basis, what can be done about it without harming performance under
 more normal loads?  The fact that there IS locking contention is sorta
 interesting, but it would be a lot more interesting to know why.
 
 ...Robert
 

I alluded to the three main ways of dealing with lock contention elsewhere.
Avoiding locks, making finer grained locks, and making locks faster.
All are worthy.  Some are harder to do than others.  Some have been heavily
tuned already.  It's a case-by-case basis.  And regardless, the unfair lock
is a good test tool.


-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Extremely slow intarray index creation and inserts.

2009-03-18 Thread Tom Lane
Ron Mayer rm...@cheapcomplexdevices.com writes:
 Oleg Bartunov wrote:
 OB: it's not about short or long arrays, it's about small or big
 OB: cardinality of the whole set (the number of unique elements)

 I'm re-reading the docs and it still wasn't obvious to me.  A
 potential docs patch is attached below.

Done, though not in exactly those words.  I wonder though if we can
be less vague about it --- can we suggest a typical cutover point?
Like use gist__intbig_ops if there are more than about 10,000 distinct
array values?  Even a rough order of magnitude for where to worry
about this would save a lot of people time.

regards, tom lane

Index: intarray.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/intarray.sgml,v
retrieving revision 1.5
retrieving revision 1.6
diff -c -r1.5 -r1.6
*** intarray.sgml   10 Dec 2007 05:32:51 -  1.5
--- intarray.sgml   18 Mar 2009 20:18:18 -  1.6
***************
*** 237,245 ****
    <para>
     Two GiST index operator classes are provided:
     <literal>gist__int_ops</> (used by default) is suitable for
!    small and medium-size arrays, while
     <literal>gist__intbig_ops</> uses a larger signature and is more
!    suitable for indexing large arrays.
    </para>
  
    <para>
--- 237,246 ----
    <para>
     Two GiST index operator classes are provided:
     <literal>gist__int_ops</> (used by default) is suitable for
!    small- to medium-size data sets, while
     <literal>gist__intbig_ops</> uses a larger signature and is more
!    suitable for indexing large data sets (i.e., columns containing
!    a large number of distinct array values).
    </para>
  
    <para>



Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Mon, 2009-03-16 at 16:26 +, Matthew Wakeling wrote:
 One possibility would be for the locks to alternate between exclusive
 and 
 shared - that is:
 
 1. Take a snapshot of all shared waits, and grant them all -
 thundering
 herd style.
 2. Wait until ALL of them have finished, granting no more.
 3. Take a snapshot of all exclusive waits, and grant them all, one by
 one.
 4. Wait until all of them have been finished, granting no more.
 5. Back to (1)

 I agree with that, apart from the granting no more bit.

 Currently we queue up exclusive locks, but there is no need to since for
 ProcArrayLock commits are all changing different data.

 The most useful behaviour is just to have two modes:
 * exclusive-lock held - all other x locks welcome, s locks queue
 * shared-lock held - all other s locks welcome, x locks queue

My goodness, it seems people have forgotten about the lightweight
part of the LWLock design.

regards, tom lane
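[Editor's sketch] The "two-mode" behaviour Simon proposes above can be sketched in C as follows. All names here are invented for illustration -- this is not PostgreSQL's LWLock API, and real code would need atomics or a spinlock plus a way to sleep and wake waiters:

```c
#include <assert.h>

/* Illustrative sketch of the proposed two-mode lock: while the lock is
 * held in exclusive mode, further exclusive acquirers are admitted and
 * shared acquirers queue; in shared mode the reverse. */

typedef enum { MODE_FREE, MODE_SHARED, MODE_EXCLUSIVE } LockMode;

typedef struct
{
    LockMode mode;
    int      holders;    /* acquirers currently inside */
    int      s_waiters;  /* shared acquirers queued */
    int      x_waiters;  /* exclusive acquirers queued */
} TwoModeLock;

/* Returns 1 if granted immediately, 0 if the caller must queue. */
static int
try_acquire(TwoModeLock *l, LockMode want)
{
    if (l->mode == MODE_FREE)
    {
        l->mode = want;
        l->holders = 1;
        return 1;
    }
    if (l->mode == want)
    {
        l->holders++;          /* same mode: welcome, per the proposal */
        return 1;
    }
    if (want == MODE_SHARED)   /* opposite mode: queue */
        l->s_waiters++;
    else
        l->x_waiters++;
    return 0;
}

static void
release(TwoModeLock *l)
{
    if (--l->holders > 0)
        return;
    /* Last holder out: flip to the opposite mode if anyone is waiting,
     * granting the whole batch of queued waiters at once. */
    if (l->mode == MODE_EXCLUSIVE && l->s_waiters > 0)
    {
        l->mode = MODE_SHARED;
        l->holders = l->s_waiters;
        l->s_waiters = 0;
    }
    else if (l->mode == MODE_SHARED && l->x_waiters > 0)
    {
        l->mode = MODE_EXCLUSIVE;
        l->holders = l->x_waiters;
        l->x_waiters = 0;
    }
    else
        l->mode = MODE_FREE;
}
```

Note the deliberate unfairness: a steady stream of same-mode acquirers keeps being admitted, so the opposite mode can starve -- which is exactly the trade-off debated in this thread.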



Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Scott Carey

On 3/18/09 4:36 AM, Gregory Stark st...@enterprisedb.com wrote:

 
 
 Jignesh K. Shah j.k.s...@sun.com writes:
 
 In the next couple of weeks I plan to test the patch on a different x64-based
 system to do sanity testing on a lower number of cores and also try out other
 workloads ...
 
 I'm actually more interested in the large number of cores but fewer processes
 and lower max_connections. If you set max_connections to 64 and eliminate the
 wait time you should, in theory, be able to get 100% cpu usage. It would be
 very interesting to track down the contention which is preventing that.

My previous calculation in this thread showed that even at 0 wait time, the
client seems to introduce ~3 ms of wait-time overhead on average.  So it takes
close to 128 threads in each test to stop the linear scaling, since the
average processing time seems to be about ~3 ms.
Either that, or the tests are actually running on a system capable of 128
threads.
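[Editor's sketch] Scott's back-of-envelope argument can be written out explicitly. The 3 ms figures come from the paragraph above; the helper name is invented for illustration:

```c
#include <assert.h>

/* If each transaction needs service_ms of server CPU and the client adds
 * client_wait_ms between requests, each connection keeps a hardware thread
 * busy only service/(service + wait) of the time, so saturating the box
 * takes hw_threads * (service + wait) / service concurrent connections. */
static int
clients_to_saturate(int hw_threads, double service_ms, double client_wait_ms)
{
    return (int) (hw_threads * (service_ms + client_wait_ms) / service_ms);
}
```

With 64 hardware threads, 3 ms of service time, and 3 ms of client overhead, this gives 128 connections -- the point where the linear scaling observed in the tests would stop.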

 
 --
   Gregory Stark
   EnterpriseDB  http://www.enterprisedb.com
   Ask me about EnterpriseDB's PostGIS support!
 




Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Robert Haas
On Wed, Mar 18, 2009 at 1:43 PM, Scott Carey sc...@richrelevance.com wrote:
 It's worth ruling out given that even if the likelihood is small, the fix is
 easy.  However, I don't see the throughput drop from peak as more
 concurrency is added that is the hallmark of this problem, usually with a
 lot of context switching and a sudden increase in CPU use per transaction.

 The problem is that the proposed fix bears a strong resemblance to
 attempting to improve your gas mileage by removing a few non-critical
 parts from your car, like, say, the bumpers, muffler, turn signals,
 windshield wipers, and emergency brake.

 The fix I was referring to as easy was using a connection pooler -- as a
 reply to the previous post. Even if it's a low likelihood that the connection
 pooler fixes this case, it's worth looking at.

Oh, OK.  There seem to be some smart people saying that's a pretty
high-likelihood fix.  I thought you were talking about the proposed
locking change.

 While it's true that the car
 might be drivable in that condition (as long as nothing unexpected
 happens), you're going to have a hard time convincing the manufacturer
 to offer that as an options package.

 The original poster's request is for a config parameter, for experimentation
 and testing by the brave. My own request was for that version of the lock to
 prevent possible starvation but improve performance by unlocking all shared
 at once, then doing all exclusives one at a time next, etc.

That doesn't prevent starvation in general, although it will for some workloads.

Anyway, it seems rather pointless to add a config parameter that isn't
at all safe, and adds overhead to a critical part of the system for
people who don't use it.  After all, if you find that it helps, what
are you going to do?  Turn it on in production?  I just don't see how
this is any good other than as a thought-experiment.

At any rate, as I understand it, even after Jignesh eliminated the
waits, he wasn't able to push his CPU utilization above 48%.  Surely
something's not right there.  And he also said that when he added a
knob to control the behavior, he got a performance improvement even
when the knob was set to 0, which corresponds to the behavior we have
already anyway.  So I'm very suspicious that there's something wrong
with either the system or the test.  Until that's understood and
fixed, I don't think that looking at the numbers is worth much.

 I alluded to the three main ways of dealing with lock contention elsewhere:
 avoiding locks, making finer-grained locks, and making locks faster.
 All are worthy.  Some are harder to do than others.  Some have been heavily
 tuned already.  It's a case-by-case basis.  And regardless, the unfair lock
 is a good test tool.

In view of the caveats above, I'll give that a firm maybe.

...Robert



Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Jignesh K. Shah



On 03/18/09 17:16, Scott Carey wrote:

On 3/18/09 4:36 AM, Gregory Stark st...@enterprisedb.com wrote:

  

Jignesh K. Shah j.k.s...@sun.com writes:



In the next couple of weeks I plan to test the patch on a different x64-based
system to do sanity testing on a lower number of cores and also try out other
workloads ...
  

I'm actually more interested in the large number of cores but fewer processes
and lower max_connections. If you set max_connections to 64 and eliminate the
wait time you should, in theory, be able to get 100% cpu usage. It would be
very interesting to track down the contention which is preventing that.



My previous calculation in this thread showed that even at 0 wait time, the
client seems to introduce ~3 ms of wait-time overhead on average.  So it takes
close to 128 threads in each test to stop the linear scaling, since the
average processing time seems to be about ~3 ms.
Either that, or the tests are actually running on a system capable of 128
threads.

  


Nope, 64 threads for sure. Verified it a number of times.

-Jignesh


--
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Jignesh K. Shah



On 03/18/09 17:25, Robert Haas wrote:

On Wed, Mar 18, 2009 at 1:43 PM, Scott Carey sc...@richrelevance.com wrote:
  

It's worth ruling out given that even if the likelihood is small, the fix is
easy.  However, I don't see the throughput drop from peak as more
concurrency is added that is the hallmark of this problem, usually with a
lot of context switching and a sudden increase in CPU use per transaction.


The problem is that the proposed fix bears a strong resemblance to
attempting to improve your gas mileage by removing a few non-critical
parts from your car, like, say, the bumpers, muffler, turn signals,
windshield wipers, and emergency brake.
  

The fix I was referring to as easy was using a connection pooler -- as a
reply to the previous post. Even if it's a low likelihood that the connection
pooler fixes this case, it's worth looking at.



Oh, OK.  There seem to be some smart people saying that's a pretty
high-likelihood fix.  I thought you were talking about the proposed
locking change.

  

While it's true that the car
might be drivable in that condition (as long as nothing unexpected
happens), you're going to have a hard time convincing the manufacturer
to offer that as an options package.
  

The original poster's request is for a config parameter, for experimentation
and testing by the brave. My own request was for that version of the lock to
prevent possible starvation but improve performance by unlocking all shared
at once, then doing all exclusives one at a time next, etc.



That doesn't prevent starvation in general, although it will for some workloads.

Anyway, it seems rather pointless to add a config parameter that isn't
at all safe, and adds overhead to a critical part of the system for
people who don't use it.  After all, if you find that it helps, what
are you going to do?  Turn it on in production?  I just don't see how
this is any good other than as a thought-experiment.
  


Actually, the patch I submitted shows no overhead from what I have seen, 
and I think it is useful; depending on the workload, it can even be turned 
on in production.

At any rate, as I understand it, even after Jignesh eliminated the
waits, he wasn't able to push his CPU utilization above 48%.  Surely
something's not right there.  And he also said that when he added a
knob to control the behavior, he got a performance improvement even
when the knob was set to 0, which corresponds to the behavior we have
already anyway.  So I'm very suspicious that there's something wrong
with either the system or the test.  Until that's understood and
fixed, I don't think that looking at the numbers is worth much.

  


I don't think anything is majorly wrong in my system. Sometimes it is 
PostgreSQL locks in play, and sometimes it can be OS/system-related locks 
in play (network, IO, file system, etc.). Right now in my patch, after I 
fix the ProcArray waiting problem, other PostgreSQL locks come into play: 
CLogControlLock, WALInsertLock, etc. Right now, out of the box, we have 
no means of tweaking anything in production if you do land in that 
problem. With the patch there is a means of knob control to tweak 
the bottleneck locks for the main workload for which it is put in 
production.


I still haven't seen any downsides with the patch other than 
highlighting other bottlenecks in the system. (For example, I haven't 
seen a run where the tpm on my workload decreases as you increase the 
number of users.) What I am suggesting is: run the patch, see if you find a 
workload where performance goes down, and check the lock-statistics 
output to see whether it is pushing the bottleneck elsewhere, most likely 
to WALInsertLock or CLogControlLock. If yes, then this patch gives you the 
right tweaking opportunity to reduce stress on ProcArrayLock for a 
workload while still not seriously stressing WALInsertLock or 
CLogControlLock.


Right now the standard answer applies: nope, you are running the wrong 
workload for PostgreSQL; use a connection pooler or your own application 
logic. Or maybe: you have too many users for PostgreSQL, so use some 
proprietary database.


-Jignesh





I alluded to the three main ways of dealing with lock contention elsewhere:
avoiding locks, making finer-grained locks, and making locks faster.
All are worthy.  Some are harder to do than others.  Some have been heavily
tuned already.  It's a case-by-case basis.  And regardless, the unfair lock
is a good test tool.



In view of the caveats above, I'll give that a firm maybe.

...Robert
  


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Simon Riggs

On Wed, 2009-03-18 at 16:26 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Mon, 2009-03-16 at 16:26 +, Matthew Wakeling wrote:
  One possibility would be for the locks to alternate between exclusive and
  shared - that is:
  
  1. Take a snapshot of all shared waits, and grant them all - thundering
  herd style.
  2. Wait until ALL of them have finished, granting no more.
  3. Take a snapshot of all exclusive waits, and grant them all, one by one.
  4. Wait until all of them have been finished, granting no more.
  5. Back to (1)
 
  I agree with that, apart from the "granting no more" bit.
 
  Currently we queue up exclusive locks, but there is no need to since for
  ProcArrayLock commits are all changing different data.
 
  The most useful behaviour is just to have two modes:
  * exclusive-lock held - all other x locks welcome, s locks queue
  * shared-lock held - all other s locks welcome, x locks queue
 
 My goodness, it seems people have forgotten about the lightweight
 part of the LWLock design.

Lightweight is only useful if it fits the purpose. If the LWLock design
doesn't fit all cases, especially with critical lock types, then we can
have special cases. We have both spinlocks and LWLocks, plus we split
hash tables into multiple lock partitions. If we have 3 types of
lightweight locking, why not consider having 4?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Simon Riggs

On Wed, 2009-03-18 at 13:49 +, Matthew Wakeling wrote:
 On Wed, 18 Mar 2009, Jignesh K. Shah wrote:
  I thought about that, except that without putting on a restriction, a huge
  queue will cause a lot of time to be spent manipulating the lock list every
  time. Another option would be to maintain two lists, shared and exclusive,
  and round-robin through them each time you access the list, so manipulation
  stays cheap. But the best thing is to allow flexibility to change the
  algorithm, since some workloads may work fine with one and others will NOT.
  The flexibility then allows tinkering for those already reaching the limits.
 
 Yeah, having two separate queues is the obvious way of doing this. It 
 would make most operations really trivial. Just wake everything in the 
 shared queue at once, and you can throw it away wholesale and allocate a 
 new queue. It avoids a whole lot of queue manipulation.

Yes, that sounds good.
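[Editor's sketch] Matthew's two-queue idea, waking the whole shared queue wholesale, could look roughly like this. The structure and names are invented for illustration and are not PostgreSQL internals; real code would signal each waiter's semaphore instead of setting a flag:

```c
#include <assert.h>
#include <stddef.h>

/* Shared waiters sit on their own singly-linked list; waking them all is
 * a single pointer swap that detaches the whole list at once, leaving a
 * fresh empty queue behind. */

typedef struct Waiter
{
    struct Waiter *next;
    int            awake;
} Waiter;

typedef struct
{
    Waiter *shared_head;  /* queued shared acquirers */
    Waiter *excl_head;    /* queued exclusive acquirers */
} WaitQueues;

static void
enqueue_shared(WaitQueues *q, Waiter *w)
{
    w->awake = 0;
    w->next = q->shared_head;
    q->shared_head = w;
}

/* Wake every shared waiter in one shot; returns how many were woken. */
static int
wake_all_shared(WaitQueues *q)
{
    Waiter *w = q->shared_head;
    int     n = 0;

    q->shared_head = NULL;        /* whole queue thrown away wholesale */
    while (w != NULL)
    {
        Waiter *next = w->next;   /* save next before waking: once woken,
                                   * the waiter may go away */
        w->awake = 1;
        n++;
        w = next;
    }
    return n;
}
```

The appeal is exactly what Matthew describes: no per-waiter queue manipulation at wakeup time, just one detach of the entire shared list.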

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

