Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On Sat, 2009-03-14 at 12:09 -0400, Tom Lane wrote:

> Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
>> WALInsertLock is also quite high on Jignesh's list. That I've seen become the bottleneck on other tests too.
>
> Yeah, that's been seen to be an issue before. I had the germ of an idea about how to fix that:
>
>     ... with no lock, determine size of WAL record ...
>     obtain WALInsertLock
>     identify WAL start address of my record, advance insert pointer past record end
>     *release* WALInsertLock
>     without lock, copy record into the space just reserved
>
> The idea here is to allow parallelization of the copying of data into the buffers. The hold time on WALInsertLock would be very short. Maybe it could even become a spinlock, though I'm not sure, because the "advance insert pointer" bit is more complicated than it looks (you have to allow for the extra overhead when crossing a WAL page boundary).
>
> Now the fly in the ointment is that there would need to be some way to ensure that we didn't write data out to disk until it was valid; in particular how do we implement a request to flush WAL up to a particular LSN value, when maybe some of the records before that haven't been fully transferred into the buffers yet? The best idea I've thought of so far is shared/exclusive locks on the individual WAL buffer pages, with the rather unusual behavior that writers of the page would take shared lock and only the reader (he who has to dump to disk) would take exclusive lock. But maybe there's a better way. Currently I don't believe that dumping a WAL buffer (WALWriteLock) blocks insertion of new WAL data, and it would be nice to preserve that property.

Yeh, that's just what we'd discussed previously:

http://markmail.org/message/gectqy3yzvjs2hru#query:Reworking%20WAL%20locking+page:1+mid:gectqy3yzvjs2hru+state:results

Are you thinking of doing this for 8.4?
:-)

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
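Tom's reserve-then-copy idea can be sketched as follows. This is a toy Python model, not PostgreSQL's C implementation: a plain mutex stands in for WALInsertLock, the buffer size is arbitrary, and the sketch ignores the page-boundary overhead Tom mentions. The point it illustrates is that the lock is held only for the pointer advance, while the actual copy proceeds concurrently in many backends.

```python
import threading

class WalBuffer:
    """Toy model of reserve-then-copy WAL insertion (hypothetical names)."""

    def __init__(self, size=1 << 16):
        self.buf = bytearray(size)
        self.insert_pos = 0
        self.lock = threading.Lock()  # stands in for WALInsertLock

    def insert(self, record: bytes) -> int:
        # Hold the lock only long enough to reserve space for the record.
        with self.lock:
            start = self.insert_pos
            self.insert_pos += len(record)
        # Copy outside the lock: many backends can do this part in parallel.
        self.buf[start:start + len(record)] = record
        return start
```

The fly-in-the-ointment Tom describes shows up here too: after the reservation is published, a flusher cannot tell whether the bytes before `insert_pos` have actually been copied yet.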
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On Mon, 2009-03-16 at 16:26, Matthew Wakeling wrote:

> One possibility would be for the locks to alternate between exclusive and shared - that is:
>
> 1. Take a snapshot of all shared waits, and grant them all - thundering herd style.
> 2. Wait until ALL of them have finished, granting no more.
> 3. Take a snapshot of all exclusive waits, and grant them all, one by one.
> 4. Wait until all of them have been finished, granting no more.
> 5. Back to (1)

I agree with that, apart from the "granting no more" bit. Currently we queue up exclusive locks, but there is no need to, since for ProcArrayLock commits are all changing different data.

The most useful behaviour is just to have two modes:

* exclusive-lock held - all other x locks welcome, s locks queue
* shared-lock held - all other s locks welcome, x locks queue

This *only* works for ProcArrayLock.

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [PERFORM] Performance of archive logging in a PITR restore
Joshua D. Drake wrote:
> On Mon, 2009-03-16 at 12:11 -0400, Mark Steben wrote:
>> The issue is that during a restore on a remote site, (Postgres 8.2.5)
>
> 8.2.5 is quite old. You should upgrade to the latest 8.2.X release.
>
>> archived logs are taking an average of 35 – 40 seconds apiece to restore.
>
> Archive logs are restored in a serialized manner so they will be slower to restore in general.

Yeah, if you have several concurrent processes on the primary doing I/O and generating log, at restore the I/O will be serialized.

Version 8.3 is significantly better with this (as long as you don't disable full_page_writes). In earlier versions, each page referenced in the WAL was read from the filesystem, only to be replaced with the full page image. In 8.3, we skip the read and just write over the page image. Depending on your application, that can make a very dramatic difference to restore time.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Jignesh K. Shah <j.k.s...@sun.com> writes:
> In next couple of weeks I plan to test the patch on a different x64 based system to do a sanity testing on lower number of cores and also try out other workloads ...

I'm actually more interested in the large number of cores but fewer processes and lower max_connections. If you set max_connections to 64 and eliminate the wait time you should, in theory, be able to get 100% cpu usage. It would be very interesting to track down the contention which is preventing that.

-- 
Gregory Stark
EnterpriseDB   http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On Wed, 18 Mar 2009, Simon Riggs wrote:
> I agree with that, apart from the "granting no more" bit. The most useful behaviour is just to have two modes:
>
> * exclusive-lock held - all other x locks welcome, s locks queue
> * shared-lock held - all other s locks welcome, x locks queue

The problem with making all other locks welcome is that there is a possibility of starvation. Imagine a case where there is a constant stream of shared locks - the exclusive locks may never actually get hold of the lock under the "all other shared locks welcome" strategy. Likewise with the reverse.

Taking a snapshot and queueing all newer locks forces fairness in the locking strategy, and avoids one of the sides getting starved.

Matthew

-- 
I've run DOOM more in the last few days than I have the last few months. I just love debugging ;-)  -- Linus Torvalds
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Matthew Wakeling wrote:
> On Sat, 14 Mar 2009, Heikki Linnakangas wrote:
>> It's going to require some hard thinking to bust that bottleneck. I've sometimes thought about maintaining a pre-calculated array of in-progress XIDs in shared memory. GetSnapshotData would simply memcpy() that to private memory, instead of collecting the xids from ProcArray.
>
> Shifting the contention from reading that data to altering it. But that would probably happen quite a lot fewer times, so it would be a benefit.

It's true that it would shift work from reading (GetSnapshotData) to modifying (xact end) the ProcArray. Which could actually be much worse: when modifying, you hold an ExclusiveLock, but readers only hold a SharedLock. I don't think it's that bad in reality, since at transaction end you would only need to remove your own xid from an array. That should be very fast, especially if you know exactly where in the array your own xid is.

> On Sat, 14 Mar 2009, Tom Lane wrote:
>> Now the fly in the ointment is that there would need to be some way to ensure that we didn't write data out to disk until it was valid; in particular how do we implement a request to flush WAL up to a particular LSN value, when maybe some of the records before that haven't been fully transferred into the buffers yet? The best idea I've thought of so far is shared/exclusive locks on the individual WAL buffer pages, with the rather unusual behavior that writers of the page would take shared lock and only the reader (he who has to dump to disk) would take exclusive lock. But maybe there's a better way. Currently I don't believe that dumping a WAL buffer (WALWriteLock) blocks insertion of new WAL data, and it would be nice to preserve that property.
>
> The writers would need to take a shared lock on the page before releasing the lock that marshals access to the "how long is the log" data. Other than that, your idea would work.
>
> An alternative would be to maintain a concurrent linked list of WAL writes in progress. An entry would be added to the tail every time a new writer is generated, marking the end of the log. When a writer finishes, it can remove the entry from the list very cheaply and with very little contention. The reader (who dumps the WAL to disc) need only look at the head of the list to find out how far the log is completed, because the list is guaranteed to be in order of position in the log.

A linked list or an array of in-progress writes was my first thought as well. But the real problem is: how does the reader wait until all WAL up to X have been written? It could poll, but that's inefficient.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
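Heikki's point that transaction end only needs to clear one known array slot can be sketched like this. This is a toy Python model with hypothetical names, not PostgreSQL's C code: each backend owns a fixed slot, so ending a transaction is O(1) with no search, and taking a snapshot is the moral equivalent of a single memcpy() followed by filtering.

```python
import array

class ProcArraySketch:
    """Toy model of a pre-computed in-progress XID array (hypothetical)."""
    INVALID_XID = 0

    def __init__(self, max_backends: int):
        # One fixed slot per backend, like one PGPROC entry each.
        self.xids = array.array("Q", [self.INVALID_XID] * max_backends)

    def xact_start(self, slot: int, xid: int):
        self.xids[slot] = xid          # done under the exclusive lock, briefly

    def xact_end(self, slot: int):
        # O(1): the backend knows exactly where its own xid lives.
        self.xids[slot] = self.INVALID_XID

    def get_snapshot(self):
        running = array.array("Q", self.xids)  # the memcpy() step
        return [x for x in running if x != self.INVALID_XID]
```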
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On Wed, 18 Mar 2009, Heikki Linnakangas wrote:
> A linked list or an array of in-progress writes was my first thought as well. But the real problem is: how does the reader wait until all WAL up to X have been written? It could poll, but that's inefficient.

Good point - waiting for an exclusive lock on a page is a pretty easy way to wake up at the right time. However, is there not some way to wait for a notify? I'm no C expert, but in Java that's one of the most fundamental features of a lock.

Matthew

-- 
A bus station is where buses stop. A train station is where trains stop. On my desk, I have a workstation.
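For what it's worth, the C-level answer to "wait for a notify" is a condition variable (POSIX `pthread_cond_wait`/`pthread_cond_signal`; PostgreSQL itself has its own latch/semaphore machinery). A toy Python model of the flusher sleeping until WAL is complete up to a given LSN, with hypothetical names:

```python
import threading

class WalFlushWaiter:
    """Toy notify-based waiting: writers publish progress, the flusher
    sleeps on a condition variable instead of polling (hypothetical)."""

    def __init__(self):
        self.cond = threading.Condition()
        self.written_up_to = 0   # highest LSN fully copied into buffers

    def record_written(self, lsn: int):
        with self.cond:
            if lsn > self.written_up_to:
                self.written_up_to = lsn
            self.cond.notify_all()   # wake anyone waiting on progress

    def wait_for(self, lsn: int):
        # The flusher blocks here until writers have caught up to lsn.
        with self.cond:
            self.cond.wait_for(lambda: self.written_up_to >= lsn)
```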
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On Wed, 2009-03-18 at 11:45, Matthew Wakeling wrote:
> On Wed, 18 Mar 2009, Simon Riggs wrote:
>> I agree with that, apart from the "granting no more" bit. The most useful behaviour is just to have two modes:
>> * exclusive-lock held - all other x locks welcome, s locks queue
>> * shared-lock held - all other s locks welcome, x locks queue
>
> The problem with making all other locks welcome is that there is a possibility of starvation. Imagine a case where there is a constant stream of shared locks - the exclusive locks may never actually get hold of the lock under the "all other shared locks welcome" strategy.

That's exactly what happens now.

> Likewise with the reverse.

I think it depends upon how frequently requests arrive. Commits cause X locks and we don't commit that often, so it's very unlikely that we'd see a constant stream of X locks and prevent shared lockers.

Some comments from an earlier post on this topic (about 20 months ago):

Since shared locks are currently queued behind exclusive requests when they cannot be immediately satisfied, it might be worth reconsidering the way LWLockRelease works also. When we wake up the queue we only wake the Shared requests that are adjacent to the head of the queue. Instead we could wake *all* waiting Shared requestors.

e.g. with a lock queue like this: (HEAD) S-S-X-S-X-S-X-S

Currently we would wake the 1st and 2nd waiters only. If we were to wake the 3rd, 5th and 7th waiters also, then the queue would reduce in length very quickly, if we assume generally uniform service times. (If the head of the queue is X, then we wake only that one process and I'm not proposing we change that.)

That would mean queue jumping, right? Well, that's what already happens in other circumstances, so there cannot be anything intrinsically wrong with allowing it; the only question is: would it help? We need not wake the whole queue; there may be some generally more beneficial heuristic.

The reason for considering this is not to speed up Shared requests but to reduce the queue length and thus the waiting time for the Xclusive requestors. Each time a Shared request is dequeued, we effectively re-enable queue jumping, so a Shared request arriving during that point will actually jump ahead of Shared requests that were unlucky enough to arrive while an Exclusive lock was held. Worse than that, the new incoming Shared requests exacerbate the starvation, so the more non-adjacent groups of Shared lock requests there are in the queue, the worse the starvation of the exclusive requestors becomes. We are effectively randomly starving some shared locks as well as exclusive locks in the current scheme, based upon the state of the lock when they make their request. The situation is worst when the lock is heavily contended and the workload has a 50/50 mix of shared/exclusive requests, e.g. serializable transactions or transactions with lots of subtransactions.

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On Wed, 18 Mar 2009, Simon Riggs wrote:
> On Wed, 2009-03-18 at 11:45, Matthew Wakeling wrote:
>> The problem with making all other locks welcome is that there is a possibility of starvation. Imagine a case where there is a constant stream of shared locks - the exclusive locks may never actually get hold of the lock under the "all other shared locks welcome" strategy.
>
> That's exactly what happens now.

So the question becomes whether such shared starvation of exclusive locks is an issue or not. I would imagine that the greater the number of CPUs and backend processes in the system, the more likely this is to become an issue.

>> Likewise with the reverse.
>
> I think it depends upon how frequently requests arrive. Commits cause X locks and we don't commit that often, so it's very unlikely that we'd see a constant stream of X locks and prevent shared lockers.

Well, on a very large system, and in the case where exclusive locks are actually exclusive (so, not ProcArrayLock), processing can only happen one at a time rather than in parallel, so that offsets the reduced frequency of requests compared to shared. Again, it'd only become an issue with very large numbers of CPUs and backends.

Interesting comments from the previous thread - thanks for that.

If the goal is to reduce the waiting time for exclusive, then some fairness would seem to be useful. The problem is that under the current system, where shared locks join in on the fun, you are relying on there being a time when there are no shared locks at all in the queue in order for exclusive locks to ever get a chance.

Statistically, if such a situation is likely to occur frequently, then the average queue length of shared locks is small. If that is the case, then there is little benefit in letting them join in, because the parallelism gain is small. However, if the average queue length is large, and you are seeing a decent amount of parallelism gain by allowing them to join in, then it is necessarily the case that times when there are no shared locks at all are few, and the exclusive locks are starved. The current implementation guarantees one or the other of these scenarios.

The advantage of queueing all shared requests while servicing all exclusive requests one by one is that a decent number of shared requests will be able to build up, allowing a good amount of parallelism to be released in the thundering herd when shared locks are favoured again. This method increases the parallelism as the number of parallel processes increases.

Matthew

-- 
Illiteracy - I don't know the meaning of the word!
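Matthew's alternating scheme can be simulated deterministically. This is a simplified sketch under stated assumptions: all requests are queued up front and arrivals during a phase are ignored, so it only shows the grant *order* (one shared batch, then exclusives one by one), not real timing.

```python
from collections import deque

def batched_grant_order(requests):
    """Simulate the alternating scheme: grant all queued shared requests
    together (thundering herd), then drain exclusive requests serially.
    'requests' is a string like 'SSXSX'; returns grant batches of request
    indices, showing neither side can starve the other."""
    shared, exclusive = deque(), deque()
    for i, kind in enumerate(requests):
        (shared if kind == "S" else exclusive).append(i)
    batches = []
    while shared or exclusive:
        if shared:
            batches.append(list(shared))   # all shared waiters at once
            shared.clear()
        while exclusive:                   # then exclusives, one by one
            batches.append([exclusive.popleft()])
    return batches
```

With two queues, as discussed later in the thread, the shared batch is just a wholesale swap of the queue pointer, so the bookkeeping stays cheap.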
[PERFORM] parallelizing slow queries for multiple cores (PostgreSQL + Gearman)
Hi,

> Has anyone done similar work in the light of upcoming many-core CPUs/systems? Any better results than 2x improvement?

Yes, in fact I've done a very similar thing on a quad CPU box a while back. In my case the table in question had about 26 million rows. I did nothing special to the table (no cluster, no partitioning, nothing - of course the table did have the appropriate indexes). Queries on this table are analytic/reporting kinds of queries. Basically they are just aggregations over a large number of rows. E.g. the sum of column1 and the sum of column2 where time is some time and columnA has some value and columnB has some other value, that kind of thing. From analysis the queries appeared to be nearly 100% CPU bound.

In my (Java) application I divided a reporting query for, say, the last 60 days into 2 equal portions: day 1 to 30 and day 31 to 60, and assigned these to two worker threads. The results of these worker threads were merged using a simple resultset merge (the end result is simply the total of all rows returned by thread1 and thread2). The speedup I measured on the quad box was a near perfect factor 2. I then divided the workload into 4 equal portions: day 1 to 15, 16 to 30, 31 to 45 and 46 to 60. The speedup I measured was only a little less than a factor 4. In my situation too, the time I measured included dispatching the jobs to a thread pool and merging their results.

Of course, such a scheme can only be easily used when all workers return individual rows that are directly part of the end result. If some further calculation has to be done on those rows - which happens to be the same calculation that is also done in the query you are parallelizing - then in effect you are duplicating logic. If you do that a lot in your code you can easily create a maintenance nightmare.

Also, you have to be aware that without additional measures, every worker lives in its own transaction. Depending on the nature of the data this could potentially result in inconsistent data being returned. In your case, on tables generated once per day, this wouldn't be the case, but as a general technique you have to be aware of this.

Anyway, it's very clear that computers are moving to many-core architectures. Simple entry-level servers already come these days with 8 cores. I've asked a couple of times on this list whether PG is going to support using multiple cores for a single query anytime soon, but this appears to be very unlikely. Until then it seems the only way to utilize multiple cores for a single query is doing it at the application level or by using something like pgpool-II.
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 03/18/09 08:06, Simon Riggs wrote:
> On Wed, 2009-03-18 at 11:45, Matthew Wakeling wrote:
>> On Wed, 18 Mar 2009, Simon Riggs wrote:
>>> I agree with that, apart from the "granting no more" bit. The most useful behaviour is just to have two modes:
>>> * exclusive-lock held - all other x locks welcome, s locks queue
>>> * shared-lock held - all other s locks welcome, x locks queue
>>
>> The problem with making all other locks welcome is that there is a possibility of starvation. Imagine a case where there is a constant stream of shared locks - the exclusive locks may never actually get hold of the lock under the "all other shared locks welcome" strategy.
>
> That's exactly what happens now.
>
>> Likewise with the reverse.
>
> I think it depends upon how frequently requests arrive. Commits cause X locks and we don't commit that often, so it's very unlikely that we'd see a constant stream of X locks and prevent shared lockers.
>
> Some comments from an earlier post on this topic (about 20 months ago):
>
> Since shared locks are currently queued behind exclusive requests when they cannot be immediately satisfied, it might be worth reconsidering the way LWLockRelease works also. When we wake up the queue we only wake the Shared requests that are adjacent to the head of the queue. Instead we could wake *all* waiting Shared requestors.
>
> e.g. with a lock queue like this: (HEAD) S-S-X-S-X-S-X-S
>
> Currently we would wake the 1st and 2nd waiters only. If we were to wake the 3rd, 5th and 7th waiters also, then the queue would reduce in length very quickly, if we assume generally uniform service times. (If the head of the queue is X, then we wake only that one process and I'm not proposing we change that.)
>
> That would mean queue jumping, right? Well, that's what already happens in other circumstances, so there cannot be anything intrinsically wrong with allowing it; the only question is: would it help?

I thought about that. Except, without putting on some restriction, a huge queue will cause a lot of time to be spent manipulating the lock list every time. One more option would be to maintain two lists, shared and exclusive, and round-robin through them each time you access the list, so manipulation stays cheap. But the best thing is to allow flexibility to change the algorithm, since some workloads may work fine with one and others will NOT. The flexibility then allows tinkering for those already reaching the limits.

-Jignesh

> We need not wake the whole queue; there may be some generally more beneficial heuristic. The reason for considering this is not to speed up Shared requests but to reduce the queue length and thus the waiting time for the Xclusive requestors. Each time a Shared request is dequeued, we effectively re-enable queue jumping, so a Shared request arriving during that point will actually jump ahead of Shared requests that were unlucky enough to arrive while an Exclusive lock was held. Worse than that, the new incoming Shared requests exacerbate the starvation, so the more non-adjacent groups of Shared lock requests there are in the queue, the worse the starvation of the exclusive requestors becomes. We are effectively randomly starving some shared locks as well as exclusive locks in the current scheme, based upon the state of the lock when they make their request. The situation is worst when the lock is heavily contended and the workload has a 50/50 mix of shared/exclusive requests, e.g. serializable transactions or transactions with lots of subtransactions.
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On Wed, 18 Mar 2009, Jignesh K. Shah wrote:
> I thought about that.. Except without putting a restriction a huge queue will cause lot of time spent in manipulating the lock list every time. One more thing will be to maintain two list shared and exclusive and round robin through them for every time you access the list so manipulation is low.. But the best thing is to allow flexibility to change the algorithm since some workloads may work fine with one and others will NOT. The flexibility then allows to tinker for those already reaching the limits.

Yeah, having two separate queues is the obvious way of doing this. It would make most operations really trivial. Just wake everything in the shared queue at once, and you can throw it away wholesale and allocate a new queue. It avoids a whole lot of queue manipulation.

Matthew

-- 
Software suppliers are trying to make their software packages more 'user-friendly'... Their best approach, so far, has been to take all the old brochures, and stamp the words 'user-friendly' on the cover.  -- Bill Gates
Re: [PERFORM] Extremely slow intarray index creation and inserts.
Tom Lane wrote:
> Ron Mayer <rm...@cheapcomplexdevices.com> writes:
>> vm=# create index gist7 on tmp_intarray_test using GIST (my_int_array gist__int_ops);
>> CREATE INDEX
>> Time: 2069836.856 ms
>>
>> Is that expected, or does it sound like a bug to take over half an hour to index 7 rows of mostly 5 and 6-element integer arrays?
>
> I poked at this example with oprofile. It's entirely CPU-bound AFAICT,

Oleg pointed out to me (off-list, I now see) that it's not totally unexpected behavior and I should have been using gist__intbig_ops, since the "big" refers to the cardinality of the entire set (which was large, in my case) and not the length of the arrays.

Oleg Bartunov wrote:
OB: it's not about short or long arrays, it's about small or big
OB: cardinality of the whole set (the number of unique elements)

I'm re-reading the docs and it still wasn't obvious to me. A potential docs patch is attached below.

> and the CPU utilization is approximately
>     55% g_int_compress
>     35% memmove/memcpy (difficult to distinguish these)
>      1% pg_qsort
>      1% anything else
>
> Probably need to look at reducing the number of calls to g_int_compress ... it must be getting called a whole lot more than once per new index entry, and I wonder why that should need to be.

Perhaps that's a separate issue, but we're working fine with gist__intbig_ops for the time being.

Here's a proposed docs patch that makes this more obvious.

*** a/doc/src/sgml/intarray.sgml
--- b/doc/src/sgml/intarray.sgml
***************
*** 239,245 ****
     <literal>gist__int_ops</> (used by default) is suitable for
     small and medium-size arrays, while
     <literal>gist__intbig_ops</> uses a larger signature and is more
!    suitable for indexing large arrays.
    </para>

    <para>
--- 239,247 ----
     <literal>gist__int_ops</> (used by default) is suitable for
     small and medium-size arrays, while
     <literal>gist__intbig_ops</> uses a larger signature and is more
!    suitable for indexing high-cardinality data sets - where there
!    are a large number of unique elements across all rows being
!    indexed.
    </para>

    <para>
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 3/12/09 6:29 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> Its worth ruling out given that even if the likelihood is small, the fix is easy. However, I don't see the throughput drop from peak as more concurrency is added that is the hallmark of this problem, usually with a lot of context switching and a sudden increase in CPU use per transaction.
>
> The problem is that the proposed fix bears a strong resemblance to attempting to improve your gas mileage by removing a few non-critical parts from your car, like, say, the bumpers, muffler, turn signals, windshield wipers, and emergency brake.

The fix I was referring to as easy was using a connection pooler -- as a reply to the previous post. Even if it's a low likelihood that the connection pooler fixes this case, it's worth looking at.

> While it's true that the car might be drivable in that condition (as long as nothing unexpected happens), you're going to have a hard time convincing the manufacturer to offer that as an options package.

The original poster's request is for a config parameter, for experimentation and testing by the brave. My own request was for that version of the lock to prevent possible starvation but improve performance by unlocking all shared at once, then doing all exclusives one at a time next, etc.

> I think that changing the locking behavior is attacking the problem at the wrong level anyway. If someone wants to look at optimizing PostgreSQL for very large numbers of concurrent connections without a connection pooler... at least IMO, it would be more worthwhile to study WHY there's so much locking contention, and, on a lock by lock basis, what can be done about it without harming performance under more normal loads? The fact that there IS locking contention is sorta interesting, but it would be a lot more interesting to know why.
>
> ...Robert

I alluded to the three main ways of dealing with lock contention elsewhere: avoiding locks, making finer-grained locks, and making locks faster. All are worthy. Some are harder to do than others. Some have been heavily tuned already. It's a case-by-case basis. And regardless, the unfair lock is a good test tool.
Re: [PERFORM] Extremely slow intarray index creation and inserts.
Ron Mayer <rm...@cheapcomplexdevices.com> writes:
> Oleg Bartunov wrote:
> OB: it's not about short or long arrays, it's about small or big
> OB: cardinality of the whole set (the number of unique elements)
>
> I'm re-reading the docs and it still wasn't obvious to me. A potential docs patch is attached below.

Done, though not in exactly those words. I wonder though if we can be less vague about it --- can we suggest a typical cutover point? Like "use gist__intbig_ops if there are more than about 10,000 distinct array values"? Even a rough order of magnitude for where to worry about this would save a lot of people time.

			regards, tom lane

Index: intarray.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/intarray.sgml,v
retrieving revision 1.5
retrieving revision 1.6
diff -c -r1.5 -r1.6
*** intarray.sgml	10 Dec 2007 05:32:51 -	1.5
--- intarray.sgml	18 Mar 2009 20:18:18 -	1.6
***************
*** 237,245 ****
    <para>
     Two GiST index operator classes are provided:
     <literal>gist__int_ops</> (used by default) is suitable for
!    small and medium-size arrays, while
     <literal>gist__intbig_ops</> uses a larger signature and is more
!    suitable for indexing large arrays.
    </para>
--- 237,246 ----
    <para>
     Two GiST index operator classes are provided:
     <literal>gist__int_ops</> (used by default) is suitable for
!    small- to medium-size data sets, while
     <literal>gist__intbig_ops</> uses a larger signature and is more
!    suitable for indexing large data sets (i.e., columns containing
!    a large number of distinct array values).
    </para>
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Simon Riggs <si...@2ndquadrant.com> writes:
> On Mon, 2009-03-16 at 16:26, Matthew Wakeling wrote:
>> One possibility would be for the locks to alternate between exclusive and shared - that is:
>> 1. Take a snapshot of all shared waits, and grant them all - thundering herd style.
>> 2. Wait until ALL of them have finished, granting no more.
>> 3. Take a snapshot of all exclusive waits, and grant them all, one by one.
>> 4. Wait until all of them have been finished, granting no more.
>> 5. Back to (1)
>
> I agree with that, apart from the "granting no more" bit. Currently we queue up exclusive locks, but there is no need to, since for ProcArrayLock commits are all changing different data. The most useful behaviour is just to have two modes:
> * exclusive-lock held - all other x locks welcome, s locks queue
> * shared-lock held - all other s locks welcome, x locks queue

My goodness, it seems people have forgotten about the "lightweight" part of the LWLock design.

			regards, tom lane
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 3/18/09 4:36 AM, Gregory Stark <st...@enterprisedb.com> wrote:
> Jignesh K. Shah <j.k.s...@sun.com> writes:
>> In next couple of weeks I plan to test the patch on a different x64 based system to do a sanity testing on lower number of cores and also try out other workloads ...
>
> I'm actually more interested in the large number of cores but fewer processes and lower max_connections. If you set max_connections to 64 and eliminate the wait time you should, in theory, be able to get 100% cpu usage. It would be very interesting to track down the contention which is preventing that.
>
> -- 
> Gregory Stark
> EnterpriseDB   http://www.enterprisedb.com
> Ask me about EnterpriseDB's PostGIS support!

My previous calculation in this thread showed that even at 0 wait time, the client seems to introduce ~3 ms of wait-time overhead on average. So it takes close to 128 threads in each test to stop the linear scaling, since the average processing time seems to be about ~3 ms. Either that, or the tests actually are running on a system capable of 128 threads.
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On Wed, Mar 18, 2009 at 1:43 PM, Scott Carey sc...@richrelevance.com wrote:
>>> Its worth ruling out given that even if the likelihood is small, the
>>> fix is easy. However, I don't see the throughput drop from peak as
>>> more concurrency is added that is the hallmark of this problem,
>>> usually with a lot of context switching and a sudden increase in CPU
>>> use per transaction.
>> The problem is that the proposed fix bears a strong resemblance to
>> attempting to improve your gas mileage by removing a few non-critical
>> parts from your car, like, say, the bumpers, muffler, turn signals,
>> windshield wipers, and emergency brake.
> The fix I was referring to as easy was using a connection pooler -- as
> a reply to the previous post. Even if its a low likelihood that the
> connection pooler fixes this case, its worth looking at.

Oh, OK.  There seem to be some smart people saying that's a pretty
high-likelihood fix.  I thought you were talking about the proposed
locking change.

>> While it's true that the car might be drivable in that condition (as
>> long as nothing unexpected happens), you're going to have a hard time
>> convincing the manufacturer to offer that as an options package.
> The original poster's request is for a config parameter, for
> experimentation and testing by the brave. My own request was for that
> version of the lock to prevent possible starvation but improve
> performance by unlocking all shared at once, then doing all exclusives
> one at a time next, etc.

That doesn't prevent starvation in general, although it will for some
workloads.  Anyway, it seems rather pointless to add a config parameter
that isn't at all safe, and adds overhead to a critical part of the
system for people who don't use it.  After all, if you find that it
helps, what are you going to do?  Turn it on in production?  I just
don't see how this is any good other than as a thought-experiment.

At any rate, as I understand it, even after Jignesh eliminated the
waits, he wasn't able to push his CPU utilization above 48%.  Surely
something's not right there.  And he also said that when he added a
knob to control the behavior, he got a performance improvement even
when the knob was set to 0, which corresponds to the behavior we have
already anyway.  So I'm very suspicious that there's something wrong
with either the system or the test.  Until that's understood and fixed,
I don't think that looking at the numbers is worth much.

> I alluded to the three main ways of dealing with lock contention
> elsewhere. Avoiding locks, making finer grained locks, and making locks
> faster. All are worthy. Some are harder to do than others. Some have
> been heavily tuned already. Its a case by case basis. And regardless,
> the unfair lock is a good test tool.

In view of the caveats above, I'll give that a firm maybe.

...Robert
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 03/18/09 17:16, Scott Carey wrote:
> On 3/18/09 4:36 AM, "Gregory Stark" st...@enterprisedb.com wrote:
>> Jignesh K. Shah j.k.s...@sun.com writes:
>>> In next couple of weeks I plan to test the patch on a different x64
>>> based system to do a sanity testing on lower number of cores and also
>>> try out other workloads ...
>> I'm actually more interested in the large number of cores but fewer
>> processes and lower max_connections. If you set max_connections to 64
>> and eliminate the wait time you should, in theory, be able to get 100%
>> cpu usage. It would be very interesting to track down the contention
>> which is preventing that.
> My previous calculation in this thread showed that even at 0 wait time,
> the client seems to introduce ~3ms wait time overhead on average. So it
> takes close to 128 threads in each test to stop the linear scaling
> since the average processing time seems to be about ~3ms.

> Either that, or the tests actually are running on a system capable of
> 128 threads.

Nope 64 threads for sure .. Verified it number of times ..

-Jignesh
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 03/18/09 17:25, Robert Haas wrote:
> On Wed, Mar 18, 2009 at 1:43 PM, Scott Carey sc...@richrelevance.com wrote:
>>>> Its worth ruling out given that even if the likelihood is small, the
>>>> fix is easy. However, I don't see the throughput drop from peak as
>>>> more concurrency is added that is the hallmark of this problem,
>>>> usually with a lot of context switching and a sudden increase in CPU
>>>> use per transaction.
>>> The problem is that the proposed fix bears a strong resemblance to
>>> attempting to improve your gas mileage by removing a few non-critical
>>> parts from your car, like, say, the bumpers, muffler, turn signals,
>>> windshield wipers, and emergency brake.
>> The fix I was referring to as easy was using a connection pooler -- as
>> a reply to the previous post. Even if its a low likelihood that the
>> connection pooler fixes this case, its worth looking at.
> Oh, OK. There seem to be some smart people saying that's a pretty
> high-likelihood fix. I thought you were talking about the proposed
> locking change.
>>> While it's true that the car might be drivable in that condition (as
>>> long as nothing unexpected happens), you're going to have a hard time
>>> convincing the manufacturer to offer that as an options package.
>> The original poster's request is for a config parameter, for
>> experimentation and testing by the brave. My own request was for that
>> version of the lock to prevent possible starvation but improve
>> performance by unlocking all shared at once, then doing all exclusives
>> one at a time next, etc.
> That doesn't prevent starvation in general, although it will for some
> workloads. Anyway, it seems rather pointless to add a config parameter
> that isn't at all safe, and adds overhead to a critical part of the
> system for people who don't use it. After all, if you find that it
> helps, what are you going to do? Turn it on in production? I just
> don't see how this is any good other than as a thought-experiment.

Actually the patch I submitted shows no overhead from what I have seen,
and I think it is useful depending on workloads, where it can be turned
on even in production.

> At any rate, as I understand it, even after Jignesh eliminated the
> waits, he wasn't able to push his CPU utilization above 48%. Surely
> something's not right there. And he also said that when he added a
> knob to control the behavior, he got a performance improvement even
> when the knob was set to 0, which corresponds to the behavior we have
> already anyway. So I'm very suspicious that there's something wrong
> with either the system or the test. Until that's understood and fixed,
> I don't think that looking at the numbers is worth much.

I don't think anything is majorly wrong in my system.. Sometimes it is
PostgreSQL locks in play and sometimes it can be OS/system related
locks in play (network, IO, file system, etc). Right now in my patch,
after I fix the waiting procarray problem, other PostgreSQL locks come
into play: CLogControlLock, WALInsertLock, etc. Right now out of the
box we have no means of tweaking something in production if you do land
in that problem. With the patch there is a means of doing knob control
to tweak the bottlenecks of locks for the main workload for which it is
put in production.

I still haven't seen any downsides with the patch yet other than
highlighting other bottlenecks in the system. (For example I haven't
seen a run where the tpm on my workload decreases as you increase the
number.) What I am suggesting is run the patch and see if you find a
workload where you see a downside in performance, and check the lock
statistics output to see if it is pushing the bottleneck elsewhere,
more likely WALInsertLock or CLogControlLock. If yes then this patch
gives you the right tweaking opportunity to reduce stress on
ProcArrayLock for a workload while still not seriously stressing
WALInsertLock or CLogControlLock.

Right now.. the standard answer applies.. nope, you are running the
wrong workload for PostgreSQL, use a connection pooler or your own
application logic. Or maybe.. you have too many users for PostgreSQL,
use some proprietary database.

-Jignesh

>> I alluded to the three main ways of dealing with lock contention
>> elsewhere. Avoiding locks, making finer grained locks, and making
>> locks faster. All are worthy. Some are harder to do than others. Some
>> have been heavily tuned already. Its a case by case basis. And
>> regardless, the unfair lock is a good test tool.
> In view of the caveats above, I'll give that a firm maybe.
>
> ...Robert
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On Wed, 2009-03-18 at 16:26 -0400, Tom Lane wrote:
> Simon Riggs si...@2ndquadrant.com writes:
>> On Mon, 2009-03-16 at 16:26 +, Matthew Wakeling wrote:
>>> One possibility would be for the locks to alternate between exclusive
>>> and shared - that is:
>>> 1. Take a snapshot of all shared waits, and grant them all -
>>>    thundering herd style.
>>> 2. Wait until ALL of them have finished, granting no more.
>>> 3. Take a snapshot of all exclusive waits, and grant them all, one by one.
>>> 4. Wait until all of them have been finished, granting no more.
>>> 5. Back to (1)
>> I agree with that, apart from the "granting no more" bit. Currently we
>> queue up exclusive locks, but there is no need to, since for
>> ProcArrayLock commits are all changing different data. The most useful
>> behaviour is just to have two modes:
>> * exclusive-lock held - all other x locks welcome, s locks queue
>> * shared-lock held - all other s locks welcome, x locks queue
> My goodness, it seems people have forgotten about the "lightweight"
> part of the LWLock design.

Lightweight is only useful if it fits the purpose. If the LWLock design
doesn't fit all cases, especially with critical lock types, then we can
have special cases. We have both spinlocks and LWLocks, plus we split
hash tables into multiple lock partitions. If we have 3 types of
lightweight locking, why not consider having 4?

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
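[Editor's note] Simon's proposed two-mode behaviour quoted above ("like modes welcome, opposite mode queues") can be sketched as a toy model. This is hypothetical Python, not the LWLock implementation; class and method names are invented for illustration:

```python
class TwoModeLock:
    """Toy model of the proposed ProcArrayLock behaviour: requests matching
    the currently-held mode are granted immediately (even multiple 'X'
    holders, since commits change different data), while the opposite mode
    queues until all current holders release."""
    def __init__(self):
        self.mode = None      # None, 'S', or 'X'
        self.holders = 0
        self.waiting = []

    def acquire(self, pid, mode):
        if self.mode is None or self.mode == mode:
            self.mode = mode
            self.holders += 1
            return True       # granted: same mode is welcome
        self.waiting.append((pid, mode))
        return False          # opposite mode queues

    def release(self):
        self.holders -= 1
        if self.holders > 0:
            return []
        # last holder out: flip to the queued (opposite) mode and grant all
        self.mode = None
        woken, self.waiting = self.waiting, []
        for pid, mode in woken:
            self.acquire(pid, mode)
        return woken
```

Note how this differs from the standard shared/exclusive discipline: two 'X' requests can hold the lock concurrently, which is exactly the point Tom's "lightweight" remark pushes back on.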
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On Wed, 2009-03-18 at 13:49 +, Matthew Wakeling wrote:
> On Wed, 18 Mar 2009, Jignesh K. Shah wrote:
>> I thought about that.. Except without putting a restriction a huge
>> queue will cause lot of time spent in manipulating the lock list every
>> time. One more thing will be to maintain two lists, shared and
>> exclusive, and round robin through them every time you access the list
>> so manipulation is low.. But the best thing is to allow flexibility to
>> change the algorithm since some workloads may work fine with one and
>> others will NOT. The flexibility then allows to tinker for those
>> already reaching the limits.
> Yeah, having two separate queues is the obvious way of doing this. It
> would make most operations really trivial. Just wake everything in the
> shared queue at once, and you can throw it away wholesale and allocate
> a new queue. It avoids a whole lot of queue manipulation.

Yes, that sounds good.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
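[Editor's note] The two-queue idea discussed above can be sketched in a few lines. This is a toy illustration (names and structure are my own, not server code): shared waiters accumulate in a list that is handed back and discarded wholesale on wake-up, while exclusive waiters stay in a FIFO woken one at a time:

```python
from collections import deque

class DualWaitQueues:
    """Toy model of the two-queue design: wake_all_shared swaps in a fresh
    list so the whole shared batch is released in one operation, avoiding
    per-waiter manipulation of a single mixed lock list."""
    def __init__(self):
        self.shared_waiters = []
        self.exclusive_waiters = deque()

    def wait_shared(self, pid):
        self.shared_waiters.append(pid)

    def wait_exclusive(self, pid):
        self.exclusive_waiters.append(pid)

    def wake_all_shared(self):
        # wholesale: return the whole list and start a new (empty) one
        woken, self.shared_waiters = self.shared_waiters, []
        return woken

    def wake_next_exclusive(self):
        return self.exclusive_waiters.popleft() if self.exclusive_waiters else None
```

The point Matthew makes is visible here: waking every shared waiter is a single swap, with no per-element list surgery, regardless of how large the shared queue has grown.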