Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
From: Robert Haas [robertmh...@gmail.com]
Sent: Thursday, March 19, 2009 8:45 PM
To: Scott Carey
Cc: Jignesh K. Shah; Greg Smith; Kevin Grittner; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

> On Thu, Mar 19, 2009 at 5:43 PM, Scott Carey sc...@richrelevance.com wrote:
>>> Well, unless I'm misunderstanding something, waking all waiters every time could lead to arbitrarily long delays for writers on mostly read-only workloads... and by arbitrarily long, we mean to say potentially just about forever. That doesn't sound safe for production to me.
>> The other discussion going on indicates that that condition already can happen -- shared locks can currently always cut in line while other shared locks hold the lock -- though I don't understand all the details.
> No. If the first process waiting for an LWLock wants an exclusive lock, we wake up that process, and only that process. If the first process waiting for an LWLock wants a shared lock, we wake up that process, and the processes which follow it in the queue that also want shared locks. But if we come to a process which wants an exclusive lock, we stop. So if the wait queue looks like this: SSSXSSSXSSS, then the first three processes will be woken up, but the remainder will not. The new wait queue will look like this: XSSSXSSS -- and the exclusive waiter at the head of the queue is guaranteed to get the next turn.

Your description (much of which I cut out) is exactly how I understood it until Simon Riggs' post, which changed my view and understanding. Under that description, waking all shared waiters will leave all the X waiters at the front, and hence the lock will alternate shared/exclusive/shared/exclusive as long as both types are contending. Simon's post changed my view.
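For concreteness, the wake-up rule described above can be sketched as a toy model (my illustration, not the actual lwlock.c code): on release, wake the head of the queue; if it wants a shared lock, also wake the run of shared waiters behind it, stopping at the first exclusive waiter.

```python
# Toy model of the LWLock wake-up rule described above (not the real
# lwlock.c code). Waiters are 'S' (shared) or 'X' (exclusive).
def wake_waiters(queue):
    """Return (woken, remaining) for a wait queue of 'S'/'X' entries."""
    if not queue:
        return [], []
    if queue[0] == 'X':
        # An exclusive waiter at the head is woken alone.
        return [queue[0]], queue[1:]
    woken = []
    for i, w in enumerate(queue):
        if w == 'X':
            # Stop at the first exclusive waiter; it now heads the queue.
            return woken, queue[i:]
        woken.append(w)
    return woken, []

woken, rest = wake_waiters(list("SSSXSSSXSSS"))
# woken == ['S', 'S', 'S']; rest starts with the 'X', as in the example above
```

Running it on the SSSXSSSXSSS example wakes the three leading shared waiters and leaves the exclusive waiter guaranteed the next turn, matching the description.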
Below is some cut/paste from it. NOTE: everything below until the ENDQUOTE is Simon's text, except where marked [Scott Carey]:

QUOTE ---
On Wed, 2009-03-18 at 11:45 +, Matthew Wakeling wrote:
> On Wed, 18 Mar 2009, Simon Riggs wrote:
>> I agree with that, apart from the "granting no more" bit. The most useful behaviour is just to have two modes:
>> * exclusive-lock held - all other x locks welcome, s locks queue
>> * shared-lock held - all other s locks welcome, x locks queue
> The problem with making all other locks welcome is that there is a possibility of starvation. Imagine a case where there is a constant stream of shared locks - the exclusive locks may never actually get hold of the lock under the "all other shared locks welcome" strategy.

That's exactly what happens now.

[Scott Carey] (Further down in Simon's post, a quote from months ago:)

"Each time a Shared request is dequeued, we effectively re-enable queue jumping, so a Shared request arriving during that point will actually jump ahead of Shared requests that were unlucky enough to arrive while an Exclusive lock was held. Worse than that, the new incoming Shared requests exacerbate the starvation, so the more non-adjacent groups of Shared lock requests there are in the queue, the worse the starvation of the exclusive requestors becomes. We are effectively randomly starving some shared locks as well as exclusive locks in the current scheme, based upon the state of the lock when they make their request."

ENDQUOTE (Simon Riggs, cut/paste by me; from his post Wednesday 3/18 5:10 AM Pacific time).

I read that to mean that what is happening now is that, in ADDITION to your explanation of how the queue works, while a batch of shared locks is executing, NEW shared locks execute immediately and don't even queue. That is, there is shared-request queue jumping. The queue operates as in your description, but not everything queues. If that is accurate, it seems pretty conclusive that starvation is possible in the current system.
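The starvation claim above is exactly the kind of thing that can be "statistically calculated and empirically measured". Here is a toy Monte-Carlo sketch (entirely my own construction, with made-up parameters, not anything from the thread) of how a writer's wait grows under an "all new shared locks welcome" policy: the writer can only get in when the reader count drops to zero.

```python
import random

def writer_wait(arrival_prob, hold=4, max_steps=100000, rng=None):
    """Steps until no reader is active. Each step, a new shared locker
    arrives with probability arrival_prob and holds for `hold` steps.
    Models 'all other shared locks welcome': the writer acquires the
    lock only when it finds a gap with zero active readers."""
    rng = rng or random.Random(0)
    active = [hold]                    # one reader holds the lock at t=0
    for t in range(max_steps):
        if not active:
            return t                   # a gap: the writer finally gets in
        active = [r - 1 for r in active if r > 1]
        if rng.random() < arrival_prob:
            active.append(hold)        # a new shared locker cuts in line
    return max_steps                   # effectively starved

rng = random.Random(42)
light = sum(writer_wait(0.05, rng=rng) for _ in range(200)) / 200
heavy = sum(writer_wait(0.5, rng=rng) for _ in range(200)) / 200
# Under heavier shared traffic the writer's average wait grows sharply.
```

With no incoming readers the writer waits exactly one hold time; as the shared arrival rate approaches one arrival per hold period, gaps become rare and the average wait blows up, which is the starvation mechanism Simon describes.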
At this stage, it would seem that either neither of us is an expert on the current behavior, or Simon is wrong, or I completely misunderstood his comments above.

> Now, of course, EVENTUALLY one of the X guys will probably beat out all the S-lock waiters and he'll get to do his thing. But there's no upper bound on how long this can take, and if the rate at which S-lock waiters are joining the queue is much higher than the rate at which X-lock waiters are joining the queue, it may be quite a long time.

And the average expected time and distribution of those events can be statistically calculated and empirically measured. The fact that there is a chance at all is not as important as the magnitude of the chance and the distribution of those probabilities.

> Even if the overall system throughput is better with this change, the fact that the guys who need the X-lock get seriously shafted is a really serious problem.

If the 'serious shafting' is real, yes! We only disagree on the current possibility of this and its magnitude/likelihood. By Simon's comments above the
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
From: pgsql-performance-ow...@postgresql.org [pgsql-performance-ow...@postgresql.org] On Behalf Of Simon Riggs [si...@2ndquadrant.com]
Sent: Wednesday, March 18, 2009 12:53 AM
To: Matthew Wakeling
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

> On Mon, 2009-03-16 at 16:26 +, Matthew Wakeling wrote:
>> One possibility would be for the locks to alternate between exclusive and shared - that is:
>> 1. Take a snapshot of all shared waits, and grant them all - thundering herd style.
>> 2. Wait until ALL of them have finished, granting no more.
>> 3. Take a snapshot of all exclusive waits, and grant them all, one by one.
>> 4. Wait until all of them have been finished, granting no more.
>> 5. Back to (1)
> I agree with that, apart from the "granting no more" bit. Currently we queue up exclusive locks, but there is no need to, since for ProcArrayLock commits are all changing different data. The most useful behaviour is just to have two modes:
> * exclusive-lock held - all other x locks welcome, s locks queue
> * shared-lock held - all other s locks welcome, x locks queue
> This *only* works for ProcArrayLock.
> -- Simon Riggs  www.2ndQuadrant.com  PostgreSQL Training, Services and Support

I want to comment on an important distinction between these two variants. The "granting no more" bit WILL decrease performance under high contention. Here is my reasoning. We have two proposals:
Type A: allow line cutting (Simon, above):
* exclusive-lock held and all exclusives process - all other NEW x locks welcome, s locks queue
* shared-lock held and all shareds process - all other NEW s locks welcome, x locks queue

Type B: forbid line cutting (Matthew, above, modified to allow multiple exclusives for ProcArrayLock -- for other lock types, exclusives would go one at a time):
* exclusive-lock held and all exclusives process - all NEW lock requests queue
* shared-lock held and shareds process - all NEW lock requests queue

A big benefit of the wake-all proposal is that a lot of accesses do not have to context switch out and back in. On a quick assessment, type A above would lock and context switch even less than the wake-all (since exclusives don't go one at a time) but otherwise be similar. But this won't matter much if the workload is shared-lock dominated. I would LOVE to have seen context switch rate numbers with the results so far, but many base unix tools don't show it by default (you can get it from sar; rstat reports it). The average number of context switches per transaction is an excellent measure of lock contention and lock efficiency.

In type A above, the ratio of requests that require a context switch is Q / (M + Q), where Q is the average queue size when the 'shared-exclusive' swap occurs and M is the average number of line cutters.

In type B, the ratio of requests that must context switch is always == 1. Every request must queue and wait! This may perform worse than the current lock!

One way to guarantee some fairness is to compromise between the two. Let's call this proposal C. Unfortunately, this is less elegant than the other two, since it has logic for both. It could be made tunable to cover the complete spectrum though.
* exclusive-lock held and all exclusives process - first N new X requests welcome, N+1 and later X requests and all shared locks queue.
* shared-lock held and shareds process - first N new S requests welcome, N+1 and later S requests and all X locks queue

So, if shared locks are queuing while exclusives hold the lock and are operating, and another exclusive request arrives, it can cut in line only if it is one of the first N to do so; after that it will queue and wait and give the shared locks their turn. This counting condition can be done with an atomically incrementing integer using compare-and-set operations and no locks, and under heavy contention it reduces the number of context switches per operation to Q / (N + Q), where N is the number of 'line cutters' achieved and Q is the average queue size when the queued items are unlocked. Note how this is the same as the 'unbounded' equation with M above, except that N can never be greater than M (the 'natural' line-cut count). So for N = Q, half the requests are forced to context switch and half cut in line without one. N can be tunable, and it can be a different number for shared and exclusive to bias towards one or the other if desired.

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
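The counting condition in proposal C above can be sketched in a few lines (illustrative names of my own; Python stand-ins for what would really be an atomic fetch-and-add or compare-and-set on a shared integer):

```python
import itertools

class LineCutCounter:
    """Sketch of proposal C's admission test (hypothetical names).
    The first N arrivals while the lock is held in a compatible mode
    may cut in line; later arrivals must queue and wait."""
    def __init__(self, n):
        self.n = n
        self._tickets = itertools.count()  # stands in for atomic fetch-and-add

    def reset(self):
        # Called at each shared/exclusive swap to re-open the N slots.
        self._tickets = itertools.count()

    def try_cut_in_line(self):
        # True for the first N callers since the last reset.
        return next(self._tickets) < self.n

def context_switch_ratio(q, n):
    """Fraction of requests that must context switch: Q / (N + Q),
    where Q is the average queue size at the swap and N the line cutters."""
    return q / (n + q)

# With N == Q, half the requests queue and half cut in line without a
# context switch: context_switch_ratio(100, 100) == 0.5
```

With N = 0 the ratio degenerates to 1 (every request queues, i.e. type B); larger N approaches the unbounded type A behaviour, which is the tunable spectrum described above.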
Re: [PERFORM] Prepared statement does not exist
Glyn Astill,

Thanks for your reply. But can you confirm this? From what I see in the logs, it's pgpool which is trying to deallocate the prepared statement, not the application. The application just disconnects and is not trying to use the same connection.

Regards,
Nimesh.

On Thu, Mar 19, 2009 at 5:07 PM, Glyn Astill glynast...@yahoo.co.uk wrote:
> --- On Thu, 19/3/09, Nimesh Satam nimesh.z...@gmail.com wrote:
>> We are receiving the following error in the postgres database logs:
>> 2009-03-19 02:14:20 PDT [2547]: [79-1] LOG: duration: 0.039 ms statement: RESET ALL
>> 2009-03-19 02:14:20 PDT [2547]: [80-1] LOG: duration: 0.027 ms statement: SET SESSION AUTHORIZATION DEFAULT
>> 2009-03-19 02:14:20 PDT [2547]: [81-1] ERROR: prepared statement "S_1" does not exist
>> 2009-03-19 02:14:20 PDT [2547]: [82-1] STATEMENT: DEALLOCATE S_1
>> 2009-03-19 02:14:20 PDT [2547]: [83-1] ERROR: prepared statement "S_4" does not exist
>> 2009-03-19 02:14:20 PDT [2547]: [84-1] STATEMENT: DEALLOCATE S_4
>> We receive these errors when we start connecting the java application through pgpool. What causes this problem and how can it be avoided?
> Looks like your app is disconnecting from pgpool, which causes pgpool to send the RESET ALL; this deallocates the prepared statement. Then the app is reconnecting to pgpool and expecting the prepared statement to still be available, which it will not be.
[PERFORM] current transaction in productive database
Hello List,

is there a way to find out how many transactions my current production database is doing? I now know how much I can offer with my new database and hardware, but I would also like to know what I actually _need_ on my current production system. Is there a way to find this out?

Cheers,
Mario
Re: [PERFORM] Need help with one query
Anne Rosset wrote:
> EXPLAIN ANALYZE
> SELECT audit_change.id AS id, audit_change.audit_entry_id AS auditEntryId,
>   audit_entry.object_id AS objectId, audit_change.property_name AS propertyName,
>   audit_change.property_type AS propertyType, audit_change.old_value AS oldValue,
>   audit_change.new_value AS newValue, audit_change.flexfield AS flexField
> FROM audit_entry audit_entry, audit_change audit_change
> WHERE audit_change.audit_entry_id = audit_entry.id
>   AND audit_entry.object_id = 'artf414029';

[query reformatted to make it more readable]

Not quite clear why you are aliasing the tables to their own names...

> Hash Join (cost=8.79..253664.55 rows=4 width=136) (actual time=4612.674..6683.158 rows=4 loops=1)
>   Hash Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
>   ->  Seq Scan on audit_change (cost=0.00..225212.52 rows=7584852 width=123) (actual time=0.009..2838.216 rows=7584852 loops=1)
>   ->  Hash (cost=8.75..8.75 rows=3 width=45) (actual time=0.049..0.049 rows=4 loops=1)
>         ->  Index Scan using audit_entry_object on audit_entry (cost=0.00..8.75 rows=3 width=45) (actual time=0.033..0.042 rows=4 loops=1)
>               Index Cond: ((object_id)::text = 'artf414029'::text)
> Total runtime: 6683.220 ms

Very odd. It knows the table is large and that the seq-scan is going to be expensive. Try issuing "set enable_seqscan = off" and run the explain analyse again. That should show the cost of using the indexes.

-- Richard Huxton, Archonet Ltd
Re: [PERFORM] Prepared statement does not exist
--- On Fri, 20/3/09, Nimesh Satam nimesh.z...@gmail.com wrote:
> From: Nimesh Satam nimesh.z...@gmail.com
>>> We are receiving the following error in the postgres database logs:
>>> 2009-03-19 02:14:20 PDT [2547]: [79-1] LOG: duration: 0.039 ms statement: RESET ALL
>>> 2009-03-19 02:14:20 PDT [2547]: [80-1] LOG: duration: 0.027 ms statement: SET SESSION AUTHORIZATION DEFAULT
>>> 2009-03-19 02:14:20 PDT [2547]: [81-1] ERROR: prepared statement "S_1" does not exist
>>> 2009-03-19 02:14:20 PDT [2547]: [82-1] STATEMENT: DEALLOCATE S_1
>>> 2009-03-19 02:14:20 PDT [2547]: [83-1] ERROR: prepared statement "S_4" does not exist
>>> 2009-03-19 02:14:20 PDT [2547]: [84-1] STATEMENT: DEALLOCATE S_4
>>> We receive these errors when we start connecting the java application through pgpool. What causes this problem and how can it be avoided?
>> Looks like your app is disconnecting from pgpool, which causes pgpool to send the RESET ALL; this deallocates the prepared statement. Then the app is reconnecting to pgpool and expecting the prepared statement to still be available, which it will not be.
> Thanks for your reply. But can you confirm this? From what I see in the logs, it's pgpool which is trying to deallocate the prepared statement, not the application. The application just disconnects and is not trying to use the same connection.

There is the possibility that it's pgpool sending the deallocate in error after the RESET ALL, then. Either way, this is not relevant to the performance list; send it over to the pgpool list... and tell them your pgpool version number too - it may be a fixed bug.
Re: [PERFORM] Need help with one query
Richard Huxton d...@archonet.com writes:
>> Hash Join (cost=8.79..253664.55 rows=4 width=136) (actual time=4612.674..6683.158 rows=4 loops=1)
>>   Hash Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
>>   ->  Seq Scan on audit_change (cost=0.00..225212.52 rows=7584852 width=123) (actual time=0.009..2838.216 rows=7584852 loops=1)
>>   ->  Hash (cost=8.75..8.75 rows=3 width=45) (actual time=0.049..0.049 rows=4 loops=1)
>>         ->  Index Scan using audit_entry_object on audit_entry (cost=0.00..8.75 rows=3 width=45) (actual time=0.033..0.042 rows=4 loops=1)
>>               Index Cond: ((object_id)::text = 'artf414029'::text)
>> Total runtime: 6683.220 ms
> Very odd. It knows the table is large and that the seq-scan is going to be expensive.

Yeah, *very* odd. A nestloop with inner indexscan should have an estimated cost far lower than this plan's. What Postgres version is this exactly? Do you have any nondefault planner parameter settings?

regards, tom lane
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On Thu, 19 Mar 2009, Scott Carey wrote:
> In type B, the ratio of requests that must context switch is always == 1. Every request must queue and wait!

A remarkably good point, although not completely correct. Every request that arrives while the lock is already held in any way will queue and wait. Requests that arrive when the lock is free will run immediately. I admit it, this is a killer for this particular locking strategy.

Firstly, let's say that if the lock is in shared mode, and there are no exclusive waiters, then incoming shared lockers can be allowed to process immediately. That's just obvious. Strictly following your or my suggestion would preclude that, forcing a queue every so often.

> One way to guarantee some fairness is to compromise between the two. Let's call this proposal C. Unfortunately, this is less elegant than the other two, since it has logic for both. It could be made tunable to cover the complete spectrum though.
> * exclusive-lock held and all exclusives process - first N new X requests welcome, N+1 and later X requests and all shared locks queue.
> * shared-lock held and shareds process - first N new S requests welcome, N+1 and later S requests and all X locks queue

I like your solution. For now, let's just examine normal shared/exclusive locks, not the ProcArrayLock. The question is: what is the ideal number for N?

With your solution, N is basically a time limit, to prevent the lock from completely starving exclusive (or possibly shared) locks. If the shared locks are processing, then either the incoming shared requests are frequent, at which point N will be reached soon and force a switch to exclusive mode, or the shared requests are infrequent, at which point the lock should become free fairly soon. This means that having a count should be sufficient as a time limit.

So, what is too unfair? I'm guessing N can be set really quite high, and it should definitely scale by the number of CPUs in the machine.
Exact values are probably best determined by experiment, but I'd say something like ten times the number of CPUs.

As for ProcArrayLock, it sounds like it is very much a special case. The statement that the writers don't interfere with each other seems very strange to me, and makes me wonder if the structure needs any locks at all, or could at least be heavily partitioned. Perhaps it could be implemented as a lock-free structure. But I don't know what the actual structure is, so I could be talking through my hat.

Matthew

-- 
"So, given 'D' is undeclared too, with a default of zero, C++ is equal to D."
 -- mnw21, commenting on the "Surely the value of C++ is zero, but C is now 1"
    response to "No, C++ isn't equal to D. 'C' is undeclared [...] C++ should
    really be called 1" response to "C++ -- shouldn't it be called D?"
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Scott Carey escribió:
> Your description (much of which I cut out) is exactly how I understood it until Simon Riggs' post which changed my view and understanding. Under that situation, waking all shared will leave all X at the front and hence alternate shared/exclusive/shared/exclusive as long as both types are contending. Simon's post changed my view. Below is some cut/paste from it:

Simon's explanation, however, is at odds with the code:
http://git.postgresql.org/?p=postgresql.git;a=blob;f=src/backend/storage/lmgr/lwlock.c

There is queue jumping in the regular (heavyweight) lock manager, but that's a pretty different body of code.

-- Alvaro Herrera  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Matthew Wakeling matt...@flymine.org writes:
> As for ProcArrayLock, it sounds like it is very much a special case.

Quite. Read the section "Interlocking Transaction Begin, Transaction End, and Snapshots" in src/backend/access/transam/README before proposing any changes in this area --- it's a lot more delicate than one might think. We'd have partitioned the ProcArray long ago if it wouldn't have broken the transaction system.

regards, tom lane
Re: [PERFORM] current transaction in productive database
On Mar 20, 2009, at 5:26 AM, m...@bortal.de wrote:
> Hello List, is there a way to find out how many transactions my current production database is doing? I would also like to know what I actually _need_ on my current production system. Is there a way to find this out?

Are you looking for how many transactions per second, or more how many transactions run concurrently at a given time?

For the former you can use pgspy (it's on pgfoundry) to get an idea of queries per second coming in.

For the latter, just:

select * from pg_stat_activity where current_query <> '<IDLE>';

-- Jeff Trout j...@jefftrout.com http://www.stuarthamm.net/ http://www.dellsmartexitin.com/
Re: [PERFORM] Need help with one query
Richard Huxton wrote:
> Anne Rosset wrote:
>> EXPLAIN ANALYZE
>> SELECT audit_change.id AS id, audit_change.audit_entry_id AS auditEntryId,
>>   audit_entry.object_id AS objectId, audit_change.property_name AS propertyName,
>>   audit_change.property_type AS propertyType, audit_change.old_value AS oldValue,
>>   audit_change.new_value AS newValue, audit_change.flexfield AS flexField
>> FROM audit_entry audit_entry, audit_change audit_change
>> WHERE audit_change.audit_entry_id = audit_entry.id
>>   AND audit_entry.object_id = 'artf414029';
> [query reformatted to make it more readable]
> Not quite clear why you are aliasing the tables to their own names...
>> Hash Join (cost=8.79..253664.55 rows=4 width=136) (actual time=4612.674..6683.158 rows=4 loops=1)
>>   Hash Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
>>   ->  Seq Scan on audit_change (cost=0.00..225212.52 rows=7584852 width=123) (actual time=0.009..2838.216 rows=7584852 loops=1)
>>   ->  Hash (cost=8.75..8.75 rows=3 width=45) (actual time=0.049..0.049 rows=4 loops=1)
>>         ->  Index Scan using audit_entry_object on audit_entry (cost=0.00..8.75 rows=3 width=45) (actual time=0.033..0.042 rows=4 loops=1)
>>               Index Cond: ((object_id)::text = 'artf414029'::text)
>> Total runtime: 6683.220 ms
> Very odd. It knows the table is large and that the seq-scan is going to be expensive. Try issuing "set enable_seqscan = off" and run the explain analyse again. That should show the cost of using the indexes.
With set enable_seqscan = off:

QUERY PLAN
------------------------------------------------------------------
Nested Loop (cost=11.35..12497.53 rows=59 width=859) (actual time=46.074..49.742 rows=7 loops=1)
  ->  Index Scan using audit_entry_pk on audit_entry (cost=0.00..7455.95 rows=55 width=164) (actual time=45.940..49.541 rows=2 loops=1)
        Filter: ((object_id)::text = 'artf1024'::text)
  ->  Bitmap Heap Scan on audit_change (cost=11.35..90.93 rows=59 width=777) (actual time=0.086..0.088 rows=4 loops=2)
        Recheck Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
        ->  Bitmap Index Scan on audit_change_entry (cost=0.00..11.33 rows=59 width=0) (actual time=0.076..0.076 rows=4 loops=2)
              Index Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
Total runtime: 49.801 ms

The db version is 8.2.4. We are wondering if it is because of our audit_entry_id's format (like 'adte1DDFEA5B011C8988C3928752'). Any inputs?

Thanks,
Anne
Re: [PERFORM] current transaction in productive database
On Fri, 20 Mar 2009, m...@bortal.de wrote:
> is there a way to find out how many transactions my current production database is doing?

What you probably want here is not a true transaction count, which might include things that don't matter much for scaling purposes, but instead a count of everything happening that involves a database change. You can find totals for that, broken down by table, using:

select * from pg_stat_user_tables

See http://www.postgresql.org/docs/8.3/static/monitoring-stats.html for more details. You'll want to sum the totals for inserts, updates, and deletes to get all the normal transactions. That will be a total since the statistics were last reset. If you want a snapshot for a period, you can either sample at the beginning and end and subtract, or you can use:

select pg_stat_reset();

to reset everything, wait for some period, and then look at the totals. You may not want to do that immediately, though. The totals since the database was brought up, which you'll find in the statistics views, can be interesting for historical perspective, so you should probably save any that look interesting before you reset anything.

-- 
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] current transaction in productive database
m...@bortal.de escreveu:
> is there a way to find out how many transactions my current production database is doing?

If you're looking for the number of transactions then you can query the catalogs as:

$ export myq="select sum(xact_commit+xact_rollback) from pg_stat_database"
$ psql -U postgres -c "$myq" ; sleep 60 ; psql -U postgres -c "$myq"
    sum
-----------
 178992891
(1 row)

    sum
-----------
 178996065
(1 row)

$ bc -q
scale=3
(178996065-178992891)/60
52.900

Depending on your workload pattern, it's recommended to increase the sleep time.

-- Euler Taveira de Oliveira  http://www.timbira.com/
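The arithmetic in the example above is just a rate from two counter samples; for anyone scripting the sampling, the same calculation in Python (using the counter values shown above):

```python
def tx_per_second(count_before, count_after, interval_seconds):
    """Transaction rate from two samples of
    sum(xact_commit + xact_rollback), taken interval_seconds apart."""
    return (count_after - count_before) / interval_seconds

rate = tx_per_second(178992891, 178996065, 60)
# rate == 52.9 tx/s, matching the bc output above
```

As noted above, a longer sampling interval smooths out bursts; the same function works unchanged for any two samples and interval.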
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 3/20/09 8:28 AM, Matthew Wakeling matt...@flymine.org wrote:
> On Thu, 19 Mar 2009, Scott Carey wrote:
>> In type B, the ratio of requests that must context switch is always == 1. Every request must queue and wait!
> A remarkably good point, although not completely correct. Every request that arrives while the lock is already held in any way will queue and wait. Requests that arrive when the lock is free will run immediately. I admit it, this is a killer for this particular locking strategy.

Yeah, it's the "when there is lock contention" part that is a general truth for all locks.

As for this killing the strategy, there is one exception: if we know the operations done inside the lock are very fast, then we can use pure spin locks. Then there is no context switching at all, and it is more optimal to go from list to list in smaller chunks with no 'cutting in line' as in this strategy. Although, even with spins, a limited number of line cutters is helpful to reduce overall spin time.

As a general reader/writer lock, spin locks are more dangerous. It is often optimal to spin for a short time, and then, if the lock is still not attained, context switch out with a wait. Generally speaking, lock optimization for heavily contended locks is an attempt to minimize context switches with the least additional CPU overhead.

> Firstly, let's say that if the lock is in shared mode, and there are no exclusive waiters, then incoming shared lockers can be allowed to process immediately. That's just obvious. Strictly following your or my suggestion would preclude that, forcing a queue every so often.

Definitely an important optimization!

> One way to guarantee some fairness is to compromise between the two. Let's call this proposal C. Unfortunately, this is less elegant than the other two, since it has logic for both. It could be made tunable to cover the complete spectrum though.
> * exclusive-lock held and all exclusives process - first N new X requests welcome, N+1 and later X requests and all shared locks queue.
> * shared-lock held and shareds process - first N new S requests welcome, N+1 and later S requests and all X locks queue
>
> I like your solution. For now, let's just examine normal shared/exclusive locks, not the ProcArrayLock. The question is, what is the ideal number for N? With your solution, N is basically a time limit, to prevent the lock from completely starving exclusive (or possibly shared) locks. If the shared locks are processing, then either the incoming shared requests are frequent, at which point N will be reached soon and force a switch to exclusive mode, or the shared requests are infrequent, at which point the lock should become free fairly soon. This means that having a count should be sufficient as a time limit. So, what is too unfair? I'm guessing N can be set really quite high, and it should definitely scale by the number of CPUs in the machine. Exact values are probably best determined by experiment, but I'd say something like ten times the number of CPUs.

I would have guessed something large as well. It's the extremes and pathological cases that are most concerning. In normal operation, the limit should not be hit.

> As for ProcArrayLock, it sounds like it is very much a special case. The statement that the writers don't interfere with each other seems very strange to me, and makes me wonder if the structure needs any locks at all, or at least can be very partitioned. Perhaps it could be implemented as a lock-free structure. But I don't know what the actual structure is, so I could be talking through my hat.

I do too much of that. If it is something that should have very short-lived lock holding, then spin locks or other very simple structures built on atomics could do it. Even a linked list is not necessary if it's all built with atomics and spins, since 'waking up' is merely setting a single value that all waiters share.
But I know too little about what goes on when the lock is held, so this is getting very speculative.
Re: [PERFORM] Full statement logging problematic on larger machines?
On Thu, Mar 12, 2009 at 9:32 PM, Laurent Laborde kerdez...@gmail.com wrote:
> On Wed, Mar 11, 2009 at 11:42 PM, Frank Joerdens fr...@joerdens.de wrote:
>> effective_cache_size = 4GB
> Only 4GB with 64GB of ram?

I'd been overly cautious lately with config changes, as it's been difficult to argue for downtime and the associated service risk. Changed it to 48 GB now, since it doesn't require a restart, which I'd missed. Thanks for spotting!

Regards,
Frank
Re: [PERFORM] Need help with one query
On Fri, Mar 20, 2009 at 1:16 PM, Anne Rosset aros...@collab.net wrote:
> Richard Huxton wrote:
>> Anne Rosset wrote:
>>> EXPLAIN ANALYZE
>>> SELECT audit_change.id AS id, audit_change.audit_entry_id AS auditEntryId,
>>>   audit_entry.object_id AS objectId, audit_change.property_name AS propertyName,
>>>   audit_change.property_type AS propertyType, audit_change.old_value AS oldValue,
>>>   audit_change.new_value AS newValue, audit_change.flexfield AS flexField
>>> FROM audit_entry audit_entry, audit_change audit_change
>>> WHERE audit_change.audit_entry_id = audit_entry.id
>>>   AND audit_entry.object_id = 'artf414029';
>> [query reformatted to make it more readable]
>> Not quite clear why you are aliasing the tables to their own names...
>>> Hash Join (cost=8.79..253664.55 rows=4 width=136) (actual time=4612.674..6683.158 rows=4 loops=1)
>>>   Hash Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
>>>   ->  Seq Scan on audit_change (cost=0.00..225212.52 rows=7584852 width=123) (actual time=0.009..2838.216 rows=7584852 loops=1)
>>>   ->  Hash (cost=8.75..8.75 rows=3 width=45) (actual time=0.049..0.049 rows=4 loops=1)
>>>         ->  Index Scan using audit_entry_object on audit_entry (cost=0.00..8.75 rows=3 width=45) (actual time=0.033..0.042 rows=4 loops=1)
>>>               Index Cond: ((object_id)::text = 'artf414029'::text)
>>> Total runtime: 6683.220 ms
>> Very odd. It knows the table is large and that the seq-scan is going to be expensive. Try issuing "set enable_seqscan = off" and run the explain analyse again. That should show the cost of using the indexes.
With set enable_seqscan = off:

QUERY PLAN
------------------------------------------------------------
Nested Loop (cost=11.35..12497.53 rows=59 width=859) (actual time=46.074..49.742 rows=7 loops=1)
  ->  Index Scan using audit_entry_pk on audit_entry (cost=0.00..7455.95 rows=55 width=164) (actual time=45.940..49.541 rows=2 loops=1)
        Filter: ((object_id)::text = 'artf1024'::text)
  ->  Bitmap Heap Scan on audit_change (cost=11.35..90.93 rows=59 width=777) (actual time=0.086..0.088 rows=4 loops=2)
        Recheck Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
        ->  Bitmap Index Scan on audit_change_entry (cost=0.00..11.33 rows=59 width=0) (actual time=0.076..0.076 rows=4 loops=2)
              Index Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
Total runtime: 49.801 ms

The db version is 8.2.4. We are wondering if it is because of our audit_entry_id's format (like 'adte1DDFEA5B011C8988C3928752'). Any inputs? Thanks, Anne

Something is wrong here. How can setting enable_seqscan to off result in a plan with a far lower estimated cost than the original plan? If the planner thought the non-seq-scan plan was cheaper, it would have picked that one to begin with. ...Robert
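For background on why this looks impossible: enable_seqscan = off does not remove seq-scan paths from consideration. The planner adds a large constant penalty (disable_cost in src/backend/optimizer/path/costsize.c) to every seq-scan path's cost and still picks the cheapest path overall. A rough sketch of the comparison the planner effectively makes, using the two total costs from the plans above -- the penalty value here is an assumption for the 8.2 era, not a quoted constant, and this is not PostgreSQL code:

```python
# Sketch of the planner's path choice with and without enable_seqscan.
# DISABLE_COST is an assumed value; in the real source it is a very
# large constant added to disabled paths, not a removal of the path.
DISABLE_COST = 1.0e8

seqscan_plan_cost = 253664.55   # hash-join plan from the first EXPLAIN
indexscan_plan_cost = 12497.53  # nested-loop plan from the second EXPLAIN

# Default planner: compares the raw estimates and takes the cheaper one.
chosen_default = min(seqscan_plan_cost, indexscan_plan_cost)

# With enable_seqscan = off: the seq-scan plan is penalized, not removed,
# so a disabled setting should only ever surface *higher* estimates.
chosen_disabled = min(seqscan_plan_cost + DISABLE_COST, indexscan_plan_cost)

print(chosen_default, chosen_disabled)  # 12497.53 12497.53
```

If the planner really estimated 12497.53 for the index plan, it should have chosen that plan in the default case too -- which is exactly Robert's objection.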
Re: [PERFORM] Need help with one query
Robert Haas escribió: Something is wrong here. How can setting enable_seqscan to off result in a plan with a far lower estimated cost than the original plan? If the planner thought the non-seq-scan plan is cheaper, it would have picked that one to begin with.

GEQO? Anne, what's geqo_threshold set to?

-- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] Full statement logging problematic on larger machines?
On Thu, Mar 12, 2009 at 1:38 PM, Frank Joerdens fr...@joerdens.de wrote: On Thu, Mar 12, 2009 at 1:45 AM, Tom Lane t...@sss.pgh.pa.us wrote: [...] You could try changing _IOLBF to _IOFBF near the head of postmaster/syslogger.c and see if that helps. The patched server is now running on live, and we'll be watching it over the weekend with

log_duration = off
log_min_duration_statement = 1000
log_statement = 'ddl'

and then run a full logging test early next week if there are no problems with the above settings. Can you explain again what the extent of multiplexed messages I'll have to expect is? What exactly is the substance of the tradeoff? Occasionally the server will write the same line twice? I don't really follow why ... And the next problem is that the entire comparison is now unfortunately obfuscated and complicated by a release we did on Monday, which has had a strange effect: quite extreme load average spikes occurring frequently that do not seem to impact query speed -- not much anyway, or if they do, then in a random, intermittent manner that hasn't become apparent (yet). CPU usage is actually marginally down, locks significantly down, and all other relevant metrics basically unchanged, like context switches and memory usage profile. Now, it *seems* that the extra load is caused by idle (sic!) backends (*not* even idle in transaction) consuming significant CPU when you look at htop. I don't have a theory as to that right now. We use pgbouncer as a connection pooler. What could make idle backends load the server substantially? Regards, Frank
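For readers following along: the _IOLBF-to-_IOFBF change Tom suggests switches the syslogger's output stream from line buffering (one write() per log line) to full buffering (writes happen only when the buffer fills or is flushed). A minimal Python sketch of the same stdio-style distinction -- the file path is hypothetical and this is not PostgreSQL code:

```python
import os
import tempfile

# Full buffering (analogous to stdio's _IOFBF): writes accumulate in a
# user-space buffer and reach the file in large chunks, not once per line.
path = os.path.join(tempfile.gettempdir(), "fbf_demo.log")
with open(path, "w", buffering=8192) as f:
    f.write("log line one\n")
    f.write("log line two\n")
    size_before = os.path.getsize(path)  # 0: nothing flushed to disk yet
    f.flush()
    size_after = os.path.getsize(path)   # 26: both lines written in one chunk

print(size_before, size_after)  # 0 26
```

With line buffering, every newline forces a write syscall, which is where full statement logging gets expensive at high volume; full buffering trades that for delayed output.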
Re: [PERFORM] High CPU Utilization
On Mar 17, 2009, at 12:19 AM, Greg Smith wrote: On Tue, 17 Mar 2009, Gregory Stark wrote: Hm, well the tests I ran for posix_fadvise were actually on a Perc5 -- though who knows if it was the same under the hood -- and I saw better performance than this. I saw about 4MB/s for a single drive and up to about 35MB/s for 15 drives. However this was using linux md raid-0, not hardware raid. Right, it's the hardware RAID on the Perc5 I think people mainly complain about. If you use it in JBOD mode and let the higher performance CPU in your main system drive the RAID functions it's not so bad. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

I have not yet had a chance to try software raid on the standby server (still planning to) but wanted to follow up to see if there was any good way to figure out what the postgresql processes are spending their CPU time on. We are under peak load right now, and I have Zabbix plotting CPU utilization and CPU wait (from vmstat output) along with all sorts of other vitals on charts. CPU utilization is a sustained 90% - 95% and CPU Wait is hanging below 10%. Since being pointed at vmstat by this list I have been watching CPU Wait and it does get high at times (hence still wanting to try Perc5 in JBOD) but then there are sustained periods, right now included, where our CPUs are just getting crushed while wait and IO (only doing about 1.5 MB/sec right now) are very low. This high CPU utilization only occurs when under peak load and when our JDBC pools are fully loaded. We are moving more things into our cache and constantly tuning indexes/tables but just want to see if there is some underlying cause that is killing us. Any recommendations for figuring out what our database is spending its CPU time on?
Re: [PERFORM] Need help with one query
Alvaro Herrera wrote: Robert Haas escribió: Something is wrong here. How can setting enable_seqscan to off result in a plan with a far lower estimated cost than the original plan? If the planner thought the non-seq-scan plan is cheaper, it would have picked that one to begin with. GEQO? Anne, what's geqo_threshold set to?

Hi Alvaro: It is turned off:

geqo | off | Enables genetic query optimization

Thanks, Anne
Re: [PERFORM] High CPU Utilization
On Fri, Mar 20, 2009 at 2:26 PM, Joe Uhl joe...@gmail.com wrote: [earlier discussion snipped] Any recommendations for figuring out what our database is spending its CPU time on? What does the cs entry on vmstat say at this time?
If your cs is skyrocketing then you're getting a context switch storm, which is usually a sign that there are just too many things going on at once, or that you've got an old kernel -- things like that.
Re: [PERFORM] High CPU Utilization
On Mar 20, 2009, at 4:29 PM, Scott Marlowe wrote: On Fri, Mar 20, 2009 at 2:26 PM, Joe Uhl joe...@gmail.com wrote: [earlier discussion snipped] What does the cs entry on vmstat say at this time?
If your cs is skyrocketing then you're getting a context switch storm, which is usually a sign that there are just too many things going on at once, or that you've got an old kernel -- things like that.

The cs column (plus cpu columns) of vmstat 1 30 reads as follows:

cs    us sy id wa
11172 95  4  1  0
12498 94  5  1  0
14121 91  7  1  1
11310 90  7  1  1
12918 92  6  1  1
10613 93  6  1  1
 9382 94  4  1  1
14023 89  8  2  1
10138 92  6  1  1
11932 94  4  1  1
15948 93  5  2  1
12919 92  5  3  1
10879 93  4  2  1
14014 94  5  1  1
 9083 92  6  2  0
11178 94  4  2  0
10717 94  5  1  0
 9279 97  2  1  0
12673 94  5  1  0
 8058 82 17  1  1
 8150 94  5  1  1
11334 93  6  0  0
13884 91  8  1  0
10159 92  7  0  0
 9382 96  4  0  0
11450 95  4  1  0
11947 96  3  1  0
 8616 95  4  1  0
10717 95  3  1  0

We are running on a 2.6.28.7-2 kernel. I am unfamiliar with vmstat output, but reading the man page (and that cs = context switches per second) makes my numbers seem very high. Our sum JDBC pools currently top out at 400 connections (and we are doing work on all 400 right now). I may try dropping those pools down even smaller. Are there any general rules of thumb for figuring out how many connections you should service at maximum? I know of the memory constraints, but I'm thinking more along the lines of connections per CPU core.
Re: [PERFORM] High CPU Utilization
On Fri, Mar 20, 2009 at 2:49 PM, Joe Uhl joe...@gmail.com wrote: On Mar 20, 2009, at 4:29 PM, Scott Marlowe wrote: What does the cs entry on vmstat say at this time? If your cs is skyrocketing then you're getting a context switch storm, which is usually a sign that there are just too many things going on at once, or that you've got an old kernel -- things like that.

The cs column (plus cpu columns) of vmstat 1 30 reads as follows: [numbers snipped] We are running on a 2.6.28.7-2 kernel. I am unfamiliar with vmstat output but reading the man page (and that cs = context switches per second) makes my numbers seem very high.

No, those aren't really all that high. If you were hitting cs contention, I'd expect it to be in the 25k to 100k range. 10k average under load is pretty reasonable.

Our sum JDBC pools currently top out at 400 connections (and we are doing work on all 400 right now). I may try dropping those pools down even smaller. Are there any general rules of thumb for figuring out how many connections you should service at maximum? I know of the memory constraints, but thinking more along the lines of connections per CPU core.

Well, maximum efficiency is usually somewhere in the range of 1 to 2 times the number of cores you have, so trying to get the pool down to a dozen or two connections would be the direction to generally head. May not be reasonable or doable though.
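Scott's 25k-100k yardstick is easy to check mechanically against a vmstat capture; a quick sketch using a handful of the samples Joe posted:

```python
# Pull the cs (context switches/sec) column out of `vmstat 1` output
# and compare the average against the ~25k storm threshold mentioned
# above. The sample is a subset of the numbers posted in this thread.
sample = """\
11172 95 4 1 0
12498 94 5 1 0
14121 91 7 1 1
8058 82 17 1 1
15948 93 5 2 1
"""

cs_values = [int(line.split()[0]) for line in sample.strip().splitlines()]
avg_cs = sum(cs_values) / len(cs_values)

print(round(avg_cs))   # 12359
print(avg_cs < 25000)  # True: no context-switch storm in these samples
```

On a real system you would feed this the full `vmstat 1 30` output (skipping the header rows and the first line, which is an average since boot).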
Re: [PERFORM] High CPU Utilization
On Mar 20, 2009, at 4:58 PM, Scott Marlowe wrote: [quoted discussion snipped] Well, maximum efficiency is usually somewhere in the range of 1 to 2 times the number of cores you have, so trying to get the pool down to a dozen or two connections would be the direction to generally head. May not be reasonable or doable though.

Thanks for the info. Figure I can tune our pools down and monitor throughput/CPU/IO and look for a sweet spot with our existing hardware.
Just wanted to see if tuning connections down could potentially help. I feel as though we are going to have to replicate this DB before too long. We've got an almost identical server doing nothing but PITR, with 8 CPU cores mostly idle, that could be better spent. Our pgfouine reports, though only logging queries that take over 1 second, show 90% reads. I have heard much about Slony, but has anyone used the newer version of Mammoth Replicator (which now appears to be called PostgreSQL + Replication) on 8.3? From the documentation it appears to be easier to set up and less invasive, but I struggle to find usage information/stories online.
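Scott's 1x-2x-cores heuristic, applied to an 8-core box like the standby described above (assuming the primary is similar), gives a target far below the current 400 connections:

```python
def pool_size_range(cores, low_factor=1, high_factor=2):
    """Rule-of-thumb pool sizing from this thread: peak efficiency
    usually falls between 1x and 2x the number of CPU cores."""
    return cores * low_factor, cores * high_factor

# Assuming 8 cores, as suggested by the standby server's description.
low, high = pool_size_range(8)
print(low, high)  # 8 16
```

A pooler like pgbouncer (mentioned elsewhere in this digest) is the usual way to present 400 client connections to the application while holding the backend count near such a target.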
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Alvaro Herrera escribió: Simon's explanation, however, is at odds with the code. http://git.postgresql.org/?p=postgresql.git;a=blob;f=src/backend/storage/lmgr/lwlock.c

There is queue jumping in the regular (heavyweight) lock manager, but that's a pretty different body of code. I'll just embarrass myself by pointing out that Neil Conway described this back in 2004: http://archives.postgresql.org//pgsql-hackers/2004-11/msg00905.php So Simon's correct.

-- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
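The LWLock release behaviour debated in this thread can be modelled in a few lines. This is a simplification -- it ignores the shared-lock queue jumping at acquisition time that Simon and Neil describe -- but it shows the wakeup scan on release: if the head of the wait queue wants a shared lock, it and the immediately following shared waiters are woken, stopping at the first exclusive waiter; an exclusive waiter at the head is woken alone.

```python
def wake_on_release(queue):
    """Simplified model of the LWLock release wakeup scan.
    queue is a list of 'S' (shared) and 'X' (exclusive) waiters in
    arrival order. Returns (woken, remaining_queue)."""
    if not queue:
        return [], []
    if queue[0] == "X":
        # Exclusive waiter at the head: wake it, and only it.
        return ["X"], queue[1:]
    # Shared waiter at the head: wake the leading run of shared
    # waiters, stopping at the first exclusive waiter.
    i = 0
    while i < len(queue) and queue[i] == "S":
        i += 1
    return queue[:i], queue[i:]

# The example discussed upthread: wait queue SSSXSSSXSSS.
woken, remaining = wake_on_release(list("SSSXSSSXSSS"))
print("".join(woken))      # SSS
print("".join(remaining))  # XSSSXSSS -- exclusive waiter now at the head
```

What this model deliberately leaves out is the pre-8.3-era behaviour at issue: a newly arriving shared requester could acquire the lock immediately while other shared holders still held it, jumping the queued exclusive waiters entirely.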
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Alvaro Herrera escribió: So Simon's correct. And perhaps this explains why Jignesh is measuring an improvement on his benchmark. Perhaps a useful experiment would be to turn this behavior off and compare performance. This lack of measurement is probably why the suggested patch to fix it was never applied. The patch is here http://archives.postgresql.org//pgsql-hackers/2004-11/msg00935.php

-- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] Full statement logging problematic on larger machines?
On Fri, Mar 20, 2009 at 8:21 PM, Andrew Dunstan and...@dunslane.net wrote: Frank Joerdens wrote: On Thu, Mar 12, 2009 at 1:38 PM, Frank Joerdens fr...@joerdens.de wrote: On Thu, Mar 12, 2009 at 1:45 AM, Tom Lane t...@sss.pgh.pa.us wrote: [...] You could try changing _IOLBF to _IOFBF near the head of postmaster/syslogger.c and see if that helps. [...] Can you explain again what the extent of multiplexed messages I'll have to expect is? What exactly is the substance of the tradeoff? Occasionally the server will write the same line twice? Don't really follow why ... [...] I don't believe changing this will result in any multiplexing. The multiplexing problem was solved in 8.3 by the introduction of the chunking protocol between backends and the syslogger, and changing the output buffering of the syslogger should not cause a multiplexing problem, since it's a single process. Hum, we're still on 8.2 -- our last attempt to upgrade before Christmas was unsuccessful; we had to roll back due to not fully understood performance issues. We think we nailed the root cause(s), though, and will make another, better-planned effort to upgrade before March is out. Oh well, maybe this all means we shouldn't try to get this running on 8.2 and should just tackle the issue again after the upgrade ... Cheers, Frank
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Alvaro Herrera wrote: Alvaro Herrera escribió: So Simon's correct. And perhaps this explains why Jignesh is measuring an improvement on his benchmark. Perhaps a useful experiment would be to turn this behavior off and compare performance. This lack of measurement is probably why the suggested patch to fix it was never applied. The patch is here http://archives.postgresql.org//pgsql-hackers/2004-11/msg00935.php

One of the reasons why my patch helps is that it keeps this check intact but allows other exclusive waiters to wake up. Now, what PostgreSQL calls a "wakeup" is in reality just setting a variable indicating wake-up, not really signalling a process to wake up. This is a key point to note. So by the time the process wanting the exclusive lock has fought the OS scheduling policy to finally get time on the CPU and checks the value to see if it is allowed to wake up, it is likely -- due to the delay between when some other process marked it woken and when it checks the value -- that the lock is already free (or another exclusive process took the lock, did its work, and released it). Overall it works well, since it lives within the logical semantics of the locks but exploits various differences in OS scheduling and inherent delays in the system. It actually makes sense: if a process is on the CPU wanting an exclusive lock while someone else is holding an exclusive lock, let it try to get the lock rather than preventing it from trying. The lock semantics will make sure that exclusive locks are never issued to two processes, so there is no problem with letting it try. It's late on Friday so I won't be able to explain it better, but when load is heavy, getting on the CPU is an achievement; let a process try for an exclusive lock while it's already there. Try it!! -Jignesh
Re: [PERFORM] Need help with one query
Robert Haas robertmh...@gmail.com writes: On Fri, Mar 20, 2009 at 1:16 PM, Anne Rosset aros...@collab.net wrote: The db version is 8.2.4 Something is wrong here. How can setting enable_seqscan to off result in a plan with a far lower estimated cost than the original plan? Planner bug no doubt ... given how old the PG release is, I'm not particularly interested in probing into it now. If Anne can still reproduce this behavior on 8.2.something-recent, we should look closer. regards, tom lane
Re: [PERFORM] Need help with one query
On Fri, Mar 20, 2009 at 4:29 PM, Anne Rosset aros...@collab.net wrote: Alvaro Herrera wrote: Robert Haas escribió: Something is wrong here. How can setting enable_seqscan to off result in a plan with a far lower estimated cost than the original plan? If the planner thought the non-seq-scan plan is cheaper, it would have picked that one to begin with. GEQO? Anne, what's geqo_threshold set to? Hi Alvaro: It is turned off geqo | off | Enables genetic query optimization. Thanks, Anne Can you please send ALL your non-commented postgresql.conf lines? ...Robert
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On Fri, Mar 20, 2009 at 7:39 PM, Jignesh K. Shah j.k.s...@sun.com wrote: Alvaro Herrera wrote: So Simon's correct. And perhaps this explains why Jignesh is measuring an improvement on his benchmark. Perhaps a useful experiment would be to turn this behavior off and compare performance. This lack of measurement is probably why the suggested patch to fix it was never applied. The patch is here http://archives.postgresql.org//pgsql-hackers/2004-11/msg00935.php One of the reasons why my patch helps is that it keeps this check intact but allows other exclusive waiters to wake up. Now, what PostgreSQL calls a wakeup is in reality just setting a variable indicating wake-up, not really signalling a process to wake up. This is a key point to note. So when the process wanting the exclusive lock fights the OS scheduling policy to finally get time on the CPU, it then checks the value to see if it is allowed to wake up, and potentially [...]

I'm confused. Is a process waiting for an LWLock in a runnable state? I thought we went to sleep on a semaphore. ...Robert
[PERFORM] iowait bug?
I just discovered this on a LinkedIn user group: http://bugzilla.kernel.org/show_bug.cgi?id=12309 Is anyone here seeing evidence of this in PostgreSQL?? -- M. Edward (Ed) Borasky http://www.linkedin.com/in/edborasky I've never met a happy clam. In fact, most of them were pretty steamed.