On 3/13/09 9:42 AM, "Jignesh K. Shah" <j.k.s...@sun.com> wrote:
> Now with a modified fix (not the original one that I proposed, but something that works like a heart valve: it opens and shuts to a minimum default width, thus controlling how many waiters are woken up).

Is this the server with 128 thread capability or 64 threads? Idle time is reduced, but other locks are hit.

With 200ms sleeps, no lock change:
Peak throughput is 102000/min @ 1000 users; average response time is 23ms. Ramp-up is linear until 900 users @ 98000/min and 12ms response time. At 2000 users, response time is 229ms and throughput is 90000/min.

With 200ms sleeps, lock modification 1 (wake all):
Peak throughput is 1701112/min @ 2000 users, with average response time 63ms. The plateau starts at 1600 users and 160000/min throughput. As before, the plateau starts when response time breaches 20ms, indicating contention.

Let's call the above a 65% throughput improvement at large connection counts.

-----------------

Now, with 0ms delay, no threading change:
Throughput is 136000/min @ 184 users, response time 13ms. Response time has not jumped too drastically yet, but linear performance increases stopped at about 130 users. ProcArrayLock is busy, very busy.
CPU: 35% user, 11% system, 54% idle

With 0ms delay and lock modification 2 (wake some, but not all):
Throughput is 161000/min @ 328 users, response time 28ms. At 184 users, as before the change, throughput is 147000/min with response time 0.12ms. Performance scales linearly to 144 users, then slows down and increases slightly after that with more concurrency. The throughput increase is between 15% and 25%.

What I see in the above is twofold:
1. This change improves throughput on this machine regardless of connection count.
2. The change seems to help more as connection count and wait time grow; in fact, it seems to make connection count at this level not much of a factor at all.

The two changes tested are different, which clouds things a bit. I wonder what the first change would do in the second test case.
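To make the two wake-up policies being compared concrete, here is a minimal toy sketch (not PostgreSQL's actual LWLock code, which lives in C in lwlock.c; the `ValveLock` name and `wake_batch` parameter are hypothetical). It uses Python's threading.Condition to contrast "wake all waiters" (notify_all) with the heart-valve idea of waking only a bounded number of waiters per release (notify(n)), which limits the thundering herd:

```python
import threading

class ValveLock:
    """Toy illustration only: on release, wake a bounded number of
    waiters instead of all of them. Hypothetical sketch, not the
    PostgreSQL patch under discussion."""

    def __init__(self, wake_batch=4):
        self._cond = threading.Condition()
        self._held = False
        self._wake_batch = wake_batch

    def acquire(self):
        with self._cond:
            while self._held:          # woken threads recheck and may re-wait
                self._cond.wait()
            self._held = True

    def release(self, wake_all=False):
        with self._cond:
            self._held = False
            if wake_all:
                self._cond.notify_all()              # modification 1: wake all
            else:
                self._cond.notify(self._wake_batch)  # modification 2: wake some

# Demo: 8 threads each do 100 increments under the lock.
counter = 0
lock = ValveLock(wake_batch=2)

def worker():
    global counter
    for _ in range(100):
        lock.acquire()
        counter += 1
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 800
```

With notify_all, every waiter is scheduled just to have all but one go back to sleep; with notify(n), at most n waiters contend per release, at the cost that the valve must keep notifying as each woken thread releases in turn.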
In any event, the second point above is fascinating: it suggests that these locks are responsible for a significant chunk of the overhead of idle or mostly idle connections (making connection pools less useful, though a pool can never fix mid-transaction pauses, which are very common). And regardless, on large multiprocessor systems like this one, Postgres is lock-limited whether a connection pool is used or not.