Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Robert Haas wrote:
On Fri, Mar 20, 2009 at 7:39 PM, Jignesh K. Shah wrote:
Alvaro Herrera wrote: So Simon's correct. And perhaps this explains why Jignesh is measuring an improvement on his benchmark. Perhaps a useful experiment would be to turn this behavior off and compare performance. This lack of measurement is probably the reason the suggested patch to fix it was never applied. The patch is here: http://archives.postgresql.org//pgsql-hackers/2004-11/msg00935.php

One of the reasons why my patch helps is that it keeps this check intact but allows other exclusive waiters to wake up. Now, what PostgreSQL calls a "wake" in reality just sets a variable indicating wake-up; it does not really signal a process to wake up. This is a key point to note. So when the process wanting the exclusive lock fights the OS scheduling policy to finally get time on the CPU, it checks the value to see if it is allowed to wake up and potentially

I'm confused. Is a process waiting for an LWLock in a runnable state? I thought we went to sleep on a semaphore.

...Robert

If you check the code http://doxygen.postgresql.org/lwlock_8c-source.html#l00451 the semaphore lock can wake up, but then it needs to confirm !proc->lwWaiting, which can still be TRUE if you have not been "waked up"; in that case it increments the extraWaits count and goes back to PGSemaphoreLock. What my patch gives, with sequential X wake-ups, is the flexibility that the process can still exit the wait, check whether it can get the exclusive lock, and if not add itself back to the queue. My theory is that when the process is already running on a CPU, it makes sense to let it check for the lock while another exclusive holder is running, since the chance that the holder has completed within a few cycles is very high; the improvement that I see leads to that inference. Otherwise, if lwWaiting is TRUE, the process does not even check whether the lock is available and just goes back and waits for the next chance. This is the part that gets the benefit of my patch.

-Jignesh
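For reference, the wait loop being described looks roughly like this in the 8.4-era lwlock.c (a condensed sketch of the code at the doxygen link above, not an exact quote):

/*
 * Wait until awakened.  A backend blocked on its semaphore can be
 * posted for reasons unrelated to this lock; it is only truly "waked"
 * once the releaser has cleared proc->lwWaiting.  Spurious posts are
 * absorbed here and repaid after the lock is acquired.
 */
for (;;)
{
    /* "false" means we cannot accept a cancel/die interrupt here */
    PGSemaphoreLock(&proc->sem, false);
    if (!proc->lwWaiting)
        break;          /* a releaser really woke us */
    extraWaits++;       /* spurious wakeup: absorb and re-sleep */
}

...

/* Fix the process wait semaphore's count for any absorbed wakeups. */
while (extraWaits-- > 0)
    PGSemaphorePost(&proc->sem);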
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Alvaro Herrera wrote:
Alvaro Herrera wrote: So Simon's correct. And perhaps this explains why Jignesh is measuring an improvement on his benchmark. Perhaps a useful experiment would be to turn this behavior off and compare performance. This lack of measurement is probably the reason the suggested patch to fix it was never applied. The patch is here: http://archives.postgresql.org//pgsql-hackers/2004-11/msg00935.php

One of the reasons why my patch helps is that it keeps this check intact but allows other exclusive waiters to wake up. Now, what PostgreSQL calls a "wake" in reality just sets a variable indicating wake-up; it does not really signal a process to wake up. This is a key point to note. So when the process wanting the exclusive lock fights the OS scheduling policy to finally get time on the CPU, it checks the value to see if it is allowed to wake up; and due to the delay between when some other process marked it "waked up" and when it checks that value, it is likely that the lock is free (or another exclusive process took the lock, did its work, and released it). Overall it works well, since it lives within the logical semantics of the locks and just exploits the various differences in OS scheduling and inherent delays in the system. It actually makes sense: if the process is on CPU wanting exclusive while someone else is doing exclusive, let it try getting the lock rather than preventing it from trying. The lock semantics will make sure that exclusive locks are not issued to two processes, so there is no issue with it trying. It's late on Friday so I won't be able to explain it better, but when load is heavy, getting on the CPU is an achievement; let processes try an exclusive lock while they are already there. Try it!!

-Jignesh
Re: [PERFORM] Postgres benchmarking with pgbench
m...@bortal.de wrote:
Hi Greg, thanks a lot for your hints. I changed my config and changed raid6 to raid10, but whatever I do, the benchmark breaks down at a scaling factor of 75, where the database is "only" 1126MB big.

Here are my benchmark results (scaling factor, DB size in MB, TPS) using: pgbench -S -c X -t 1000 -U pgsql -d benchmark -h MYHOST

  1    19   8600
  5    79   8743
 10   154   8774
 20   303   8479
 30   453   8775
 40   602   8093
 50   752   6334
 75  1126   3881
150  2247   2297
200  2994    701
250  3742    656
300  4489    596
400  5984    552
500  7479    513

I have no idea if this is any good for a QuadCore Intel(R) Xeon(R) CPU E5320 @ 1.86GHz with 4GB RAM and 6 SATA disks (7200rpm) in raid 10. Here is my config (maybe with some odd settings): http://pastebin.com/m5d7f5717

I played around with:
- max_connections
- shared_buffers
- work_mem
- maintenance_work_mem
- checkpoint_segments
- effective_cache_size
...but whatever I do, the graph looks the same. Any hints or tips on what my config should look like? Or are these results even okay? Maybe I am driving myself crazy for nothing?

Are you running the pgbench client from a different system? Did you check whether the pgbench client itself is bottlenecked? I have seen before that the pgbench client is severely limited in the load it can drive and process.

-Jignesh
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Robert Haas wrote:

Actually the patch I submitted shows no overhead from what I have seen, and I think it is useful; depending on workloads it can be turned on even in production.

Well, unless I'm misunderstanding something, waking all waiters every time could lead to arbitrarily long delays for writers on mostly read-only workloads... and by arbitrarily long, we mean to say "potentially just about forever". That doesn't sound safe for production to me.

Hi Robert,

The patch I submitted does not do any manipulation of the list. All it changes is to give the flexibility to choose how many waiters to wake up at one go (see the sketch after this message):
- 0 is the default, which wakes up only 1 X (exclusive) waiter at a time, or all sequential S (shared) waiters.
- Changing the value to 1 will wake up all sequential X or all sequential S waiters, as they are in the queue (no manipulation).
- Values 2 up to 32 wake up the next n waiters in the queue (X or S), as they are in the queue.
It does absolutely no manipulation and hence there is no overhead. It is absolutely safe for production; as Scott mentioned, there are other things in postgresql.conf that can be more dangerous than this tunable.

I don't think anything is majorly wrong in my system. Sometimes it is PostgreSQL locks in play, and sometimes it can be OS/system-related locks in play (network, IO, file system, etc). Right now in my patch, after I fix the ProcArray waiting problem, other PostgreSQL locks come into play: CLogControlLock, WALInsertLock, etc. Right now, out of the box, we have no means of tweaking something in production if you do land in that problem. With the patch there is a means of doing knob control to tweak the bottleneck locks for the main workload for which the system is put in production.

I'll reiterate my previous objection: I think your approach is too simplistic. I think Tom said it the best: a lot of work has gone into making the locking mechanism lightweight and safe. I'm pretty doubtful that you're going to find a change that is still safe, but performs much better. The discussions by Heikki, Simon, and others about changing the way locks are used or inventing new kinds of locks seem much more promising to me.

That is the beauty: the approach is simplistic but very effective. A lot of incremental work has gone in already, and this is another one of those incremental changes, allowing minor tweaks that the workload may like very much and perform very well on. The performance tuning game is almost like finding a harmonic frequency. I agree that other kinds of locks seem more promising; in fact I proposed one last year too: http://archives.postgresql.org//pgsql-hackers/2008-06/msg00291.php

Seriously speaking, a change like that definitely cannot be done before the 8.5 time frame, while this one is simple enough to go into 8.4. The best thing one can contribute to the thread is to actually try the patch on a test system and run your own tests to see how it behaves.

-Jignesh

Right now the standard answer applies: nope, you are running the wrong workload for PostgreSQL, use a connection pooler or your own application logic. Or maybe you have too many users for PostgreSQL, use some proprietary database.

Well, I certainly agree that we need to get away from that mentality, although there's nothing particularly evil about a connection pooler... it might not be suitable for every workload, but you haven't specified why one couldn't or shouldn't be used in the situation you're trying to simulate here.
...Robert
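For reference, the three wake-up policies described above can be restated as a sketch (field names follow the posted patch and the 8.4-era PGPROC structure; this is a readable paraphrase of the diff quoted elsewhere in the thread, not the patch itself):

/* At LWLockRelease time: 'head' is the first waiter in the queue,
 * and everything from 'head' through 'proc' gets waked. */
proc = head;
if (LWLockWakeupAlgorithm == 0)
{
    /* default: one exclusive waiter, or a run of adjacent shared waiters */
    if (!proc->lwExclusive)
        while (proc->lwWaitLink != NULL &&
               !proc->lwWaitLink->lwExclusive)
            proc = proc->lwWaitLink;
}
else if (LWLockWakeupAlgorithm == 1)
{
    /* wake the whole adjacent run, whether it is shared or exclusive */
    while (proc->lwWaitLink != NULL &&
           proc->lwExclusive == proc->lwWaitLink->lwExclusive)
        proc = proc->lwWaitLink;
}
else
{
    /* 2..32: wake the first n waiters regardless of mode */
    int n = LWLockWakeupAlgorithm;

    while (proc->lwWaitLink != NULL && --n > 0)
        proc = proc->lwWaitLink;
}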
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 03/18/09 17:25, Robert Haas wrote:
On Wed, Mar 18, 2009 at 1:43 PM, Scott Carey wrote:

It's worth ruling out given that even if the likelihood is small, the fix is easy. However, I don't see the throughput drop from peak as more concurrency is added that is the hallmark of this problem, usually accompanied by a lot of context switching and a sudden increase in CPU use per transaction.

The problem is that the proposed "fix" bears a strong resemblance to attempting to improve your gas mileage by removing a few non-critical parts from your car, like, say, the bumpers, muffler, turn signals, windshield wipers, and emergency brake.

The fix I was referring to as easy was using a connection pooler -- as a reply to the previous post. Even if it's a low likelihood that the connection pooler fixes this case, it's worth looking at.

Oh, OK. There seem to be some smart people saying that's a pretty high-likelihood fix. I thought you were talking about the proposed locking change.

While it's true that the car might be drivable in that condition (as long as nothing unexpected happens), you're going to have a hard time convincing the manufacturer to offer that as an options package.

The original poster's request is for a config parameter, for experimentation and testing by the brave. My own request was for that version of the lock to prevent possible starvation but improve performance by unlocking all shared at once, then doing all exclusives one at a time next, etc.

That doesn't prevent starvation in general, although it will for some workloads. Anyway, it seems rather pointless to add a config parameter that isn't at all safe, and adds overhead to a critical part of the system for people who don't use it. After all, if you find that it helps, what are you going to do? Turn it on in production? I just don't see how this is any good other than as a thought-experiment.

Actually the patch I submitted shows no overhead from what I have seen, and I think it is useful; depending on workloads it can be turned on even in production.

At any rate, as I understand it, even after Jignesh eliminated the waits, he wasn't able to push his CPU utilization above 48%. Surely something's not right there. And he also said that when he added a knob to control the behavior, he got a performance improvement even when the knob was set to 0, which corresponds to the behavior we have already anyway. So I'm very suspicious that there's something wrong with either the system or the test. Until that's understood and fixed, I don't think that looking at the numbers is worth much.

I don't think anything is majorly wrong in my system. Sometimes it is PostgreSQL locks in play, and sometimes it can be OS/system-related locks in play (network, IO, file system, etc). Right now in my patch, after I fix the ProcArray waiting problem, other PostgreSQL locks come into play: CLogControlLock, WALInsertLock, etc. Right now, out of the box, we have no means of tweaking something in production if you do land in that problem. With the patch there is a means of doing knob control to tweak the bottleneck locks for the main workload for which the system is put in production. I still haven't seen any downsides with the patch, other than highlighting other bottlenecks in the system.
(For example, I haven't seen a run where the tpm on my workload decreases as you increase the setting.) What I am suggesting is: run the patch and see if you find a workload where you see a downside in performance, and check the lock statistics output to see whether it is pushing the bottleneck elsewhere, most likely to WALInsertLock or CLogControlLock. If yes, then this patch gives you the right tweaking opportunity to reduce stress on ProcArrayLock for a workload while still not seriously stressing WALInsertLock or CLogControlLock.

Right now the standard answer applies: nope, you are running the wrong workload for PostgreSQL, use a connection pooler or your own application logic. Or maybe you have too many users for PostgreSQL, use some proprietary database.

-Jignesh

I alluded to the three main ways of dealing with lock contention elsewhere: avoiding locks, making finer-grained locks, and making locks faster. All are worthy. Some are harder to do than others. Some have been heavily tuned already. It's a case-by-case basis. And regardless, the unfair lock is a good test tool.

In view of the caveats above, I'll give that a firm maybe.

...Robert
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 03/18/09 17:16, Scott Carey wrote:
On 3/18/09 4:36 AM, "Gregory Stark" wrote:
"Jignesh K. Shah" writes:

In the next couple of weeks I plan to test the patch on a different x64 based system to do sanity testing on a lower number of cores and also try out other workloads ...

I'm actually more interested in the large number of cores but fewer processes and lower max_connections. If you set max_connections to 64 and eliminate the wait time you should, in theory, be able to get 100% cpu usage. It would be very interesting to track down the contention which is preventing that.

My previous calculation in this thread showed that even at 0 wait time, the client seems to introduce ~3ms of wait-time overhead on average. So it takes close to 128 threads in each test to stop the linear scaling, since the average processing time seems to be about ~3ms. Either that, or the tests actually are running on a system capable of 128 threads.

Nope, 64 threads for sure... verified it a number of times.

-Jignesh

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 03/18/09 08:06, Simon Riggs wrote:
On Wed, 2009-03-18 at 11:45 +0000, Matthew Wakeling wrote:
On Wed, 18 Mar 2009, Simon Riggs wrote:

I agree with that, apart from the "granting no more" bit. The most useful behaviour is just to have two modes:
* exclusive-lock held - all other x locks welcome, s locks queue
* shared-lock held - all other s locks welcome, x locks queue

The problem with making all other locks welcome is that there is a possibility of starvation. Imagine a case where there is a constant stream of shared locks - the exclusive locks may never actually get hold of the lock under the "all other shared locks welcome" strategy.

That's exactly what happens now. Likewise with the reverse.

I think it depends upon how frequently requests arrive. Commits cause X locks and we don't commit that often, so it's very unlikely that we'd see a constant stream of X locks preventing shared lockers.

Some comments from an earlier post on this topic (about 20 months ago):

Since shared locks are currently queued behind exclusive requests when they cannot be immediately satisfied, it might be worth reconsidering the way LWLockRelease works also. When we wake up the queue we only wake the Shared requests that are adjacent to the head of the queue. Instead we could wake *all* waiting Shared requestors. e.g. with a lock queue like this:

(HEAD) S<-S<-X<-S<-X<-S<-X<-S

Currently we would wake the 1st and 2nd waiters only. If we were to wake the 3rd, 5th and 7th waiters also, then the queue would reduce in length very quickly, if we assume generally uniform service times. (If the head of the queue is X, then we wake only that one process, and I'm not proposing we change that.)

That would mean queue jumping, right? Well, that's what already happens in other circumstances, so there cannot be anything intrinsically wrong with allowing it; the only question is: would it help?

I thought about that. Except, without putting a restriction on it, a huge queue will cause a lot of time to be spent manipulating the lock list every time. Another option would be to maintain two lists, shared and exclusive, and round-robin through them each time you access the list, so manipulation stays cheap. But the best thing is to allow the flexibility to change the algorithm, since some workloads may work fine with one and others will NOT. The flexibility then allows tinkering for those already reaching the limits.

-Jignesh

We need not wake the whole queue; there may be some generally more beneficial heuristic. The reason for considering this is not to speed up Shared requests but to reduce the queue length and thus the waiting time for the Exclusive requestors. Each time a Shared request is dequeued, we effectively re-enable queue jumping, so a Shared request arriving during that point will actually jump ahead of Shared requests that were unlucky enough to arrive while an Exclusive lock was held. Worse than that, the new incoming Shared requests exacerbate the starvation, so the more non-adjacent groups of Shared lock requests there are in the queue, the worse the starvation of the exclusive requestors becomes. We are effectively randomly starving some shared locks as well as exclusive locks in the current scheme, based upon the state of the lock when they make their request. The situation is worst when the lock is heavily contended and the workload has a 50/50 mix of shared/exclusive requests, e.g. serializable transactions or transactions with lots of subtransactions.
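Simon's "wake *all* waiting Shared requestors" variant, together with Jignesh's concern about needing a second list, can be sketched as one pass over the existing queue; nothing here is committed code, and the field names (head, tail, lwWaitLink, lwExclusive) are assumed from the 8.4-era structures:

/*
 * Hypothetical helper: unlink every shared waiter into a separate
 * wake-up list, leaving exclusive waiters queued in their original
 * order.  The caller still needs to clear lwWaiting and
 * PGSemaphorePost() everything on the wake-up list.
 */
static void
collect_shared_waiters(LWLock *lock, PGPROC **wakeup_head)
{
    PGPROC     *prev = NULL;        /* last waiter kept in the queue */
    PGPROC     *proc = lock->head;
    PGPROC     *wake_tail = NULL;

    *wakeup_head = NULL;
    while (proc != NULL)
    {
        PGPROC     *next = proc->lwWaitLink;

        if (!proc->lwExclusive)
        {
            /* unlink the shared waiter from the lock's queue ... */
            if (prev)
                prev->lwWaitLink = next;
            else
                lock->head = next;
            /* ... and append it to the wake-up list */
            proc->lwWaitLink = NULL;
            if (wake_tail)
                wake_tail->lwWaitLink = proc;
            else
                *wakeup_head = proc;
            wake_tail = proc;
        }
        else
            prev = proc;
        proc = next;
    }
    lock->tail = prev;              /* only exclusive waiters remain */
}

This builds the second list only transiently, by relinking nodes in a single pass, rather than maintaining two permanent queues.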
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Simon Riggs wrote:
On Tue, 2009-03-17 at 17:41 -0400, Jignesh K. Shah wrote:

I did a quick test with the patch. Unfortunately it improves my numbers even with the default setting 0 (not sure whether I should be pleased or sad - definitely no overhead, in fact it seems to help performance a bit. NOTE: the logic is the same; the implementation is slightly different for the default setting.)

OK, I bite. 25% gain from doing nothing??? You're stretching my... err, credulity. I like the train of thought for setting 1 and it is worth investigating, but something feels wrong somewhere.

Actually I think I am hurting my credibility here, since I cannot explain the improvement with the patch while still using the default logic (though implemented a different way: I compare sequential waiters using fields from the previous proc structure instead of comparing with a constant boolean). But the change was necessary to allow it to handle multiple algorithms and yet stay sleek and not bloated.

In the next couple of weeks I plan to test the patch on a different x64 based system to do sanity testing on a lower number of cores and also try out other workloads ...

Regards, Jignesh
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 03/16/09 13:39, Simon Riggs wrote:
On Wed, 2009-03-11 at 22:20 -0400, Jignesh K. Shah wrote:

A tunable does not impact existing behavior

Why not put the tunable parameter into the patch and then show the test results with it in? If there is no overhead, we should then be able to see that.

I did a patch where I define lock_wakeup_algorithm, with a default value of 0 and a range of 0 to 32. It basically handles three types of algorithms and 32 different permutations, such that when lock_wakeup_algorithm is set to:
0 => default logic of wakeup (only 1 exclusive, or all sequential shared)
1 => wake up all sequential exclusives or all sequential shared
2 <= n <= 32 => wake up the first n waiters irrespective of exclusive or shared

I did a quick test with the patch. Unfortunately it improves my numbers even with the default setting 0 (not sure whether I should be pleased or sad - definitely no overhead, in fact it seems to help performance a bit. NOTE: the logic is the same; the implementation is slightly different for the default setting.)

My pre-patch numbers typically peaked around 136,000 tpm. With the patch and setting lock_wakeup_algorithm=0:
PEAK: 962: 512: Medium Throughput: 161121.000 Avg Medium Resp: 0.051
When lock_wakeup_algorithm=1, my peak increases to:
PEAK: 1560: 832: Medium Throughput: 176577.000 Avg Medium Resp: 0.086
(Couldn't recreate the 184K+ result... need to check that.)

I still haven't tested the rest of the 2-32 values, but you get the point: the patch is quite flexible with various types of permutations and has no overhead. Do give it a try on your own setup, play with the values, and compare it with your original builds.

Regards, Jignesh

*** lwlock.c	Tue Mar 17 12:27:49 2009
--- lwlock.c.orig	Wed Mar 11 12:48:27 2009
***************
*** 87,93 ****
  static int	lock_addin_request = 0;
  static bool	lock_addin_request_allowed = true;
- int	LWLockWakeupAlgorithm;
  
  #ifdef LWLOCK_STATS
  static int	counts_for_pid = 0;
--- 87,92 ----
***************
*** 564,570 ****
  	PGPROC	   *head;
  	PGPROC	   *proc;
  	int			i;
- 	int			runq;
  
  	PRINT_LWDEBUG("LWLockRelease", lockid, lock);
--- 563,568 ----
***************
*** 612,631 ****
  	 * as many waiters as want shared access.
  	 */
  	proc = head;
! 	if (LWLockWakeupAlgorithm || !proc->lwExclusive)
! 	{
! 		if (LWLockWakeupAlgorithm <= 1)
! 		{
  			while (proc->lwWaitLink != NULL &&
! 				   (proc->lwExclusive == proc->lwWaitLink->lwExclusive))
  				proc = proc->lwWaitLink;
- 		}
- 		else
- 		{
- 			runq = LWLockWakeupAlgorithm;
- 			while (proc->lwWaitLink != NULL && --runq)
- 				proc = proc->lwWaitLink;
- 		}
  	}
  	/* proc is now the last PGPROC to be released */
  	lock->head = proc->lwWaitLink;
--- 610,620 ----
  	 * as many waiters as want shared access.
  	 */
  	proc = head;
! 	if (!proc->lwExclusive)
! 	{
  		while (proc->lwWaitLink != NULL &&
! 			   !proc->lwWaitLink->lwExclusive)
  			proc = proc->lwWaitLink;
  	}
  	/* proc is now the last PGPROC to be released */
  	lock->head = proc->lwWaitLink;

*** lwlock.h.orig	Tue Mar 17 14:27:10 2009
--- lwlock.h	Tue Mar 17 08:24:40 2009
***************
*** 103,106 ****
--- 103,107 ----
  extern void RequestAddinLWLocks(int n);
+ extern int	LWLockWakeupAlgorithm;
  
  #endif   /* LWLOCK_H */

*** guc.c.orig	Tue Mar 17 07:30:26 2009
--- guc.c	Tue Mar 17 07:47:10 2009
***************
*** 57,62 ****
--- 57,63 ----
  #include "postmaster/walwriter.h"
  #include "regex/regex.h"
  #include "storage/bufmgr.h"
+ #include "storage/lwlock.h"
  #include "storage/fd.h"
  #include "tcop/tcopprot.h"
  #include "tsearch/ts_cache.h"
***************
*** 167,172 ****
--- 168,174 ----
  static bool assign_maxconnections(int newval, bool doit, GucSource source);
  static bool assign_autovacuum_max_workers(int newval, bool doit, GucSource source);
  static bool assign_effective_io_concurrency(int newval, bool doit, GucSource source);
+ stat
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Simon Riggs wrote:
On Wed, 2009-03-11 at 22:20 -0400, Jignesh K. Shah wrote:

A tunable does not impact existing behavior

Why not put the tunable parameter into the patch and then show the test results with it in? If there is no overhead, we should then be able to see that.

Can do, though I will need a quick primer on adding tunables. Is it on wiki.postgresql.org anywhere?

-Jignesh
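For what it's worth, an integer tunable in the 8.4-era tree is wired up along these lines (the GUC name, context, group, and description here are illustrative; the entry layout matches the existing ConfigureNamesInt[] entries in guc.c: generic fields, then variable pointer, boot value, min, max, assign hook, show hook):

/* in lwlock.c: the backing variable */
int		LWLockWakeupAlgorithm = 0;

/* in lwlock.h: make it visible to guc.c */
extern int	LWLockWakeupAlgorithm;

/* in guc.c, appended to the ConfigureNamesInt[] table: */
{
	{"lock_wakeup_algorithm", PGC_SIGHUP, RESOURCES,
		gettext_noop("Selects how many LWLock waiters are awakened "
					 "when a lock is released."),
		NULL
	},
	&LWLockWakeupAlgorithm,
	0, 0, 32, NULL, NULL
},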
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 03/16/09 11:08, Gregory Stark wrote:
"Jignesh K. Shah" writes:

Generally when there is a dead constant... signs of a classic bottleneck ;-) We will be fixing one to get to another... but knocking down bottlenecks is the name of the game, I think.

Indeed. I think the bottleneck we're interested in addressing here is why you say you weren't able to saturate the 64 threads with 64 processes when they're all RAM-resident. From what I see you still have 400+ processes? Is that right?

Anyone claiming they run a CPU-intensive workload is not always telling the truth. They *think* they are CPU-intensive for the right part, but there could be memory misses, they could be doing statistics where they are not really stressing the intended code, or they could be parsing through the results without stressing the backend, while still claiming to be CPU-intensive (though from a different perspective). So yes, a single process, especially a client, cannot claim to keep the backend 100% active; but neither can a connection pooler, since it still has to do some other stuff within the process.

-Jignesh
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
decibel wrote:
On Mar 13, 2009, at 3:02 PM, Jignesh K. Shah wrote:

vmstat seems similar to wakeup some

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re mf pi po fr de sr s0 s1 s2 sd   in   sy   cs us sy id
 63 0 0 45535728 38689856 0 14 0 0 0 0 0 0 0 0 0 163318 334225 360179 47 17 36
 85 0 0 45436736 38690760 0 6 0 0 0 0 0 0 0 0 0 165536 347462 365987 47 17 36
 59 0 0 45405184 38681752 0 11 0 0 0 0 0 0 0 0 0 155153 326182 345527 47 16 37
 53 0 0 45393816 38673344 0 6 0 0 0 0 0 0 0 0 0 152752 317851 340737 47 16 37
 66 0 0 45378312 38651920 0 11 0 0 0 0 0 0 0 0 0 150979 304350 336915 47 16 38
 67 0 0 45489520 38639664 0 5 0 0 0 0 0 0 0 0 0 157188 318958 351905 47 16 37
 82 0 0 45483600 38633344 0 10 0 0 0 0 0 0 0 0 0 168797 348619 375827 47 17 36
 68 0 0 45463008 38614432 0 9 0 0 0 0 0 0 0 0 0 173020 376594 385370 47 18 35
 54 0 0 45451376 38603792 0 13 0 0 0 0 0 0 0 0 0 161891 342522 364286 48 17 35
 41 0 0 45356544 38605976 0 5 0 0 0 0 0 0 0 0 0 167250 358320 372469 47 17 36
 27 0 0 45323472 38596952 0 11 0 0 0 0 0 0 0 0 0 165099 344695 364256 48 17 35

The good news is there's now at least enough runnable procs. What I find *extremely* odd is that the CPU usage is almost dead constant...

Generally when there is a dead constant... signs of a classic bottleneck ;-) We will be fixing one to get to another... but knocking down bottlenecks is the name of the game, I think.

-Jignesh
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
decibel wrote:
On Mar 11, 2009, at 10:48 PM, Jignesh K. Shah wrote:

Fair enough... Well, I am now appealing to all who have fairly decent sized hardware to try it out and see whether there are "gains", "no-changes" or "regressions" based on your workload. Also it will help if you report the number of CPUs when you respond back, to help collect feedback.

Do you have a self-contained test case? I have several boxes with 16 cores worth of Xeon with 96GB I could try it on (though you might not care about having "only" 16 cores :P)

I don't have authority over iGen, but I am pretty sure that with sysbench we should be able to recreate the test case, or even with dbt-2. That said, the patch should be pretty easy to apply to your own workloads (where more feedback is much appreciated). On x64, 16 cores might bring out the problem faster too, since typically they run at 2.5X higher clock frequency. Try it out: stock build vs patched builds.

-Jignesh
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Simon Riggs wrote:
On Wed, 2009-03-11 at 16:53 -0400, Jignesh K. Shah wrote:

1200: 2000: Medium Throughput: -1781969.000 Avg Medium Resp: 0.019

I think you need to iron out bugs in your test script before we put too much stock into the results generated. Your throughput should not be negative. I'd be interested in knowing the number of S and X locks requested, so we can think about this from first principles. My understanding is that the ratio of S:X is about 10:1. Do you have more exact numbers?

Simon, that's a known bug in the test: the first time it reaches the max number of users, it throws a negative number. But all other numbers are pretty much accurate. Generally the users:transactions count depends on think time.

-Jignesh
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Lock Id              Mode       State      Count
XidGenLock           Exclusive  Acquired     116
ProcArrayLock        Exclusive  Acquired     117
CLogControlLock      Exclusive  Acquired     176
WALInsertLock        Exclusive  Acquired     370
FirstLockMgrLock     Shared     Acquired     793
FirstBufMappingLock  Shared     Acquired     799
ProcArrayLock        Shared     Acquired     882
SInvalReadLock       Shared     Acquired    1827

Lock Id              Mode       State      Combined Time (ns)
WALInsertLock                   Acquired            52915
CLogControlLock                 Acquired            78332
XidGenLock                      Acquired           103026
FirstLockMgrLock                Acquired           392836
FirstBufMappingLock             Acquired          2919896
CLogControlLock      Shared     Waiting           5342211
CLogControlLock      Exclusive  Waiting           9172692
ProcArrayLock        Shared     Waiting          18186546
ProcArrayLock                   Acquired         22478607
XidGenLock           Exclusive  Acquired         26561444
SInvalReadLock                  Acquired         29012891
CLogControlLock      Exclusive  Acquired         30490159
WALInsertLock        Exclusive  Waiting          35055294
FirstLockMgrLock     Exclusive  Acquired         47077668
FirstBufMappingLock  Exclusive  Acquired         47460381
WALInsertLock        Exclusive  Acquired         99288648
ProcArrayLock        Exclusive  Waiting         104221100
ProcArrayLock        Exclusive  Acquired        356644807
SInvalReadLock       Exclusive  Acquired        357530794

So clearly, waking up a few more exclusives than just 1 seems to help scalability (actual improvement mileage varies, but there is some positive improvement). One more minor change that I can think of doing: where we currently wake all sequential shared waiters but only 1 exclusive waiter, I am going to change it to wake up all sequential waiters of whatever kind you get, so in essence it does a similar heart-valve type approach of doing little bursts rather than tying them to 1 exclusive only.

-Jignesh

Jignesh K. Shah wrote:
Now with a modified fix (not the original one that I proposed, but something that works like a heart valve: it opens and shuts to the minimum default way, thus controlling how many waiters are waked up)

Time: Users: Throughput: Response
60: 8: Medium Throughput: 7774.000 Avg Medium Resp: 0.004
120: 16: Medium Throughput: 16874.000 Avg Medium Resp: 0.004
180: 24: Medium Throughput: 25159.000 Avg Medium Resp: 0.004
240: 32: Medium Throughput: 33216.000 Avg Medium Resp: 0.005
300: 40: Medium Throughput: 42418.000 Avg Medium Resp: 0.005
360: 48: Medium Throughput: 49655.000 Avg Medium Resp: 0.005
420: 56: Medium Throughput: 58149.000 Avg Medium Resp: 0.005
480: 64: Medium Throughput: 66558.000 Avg Medium Resp: 0.005
540: 72: Medium Throughput: 74474.000 Avg Medium Resp: 0.005
600: 80: Medium Throughput: 82241.000 Avg Medium Resp: 0.005
660: 88: Medium Throughput: 90336.000 Avg Medium Resp: 0.005
720: 96: Medium Throughput: 99101.000 Avg Medium Resp: 0.006
780: 104: Medium Throughput: 106028.000 Avg Medium Resp: 0.006
840: 112: Medium Throughput: 113196.000 Avg Medium Resp: 0.006
900: 120: Medium Throughput: 119174.000 Avg Medium Resp: 0.006
960: 128: Medium Throughput: 129408.000 Avg Medium Resp: 0.006
1020: 136: Medium Throughput: 134433.000 Avg Medium Resp: 0.007
1080: 144: Medium Throughput: 143121.000 Avg Medium Resp: 0.007
1140: 152: Medium Throughput: 144603.000 Avg Medium Resp: 0.007
1200: 160: Medium Throughput: 148604.000 Avg Medium Resp: 0.008
1260: 168: Medium Throughput: 150274.000 Avg Medium Resp: 0.009
1320: 176: Medium Throughput: 150581.000 Avg Medium Resp: 0.010
1380: 184: Medium Throughput: 146912.000 Avg Medium Resp: 0.012
1440: 192: Medium Throughput: 143945.000 Avg Medium Resp: 0.013
1500: 200: Medium Throughput: 144029.000 Avg Medium Resp: 0.015
1560: 208: Medium Throughput: 143468.000 Avg Medium Resp: 0.016
1620: 216: Medium Throughput: 144367.000 Avg Medium Resp: 0.017
1680: 224: Medium Throughput: 148340.000 Avg Medium Resp: 0.017
1740: 232: Medium Throughput: 148842.000 Avg Medium Resp: 0.018
1800: 240: Medium Throughput: 149533.000 Avg Medium Resp: 0.019
1860: 248: Medium Throughput: 152334.000 Avg Medium Resp: 0.019
1920: 256: Medium Throughput: 151521.000 Avg Medium Resp: 0.020
1980: 264: Medium Throughput: 148961.000 Avg Medium Resp: 0.022
2040: 272: Medium Throughput: 151270.000 Avg Medium Resp: 0.022
2100: 280: Medium
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Somebody else asked a question: this is actually a two-socket machine (128 threads), but one socket is disabled by the OS so only 64 threads are available... The idea being: let me choke one socket first with 100% CPU.

Forgot some data: with the second test above, CPU: 48% user, 18% sys, 35% idle. CPU increased from 46% used in the first test to 65% used; the corresponding throughput increase was not as large, but that is expected on an 8-threads-per-core server, since memory bandwidth and cache resources at a minimum are shared and only trivial tasks can scale 100%. Based on the above, I would guess that attaining closer to 100% utilization (it's hard to get past 90% with that many cores no matter what) will probably give another 10 to 15% improvement at most, to maybe 18/min throughput. It's also rather interesting that the 2000-connection case with wait times gets 17/min throughput and beats the 328-users-with-0-delay result above. I suspect the 'wake all' version is just faster. I would love to see a 'wake all shared, leave exclusives at front of queue' version, since that would not allow lock starvation.

Considering that there is one linked list, it is just easier to wake the sequentially selected few or wake them all up. If I go through the list trying to wake all the shared ones, then I essentially need to have another linked list to collect all the exclusives... I will retry the thundering herd of waking all waiters irrespective of shared or exclusive and see how that behaves. I think the biggest benefit is when the process is woken up and in reality is already on the CPU, checking the field to see whether the last guy who released the lock is allowing it to wake up or not. Still, I will try some more experiments. Definitely, reducing time spent "Waiting" on lock waits benefits, and making "Acquired" times more efficient results in more tpm per user. I will try another run with plain wake-up-all and see how the test behaves with the same parameters (0 think time).

-Jignesh
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Scott Carey wrote:
On 3/13/09 8:55 AM, "Kevin Grittner" wrote:
>>> "Jignesh K. Shah" wrote:

 usr sys wt idl sze
 38  11   0  50  64

The fact that you're maxing out at 50% CPU utilization has me wondering -- are there really 64 CPUs here, or are there 32 CPUs with "hyperthreading" technology (or something conceptually similar)?

-Kevin

It's a Sun T1000 or T2000 type box, which are 4 threads per processor core IIRC. It's in his first post: "UltraSPARC T2 based 1 socket (64 threads) and 2 socket (128 threads) servers that Sun sells." These processors use an in-order execution engine and fill the bubbles in the pipelines with SMT (the non-marketing name for hyperthreading). They are rather efficient at it though, more so than Intel's first stab at it. And Intel's next-generation chips, hitting the streets in servers in less than a month, have it again.

These are UltraSPARC T2 Plus, which is 8 threads per core (a la CMT for us). Though the CPU% reported by vmstat is based more on "scheduled in execution" than on what is executed by the computing engine of the core. So unless the thread is scheduled in execution 100%, it won't be executing. So if you want to read mpstat right: you may not be executing everything that is shown as executing, but you are definitely NOT going to execute anything that is not shown as executing. My goal is to reach a level where we can show PostgreSQL can effectively get to 100% CPU in vmstat/mpstat first...

-Jignesh
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Now with a modified fix (not the original one that I proposed, but something that works like a heart valve: it opens and shuts to the minimum default way, thus controlling how many waiters are waked up)

Time: Users: Throughput: Response
60: 8: Medium Throughput: 7774.000 Avg Medium Resp: 0.004
120: 16: Medium Throughput: 16874.000 Avg Medium Resp: 0.004
180: 24: Medium Throughput: 25159.000 Avg Medium Resp: 0.004
240: 32: Medium Throughput: 33216.000 Avg Medium Resp: 0.005
300: 40: Medium Throughput: 42418.000 Avg Medium Resp: 0.005
360: 48: Medium Throughput: 49655.000 Avg Medium Resp: 0.005
420: 56: Medium Throughput: 58149.000 Avg Medium Resp: 0.005
480: 64: Medium Throughput: 66558.000 Avg Medium Resp: 0.005
540: 72: Medium Throughput: 74474.000 Avg Medium Resp: 0.005
600: 80: Medium Throughput: 82241.000 Avg Medium Resp: 0.005
660: 88: Medium Throughput: 90336.000 Avg Medium Resp: 0.005
720: 96: Medium Throughput: 99101.000 Avg Medium Resp: 0.006
780: 104: Medium Throughput: 106028.000 Avg Medium Resp: 0.006
840: 112: Medium Throughput: 113196.000 Avg Medium Resp: 0.006
900: 120: Medium Throughput: 119174.000 Avg Medium Resp: 0.006
960: 128: Medium Throughput: 129408.000 Avg Medium Resp: 0.006
1020: 136: Medium Throughput: 134433.000 Avg Medium Resp: 0.007
1080: 144: Medium Throughput: 143121.000 Avg Medium Resp: 0.007
1140: 152: Medium Throughput: 144603.000 Avg Medium Resp: 0.007
1200: 160: Medium Throughput: 148604.000 Avg Medium Resp: 0.008
1260: 168: Medium Throughput: 150274.000 Avg Medium Resp: 0.009
1320: 176: Medium Throughput: 150581.000 Avg Medium Resp: 0.010
1380: 184: Medium Throughput: 146912.000 Avg Medium Resp: 0.012
1440: 192: Medium Throughput: 143945.000 Avg Medium Resp: 0.013
1500: 200: Medium Throughput: 144029.000 Avg Medium Resp: 0.015
1560: 208: Medium Throughput: 143468.000 Avg Medium Resp: 0.016
1620: 216: Medium Throughput: 144367.000 Avg Medium Resp: 0.017
1680: 224: Medium Throughput: 148340.000 Avg Medium Resp: 0.017
1740: 232: Medium Throughput: 148842.000 Avg Medium Resp: 0.018
1800: 240: Medium Throughput: 149533.000 Avg Medium Resp: 0.019
1860: 248: Medium Throughput: 152334.000 Avg Medium Resp: 0.019
1920: 256: Medium Throughput: 151521.000 Avg Medium Resp: 0.020
1980: 264: Medium Throughput: 148961.000 Avg Medium Resp: 0.022
2040: 272: Medium Throughput: 151270.000 Avg Medium Resp: 0.022
2100: 280: Medium Throughput: 149783.000 Avg Medium Resp: 0.024
2160: 288: Medium Throughput: 151743.000 Avg Medium Resp: 0.024
2220: 296: Medium Throughput: 155190.000 Avg Medium Resp: 0.026
2280: 304: Medium Throughput: 150955.000 Avg Medium Resp: 0.027
2340: 312: Medium Throughput: 147118.000 Avg Medium Resp: 0.029
2400: 320: Medium Throughput: 152768.000 Avg Medium Resp: 0.029
2460: 328: Medium Throughput: 161044.000 Avg Medium Resp: 0.028
2520: 336: Medium Throughput: 157926.000 Avg Medium Resp: 0.029
2580: 344: Medium Throughput: 161005.000 Avg Medium Resp: 0.029
2640: 352: Medium Throughput: 167274.000 Avg Medium Resp: 0.029
2700: 360: Medium Throughput: 168253.000 Avg Medium Resp: 0.031

With the final vmstats improving but still far from 100%:

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re mf pi po fr de sr s0 s1 s2 sd   in   sy   cs us sy id
 38 0 0 46052840 39345096 0 11 0 0 0 0 0 0 0 0 0 134137 290703 303518 40 14 45
 43 0 0 45656456 38882912 23 77 0 0 0 0 0 0 0 0 0 135820 272899 300749 40 15 45
 38 0 0 45650488 38816984 23 80 0 0 0 0 0 0 0 0 0 135009 272767 300192 39 15 46
 47 0 0 46020792 39187688 0 5 0 0 0 0 0 0 0 0 0 140473 285445 312826 40 14 46
 24 0 0 46143984 39326848 9 61 0 0 0 0 0 0 0 0 0 146194 308590 328241 40 15 45
 37 0 0 45465256 38757000 22 74 0 0 0 0 0 0 0 0 0 136835 293971 301433 38 14 48
 35 0 0 46017544 39308072 12 61 0 0 0 0 0 0 0 0 0 142749 312355 320592 42 15 43
 36 0 0 45456000 38744688 11 24 0 0 0 0 0 0 0 0 0 143566 303461 317683 41 15 43
 23 0 0 46007408 39291312 2 22 0 0 0 0 0 0 0 0 0 140246 300061 316663 42 15 43
 20 0 0 46029656 39281704 10 25 0 0 0 0 0 0 0 0 0 147787 291825 326387 43 15 42
 24 0 0 46131016 39288528 2 21 0 0 0 0 0 0 0 0 0 150796 310697 335791 43 15 42
 20 0 0 46109448 39269392 16 67 0 0 0 0 0 0 0 0 0 150075 315517 332881 43 16 41
 30 0 0 45540928 38710376 9 27 0 0 0 0 0 0 0 0 0 155214 316448 341472 43 16 40
 14 0 0 45987496 39270016 0 5 0 0 0 0 0 0 0 0 0 155028 333711 344207 44 16 40
 25 0 0 45981136 39263008 0 10 0 0 0 0 0 0 0 0 0 153968 327343 343776 45 16 39
 54 0 0 46062984 39259936 0 7 0 0 0 0 0 0 0 0 0 153721 315839 344732 45 16 39
 42 0 0 46099704 39252920 0 15 0 0 0 0 0 0 0 0 0 154629 323125 348798 45 16 39
 54 0 0 46068944 39230808 0 8 0 0 0 0 0 0 0 0 0 157166 340265 354135 46 17 37

But the real winner shows up in lockstat, where it seems to indicate that the stress on Waiting from ProcArrayLock is relieved (though shifting somewhere else, which is how locks work): #
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
In general, I suggest that it is useful to run tests with a few different types of pacing. Zero-delay pacing will not have a realistic number of connections, but will expose bottlenecks that are universal, and less controversial. Small-latency (100ms to 1s) tests are easy to make from the zero-delay ones, and help expose problems with connection count or other forms of 'non-active' concurrency. End-user realistic delays are app specific, and useful with larger holistic load tests (say, through the application interface). Generally, running them in this order helps because at each stage you are adding complexity. Based on your explanations, you've probably done much of this so far and your approach sounds solid to me. If the first case fails (zero delay, smaller user count), there is no way the others will pass.

I think I have done that before, so I can do that again by running the users at 0 think time, which will represent a "connection pool" that is highly utilized, and test how big the connection pool can be before the throughput tanks. This can be useful for app servers which set up connection pools of their own talking with PostgreSQL.

-Jignesh

So I backed out my change and used the stock 8.4 snapshot that I had downloaded. With now 0 think time, I do runs with a lot fewer users... still I cannot get it to go to 100% CPU.

60: 8: Medium Throughput: 7761.000 Avg Medium Resp: 0.004
120: 16: Medium Throughput: 16876.000 Avg Medium Resp: 0.004
180: 24: Medium Throughput: 25359.000 Avg Medium Resp: 0.004
240: 32: Medium Throughput: 33104.000 Avg Medium Resp: 0.005
300: 40: Medium Throughput: 42200.000 Avg Medium Resp: 0.005
360: 48: Medium Throughput: 49996.000 Avg Medium Resp: 0.005
420: 56: Medium Throughput: 58260.000 Avg Medium Resp: 0.005
480: 64: Medium Throughput: 66289.000 Avg Medium Resp: 0.005
540: 72: Medium Throughput: 74667.000 Avg Medium Resp: 0.005
600: 80: Medium Throughput: 82632.000 Avg Medium Resp: 0.005
660: 88: Medium Throughput: 90211.000 Avg Medium Resp: 0.006
720: 96: Medium Throughput: 98236.000 Avg Medium Resp: 0.006
780: 104: Medium Throughput: 105517.000 Avg Medium Resp: 0.006
840: 112: Medium Throughput: 112921.000 Avg Medium Resp: 0.006
900: 120: Medium Throughput: 118256.000 Avg Medium Resp: 0.007
960: 128: Medium Throughput: 126499.000 Avg Medium Resp: 0.007
1020: 136: Medium Throughput: 133354.000 Avg Medium Resp: 0.007
1080: 144: Medium Throughput: 135826.000 Avg Medium Resp: 0.008
1140: 152: Medium Throughput: 121729.000 Avg Medium Resp: 0.012
1200: 160: Medium Throughput: 130487.000 Avg Medium Resp: 0.011
1260: 168: Medium Throughput: 123368.000 Avg Medium Resp: 0.013
1320: 176: Medium Throughput: 134649.000 Avg Medium Resp: 0.012
1380: 184: Medium Throughput: 136272.000 Avg Medium Resp: 0.013

Vmstat shows that the CPUs are hardly busy in the 64-cpu system (CPUs are reported busy when there is an active process assigned to the cpu):

-bash-3.2$ vmstat 30
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re mf pi po fr de sr s0 s1 s2 sd   in   sy   cs us sy id
 19 0 0 52691088 46220848 27 302 10 68 68 0 3 1 -0 -0 -0 13411 20762 26854 5 3 92
 0 0 0 45095664 39898296 0 455 0 0 0 0 0 0 0 0 0 698 674 295 0 0 100
 0 0 0 45040640 39867056 5 13 0 0 0 0 0 0 0 0 0 3925 4189 5721 0 0 99
 0 0 0 45038856 39864016 0 5 0 0 0 0 0 0 0 0 0 9479 8643 15205 1 1 98
 0 0 0 45037760 39862552 0 14 0 0 0 0 0 0 0 0 0 12088 9041 19890 2 1 98
 0 0 0 45035960 39860080 0 6 0 0 0 0 0 0 0 0 0 16590 11611 28351 2 1 97
 0 0 0 45034648 39858416 0 17 0 0 0 0 0 0 0 0 0 19192 13027 33218 3 1 96
 0 0 0 45032360 39855464 0 10 0 0 0 0 0 0 0 0 0 22795 16467 40392 4 1 95
 0 0 0 45030840 39853568 0 22 0 0 0 0 0 0 0 0 0 25349 18315 45178 4 1 94
 0 0 0 45027456 39849648 0 10 0 0 0 0 0 0 0 0 0 28158 22500 50804 5 2 93
 0 0 0 45000752 39832608 0 38 0 0 0 0 0 0 0 0 0 31332 25744 56751 6 2 92
 0 0 0 45010120 39836728 0 6 0 0 0 0 0 0 0 0 0 36636 29334 66505 7 2 91
 0 0 0 45017072 39838504 0 29 0 0 0 0 0 0 0 0 0 38553 32313 70915 7 2 91
 0 0 0 45011384 39833768 0 11 0 0 0 0 0 0 0 0 0 41186 35949 76275 8 3 90
 0 0 0 44890552 39826136 0 40 0 0 0 0 0 0 0 0 0 45123 44507 83665 9 3 88
 0 0 0 44882808 39822048 0 6 0 0 0 0 0 0 0 0 0 49342 53431 91783 10 3 87
 0 0 0 45003328 39825336 0 42 0 0 0 0 0 0 0 0 0 48516 42515 91135 10 3 87
 0 0 0 44999688 39821008 0 6 0 0 0 0 0 0 0 0 0 54695 48741 102526 11 3 85
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re mf pi po fr de sr s0 s1 s2 sd   in   sy   cs us sy id
 0 0 0 44980744 39806400 0 55 0 0 0 0 0 0 0 0 0 54968 51946 103245 12 4 84
 0 0 0 44992288 39812256 0 6 0 1 1 0 0 0 0 0 0 60506 58205 113911 13 4 83
 0 0 0 44875648 39802128 1 60 0 0 0 0 0 1 0 0 0 60485 66576 114081 13 4 83
 0 0 0 44848792 39795008 0 8 0 0 0 0
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Gregory Stark wrote:
"Jignesh K. Shah" writes:
Scott Carey wrote:
On 3/12/09 11:37 AM, "Jignesh K. Shah" wrote:

In general, I suggest that it is useful to run tests with a few different types of pacing. Zero-delay pacing will not have a realistic number of connections, but will expose bottlenecks that are universal, and less controversial.

I think I have done that before, so I can do that again by running the users at 0 think time, which will represent a "connection pool" that is highly utilized, and test how big the connection pool can be before the throughput tanks. This can be useful for app servers which set up connection pools of their own talking with PostgreSQL.

Keep in mind when you do this that it's not interesting to test a number of connections much larger than the number of processors you have. Once the system reaches 100% cpu usage, it would be a misconfigured connection pooler that kept more than that number of connections open.

Greg, unfortunately the problem is that I am trying to reach 100% CPU, which I cannot, and hence I am increasing the user count :-)

-Jignesh
8.4 Performance improvements: was Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Greg Smith wrote:
On Thu, 12 Mar 2009, Jignesh K. Shah wrote:

As soon as I get more "cycles" I will try variations of it, but it would help if others can try it out in their own environments to see if it helps their instances.

What you should do next is see whether you can remove the bottleneck your test is running into via using a connection pooler. That's what I think most informed people would do, were you to ask how to set up an optimal environment using PostgreSQL that aimed to serve thousands of clients. If that makes your bottleneck go away, that's what you should be recommending to customers who want to scale in this fashion too. If the bottleneck moves to somewhere else, that new hot spot might be one people care more about. Given that there are multiple good pooling solutions floating around already, it's hard to justify dumping coding and testing resources here if that just makes the problem move somewhere else.

It's great that you've identified an alternate scheduling approach that helps on your problematic test case, but you're a long way from having a full model of how changes to the locking model impact other database workloads. As for the idea of doing something in this area for 8.4, there are a significant number of performance-related changes already committed for that version that deserve more focused testing during beta. You're way too late to throw another one into that already crowded area.

On the other hand, I have taken up the task of showing 8.4 performance improvements over 8.3. Can we do a vote on which specific performance features we want to test? I can use dbt2 and dbt3 tests to see how 8.4 performs and compare it with 8.3. Also, if you have your own favorite test to try out, let me know. I have allocated some time for this task, so it is feasible for me to do this. Many of the improvements may not be visible through these standard tests, so feedback on testing methodology for those is also appreciated.

* Visibility map - reduce VACUUM overhead (I think I can time vacuum with some usage on both databases)
* Prefetch IO with posix_fadvise() - though I am not sure if it is supported on UNIX or not (but it can be tested by standard tests)
* Parallel pg_restore (can be tested with a big database dump)

Any more features that I can stress during the testing phase?

Regards, Jignesh
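On the prefetch item: 8.4's effective_io_concurrency prefetching ultimately comes down to a posix_fadvise() call, and availability of that call varies by platform, which is presumably the "supported or not" question above; where the call is missing, the feature is compiled out. A minimal sketch of the underlying call, with illustrative file handling rather than PostgreSQL's actual smgr code:

#include <fcntl.h>

/*
 * Hint the kernel to start reading the given range in the background,
 * so a later read() finds the blocks already in the page cache.
 * Returns 0 on success or an errno value.
 */
static int
prefetch_range(int fd, off_t offset, off_t len)
{
	return posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
}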
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Scott Carey wrote:
On 3/12/09 11:37 AM, "Jignesh K. Shah" wrote:

And again, this is the third time I am saying it: the test users also have some latency built into them, which is what is generally exploited to get more users than the number of CPUs on the system, and that's the point we want to exploit. Otherwise, if all new users began to do their job with no latency, then we would need 6+ billion CPUs to handle all possible users. Typically as an administrator (system and database) I can only tweak/control latencies within my domain, that is network, disk, CPUs etc., and those are what I am tweaking, coming to a *configured* environment and now trying to improve lock contention/waits in PostgreSQL so that we have an optimized setup.

In general, I suggest that it is useful to run tests with a few different types of pacing. Zero-delay pacing will not have a realistic number of connections, but will expose bottlenecks that are universal, and less controversial. Small-latency (100ms to 1s) tests are easy to make from the zero-delay ones, and help expose problems with connection count or other forms of 'non-active' concurrency. End-user realistic delays are app specific, and useful with larger holistic load tests (say, through the application interface). Generally, running them in this order helps because at each stage you are adding complexity. Based on your explanations, you've probably done much of this so far and your approach sounds solid to me. If the first case fails (zero delay, smaller user count), there is no way the others will pass.

I think I have done that before, so I can do that again by running the users at 0 think time, which will represent a "connection pool" that is highly utilized, and test how big the connection pool can be before the throughput tanks. This can be useful for app servers which set up connection pools of their own talking with PostgreSQL.

-Jignesh
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 03/12/09 15:10, Alvaro Herrera wrote:
Tom Lane wrote:
Scott Carey writes:

They are not meaningless. It is certainly more to understand, but the test is entirely valid without that. In a CPU-bound / RAM-bound case, as concurrency increases you look for the throughput trend, the %CPU use trend and the context switch rate trend. More information would be useful, but the test is validated by the evidence that it is held up by lock contention.

Er ... *what* evidence? There might be evidence somewhere that proves that, but Jignesh hasn't shown it. The available data suggests that the first-order performance limiter in this test is something else. Otherwise it should be possible to max out the performance with a lot less than 1000 active backends.

With 200ms of think time, as Jignesh just said, 1000 users does not equate to 1000 active backends. (It's probably closer to 100 backends, given an avg. response time of ~20ms.) Something that might be useful for him to report is the avg number of active backends for each data point... short of doing select * from pg_stat_activity and removing the IDLE entries, is there any other clean way to get that information?

If there is no other latency, then active backends should be active users * 10ms/200ms, i.e. active users/20, on average. However, the number is still lower than that, since an active user can still be waiting for locks, either on CPU (spinning) or sleeping (shown by the increase in average response time of execution, which includes the wait). Also, to date I am primarily interested in active backends which are waiting to acquire locks, since I find that making that more efficient gives me the biggest return on my buck: lower response time and higher throughput.

-Jignesh
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 03/12/09 13:48, Scott Carey wrote:
On 3/11/09 7:47 PM, "Tom Lane" wrote:

All I'm adding is that it makes some sense to me, based on my experience in CPU/RAM-bound scalability tuning. It was expressed that the test itself didn't even make sense. I was wrong in my understanding of what the change did. If it wakes ALL waiters up, there is an indeterminate amount of time a lock will wait. However, if instead of waking up all of them, it only wakes up the shared readers and leaves all the exclusive ones at the front of the queue, there is no possibility of starvation, since those exclusives will be at the front of the line after the wake-up batch.

As for this being a use case that is important:

* SSDs will drive the % of use cases that are not I/O bound up significantly over the next couple of years. All postgres installations with less than about 100GB of data TODAY could avoid being I/O bound with current SSD technology, and those less than 2TB can do so as well, but at high expense or with less proven technology like the ZFS L2ARC flash cache.
* Intel will have a mainstream CPU that handles 12 threads (6 cores, 2 threads each) at the end of this year. Mainstream two-CPU systems will have access to 24 threads and be common in 2010. Higher-end 4-CPU boxes will have access to 48 CPU threads. Hardware thread count is only going up. This is the future.

SSDs are precisely my motivation for doing RAM-based tests with PostgreSQL. While I am waiting for my SSDs to arrive, I started to emulate SSDs by putting the whole database in RAM, which in a sense is better than SSDs, so if we can tune with RAM disks then SSDs will be covered.

What we have is a pool of 2000 users, and we start making each user do a series of transactions on different rows and see how much the database can handle linearly before some bottleneck (system or database) kicks in and there can be no more linear increase in active users. Many times there is a drop after reaching some number of active users. If all 2000 users can scale linearly, then another test with say 2500 can be executed... The idea is to find the limit we can reach before there are typically no system resources remaining to be exploited.

That said, the test kit that I am using is a lightweight OLTP-ish workload which a user runs against a pre-known schema, and between the various transactions that it does, it emulates a wait time of 200ms. In a sense it is emulating a real user who clicks, then waits to see what he got, and does another click which results in another transaction happening. (Not exactly, but you get the point.) Like all such workloads, it is generally used to find bottlenecks in systems before putting production stuff on them.

In my current environment I am running similar workloads and seeing how many users can go to the point where the system has no more CPU resources available to sustain linear growth in tpm. Generally, as many of you mentioned, you will see disk latency, network latency, CPU resource problems, etc. And that's the work I am doing right now. I am working around network latency by using a private network and improving operating system tunables to improve efficiency out there; I am improving disk latency by putting the data in RAM (and soon on SSDs). However, if I still cannot consume all the CPU, then it means I am probably hit by locks. Using the PostgreSQL DTrace probes I can see what's happening.
At low user counts (100 users) my lock profiles from a user point of view are as follows:

# dtrace -q -s 84_lwlock.d 1764

Lock Id               Mode        State       Count
ProcArrayLock         Shared      Waiting         1
CLogControlLock       Shared      Acquired        2
ProcArrayLock         Exclusive   Waiting         3
ProcArrayLock         Exclusive   Acquired       24
XidGenLock            Exclusive   Acquired       24
FirstLockMgrLock      Shared      Acquired       25
CLogControlLock       Exclusive   Acquired       26
FirstBufMappingLock   Shared      Acquired       55
WALInsertLock         Exclusive   Acquired       75
ProcArrayLock         Shared      Acquired      178
SInvalReadLock        Shared      Acquired      378

Lock Id               Mode        State       Combined Time (ns)
SInvalReadLock                    Acquired                 29849
ProcArrayLock         Shared      Waiting                  92261
ProcArrayLock                     Acquired                951470
FirstLockMgrLock      Exclusive   Acquired               1069064
CLogControlLock       Exclusive   Acquired               1295551
ProcArrayLock         Exclusive   Waiting                1758033
FirstBufMappingLock   Exclusive   Acquired
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 03/12/09 11:13, Kevin Grittner wrote:

Scott Carey wrote: "Kevin Grittner" wrote: I'm a lot more interested in what's happening between 60 and 180 than over 1000, personally. If there was a RAID involved, I'd put it down to better use of the numerous spindles, but when it's all in RAM it makes no sense.

If there is enough lock contention and a common lock case is a short-lived shared lock, it makes perfect sense. Fewer readers are blocked waiting on writers at any given time. Readers can 'cut' in line ahead of writers within a certain scope (only up to the number waiting at the time a shared lock is at the head of the queue). Essentially this clumps up shared and exclusive locks into larger streaks, and allows for higher shared lock throughput.

You misunderstood me. I wasn't addressing the effects of his change, but rather the fact that his test shows a linear improvement in TPS up to 1000 connections for a 64-thread machine which is dealing entirely with RAM -- no disk access. Where's the bottleneck that allows this to happen? Without understanding that, his results are meaningless. -Kevin

Every user has a think time (200ms) to wait before doing the next transaction, which results in idle time and theoretically allows other users to run in between.

-Jignesh
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 03/11/09 22:01, Scott Carey wrote:

On 3/11/09 3:27 PM, "Kevin Grittner" wrote: I'm a lot more interested in what's happening between 60 and 180 than over 1000, personally. If there was a RAID involved, I'd put it down to better use of the numerous spindles, but when it's all in RAM it makes no sense.

If there is enough lock contention and a common lock case is a short-lived shared lock, it makes perfect sense. Fewer readers are blocked waiting on writers at any given time. Readers can 'cut' in line ahead of writers within a certain scope (only up to the number waiting at the time a shared lock is at the head of the queue). Essentially this clumps up shared and exclusive locks into larger streaks, and allows for higher shared lock throughput. Exclusive locks may be delayed, but will NOT be starved, since on the next iteration, a streak of exclusive locks will occur first in the list and they will all process before any more shared locks can go.

This will even help on a single-CPU system if it is read dominated, lowering read latency and slightly increasing write latency.

If you want to make this more fair, instead of freeing all shared locks, limit the count to some number, such as the number of CPU cores. Perhaps rather than wake-up-all-waiters=true, the parameter can be an integer representing how many shared locks can be freed at once if an exclusive lock is encountered.

Well, I am waking up not just shared but shared and exclusives. However, I like your idea of waking up the next N waiters, where N matches the number of CPUs available. In my case it is 64, so yes, this works well: with all 64 waiters running, one of them will be able to take the lock immediately, and hence no cycles are wasted where nobody gets the lock. That is often the case when you wake up only 1 waiter and hope that the process is already on a CPU and able to acquire the lock. The probability of acquiring the lock within the next few cycles is much lower for a single waiter than when 64 such processes get the chance and then fight it out based on who is already on a CPU. That way the period where nobody holds the lock is reduced, which helps to cut out "artifact" idle time on the system.

As soon as I get more "cycles" I will try variations of it, but it would help if others could try it out in their own environments to see if it helps their instances.
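As a minimal sketch of that capped wake-up, patterned on the 8.4-era lwlock.c excerpt quoted in the proposal elsewhere in this thread (lwWaitLink queue, head pointer); wake_limit is a hypothetical knob, not an existing GUC:

    /*
     * Sketch: detach up to wake_limit waiters from the front of the
     * queue, regardless of lock mode, instead of one exclusive waiter
     * or the leading run of shared waiters.
     */
    int     nwoken = 1;

    proc = head;
    while (proc->lwWaitLink != NULL && nwoken < wake_limit)
    {
        proc = proc->lwWaitLink;
        nwoken++;
    }
    /* proc is now the last PGPROC to be released */
    lock->head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    /* prevent additional wakeups until a retryer gets to run */
    lock->releaseOK = false;

-Jignesh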
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Tom Lane wrote:

Scott Carey writes: If there is enough lock contention and a common lock case is a short-lived shared lock, it makes perfect sense. Fewer readers are blocked waiting on writers at any given time. Readers can 'cut' in line ahead of writers within a certain scope (only up to the number waiting at the time a shared lock is at the head of the queue). Essentially this clumps up shared and exclusive locks into larger streaks, and allows for higher shared lock throughput. Exclusive locks may be delayed, but will NOT be starved, since on the next iteration, a streak of exclusive locks will occur first in the list and they will all process before any more shared locks can go.

That's a lot of sunny assertions without any shred of evidence behind them... The current LWLock behavior was arrived at over multiple iterations and is not lightly to be toyed with IMHO. Especially not on the basis of one benchmark that does not reflect mainstream environments. Note that I'm not saying "no". I'm saying that I want a lot more evidence *before* we go to the trouble of making this configurable and asking users to test it. regards, tom lane

Fair enough. I am now appealing to everyone who has fairly decent-sized hardware to try it out and see whether there are "gains", "no changes" or "regressions" for your workload. It will also help collect feedback if you report the number of CPUs when you respond.

Regards, Jignesh -- Jignesh Shah http://blogs.sun.com/jkshah The New Sun Microsystems,Inc http://sun.com/postgresql -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Tom Lane wrote: "Kevin Grittner" writes: I'm wondering about the testing methodology. Me too. This test case seems much too far away from real world use to justify diddling low-level locking behavior; especially a change that is obviously likely to have very negative effects in other scenarios. In particular, I think it would lead to complete starvation of would-be exclusive lockers in the face of competition from a steady stream of shared lockers. AFAIR the existing behavior was designed to reduce the odds of that, not for any other purpose. regards, tom lane Hi Tom, The test case is not that far fetched from real world.. Plus if you read my proposal I clearly mention a tunable for it so that we can set and hence obviously not impact 99% of the people who don't care about it but still allow the flexibility of the 1% of the people who do care about scalability when they go on bigger system. The fact that it is a tunable (and obviously not the default way) there is no impact to existing behavior. My test case clearly shows that Exclusive lockers ARE benefited from it otherwise I would have not seen the huge impact on throughput. A tunable does not impact existing behavior but adds flexibility for those using PostgreSQL on high end systems. Plus doing it the tunable way on PostgreSQL 8.4 will convince many people that I know to quickly adopt PostgreSQL 8.4 just because of the benefit it brings on systems with many cpus/cores/threads. All I am requesting is for the beta to have that tunable. Its not hard, people can then quickly try default (off) or on or as Scott Carey mentioned a more flexible of default, all or a fixed integer number (for people to experiment). Regards, Jignesh -- Jignesh Shah http://blogs.sun.com/jkshah The New Sun Microsystems,Inc http://sun.com/postgresql -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
On 03/11/09 18:27, Kevin Grittner wrote:

"Jignesh K. Shah" wrote: Rerunning similar tests on a 64-thread UltraSPARC T2plus based server config (IO is not a problem... all in RAM .. no disks):

Time: Users: Type: TPM: Response Time
 60:  100: Medium Throughput:  10552.000 Avg Medium Resp: 0.006
120:  200: Medium Throughput:  22897.000 Avg Medium Resp: 0.006
180:  300: Medium Throughput:  33099.000 Avg Medium Resp: 0.009
240:  400: Medium Throughput:  44692.000 Avg Medium Resp: 0.007
300:  500: Medium Throughput:  56455.000 Avg Medium Resp: 0.007
360:  600: Medium Throughput:  67220.000 Avg Medium Resp: 0.008
420:  700: Medium Throughput:  77592.000 Avg Medium Resp: 0.009
480:  800: Medium Throughput:  87277.000 Avg Medium Resp: 0.011
540:  900: Medium Throughput:  98029.000 Avg Medium Resp: 0.012
600: 1000: Medium Throughput: 102547.000 Avg Medium Resp: 0.023

I'm wondering about the testing methodology. If there is no I/O, I wouldn't expect performance to improve after you have all the CPU threads busy. (OK, so there might be some brief blocking that would make the optimal number of connections somewhat above 64, but 1000???) What's the bottleneck which allows additional connections to improve the throughput? Network latency? I'm a lot more interested in what's happening between 60 and 180 than over 1000, personally. If there was a RAID involved, I'd put it down to better use of the numerous spindles, but when it's all in RAM it makes no sense. -Kevin

Kevin,

The problem is that the CPUs are not all busy; there are plenty of idle cycles, since PostgreSQL ends up in situations where the backends are all waiting for exclusive lock acquires. In cases where there is, say, one CPU, waking up one or a few waiters is more efficient. However, when you have 64 or 128 or 256 CPUs (as in my case), waking up one waiter is inefficient: only one waiter will be allowed to run, while the other waiters still wake up, spin to acquire the lock, find they are still not allowed, and go back to sleep.

The testing methodology is: assuming we can get fast storage, can PostgreSQL still scale to use, say, 32, 64, 128, 256 CPUs? I am probably just ahead of the curve of widespread usage here, but I want to make sure PostgreSQL is already well tested for it. And yes, I still have plenty of unused CPU, so the goal is to make sure that if the system can handle it, so can PostgreSQL.

Regards, Jignesh
[PERFORM] Proposal of tunable fix for scalability of 8.4
Hello All,

As you know, one of the things I have constantly been doing is using benchmark kits to see how we can scale PostgreSQL on the UltraSPARC T2 based 1-socket (64 threads) and 2-socket (128 threads) servers that Sun sells. During last PgCon 2008 http://www.pgcon.org/2008/schedule/events/72.en.html you might remember that I mentioned that ProcArrayLock is pretty hot when you have many users.

Rerunning similar tests on a 64-thread UltraSPARC T2plus based server config, I found that even with an 8.4 snapshot that I took I was still having similar problems (IO is not a problem... all in RAM .. no disks):

Time: Users: Type: TPM: Response Time
  60:  100: Medium Throughput:  10552.000 Avg Medium Resp: 0.006
 120:  200: Medium Throughput:  22897.000 Avg Medium Resp: 0.006
 180:  300: Medium Throughput:  33099.000 Avg Medium Resp: 0.009
 240:  400: Medium Throughput:  44692.000 Avg Medium Resp: 0.007
 300:  500: Medium Throughput:  56455.000 Avg Medium Resp: 0.007
 360:  600: Medium Throughput:  67220.000 Avg Medium Resp: 0.008
 420:  700: Medium Throughput:  77592.000 Avg Medium Resp: 0.009
 480:  800: Medium Throughput:  87277.000 Avg Medium Resp: 0.011
 540:  900: Medium Throughput:  98029.000 Avg Medium Resp: 0.012
 600: 1000: Medium Throughput: 102547.000 Avg Medium Resp: 0.023
 660: 1100: Medium Throughput: 100503.000 Avg Medium Resp: 0.044
 720: 1200: Medium Throughput:  99506.000 Avg Medium Resp: 0.065
 780: 1300: Medium Throughput:  95474.000 Avg Medium Resp: 0.089
 840: 1400: Medium Throughput:  86254.000 Avg Medium Resp: 0.130
 900: 1500: Medium Throughput:  91947.000 Avg Medium Resp: 0.139
 960: 1600: Medium Throughput:  94838.000 Avg Medium Resp: 0.147
1020: 1700: Medium Throughput:  92446.000 Avg Medium Resp: 0.173
1080: 1800: Medium Throughput:  91032.000 Avg Medium Resp: 0.194
1140: 1900: Medium Throughput:  88236.000 Avg Medium Resp: 0.221
runDynamic: uCount = 2000 delta = 1900
runDynamic: ALL Threads Have Been created
1200: 2000: Medium Throughput: -1352555.000 Avg Medium Resp: 0.071
1260: 2000: Medium Throughput:  88872.000 Avg Medium Resp: 0.238
1320: 2000: Medium Throughput:  88484.000 Avg Medium Resp: 0.248
1380: 2000: Medium Throughput:  90777.000 Avg Medium Resp: 0.231
1440: 2000: Medium Throughput:  90769.000 Avg Medium Resp: 0.229

You will notice that throughput drops around 1000 users. Nothing new; you have already heard me mention that a zillion times.

Now, while working on this today, I was going through LWLockRelease, like I have probably done quite a few times before, to see what can be done. The quick synopsis is that LWLockRelease releases the lock and wakes up the next waiter to take over: if the next waiter is waiting for exclusive, it wakes up only that waiter, and if the next waiter is waiting on shared, it goes through all the following shared waiters and wakes them all up.

Earlier last year I had tried various ways of doing intelligent wakeups (finding all shared waiters together and waking them up; coming up with a different lock type and waking multiple of them up simultaneously, which ended up defining a new lock mode), and of course none of them were stellar enough to make an impact.

Today I tried something else: forget the distinction of exclusive and shared and just wake them all up. So I changed the code from

/*
 * Remove the to-be-awakened PGPROCs from the queue. If the front
 * waiter wants exclusive lock, awaken him only. Otherwise awaken
 * as many waiters as want shared access.
 */
proc = head;
if (!proc->lwExclusive)
{
    while (proc->lwWaitLink != NULL &&
           !proc->lwWaitLink->lwExclusive)
        proc = proc->lwWaitLink;
}
/* proc is now the last PGPROC to be released */
lock->head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
/* prevent additional wakeups until retryer gets to run */
lock->releaseOK = false;

to basically wake them all up:

/*
 * Remove the to-be-awakened PGPROCs from the queue. If the front
 * waiter wants exclusive lock, awaken him only. Otherwise awaken
 * as many waiters as want shared access.
 */
proc = head;
//if (!proc->lwExclusive)
if (1)
{
    while (proc->lwWaitLink != NULL && 1) // !proc->lwWaitLink->lwExclusive)
        proc = proc->lwWaitLink;
}
/* proc is now the last PGPROC to be released */
lock->head = proc->lwWaitLink;
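Since both hacked conditions are always true, the change effectively reduces to "walk to the tail of the wait queue and detach everyone, regardless of lock mode" - roughly this sketch, using the same fields as the excerpt above:

    /* wake-all variant: take the entire queue in one batch */
    proc = head;
    while (proc->lwWaitLink != NULL)
        proc = proc->lwWaitLink;
    /* proc is the tail, so its lwWaitLink is NULL and the queue empties */
    lock->head = NULL;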
Re: [PERFORM] [GENERAL] PostgreSQL TPC-H test result?
Moving this thread to the Performance alias as it might make more sense for folks searching on this topic:

Greg Smith wrote: On Tue, 9 Sep 2008, Amber wrote: I read something from http://monetdb.cwi.nl/projects/monetdb/SQL/Benchmark/TPCH/index.html saying that PostgreSQL can't give the correct result of some TPC-H queries. Jignesh Shah at Sun ran into that same problem. It's mentioned briefly in his presentation at http://blogs.sun.com/jkshah/entry/postgresql_east_2008_talk_postgresql on pages 26 and 27. 5 of the 22 reference TPC-H queries (4, 5, 6, 10, 14) returned zero rows immediately for his tests. Looks like the MonetDB crew is saying it does that on queries 4,5,6,10,12,14,15 and that 20 takes too long to run to generate a result. Maybe 12/15/20 were fixed by changes in 8.3, or perhaps there were subtle errors there that Jignesh didn't catch--it's not like he did a formal submission run, he was just kicking the tires. I suspect the difference on 20 was that his hardware and tuning were much better, so it probably did execute fast enough.

I redid a quick test with the same workload on one of my systems at SF 10, which is about 10GB (I hope it comes out properly displayed):

 Q   Time PG8.3.3 (Jignesh)   Time PG8.2.9 (from MonetDB)   Ratio
 1   429.01                    510                           0.84
 2     3.65                     54                           0.07
 3    33.49                    798                           0.04
 4     6.53 Empty               35 (E)                       0.19
 5     8.45 Empty                5.5 (E)                     1.54
 6    32.84 Empty              172 (E)                       0.19
 7   477.95                    439                           1.09
 8    58.55                    251                           0.23
 9   781.96                   2240                           0.35
10     9.03 Empty                6.1 (E)                     1.48
11     3.57 Empty               25                           0.14
12    56.11 Empty              179 (E)                       0.31
13    61.01                    140                           0.44
14    30.69 Empty              169 (E)                       0.18
15    32.81 Empty              168 (E)                       0.2
16    23.98                    115                           0.21
17   Did not finish            Did not finish
18    58.93                    882                           0.07
19    71.55                    218                           0.33
20   Did not finish            Did not finish
21   550.51                    477                           1.15
22     6.21                    Did not finish

All times are in seconds (sub-second where available). Ratio > 1 means 8.3.3 is slower; < 1 means 8.3.3 is faster.

My take on the results:

* I had to tweak the statement of Q1 in order to execute it (the TPC-H kit does not directly support PostgreSQL statements).

* Timings with 8.3.3 and a bit of tuning give a much better time overall. This was expected (some queries finish in 7% of the time that MonetDB reported; of the queries that worked, only Q7 & Q21 seem to have regressed).

* However, empty-row results occur consistently (in fact Q11 also returned empty for me, while it worked in their test). Queries: 4,5,6,10,11,12,14,15. (ACTION ITEM: I will start separate threads for each of those queries on the HACKERS alias to figure out the problem, since it looks like a functional problem to me and should be interesting to the hackers alias.)

* Two queries, 17 and 20, look like they will not finish (I let Q17 run for 18 hrs and it had still not completed; as for Q20, I killed it as it was approaching an hour). (ACTION ITEM: Not sure whether debugging for these queries will go to the hackers or the performance alias, but I will start a thread on them too.)

* It looks like a bit of tuning is required for Q1, Q7, Q9, Q21 to improve their overall times, especially understanding whether PostgreSQL is missing a more efficient plan for them. (ACTION ITEM: I will start separate threads on the performance alias to dig into those queries.)

I hope to start a separate thread for each query so we can track them more easily, and to provide explain analyze output for each one; let's see if there are any problems. Feedback welcome on what you want to see in each thread.
Regards, Jignesh -- Jignesh Shah http://blogs.sun.com/jkshah Sun Microsystems,Inc http://sun.com/postgresql -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] ProcArrayLock (The Saga continues)
Based on feedback after the sessions, I did a few more tests which might be useful to share.

One suggestion was to have each client do more work and reduce the number of clients. The iGen benchmark is flexible, so what I did was remove all think time from it and repeat the test until the scalability stops (this was done with CVS downloaded yesterday). Note that with this no-think-time setup, each client can be about 75% CPU busy from what I observed. Running it, I found that client scaling now saturates at about 60 (compared to 500 in the original test). The peak throughput was at about 50 users (using synchronous_commit=off).

Here is the interesting DTrace lock output (lock id, lock mode, and time in ns spent waiting for the lock in a 10-sec snapshot), just taking the last few top ones in ascending order.

With less than 20 users it is WALInsert at the top:

 52   Exclusive      721950129
  4   Exclusive      768537190
 46   Exclusive      842063837
  7   Exclusive     1031851713

With 35 users:

 52   Exclusive     2599074739
  4   Exclusive     2647927574
 46   Exclusive     2789581991
  7   Exclusive     3220008691

At about 50 users, the peak throughput that I saw earlier:

 46   Exclusive     3669210393
  4   Exclusive     6024966938
 52   Exclusive     6529168107
  7   Exclusive     9408290367

With about 60 users, where the throughput actually starts to drop:

 41   Exclusive     4570660567
 52   Exclusive    10706741643
 46   Exclusive    13152005125
  4   Exclusive    13550187806
  7   Exclusive    22146882562

With about 100 users (below the peak value):

 42   Exclusive      4238582775
 46   Exclusive      6773515243
  7   Exclusive      7467346038
 52   Exclusive      9846216440
  4   Shared        22528501166
  4   Exclusive    223043774037

So it seems that when both the shared and exclusive wait times for ProcArrayLock are in the top 2, the system is basically saturated in terms of the throughput it can handle. Optimizing the wait queues will help the shared case, which might help the exclusive case a bit, but eventually Exclusive on ProcArrayLock will limit scaling with as few as 60-70 users.

Lock hold times are below (though taken from a different run). With 30 users:

Lock Id    Mode         Combined Time (ns)
1616992    Exclusive            1199791629
      4    Exclusive            1399371867
     34    Exclusive            1426153620
1616978    Exclusive            1528327035
1616990    Exclusive            1546374298
1616988    Exclusive            1553461559
      5    Exclusive            2477558484

With 50+ users:

Lock Id    Mode         Combined Time (ns)
      4    Exclusive            1438509198
1616992    Exclusive            1450973466
1616978    Exclusive            1505626978
1616990    Exclusive            1850432217
1616988    Exclusive            2033226225
     34    Exclusive            2098542547
      5    Exclusive            3280151374

With 100 users:

Lock Id    Mode         Combined Time (ns)
1616992    Exclusive            1206516505
1616988    Exclusive            1486704087
1616990    Exclusive            1521900997
     34    Exclusive            1532815803
1616978    Exclusive            1541986895
      5    Exclusive            2179043424
      5                         2395098279   (why was 5 printing with a blank mode??)

Rerunning it with a slight variation of the script:

Lock Id    Mode         Combined Time (ns)
1616996    0                    1167708953
     36    0                    1291958451
      5    4299305160           1344486968
      4    0                    1347557908
1616978    0                    1377931882
     34    0                    1724752938
      5    0                    2079012548

The trend of 4's hold time looks similar to the previous runs, though the new kid is 5, with a mode that is neither 0 nor 1; I am not sure if that is causing problems. What mode is "4299305160" for Lock 5 (SInvalLock)?

Anyway, at this point the wait time for 4 increases to a point where the database is not scaling anymore. Any thoughts?

-Jignesh -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 2GB or not 2GB
Greg Smith wrote: On Wed, 28 May 2008, Josh Berkus wrote: shared_buffers: according to witnesses, Greg Smith presented at East that based on PostgreSQL's buffer algorithms, buffers above 2GB would not really receive significant use. However, Jignesh Shah has tested that on workloads with large numbers of connections, allocating up to 10GB improves performance. Lies! The only upper limit for non-Windows platforms I mentioned was suggesting those recent tests at Sun showed a practical limit in the low multi-GB range. I've run with 4GB usefully for one of the multi-TB systems I manage; the main index on the most frequently used table is 420GB, and anything I can do to keep the most popular parts of that pegged in memory seems to help. I haven't tried to isolate the exact improvement going from 2GB to 4GB with benchmarks though.

Yep, it's always the index that seems to benefit from high cache hits. In one of the recent tests, what I ended up doing was issuing a select count(*) from trade where t_id >= $1 and t_id < SOMEMAX just to kick off an index scan and get the index into memory first. So the bigger the buffer pool, the better the hit rate for the index in it, and the better the performance.

-Jignesh -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 2GB or not 2GB
Josh Berkus wrote: Folks, Subsequent to my presentation of the new annotated.conf at pgCon last week, there's been some argument about the utility of certain memory settings above 2GB. I'd like to hash those out on this list so that we can make some concrete recommendations to users. shared_buffers: according to witnesses, Greg Smith presented at East that based on PostgreSQL's buffer algorithms, buffers above 2GB would not really receive significant use. However, Jignesh Shah has tested that on workloads with large numbers of connections, allocating up to 10GB improves performance.

I have certainly seen performance improvements up to 10GB using EAStress. The delicate balance is between the file system cache and shared buffers. I think the initial gigabytes are more beneficial at the shared buffers level, and after that at the file system cache level. I am trying to remember Greg's presentation, where I think he suggested more like 50% of available RAM (e.g., on a 4GB system used just for PostgreSQL, it may not help to set more than 2GB, since you need memory for other stuff also). Right, Greg? But if you have 32GB RAM, I don't mind allocating 10GB to PostgreSQL, beyond which I find lots of other things begin to have an impact.

BTW, I am really +1 for just setting an AvailRAM tunable for PostgreSQL (like the example you showed in the tutorials) and doing default derivations for all other settings unless overridden manually. That way people don't forget to bump up wal_buffers or one of the others while bumping the rest, and then fight over why the hell they are not seeing what they are expecting.

-Jignesh -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmarks WAS: Sun Talks about MySQL
Joshua D. Drake wrote: On Mon, 28 Apr 2008 14:40:25 -0400 Gregory Stark <[EMAIL PROTECTED]> wrote: We certainly can pass TPC-C. I'm curious what you mean by 1/4 though? On similar hardware? Or the maximum we can scale to is 1/4 as large as Oracle? Can you point me to the actual benchmark runs you're referring to? I would be curious as well, considering there has been zero evidence provided to make such a statement. I am not saying it isn't true - it wouldn't be surprising to me if Oracle outperformed PostgreSQL in TPC-C - but I would sure like to see in general how well we do (or don't). Sincerely, Joshua D. Drake

I am sorry, but I am far behind on my emails. The best thing is to work with TPC-E benchmarks involving the community. (TPC-C's requirements are way too high on storage, and everybody seems to be getting on the TPC-E bandwagon slowly.) Where can I get the latest DBT5 (TPC-E) kit? Using the kit should allow me to recreate setups which can then be made available for various PostgreSQL performance engineers to look at.

Regards, Jignesh -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 12 disks raid setup
Franck Routier wrote: Hi, I am in the process of setting up a postgresql server with 12 SAS disks. I am considering two options: 1) set up a 12-disk RAID 10 array to get maximum raw performance from the system and put everything on it (i.e. the whole pg cluster, including WAL, and every tablespace); 2) set up 3 RAID 10 arrays of 4 disks, and dispatch my data on these disks via tablespaces: data1 = pg cluster + reference data (dimensions) tablespace, data2 = fact data tablespace, data3 = indices tablespace.

Option 2. In fact, I would also say: within one of the RAID 1 sets, use another soft partition and separate out pg_xlog as well. My 2 cents, based on my benchmarks.

-Jignesh ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine
Greg Smith wrote: On Fri, 15 Feb 2008, Peter Schuller wrote: Or is it a matter of PostgreSQL doing non-direct I/O, such that anything cached in shared_buffers will also be cached by the OS? PostgreSQL only uses direct I/O for writing to the WAL; everything else goes through the regular OS buffer cache unless you force it to do otherwise at the OS level (like some Solaris setups do with forcedirectio). This is one reason it still may not make sense to give an extremely high percentage of RAM to PostgreSQL even with improvements in managing it. Another is that shared_buffers memory has to be reconciled with disk at every checkpoint, where OS buffers do not. A third is that your OS may just be more efficient at buffering--it knows more about the underlying hardware, and the clock-sweep method used internally by PostgreSQL to simulate a LRU cache is not extremely sophisticated. However, don't feel limited by the general 25% rule; it's certainly worth exploring whether 50% or more works better for your workload. You'll have to benchmark that yourself though, and I'd suggest using pg_buffercache: http://www.postgresql.org/docs/8.3/static/pgbuffercache.html to get an idea just what the pages are being used for.

In the tests that I have done, mostly with forcedirectio on Solaris, I have seen gains when increasing the buffer cache up to somewhere around 10GB, and that's when things seem to take a turn. So in my case I am generally comfortable with Postgres using about 8-10GB, beyond which I am cautious. Also, in tests with UFS buffered for table/index versus forcedirectio, it seems to perform better with forcedirectio. However, if you do want to exploit the extra RAM with UFS, then you have to do some tuning of UFS in Solaris. Now, with machines with 32GB becoming common, this is something worth pursuing, depending on whether the storage can handle the directio load or not.

Regards, Jignesh ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
Greg Smith wrote: On Wed, 6 Feb 2008, Simon Riggs wrote: For me, it would be good to see a --parallel=n parameter that would allow pg_loader to distribute rows in "round-robin" manner to "n" different concurrent COPY statements. i.e. a non-routing version. Let me expand on this. In many of these giant COPY situations the bottleneck is plain old sequential I/O to a single process. You can almost predict how fast the rows will load using dd. Having a process that pulls rows in and distributes them round-robin is good, but it won't crack that bottleneck. The useful approaches I've seen for other databases all presume that the data files involved are large enough that on big hardware, you can start multiple processes running at different points in the file and beat anything possible with a single reader. If I'm loading a TB file, odds are good I can split that into 4 or more vertical pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4 loaders at once, and get way more than 1 disk worth of throughput reading. You have to play with the exact number, because if you push the split too far you introduce seek slowdown instead of improvements, but that's the basic design I'd like to see one day. It's not parallel loading that's useful for the cases I'm thinking about until something like this comes around.

Some food for thought here: most BI-type applications that do data conversion/cleansing may also end up sorting the data before it is loaded into the database, so starting parallel loaders at totally different points ruins that effort. A more pragmatic approach is to have each stream read the "next" rows from the input file: if there are N parallel streams, each one is offset by 1 from the others and jumps ahead by N rows, so the seeks are narrowed down to a few rows (ideally 1) instead of jumping 1/Nth of the file every time a read happens (a sketch of this interleaving is below). For example, to replicate this with dd and see the impact, use a big file with the seek option and different block sizes. Somebody out here once did that test and showed that "seek time" on the file being read is reduced significantly; and if the file system does intelligent prefetching (which unfortunately UFS in Solaris does not do best by default), all the reads for the next stream will already be in memory.
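A minimal sketch of that interleaving for line-oriented input; stream k of n touches only the lines the other streams skip. The copy_row() sink is hypothetical - in a real loader it would feed this stream's COPY connection:

    #include <stdio.h>

    /* hypothetical sink: hand one row to this stream's COPY connection */
    static void copy_row(const char *row) { (void) row; }

    /*
     * Stream k (0-based) of n: process every n-th line starting at line k.
     * Every stream still reads the file sequentially, so the OS page cache
     * and prefetch serve all n streams; only the parse/COPY work is split.
     */
    static void load_stream(const char *path, long k, long n)
    {
        FILE *f = fopen(path, "r");
        char  line[65536];
        long  lineno = 0;

        if (f == NULL)
            return;
        while (fgets(line, sizeof(line), f) != NULL)
        {
            if (lineno % n == k)
                copy_row(line);
            lineno++;
        }
        fclose(f);
    }

Regards, Jignesh ---(end of broadcast)--- TIP 6: explain analyze is your friend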
Re: [PERFORM] Benchmark Data requested
Commercial DB bulk loaders work the same way: they give you a fast-loader option provided that, in case of error, the whole table is truncated. I think this also has real-life advantages where PostgreSQL is used for datamarts, which are recreated every now and then from other systems and want fast loaders. So it's not just the benchmarking folks like me who will take advantage of such features. In fact, I have seen them force a "REPLACE TABLE" clause, which will truncate the table before loading, so there is no confusion about what happens to the original data in the table, and only then does it avoid the logs.

To be honest, it's not the WAL writes to the disk that I am worried about. According to my tests, async_commit comes pretty close to fsync=off and solves the WALWriteLock contention. We should maybe just focus on making it more efficient, which I think also involves WALInsertLock, which may not be entirely efficient. Also, all changes have to be add-on options and not replacements for existing load paths; I totally agree on that point. The guys in production support don't even like optimizer query plan changes, let alone a corrupt index. (In a previous role, I once spent two days trying to figure out why a particular query plan on another database changed in production.)

Simon Riggs wrote: On Tue, 2008-02-05 at 13:47 -0500, Jignesh K. Shah wrote: That sounds cool to me too.. How much work is it to make pg_bulkload work on 8.3? An integrated version is certainly more beneficial. Especially since I think it will also help other setups, like TPC-E, where this is a problem. If you don't write WAL then you can lose all your writes in a crash. That issue is surmountable on a table with no indexes, or even conceivably with one monotonically ascending index. With other indexes, if we crash then we have a likely-corrupt index. For most production systems I'm aware of, losing an index on a huge table is not anything you'd want to trade for performance. Assuming you've ever been knee-deep in it on a real server. Maybe we can have a "load mode" for a table where we skip writing any WAL, but if we crash we just truncate the whole table to nothing? Issue a WARNING if we enable this mode while any data is in the table. I'm nervous of it, but maybe people really want it? I don't really want to invent ext2 all over again, so we have to run an fsck on a table if we crash while loading. My concern is that many people would choose that and then blame us for delivering unreliable software. e.g. the direct path loader on Oracle used to corrupt a PK index if you loaded duplicate rows with it (whether it still does I couldn't care). That kind of behaviour is simply incompatible with production usage, even if it does well on benchmarks. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Benchmark Data requested
Hi Heikki,

Is there a way such an operation can be spawned as a worker process? Generally during such loading - which most people will do during "off-peak" hours - I expect additional CPU resources to be available. By delegating such additional work to worker processes, we should be able to capitalize on the additional cores in the system. Even if it is a single core, the mere fact that the loading process will eventually wait on a read from the input file (which cannot be non-blocking) means the OS can timeslice it well, letting the second process use those wait times for the index population work.

What do you think?

Regards, Jignesh

Heikki Linnakangas wrote: Dimitri Fontaine wrote: On Tuesday 05 February 2008, Simon Riggs wrote: I'll look at COPY FROM internals to make this faster. I'm looking at this now to refresh my memory; I already had some plans on the shelf. Maybe stealing some ideas from pg_bulkload could somewhat help here? http://pgfoundry.org/docman/view.php/1000261/456/20060709_pg_bulkload.pdf IIRC it's mainly about how to optimize index updating while loading data, and I've heard complaints along the lines of "this external tool has to know too much about PostgreSQL internals to be trustworthy as non-core code"... so... I've been thinking of looking into that as well. The basic trick pg_bulkload uses is to populate the index as the data is being loaded. There's no fundamental reason why we couldn't do that internally in COPY. Triggers or constraints that access the table being loaded would make it impossible, but we should be able to detect that and fall back to what we have now. What I'm basically thinking about is to modify the indexam API for building a new index, so that COPY would feed the tuples to the indexam, instead of the indexam opening and scanning the heap. The b-tree indexam would spool the tuples into a tuplesort as the COPY progresses, and build the index from that at the end as usual. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Benchmark Data requested
That sounds cool to me too. How much work is it to make pg_bulkload work on 8.3? An integrated version is certainly more beneficial, especially since I think it will also help other setups, like TPC-E, where this is a problem.

Regards, Jignesh

Simon Riggs wrote: On Tue, 2008-02-05 at 18:15 +0100, Dimitri Fontaine wrote: On Tuesday 05 February 2008, Simon Riggs wrote: I'll look at COPY FROM internals to make this faster. I'm looking at this now to refresh my memory; I already had some plans on the shelf. Maybe stealing some ideas from pg_bulkload could somewhat help here? http://pgfoundry.org/docman/view.php/1000261/456/20060709_pg_bulkload.pdf IIRC it's mainly about how to optimize index updating while loading data, and I've heard complaints along the lines of "this external tool has to know too much about PostgreSQL internals to be trustworthy as non-core code"... so... Yeah, the batch index updates are a cool feature. We should be able to do that internally also. Not going to try the no-WAL route again though. If we can get it running efficiently and in parallel, then that will be OK. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Benchmark Data requested
One of the problems with the "empty table optimization" is that if indexes have been created, the table is considered no longer empty.

Commercial databases have options like an "IRRECOVERABLE" clause, along with disk partitions and CPU partitions, for their bulk loaders. So one option turns off logging; the disk partitions drive multiple processes that read the various lines/blocks from the input file (and others that flush the buffer pools to disk); and the CPU partitions process the blocks/lines read, check their format, and put the rows in the buffer pool if successful.

Regards, Jignesh

Simon Riggs wrote: On Tue, 2008-02-05 at 15:05 +0000, Richard Huxton wrote: Only by locking the table, which serializes access, which then slows you down or at least restricts other options. Plus if you use pg_loader then you'll find only the first few rows optimized and all the rest not. Hmm - the table-locking requirement is true enough, but why would pg_loader cause problems after the first few rows? It runs a stream of COPY statements, so only the first would be optimized with the "empty table optimization". ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Benchmark Data requested
Gregory Stark wrote: Incidentally, we found some cases that Solaris was particularly bad at. Is there anybody in particular that would be interested in hearing about them? (Not meant to be a knock on Solaris; I'm sure there are other cases Linux or BSD handle poorly too.)

Send me the details and I can file bugs for Solaris on behalf of the community. Since I am involved in a lot of PostgreSQL testing on Solaris this year, I have a small list myself (mostly related to file system stuff, though). I know of the one regarding bonnie rewriting blocks that you sent out. (I still haven't done anything about it yet, but I finally have some test machines for such work instead of using my workstation to test things out. :-) But I am really interested in seeing which ones hit PostgreSQL performance/usability.

Thanks in advance. Regards, Jignesh ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Benchmark Data requested
TPC-H has two runs: the Power Run, which is a single stream (Q1-Q22 plus RF1, RF2), and the Throughput Run, which has "N" streams (N depends on scale) running simultaneously in a mixed sequence of the same queries plus the two update functions. During the throughput run you can expect to max out the CPU. But commercial databases generally have power runs performing quite well even on multi-core systems (Oracle, without RAC, has published with 144 cores on Solaris).

As for the IO system saturating the CPU, it is two-fold: the kernel fetching in the data, which saturates at some value, and in this case PostgreSQL reading the data and putting it in its buffer pool.

An example of how I use it is as follows: do a select query on a table such that it results in a table scan without actually returning any rows back, and keep throwing hardware (better storage) at it till it saturates the CPU. That's the practical max you can do with the CPU/OS combination (considering unlimited storage bandwidth). This is primarily used in guessing how fast one of the queries in TPC-H will complete. In my tests with PostgreSQL, I generally reach the CPU limit without even reaching the bandwidth of the underlying storage.

Just to give numbers: a single 2Gb Fibre Channel port can practically go up to 180 MB/sec, and single 4Gb ports have proven to go up to 360-370 MB/sec. So to saturate a 4Gb FC port, PostgreSQL has to be able to scan 370MB/sec without saturating the CPU. Then comes software striping, which allows multiple ports to be striped over, increasing the available bandwidth... now scanning has to be able to drive Nx370MB/sec (all on a single core). I had some numbers, and some limitations based on CPU frequency, block size, etc., but those were from the 8.1 days or so.

I think to take PostgreSQL a bit more high-end, we have to scale out these numbers first. Doing some sort of test in PostgreSQL farms for every release would actually help people see the amount of data that it can drive through. We could also work on some database operation metrics to gauge how much each release is improving over older releases; I have ideas for a few of them.

Regards, Jignesh

Gregory Stark wrote: "Jignesh K. Shah" <[EMAIL PROTECTED]> writes: Then the power run, which essentially runs one query at a time, should be able to utilize the full system (especially multi-core systems); unfortunately PostgreSQL can use only one core. (Plus, since this is read-only and there is no separate disk reader, all other processes are idle) and the system runs at 1/Nth capacity (where N is the number of cores/threads). Is the whole benchmark like this or is this just one part of it? Is the i/o system really able to saturate the cpu though? ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Benchmark Data requested
Doing it at low scales is not attractive. Commercial databases are publishing at scale factors of 1000 (about 1TB) to 10000 (10TB), with one at 30TB, so ideally tuning should start at the 1000 scale factor right now. Unfortunately, when I tried that before with PostgreSQL, a few of the problems were as follows:

The single-stream loader of PostgreSQL takes hours to load data (a single-stream load wastes all the extra cores out there). Multiple table loads (1 per table) spawned via script are a bit better but hit WAL problems. To avoid the WAL problems, I created the tables and load statements within the same transaction: faster, but you cannot create indexes before the load or it starts writing to the WAL... AND if indexes are created after the load, it takes about a day or so to create all the indices required. (Index creation is single-threaded, and creating multiple indices at the same time could result in running out of temporary "DISK" space, since the tables are so big; which means 1 thread at a time is the answer for tables that are really big.) It is slow. Boy, by this time most people doing TPC-H at the high end give up on PostgreSQL.

I have not even started partitioning the tables yet, since with the current framework you have to load the data separately into each table, which means for the TPC-H data you need "extra logic" to take that table data and split it into each partition child table. Not stuff that many people want to do by hand.

Then the power run, which essentially runs one query at a time, should be able to utilize the full system (especially multi-core systems); unfortunately PostgreSQL can use only one core. (Plus, since this is read-only and there is no separate disk reader, all other processes are idle, and the system runs at 1/Nth capacity, where N is the number of cores/threads.) (I am not sure here: with partitioned tables, do you get N processes running in the system when you scan the partitioned table?) Even off-loading work like "fetching the data into the buffer pool" to separate processes would help big time with this type of workload.

I would be happy to help out if folks here want to do work related to it. In fact, if you have time, I can request a project in one of the Sun benchmarking centers to see what we can learn with community members interested in understanding where PostgreSQL performs and fails.

Regards, Jignesh

Greg Smith wrote: On Mon, 4 Feb 2008, Simon Riggs wrote: Would anybody like to repeat these tests with the latest production versions of these databases (i.e. with PGSQL 8.3)? Do you have any suggestions on how people should run TPC-H? It looked like a bit of work to sort through how to even start this exercise. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Benchmark Data requested
Hi Simon,

I have some insight into TPC-H and how it works. First of all, I think it is a violation of TPC rules to publish numbers without auditing them first. So even if I do the test to show the better performance of PostgreSQL 8.3, I cannot post it here or in any public forum without going through the "process" (even though it is a partial benchmark, as they are just doing the equivalent of the power run of TPC-H). Maybe the PR side of the PostgreSQL team should email [EMAIL PROTECTED] about them and see what they have to say about that comparison.

On the technical side:

Remember that all TPC-H queries, when run sequentially on PostgreSQL, use only 1 core or virtual CPU, so it is a very bad test for a big system with PostgreSQL (same for MySQL too). Also, very importantly, unless you are running the UPDATE FUNCTIONS, which are separate queries, all of the Q1-Q22 queries are pure "READ-ONLY" queries.

Traditionally, I think PostgreSQL does lack "read speed", especially since it is bottlenecked by the size of the reads it does (BLOCKSIZE). Major databases provide multi-block parameters to do reads/writes in multiples of the block size, reducing IOPS; and for read-only scans they also have a read-ahead or prefetch size, generally bigger than the multi-block or extent size, to aid reads.

The scale factor is in terms of gigabytes, and hence using their max scale of 5 (5GB) is pretty useless, since most of the rows could be cached in modern systems. And comparing with 0.01 - what is that, 10MB? The L2 cache of a recent Intel CPU is probably bigger than that.

If you are doing tuning for the TPC-H queries, focus on a few of them. For example, Query 1 is very join-intensive, and if your CPU is not 100% used, then you have a problem in your IO to solve before tuning it. Another example: Query 16 is literally IO scan speed; many people use it to see if the database can scan at the "line speed" of the storage, and ending up at 100% CPU means the database cannot process that many rows (just to bring them in). In essence, each query exercises some combination of system features to highlight the performance. However, since it is an old benchmark, database companies have ended up "re-engineering" their technologies to gain advantage in this benchmark (hence it's time for a successor, in the works, called TPC-DS, which will have more than 100 such queries).

A few of the technologies that have really helped gain ground in the TPC-H world:

* Hash and/or range partitioning of tables (PostgreSQL 8.3 can do that, but the setup cost of writing the schema is great, especially since data has to be loaded into separate tables)

* Automated aggregated views - used by optimizers - database technology to keep the more frequently used aggregations updated in smaller views

* Cube view indexes - like bitmap but multidimensional (I think... but not sure)

That said, is it useful for regression testing in PostgreSQL farms? I would think yes, especially Q16.

Hope this helps.

Regards, Jignesh

Simon Riggs wrote: Can I ask for some help with benchmarking? There are some results here that show PostgreSQL is slower in some cases than Monet and MySQL. Of course these results were published immediately prior to 8.2 being released, plus run out-of-the-box, so without even basic performance tuning. Would anybody like to repeat these tests with the latest production versions of these databases (i.e. with PGSQL 8.3), and with some sensible tuning settings for the hardware used? It will be useful to get some blind tests with more sensible settings.
http://monetdb.cwi.nl/projects/monetdb//SQL/Benchmark/TPCH/ Multiple runs from different people on different hardware are useful, since they help to iron out differences in hardware and test methodology. So don't worry if you see somebody else doing this also. Thanks, ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] 1 or 2 servers for large DB scenario.
Hi David,

I have been running a few tests with 8.2.4, and here is what I have seen: if fsync=off is not an option (and it should not be an option :-) ), then the commit_delay=10 setting seems to help a lot in my OLTP runs. Granted, it will delay your transactions a bit, but the gain is big because the WAL writes end up being bigger writes under high load, and I got a good boost in performance due to that change (IIRC it was about 6-10%, depending on load and contention). So that might help out. (A sketch of the relevant settings appears after the quoted message below.)

Curiously, I did spend some time on why it helps with write contention. At least on Solaris, my observation is that the WAL writes then end up being bigger than 8K (the block size). This meant an overall reduction in IOPS on the filesystem holding the logs, and hence more IOPS capacity available to do more log writes. (Using an EAStress-type benchmark, it ended up doing somewhere between 128KB-256KB writes on the logs, which was pretty fascinating, since the benchmark drives a fair amount of WAL writes, and without commit_delay the disks were saturated pretty quickly.) Also, if the load is high, then the added delay per transaction is pretty much non-existent (at least that is what I observed with commit_delay=10 and commit_siblings left at the default).

Of course, as already replied, 8.3's async commit helps on top of commit_delay, so that's an option if the potential loss of a few transactions is acceptable.

-Jignesh

David Brain wrote: Hi, I'd appreciate some assistance in working through what would be the optimal configuration for the following situation. We currently have one large DB (~1.2TB on disk), that essentially consists of 1 table with somewhere in the order of 500 million rows. This database has daily inserts as well as being used for some semi-data-mining-type operations, so there are a fairly large number of indices on the table. The hardware storing this DB (a software RAID 6 array) seems to be very IO bound for writes, and this is restricting our insert performance to ~50TPS. As we need to achieve significantly faster insert performance, I have been considering splitting the table into 'new' and 'old' data, making inserts into the 'new' table (which will also help as there are really 1 insert, an update and some selects involved in populating the table), then moving the data over to the 'old' DB on a periodic basis. There would be new hardware involved; I'm thinking of HW RAID 10 to improve the write performance. The question really is: is it best to use two separate servers and databases (and have to come up with some copy process to move data from one to another), or to just add the faster storage hardware to the existing server and create a new tablespace for the 'new data' table on that hardware? Doing this would enable us to potentially move data more easily from new to old (we can't use partitioning because there is some logic involved in when things would need to be moved to 'old'). Are there any global resources that make just adding the faster storage to the existing box a bad idea (the wal_log springs to mind - although that could be moved too), that would make adding an additional server instead a better idea? Also, are there any settings that might boost our TPS on the existing hardware (fsync=off isn't an option.. (-: )? I have already significantly increased the various buffers, but this was mainly to improve select performance. Version of Postgresql is 8.2.3. Thanks, David.
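For reference, a sketch of the settings discussed above, as they would appear in postgresql.conf; the values are from the experiment described here, not a general recommendation:

    commit_delay = 10       # usec to wait before flushing WAL at commit
                            # (default 0 = off)
    commit_siblings = 5     # default; concurrent open transactions required
                            # before commit_delay is applied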
---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
I changed CLOG buffers to 16 (the one-line change is sketched after the quoted message below) and ran the test again:

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0   1027                       :tick-5sec

  /export/home0/igen/pgdata/pg_clog/0024   -2753028219296   1
  /export/home0/igen/pgdata/pg_clog/0025   -2753028211104   1

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  1   1027                       :tick-5sec

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  1   1027                       :tick-5sec

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0   1027                       :tick-5sec

  /export/home0/igen/pgdata/pg_clog/0025   -2753028194720   1

So Tom seems to be correct that it is a case of CLOG buffer thrashing. And since I saw the same problem with two different workloads, the number of people hitting this problem is probably pretty high. Also, I am a bit surprised that CLogControlFile did not show up as being hot; maybe because not many writes are going on, or maybe because I did not trace all 500 users to see their hot lock status.

Dmitri has another workload to test; I might try that out later on to see whether or not it causes a similar impact.

Of course, I haven't seen my throughput go up yet, since I am already CPU bound... but this is good, since the number of IOPS to the disk is reduced (and hence system calls). If I take this as my baseline number, I can then proceed to hunt for other bottlenecks. What's the view of the community: hunt down CPU utilization or lock waits next? Your votes are crucial on where I put my focus.

Another thing Josh B told me to check out was the wal_writer_delay setting. I have tried two settings with almost equal performance (with the CLOG 16 setting): one at 100ms and the other at the default 200ms. Based on the runs, it seemed that 100ms was slightly better than the default. (Plus the window for losing data is reduced from 600ms to 300ms.)

Thanks. Regards, Jignesh

Tom Lane wrote: "Jignesh K. Shah" <[EMAIL PROTECTED]> writes: So the ratio of reads vs writes to clog files is pretty huge.. It looks to me that the issue is simply one of not having quite enough CLOG buffers. Your first run shows 8 different pages being fetched and the second shows 10. Bearing in mind that we "pin" the latest CLOG page into buffers, there are only NUM_CLOG_BUFFERS-1 buffers available for older pages, so what we've got here is thrashing for the available slots. Try increasing NUM_CLOG_BUFFERS to 16 and see how it affects this test. regards, tom lane
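For anyone wanting to repeat this: the constant is a compile-time #define, not a GUC. In the 8.3-era sources it should live in src/include/access/clog.h (the path is from memory, so verify against your tree), and the change is a one-liner followed by a rebuild:

    /* src/include/access/clog.h */
    #define NUM_CLOG_BUFFERS    16      /* raised from the default of 8 */

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org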
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
Also, to give perspective on the equivalent writes on CLOG, I used the following script, which runs for 10 sec to track all writes to the clog directory, and here is what it came up with (this is with 500 users running):

# cat write.d
#!/usr/sbin/dtrace -s
syscall::write:entry
/execname=="postgres" &&
 dirname(fds[arg0].fi_pathname)=="/export/home0/igen/pgdata/pg_clog"/
{
    @write[fds[arg0].fi_pathname, arg1] = count();
}
tick-10sec
{
    exit(0);
}

# ./write.d
dtrace: script './write.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  3   1026                      :tick-10sec

  /export/home0/igen/pgdata/pg_clog/001E   -2753028277088   1

I modified read.d to do a 5-sec read:

# ./read.d
dtrace: script './read.d' matched 3 probes
CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN
  0   1027                       :tick-5sec

  /export/home0/igen/pgdata/pg_clog/001F   -2753028268896    1
  /export/home0/igen/pgdata/pg_clog/001F   -2753028252512    1
  /export/home0/igen/pgdata/pg_clog/001F   -2753028285280    2
  /export/home0/igen/pgdata/pg_clog/001F   -2753028277088    3
  /export/home0/igen/pgdata/pg_clog/001F   -2753028236128    3
  /export/home0/igen/pgdata/pg_clog/001E   -2753028285280    5
  /export/home0/igen/pgdata/pg_clog/001E   -2753028236128    9
  /export/home0/igen/pgdata/pg_clog/001E   -2753028277088   13
  /export/home0/igen/pgdata/pg_clog/001E   -2753028268896   15
  /export/home0/igen/pgdata/pg_clog/001E   -2753028252512   27

So the ratio of reads vs writes to clog files is pretty huge.

-Jignesh

Jignesh K. Shah wrote: Tom, Here is what I did: I started aggregating all read information. First, I had also added a group by pid (arg0, arg1, pid), and the counts were all coming out as 1. Then I just grouped by filename and location (arg0, arg1 of reads), and the counts came back as:

# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
    @read[fds[arg0].fi_pathname, arg1] = count();
}

# ./read.d
dtrace: script './read.d' matched 1 probe
^C

  /export/home0/igen/pgdata/pg_clog/0014   -2753028293472    1
  /export/home0/igen/pgdata/pg_clog/0014   -2753028277088    1
  /export/home0/igen/pgdata/pg_clog/0015   -2753028244320    2
  /export/home0/igen/pgdata/pg_clog/0015   -2753028268896   14
  /export/home0/igen/pgdata/pg_clog/0015   -2753028260704   25
  /export/home0/igen/pgdata/pg_clog/0015   -2753028252512   27
  /export/home0/igen/pgdata/pg_clog/0015   -2753028277088   28
  /export/home0/igen/pgdata/pg_clog/0015   -2753028293472   37

FYI, I pressed ctrl-C within less than a second.

So to me it seems that multiple processes are reading the same page from different pids. (This was with about 600 users active.) Apparently we do have a problem in that we are reading the same buffer address again (the same as it not being cached anywhere, or not being found in any cache).

I reran the lock wait script on a couple of processes and did not see CLogControlFileLock as a problem:

# ./83_lwlock_wait.d 14341

Lock Id            Mode        Count
WALInsertLock      Exclusive       1
ProcArrayLock      Exclusive      16

Lock Id            Combined Time (ns)
WALInsertLock                  383109
ProcArrayLock               198866236

# ./83_lwlock_wait.d 14607

Lock Id            Mode        Count
WALInsertLock      Exclusive       2
ProcArrayLock      Exclusive      15

Lock Id            Combined Time (ns)
WALInsertLock                   55243
ProcArrayLock                69700140

What will help you find out why it is reading the same page again?

-Jignesh

Jignesh K. Shah wrote: I agree with Tom; somehow I think increasing NUM_CLOG_BUFFERS is just pushing the symptom to a later value. I promise to look more into it before making any recommendations to increase NUM_CLOG_BUFFERS. Because though "iGen" showed improvements in that area by increasing num_clog_buffers, EAStress had shown no improvements.
Plus the reason I think this is not the problem in 8.3beta1 since the Lock Output clearly does not show CLOGControlFile as to be the issue which I had seen in earlier case. So I dont think that increasing NUM_CLOG_BUFFERS will change thing here. Now I dont understand the code pretty well yet I see three hotspots and not sure if they are relate
Re: [PERFORM] 8.3beta1 testing on Solaris
Hi George, I actually first saw the 4M/sec problem during an EAStress-type run with only 150 connections. I will try to do more of the testing Tom has requested today.

Regards, Jignesh

Gregory Stark wrote: "Jignesh K. Shah" <[EMAIL PROTECTED]> writes: CLOG data is not cached in any PostgreSQL shared memory segments and hence becomes the bottleneck as it has to constantly go to the filesystem to get the read data. This is the same bottleneck you discussed earlier. CLOG reads are cached in the Postgres shared memory segment, but only NUM_CLOG_BUFFERS of them, which defaults to 8 buffers of 8kb each. With 1,000 clients and the transaction rate you're running, you needed a larger number of buffers. Using the filesystem buffer cache is also an entirely reasonable solution, though. That's surely part of the logic behind not trying to keep more of the clog in shared memory. Do you have any measurements of how much time is being spent just doing the logical I/O to the buffer cache for the clog pages? 4MB/s seems like it's not insignificant, but your machine is big enough that perhaps I'm thinking at the wrong scale. I'm really curious whether you see any benefit from the vxid read-only transactions. I'm not sure how to get an apples-to-apples comparison though. Ideally just comparing it to CVS HEAD from immediately prior to the vxid patch going in. Perhaps calling some function which forces an xid to be allocated and seeing how much it slows down the benchmark would be a good substitute.
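As a rough way to answer the question about time spent on the logical I/O for clog pages, one could time the pg_clog reads directly. A minimal sketch, assuming the same pg_clog path as the runs above; filesystem-cache hits should show up as a tight low-latency band in the distribution:

#!/usr/sbin/dtrace -s
/* Latency distribution of postgres read() calls against pg_clog. */
syscall::read:entry
/execname == "postgres" &&
 dirname(fds[arg0].fi_pathname) == "/export/home0/igen/pgdata/pg_clog"/
{
        self->ts = timestamp;
}

syscall::read:return
/self->ts/
{
        @lat["pg_clog read latency (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
}

tick-10sec
{
        exit(0);
}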
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
Tom, Here is what I did: I started aggregating all read information. First I had also added a group by pid (arg0, arg1, pid) and the counts were all coming back as 1. Then I just grouped by filename and location (arg0, arg1 of reads) and the counts came back as:

# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
        @read[fds[arg0].fi_pathname, arg1] = count();
}

# ./read.d
dtrace: script './read.d' matched 1 probe
^C

  /export/home0/igen/pgdata/pg_clog/0014   -2753028293472        1
  /export/home0/igen/pgdata/pg_clog/0014   -2753028277088        1
  /export/home0/igen/pgdata/pg_clog/0015   -2753028244320        2
  /export/home0/igen/pgdata/pg_clog/0015   -2753028268896       14
  /export/home0/igen/pgdata/pg_clog/0015   -2753028260704       25
  /export/home0/igen/pgdata/pg_clog/0015   -2753028252512       27
  /export/home0/igen/pgdata/pg_clog/0015   -2753028277088       28
  /export/home0/igen/pgdata/pg_clog/0015   -2753028293472       37

FYI I pressed ctrl-c within less than a second.

So to me this seems that multiple processes are reading the same page from different pids. (This was with about 600 users active.) Apparently we do have a problem in that we are reading the same buffer address again. (Same as not being cached anywhere, or not finding it in cache anywhere.)

I reran the lock wait script on a couple of processes and did not see CLogControlFileLock as a problem:

# ./83_lwlock_wait.d 14341

        Lock Id              Mode            Count
  WALInsertLock         Exclusive                1
  ProcArrayLock         Exclusive               16

        Lock Id     Combined Time (ns)
  WALInsertLock                 383109
  ProcArrayLock              198866236

# ./83_lwlock_wait.d 14607

        Lock Id              Mode            Count
  WALInsertLock         Exclusive                2
  ProcArrayLock         Exclusive               15

        Lock Id     Combined Time (ns)
  WALInsertLock                  55243
  ProcArrayLock               69700140
#

What will help you find out why it is reading the same page again?

-Jignesh

Jignesh K. Shah wrote: I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS just pushes the symptom out to a later value.. I promise to look more into it before making any recommendations to increase NUM_CLOG_BUFFERs. Because though "iGen" showed improvements in that area from increasing num_clog_buffers, EAStress had shown no improvements.. Plus, the reason I think this is not the problem in 8.3beta1 is that the lock output clearly does not show CLogControlFile as the issue, which I had seen in the earlier case. So I don't think that increasing NUM_CLOG_BUFFERS will change things here. Now, I don't understand the code that well yet; I see three hotspots and am not sure if they are related to each other: * ProcArrayLock waits - causing waits as reported by the 83_lockwait.d script * SimpleLRUReadPage - causing read IOs as reported by iostat/rsnoop.d * GetSnapshotData - causing CPU utilization as reported by hotuser. But I will shut up and do more testing.

Regards, Jignesh

Tom Lane wrote: Josh Berkus <[EMAIL PROTECTED]> writes: Actually, 32 made a significant difference as I recall ... do you still have the figures for that, Jignesh? I'd want to see a new set of test runs backing up any call for a change in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that benchmarks using code from a few months back shouldn't carry a lot of weight. regards, tom lane
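The first, pid-grouped aggregation described above (where every count came back as 1) would have been just a key change to read.d; a sketch:

#!/usr/sbin/dtrace -s
/* Same as read.d, but with pid in the aggregation key, so identical
 * (file, buffer address) reads from different backends stay separate. */
syscall::read:entry
/execname == "postgres"/
{
        @read[pid, fds[arg0].fi_pathname, arg1] = count();
}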
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS just pushes the symptom out to a later value.. I promise to look more into it before making any recommendations to increase NUM_CLOG_BUFFERs.

Because though "iGen" showed improvements in that area from increasing num_clog_buffers, EAStress had shown no improvements.. Plus, the reason I think this is not the problem in 8.3beta1 is that the lock output clearly does not show CLogControlFile as the issue, which I had seen in the earlier case. So I don't think that increasing NUM_CLOG_BUFFERS will change things here.

Now, I don't understand the code that well yet; I see three hotspots and am not sure if they are related to each other:

* ProcArrayLock waits - causing waits as reported by the 83_lockwait.d script
* SimpleLRUReadPage - causing read IOs as reported by iostat/rsnoop.d
* GetSnapshotData - causing CPU utilization as reported by hotuser

But I will shut up and do more testing.

Regards, Jignesh

Tom Lane wrote: Josh Berkus <[EMAIL PROTECTED]> writes: Actually, 32 made a significant difference as I recall ... do you still have the figures for that, Jignesh? I'd want to see a new set of test runs backing up any call for a change in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that benchmarks using code from a few months back shouldn't carry a lot of weight. regards, tom lane
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
The problem I saw was first highlighted by EAStress runs with PostgreSQL on Solaris with 120-150 users. I just replicated it via the smaller internal benchmark we use here to recreate that problem. EAStress should be just fine to highlight it.. Just put pg_clog on O_DIRECT or something, so that all IOs go to disk, making it easier to observe.

In the meanwhile I will try to get more information.

Regards, Jignesh

Tom Lane wrote: Gregory Stark <[EMAIL PROTECTED]> writes: Didn't we already go through this? He and Simon were pushing to bump up NUM_CLOG_BUFFERS and you were arguing that the test wasn't representative and that clog.c would have to be reengineered to scale well to larger values. AFAIR we never did get any clear explanation of what the test case is. I guess it must be write-mostly, else lazy XID assignment would have helped this by reducing the rate of XID consumption. It's still true that I'm leery of a large increase in the number of buffers without reengineering slru.c. That code was written on the assumption that there were few enough buffers that a linear search would be fine. I'd hold still for 16, or maybe even 32, but I dunno how much impact that will have for such a test case. regards, tom lane
[PERFORM] PostgreSQL 8.3beta1 on Solaris testing case study
I thought I would update the performance alias too about our testing with PG 8.3beta1 on Solaris.

Regards, Jignesh

Background:
We were using PostgreSQL 8.3beta1 testing on our latest Sun SPARC Enterprise T5220 Server using Solaris 10 8/07. Generally, for performance benefits in Solaris, we put file systems on forcedirectio; we bypass the filesystem cache and go direct to disks.

Problem:
What we were observing was lots of reads happening, about 4MB/sec, on the file system holding $PGDATA and the database tables during an OLTP benchmark run. Initially we thought that our bufferpools were not big enough. But thanks to 64-bit builds we could use bigger bufferpools. However, even with extraordinary bufferpool sizes we still saw lots of reads going to the disks.

DTrace to the Rescue:
I modified iosnoop.d to just snoop on reads. The modified rsnoop.d is as follows:

$ cat rsnoop.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
        printf("pid %d reading %s\n", pid, fds[arg0].fi_pathname);
}

Based on it I found that most postgresql processes were doing lots of reads from the pg_clog directory. CLOG, or commit logs, keep track of transactions in flight. Writes of CLOG come from recording transaction commits (or aborts) or when an XLOG is generated. However, though I am not clear on the reads yet, it seems every process constantly reads it to get some status. CLOG data is not cached in any PostgreSQL shared memory segments and hence becomes the bottleneck, as it has to constantly go to the filesystem to get the read data.

Workaround for the high reads on CLOG on Solaris:
Start with the cluster $PGDATA on regular UFS (which is buffered and has logging enabled). Always create a new tablespace for your database on a forcedirectio-mounted file system, which bypasses the file system cache. This allows all PostgreSQL CLOG files to be cached in UFS, greatly reducing stress on the underlying storage. For writes, to the best of my knowledge, PostgreSQL will still fsync to force the CLOG writes onto the disks, so it is consistent. But the reads are spared from going to the disks and are returned from the cache.

Result:
With a rightly sized bufferpool, all database data can now be in the PostgreSQL cache, and hence reads are spared from the tablespaces. As for the PGDATA data, UFS will do the caching of CLOG files, etc., again sparing reads from going to the disks. In the end, what we achieve is a right-sized bufferpool where there are no reads required during a high-OLTP environment, and the disks are just busy doing the writes of updates and inserts.
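To corroborate a figure like the ~4MB/sec of reads above, one can sum the actual bytes returned by those reads once per second; a minimal sketch under the same assumption about the pg_clog path:

#!/usr/sbin/dtrace -s
/* Per-second read bandwidth out of pg_clog for postgres processes. */
syscall::read:entry
/execname == "postgres" &&
 dirname(fds[arg0].fi_pathname) == "/export/home0/igen/pgdata/pg_clog"/
{
        self->clog = 1;
}

syscall::read:return
/self->clog/
{
        /* arg0 at read:return is the byte count actually read */
        @bytes["pg_clog bytes read"] = sum(arg0 > 0 ? arg0 : 0);
        self->clog = 0;
}

tick-1sec
{
        printa("%s: %@d bytes/sec\n", @bytes);
        trunc(@bytes);
}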
[PERFORM] 8.3beta1 testing on Solaris
Update on my testing of 8.3beta1 on Solaris:

* CLOG reads
* Asynchronous commit benefit
* Hot CPU utilization

Regards, Jignesh

Background:
We were using PostgreSQL 8.3beta1 testing on our latest Sun SPARC Enterprise T5220 Server using Solaris 10 8/07 and a Sun Fire X4200 using Solaris 10 8/07. Generally, for performance benefits in Solaris, we put file systems on forcedirectio; we bypass the filesystem cache and go direct to disks.

Problem:
What we were observing was lots of reads happening, about 4MB/sec, on the file system holding $PGDATA and the database tables during an OLTP benchmark run. Initially we thought that our bufferpools were not big enough. But thanks to 64-bit builds we could use bigger bufferpools. However, even with extraordinary bufferpool sizes we still saw lots of reads going to the disks.

DTrace to the Rescue:
I modified iosnoop.d to just snoop on reads. The modified rsnoop.d is as follows:

$ cat rsnoop.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
        printf("pid %d reading %s\n", pid, fds[arg0].fi_pathname);
}

Based on it I found that most postgresql processes were doing lots of reads from the pg_clog directory. CLOG, or commit logs, keep track of transactions in flight. Writes of CLOG come from recording transaction commits (or aborts) or when an XLOG is generated. However, though I am not clear on the reads yet, it seems every process constantly reads it to get some status. CLOG data is not cached in any PostgreSQL shared memory segments and hence becomes the bottleneck, as it has to constantly go to the filesystem to get the read data.

# ./rsnoop.d
dtrace: script './rsnoop.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  49222                       read:entry pid 8739 reading /export/home0/igen/pgdata/pg_clog/000C
  0  49222                       read:entry pid 9607 reading /export/home0/igen/pgdata/pg_clog/000C
  0  49222                       read:entry pid 9423 reading /export/home0/igen/pgdata/pg_clog/000C
  0  49222                       read:entry pid 8731 reading /export/home0/igen/pgdata/pg_clog/000C
  0  49222                       read:entry pid 8719 reading /export/home0/igen/pgdata/pg_clog/000C
  0  49222                       read:entry pid 9019 reading /export/home0/igen/pgdata/pg_clog/000C
  1  49222                       read:entry pid 9255 reading /export/home0/igen/pgdata/pg_clog/000C
  1  49222                       read:entry pid 8867 reading /export/home0/igen/pgdata/pg_clog/000C

Later on, during another run, I added ustack() after the printf in the above script to get the function name also:

# ./rsnoop.d
dtrace: script './rsnoop.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  49222                       read:entry pid 10956 reading /export/home0/igen/pgdata/pg_clog/0011

              libc.so.1`_read+0xa
              postgres`SimpleLruReadPage+0x3e6
              postgres`SimpleLruReadPage_ReadOnly+0x9b
              postgres`TransactionIdGetStatus+0x1f
              postgres`TransactionIdDidCommit+0x42
              postgres`HeapTupleSatisfiesVacuum+0x21a
              postgres`heap_prune_chain+0x14b
              postgres`heap_page_prune_opt+0x1e6
              postgres`index_getnext+0x144
              postgres`IndexNext+0xe1
              postgres`ExecScan+0x189
              postgres`ExecIndexScan+0x43
              postgres`ExecProcNode+0x183
              postgres`ExecutePlan+0x9e
              postgres`ExecutorRun+0xab
              postgres`PortalRunSelect+0x47a
              postgres`PortalRun+0x262
              postgres`exec_execute_message+0x565
              postgres`PostgresMain+0xf45
              postgres`BackendRun+0x3f9

  0  49222                       read:entry pid 10414 reading /export/home0/igen/pgdata/pg_clog/0011

              libc.so.1`_read+0xa
              postgres`SimpleLruReadPage+0x3e6
              postgres`SimpleLruReadPage_ReadOnly+0x9b
              postgres`TransactionIdGetStatus+0x1f
              postgres`TransactionIdDidCommit+0x42
              postgres`HeapTupleSatisfiesVacuum+0x21a
              postgres`heap_prune_chain+0x14b
              postgres`heap_page_prune_opt+0x1e6
              postgres`index_getnext+0x144
              postgres`IndexNext+0xe1
              postgres`ExecScan+0x189
^C
              libc.so.1`_read+0xa
              postgres`SimpleLruReadPage+0x3e6
              postgres`SimpleLruReadPage_ReadOnly+0x9b
              postgres`TransactionIdGetStatus+0x1f
              postgres`TransactionIdDidCommit+0x42
              postgres`HeapTupleSatisfiesMVCC+0x34f
              postgres`index_getnext+0x29e
              postgres`IndexNext+0xe1
              postgres`ExecScan+0x189
              postgres`ExecIndexScan+0x43
              postgres`ExecProcNode+0x183
              postgres`ExecutePlan+0x9e
              postgres`ExecutorRun+0xab
              postgres`PortalRunSelect+0x47a
              postg
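The ustack() addition described above amounts to one extra action in rsnoop.d; a sketch (the original had no directory filter; the pg_clog predicate and the frame depth of 20 are added here as assumptions to cut noise):

#!/usr/sbin/dtrace -s
/* rsnoop.d plus a user stack trace for each pg_clog read. */
syscall::read:entry
/execname == "postgres" &&
 dirname(fds[arg0].fi_pathname) == "/export/home0/igen/pgdata/pg_clog"/
{
        printf("pid %d reading %s\n", pid, fds[arg0].fi_pathname);
        ustack(20);
}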
Re: [PERFORM] CLOG Patch
I tried with CLOG 24 also, and I got linear performance up to 1250 users, after which it started to tank. 32 got us to 1350 users before some other bottleneck overtook it.

Based on what Tom said earlier, it might then make sense to make it a tunable with a default of 8, but something one can change for a high number of users.

Thanks.

Regards, Jignesh

Simon Riggs wrote: On Fri, 2007-08-03 at 16:09 -0400, Jignesh K. Shah wrote: This patch seems to work well (both with the 32 and 64 values, but not with 16 and the default 8). Could you test at 24 please also? Tom has pointed out the additional cost of setting this higher, even in workloads that don't benefit from the I/O-induced contention reduction. Is there a way we can integrate this in 8.3? I just replied to Josh's thread on -hackers about this. This will improve out-of-box performance quite a bit for a high number of users (at least 30% in my OLTP test). Yes, that's good. Will this have a dramatic effect on a particular benchmark, or for what reason might we need this? Tom has questioned the use case here, so I think it would be good to explain a little more for everyone. Thanks.
[PERFORM] CLOG Patch
Hi Simon,

This patch seems to work well (both with the 32 and 64 values, but not with 16 and the default 8). Is there a way we can integrate this in 8.3? This will improve out-of-box performance quite a bit for a high number of users (at least 30% in my OLTP test).

Regards, Jignesh

Simon Riggs wrote: On Thu, 2007-07-26 at 11:27 -0400, Jignesh K. Shah wrote: However at 900 users, where the big drop in throughput occurs, it gives a different top "consumer" of time:

              postgres`LWLockAcquire+0x1c8
              postgres`SimpleLruReadPage+0x1ac
              postgres`TransactionIdGetStatus+0x14
              postgres`TransactionLogFetch+0x58

TransactionIdGetStatus doesn't directly call SimpleLruReadPage(). Presumably the compiler has been rearranging things?? Looks like you're out of clog buffers. It seems like the clog buffers aren't big enough to hold clog pages for long enough, and the SELECT FOR SHARE processing is leaving lots of additional read locks that are increasing the number of clog requests for older xids. Try the enclosed patch.

Index: src/include/access/clog.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/access/clog.h,v
retrieving revision 1.19
diff -c -r1.19 clog.h
*** src/include/access/clog.h   5 Jan 2007 22:19:50 -0000      1.19
--- src/include/access/clog.h   26 Jul 2007 15:44:58 -0000
***************
*** 29,35 ****

  /* Number of SLRU buffers to use for clog */
! #define NUM_CLOG_BUFFERS      8

  extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
--- 29,35 ----

  /* Number of SLRU buffers to use for clog */
! #define NUM_CLOG_BUFFERS      64

  extern void TransactionIdSetStatus(TransactionId xid, XidStatus status);
Re: [PERFORM] User concurrency thresholding: where do I look?
Yep, quite a bit of transactions.. But the piece that's slow is where it is clearing things up in CommitTransaction(). I am not sure how ProcArrayLock is designed to work, and hence it's not clear whether what we are seeing is what we should expect. Do we have some design doc on ProcArrayLock to understand its purpose?

Thanks.

Regards, Jignesh

Josh Berkus wrote: Simon, Well that's pretty weird. That code path clearly only happens once per transaction and ought to be fast. The other code paths that take ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData() ought to spend more time holding the lock. Presumably you are running with a fair number of SERIALIZABLE transactions? Given that this is a TPCC-analog, I'd assume that we are. Jignesh?
Re: [PERFORM] User concurrency thresholding: where do I look?
With CLOG 16 the drop comes at about 1150 users, with the following lock stats:

bash-3.00# ./4_lwlock_waits.d 16404

          Lock Id              Mode            Count
    ProcArrayLock            Shared                2
       XidGenLock         Exclusive                2
       XidGenLock            Shared                4
    WALInsertLock         Exclusive                7
  CLogControlLock            Shared                8
     WALWriteLock         Exclusive               46
    ProcArrayLock         Exclusive               64
  CLogControlLock         Exclusive              263

          Lock Id              Mode   Combined Time (ns)
       XidGenLock         Exclusive               528300
    ProcArrayLock            Shared               968800
    WALInsertLock         Exclusive              4090900
       XidGenLock            Shared             73987600
     WALWriteLock         Exclusive             86200700
    ProcArrayLock         Exclusive            130756000
  CLogControlLock            Shared            240471000
  CLogControlLock         Exclusive           4115158500

So I think 32 is a better option for CLOG buffers before ProcArrayLock becomes the bottleneck. Though I haven't yet seen what we can do about the ProcArrayLock problem.

Regards, Jignesh

Jignesh K. Shah wrote: Using CLOG Buffers 32 and the commit sibling check patch, I still see a drop at 1200-1300 users..

bash-3.00# ./4_lwlock_waits.d 18250

          Lock Id              Mode            Count
       XidGenLock            Shared                1
  CLogControlLock            Shared                2
    ProcArrayLock            Shared                2
       XidGenLock         Exclusive                4
  CLogControlLock         Exclusive               15
    WALInsertLock         Exclusive               18
     WALWriteLock         Exclusive               38
    ProcArrayLock         Exclusive               77

          Lock Id              Mode   Combined Time (ns)
       XidGenLock            Shared                88700
    WALInsertLock         Exclusive             69556000
    ProcArrayLock            Shared             95656800
       XidGenLock         Exclusive            139634100
  CLogControlLock         Exclusive            148822200
  CLogControlLock            Shared                16163
     WALWriteLock         Exclusive            332781800
    ProcArrayLock         Exclusive           5688265500

bash-3.00# ./4_lwlock_waits.d 18599

          Lock Id              Mode            Count
    ProcArrayLock            Shared                2
       XidGenLock         Exclusive                3
       XidGenLock            Shared                4
  CLogControlLock            Shared                5
    WALInsertLock         Exclusive               10
  CLogControlLock         Exclusive               21
     WALWriteLock         Exclusive               28
    ProcArrayLock         Exclusive               54

          Lock Id              Mode   Combined Time (ns)
       XidGenLock         Exclusive              5688800
    WALInsertLock         Exclusive             11424700
  CLogControlLock            Shared             55589100
    ProcArrayLock            Shared            135220400
     WALWriteLock         Exclusive            177906900
       XidGenLock            Shared            524146500
  CLogControlLock         Exclusive            524563900
    ProcArrayLock         Exclusive           5828744500

bash-3.00#
bash-3.00# ./6_lwlock_stack.d 4 18599

          Lock Id              Mode            Count
    ProcArrayLock            Shared                1
    ProcArrayLock         Exclusive               52

          Lock Id              Mode   Combined Time (ns)
    ProcArrayLock            Shared             41428300
    ProcArrayLock         Exclusive           3858386500

          Lock Id   Combined Time (ns)

              postgres`LWLockAcquire+0x1f0
              postgres`GetSnapshotData+0x120
              postgres`GetTransactionSnapshot+0x80
              postgres`PortalStart+0x198
              postgres`exec_bind_message+0x84c
              postgres`PostgresMain+0x17f8
              postgres`BackendRun+0x2f8
              postgres`ServerLoop+0x680
              postgres`PostmasterMain+0xda8
              postgres`main+0x3d0
              postgres`_start+0x17c
              Shared             41428300

              postgres`LWLockAcquire+0x1f0
              postgres`CommitTransaction+0x104
              postgres`CommitTransactionCommand+0xbc
              postgres`finish_xact_command+0x78
              postgres`exec_execute_message+0x42c
              postgres`PostgresMain+0x1838
              postgres`BackendRun+0x2f8
              postgres`ServerLoop+0x680
              postgres`PostmasterMain+0xda8
              postgres`main+0x3d0
              postgres`_start+0x17c
              Exclusive          3858386500

-Jignesh
Re: [PERFORM] User concurrency thresholding: where do I look?
Using CLOG Buffers 32 and the commit sibling check patch, I still see a drop at 1200-1300 users..

bash-3.00# ./4_lwlock_waits.d 18250

          Lock Id              Mode            Count
       XidGenLock            Shared                1
  CLogControlLock            Shared                2
    ProcArrayLock            Shared                2
       XidGenLock         Exclusive                4
  CLogControlLock         Exclusive               15
    WALInsertLock         Exclusive               18
     WALWriteLock         Exclusive               38
    ProcArrayLock         Exclusive               77

          Lock Id              Mode   Combined Time (ns)
       XidGenLock            Shared                88700
    WALInsertLock         Exclusive             69556000
    ProcArrayLock            Shared             95656800
       XidGenLock         Exclusive            139634100
  CLogControlLock         Exclusive            148822200
  CLogControlLock            Shared                16163
     WALWriteLock         Exclusive            332781800
    ProcArrayLock         Exclusive           5688265500

bash-3.00# ./4_lwlock_waits.d 18599

          Lock Id              Mode            Count
    ProcArrayLock            Shared                2
       XidGenLock         Exclusive                3
       XidGenLock            Shared                4
  CLogControlLock            Shared                5
    WALInsertLock         Exclusive               10
  CLogControlLock         Exclusive               21
     WALWriteLock         Exclusive               28
    ProcArrayLock         Exclusive               54

          Lock Id              Mode   Combined Time (ns)
       XidGenLock         Exclusive              5688800
    WALInsertLock         Exclusive             11424700
  CLogControlLock            Shared             55589100
    ProcArrayLock            Shared            135220400
     WALWriteLock         Exclusive            177906900
       XidGenLock            Shared            524146500
  CLogControlLock         Exclusive            524563900
    ProcArrayLock         Exclusive           5828744500

bash-3.00#
bash-3.00# ./6_lwlock_stack.d 4 18599

          Lock Id              Mode            Count
    ProcArrayLock            Shared                1
    ProcArrayLock         Exclusive               52

          Lock Id              Mode   Combined Time (ns)
    ProcArrayLock            Shared             41428300
    ProcArrayLock         Exclusive           3858386500

          Lock Id   Combined Time (ns)

              postgres`LWLockAcquire+0x1f0
              postgres`GetSnapshotData+0x120
              postgres`GetTransactionSnapshot+0x80
              postgres`PortalStart+0x198
              postgres`exec_bind_message+0x84c
              postgres`PostgresMain+0x17f8
              postgres`BackendRun+0x2f8
              postgres`ServerLoop+0x680
              postgres`PostmasterMain+0xda8
              postgres`main+0x3d0
              postgres`_start+0x17c
              Shared             41428300

              postgres`LWLockAcquire+0x1f0
              postgres`CommitTransaction+0x104
              postgres`CommitTransactionCommand+0xbc
              postgres`finish_xact_command+0x78
              postgres`exec_execute_message+0x42c
              postgres`PostgresMain+0x1838
              postgres`BackendRun+0x2f8
              postgres`ServerLoop+0x680
              postgres`PostmasterMain+0xda8
              postgres`main+0x3d0
              postgres`_start+0x17c
              Exclusive          3858386500

-Jignesh
Re: [PERFORM] User concurrency thresholding: where do I look?
I tried CLOG Buffers 32 and the performance is as good as with 64. (I haven't tried 16 yet, though.) I am going to try your second patch now..

Also, here is the breakup by mode. The combined time is the total time it waits, for all counts.

          Lock Id              Mode            Count
    ProcArrayLock            Shared                1
  CLogControlLock         Exclusive                4
  CLogControlLock            Shared                4
       XidGenLock            Shared                4
       XidGenLock         Exclusive                7
    WALInsertLock         Exclusive               21
     WALWriteLock         Exclusive               62
    ProcArrayLock         Exclusive               79

          Lock Id              Mode   Combined Time (ns)
  CLogControlLock         Exclusive               325200
  CLogControlLock            Shared              4509200
       XidGenLock         Exclusive             11839600
    ProcArrayLock            Shared             40506600
       XidGenLock            Shared            119013700
    WALInsertLock         Exclusive            148063100
     WALWriteLock         Exclusive            347052100
    ProcArrayLock         Exclusive           1054780600

Here is another one at a higher user count, 1600:

bash-3.00# ./4_lwlock_waits.d 9208

          Lock Id              Mode            Count
  CLogControlLock         Exclusive                1
  CLogControlLock            Shared                2
       XidGenLock            Shared                7
    WALInsertLock         Exclusive               12
     WALWriteLock         Exclusive               50
    ProcArrayLock         Exclusive               82

          Lock Id              Mode   Combined Time (ns)
  CLogControlLock         Exclusive                27300
       XidGenLock            Shared             14689300
  CLogControlLock            Shared             72664900
    WALInsertLock         Exclusive            101431300
     WALWriteLock         Exclusive            534357400
    ProcArrayLock         Exclusive           4110350300

Now I will try with your second patch.

Regards, Jignesh

Simon Riggs wrote: On Thu, 2007-07-26 at 17:17 -0400, Jignesh K. Shah wrote:

          Lock Id   Combined Time (ns)
       XidGenLock            194966200
    WALInsertLock            517955000
  CLogControlLock            679665100
     WALWriteLock           2838716200
    ProcArrayLock          44181002600

Is this the time the lock is held for, or the time that we wait for that lock? It would be good to see the breakdown of time separately for shared and exclusive. Can we have a table like this: LockId, LockMode, SumTimeLockHeld, SumTimeLockWait?

Top wait time seems to come from the following code path for ProcArrayLock:

          Lock Id              Mode            Count
    ProcArrayLock         Exclusive               21

          Lock Id   Combined Time (ns)
    ProcArrayLock           5255937500

              postgres`LWLockAcquire+0x1f0
              postgres`CommitTransaction+0x104
              postgres`CommitTransactionCommand+0xbc
              postgres`finish_xact_command+0x78

Well that's pretty weird. That code path clearly only happens once per transaction and ought to be fast. The other code paths that take ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData() ought to spend more time holding the lock. Presumably you are running with a fair number of SERIALIZABLE transactions? Are you running with commit_delay > 0? It's possible that the call to CountActiveBackends() is causing pinging of the procarray by other backends while we're trying to read it during CommitTransaction(). If so, try the attached patch.

Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.245
diff -c -r1.245 xact.c
*** src/backend/access/transam/xact.c   7 Jun 2007 21:45:58 -0000       1.245
--- src/backend/access/transam/xact.c   27 Jul 2007 09:09:08 -0000
***************
*** 820,827 ****
  	 * are fewer than CommitSiblings other backends with active
  	 * transactions.
  	 */
! 	if (CommitDelay > 0 && enableFsync &&
! 		CountActiveBackends() >= CommitSiblings)
  		pg_usleep(CommitDelay);

  	XLogFlush(recptr);
--- 820,826 ----
  	 * are fewer than CommitSiblings other backends with active
  	 * transactions.
  	 */
! 	if (CommitDelay > 0 && enableFsync)
  		pg_usleep(CommitDelay);
Re: [PERFORM] User concurrency thresholding: where do I look?
Yes, I can try to break up the shared and exclusive times..

Also, yes, I use commit_delay = 10; it helps out a lot in reducing IO load.. I will try out the patch soon.

-Jignesh

Simon Riggs wrote: On Thu, 2007-07-26 at 17:17 -0400, Jignesh K. Shah wrote:

          Lock Id   Combined Time (ns)
       XidGenLock            194966200
    WALInsertLock            517955000
  CLogControlLock            679665100
     WALWriteLock           2838716200
    ProcArrayLock          44181002600

Is this the time the lock is held for, or the time that we wait for that lock? It would be good to see the breakdown of time separately for shared and exclusive. Can we have a table like this: LockId, LockMode, SumTimeLockHeld, SumTimeLockWait?

Top wait time seems to come from the following code path for ProcArrayLock:

          Lock Id              Mode            Count
    ProcArrayLock         Exclusive               21

          Lock Id   Combined Time (ns)
    ProcArrayLock           5255937500

              postgres`LWLockAcquire+0x1f0
              postgres`CommitTransaction+0x104
              postgres`CommitTransactionCommand+0xbc
              postgres`finish_xact_command+0x78

Well that's pretty weird. That code path clearly only happens once per transaction and ought to be fast. The other code paths that take ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData() ought to spend more time holding the lock. Presumably you are running with a fair number of SERIALIZABLE transactions? Are you running with commit_delay > 0? It's possible that the call to CountActiveBackends() is causing pinging of the procarray by other backends while we're trying to read it during CommitTransaction(). If so, try the attached patch.
Re: [PERFORM] User concurrency thresholding: where do I look?
Will try 16 and 32 CLOG buffers tomorrow. But here is the lock data again, with a longer profiling window (about 2 minutes), for the connection, with about 2000 users:

bash-3.00# time ./4_lwlock_waits.d 13583
^C

          Lock Id              Mode            Count
    ProcArrayLock            Shared                5
       XidGenLock         Exclusive               13
  CLogControlLock            Shared               14
       XidGenLock            Shared               15
  CLogControlLock         Exclusive               21
    WALInsertLock         Exclusive               77
     WALWriteLock         Exclusive              175
    ProcArrayLock         Exclusive              275

          Lock Id   Combined Time (ns)
       XidGenLock            194966200
    WALInsertLock            517955000
  CLogControlLock            679665100
     WALWriteLock           2838716200
    ProcArrayLock          44181002600

Top wait time seems to come from the following code path for ProcArrayLock:

          Lock Id              Mode            Count
    ProcArrayLock         Exclusive               21

          Lock Id   Combined Time (ns)
    ProcArrayLock           5255937500

              postgres`LWLockAcquire+0x1f0
              postgres`CommitTransaction+0x104
              postgres`CommitTransactionCommand+0xbc
              postgres`finish_xact_command+0x78
              postgres`exec_execute_message+0x42c
              postgres`PostgresMain+0x1838
              postgres`BackendRun+0x2f8
              postgres`ServerLoop+0x680
              postgres`PostmasterMain+0xda8
              postgres`main+0x3d0
              postgres`_start+0x17c
                       5255937500

Regards, Jignesh

Simon Riggs wrote: On Thu, 2007-07-26 at 15:44 -0400, Jignesh K. Shah wrote: BEAUTIFUL!!! Using the patch I can now go up to 1300 users without dropping.. But now it still repeats at 1300-1350 users.. OK, can you try again with 16 and 32 buffers please? We need to know how many is enough and whether this number needs to be variable via a parameter, or just slightly larger by default. Thanks. I corrected the lock descriptions based on what I got from lwlock.h and retried the whole scalability run again, with profiling. This time it looks like the ProcArrayLock. That's what I would expect with that many users.

          Lock Id              Mode            Count
       XidGenLock         Exclusive                1
  CLogControlLock            Shared                2
       XidGenLock            Shared                2
     WALWriteLock         Exclusive                4
    WALInsertLock         Exclusive                8
  CLogControlLock         Exclusive                9
    ProcArrayLock         Exclusive                9

...but as Tom mentioned, we need to do longer runs now so these counts get to somewhere in the hundreds, so we have some statistical validity.
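For readers following along, a minimal sketch of what a 4_lwlock_waits.d-style script can look like. The original wasn't posted; this assumes the lwlock-startwait/lwlock-endwait probes from the 8.3-era postgresql DTrace provider, takes the backend pid as $1, and prints numeric lock ids rather than the names from lwlock.h:

#!/usr/sbin/dtrace -s
/* Wait counts and combined wait time per LWLock, for backend pid $1. */
postgresql$1:::lwlock-startwait
{
        self->ts[arg0] = timestamp;     /* arg0 = lock id, arg1 = mode */
}

postgresql$1:::lwlock-endwait
/self->ts[arg0]/
{
        @cnt[arg0, arg1] = count();
        @wait[arg0] = sum(timestamp - self->ts[arg0]);
        self->ts[arg0] = 0;
}

END
{
        printf("%12s %10s %8s\n", "Lock Id", "Mode", "Count");
        printa("%12d %10d %@8d\n", @cnt);
        printf("%12s %22s\n", "Lock Id", "Combined Time (ns)");
        printa("%12d %@22d\n", @wait);
}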
Re: [PERFORM] User concurrency thresholding: where do I look?
          Lock Id   Combined Time (ns)
    WALInsertLock              2177500
  CLogControlLock              9064700
    ProcArrayLock            199216000

bash-3.00# ./6_lwlock_stack.d 4 7056

          Lock Id              Mode            Count
    ProcArrayLock            Shared                3
    ProcArrayLock         Exclusive               38

          Lock Id   Combined Time (ns)
    ProcArrayLock            858238600

              postgres`LWLockAcquire+0x1c8
              postgres`TransactionIdIsInProgress+0x50
              postgres`HeapTupleSatisfiesVacuum+0x2ec
              postgres`_bt_check_unique+0x2a0
              postgres`_bt_doinsert+0x98
              postgres`btinsert+0x54
              postgres`FunctionCall6+0x44
              postgres`index_insert+0x90
              postgres`ExecInsertIndexTuples+0x1bc
              postgres`ExecUpdate+0x500
              postgres`ExecutePlan+0x704
              postgres`ExecutorRun+0x60
              postgres`PortalRunMulti+0x2a0
              postgres`PortalRun+0x310
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
                   167100

              postgres`LWLockAcquire+0x1c8
              postgres`GetSnapshotData+0x118
              postgres`GetTransactionSnapshot+0x5c
              postgres`PortalRunMulti+0x22c
              postgres`PortalRun+0x310
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
                  7125900

              postgres`LWLockAcquire+0x1c8
              postgres`CommitTransaction+0xe0
              postgres`CommitTransactionCommand+0x90
              postgres`finish_xact_command+0x60
              postgres`exec_execute_message+0x3d8
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
                850945600

bash-3.00# echo 1100 users - DROP
1100 users - DROP
bash-3.00# ./4_lwlock_waits.d 7056

          Lock Id              Mode            Count
  CLogControlLock            Shared                1
     WALWriteLock         Exclusive                1
       XidGenLock         Exclusive                1
    ProcArrayLock            Shared                2
    WALInsertLock         Exclusive                2
       XidGenLock            Shared                2
  CLogControlLock         Exclusive                4
    ProcArrayLock         Exclusive               20

          Lock Id   Combined Time (ns)
     WALWriteLock              4179500
       XidGenLock              6249400
  CLogControlLock             20411100
    WALInsertLock             29707600
    ProcArrayLock            207923700

bash-3.00# ./6_lwlock_stack.d 4 7056

          Lock Id              Mode            Count
    ProcArrayLock         Exclusive               40

          Lock Id   Combined Time (ns)
    ProcArrayLock            692242100

              postgres`LWLockAcquire+0x1c8
              postgres`CommitTransaction+0xe0
              postgres`CommitTransactionCommand+0x90
              postgres`finish_xact_command+0x60
              postgres`exec_execute_message+0x3d8
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
                692242100

bash-3.00#

The LockID for ProcArrayLock is 4, i.e. the 5th entry in lwlock.h, which seems to indicate it really is ProcArrayLock. Any tweaks for that?

-Jignesh

Simon Riggs wrote: On Thu, 2007-07-26 at 11:27 -0400, Jignesh K. Shah wrote: However at 900 users, where the big drop in throughput occurs, it gives a different top "consumer" of time:

              postgres`LWLockAcquire+0x1c8
              postgres`SimpleLruReadPage+0x1ac
              postgres`TransactionIdGetStatus+0x14
              postgres`TransactionLogFetch+0x58

TransactionIdGetStatus doesn't directly call SimpleLruReadPage(). Presumably the compiler has been rearranging things?? Looks like you're out of clog buffers. It seems like the clog buffers aren't big enough to hold clog pages for long enough, and the SELECT FOR SHARE processing is leaving lots of additional read locks that are increasing the number of clog requests for older xids. Try the enclosed patch.

Index:
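Similarly, a sketch of a 6_lwlock_stack.d-style script, matching the argument order used above (lock id first, then pid), under the same assumption about the 8.3-era probe names:

#!/usr/sbin/dtrace -s
/* Combined wait time per user stack for LWLock id $1 in backend $2. */
postgresql$2:::lwlock-startwait
/arg0 == $1/
{
        self->ts = timestamp;
}

postgresql$2:::lwlock-endwait
/arg0 == $1 && self->ts/
{
        @mode[arg0, arg1] = count();
        @stack[ustack()] = sum(timestamp - self->ts);
        self->ts = 0;
}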
Re: [PERFORM] User concurrency thresholding: where do I look?
I will look for runs with longer samples.. Now, the script could have mislabeled lock names.. Anyway, digging into the one that seems to increase over time, I did stack profiles on how it increases, and here are some numbers..

For 600-850 users, that potentially mislabeled CheckpointStartLock (LockID == 12) comes from various sources, where the top source (while the system is still doing great) is:

              postgres`LWLockAcquire+0x1c8
              postgres`SimpleLruReadPage_ReadOnly+0xc
              postgres`TransactionIdGetStatus+0x14
              postgres`TransactionLogFetch+0x58
              postgres`TransactionIdDidCommit+0x4
              postgres`HeapTupleSatisfiesSnapshot+0x234
              postgres`heap_release_fetch+0x1a8
              postgres`index_getnext+0xf4
              postgres`IndexNext+0x7c
              postgres`ExecScan+0x8c
              postgres`ExecProcNode+0xb4
              postgres`ExecutePlan+0xdc
              postgres`ExecutorRun+0xb0
              postgres`PortalRunSelect+0x9c
              postgres`PortalRun+0x244
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
                  8202100

              postgres`LWLockAcquire+0x1c8
              postgres`TransactionIdSetStatus+0x1c
              postgres`RecordTransactionCommit+0x2a8
              postgres`CommitTransaction+0xc8
              postgres`CommitTransactionCommand+0x90
              postgres`finish_xact_command+0x60
              postgres`exec_execute_message+0x3d8
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
                 30822100

However, at 900 users, where the big drop in throughput occurs, it gives a different top "consumer" of time:

              postgres`LWLockAcquire+0x1c8
              postgres`TransactionIdSetStatus+0x1c
              postgres`RecordTransactionCommit+0x2a8
              postgres`CommitTransaction+0xc8
              postgres`CommitTransactionCommand+0x90
              postgres`finish_xact_command+0x60
              postgres`exec_execute_message+0x3d8
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
                406601300

              postgres`LWLockAcquire+0x1c8
              postgres`SimpleLruReadPage+0x1ac
              postgres`TransactionIdGetStatus+0x14
              postgres`TransactionLogFetch+0x58
              postgres`TransactionIdDidCommit+0x4
              postgres`HeapTupleSatisfiesUpdate+0x360
              postgres`heap_lock_tuple+0x27c
              postgres`ExecutePlan+0x33c
              postgres`ExecutorRun+0xb0
              postgres`PortalRunSelect+0x9c
              postgres`PortalRun+0x244
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
                444523100

              postgres`LWLockAcquire+0x1c8
              postgres`SimpleLruReadPage+0x1ac
              postgres`TransactionIdGetStatus+0x14
              postgres`TransactionLogFetch+0x58
              postgres`TransactionIdDidCommit+0x4
              postgres`HeapTupleSatisfiesSnapshot+0x234
              postgres`heap_release_fetch+0x1a8
              postgres`index_getnext+0xf4
              postgres`IndexNext+0x7c
              postgres`ExecScan+0x8c
              postgres`ExecProcNode+0xb4
              postgres`ExecutePlan+0xdc
              postgres`ExecutorRun+0xb0
              postgres`PortalRunSelect+0x9c
              postgres`PortalRun+0x244
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
                   166130

Maybe you all will understand more than I do on what it does here.. It looks to me like IndexNext has a problem at a high number of users.. but I could be wrong..

-Jignesh

Tom Lane wrote: "Jignesh K. Shah" <[EMAIL PROTECTED]> writes: The count is only for a 10-second snapshot.. Plus remember there are about 1000 users running, so the connection being profiled only gets 0.01 of the period on CPU.. And the count is for that CONNECTION only.
OK, that explains the low absolute levels of the counts, but if the counts are for a regular backend then there's still a lot of bogosity here. Backends won't be taking the CheckpointLock at all, nor do they take CheckpointStartLock in exclusive mode.
Re: [PERFORM] User concurrency thresholding: where do I look?
  CheckpointStartLock       Exclusive              102

          Lock Id   Combined Time (ns)
           OidGenLock               21900
        WALInsertLock               82500
           XidGenLock             3313400
  CheckpointStartLock          1448289900

bash-3.00# echo 1050 users
1050 users
bash-3.00# ./4_lwlock_waits.d 20764

          Lock Id              Mode            Count
        FreeSpaceLock        Exclusive               1
  CheckpointStartLock           Shared               3
           XidGenLock        Exclusive               3
  CheckpointStartLock        Exclusive             146

          Lock Id   Combined Time (ns)
        FreeSpaceLock               18400
           XidGenLock             1900900
  CheckpointStartLock          2392893700

bash-3.00#

-Jignesh

Tom Lane wrote: "Jignesh K. Shah" <[EMAIL PROTECTED]> writes: Here is how I got the numbers.. I had about 1600 users log into postgresql. Then I started the run with 500 users, and using DTrace I started tracking PostgreSQL locking "as viewed from one user/connection". The echo statements indicate how many users were active at that point and how throughput was performing. All IO is done on /tmp, which means on a RAM disk. bash-3.00# echo 500 users - baseline number 500 users bash-3.00# ./3_lwlock_acquires.d 19178
I don't think I believe these numbers. For one thing, CheckpointLock is simply not ever taken in shared mode. The ratios of counts for different locks seem pretty improbable too, eg there is no way on earth that the LockMgrLocks are taken more often shared than exclusive (I would expect no shared acquires at all in the sort of test you are running). Not to mention that the absolute number of counts seems way too low. So I think the counting tool is broken. Combined time of what exactly? It looks like this must be the total duration the lock is held, at least assuming that the time for CheckpointLock is correctly reported. It'd be much more useful to see the total time spent waiting to acquire the lock. regards, tom lane
Re: [PERFORM] User concurrency thresholding: where do I look?
Here is how I got the numbers.. I had about 1600 users log into postgresql. Then I started the run with 500 users, and using DTrace I started tracking PostgreSQL locking "as viewed from one user/connection". The echo statements indicate how many users were active at that point and how throughput was performing. All IO is done on /tmp, which means on a RAM disk.

bash-3.00# echo 500 users - baseline number
500 users - baseline number
bash-3.00# ./3_lwlock_acquires.d 19178

          Lock Id              Mode            Count
     FirstLockMgrLock       Exclusive                1
     RelCacheInitLock       Exclusive                2
           SInvalLock       Exclusive                2
        WALInsertLock       Exclusive               10
       BufMappingLock       Exclusive               12
       CheckpointLock          Shared               29
  CheckpointStartLock          Shared               29
           OidGenLock       Exclusive               29
           XidGenLock       Exclusive               29
     FirstLockMgrLock          Shared               33
  CheckpointStartLock       Exclusive               78
        FreeSpaceLock       Exclusive              114
           OidGenLock          Shared              126
           XidGenLock          Shared              152
        ProcArrayLock          Shared              482

          Lock Id   Combined Time (ns)
           SInvalLock               29800
     RelCacheInitLock               30300
       BufMappingLock              168800
     FirstLockMgrLock              414300
        FreeSpaceLock             1281700
        ProcArrayLock             7869900
        WALInsertLock                3200
  CheckpointStartLock            13494700
           OidGenLock            25719100
           XidGenLock            26443300
       CheckpointLock           194267800

bash-3.00# echo 600 users - Throughput rising
600 users - Throughput rising
bash-3.00# ./3_lwlock_acquires.d 19178

          Lock Id              Mode            Count
     RelCacheInitLock       Exclusive                1
           SInvalLock       Exclusive                1
       BufMappingLock       Exclusive                2
      CLogControlLock       Exclusive                2
        WALInsertLock       Exclusive               11
     FirstLockMgrLock          Shared               20
       CheckpointLock          Shared               24
  CheckpointStartLock          Shared               24
           OidGenLock       Exclusive               24
           XidGenLock       Exclusive               24
  CheckpointStartLock       Exclusive               72
        FreeSpaceLock       Exclusive              102
           OidGenLock          Shared              106
           XidGenLock          Shared              128
        ProcArrayLock          Shared              394

          Lock Id   Combined Time (ns)
           SInvalLock               15600
     RelCacheInitLock               15700
       BufMappingLock               31000
      CLogControlLock               41500
     FirstLockMgrLock              289000
        FreeSpaceLock             3045400
  CheckpointStartLock             7371800
        WALInsertLock             9383200
        ProcArrayLock            10457900
           OidGenLock            20005900
           XidGenLock            20331500
       CheckpointLock           187067900

bash-3.00# echo 700 users - Throughput rising
700 users - Throughput rising
bash-3.00# ./3_lwlock_acquires.d 19178

          Lock Id              Mode            Count
     RelCacheInitLock       Exclusive                1
           SInvalLock       Exclusive                1
       BufMappingLock       Exclusive                2
        WALInsertLock       Exclusive               17
       CheckpointLock          Shared               33
  CheckpointStartLock          Shared               33
           OidGenLock       Exclusive               33
           XidGenLock       Exclusive               33
     FirstLockMgrLock          Shared               81
  CheckpointStartLock       Exclusive               87
        FreeSpaceLock       Exclusive              124
           OidGenLock          Shared              125
           XidGenLock          Shared              150
        ProcArrayLock          Shared              500

          Lock Id   Combined Time (ns)
     RelCacheInitLock               15100
           SInvalLock               15400
       BufMappingLock               47400
     FirstLockMgrLock             3021000
        FreeSpaceLock             3794300
        WALInsertLock             7567300
           XidGenLock            18427400
        ProcArrayLock            20884000
  CheckpointStartLock            24084900
           OidGenLock            26399500
       CheckpointLock           256549800

bash-3.00# echo 800 users - Throughput rising
800 users - Throughput rising
bash-3.00# ./3_lwlock_acquires.d 19178

          Lock Id              Mode            Count
       BufMappingLock       Exclusive                1
     RelCacheInitLock       Exclusive                1
           SInvalLock       Exclusive                1
         WALWriteLock       Exclusive                1
        WALInsertLock
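A sketch of a 3_lwlock_acquires.d-style script, consistent with Tom's reading above that "Combined Time" is total hold duration rather than wait time. This again assumes the 8.3-era postgresql provider probes (lwlock-acquire carrying lock id and mode, lwlock-release just the lock id), with the backend pid as $1:

#!/usr/sbin/dtrace -s
/* Acquire counts per lock/mode and total hold time per lock,
 * for backend pid $1. */
postgresql$1:::lwlock-acquire
{
        @acq[arg0, arg1] = count();
        self->held[arg0] = timestamp;
}

postgresql$1:::lwlock-release
/self->held[arg0]/
{
        @held[arg0] = sum(timestamp - self->held[arg0]);
        self->held[arg0] = 0;
}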
[PERFORM] Second SpecJAppserver2004 with PostgreSQL
http://blogs.sun.com/jkshah/entry/specjappserver2004_with_glassfish_v2_and

This time with 33% less app server hardware but the same setup for PostgreSQL 8.2.4, and a 4.5% better score. There has been a reduction in CPU utilization by PostgreSQL with the new app server, which means there is potential to do more JOPS. But it looks like Simon Riggs added another project for us to look into, the maxalign side effect with Solaris on SPARC, before doing more benchmarks now.

Cheers, Jignesh
Re: [PERFORM] User concurrency thresholding: where do I look?
12805 45 18 0 37 50 0 212 2847 10590 262 3042 3880 11789 44 17 0 39 61 0 154 307 48 9583 241 2903 3240 10564 50 15 0 35 70 0 840 535 206 10354 247 3035 4280 11700 42 22 0 37 160 0 169 3035 7446 286 2250 2900 8361 42 13 0 45 170 0 173 2405 7640 225 2288 2950 8674 41 13 0 47 180 0 170 2895 7445 270 2108 2860 8167 41 12 0 47 190 0 176 511185 7365 197 2138 2880 7934 40 13 0 47 201 0 172 2226 7835 204 2323 2980 8759 40 14 0 46 210 0 167 2335 7749 218 2339 3260 8264 42 13 0 46 220 0 749 6612 6516 4173 97 1166 4210 4741 23 44 0 33 230 0 181 2396 7709 219 2258 3830 8402 41 12 0 47

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
00 0 198 4396 10364 417 3113 3270 11962 49 17 0 34
10 0 210 2996 10655 282 3135 3460 12463 47 17 0 36
20 0 202 3528 9960 332 2890 3200 11261 47 16 0 37
30 0 182 2766 9950 255 2857 3340 11021 46 16 0 38
40 0 200 3056 10841 286 3127 3250 12440 48 18 0 35
50 0 240 2866 9983 272 2912 3580 11450 46 16 0 37
60 0 153 323 81 9062 233 2767 3000 9675 49 18 0 33
70 0 850 556 206 10027 271 2910 4150 11048 43 22 0 35
160 0 152 3065 7261 291 2216 2660 8055 44 12 0 44
170 0 151 2365 7193 217 2170 2830 8099 43 12 0 45
180 0 170 2635 7008 246 2009 2540 7836 43 12 0 46
190 0 165 477385 6824 197 1989 2730 7663 42 12 0 46
200 0 188 2176 7496 197 2800 8435 43 13 0 44
210 0 179 2485 7352 234 2233 3090 8237 43 12 0 44
220 0 813 6041 5963 4006 82 1125 4480 4442 25 42 0 33
230 0 162 2415 7364 225 2170 3550 7720 43 11 0 45

Tom Lane wrote: "Jignesh K. Shah" <[EMAIL PROTECTED]> writes: What it's saying is that there are holds/waits in trying to get locks which are locked at the Solaris user-library level, called from the postgresql functions. For example, both of the following functions are hitting on the same mutex lock 0x10059e280 in a Solaris library call:

postgres`AllocSetDelete+0x98
postgres`AllocSetAlloc+0x1c4

That's a perfect example of the sort of useless overhead that I was complaining of just now in pgsql-patches. Having malloc/free use an internal mutex is necessary in multi-threaded programs, but the backend isn't multi-threaded. And yet, apparently you can't turn that off in Solaris. (Fortunately, the palloc layer is probably insulating us from malloc's performance enough that this isn't a huge deal. But it's annoying.) regards, tom lane
Re: [PERFORM] User concurrency thresholding: where do I look?
Sorry.. these are Solaris mutex locks used by the postgresql process. What it's saying is that there are holds/waits in trying to get locks which are locked at the Solaris user-library level, called from the postgresql functions. For example, both of the following functions are hitting on the same mutex lock 0x10059e280 in a Solaris library call:

postgres`AllocSetDelete+0x98
postgres`AllocSetAlloc+0x1c4

I need to enable the DTrace probes on my builds.

-Jignesh

Tom Lane wrote: "Jignesh K. Shah" <[EMAIL PROTECTED]> writes: Tom Lane wrote: So follow that up --- try to determine which lock is being contended for. There's some very crude code in the sources that you can enable with -DLWLOCK_STATS, but probably DTrace would be a better tool. Using plockstat -A -s 5 -p $pid I don't know what that is, but it doesn't appear to have anything to do with Postgres LWLocks or spinlocks, which are the locks I was thinking of. Try asking Robert Lor about this --- IIRC he had some dtrace probes to work with our locks. regards, tom lane
Re: [PERFORM] User concurrency thresholding: where do I look?
Tom Lane wrote: "Jignesh K. Shah" <[EMAIL PROTECTED]> writes: Yes I did see an increase in context switches and CPU migrations at that point using mpstat. So follow that up --- try to determine which lock is being contended for. There's some very crude code in the sources that you can enable with -DLWLOCK_STATS, but probably DTrace would be a better tool. regards, tom lane

Using plockstat -A -s 5 -p $pid on bgwriter doesn't report anything. On one of the many connections, this one is hard to read easily. By default, plockstat monitors all lock contention events, gathers frequency and timing data about those events, and displays the data in decreasing frequency order, so that the most common events appear first.

^Cbash-3.00# plockstat -A -s 5 -p 6401
^C
Mutex hold

Count     nsec  Lock         Caller
   59   186888  0x10059e280  libumem.so.1`process_free+0x12c

      nsec  Time Distribution  count  Stack
     16384  |                |     1  libumem.so.1`process_free+0x12c
     32768  |@               |    14  postgres`AllocSetDelete+0x98
     65536  |@@              |     5  postgres`MemoryContextDelete+0x78
    131072  |                |     0  postgres`CommitTransaction+0x240
    262144  |@@@             |    39

Count     nsec  Lock         Caller
  530    12226  0x10059e280  libumem.so.1`umem_cache_alloc+0x200

      nsec  Time Distribution  count  Stack
      4096  |@@@             |   338  libumem.so.1`umem_cache_alloc+0x200
      8192  |@               |    24  libumem.so.1`umem_alloc+0x5c
     16384  |@               |    37  libumem.so.1`malloc+0x40
     32768  |@               |   131  postgres`AllocSetAlloc+0x1c4

Count     nsec  Lock         Caller
  324    10214  0x100578030  libumem.so.1`vmem_xfree+0x164

      nsec  Time Distribution  count  Stack
      4096  |@@              |   192  libumem.so.1`vmem_xfree+0x164
      8192  |                |    56  libumem.so.1`process_free+0x12c
     16384  |@               |    26  postgres`AllocSetDelete+0x98
     32768  |@@@             |    50  postgres`MemoryContextDelete+0x78

Count     nsec  Lock         Caller
  161    13585  0x10059e280  libumem.so.1`process_free+0x12c

      nsec  Time Distribution  count  Stack
      4096  |@               |   118  libumem.so.1`process_free+0x12c
      8192  |                |     4  postgres`AllocSetDelete+0x98
     16384  |@               |    10  postgres`MemoryContextDelete+0x78
     32768  |@@@             |    24  postgres`PortalDrop+0x160
     65536  |                |     3
    131072  |                |     0
    262144  |                |     2

Count     nsec  Lock                    Caller
  326     6081  libumem.so.1`vmem0+0xc38  libumem.so.1`vmem_xalloc+0x630

      nsec  Time Distribution  count  Stack
      4096  |                |   170  libumem.so.1`vmem_xalloc+0x630
      8192  |@@@             |   155  libumem.so.1`vmem_alloc+0x1f8
     16384  |                |     1  libumem.so.1`vmem_xalloc+0x524
                                      libumem.so.1`vmem_alloc+0x1f8

Count     nsec  Lock                   Caller
  326     5867  libumem.so.1`vmem0+0x30  libumem.so.1`vmem_alloc+0x248

      nsec  Time Distribution  count  Stack
      4096  |@               |   185  libumem.so.1`vmem_alloc+0x248
      8192  |@@              |   141  libumem.so.1`vmem_sbrk_alloc+0x30
                                      libumem.so.1`vmem_xalloc+0x524
                                      libumem.so.1`vmem_alloc+0x1f8

Count     nsec  Lock         Caller
  318     5873  0x100578030  libumem.so.1`vmem_alloc+0x1d0

      nsec  Time Distribution  count  Stack
      4096  |@               |   228  libumem.so.1`vmem_alloc+0x1d0
      8192  |@               |    78  libumem.so.1`umem_alloc+0xec
     16384  |                |     6  libumem.so.1`malloc+0x40
     32768  |                |     6  postgres`AllocSetAlloc+0x1c4

Count     nsec  Lock
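plockstat is itself built on DTrace's plockstat provider, so an equivalent hold-time aggregation can be written directly; a minimal sketch (run via dtrace -s script.d -p <pid>; the 4-frame stack depth is an arbitrary choice here, echoing the -s 5 output above):

#!/usr/sbin/dtrace -s
/* Mutex hold times per lock address and call stack, plockstat-style. */
plockstat$target:::mutex-acquire
{
        self->ts[arg0] = timestamp;     /* arg0 = mutex address */
}

plockstat$target:::mutex-release
/self->ts[arg0]/
{
        @hold[arg0, ustack(4)] = sum(timestamp - self->ts[arg0]);
        self->ts[arg0] = 0;
}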
Re: [PERFORM] User concurrency thresholding: where do I look?
Yes, I did see an increase in context switches and CPU migrations at that point using mpstat.

Regards, Jignesh

Tom Lane wrote: "Jignesh K. Shah" <[EMAIL PROTECTED]> writes: There are no hard failures reported anywhere. Log min duration does show that queries are now slowing down and taking longer. The OS is not swapping, and I have also eliminated IO by putting the whole database on /tmp. Hmm. Do you see any evidence of a context swap storm (ie, a drastic increase in the context swaps/second reading reported by vmstat)? regards, tom lane
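The context-switch jump can also be attributed to postgres specifically with the sched provider; a rough sketch (sched:::preempt fires for involuntary switches, roughly matching mpstat's icsw column):

#!/usr/sbin/dtrace -s
/* Per-second context-switch counts for postgres processes. */
sched:::off-cpu
/execname == "postgres"/
{
        @cs["total switches (csw)"] = count();
}

sched:::preempt
/execname == "postgres"/
{
        @cs["involuntary (icsw)"] = count();
}

tick-1sec
{
        printa("%s: %@d/sec\n", @cs);
        trunc(@cs);
}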
Re: [PERFORM] User concurrency thresholding: where do I look?
I forgot to add one more piece of information.. I also tried the same test with 64-bit postgresql with 6GB shared_buffers, and the results are the same: it drops around the same point, which to me sounds like a bottleneck..

More later

-Jignesh

Jignesh K. Shah wrote: Awww Josh, I was just enjoying the chat on the picket fence! :-) Anyway, the workload is mixed (reads, writes) with simple to medium queries. The workload is known to scale well. But in order to provide substantial input I am still trying to eliminate things that can bottleneck. Currently I have eliminated CPU (plenty free), RAM (memory is 48GB RAM in this server for a 32-bit postgresql instance), and IO storage (used the free RAM to do a /tmp database to eliminate IO), and am still trying to eliminate any network bottlenecks, to say for sure we have a problem in PostgreSQL. But yes, till that final thing is confirmed (network, which can very well be the case), it could be a problem somewhere else. However, the thing that worries me is more the big drop, instead of throughput remaining constant out there..

Anyway, more on this within a day or two, once I add more network NICs between the systems to eliminate network problems (even though stats don't show them as problems right now) and also reduce malloc lock penalties, if any.

As for the other questions: max_locks_per_transaction is set to the default (10, I believe); increasing it still seems to degrade the overall throughput number. max_connections is set to 1500 for now, till I get decent scaling to 1400-1500 users. There are no hard failures reported anywhere. Log min duration does show that queries are now slowing down and taking longer. The OS is not swapping, and I have also eliminated IO by putting the whole database on /tmp.

So while I finish adding more network connections between the two systems (need to get cards), do enjoy the following URL :-)

http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-20070703-00073.html

Of course all of postgresql.conf still remains from the old test, so no flames on that one again :-)

Regards, Jignesh

Josh Berkus wrote: Tom, Well, if the load is a lot of short writing transactions then you'd expect the throughput to depend on how fast stuff can be pushed down to WAL. What have you got wal_buffers set to? Are you using a commit delay? What's the I/O system anyway (any BB write cache on the WAL disk?) and what wal sync method are you using? You know, I think Jignesh needs to be on this list so I can stop relaying questions on a workload I didn't design. Let me get him.
Re: [PERFORM] User concurrency thresholding: where do I look?
Awww Josh, I was just enjoying the chat on the picket fence! :-)

Anyway, the workload is mixed (reads, writes) with simple to medium queries. The workload is known to scale well. But in order to provide substantial input I am still trying to eliminate things that can bottleneck. Currently I have eliminated CPU (plenty free), RAM (memory is 48GB RAM in this server for a 32-bit postgresql instance), and IO storage (used the free RAM to do a /tmp database to eliminate IO), and am still trying to eliminate any network bottlenecks, to say for sure we have a problem in PostgreSQL. But yes, till that final thing is confirmed (network, which can very well be the case), it could be a problem somewhere else. However, the thing that worries me is more the big drop, instead of throughput remaining constant out there..

Anyway, more on this within a day or two, once I add more network NICs between the systems to eliminate network problems (even though stats don't show them as problems right now) and also reduce malloc lock penalties, if any.

As for the other questions: max_locks_per_transaction is set to the default (10, I believe); increasing it still seems to degrade the overall throughput number. max_connections is set to 1500 for now, till I get decent scaling to 1400-1500 users. There are no hard failures reported anywhere. Log min duration does show that queries are now slowing down and taking longer. The OS is not swapping, and I have also eliminated IO by putting the whole database on /tmp.

So while I finish adding more network connections between the two systems (need to get cards), do enjoy the following URL :-)

http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-20070703-00073.html

Of course all of postgresql.conf still remains from the old test, so no flames on that one again :-)

Regards, Jignesh

Josh Berkus wrote: Tom, Well, if the load is a lot of short writing transactions then you'd expect the throughput to depend on how fast stuff can be pushed down to WAL. What have you got wal_buffers set to? Are you using a commit delay? What's the I/O system anyway (any BB write cache on the WAL disk?) and what wal sync method are you using? You know, I think Jignesh needs to be on this list so I can stop relaying questions on a workload I didn't design. Let me get him.
Re: [PERFORM] PostgreSQL publishes first real benchmark
Can you list the others that seemed out of place? Thanks. Regards, Jignesh Tom Lane wrote: "Jignesh K. Shah" <[EMAIL PROTECTED]> writes: Heikki Linnakangas wrote: May I ask you why you set max_prepared_transactions to 450, while you're apparently not using prepared transactions, according to this quote: Yep.. one of the settings that we didn't revert and it strayed out there. There were quite a few settings in that list that looked like random experimentation rather than recommendable good practice to me. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] PostgreSQL publishes first real benchmark
Heikki Linnakangas wrote: May I ask you why you set max_prepared_transactions to 450, while you're apparently not using prepared transactions, according to this quote: Recoverable 2-phase transactions were used to coordinate the interaction between the database server and JMS server using Sun's Last Agent Logging Optimization; the 1PC database transactions and transaction log records are written to the database in a single transaction. Did you perhaps use 2PC at first, but didn't bother to change the config after switching to the last agent optimization? Yep.. one of the settings that we didn't revert and it strayed out there. -Jignesh ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] PostgreSQL publishes first real benchmark
It's hard to do a direct comparison since that one used a different commercial application server than the PostgreSQL test.. As we do more tests with the help of the community, hopefully we can understand where the CPU cycles are spent and see how to make them more efficient... Stay tuned!! -Jignesh Philippe Amelant wrote: am I wrong, or is DB2 9.1 faster on less powerful hardware? On Monday, July 9, 2007 at 11:57 -0400, Jignesh K. Shah wrote: Hello all, I think this result will be useful for performance discussions of postgresql against other databases. http://www.spec.org/jAppServer2004/results/res2007q3/ More on Josh Berkus's blog: http://blogs.ittoolbox.com/database/soup/archives/postgresql-publishes-first-real-benchmark-17470 Regards, Jignesh ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] PostgreSQL publishes first real benchmark
Hi Heikki, Heikki Linnakangas wrote: That's really exciting news! I'm sure you spent a lot of time tweaking the settings, so let me ask you something topical: How did you end up with the bgwriter settings you used? Did you experiment with different values? How much difference did it make? The background writer is still a pain to get right.. I'd say it is a necessary evil, since you are trying to balance leveling writes to the disks against the lock contention the writer itself causes for the postgresql connections. Our typical problem arises at high numbers of users, where all users are suddenly locked due to the bgwriter holding the locks.. Using the hotuser script (which uses a perl/DTrace combination) we ran quite a few numbers trying to see which settings result in less overall time spent in PGLock* calls and yet give good uniform writes to the disks. After reaching the published settings, every now and then we would try playing with different values to see if anything improved, but changes generally seemed to degrade things.. (Of course your mileage will vary depending on config, workload, etc.) Still, I believe the locking mechanism needs to be revisited at some point, since that seems to be the one which will eventually limit the number of users in such a workload (especially if you don't hit the right settings for your workload). Hopefully soon we will get access to bigger capacity servers and redo SMP tests on it with the background writer. Regards, Jignesh ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
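For anyone who wants to reproduce this kind of measurement without the full hotuser script, a rough sketch using the DTrace profile provider (the pid 12345 is a placeholder for a busy backend; hotuser itself does considerably more):

# Sample the user-mode program counter of one backend ~997 times/sec and
# print a per-function hit count on Ctrl-C (watch for PGSemaphoreLock/LWLock* entries)
dtrace -n 'profile-997 /pid == $target && arg1/ { @[ufunc(arg1)] = count(); }' -p 12345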
[PERFORM] PostgreSQL publishes first real benchmark
Hello all, I think this result will be useful for performance discussions of postgresql against other databases. http://www.spec.org/jAppServer2004/results/res2007q3/ More on Josh Berkus's blog: http://blogs.ittoolbox.com/database/soup/archives/postgresql-publishes-first-real-benchmark-17470 Regards, Jignesh ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Direct I/O
On Solaris you just look at the mount options on the file system and see if the forcedirectio option is enabled. Since PostgreSQL doesn't use any special options for enabling directio itself, that's the known way to figure it out on Solaris. At least on Solaris the performance over a buffered filesystem is better for many workloads, but not always. Plus you typically see a small reduction in CPU usage (system) and of course memory. However, depending on the workload, you may see increased latency in writes, but generally that's not the problem in many workloads, since it is the multiple writes to the same file which do better using concurrentio (modified directio) in Solaris. As for Linux I will leave that to other experts .. -Jignesh lai yoke hman wrote: > Hello, > How can I tell whether my PostgreSQL 8 is using direct I/O or buffered I/O? If using > buffered I/O, how can I enable direct I/O? What is the performance difference > of them? > This is urgent, Thanks. > ---(end of broadcast)--- > TIP 5: don't forget to increase your free space map settings > ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
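As a concrete illustration of that check (the device and mount point below are hypothetical):

# List mounted file systems with their options; 'forcedirectio' shows up if enabled
mount -v | grep forcedirectio
# Remount a UFS file system with directio forced on
mount -F ufs -o remount,forcedirectio /dev/dsk/c1t0d0s6 /db01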
Re: [PERFORM] slow i/o
extended device statistics (cpu shown as us/sy/wt/id at the end of each sample)
  r/s   w/s  kr/s   kw/s  wait  actv wsvc_t asvc_t  %w  %b device           cpu
  1.0 268.0   8.0 2839.9   0.0  97.1    0.0  360.9   0 100 c1t0d0s6 (/usr)  11/10/0/80
  0.0 309.0   0.0     .5 175.2 208.9  566.9  676.2  81  99 c1t0d0s6 (/usr)  0/0/0/100
  0.0 330.0   0.0 2704.0 145.6 256.0  441.1  775.7 100 100 c1t0d0s6 (/usr)  4/2/0/94
  0.0 311.0   0.0 2543.9 151.0 256.0  485.6  823.2 100 100 c1t0d0s6 (/usr)  2/0/0/98
  0.0 319.0   0.0 2576.0 147.4 256.0  462.0  802.5 100 100 c1t0d0s6 (/usr)  0/1/0/98
  0.0   0.0   0.0    0.0   0.0   0.2    0.0    0.0   2  13 c1t0d0s1 (/var)
  0.0 366.0   0.0 3088.0 124.4 255.8  339.9  698.8 100 100 c1t0d0s6 (/usr)  6/5/0/90
  0.0   2.0   0.0   16.0   0.0   1.1    0.0  533.2   0  54 c1t0d0s1 (/var)
  1.0 282.0   8.0 2849.0   1.5 129.2    5.2  456.5  10 100 c1t0d0s6 (/usr)
Thank you in advance for your help! Jun On 8/30/06, Junaili Lie <[EMAIL PROTECTED]> wrote: I have tried this to no avail. I have also tried changing the bgwriter_delay parameter to 10. The spike in i/o still occurs, although not on a consistent basis, and it is only happening for a few seconds.
Re: [PERFORM] slow i/o
The bgwriter parameters changed in 8.1. Try bgwriter_lru_maxpages=0 bgwriter_lru_percent=0 to turn off the bgwriter and see if there is any change. -Jignesh Junaili Lie wrote: Hi Jignesh, Thank you for your reply. I have the settings just as you described:
wal_sync_method = fsync
wal_buffers = 128
checkpoint_segments = 128
bgwriter_all_percent = 0
bgwriter_maxpages = 0
I ran the dtrace script and found the following: During the i/o busy time, there are postgres processes that have a very high BYTES count. During the non i/o busy time, these same processes don't do a lot of i/o activity. I checked pg_stat_activity but couldn't find this process. Doing ps revealed that this process was started at the same time as postgres itself, which leads me to believe that it may be the background writer or some other internal process. This process is not autovacuum, because it doesn't disappear when I try turning autovacuum off. Except for the ones mentioned above, I didn't modify the other background settings:
MONSOON=# show bgwriter_delay ;
 bgwriter_delay
----------------
 200
(1 row)
MONSOON=# show bgwriter_lru_maxpages ;
 bgwriter_lru_maxpages
-----------------------
 5
(1 row)
MONSOON=# show bgwriter_lru_percent ;
 bgwriter_lru_percent
----------------------
 1
(1 row)
This i/o spike only happens at minute 1 and minute 6 (i.e. 10:51, 10:56). If I do select * from pg_stat_activity during this time, I will see a lot of write queries waiting to be processed. After a few seconds, everything seems to be gone. All writes that are not happening at the time of this i/o jump are being processed very fast, and thus do not show in pg_stat_activity. Thanks in advance for the reply, Best, J ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
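A minimal sketch of that experiment, assuming $PGDATA points at the cluster (both GUCs exist in 8.1 and are picked up on reload):

# Disable the LRU background writer scan, then reload without a restart
cat >> $PGDATA/postgresql.conf <<'EOF'
bgwriter_lru_maxpages = 0
bgwriter_lru_percent = 0
EOF
pg_ctl -D $PGDATA reload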
Re: [PERFORM] slow i/o
Also to answer your real question: DTrace on Solaris 10: # dtrace -s /usr/demo/dtrace/whoio.d It will tell you the pids doing the io activity and on which devices. There are more scripts in that directory, like iosnoop.d, iotime.d and others, which will also give other details like the file accessed, the time the io took, etc. Hope this helps. Regards, Jignesh Junaili Lie wrote: Hi everyone, We have a postgresql 8.1 installed on Solaris 10. It is running fine. However, for the past couple of days, we have seen the i/o reports indicating that the i/o is busy most of the time. Before this, we only saw i/o being busy occasionally (very rare). So far, there have been no performance complaints by customers, and the slow query reports don't indicate anything out of the ordinary. There are no code changes on the application layer and no database configuration changes. I am wondering if there's a tool out there on Solaris to tell which process is doing most of the i/o activity? Thank you in advance. J ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
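The heart of a whoio-style script is just an aggregation on the DTrace io provider; a stripped-down sketch (whoio.d itself reports more detail):

# Sum I/O bytes per process and device; totals print on Ctrl-C
dtrace -n 'io:::start { @[pid, execname, args[1]->dev_statname] = sum(args[0]->b_bcount); }'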
Re: [PERFORM] slow i/o
Did you increase the checkpoint segments and change the default WAL sync method to fdsync? http://blogs.sun.com/jkshah/entry/postgresql_on_solaris_better_use Try fdsync instead of fsync as mentioned in the entry. Regards, Jignesh Junaili Lie wrote: Hi everyone, We have a postgresql 8.1 installed on Solaris 10. It is running fine. However, for the past couple of days, we have seen the i/o reports indicating that the i/o is busy most of the time. Before this, we only saw i/o being busy occasionally (very rare). So far, there have been no performance complaints by customers, and the slow query reports don't indicate anything out of the ordinary. There are no code changes on the application layer and no database configuration changes. I am wondering if there's a tool out there on Solaris to tell which process is doing most of the i/o activity? Thank you in advance. J ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some
I have seen MemoryContextSwitchTo taking time before.. However I am not sure why it would take so much CPU time? Maybe that function does not work efficiently on Solaris? Also I do not have much idea about slot_getattr. Anybody else? (Another option is to use "collect -p $pid" experiments to gather the data and figure out which instruction is causing the high CPU usage.) Maybe the Sun engineers out there can help out -Jignesh Arjen van der Meijden wrote: We have the 4 core machine. However, these numbers are taken during a benchmark, not normal work load. So the output should display the system being fully at work ;) So it's postgres doing a lot of work, and you already had a look at the usrcall for that. The benchmark just tries to do the queries for "random page visits". This totals up to some 80 different queries being executed with mostly random parameters. The workload is generated using php, so there are no connection pools, nor prepared statements. The queries vary, but are all relatively lightweight queries with less than 6 or 7 joinable tables. Almost all queries can use indexes. Most tables are under a few MB of data, although there are a few larger than that. Most records are relatively small, consisting mostly of numbers (id's and such). The results presented here were with 25 concurrent connections. Best regards, Arjen
Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some
Your usertime is way too high for a T2000... If you have a 6 core machine with 24 threads, it says all 24 threads are reported as being busy in the iostat output. The best way to debug this is to use prstat -amL (or, if you are dumping it in a file, prstat -amLc > prstat.txt), find the pids with high user cpu time, and then use usrcall.d on a few of those pids. Also, how many database connections do you have, and what's the type of query run by each connection? -Jignesh Arjen van der Meijden wrote: Hi Jignesh, The settings from that 'special T2000 dvd' differed from the recommended settings on the website you provided. But I don't see much difference in performance with any of the adjustments; it appears to be more or less the same. Here are a few iostat lines by the way:
          sd0          sd1          sd2          nfs1         cpu
 kps tps serv kps tps serv kps tps serv kps tps serv  us sy wt id
 7 1 12 958 50 350 070 00 13 1 0 85
 0 00 2353 29630 000 00 92 7 0 1
 0 00 2062 32620 000 00 93 7 0 0
 1 11 1575 35000 000 00 92 7 0 1
 0 00 1628 36200 000 00 92 8 0 1
It appears to be doing a little less kps/tps on sd1 when I restore my own postgresql.conf settings (default wal/checkpoints, 20k buffers, 2k work mem). Is it possible to trace the stacks for semsys, like the memcpy-traces, or are those of no interest here? Best regards, Arjen
Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some
Hi Arjen, Can you send me my colleagues' names in a private email? One of the drawbacks of the syscall.d script is relative timings, and hence if system CPU usage is very low, it gives the relative weightage of what portion of that low usage is associated with which call.. So even if you have, say, 1% system time.. it says that most of it was IO related or semsys related. So iostat output with the -c option to include CPU times helps put it in the right perspective. Also do check the tunables mentioned and make sure they are set. Regards, Jignesh Arjen van der Meijden wrote: Hi Jignesh, Jignesh K. Shah wrote: Hi Arjen, Looking at your outputs of syscall and usrcall it looks like * Spending too much time in semsys, which means you have too many connections and they are contending to get a lock.. which is potentially the WAL log lock * llseek is high, which means you can obviously gain a bit with the right file system/files tuning by caching them right. Have you set the values for Solaris for T2000 tuned for Postgresql? Not particularly, we got a "special T2000 Solaris dvd" from your colleagues here in the Netherlands and installed that (actually one of your colleagues did). Doing so, all the "better default" /etc/system settings are supposed to be set. I haven't really checked that they are, since two of your colleagues have been working on it for the mysql version of the benchmark and I assumed they'd have verified that. Check out the tunables from the following URL http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp Try specially the /etc/system and postgresql.conf changes and see if it changes/improves your performance. I will see that those tunables are verified to be set. I am a bit surprised though about your remarks, since they'd point at the I/O being in the way? But we only have about 600k/sec i/o according to vmstat. The database easily fits in memory. In total I logged about 500k queries, of which only 70k were altering queries, of which almost all were inserts in log-tables which aren't actively read in this benchmark. But I'll give it a try. Best regards, Arjen
Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some
Hi Arjen, Looking at your outputs of syscall and usrcall it looks like * Spending too much time in semsys, which means you have too many connections and they are contending to get a lock.. which is potentially the WAL log lock * llseek is high, which means you can obviously gain a bit with the right file system/files tuning by caching them right. Have you set the values for Solaris for T2000 tuned for Postgresql? Check out the tunables from the following URL http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp Try specially the /etc/system and postgresql.conf changes and see if it changes/improves your performance. Regards, Jignesh Arjen van der Meijden wrote: Hi List, In the past few weeks we have been developing a read-heavy mysql-benchmark to have an alternative take at cpu/platform-performance. Not really to have a look at how fast mysql can be. This benchmark runs on mysql 4.1.x, 5.0.x and 5.1.x and is modelled after our website's production database, and the load generated on it is modelled after a simplified version of our visitor behaviour. Long story short, we think the test is a nice example of the relatively lightweight, read-heavy webapplications out there and therefore decided to have a go at postgresql as well. Of course the queries and indexes have been adjusted to (by our knowledge) best suit postgresql, while maintaining the same output to the application/interface layer. While the initial structure only got postgresql to about half the performance of mysql 4.1.x, the current version of our postgresql-benchmark has quite similar results to mysql 4.1.x, but both are quite a bit slower than 5.0.x (I think it's about 30-40% faster). Since the results from those benchmarks are not yet public (they will be put together in a story at our website), I won't go into too much detail about this benchmark. Currently we're having a look at a Sun T2000 and will be looking at other machines as well in the future. We are running the sun-release of postgresql 8.1.3 on that T2000 now, but are looking at compiling the cvs-head version (for its index-root-cache) somewhere this week. My guess is there are a few people on this list who are interested in some dtrace results taken during our benchmarks on that T2000. Although my knowledge of both Solaris and DTrace is very limited, I already took some samples of the system and user calls. I used Jignesh Shah's scripts for that: http://blogs.sun.com/roller/page/jkshah?entry=profiling_postgresql_using_dtrace_on You can find the samples here: http://achelois.tweakers.net/~acm/pgsql-t2000/syscall.log http://achelois.tweakers.net/~acm/pgsql-t2000/usrcall.log And I also did the memcpy-scripts, here: http://achelois.tweakers.net/~acm/pgsql-t2000/memcpysize.log http://achelois.tweakers.net/~acm/pgsql-t2000/memcpystack.log (this last log is 3.5MB) If anyone is interested in some more dtrace results, let me know (and tell me what commands to run ;-) ). Best regards, Arjen ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 6: explain analyze is your friend
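For readers without syscall.d handy, the per-syscall picture discussed here (semsys, llseek) can be approximated with a one-liner; the pid below is a placeholder for one busy backend:

# Count system calls made by one process; the tally prints on Ctrl-C
dtrace -n 'syscall:::entry /pid == $target/ { @[probefunc] = count(); }' -p 12345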
Re: [PERFORM] bad performance on Solaris 10
Hi Bruce, I saw on this alias too that people assumed the default wal_sync_method was fsync on Solaris. I would select fsync or fdsync as the default on Solaris. (I prefer fsync, as it is already highlighted as the default in postgresql.) Another thing to improve the defaults on Solaris would be to increase the defaults of wal_buffers and checkpoint_segments. (I think in 8.1 checkpoint_segments has already been improved to a default of 8 from the previous 3, and that may already be some help in performance out there.) These three changes improve the out-of-box performance of PostgreSQL quite a bit on Solaris (SPARC as well as x64 platforms). Then you will suddenly see a decrease in the number of people in the PostgreSQL community complaining about Solaris 10, as well as in the Solaris community complaining about PostgreSQL. (The benefits are mutual.) Don't get me wrong. As Luke mentioned, it took a while to get to the potential of PostgreSQL on Solaris, and people like me started doing other complex workarounds in Solaris like "forcedirectio", etc. (Yeah, I did a test: if you force fsync as wal_sync_method on Solaris, then you may not even need to do forcedirectio on your $PGDATA/pg_xlog to get back the lost performance.) If we had realized that the fsync/odatasync difference was the culprit, we could have saved a couple of months of effort. Yes, I agree that putting OS specific things in PostgreSQL hurts the community, and sticking to POSIX standards helps. Just my two cents. Regards, Jignesh Bruce Momjian wrote: Yes, if someone wants to give us a clear answer on which wal_sync method is best on all versions of Solaris, we can easily make that change. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] bad performance on Solaris 10
Bruce, Hard to answer that... People like me who know and love PostgreSQL and Solaris find this an opportunity to make their favorite database work best on their favorite operating system. Many times PostgreSQL does things based on the assumption that it will run on Linux, and it is left to Solaris to emulate that behavior. That said, there are ways to improve performance even on UFS on Solaris; it just requires more tweaks. Hopefully this will lead to a few Solaris-friendly default values like fsync/odatasync :-) Regards, Jignesh Bruce Momjian wrote: It is hard to imagine why people spend so much time modifying Sun machines to run with acceptable performance when non-Sun operating systems work fine without such hurdles. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Sun Fire T2000 and PostgreSQL 8.1.3
For a DSS type workload with PostgreSQL, where you end up with single long running queries on postgresql with about 100GB, you are better off using something like a Sun Fire V40z with those fast Ultra320 internal drives. This might be the perfect low cost complete database in a box. Sun Fire T2000 is great for OLTP, where you can end up with hundreds of users doing quick and small lookups, and the T2000 can crank simple thread executions far better than others. However, when it comes to long running queries you end up using 1/32 of the power, and it may not live up to your expectations. For example, consider your PostgreSQL talking to an Apache WebServer all on the T2000... You can put them in separate zones if you have different administrators for them. :-) As for PostgreSQL on Solaris, I already have the best parameters to use on Solaris based on my tests; the default odatasync hurts performance on Solaris, as does the default checkpoint segments; others are tweaked so that they are set for bigger databases and hence may not show much difference in performance... That said, I will still be interested to see your app's performance with PostgreSQL on Sun Fire T2000, as perseverance always finds ways to improve performance :-) Regards, Jignesh Juan Casero (FL FLC) wrote: Because I plan to develop a rather large (for us anyway) data warehouse with PostgreSQL. I am looking for the right hardware that can handle queries on a database that might grow to over 100 gigabytes. Right now our decision support system based on postgresql 8.1.3 stores retail sales information for about four years back, *but* only as weekly summaries. I want to build the system so it can handle daily sales transactions also. You can imagine how many more records this will involve, so I am looking for hardware that can give me the performance I need to make this project usable. In other words, parsing and loading the daily transaction logs for our stores is likely to take huge amounts of effort. I need a machine that can complete the task in a reasonable amount of time. As people start to query the database to find sales related reports and information, I need to make sure the queries will run reasonably fast for them. I have already hand optimized all of my queries on the current system. But currently I only have weekly sales summaries. Other divisions in our company have done a similar project using MS SQL Server on SMP hardware far outclassing the database server I currently use, and they report heavy loads on the server with less than ideal query run times. I am sure I can do my part to optimize the queries once I start this project, but there is only so much you can do. At some point you just need more powerful hardware. This is where I am at right now. Apart from that, since I will only get this one chance to buy a new server for data processing, I need to make sure that I buy something that can grow over time as our needs change. I don't want to buy a server only to find out later that it cannot meet our needs with future database projects. I have to balance a limited budget, room for future performance growth, and current system requirements. Trust me, it isn't easy. Juan -Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED] Sent: Thursday, April 06, 2006 2:57 AM To: pgsql-performance@postgresql.org Cc: Juan Casero (FL FLC); Luke Lonergan Subject: Re: [PERFORM] Sun Fire T2000 and PostgreSQL 8.1.3 Juan, Ok that is beginning to become clear to me. Now I need to determine if this server is worth the investment for us. Maybe it is not a speed demon, but to be honest the licensing costs of an SMP aware RDBMS are outside our budget. You still haven't explained why you want multi-threaded queries. This is sounding like keeping up with the Joneses. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Sun Fire T2000 and PostgreSQL 8.1.3
Hi Leigh, inline comments below. Leigh Dyer wrote: Luke Lonergan wrote: Juan, We've got a Sun Fire V40z and it's quite a nice machine -- 6x 15krpm drives, 4GB RAM, and a pair of Opteron 850s. This gives us more than enough power now for what we need, but it's nice to know that we can shoehorn a lot more RAM, and up it to eight CPU cores if needed. The newer Sun Opteron systems look nice too, but unless you're using external storage, their little 2.5" hard drives may not be ideal. That's because the Sun Fire V40z has the write cache turned on, while the 4200/4100 has the write cache turned off. There is a religious belief around the "write cache" on the disk in Sun :-) To really compare the performance, you have to turn on the write cache (I believe it was format -e and the cache option.. but that could have changed.. need to verify that again). Same goes for the T2000 SAS disks too.. Write cache is turned off on them, so be careful before you compare benchmarks on internal drives :-) -Jignesh Thanks Leigh - Luke ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] t1000/t2000 sun-servers
pgbench, according to me, is a more io-write-intensive benchmark. A T2000 with its internal drives may not perform well with pgbench under a high load. If you are using external storage, try it out. I haven't tried it out yet, but let me know what you see. -Jignesh Guido Neitzer wrote: On 06.03.2006, at 21:10 Uhr, Jignesh K. Shah wrote: Like migrate all your postgresql databases to one T2000. You might see that your average response time may not be faster but it can handle probably all your databases migrated to one T2000. In essence, your single thread performance will not speed up on Sun Fire T2000 but you can certainly use it to replace all your individual postgresql servers in your organization or see higher scalability in terms of number of users handled with 1 server with Sun Fire T2000. How good is a pgbench test for evaluating things like this? I have used it to compare several machines, operating systems and PostgreSQL versions - but it was more or less just out of curiosity. The real evaluation was made with "real life tests" - mostly scripts which also tested the application server itself. But as it is easy to compare several machines with pgbench, I just did the tests, and they were interesting and reflected the real world not as badly as I had thought a "benchmark" would. So, personally I'm interested in a simple pgbench test - perhaps with some more ( > 50) clients simulated ... cug ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
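For anyone trying that pgbench experiment, a minimal invocation looks like this (pgbench ships in contrib in 8.1; the database name, scale factor, and client counts are arbitrary examples):

# Create and populate a scale-100 test database (roughly 1.5 GB of data)
createdb bench
pgbench -i -s 100 bench
# Run 50 concurrent clients, 1000 transactions each
pgbench -c 50 -t 1000 bench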
Re: [PERFORM] t1000/t2000 sun-servers
Suggestions for benchmarks on Sun Fire T2000... * Don't try a DSS or TPC-H type of test with Postgres on Sun Fire T2000. Since such queries tend to have one connection, it will perform badly with Postgres, since it will use only one of the 32 available hardware virtual CPUs on Sun Fire T2000. (Oracle/DB2 have ways of breaking the queries into multiple processes and hence use multiple virtual CPUs on Sun Fire T2000; PostgreSQL cannot do the same in such cases.) * Use an OLTP type of benchmark, where you have more than 30 simultaneous users/connections doing work on Postgres, without bottlenecking on datafiles of course :-) * Use multiple databases or instances of Postgresql. Like migrate all your postgresql databases to one T2000. You might see that your average response time may not be faster, but it can probably handle all your databases migrated to one T2000. In essence, your single thread performance will not speed up on Sun Fire T2000, but you can certainly use it to replace all your individual postgresql servers in your organization, or see higher scalability in terms of the number of users handled with one Sun Fire T2000 server. For your /etc/system use the parameters as mentioned in http://www.sun.com/servers/coolthreads/tnb/parameters.jsp For hints on setting it up for Postgresql refer to the other databases' setup on http://www.sun.com/servers/coolthreads/tnb/applications.jsp If you get specific performance problems send email to pgsql-performance@postgresql.org Regards, Jignesh Neil Saunders wrote: I may be able to organize a test on a T2000 if someone could give advice as to an appropriate test to run... Cheers, Neil On 3/6/06, Claus Guttesen <[EMAIL PROTECTED]> wrote: Hi. Has anybody tried the new Sun "cool-thread" servers t1000/t2000 from Sun? I'd love to see benchmarks with Solaris 10 and pg 8.1. regards Claus ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] copy and postgresql.conf
Actually fsync is not the default on Solaris (verify using "show all;"). If you look closely in postgresql.conf it is commented out and mentioned as the default, but show all tells a different story. In all my cases I saw the default as wal_sync_method | open_datasync. Also, I have seen quite an improvement by changing the default checkpoint_segments from 3 to 64 or 128, and also by increasing wal_buffers to 64, depending on how heavy your load is. Also, open_datasync type operations benefit from forcedirectio on Solaris, so either move the wal to a forcedirectio-mounted file system or try changing the default sync to fsync (the *said* default). Now if you use fsync then you need a bigger file system cache, since by default Solaris's segmap mechanism only maps 12% of your physical RAM to be used for the file system buffer cache. Increasing segmap_percent to 50 on SPARC allows 50% of your RAM to be mapped for the file system buffer cache. (NOTE: It does not reserve the memory, it just allows mapping of the memory which can be used for the file system buffer cache.) Changing maxphys allows the file system buffer cache to coalesce the 8K writes that PostgreSQL is doing into bigger writes/reads. Also, since you are now exploiting the file system buffer cache, file system logging is very much recommended (available from a later update of Solaris 8, I believe). Regards, Jignesh FERREIRA, William (VALTECH) wrote: with PostgreSQL 8.1.3, here are my parameters (it's the default configuration)
wal_sync_method = fsync
wal_buffers = 8
checkpoint_segments = 3
bgwriter_lru_percent = 1.0
bgwriter_lru_maxpages = 5
bgwriter_all_percent = 0.333
bgwriter_all_maxpages = 5
and you think times can be improved again? ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
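Following the advice above, query the live server rather than trusting the commented-out lines in postgresql.conf (the database name is a placeholder):

psql -d mydb -c "SHOW wal_sync_method;"
psql -d mydb -c "SHOW checkpoint_segments;"
psql -d mydb -c "SHOW wal_buffers;"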
Re: [PERFORM] copy and postgresql.conf
What are your postgresql.conf parameters for the equivalent ones that I suggested? I believe your wal_buffers and checkpoint_segments could be bigger. If that's the case then yep, you are fine. As for the background writer, I am seeing mixed results yet, so I'm not sure about that. But thanks for the feedback. -Jignesh FERREIRA, William (VALTECH) wrote: i tested the last version of PostgreSQL and for the same test: before: 40mn, and now: 12mn :) faster than Oracle (exactly what i wanted :p ) thanks to everybody Will
Re: [PERFORM] copy and postgresql.conf
What version of Solaris are you using? Have you seen the recommendations for using COPY on Solaris? http://blogs.sun.com/roller/page/jkshah?entry=postgresql_on_solaris_better_use

wal_sync_method = fsync
wal_buffers = 128
checkpoint_segments = 128
bgwriter_percent = 0
bgwriter_maxpages = 0

And also, for /etc/system on Solaris 10 and 9 SPARC, use the following:

set maxphys=1048576
set md:md_maxphys=1048576
set segmap_percent=50
set ufs:freebehind=0
set msgsys:msginfo_msgmni = 3584
set semsys:seminfo_semmni = 4096
set shmsys:shminfo_shmmax = 15392386252
set shmsys:shminfo_shmmni = 4096

Can you try putting in one run with these values and send back your experience on whether they help your workload or not? At least I saw improvements with COPY using the above settings on PostgreSQL 8.0 and 8.1 on Solaris. Regards, Jignesh

FERREIRA, William (VALTECH) wrote: 30% faster!!! i will test this new version ... thanks a lot

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of Albert Cervera Areny Sent: Tuesday, February 14, 2006 17:07 To: pgsql-performance@postgresql.org Subject: Re: [PERFORM] copy and postgresql.conf

Sorry, COPY improvements came with 8.1 (http://www.postgresql.org/docs/whatsnew)

On Tuesday, 14 February 2006 at 14:26, FERREIRA, William (VALTECH) wrote: thanks, i'm using postgresql 8.0.3 there is no primary key and no index on my tables regards

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of Albert Cervera Areny Sent: Tuesday, February 14, 2006 12:38 To: pgsql-performance@postgresql.org Subject: Re: [PERFORM] copy and postgresql.conf

Hi William, which PostgreSQL version are you using? Newer (8.0+) versions have some important performance improvements for the COPY command. Also, you'll notice significant improvements by creating primary & foreign keys after the COPY command. I think config tweaking can improve key and index creation, but I don't think you can improve the COPY command itself. There are also many threads on this list commenting on this issue; you'll find them easily in the archives.

On Tuesday, 14 February 2006 at 10:44, FERREIRA, William (VALTECH) wrote: hi, i load data from files using the COPY method. Files contain between 2 and 7 million rows, spread across 5 tables. Loading all the data takes 40 min, while the same processing takes 17 min with Oracle. I think this time can be improved by changing the postgresql configuration file, but which parameters do I need to adjust, and with which values? Here are the specifications of my system: V250, sun4u architecture, 2x UltraSPARC IIIi 1.28 GHz CPUs, 8 GB RAM. Regards, Will

-- Albert Cervera Areny, Dept. Informàtica Sedifa, S.L., Av. Can Bordoll, 149, 08202 - Sabadell (Barcelona), Tel. 93 715 51 11, Fax 93 715 51 12
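A minimal sketch of applying these recommendations end to end, assuming a database named mydb, a table named mytable, and a data file at /data/rows.csv (all hypothetical names; $PGDATA points at the cluster directory):

    # Append the suggested settings to postgresql.conf (parameter names as
    # quoted above; the bgwriter_* names follow the 8.0 form).
    cat >> $PGDATA/postgresql.conf <<'EOF'
    wal_sync_method = fsync
    wal_buffers = 128
    checkpoint_segments = 128
    bgwriter_percent = 0
    bgwriter_maxpages = 0
    EOF

    # The /etc/system settings need a reboot; the PostgreSQL settings
    # only need a server restart.
    pg_ctl -D $PGDATA restart

    # Time the bulk load itself. Run as a superuser, since COPY FROM
    # reads a server-side file.
    time psql mydb -c "COPY mytable FROM '/data/rows.csv' WITH DELIMITER ','"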
Re: [PERFORM] PostgreSQL Solaris packages now in beta
For people installing PostgreSQL on Solaris, the new packages provide a greatly improved experience in getting PostgreSQL up and running; setup difficulty was quite an inhibitor to "love at first sight". This will now help people familiar with Solaris form a great first impression of PostgreSQL and hence lower the barrier of entry. With more than 3.7 million downloads of Solaris 10 already, PostgreSQL now has access to an incremental user base of perhaps 3.7 million relatively "new" PostgreSQL users. Regards, Jignesh

Robert Lor wrote: With big thanks to Josh Berkus and Devrim Gunduz, I'm happy to announce that Sun has just released a Solaris distribution of PostgreSQL 8.1.2 with ready-to-install packages for both SPARC and x86. These packages are currently in beta, and we expect to FCS in 2-3 weeks. The packages, along with an install guide, are available for download at http://pgfoundry.org/projects/solarispackages/ We have tightly integrated PostgreSQL with Solaris in a manner similar to the Linux distributions available on postgresql.org; in fact, the directory structures are identical. Starting with Solaris 10 Update 2, PostgreSQL will be distributed with every copy of Solaris, via download and physical media. We welcome any and all feedback on this PostgreSQL Solaris distribution. Please subscribe to the [EMAIL PROTECTED] mailing list to give us feedback: http://pgfoundry.org/mail/?group_id=163 BTW, I'm a senior engineer at Sun Microsystems, recently working with the PostgreSQL community (namely Josh Berkus, Devrim Gunduz, and Gavin Sherry) on the Solaris Packages Project at PgFoundry, PostgreSQL performance optimization on Solaris, and leveraging Solaris 10 capabilities (e.g. DTrace) specifically for PostgreSQL. I'll be posting a Solaris performance tuning guide in a few weeks. Regards, Robert Lor
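For reference, a typical SVR4 package install on Solaris might look like the following; the package file name below is an assumption, since the real artifact names come from the pgfoundry download page:

    # Download a package file from http://pgfoundry.org/projects/solarispackages/
    # (the file name here is hypothetical, not the actual artifact name).
    gunzip postgresql-8.1.2-sol10-x86.pkg.gz
    pkgadd -d postgresql-8.1.2-sol10-x86.pkg   # installs system-wide; prompts for confirmation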
Re: [PERFORM] Sudden slowdown of Pg server
lockstat is available in Solaris 9; it can help you determine whether any kernel-level locking is occurring during that time. Solaris 10 also has plockstat, which can be used to identify userland locks happening in your process. Since you have Solaris 9, try the following: run (as root) "lockstat sleep 5" and note the output, which can be long. The outputs of "prstat -am", "iostat -xczn 3", and "vmstat 3" will help too. prstat -am has a column called "LAT"; if its value is in double digits, you have a locking issue, which will probably also show up as a higher "SLP" value for the process. (Interpretation is data- and workload-specific, which this email is too small to decode.) Once you have identified a particular process (if any) as the source of the problem, get its pid and look at the output of the following commands, which are quite intrusive:

truss -c -p $pid 2> truss-syscount.txt (Ctrl-C after a while to stop collecting)
truss -a -e -u":::" -p $pid 2> trussout.txt (Ctrl-C after a while to stop collecting)

Regards, Jignesh

Jerry Sievers wrote: Hello; I am going through a post-mortem analysis of an infrequent but recurring problem on a Pg 8.0.3 installation. Application code connects to Pg using J2EE pooled connections. PostgreSQL 8.0.3 on sparc-sun-solaris2.9, compiled by GCC sparc-sun-solaris2.8-gcc (GCC) 3.3.2. The database is quite large with respect to the number of tables, some of which have up to 6 million tuples. The typical idle/busy connection ratio is 3/100, but occasionally we'll catch 20 or more busy sessions. The problem manifests itself like a locking issue: about weekly, throughput slows down and we notice the busy connection count rising minute by minute -- 2, 20, 40... Before long, the app server detects the lack of responsiveness and fails over to another app server (not Pg), which in turn attempts a bunch of new connections into Postgres. Snapshots of the pg_locks and pg_stat_activity tables are sampled each minute. I am wishing for a few new ideas as to what to watch. Here are some observations I've made:

1. At no time do any un-granted locks show in pg_locks.
2. The number of exclusive locks is small: 1, 4, 8.
3. Other lock types/modes are numerous but look like normal workload.
4. There are at least a few old 'in transaction' cases in the activity view.
5. No interesting error messages or warnings in the Pg logs.
6. No crash of the Pg backend.

Other goodies include a bounty of poorly performing queries, which are constantly being optimized now for good measure. Aside from the heavy queries, performance is generally decent. Resource-related server configs have been boosted substantially but have not undergone any formal R&D to verify that we're in the safe zone under heavy load. A max_fsm_relations setting which is *below* our table and index count was discovered by me today and will be increased this evening during a maintenance cycle. The slowdown and subsequent runaway app server take place within a small 2-5 minute window, and I have as yet not been able to get into psql during the event for a hands-on look. Questions:

1. Is there any type of resource lock that can unconditionally block another session and NOT appear as an un-granted lock?
2. What other runtime info in particular would be most useful to sample here?
3. What Solaris-side runtime stats might give some clues, and how often should they be sampled? Assume sampling needs to be aggressive, given how fast this problem crops up.

Any help appreciated. Thank you.
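A sketch of the collection steps above wired into one root shell session; the pid and output file names are placeholders, and the interval/count arguments are illustrative additions:

    # Kernel-level lock contention over a 5-second window (Solaris 9, as root).
    lockstat sleep 5 > lockstat.out

    # Per-thread microstates: watch the LAT (lock wait) and SLP (sleep) columns.
    prstat -am 3 5 > prstat.out

    # Disk and memory pressure over the same window.
    iostat -xczn 3 5 > iostat.out
    vmstat 3 5 > vmstat.out

    # Once a suspect backend is found, attach truss (intrusive -- use briefly).
    pid=12345                                    # placeholder pid
    truss -c -p $pid 2> truss-syscount.txt       # syscall counts; Ctrl-C to stop
    truss -a -e -u":::" -p $pid 2> trussout.txt  # syscalls plus user-level calls; Ctrl-C to stop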
Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?
Hi Juan, the Solaris 10 license is free. In fact, I believe you receive the media with a Sun Fire V20z. If you want support, there are various "pay" plans depending on the level of support; if not, your license allows you Right to Use anyway, for free. That said, I haven't done much testing of 32/64-bit differences. However, for long-term purposes, 64-bit always seems to be the safe bet. As for your load performance, a lot of it depends on your file system layout as well. Regards, Jignesh

Juan Casero wrote: I just sent my boss an email asking him for a Sun V20z with dual 2.2 GHz Opterons, 2 GB of RAM, and RAID 1. I would have liked a better server capable of RAID, but that seems to be out of his budget right now. Ok, so I assume I get this Sun box. Most likely I will go with Linux, since it is a fair bet he doesn't want to pay for the Solaris 10 x86 license, although I kind of like the idea of using Solaris 10 x86 for this. I will assume I need to install the x64 kernel that comes with, say, Fedora Core 4. Should I run the PostgreSQL 8.x binaries in 32-bit mode or 64-bit mode? My instinct tells me 64-bit mode is most efficient for our database size: about 20 GB right now, but it may grow to 100 GB in a year or so. I just finished loading a 20 GB database on a dual 900 MHz UltraSPARC III system with 2 GB of RAM and about 768 MB of shared memory available for the postgresql server, running Solaris 10. The load has smoked a P4 3.2 GHz system I am also using, with 2 GB of RAM, running postgresql 8.0.3. I mean, I started the SPARC load after the P4 load; the SPARC load has already finished rebuilding the database from a pg_dump file, but the P4 system is still going. The P4 has 1.3 GB of shared memory allocated to postgresql. How about them apples? Thanks, Juan

On Wednesday 21 December 2005 18:57, William Yu wrote: Juan Casero wrote: Can you elaborate on the reasons the Opteron is better than the Xeon when it comes to disk I/O? I have a PostgreSQL 7.4.8 box running a DSS. One of our ... Opterons have a 64-bit IOMMU -- Xeons don't. That means in 64-bit mode, for transfers to memory above 4GB, the OS must allocate memory below 4GB, DMA to that block, and then the CPU must do extra work copying the memory above 4GB. On the Opteron, this is done by the IO adapter using DMA in the background.
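Since shared-memory sizing comes up twice above, here is a minimal sketch of how roughly 768 MB of shared memory might be granted to PostgreSQL on Solaris 10 using resource controls; the project name, user, and values are assumptions, not anything from the thread:

    # Create a project for the postgres user with a privileged shared-memory
    # cap of 768 MB (805306368 bytes) -- values are illustrative only.
    projadd -c "PostgreSQL" \
      -K "project.max-shm-memory=(privileged,805306368,deny)" user.postgres

    # In postgresql.conf, pre-8.2 releases count shared_buffers in 8 kB pages:
    #   shared_buffers = 96000    # 96000 * 8 kB ~= 768 MB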