Re: [PERFORM] Terrible Write Performance of a Stored Procedure
> > You're right that it should be removed, but this explanation is wrong. The behavior as configured is actually "if there are >= 100 other transactions in progress, wait 0.1 second before committing after the first one gets committed", in hopes that one of the other 100 might also join along in the disk write.

Thanks for the correction. My question is how you're getting 0.1 seconds from his commit_delay:

    if (CommitDelay > 0 && enableFsync &&
        CountActiveBackends() >= CommitSiblings)
        pg_usleep(CommitDelay);

Wouldn't this actually be 1 second based on a commit_delay of 10?

> > Since in this case max_connections is set to 100, it's actually impossible for the commit_delay/commit_siblings behavior to trigger given this configuration. That's one reason it should be removed. The other is that in general, if you don't know exactly what you're doing, you shouldn't be touching either parameter; they don't do what people expect them to, and it's extremely unlikely you'll encounter any of the rare use cases where they might help.

After looking, I agree. Thanks again for the correction, Greg.

--Scott
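[Editor's note on units: commit_delay is specified in microseconds and is passed straight to pg_usleep(), which also takes microseconds, so commit_delay = 10 would be a 10 µs sleep and 0.1 s corresponds to commit_delay = 100000. A toy Python model of the quoted C logic (illustrative only, not PostgreSQL source):]

```python
# Toy model of the quoted commit path. commit_delay is in microseconds;
# pg_usleep() also takes microseconds, so no unit conversion happens.

def maybe_commit_delay(commit_delay_us, fsync_enabled, active_backends,
                       commit_siblings, usleep):
    """Mirrors: if (CommitDelay > 0 && enableFsync &&
                    CountActiveBackends() >= CommitSiblings)
                    pg_usleep(CommitDelay);"""
    if commit_delay_us > 0 and fsync_enabled and active_backends >= commit_siblings:
        usleep(commit_delay_us)
        return True
    return False

# With max_connections = 100 and commit_siblings = 100, at most 99 *other*
# backends can be active, so the delay can never trigger:
sleeps = []
maybe_commit_delay(100000, True, 99, 100, sleeps.append)
print(sleeps)  # -> [] (no sleep was issued)
```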
[PERFORM] random slow query
I have a table of about 50 million rows. There are a few writers pumping data into the table at a rate of 4 rows/hour. Most of the time the SELECT takes less than 100 ms, but sometimes it is very slow, from 30 seconds to 500 seconds. The database is vacuumed and analyzed regularly.

One month ago, this type of slow query happened a few times per day. But recently, the slow queries happen more frequently, at the rate of once every 10 minutes or less. There seems to be no relation to the database load or the type of query. If I manually execute these queries, they return in less than 1 second.

I just wonder where I should start to look?

Thanks

Shawn.

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
On 06/29/2009 03:33 PM, Sean Ma wrote:
> I have a table of about 50 million rows. There are a few writers pumping data into the table at a rate of 4 rows/hour. Most of the time the SELECT takes less than 100 ms, but sometimes it is very slow, from 30 seconds to 500 seconds. The database is vacuumed and analyzed regularly. One month ago, this type of slow query happened a few times per day. But recently, the slow queries happen more frequently, at the rate of once every 10 minutes or less. There seems to be no relation to the database load or the type of query. If I manually execute these queries, they return in less than 1 second. I just wonder where I should start to look?

The slow queries could be waiting for locks, so you could enable log_lock_waits to see if that is the issue.

Andres
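[Editor's note: a minimal sketch of the settings involved. log_lock_waits logs a message whenever a session waits longer than deadlock_timeout to acquire a lock; the values below are illustrative, not recommendations.]

```
# postgresql.conf
log_lock_waits = on      # log any lock wait longer than deadlock_timeout
deadlock_timeout = 1s    # threshold for deadlock checks and for the log message
```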
[PERFORM] Utilizing multiple cores in a function call.
Good morning. I have developed a function call that schedules patient appointments within a day based on several resource constraints. The algorithm has been mentioned on here before and I have managed to tweak it down to 6-9 seconds from the original 27 seconds. Of course, I want it to be faster still. The function throttles one of my CPUs to 100% (shown as 50% in Task Manager) and leaves the other one sitting pretty. Is there any way to use both CPUs? Thanks, Matthew Hartman Programmer/Analyst Information Management, ICP Kingston General Hospital (613) 549- x4294
Re: [PERFORM] Utilizing multiple cores in a function call.
Hartman, Matthew wrote:
> Good morning.
>
> I have developed a function call that schedules patient appointments within a day based on several resource constraints. The algorithm has been mentioned on here before and I have managed to tweak it down to 6-9 seconds from the original 27 seconds.

To speed up the execution of processes, I heartily recommend the book "Writing Efficient Programs" by Jon Louis Bentley, Prentice-Hall, 1982.

There are many important steps. The most important is usually to refine the algorithm itself. I once sped up a program that would have required several weeks on a mainframe running 24/7 to 6 minutes, by improving the basic algorithm of the thing. Only then would it have made sense to optimize the actual code.

Next, you need to profile the code to see where the hot spots are. There is little point in examining code in other parts of the program.

> Of course, I want it to be faster still. The function throttles one of my CPUs to 100% (shown as 50% in Task Manager) and leaves the other one sitting pretty. Is there any way to use both CPUs?

You could write your algorithm as a separate process -- a server. Then in your SQL program, you invoke a trivial function that just hands the arguments off to the server. Thus, your SQL program would normally run on one processor and the time-consuming algorithm would run on the other. If you are not careful, this would not benefit you at all, because your SQL process would wait until the server returns its answer. So you would need to modify your SQL program so that it could do other things while the server process did its thing.

My guess is that you need a more efficient algorithm before you go to the trouble of optimizing the execution of your current one. As far as making it run on multiple processors, it depends critically on the nature of your algorithm. A few can easily be modified to run on multiple processors; some cannot run on multiple processors at all.
> Thanks,
>
> Matthew Hartman
> Programmer/Analyst
> Information Management, ICP
> Kingston General Hospital
> (613) 549- x4294

--
Jean-David Beyer          Registered Linux User 85642.
PGP-Key: 9A2FC99A         Registered Machine 241939.
Shrewsbury, New Jersey    http://counter.li.org
10:40:01 up 10 days, 21:29, 3 users, load average: 4.19, 4.22, 4.19
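[Editor's note: the hand-off pattern described above can be sketched in a few lines of Python. Names are illustrative; `expensive_schedule` stands in for the scheduling algorithm. The caller submits the job to a worker process, is free to do other work, and blocks only when it needs the answer.]

```python
from concurrent.futures import ProcessPoolExecutor

def expensive_schedule(n):
    # stand-in for the CPU-bound scheduling computation
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=1) as pool:
        future = pool.submit(expensive_schedule, 100_000)
        # ...the caller is free to do other work here while the worker
        # process keeps the second CPU busy...
        result = future.result()  # block only when the answer is needed
        print(result)
```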
Re: [PERFORM] random slow query
Sean Ma wrote:
> I have a table of about 50 million rows. There are a few writers pumping data into the table at a rate of 4 rows/hour. Most of the time the SELECT takes less than 100 ms. However, sometimes it is very slow, from 30 seconds to 500 seconds. The database is vacuumed and analyzed regularly.

What version of PostgreSQL is this? On what OS? What hardware?

We had similar problems on some of our servers under 8.2 and earlier due to the tendency of PostgreSQL to build up a very large set of dirty pages and then throw them all at the drives with an immediate fsync. The RAID controller queued up the requests, and fast reads got stuck in the queue behind all those writes. You may want to look at this excellent coverage of the topic by Greg Smith:

http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

We got around the problem by keeping the checkpoint interval and shared buffer size fairly small, and making the background writer fairly aggressive. What works for you, if this is your problem, may be different. I've heard that some have had to tune their OS caching configuration.

-Kevin
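[Editor's note: a sketch of what "small checkpoint interval, small shared buffers, aggressive background writer" might look like in postgresql.conf. The values are illustrative only, not the poster's actual settings and not recommendations; tune against your own workload.]

```
# postgresql.conf -- illustrative values only
shared_buffers = 512MB          # keep modest so each checkpoint flushes less
checkpoint_timeout = 5min       # checkpoint more often, in smaller bursts
bgwriter_delay = 50ms           # wake the background writer more frequently
bgwriter_lru_maxpages = 500     # let it write more dirty pages per round
```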
Re: [PERFORM] Utilizing multiple cores in a function call.
I'm pretty much at the point where I've chewed the fat off of the algorithm, or at least reached my personal limits. Occasionally a new idea pops into my head and yields an improvement, but it's on the order of 100-250ms. Google came back with "no sir". It seems PostgreSQL is limited to one CPU per query unless I spawn a master/controller like you suggested. Shame..

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549- x4294

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Jean-David Beyer
Sent: Monday, June 29, 2009 10:53 AM
To: pgsql performance
Subject: Re: [PERFORM] Utilizing multiple cores in a function call.

[quoted message snipped]
Re: [PERFORM] Utilizing multiple cores in a function call.
Hartman, Matthew wrote:
> I'm pretty much at the point where I've chewed the fat off of the algorithm, or at least reached my personal limits. Occasionally a new idea pops into my head and yields an improvement, but it's on the order of 100-250ms. Google came back with "no sir". It seems PostgreSQL is limited to one CPU per query unless I spawn a master/controller like you suggested. Shame..

Although I have never done it myself, you might try using PL/R to perform the algorithm in R, and make use of the snow package to run parallel tasks -- see:

http://cran.r-project.org/web/views/HighPerformanceComputing.html

Joe
Re: [PERFORM] Utilizing multiple cores in a function call.
On Mon, 29 Jun 2009, Hartman, Matthew wrote:
> The function throttles one of my CPUs to 100% (shown as 50% in Task Manager) and leaves the other one sitting pretty. Is there any way to use both CPUs?

Not easily. Potential techniques:

-Rewrite the function or its time-critical portion in some other language that allows using two processes usefully

-Write a "worker server" that you prompt to pick up work from a table and write its output to another that you can ask to handle part of the job. You might communicate with the worker using the LISTEN/NOTIFY mechanism in the database.

-Some combination of these two techniques. One popular way to speed up things that are running slowly is to run some part of them in a C UDF, so that you could use "select my_big_computation(x,y,z)" and get faster execution.

If you were hoping for a quick answer, no such thing. I suspect you'd get better help talking about what your function does and seeing if there's a specific part somebody else is familiar with optimizing. For example, I've seen >10:1 speedups just by rewriting one small portion of a computationally expensive mathematical function in C before, keeping the rest of the logic on the database side. You don't necessarily have to rewrite the whole thing.

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Utilizing multiple cores in a function call.
On Mon, Jun 29, 2009 at 10:26 AM, Hartman, Matthew wrote:
> Good morning.
>
> I have developed a function call that schedules patient appointments within a day based on several resource constraints. The algorithm has been mentioned on here before and I have managed to tweak it down to 6-9 seconds from the original 27 seconds.
>
> Of course, I want it to be faster still. The function throttles one of my CPUs to 100% (shown as 50% in Task Manager) and leaves the other one sitting pretty. Is there any way to use both CPUs?

Your best bet at using multiple cores on a CPU-bound problem is to try to divide up the work logically into separate pools and to attack the work with multiple function calls. This is probably what the database would do for you if it had 'in-query multi-threading', only the database could attack it at a much finer-grained level.

In your particular case, I think the answer is to attack the problem in an entirely new direction, although your matrix query is one of the coolest queries I've seen in a while. The first thought that jumped out at me was to try to treat your nurses and stations as incrementing numbers, so that if you allocate three hours of nurse x's time, you increment some number by three in the nurse's table. This would lay on top of a kind of time-calculation system that would convert that number to actual time based on the nurse's schedule, etc. On top of _that_, you would need some kind of resolution system to handle canceled appointments, nurse no-shows, etc.

The stations would operate on a similar principle...you imagine all the available hours for the station stretched to infinity on a number line and keep a fixed allocation point which always moves forward, plus a 'number line time' -> real time converter and a freestore list to pick up unexpectedly freed time.
merlin
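[Editor's note: the "divide the work into separate pools" idea above can be sketched as follows. All names here are illustrative, not the poster's actual schema; `score_chunk` stands in for the per-slot feasibility/cost computation. Each chunk is scored in its own process, so both CPUs get used.]

```python
from multiprocessing import Pool

def score_chunk(slots):
    # stand-in for the per-slot feasibility/cost computation
    return [s * s % 97 for s in slots]

def parallel_score(slots, workers=2):
    # round-robin split keeps the chunks roughly the same size
    chunks = [slots[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        scored = pool.map(score_chunk, chunks)
    # flatten the per-chunk results back into one list
    return [x for chunk in scored for x in chunk]

if __name__ == "__main__":
    print(parallel_score(list(range(8))))  # -> [0, 4, 16, 36, 1, 9, 25, 49]
```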
Re: [PERFORM] random slow query
Hi Sean,

Sean Ma wrote:
> One month ago, this type of slow query happened a few times per day. But recently, the slow queries happen more frequently, at the rate of once every 10 minutes or less. There seems to be no relation to the database load.

What is your hardware (memory, CPU type and such)? This seems like a cache issue to me, but I can't tell for sure without some additional information on your system:

1) What is the amount of a) available memory, b) free memory, and c) memory available to i/o buffers?
2) What is the swap usage, if any?
3) What is the CPU load? Any noticeable patterns in CPU load?

You can use /usr/bin/top to obtain most of this information.

Mike
Re: [PERFORM] Utilizing multiple cores in a function call.
On Mon, 2009-06-29 at 14:42 -0400, Greg Smith wrote:
> -Write a "worker server" that you prompt to pick up work from a table and write its output to another that you can ask to handle part of the job. You might communicate with the worker using the LISTEN/NOTIFY mechanism in the database.
>
> -Some combination of these two techniques. One popular way to speed up things that are running slowly is to run some part of them in a C UDF, so that you could use "select my_big_computation(x,y,z)" and get faster execution.

The trouble here is that the backend may not like having threads suddenly introduced into its execution environment. If properly written, I don't really see why a C UDF that used pthreads couldn't spawn two worker threads that _NEVER_ touched _ANY_ PostgreSQL APIs, talked to the SPI, etc., and let them run while blocking the main thread until they complete. Then again, I know relatively little about Pg's guts, and for all I know initializing the pthread environment could completely mess up the backend.

Personally I'd want to do it out-of-process, using a SECURITY DEFINER PL/PgSQL function owned by a role that also owned some otherwise private queue and result tables for your worker server. As Greg Smith noted, LISTEN/NOTIFY would allow your worker server to avoid polling and instead sleep when there's nothing in the queue, and would also let your waiting clients avoid polling the result table.

> For example, I've seen >10:1 speedups just by rewriting one small portion of a computationally expensive mathematical function in C before, keeping the rest of the logic on the database side. You don't necessarily have to rewrite the whole thing.

A useful dirty trick is to use Psyco in Python. It's a specializing compiler that can get massive performance boosts out of Python code without any code changes, and it seems to work with PL/Python.
Just:

    try:
        import psyco
        psyco.full()
    except:
        # Enabling Psyco failed; don't care
        pass

in your function should get you a pretty serious boost. This will NOT, however, allow your code to use two cores at once; you'll need threading or multiple processes for that.

--
Craig Ringer