Re: [PERFORM] Restricting Postgres
Kevin Barnard wrote:
> I am generally interested in a good solution for this. So far our solution has been to increase the hardware to the point of allowing 800 connections to the DB. I don't have the mod loaded for Apache, but we haven't had too many problems there. The site is split pretty well between dynamic and non-dynamic content; it's largely Flash with several plugins to the DB. However we still can be, and have been, slammed up to the point of the 800 connections.
>
> What I don't get is: why not use pgpool? This should eliminate the rapid-fire forking of postgres instances on the DB server. I'm assuming your app can safely handle a failure to connect to the DB (i.e. exceeding the number of DB connections). If not, it should be fairly simple to send a 503 header when it's unable to get the connection.

Note that I am not necessarily looking for a PostgreSQL solution to the matter, just a way to prevent the database from killing off the server it sits on by looking at the load averages.

I have attempted to make use of pgpool and have had some very poor performance. There were constant error messages, load averages on that machine seemed to skyrocket, and it just seemed not to be suited to my needs. Apache::DBI overall works better for what I require, even if it is not a pool per se.

Now if pgpool supported variable-rate pooling, like Apache does with its children, it might help to even things out. That, and you'd still get the spike if you have to start the webserver and database server at or around the same time.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
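Kevin's "send a 503 when no connection is available" idea can be sketched roughly as below. This is a hypothetical, minimal stand-in (the site's actual scripts are Perl CGI wrapping DBI connect errors); the pool class and names here are invented purely for illustration.

```python
# Sketch: return 503 instead of queueing when the DB pool is exhausted.
# The fixed-size counter stands in for real connection-pool bookkeeping.

class ExhaustedPool(Exception):
    pass

class ConnectionPool:
    def __init__(self, max_conns):
        self.max_conns = max_conns
        self.in_use = 0

    def acquire(self):
        if self.in_use >= self.max_conns:
            raise ExhaustedPool("no free DB connections")
        self.in_use += 1
        return object()  # stand-in for a real connection handle

    def release(self):
        self.in_use -= 1

def handle_request(pool):
    """Return (status, headers) for a CGI-style response."""
    try:
        pool.acquire()
    except ExhaustedPool:
        # Tell the client to back off instead of piling up requests.
        return 503, {"Retry-After": "120"}
    try:
        return 200, {}
    finally:
        pool.release()
```

The point is only that the failure mode becomes an explicit, cheap response rather than an ever-growing queue of waiting children.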
Re: [PERFORM] Restricting Postgres
Matt Clark wrote:
>> Case in point: A first time visitor hits your home page. A dynamic page is generated (in about 1 second) and served (taking 2 more seconds) which contains links to 20 additional
> The gain from an accelerator is actually even more than that, as it takes essentially zero seconds for Apache to return the generated content (which in the case of a message board could be quite large) to Squid, which can then feed it slowly to the user, leaving Apache free again to generate another page. When serving dialup users large dynamic pages this can be a _huge_ gain. I think Martin's pages (dimly recalling another thread) take a pretty long time to generate though, so he may not see quite such a significant gain.

Correct: 75% of all hits are on a script that can take anywhere from a few seconds to half an hour to complete. The script essentially auto-flushes to the browser, so they get new information as it arrives, creating the illusion of on-demand generation. A Squid proxy would probably cause severe problems when dealing with a script that does not complete its output for a variable amount of time.

As for images, CSS, JavaScript and such, the site makes use of them, but in the grand scheme of things the amount of traffic they tie up is literally inconsequential. Though I will probably move all of that onto another server, just to give the main server the capability of dealing with almost exclusively dynamic content.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
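The "auto-flush" pattern described above can be sketched as follows. This is illustrative only: the real scripts are Perl CGI, where the equivalent is enabling autoflush on STDOUT (`$| = 1`); the function and names below are invented for the sketch.

```python
# Sketch: emit each piece of new content as soon as it arrives,
# flushing the output stream instead of buffering the whole page.
import sys
import time

def stream_posts(fetch_batches, out=sys.stdout, delay=0.0):
    """Write each batch of new posts immediately, flushing after each."""
    for batch in fetch_batches:
        for line in batch:
            out.write(line + "\n")
        out.flush()            # push the partial output to the client now
        if delay:
            time.sleep(delay)  # in reality: wait for new posts to appear
```

Because output reaches the browser incrementally, the page appears "live" even though it is one long-running request.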
Re: [PERFORM] Restricting Postgres
Matt Clark wrote:
>> Apache::DBI overall works better for what I require, even if it is not a pool per se. Now if pgpool supported variable-rate pooling, like Apache does with its children, it might help to even things out. That, and you'd still get the spike if you have to start the webserver and database server at or around the same time.
> I still don't quite get it though - you shouldn't be getting more than one child per second being launched by Apache, so that's only one PG postmaster per second, which is really a trivial load. That is unless you have 'StartServers' set high, in which case the 'obvious' answer is to lower it. Are you launching multiple DB connections per Apache process as well?

I have StartServers set to a fairly high limit. However, this would make little difference overall if I restarted the webservers to load in new modules during a high-load time. When I am averaging 145 concurrent connections before a restart, I can expect that many requests to hit the server once Apache begins to respond. As a result, it will literally cause a spike on both machines as new connections are initiated at a high rate. In my case I don't always have the luxury of waiting till 0300 just to test a change.

Again, I am not necessarily looking for a PostgreSQL solution. I am looking for a method that would allow the database, or the OS itself, to protect the system it's hosted on. If both the database and the Apache server were on the same machine, this type of scenario would be unstable to say the least.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
Re: [PERFORM] Restricting Postgres
Matt Clark wrote:
>> Correct: 75% of all hits are on a script that can take anywhere from a few seconds to half an hour to complete. The script essentially auto-flushes to the browser, so they get new information as it arrives, creating the illusion of on-demand generation.
> This is more like a streaming data server, which is a very different beast from a webserver, and probably better suited to the job. Usually either multithreaded or single-process using select() (just like Squid). You could probably build one pretty easily. Using a 30MB Apache process to serve one client for half an hour seems like a hell of a waste of RAM.

These are CGI scripts at the lowest level, nothing more and nothing less. While I could probably embed a small webserver directly into the Perl scripts and run that as a daemon, it would take away the portability that the scripts currently offer.

This should be my last question on the matter: does Squid report the proper IP address of the clients themselves? That's a critical requirement for the scripts.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
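The single-process select()-style event loop Matt describes (one process watching many descriptors, servicing whichever is ready, so no large per-client worker is pinned for half an hour) can be sketched as below. This is a minimal illustration, not a complete server: a real one would accept TCP clients and also handle writes.

```python
# Sketch: a Squid-like readiness loop over several connections using the
# selectors module. One process multiplexes all of them.
import selectors
import socket

def pump_ready(conns, rounds=10):
    """Read from whichever connection has data, until all are closed."""
    sel = selectors.DefaultSelector()
    for conn in conns:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)
    received = []
    while sel.get_map() and rounds:
        rounds -= 1
        for key, _ in sel.select(timeout=0.5):
            data = key.fileobj.recv(4096)
            if data:
                received.append(data)
            else:                      # peer closed: stop watching it
                sel.unregister(key.fileobj)
                key.fileobj.close()
    sel.close()
    return received
```

The same shape in 2004-era Perl would be IO::Select around non-blocking sockets; the principle (RAM cost per idle client is a table entry, not a process) is what matters.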
Re: [PERFORM] Restricting Postgres
Matt Clark wrote:
>> Pierre-Frédéric Caillaud wrote:
>>> check this marvelous piece of 5 minutes of work: http://boutiquenumerique.com/test/iframe_feed.html
> cela m'a fait le sourire :-) [that made me smile] (apologies for bad french)

JavaScript is not an option for the scripts. One of the mandates of the project is to support as many different client setups as possible, and we have encountered everything from WebTV to the latest Firefox release. It's a chat/roleplay community and not everyone will invest in new equipment.

Now, it would seem to me that there is a trade-off between a JS push system and a constant, ever-present process. With the traditional method as I use it, a client will incur the initial penalty of going through authentication and pulling the look and feel of the realms and sites, and will simply poll one table from that point on. On the other hand, you have one user making a call for new posts every X seconds. This means every X seconds the penalty for authentication and design would kick in, increasing the overall load. The current scripts can also be dynamically adapted to slow things down based on heavy load, or on quiet realms that bring in few posts. It's much harder to expect JavaScript solutions to work perfectly every time and not be modified by some proxy.

Unfortunately, we are getting way off track. I'm looking for a way to protect the PostgreSQL server, either from within PostgreSQL or through some sort of external script which polls the load average once in a while to make that determination. Now, is there an administrative command in PostgreSQL that will cause it to move into some sort of maintenance mode? For me that could be exceedingly useful, as it would still allow for an admin connection to be made to run a VACUUM FULL and such.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
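The external-watchdog idea above can be sketched as follows. PostgreSQL of this era has no built-in "maintenance mode", so the action names here are placeholders and assumptions: a real script might, say, swap in a restrictive pg_hba.conf and SIGHUP the postmaster, which would still allow an admin connection for VACUUM.

```python
# Sketch: poll the system load average and decide whether the database
# should be pushed into a restricted "maintenance" state. The actions
# are hypothetical; only the threshold logic is concrete.
import os

# Thresholds mirroring the Apache::LoadAvgLimit style: 1/5/15-minute.
LIMITS = (3.00, 2.00, 1.50)

def over_limit(loadavg, limits=LIMITS):
    """True if any load average exceeds its corresponding limit."""
    return any(avg > lim for avg, lim in zip(loadavg, limits))

def check_once(get_load=os.getloadavg):
    """Return the action a watchdog loop would take right now."""
    if over_limit(get_load()):
        return "enter-maintenance"   # e.g. restrict pg_hba.conf + SIGHUP
    return "normal"
```

Run from cron or a small daemon loop, this keeps the protection outside the database entirely, which matches the "not necessarily a PostgreSQL solution" requirement.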
Re: [PERFORM] Restricting Postgres
Simon Riggs wrote:
> On Tue, 2004-11-02 at 23:52, Martin Foster wrote:
>> Is there a way to restrict how much load a PostgreSQL server can take before dropping queries in order to safeguard the server? I was looking at the login.conf(5) man page, and while it allows me to limit by processor time, this seems not to fit my specific needs. Essentially, I am looking for a sort of functionality similar to what Sendmail and Apache have. Once the load of the system reaches a certain defined limit, the daemon drops tasks until such a time that it can resume normal operation.
> Sounds great... could you give more shape to the idea, so people can comment on it? What limit? Measured how? Normal operation is what? Drop what? How to tell?

Let's use the example in Apache. There is the Apache::LoadAvgLimit mod_perl module, which allows one to limit based on the system load averages. Here is an example of the configuration one would find:

  <Location /perl>
    PerlInitHandler Apache::LoadAvgLimit
    PerlSetVar LoadAvgLimit_1 3.00
    PerlSetVar LoadAvgLimit_5 2.00
    PerlSetVar LoadAvgLimit_15 1.50
    PerlSetVar LoadAvgRetryAfter 120
  </Location>

The end state is simple: once the load average moves above 3.00 for the 1-minute average, the web server will not process the CGI scripts or mod_perl applications under that directory. Instead it will return a 503 error, saving the system from being crushed by ever-increasing load averages. Only once the load average is below the defined limits will the server process requests as normal. This is not necessarily the nicest or cleanest way of doing things, but it does allow the Apache web server to prevent a collapse.

There are ways of restricting the size of files, the number of concurrent processes, and even the memory being used by a daemon. This can be done through ulimit or the login.conf file, if your system supports it. However, there is no way to restrict based on load averages, only processor time, which is ineffective for a perpetually running daemon like PostgreSQL.
>> While not necessarily common on my servers, I have witnessed some fairly high load averages which may have led to the machine dropping outright. Any help on this matter would be appreciated.
> You can limit the number of connections overall?

Limiting concurrent connections is not always the solution to the problem. Problems can occur when there is a major spike in activity that would be considered abnormal, due to outside conditions. For example, using Apache::DBI or pgpool, the DBMS may be required to spawn a great number of child processes in a short period of time. This in turn can cause a major spike in processor load, and if unchecked, with high-demand queries running, the load can literally increase until the server buckles.

I've seen this behavior before when restarting the web server during heavy loads. Apache goes from zero connections to a solid 120, causing PostgreSQL to spawn that many children in a short period of time just to keep up with the demand. PostgreSQL pays a penalty when spawning a new child and accepting a connection; this takes resources at every level and slows things down. Meanwhile, clients on the web server are hitting it at an accelerated rate because of the slowed response, leading to even more demand being placed on both machines. In most cases the processor will be taxed and the load average high enough to cause a noticeable delay even when using a console; however, it will generally recover... slowly, or in rare cases crash outright. In such a circumstance, having the database server refuse queries when the sanity of the system is in question might come in handy.

Of course, I am not blaming PostgreSQL; there are probably some instabilities in the AMD64 port of FreeBSD 5.2.1 for dual-processor systems that lead to an increased chance of failure instead of recovery. However, if there was a way to prevent the process from reaching those limits, it might avoid the problem altogether.
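The per-process caps mentioned above (file size, process count, memory) can be set with the same mechanism ulimit and login.conf use, the setrlimit call. A Unix-only sketch, with arbitrary example values; note that, as the text says, none of this caps load average, which is the real gap.

```python
# Sketch: adjust this process's soft open-file limit via setrlimit,
# the syscall behind ulimit/login.conf resource caps. Unix-only.
import resource

def set_descriptor_limit(soft_limit):
    """Set the soft open-file limit (never above the hard cap)."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard == resource.RLIM_INFINITY:
        new_soft = soft_limit
    else:
        new_soft = min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft
```

A daemon started under such limits is contained in what it can consume, but a load-average trigger still needs an external watchdog.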
Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
Re: [PERFORM] Restricting Postgres
John A Meinel wrote:
> Martin Foster wrote:
>> Simon Riggs wrote:
>>> On Tue, 2004-11-02 at 23:52, Martin Foster wrote:
>>> [...]
>> I've seen this behavior before when restarting the web server during heavy loads. Apache goes from zero connections to a solid 120, causing PostgreSQL to spawn that many children in a short period of time just to keep up with the demand.
> But wouldn't limiting the number of concurrent connections do this at the source? If you tell it that you can have at most 20 connections, you would never have postgres spawn 120 children. I'm not sure what Apache does if it can't get a DB connection, but it seems exactly like what you want. Now, if you expected to have 50 clients that all like to just sit on open connections, you could leave the number of concurrent connections high. But if your only connection is from the webserver, where all of them are designed to be short connections, then leave the max low. The other possibility is having the webserver use connection pooling, so it uses a few long-lived connections. But even then, you could limit it to something like 10-20, not 120.
> John =:->

I have a dual-processor system that can support over 150 concurrent connections handling normal traffic and load. Now suppose I set up Apache to spawn all of its children instantly: as this happens, the PostgreSQL server will also receive 150 connection attempts. This will spawn 150 children in a short period of time, and while that takes place clients can connect and start requesting information, never allowing the machine to settle down to normal traffic. That spike, once initiated, can cripple the machine, or even the webserver if a deadlocked transaction is introduced. Because on the webserver side, a slowdown in the database means it will just get that many more connection attempts pooled from the clients. As they keep clicking and hitting reload over and over to get a page to load, that server starts to buckle, hitting unbelievably high load averages.
When the above happened once, I lost the ability to type on a console because of a 60+ load average (OpenBSD) on a single-processor system. That is the reason why Apache now returns a 503 Service Unavailable when loads get too high.

It's that spike I worry about, and it can happen for whatever reason. It could just as easily be triggered by a massive concurrent request for processing of an expensive query, done in DDoS fashion. This may not affect the webserver at all, at least immediately, but the same effect can come into play. Limiting connections helps, but it's not a silver bullet, and it limits your ability to support more connections because of that initial spike. The penalty for forking a new child is hardly unexpected; even Apache will show the same effect when restarted at a high-traffic time.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
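The reload storm described above is a feedback loop: slow responses breed more retries, which slow responses further. A common damper (a sketch of a general technique, not something from the original setup) is exponential backoff with jitter on reconnect attempts, so retries spread out instead of arriving in lockstep.

```python
# Sketch: jittered exponential backoff for retry attempts.
# base * 2^n, capped, with up to 50% random jitter added.
import random

def backoff_delays(attempts, base=0.5, cap=30.0, rng=random.random):
    """Delay to wait before each retry attempt, in seconds."""
    delays = []
    for n in range(attempts):
        d = min(cap, base * (2 ** n))
        delays.append(d * (1.0 + 0.5 * rng()))
    return delays
```

Applied on the webserver side (or in client-facing error pages with Retry-After), this keeps a brief database stall from snowballing into 150 simultaneous reconnects.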
[PERFORM] Restricting Postgres
Is there a way to restrict how much load a PostgreSQL server can take before dropping queries in order to safeguard the server? I was looking at the login.conf(5) man page, and while it allows me to limit by processor time, this seems not to fit my specific needs.

Essentially, I am looking for a sort of functionality similar to what Sendmail and Apache have. Once the load of the system reaches a certain defined limit, the daemon drops tasks until such a time that it can resume normal operation. While not necessarily common on my servers, I have witnessed some fairly high load averages which may have led to the machine dropping outright.

Any help on this matter would be appreciated.

--
Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
[PERFORM] Cleaning up indexes
My database was converted from MySQL a while back and has maintained all of the indexes which were previously used. At the time, however, there were limitations on the way PostgreSQL handled indexes compared to MySQL, meaning that MySQL would make use of a multi-column index even when the query's columns did not exactly match the index. When the conversion was made, more indexes were created overall to correct this and proceed with the conversion.

Now the time has come to clean up the unused indexes. Essentially, I want to know if there is a way to determine which indexes are being used and which are not. This will allow me to drop the unneeded ones and reduce database load as a result. And have things changed so as to allow for mismatched multi-column indexes in version 7.4.x, or even the upcoming 8.0.x?

Martin Foster
[EMAIL PROTECTED]
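One way to answer the "which indexes are used" question (assuming the statistics collector is enabled so per-index counters are populated) is the pg_stat_user_indexes view: an index whose idx_scan counter stays at zero over a representative stretch of normal traffic is a candidate for dropping. Held as a string below so the sketch is self-contained; it would be run via psql or a DBI handle.

```python
# Sketch: query to list candidate unused indexes from the statistics
# collector. Counters only accumulate while stats collection is on, so
# judge them over a representative period of traffic.

UNUSED_INDEX_SQL = """
SELECT schemaname, relname, indexrelname, idx_scan
  FROM pg_stat_user_indexes
 WHERE idx_scan = 0
 ORDER BY relname, indexrelname;
"""

def unused_index_query():
    return UNUSED_INDEX_SQL.strip()
```

Indexes backing primary keys or unique constraints should be excluded from any drop list regardless of their scan counts, since they enforce correctness rather than speed.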
[PERFORM] Tanking a server with shared memory
I have been experimenting with the IPC::Shareable module under the native implementation of Perl 5 for OpenBSD 3.5. While it is not loaded by default, it is a pure Perl implementation. I have tested this module on two machines: one which used to run PostgreSQL and has a higher-than-normal amount of SysV semaphores, and another which has a normal amount. When testing on the former database server, things load up fine, clients can connect, and all information is as it should be. When I test under the normal setup, the machine tanks. No core dumps, no errors produced, just a near-instant lock-up of the server itself, and that with a non-privileged user.

While I know this is a Perl issue, I figured I might be able to gain some insight on how a server could drop without at least generating a panic. Any ideas?

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
Re: [PERFORM] Faster with a sub-query than without
Tom Lane wrote:
> Martin Foster [EMAIL PROTECTED] writes:
>> Under EXPLAIN ANALYZE, the one not using sub-queries proves itself to be less efficient and to have a far higher cost than those paying the penalty of a sub-query. Since this seems to be counter to what I have been told in the past, I thought I would bring this forward and get some enlightenment.
> The ones with the subqueries are not having to form the full join of W and G; they just pick a few rows out of G and look up the matching W rows. The subquery penalty is nonexistent in this case because the subqueries are not dependent on any variables from the outer query, and so they need be evaluated only once, rather than once per outer-query row, which is what I suppose you were expecting. This is reflected in the EXPLAIN output: notice they are shown as InitPlans not SubPlans. The outputs of the InitPlans are essentially treated as constants (shown as $0 in the EXPLAIN output) and the outer plan is approximately what it would be if you'd written WHERE g.field = 'constant' instead of WHERE g.field = (select ...)
> regards, tom lane

That would explain it overall. Still, it does seem unusual that one can put in additional code, which most literature warns you about, and actually gain a speed boost. Thanks!

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
[PERFORM] Faster with a sub-query than without
I thought this could generate some interesting discussion. Essentially, there are three queries below: two use sub-queries to change the way the information is randomized (first by author, then by work), and the original simply randomizes out of all works available. Under EXPLAIN ANALYZE, the one not using sub-queries proves itself to be less efficient and to have a far higher cost than those paying the penalty of a sub-query. Since this seems to be counter to what I have been told in the past, I thought I would bring this forward and get some enlightenment.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]

---

SELECT g.GalleryID, w.WorkID, w.WorkName, w.WorkImageThumbnail,
       g.GalleryRating, g.GalleryPenName
  FROM ethereal.Work w, ethereal.Gallery g
 WHERE w.GalleryID = g.GalleryID
   AND g.GalleryPrivacy = 'no'
   AND w.WorkImageThumbnail IS NOT NULL
   AND g.PuppeteerLogin =
       (SELECT PuppeteerLogin
          FROM ethereal.Gallery
         WHERE GalleryType = 'image'
         GROUP BY PuppeteerLogin
         ORDER BY RANDOM()
         LIMIT 1)
 ORDER BY RANDOM()
 LIMIT 1

Limit  (cost=60.70..60.70 rows=1 width=100) (actual time=1.013..1.013 rows=0 loops=1)
  InitPlan
    ->  Limit  (cost=6.36..6.37 rows=1 width=11) (actual time=0.711..0.713 rows=1 loops=1)
          ->  Sort  (cost=6.36..6.45 rows=33 width=11) (actual time=0.708..0.708 rows=1 loops=1)
                Sort Key: random()
                ->  HashAggregate  (cost=5.45..5.53 rows=33 width=11) (actual time=0.420..0.553 rows=46 loops=1)
                      ->  Seq Scan on gallery  (cost=0.00..5.30 rows=60 width=11) (actual time=0.007..0.227 rows=59 loops=1)
                            Filter: ((gallerytype)::text = 'image'::text)
  ->  Sort  (cost=54.33..54.37 rows=16 width=100) (actual time=1.009..1.009 rows=0 loops=1)
        Sort Key: random()
        ->  Nested Loop  (cost=0.00..54.01 rows=16 width=100) (actual time=0.981..0.981 rows=0 loops=1)
              ->  Seq Scan on gallery g  (cost=0.00..5.56 rows=2 width=24) (actual time=0.855..0.888 rows=1 loops=1)
                    Filter: (((galleryprivacy)::text = 'no'::text) AND ((puppeteerlogin)::text = ($0)::text))
              ->  Index Scan using pkwork on work w  (cost=0.00..24.10 rows=8 width=80) (actual time=0.080..0.080 rows=0 loops=1)
                    Index Cond: (w.galleryid = "outer".galleryid)
                    Filter: (workimagethumbnail IS NOT NULL)
Total runtime: 1.211 ms

---

SELECT g.GalleryID, w.WorkID, w.WorkName, w.WorkImageThumbnail,
       g.GalleryRating, g.GalleryPenName
  FROM ethereal.Work w, ethereal.Gallery g
 WHERE w.GalleryID = g.GalleryID
   AND g.GalleryPrivacy = 'no'
   AND w.WorkImageThumbnail IS NOT NULL
   AND g.GalleryPenName =
       (SELECT GalleryPenName
          FROM ethereal.Gallery
         WHERE GalleryType = 'image'
         GROUP BY GalleryPenName
         ORDER BY RANDOM()
         LIMIT 1)
 ORDER BY RANDOM()
 LIMIT 1

Limit  (cost=59.92..59.92 rows=1 width=100) (actual time=0.904..0.906 rows=1 loops=1)
  InitPlan
    ->  Limit  (cost=6.69..6.69 rows=1 width=14) (actual time=0.731..0.733 rows=1 loops=1)
          ->  Sort  (cost=6.69..6.79 rows=42 width=14) (actual time=0.729..0.729 rows=1 loops=1)
                Sort Key: random()
                ->  HashAggregate  (cost=5.45..5.56 rows=42 width=14) (actual time=0.431..0.568 rows=48 loops=1)
                      ->  Seq Scan on gallery  (cost=0.00..5.30 rows=60 width=14) (actual time=0.011..0.233 rows=59 loops=1)
                            Filter: ((gallerytype)::text = 'image'::text)
  ->  Sort  (cost=53.23..53.27 rows=16 width=100) (actual time=0.899..0.899 rows=1 loops=1)
        Sort Key: random()
        ->  Nested Loop  (cost=0.00..52.91 rows=16 width=100) (actual time=0.808..0.862 rows=6 loops=1)
              ->  Index Scan using idxgallery_pen on gallery g  (cost=0.00..4.45 rows=2 width=24) (actual time=0.767..0.769 rows=1 loops=1)
                    Index Cond: ((gallerypenname)::text = ($0)::text)
                    Filter: ((galleryprivacy)::text = 'no'::text)
              ->  Index Scan using pkwork on work w  (cost=0.00..24.10 rows=8 width=80) (actual time=0.020..0.042 rows=6 loops=1)
                    Index Cond: (w.galleryid = "outer".galleryid)
                    Filter: (workimagethumbnail IS NOT NULL)
Total runtime: 1.117 ms

---

SELECT g.GalleryID, w.WorkID, w.WorkName, w.WorkImageThumbnail,
       g.GalleryRating, g.GalleryPenName
  FROM ethereal.Work w, ethereal.Gallery g
 WHERE w.GalleryID = g.GalleryID
   AND g.GalleryType = 'image'
   AND g.GalleryPrivacy = 'no'
   AND w.WorkImageThumbnail IS NOT NULL
 ORDER BY RANDOM()
 LIMIT 1

Limit  (cost=111.73..111.73 rows=1 width=100) (actual time=13.021..13.023 rows=1 loops=1)
  ->  Sort  (cost=111.73..113.70 rows=786 width=100
Re: [PERFORM] Performance Bottleneck
Jeff wrote:
> On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
>> I am currently making use of Apache::DBI, which overrides the DBI::disconnect call and keeps a pool of active connections for use when need be. Since it offloads the pooling to the webserver, it seems more advantageous than pgpool, which, while being able to run on an external system, adds another layer of complexity.
> Apache::DBI is not the same sort of a pool as pgpool. DB connections are not shared among all your apache children (a common misconception). So if you have 300 apache kids you can have 300 db connections. With pgpool, connections are shared among all of them, so even though you have 300 kids you only have, say, 32 db connections.

It seems that you are right; I never noticed that in the documentation before. I always assumed it had something to do with the long-lasting/persistent scripts that would remain in transactions for extended periods of time.

Here is an odd question: while the server runs 7.4.x, the client connects with 7.3.x. Would this in itself make a difference in performance, as the protocols are different? At least based on pgpool's documentation.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
Re: [PERFORM] Performance Bottleneck
Arjen van der Meijden wrote:
> On 8-8-2004 16:29, Matt Clark wrote:
>> There are two well-worn and very mature techniques for dealing with the issue of web apps using one DB connection per apache process, both of which work extremely well and attack the issue at its source. 1) Use a front-end caching proxy like Squid as an accelerator. Static content will be served by the accelerator 99% of the time. Additionally, large pages can be served immediately to the accelerator by Apache, which can then go on to serve another request without waiting for the end user's dial-up connection to pull the data down. Massive speedup, fewer apache processes needed.
> Another version of this 1) is to run with a content accelerator; our favourite is to run Tux in front of Apache. It takes over the connection-handling stuff, has a very low memory profile (compared to Apache) and very little overhead. What it does is serve up all simple content (although you can have cgi/php/perl and other languages processed by it, entirely removing the need for Apache in some cases) and forward/proxy everything it doesn't understand to an Apache or other webserver running on the same machine (on another port).
> I think there are a few advantages over Squid: since it is partially done in kernel-space it can be slightly faster in serving up content, apart from its simplicity, which will probably matter even more. You'll have no caching issues for pages that should not be cached, or static files that change periodically (like every few seconds). Afaik, Tux can handle more than 10 times as many ab-generated requests per second as a default-compiled Apache on the same machine. And besides the speed-up, you can do any request you were able to do before, since Tux will simply forward anything it doesn't understand to Apache.
> Anyway, apart from all that: reducing the number of apache connections is nice, but not really the same as reducing the number of pooled connections using a db-pool... You may even be able to run with 1000 http-connections, 40 apache processes and 10 db-connections. In the case of the non-pooled setup, you'd still have 40 db-connections. In a simple test I did, I did feel pgpool had quite some overhead though. So it should be well tested, to find out where the turnover point is at which it becomes a gain instead of a loss...
> Best regards, Arjen van der Meijden

Other than images, there are very few static pages being loaded up by the user. Since they make up a very small portion of the traffic, it tends to be an optimization we can forgo for now.

I attempted to make use of pgpool. At the default 32 pre-forked connections, the webserver almost immediately tapped out the pgpool base, and content stopped being served because no new processes were being forked to make up for it. So I raised it to a higher value (256), and it immediately segfaulted and dropped a core. So I am not sure exactly how to proceed, since I rather need the thing to fork additional servers as load hits, and not the other way around. Unless I had it configured oddly, it seems to work differently than an Apache server would in handling content.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
Re: [PERFORM] Performance Bottleneck
Jan Wieck wrote:
> On 8/3/2004 2:05 PM, Martin Foster wrote:
>> I run a Perl/CGI driven website that makes extensive use of PostgreSQL (7.4.3) for everything from user information to formatting and display of specific sections of the site. The server itself is a dual-processor AMD Opteron 1.4GHz w/ 2GB RAM and 2 x 120GB hard drives mirrored for redundancy, running under FreeBSD 5.2.1 (AMD64). Recently, loads on the site have increased during peak hours to the point of showing considerable loss in performance. This can be observed when connections move from the 120 concurrent connections to PostgreSQL to roughly 175 or more. Essentially, the machine seems to struggle to keep up with continual requests and slows down respectively as resources are tied down.
> Have you taken a look at pgpool? I know, it sounds silly to *reduce* the number of DB connections through a connection pool, but it can help.
> Jan

I am currently making use of Apache::DBI, which overrides the DBI::disconnect call and keeps a pool of active connections for use when need be. Since it offloads the pooling to the webserver, it seems more advantageous than pgpool, which, while being able to run on an external system, adds another layer of complexity.

Has anyone had any experience with both Apache::DBI and pgpool? For my needs they seem to do essentially the same thing, simply that one is invisible to the code while the other requires adding the complexity of a proxy.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
Re: [PERFORM] Performance Bottleneck
Christopher Kings-Lynne wrote:
>>> This value of wal_buffers is simply ridiculous.
>> Instead, I think a default of wal_buffers = 8 (64KB) is ridiculous.
> There is no point making WAL buffers higher than 8. I have done much testing of this and it makes not the slightest difference to performance that I could measure.
> Chris

No point? I had it at 64, if memory serves, and the logs were warning me that raising this value would be desirable because of the excessive I/O brought on by the logs being filled far too often. It would seem to me that 8 is a bit low in at least a few circumstances.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
Re: [PERFORM] Performance Bottleneck
Gaetano Mendola wrote:
> Let's start from your postgres configuration:
>
>   shared_buffers = 8192          (this is really too small for your configuration)
>   sort_mem = 2048
>   wal_buffers = 128              (this is really too small for your configuration)
>   effective_cache_size = 16000
>
> Change these values to:
>
>   shared_buffers = 50000
>   sort_mem = 16084
>   wal_buffers = 1500
>   effective_cache_size = 32000
>
> To bump up the shm usage you have to configure your OS in order to be allowed to use that amount of SHM. These are the numbers that feel right for your HW; the second step now is to analyze your queries.

These changes have yielded some visible improvements, with load averages rarely going over anything noticeable. However, I do have a question on the matter: why do these values seem to be far higher than what a frequently pointed-to document would indicate as necessary?

http://www.varlena.com/GeneralBits/Tidbits/perf.html

I am simply curious, as this clearly shows that my understanding of PostgreSQL is lacking when it comes to tweaking for the hardware.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
Re: [PERFORM] Performance Bottleneck
Mike Benoit wrote: On Wed, 2004-08-04 at 17:25 +0200, Gaetano Mendola wrote: The queries themselves are simple, normally drawing information from one table with few conditions, or in the most complex cases using joins on two tables or subqueries. These behave very well and always have; the problem is that these queries take place in rather large numbers due to the dumb nature of the scripts themselves. Show us the explain analyze on those queries and how many rows the tables contain; the table schema would also be useful. If the queries themselves are optimized as much as they can be, and, as you say, it's just the sheer number of similar queries hitting the database, you could try using prepared queries for the ones most often executed to eliminate some of the overhead. I've had relatively good success with this in the past, and it doesn't take very much code modification. One of the biggest problems is most probably related to the indexes. Since the performance penalty of logging the information needed to see which queries are used and which are not is a slight problem, I cannot really make use of it for now. However, I am curious how one would go about preparing a query. Is this similar to the DBI prepare statement with placeholders, simply changing the values passed on execute? Or is this something database-level, such as a view et cetera?

SELECT Post.PostIDNumber, Post.$format, Post.PuppeteerLogin,
       Post.PuppetName, Post.PostCmd, Post.PostClass
  FROM Post
 WHERE Post.PostIDNumber > ?::INT
   AND (Post.PostTo='all' OR Post.PostTo=?)
   AND (NOT EXISTS (SELECT PuppetIgnore.PuppetLogin
                      FROM PuppetIgnore
                     WHERE PuppetIgnore.PuppetIgnore='global'
                       AND PuppetIgnore.PuppeteerLogin=?
                       AND PuppetIgnore.PuppetLogin=Post.PuppeteerLogin)
        OR Post.PuppeteerLogin IS NULL)
   AND (NOT EXISTS (SELECT PuppetIgnore.PuppetName
                      FROM PuppetIgnore
                     WHERE PuppetIgnore.PuppetIgnore='single'
                       AND PuppetIgnore.PuppeteerLogin=?
                       AND PuppetIgnore.PuppetName=Post.PuppetName)
        OR Post.PuppetName IS NULL)
 ORDER BY Post.PostIDNumber LIMIT 100

The range is determined from the previous run or through the query listed below. It was determined that using INT was far faster than limiting by timestamp.

SELECT MIN(PostIDNumber) FROM Post
 WHERE RealmName=? AND PostClass IN ('general','play') AND PostTo='all'

The above simply provides a starting point, nothing more. Once posts are pulled, the script will pass in the last pulled number so as to start from a fresh point. Under MySQL, time was stored as an INT, which may have helped it handle timestamps more efficiently. It also made use of three or more queries, where two were done to generate an IN statement for the query actually running at the time. Martin Foster Creator/Designer Ethereal Realms [EMAIL PROTECTED]
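To answer the question directly: yes, this is essentially DBI's prepare with '?' placeholders; the point is to prepare once and execute many times so the parse/plan overhead is not paid on every call (Apache::DBI keeps the handle alive per child, and prepare_cached can do the caching for you). A minimal sketch follows; the DSN, credentials, and the render_post/@id_checkpoints names are hypothetical:

```perl
use strict;
use DBI;

# Hypothetical connection details -- adjust for your site.
my $dbh = DBI->connect('dbi:Pg:dbname=ethereal', 'user', 'pass',
                       { RaiseError => 1, AutoCommit => 1 });

# Prepare once per Apache child (prepare_cached would reuse the handle
# transparently across calls within the same child)...
my $sth = $dbh->prepare_cached(q{
    SELECT PostIDNumber, PuppeteerLogin, PuppetName, PostCmd, PostClass
      FROM Post
     WHERE PostIDNumber > ?::INT AND (PostTo = 'all' OR PostTo = ?)
     ORDER BY PostIDNumber LIMIT 100
});

# ...then execute repeatedly; only the placeholder values change.
for my $min_id (@id_checkpoints) {
    $sth->execute($min_id, $login);
    while (my $row = $sth->fetchrow_hashref) {
        render_post($row);    # hypothetical formatting routine
    }
}
```

Note that whether this becomes a true server-side prepared statement depends on the DBD::Pg version; older drivers emulate placeholders by interpolating values client-side, which still saves code but not planning time.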
Re: [PERFORM] Performance Bottleneck
Scott Marlowe wrote: On Fri, 2004-08-06 at 17:24, Gaetano Mendola wrote: Martin Foster wrote: [configuration exchange quoted in full above; snipped] Unfortunately there is no tuning wizard for postgres, so each one of us has his own school. The numbers I gave you are oversized to be sure to achieve improvements. Now you can start to decrease these values (starting from wal_buffers) in order to find the right compromise for your HW. FYI, my school of tuning is to change one thing at a time, by some reasonable percentage (shared_buffers from 1000 to 2000), and measure the change under simulated load. Make another change, test it, chart the shape of the change line. It should look something like this for most folks:

shared_buffers | q/s (more is better)
           100 |  20
           200 |  45
           400 |  80
          1000 | 100   ... levels out here ...
          8000 | 110
         10000 | 108
         20000 |  40
         30000 |  20

Note it going back down as we exceed our memory and start swapping shared_buffers. Where that happens on your machine is determined by many things, like your machine's memory, memory bandwidth, type of load, etc., but it will happen on most machines, and when it does, it often happens at the worst times, under heavy parallel load. Unless testing shows it's faster, 10000 or 25% of memory (whichever is less) is usually a pretty good setting for shared_buffers. Large data sets may require more than 10000, but going over 25% on machines with large memory is usually a mistake, especially on servers that do anything other than just PostgreSQL. You're absolutely right about one thing: there's no automatic wizard for tuning this stuff. Which rather points out the crux of the problem. This is a live system, meaning changes made need to be as informed as possible, and changing values for the sake of testing can lead to potential problems in service. Martin Foster Creator/Designer Ethereal Realms [EMAIL PROTECTED]
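The one-variable-at-a-time sweep Scott describes can be scripted against a test instance with contrib's pgbench. A rough sketch, with illustrative values and paths (a custom workload file resembling the site's real queries would give far more meaningful numbers than pgbench's default transaction mix):

```shell
# Sketch: sweep shared_buffers, restart, and record transactions/sec.
# Assumes $PGDATA points at a *test* cluster, not the live one.
for sb in 1000 2000 4000 8000 16000; do
    pg_ctl stop -D "$PGDATA"
    # Crude in-place edit; assumes a single shared_buffers line.
    sed -i.bak "s/^shared_buffers *=.*/shared_buffers = $sb/" \
        "$PGDATA/postgresql.conf"
    pg_ctl start -D "$PGDATA" -w
    echo -n "$sb  "
    pgbench -c 50 -t 200 testdb | grep tps
done
```

Charting the resulting tps column against shared_buffers should produce the rise-plateau-collapse curve shown above, with the collapse point depending on the machine's memory.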
Re: [PERFORM] Performance Bottleneck
Scott Marlowe wrote: On Fri, 2004-08-06 at 22:02, Martin Foster wrote: [the tuning exchange quoted above snipped] But if you make those changes slowly, as I was showing, you should see the small deleterious effects long before they become catastrophic. To just jump shared_buffers to 50000 is not a good idea, especially if the sweet spot is likely lower than that. While I agree, there are also issues with the fact that getting consistent results from this site is very difficult, since it is based on the whims of users visiting one of three sites hosted on the same hardware. Now, that being said, having wal_buffers at 8 certainly would not be a good idea, since the database logs themselves were warning of excessive writes in that region. I am not hoping for a perfect intermix ratio that will solve all my problems, but a good idea of a baseline that will let me carry a fair load would certainly be a good option. Right now, the load being handled is not much more than what a single-processor system did with half the memory. Certainly this architecture should be able to take more of a beating than this? Martin Foster Creator/Designer Ethereal Realms [EMAIL PROTECTED]
Re: [PERFORM] Performance Bottleneck
Gaetano Mendola wrote: Martin Foster wrote: Gaetano Mendola wrote: Martin Foster wrote: [original problem description snipped; the initial post appears in full below] Tell us about your typical queries, show us your configuration file. Is the access read-only, or do you have concurrent writers and readers? During peak hours are your processors tied up at 100%? What do vmstat and iostat say? Maybe you are not using indexes somewhere, or maybe you are but the planner is not using them... In two words: we need other information in order to help you. Regards Gaetano Mendola I included all the files in attachments, which will hopefully cut down on any replied-to emails. As for things like connection pooling, the web server makes use of Apache::DBI to pool the connections for the Perl scripts being driven on that server. For the sake of being thorough, a quick 'apachectl status' was thrown in while the database was under a good load. [configuration advice snipped; quoted in full earlier in the thread] The queries themselves are simple, normally drawing information from one table with few conditions, or in the most complex cases using joins on two tables or subqueries. These behave very well and always have; the problem is that these queries take place in rather large numbers due to the dumb nature of the scripts themselves. Show us the explain analyze on those queries and how many rows the tables contain; the table schema would also be useful. regards Gaetano Mendola I will look into moving up those values and seeing how they interact with the system once I get back from work. Since it was requested, I have a visual representation of an older schema, one that was used under MySQL.
Note that all of the timestamps are now properly set to LOCALTIME on PostgreSQL. http://prdownloads.sourceforge.net/ethereal-realms/ethereal-3_0_0.png?download The number of rows for tables of note is as follows: Puppeteer 1606, Puppet 33176, Realm 83, Post 36156, Audit 61961. The post table is continually cleared of old information, since the information is very time-critical and archiving would only hinder performance. As a result, this will vary wildly based on time of day, since users (Puppeteers) tend to post more during peak hours. NOTE: The scripts make use of different schemas with the same structure in order to virtualize the script and support more than one site on the same hardware. On a side note, this would be a normal post-authentication session once in a realm, for getting new posts: * Script is executed and schema is determined through a stored procedure; * Formatting information is fetched from Tag and RealmDesign as needed
[PERFORM] Performance Bottleneck
I run a Perl/CGI-driven website that makes extensive use of PostgreSQL (7.4.3) for everything from user information to formatting and display of specific sections of the site. The server itself is a dual-processor AMD Opteron 1.4GHz w/ 2GB RAM and 2 x 120GB hard drives mirrored for redundancy, running under FreeBSD 5.2.1 (AMD64). Recently, loads on the site have increased during peak hours to the point of showing considerable loss in performance. This can be observed when connections move from 120 concurrent connections to PostgreSQL to roughly 175 or more. Essentially, the machine seems to struggle to keep up with continual requests and slows down respectively as resources are tied down. Code changes have been made to the scripts to essentially back off in high-load working environments, which has worked to an extent. However, as loads continue to increase, the database itself is not taking well to the increased traffic. I have taken a look at 'Tuning PostgreSQL for Performance' (http://www.varlena.com/GeneralBits/Tidbits/perf.html), using it as best I could to choose my settings. However, even with statistics disabled and every setting tweaked, things still continue to deteriorate. Is there anything anyone can recommend in order to give the system a necessary speed boost? It would seem to me that a modest dataset of roughly a gig combined with that type of hardware should be able to handle substantially more load than it is. Can anyone provide me with clues as to where to pursue this? Would disabling 'fsync' provide more performance, if I accept that information may be lost in case of a crash? If anyone needs access to logs, settings, et cetera, please ask; I simply wish to test the waters first on what is needed. Thanks! Martin Foster [EMAIL PROTECTED]
Re: [PERFORM] Performance Bottleneck
Gaetano Mendola wrote: Martin Foster wrote: [original problem description snipped; quoted in full above] Tell us about your typical queries, show us your configuration file. Is the access read-only, or do you have concurrent writers and readers? During peak hours are your processors tied up at 100%? What do vmstat and iostat say? Maybe you are not using indexes somewhere, or maybe you are but the planner is not using them... In two words: we need other information in order to help you. Regards Gaetano Mendola I included all the files in attachments, which will hopefully cut down on any replied-to emails. As for things like connection pooling, the web server makes use of Apache::DBI to pool the connections for the Perl scripts being driven on that server. For the sake of being thorough, a quick 'apachectl status' was thrown in while the database was under a good load. Since it would rather slow things down to wait for the servers to really get bogged down with load averages of 20.00 and more, I opted to choose a period of time where we are a bit busier than normal. You will be able to see how the system behaves under a light load and subsequently on reaching 125 or so concurrent connections. The queries themselves are simple, normally drawing information from one table with few conditions, or in the most complex cases using joins on two tables or subqueries. These behave very well and always have; the problem is that these queries take place in rather large numbers due to the dumb nature of the scripts themselves. Over a year ago, when I was still using MySQL for the project, the statistics generated would report well over 65 queries per second under loads ranging from 130 to 160 at peak, averaged over the weeks of operation. Looking at the Apache status, one can see that it averages only roughly 2.5 requests per second, giving you a slight indication as to what is taking place.

A quick run of 'systat -ifstat' shows the following traffic:

Interface   Traffic          Peak            Total
lo0   in     0.000 KB/s      0.000 KB/s     37.690 GB
      out    0.000 KB/s      0.000 KB/s     37.690 GB
em0   in    34.638 KB/s     41.986 KB/s     28.998 GB
      out   70.777 KB/s     70.777 KB/s     39.553 GB

em0 is a full-duplex 100Mbps connection to an internal switch that supports the servers directly. Load on the loopback was cut down considerably once I stopped using pg_autovacuum, since its performance benefits under low load were buried under the hindrance it caused when traffic was high. I am sure that there are some places that could benefit from some optimization, especially in the case of indexes; however, as a whole the problem seems related more to the massive onslaught of queries than to anything else. Also note that some of these scripts run for longer durations even if they are web-based. Some run as long as 30 minutes, making queries to the database with periods of wait from five seconds to twenty-five seconds. Under high duress
Re: [NOVICE] [PERFORM] Extreme high load averages
Shridhar Daithankar wrote: On 10 Jul 2003 at 0:43, Martin Foster wrote: As for creating a new table, that in itself is a nice idea. But it would cause issues for people currently in the realm. Their posts would essentially disappear from the site and cause more confusion than it's worth. No they won't. Say you have a base table, and your current post table is a child of that. You can query the base table and get rows from the child table. That way all the data will always be there. While inserting posts, you insert into the child table; while querying, you query the base table. That way things will be optimal. Inheritance would work, but the database would essentially just grow and grow and grow, right? Right. But there are two advantages. 1. It will always contain valid posts. No dead tuples. 2. You can work in chunks of data. Each child table can be dealt with separately without affecting other child tables, whereas in the case of a single large table, the entire site is affected. Deleting 100K posts from a 101K-row table is vastly different from deleting 10K posts from a 2M-row table. The latter would unnecessarily starve the table with dead tuples and IO, whereas in the former case you can do a create table as select, and drop the original. HTH Bye Shridhar -- [In 'Doctor' mode], I spent a good ten minutes telling Emacs what I thought of it. (The response was, 'Perhaps you could try to be less abusive.') (By Matt Welsh) When I ran EXPLAIN on the views and queries making use of the inherited tables, I noticed that everything worked based on sequential scans and avoided all indexes, while making use of ONLY kicked in full index use. This is even after having created a child table with the same indexes as the parent. Is this a known issue, or just some sort of oddity in my setup? Tables still cannot be removed easily, but I found a way to work around it on a day-to-day basis. Essentially, I just clean out the tables containing old rows and delete them later. However, based on the above, I doubt performance would get any better. Thanks for the advice, however! Martin Foster Creator/Designer Ethereal Realms [EMAIL PROTECTED]
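Shridhar's scheme can be sketched in SQL. The table and column names here are illustrative, not the site's actual schema, and the caveat Martin ran into applies: in 7.x-era PostgreSQL, indexes are not inherited and the planner is often reluctant to use child-table indexes when scanning through the parent, so each child needs its own indexes:

```sql
-- Parent table holds no rows itself; one child per day (names illustrative).
CREATE TABLE post_base (
    postidnumber   INT,
    puppeteerlogin TEXT,
    posttimestamp  TIMESTAMP
);

CREATE TABLE post_20040807 () INHERITS (post_base);
-- Indexes are not inherited: create them on each child explicitly.
CREATE INDEX post_20040807_id_idx ON post_20040807 (postidnumber);

-- Reads go through the parent and see rows from all children:
SELECT * FROM post_base WHERE postidnumber > 12345;

-- Retiring a day of perishable posts becomes a cheap DROP
-- instead of a mass DELETE followed by VACUUM:
DROP TABLE post_20040807;
```

The payoff is exactly the second advantage above: old rows leave no dead tuples behind, because the whole child table is dropped at once.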
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
Scott Marlowe wrote: It would be nice to have a program that could run on any OS PostgreSQL runs on, that could report on the current limits of the kernel and make recommendations for changes the admin might want to make. One could probably make a good stab at effective_cache_size during install; anything reasonably close would probably help. Report what % of said resources could be consumed by postgresql under various circumstances... One of the issues that automating the process would encounter is kernel limits that are too low for PostgreSQL to handle. The BSDs come to mind, where values need to be manually increased in the kernel before you can reach a reasonable maximum connection count. Another example is how OpenBSD will outright crash when trying to test the database at install time. It seems that most of the tests fail because the maximum number of processes allowed is too low for the tests to succeed, while FreeBSD works just fine on those same tests. If PostgreSQL automates the configuration, that would be a plus. But it should also detect the platform and inform the person which changes should be made to the kernel, sysctl or whatever, in order for that configuration to run. Perl may be useful here for a few reasons: it's portable enough to run on multiple Unix variants, and the tools would be fairly standard, so the code would require fewer considerations for more exotic implementations. Martin Foster Creator/Designer Ethereal Realms [EMAIL PROTECTED]
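The sort of portable probe Martin describes might start out like the Perl sketch below. The sysctl names are the FreeBSD ones; other platforms would each need their own branch, which is exactly the per-platform work such a tool would have to encapsulate:

```perl
use strict;

# Sketch: report kernel SHM limits relevant to sizing shared_buffers.
# BSDs expose these via sysctl(8); Linux via /proc/sys/kernel.
my %limits;
if ($^O =~ /bsd/i) {
    for my $key (qw(kern.ipc.shmmax kern.ipc.shmall)) {
        chomp(my $val = `sysctl -n $key 2>/dev/null`);
        $limits{$key} = $val if length $val;
    }
} elsif ($^O eq 'linux') {
    for my $f (qw(shmmax shmall)) {
        open my $fh, '<', "/proc/sys/kernel/$f" or next;
        chomp($limits{$f} = <$fh>);
    }
}
printf "%-16s = %s\n", $_, $limits{$_} for sort keys %limits;
```

A real tool would go on to compare these against what the intended postgresql.conf requires and print the sysctl/loader.conf lines needed to close the gap.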
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
Sean Chittenden wrote: I looked through src/doc/runtime.sgml for a good place to stick this and couldn't find a place that seemed appropriate, but on FreeBSD this can be determined with a great deal of precision in a programmatic manner:

echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"

The same OID is available via C too. It'd be slick if PostgreSQL could tune itself (on FreeBSD) at initdb time with the above code. If Linux exports this info via /proc and can whip out the appropriate magic, even better. An uncommented-out good guess that shows up in postgresql.conf would be stellar and quite possible with the use of sed. Maybe an initdb switch could be added to have initdb tune the config it generates? If a -n is added, have it generate a config and toss it to stdout?

case `uname` in
FreeBSD)
    echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
    ;;
*)
    echo "Unable to automatically determine the effective cache size" > /dev/stderr
    ;;
esac

-sc The simplest way may be to create an 'auto-tune' directory with scripts for configured platforms. When postgres installs the databases, it checks for 'tune.xxx' and, if found, uses that to generate the config itself? This would allow for defaults on platforms that do not have them and optimization for those that do. Martin Foster Creator/Designer Ethereal Realms [EMAIL PROTECTED]
Re: [NOVICE] [PERFORM] Extreme high load averages
Dennis Björklund wrote: On Sun, 6 Jul 2003, Martin Foster wrote: The processor seems to be purposely sitting there twiddling its thumbs. Which leads me to believe that perhaps the nice levels have to be changed on the server itself? It could also be all the usual things that affect performance. Are your queries using indexes where they should? Do you vacuum analyze after you have updated/inserted a lot of data? It could be that some of your queries are not as efficient as they should be, like doing a sequential scan over a table instead of an index scan. That translates into more IO and slower response times, especially when you have more connections fighting for the available IO. I actually got a bit more respect for PostgreSQL tonight. It seems that one of my scripts was not committing changes after maintenance was conducted, meaning that rows that would normally be removed after offline archiving was completed were in fact still around. Normally at any given point in time this table would grow by 50K rows during a day, be archived that night, and then lose the rows that were no longer needed. This process is what allowed MySQL to maintain any stability, as the size of this table can balloon significantly. PostgreSQL, with tweaking, was handling a table with nearly 300K rows. That size alone would have dragged the MySQL system down to a near grind, and since most of those rows were not needed, one can imagine that queries were needlessly processing rows that should have been outright ignored. This probably explains why row-numbering-based searches greatly accelerated the overall process. By fixing the script and doing the appropriate full vacuum and re-index, the system is behaving much more as it should, even if the process may seem a bit odd to some. The reason for removing rows on a daily basis is the perishable nature of the information: since this is a chat site, posts over a day old are rarely needed for any reason. Which is why they are archived into dumps in case we really need to retrieve the information; this gives us the added bonus of smaller backup sizes and smaller database sizes. Martin Foster Creator/Designer Ethereal Realms [EMAIL PROTECTED]
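The nightly cycle described here (archive, delete the perishable rows, then reclaim space) boils down to something like the following. The PostTimestamp column name and the one-day cutoff are illustrative, not taken from the actual schema:

```sql
-- Remove posts older than a day, then reclaim the dead tuples and
-- rebuild the index bloat a mass delete leaves behind (7.4-era tools).
DELETE FROM Post WHERE PostTimestamp < now() - INTERVAL '1 day';
VACUUM FULL ANALYZE Post;
REINDEX TABLE Post;
```

Note that VACUUM FULL and REINDEX take exclusive locks, which is why this belongs in the quiet overnight window rather than during peak chat hours.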
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
Michael Pohl wrote: On Sun, 6 Jul 2003, Matthew Nuzum wrote: At the very least, if there is good documentation for these parameters, maybe the conf file should provide a link to this info. I believe that is what Josh is proposing: http://archives.postgresql.org/pgsql-performance/2003-07/msg00102.php [Apache httpd] uses a three-phase (if not more) documentation level. The .conf file contains detailed instructions in an easy-to-read and not-too-jargonish structure. The docs provide detailed tutorials and papers that expand on configuration params in an easy-to-read format. Both of these refer to the thorough reference manual that breaks each possible option down into its nitty-gritty details, so that a user can get more information if they so desire. I agree that Apache's approach is primo. Often the .conf comments are enough to jog my memory about a directive I haven't used for a while. Or the comments are enough to let me know I don't need a directive, or that I need to go to the manual and read more. I appreciate that. michael One thing that may also help is to include more sample .conf files. For example, you could include settings commonly seen for dedicated databases with generic specs, and another set for a machine with fewer resources that is not dedicated to Postgres. This would allow users to see how certain setting changes work together. The default .conf is great if you want to set up a small test bed, but for a real-life deployment chances are it won't be exactly what you're looking for. Martin Foster Creator/Designer Ethereal Realms [EMAIL PROTECTED]
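For instance, a shipped postgresql.conf.sample-dedicated for a 2GB dedicated box might carry Apache-style annotations like these. The numbers are illustrative 7.4-era values (8KB pages), not recommendations:

```shell
# SAMPLE: dedicated database server, 2GB RAM.
shared_buffers = 25000          # ~200MB of 8KB pages; raise kernel SHMMAX first
sort_mem = 8192                 # KB per sort -- multiply by expected concurrent sorts
effective_cache_size = 150000   # ~1.2GB; how much the OS will cache on our behalf
wal_buffers = 64                # bump above the default 8 on write-heavy sites
```

A parallel sample for a shared, low-memory host would show the same parameters scaled down, letting users see the shape of the tradeoff rather than a single one-size default.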