Re: [PERFORM] Really really slow select count(*)
On Tue, Feb 08, 2011 at 03:52:31PM -0600, Kevin Grittner wrote:
> Scott Marlowe <scott.marl...@gmail.com> wrote:
>> Greg Smith <g...@2ndquadrant.com> wrote:
>>> Kevin and I both suggested a "fast plus timeout then immediate" behavior is what many users seem to want.
>>
>> Are there any settings in postgresql.conf that would make it unsafe to use -m immediate?
>
> I don't think so. There could definitely be problems if someone cuts power before your shutdown completes, though. (I hear that those firefighters like to cut power to a building before they grab those big brass nozzles to spray a stream of water into a building. Go figure...)

Following you off topic: I know of one admin type who has stated "I don't care what sort of fine the power company wants to give me; if my property's on fire, I'm going to pull the meter," in order to hand it to the first responder, rather than have them sit there waiting for the power tech to arrive while my house burns.

Back on topic, I like the idea of a timed escalation. That means there are two things to configure, though: the timeout(s?) and the set of states to escalate through. I can see different use cases for different sets. Hmmm:

    pg_ctl -m s:10:f:5:i restart

for smart, 10 sec. timeout, escalate to fast, 5 sec., then immediate? Not sure how that would interact with -t. Perhaps:

    pg_ctl -t 10 -m s -t 5 -m f -m i restart

Some video-processing tools do things like that: the order of options affects their interaction.

Ross
--
Ross Reedstrom, Ph.D. <reeds...@rice.edu>
Systems Engineer & Admin, Research Scientist, Connexions (http://cnx.org)
Rice University MS-375, Houston, TX 77005
phone: 713-348-6166, fax: 713-348-3665
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE
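[No such combined syntax exists in pg_ctl, but the escalation itself can be approximated today with a small wrapper. A minimal sketch, assuming PGDATA is set and relying on pg_ctl exiting nonzero when the -t wait runs out:]

    #!/bin/sh
    # hypothetical escalation wrapper: smart for 10s, then fast for 5s,
    # then immediate -- roughly the "s:10:f:5:i" idea sketched above
    pg_ctl -D "$PGDATA" -m smart -t 10 -w stop \
      || pg_ctl -D "$PGDATA" -m fast -t 5 -w stop \
      || pg_ctl -D "$PGDATA" -m immediate -w stop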
Re: [PERFORM] Really really slow select count(*)
Ross,

Way off topic now, but from my time programming electrical meters I can tell you that pulling the meter from its socket is potentially an extremely dangerous thing to do. If there is a load across the meter's poles, the spark that results on disconnect could kill the puller instantly. (You don't want to know what happens if the person isn't killed.) I don't know what property your admin type is trying to protect, but I'm inclined to let it burn and live to work through the insurance collection process.

Oh, and +1 for timed escalation of a shutdown.

Bob Lunney

--- On Wed, 2/16/11, Ross J. Reedstrom <reeds...@rice.edu> wrote:
> From: Ross J. Reedstrom <reeds...@rice.edu>
> Subject: Re: [PERFORM] Really really slow select count(*)
>
> [big snip]
>
> Following you off topic: I know of one admin type who has stated "I don't care what sort of fine the power company wants to give me; if my property's on fire, I'm going to pull the meter," in order to hand it to the first responder, rather than have them sit there waiting for the power tech to arrive while my house burns.
Re: [PERFORM] Really really slow select count(*)
On Tue, Feb 8, 2011 at 3:23 PM, Shaun Thomas <stho...@peak6.com> wrote:
> With 300k rows, count(*) isn't a good test, really. That's just on the edge of big-enough that it could be 1 second to fetch from the disk controller,

1 second you say? excellent, sign me up. 70 seconds is way out of bounds. I don't want a more efficient query to test with, I want the shitty query that performs badly, because it isolates an obvious problem.

> The default settings are not going to cut it for a database of your size, with the volume you say it's getting.

not to mention the map-reduce jobs I'm hammering it with all night :) but I did pause those until this is solved.

> But you need to put in those kernel parameters I suggested. And I know this sucks, but you also have to raise your shared_buffers and possibly your work_mem and then restart the DB. But this time, use pg_ctl to invoke a fast stop, and then use the init script in /etc/init.d to restart it.

I'm getting another slicehost slice. hopefully I can clone the whole thing over without doing a full install and go screw around with it there. it's a fairly complicated install, even with buildout doing most of the configuration.

=felix
Re: [PERFORM] Really really slow select count(*)
On 02/07/2011 09:17 PM, felix wrote:
> Well it said "Failed to shutdown" ... and then returned control. and then proceeded to run for about an hour. I'm not sure how graceful that is.

Ah, but that was just the control script that sends the database the command to shut down. The "graceful" part is that the database is being nice to everyone trying to do things with the data inside. The control script has a timeout: it sends the command, waits a few seconds to see if the database responds, and then gives up. At that point, you can use a fast shutdown to tell the database not to be so nice; it'll force-disconnect all users and shut down as quickly as possible while maintaining data integrity.

The easiest way to see this in action is to take a look at the postgres log files. In most default installs, this is in /your/pg/dir/pg_log, and the files follow a postgresql-YYYY-MM-DD_HHMMSS.log naming format and generally auto-rotate. If not, set redirect_stderr to on, and make sure log_directory and log_filename are both set. Those are in your postgresql.conf, by the way. :)

> I've only been using postgres since we migrated in May

Aha. Yeah... relatively new installs tend to have the worst growing pains. Once you shake this stuff out, you'll be much better off.

> its only conjecture that the issue is file space bloat or free map problems. those are overall issues that I will get to as soon as I can. but this is table specific.

With 300k rows, count(*) isn't a good test, really. That's just on the edge of big-enough that it could take a second to fetch from the disk controller, even if the table is fully vacuumed. And in your case, that table really will likely come from the disk controller, as your shared_buffers are set way too low. The default settings are not going to cut it for a database of your size, with the volume you say it's getting. But you need to put in those kernel parameters I suggested. And I know this sucks, but you also have to raise your shared_buffers and possibly your work_mem and then restart the DB. But this time, use pg_ctl to invoke a fast stop, and then use the init script in /etc/init.d to restart it.

> I am not a DBA,

You are now. :) You're administering a database, either as part of your job description, or because you have no choice because your company doesn't have an official DBA. Either way, you'll need to know this stuff. Which is why we're helping out.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com
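[For reference, those settings live in postgresql.conf; a sketch of turning them on, with an illustrative data directory path. Note that from 8.3 onward the parameter is named logging_collector; redirect_stderr is the pre-8.3 name:]

    # append logging settings; logging_collector needs a full restart,
    # while log_directory/log_filename take effect on a reload
    cat >> /var/lib/postgresql/8.3/main/postgresql.conf <<'EOF'
    logging_collector = on
    log_directory = 'pg_log'
    log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
    EOF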
Re: [PERFORM] Really really slow select count(*)
>> Well it said "Failed to shutdown" ... and then returned control. and then proceeded to run for about an hour. I'm not sure how graceful that is.
>
> Ah, but that was just the control script that sends the database the command to shut down. The "graceful" part is that the database is being nice to everyone trying to do things with the data inside. The control script has a timeout. So it'll send the command, wait a few seconds to see if the database responds, and then gives up.

For what it's worth, I think that's the not-so-graceful part. The control script gives up, but the actual shutdown still occurs eventually, after all current connections have ended. I think most users will take pg_ctl at its word, and assume "Failed to shutdown" means "I couldn't shut down with this command, maybe you should try something else," and not "I couldn't shut down right now, although I'll get to it as soon as everyone disconnects."

---
Maciek Sakrejda | System Architect | Truviso
1065 E. Hillsdale Blvd., Suite 215, Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com
Re: [PERFORM] Really really slow select count(*)
Maciek Sakrejda <msakre...@truviso.com> wrote:
>>> Well it said "Failed to shutdown" ... and then returned control. and then proceeded to run for about an hour. I'm not sure how graceful that is.
>>
>> Ah, but that was just the control script that sends the database the command to shut down. The "graceful" part is that the database is being nice to everyone trying to do things with the data inside. The control script has a timeout. So it'll send the command, wait a few seconds to see if the database responds, and then gives up.
>
> For what it's worth, I think that's the not-so-graceful part. The control script gives up, but the actual shutdown still occurs eventually, after all current connections have ended. I think most users will take pg_ctl at its word, and assume "Failed to shutdown" means "I couldn't shut down with this command, maybe you should try something else," and not "I couldn't shut down right now, although I'll get to it as soon as everyone disconnects."

Yeah, current behavior with that shutdown option is the opposite of smart for any production environment I've seen. (I can see where it would be handy in development, though.) What's best in production is the equivalent of the fast option, with escalation to immediate if necessary to ensure shutdown within the time limit.

In my world, telling PostgreSQL to shut down is most often done because in a few minutes someone is going to pull the plug to move the server, an electrician is going to flip the circuit off to do some wiring, or (in one recent event) the building is on fire and the fire department is about to cut electrical power. In such situations, patiently waiting for a long-running query to complete is a Very Bad Idea, much less waiting for a connection pool to cycle all connections out. Telling the user that the shutdown failed, when what is really happening is that it will block new connections and keep waiting around indefinitely, with an actual shutdown at some ill-defined future moment, is adding insult to injury. In my view, anyway.

-Kevin
Re: [PERFORM] Really really slow select count(*)
On Tue, Feb 8, 2011 at 18:36, Kevin Grittner <kevin.gritt...@wicourts.gov> wrote:
> Yeah, current behavior with that shutdown option is the opposite of smart for any production environment I've seen. (I can see where it would be handy in development, though.) What's best in production is the equivalent of the fast option with escalation to immediate if necessary to ensure shutdown within the time limit.

+1, we should call it "dumb" :)

Not accepting new connections with "the database system is shutting down" makes it even worse -- it means you can't log in to the server to inspect who's querying it or call pg_terminate_backend() on them.

I couldn't find any past discussions about changing the default to "fast". Are there any reasons why that cannot be done in a future release?

Regards,
Marti
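[What Marti describes only works while the server still accepts logins; a sketch, where the PID is illustrative, the catalog columns are the 8.x names (procpid/current_query, renamed pid/query in 9.2), and pg_terminate_backend() ships built-in only from 8.4 on:]

    # inspect who is connected and what they are running
    psql -d postgres -c "SELECT procpid, usename, current_query FROM pg_stat_activity;"
    # force one backend to disconnect (pg_terminate_backend is 8.4+)
    psql -d postgres -c "SELECT pg_terminate_backend(12345);"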
Re: [PERFORM] Really really slow select count(*)
On Tue, Feb 8, 2011 at 9:50 AM, Marti Raudsepp <ma...@juffo.org> wrote:
> On Tue, Feb 8, 2011 at 18:36, Kevin Grittner <kevin.gritt...@wicourts.gov> wrote:
>> Yeah, current behavior with that shutdown option is the opposite of smart for any production environment I've seen. (I can see where it would be handy in development, though.) What's best in production is the equivalent of the fast option with escalation to immediate if necessary to ensure shutdown within the time limit.
>
> +1, we should call it "dumb" :)
>
> Not accepting new connections with "the database system is shutting down" makes it even worse -- it means you can't log in to the server to inspect who's querying it or call pg_terminate_backend() on them.
>
> I couldn't find any past discussions about changing the default to "fast". Are there any reasons why that cannot be done in a future release?

Or at least throw a hint the user's way that -m fast might be needed.
Re: [PERFORM] Really really slow select count(*)
>> I couldn't find any past discussions about changing the default to "fast". Are there any reasons why that cannot be done in a future release?
>
> Or at least throw a hint the user's way that -m fast might be needed.

I think there are several issues here:

1. Does pg_ctl give a clear indication of the outcome of a failed smart-mode shutdown?
2. Is the current smart shutdown mode behavior useful?
3. Should the default shutdown mode be changed to fast?

I think felix mainly complained about (1), and that's what I was talking about as well. The current message (I have only an 8.3 handy, but I don't imagine this has changed much) is:

    pg_ctl stop -t5
    waiting for server to shut down... failed
    pg_ctl: server does not shut down

This leaves out crucial information (namely, "but it will stop accepting new connections and shut down when all current connections are closed"). It seems like something along those lines should be added to the error message, or perhaps at least to the pg_ctl documentation. Currently, the docs page (http://www.postgresql.org/docs/current/static/app-pg-ctl.html) only hints at this, and pg_ctl --help does not really mention it at all.

Of the two other issues, (3) seems reasonable (I have no strong feelings there either way), and (2) is probably a moot point (the behavior won't change in a backward-incompatible manner now, and if it's dethroned as default, that doesn't really matter).

---
Maciek Sakrejda | System Architect | Truviso
1065 E. Hillsdale Blvd., Suite 215, Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com
Re: [PERFORM] Really really slow select count(*)
Marti Raudsepp <ma...@juffo.org> wrote:
> I couldn't find any past discussions about changing the default to "fast".

It's not entirely unrelated to the Linux LSB init script discussions in August and September of 2009:

http://archives.postgresql.org/pgsql-hackers/2009-08/msg01843.php
http://archives.postgresql.org/pgsql-hackers/2009-09/msg01963.php

-Kevin
Re: [PERFORM] Really really slow select count(*)
Marti Raudsepp wrote:
> I couldn't find any past discussions about changing the default to "fast". Are there any reasons why that cannot be done in a future release?

Well, it won't actually help as much as you might think. It's possible for clients to be in a state where a fast shutdown doesn't work, either. You either have to kill them manually or use an immediate shutdown.

Kevin and I both suggested a "fast plus timeout then immediate" behavior is what many users seem to want. My comments were at http://archives.postgresql.org/pgsql-hackers/2009-09/msg01145.php ; for an example of how fast shutdown can fail, see http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php

--
Greg Smith, 2ndQuadrant US <g...@2ndquadrant.com>, Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support: www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [PERFORM] Really really slow select count(*)
On Tue, Feb 8, 2011 at 22:09, Greg Smith <g...@2ndquadrant.com> wrote:
> Kevin and I both suggested a "fast plus timeout then immediate" behavior is what many users seem to want. My comments were at http://archives.postgresql.org/pgsql-hackers/2009-09/msg01145.php ; for an example of how fast shutdown can fail, see http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php

True, I've hit that a few times too. Seems that a better solution would be implementing a new -m option that does this transparently?

Regards,
Marti
Re: [PERFORM] Really really slow select count(*)
Marti Raudsepp <ma...@juffo.org> wrote:
> Greg Smith <g...@2ndquadrant.com> wrote:
>> Kevin and I both suggested a "fast plus timeout then immediate" behavior is what many users seem to want.
>
> Seems that a better solution would be implementing a new -m option that does this transparently?

Maybe. Another option might be to use -t or some new switch (or -t in combination with some new switch) as a time limit before escalating to the next shutdown mode.

-Kevin
Re: [PERFORM] Really really slow select count(*)
On Tue, Feb 8, 2011 at 1:09 PM, Greg Smith <g...@2ndquadrant.com> wrote:
> Marti Raudsepp wrote:
>> I couldn't find any past discussions about changing the default to "fast". Are there any reasons why that cannot be done in a future release?
>
> Kevin and I both suggested a "fast plus timeout then immediate" behavior is what many users seem to want. My comments were at http://archives.postgresql.org/pgsql-hackers/2009-09/msg01145.php ; for an example of how fast shutdown can fail, see http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php

Are there any settings in postgresql.conf that would make it unsafe to use -m immediate?
Re: [PERFORM] Really really slow select count(*)
Scott Marlowe <scott.marl...@gmail.com> wrote:
>> Greg Smith <g...@2ndquadrant.com> wrote:
>>> Kevin and I both suggested a "fast plus timeout then immediate" behavior is what many users seem to want.
>
> Are there any settings in postgresql.conf that would make it unsafe to use -m immediate?

I don't think so. There could definitely be problems if someone cuts power before your shutdown completes, though. (I hear that those firefighters like to cut power to a building before they grab those big brass nozzles to spray a stream of water into a building. Go figure...)

-Kevin
Re: [PERFORM] Really really slow select count(*)
Scott Marlowe wrote:
> Are there any settings in postgresql.conf that would make it unsafe to use -m immediate?

Two concerns:

- Clients will be killed without any review, and data related to them lost.
- The server will have to go through recovery to start back up again, which could potentially take a long time. If you manage a successful shutdown, that doesn't happen.

Shouldn't be unsafe, just has those issues.

--
Greg Smith, 2ndQuadrant US <g...@2ndquadrant.com>, Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support: www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
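[In command form, with the caveats above; a sketch assuming PGDATA is set:]

    # immediate mode aborts without a checkpoint, like a crash: clients are
    # cut off at once and the next startup replays WAL (crash recovery)
    # before accepting connections -- the potentially slow part Greg means
    pg_ctl -D "$PGDATA" -m immediate stop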
Re: [PERFORM] Really really slow select count(*)
On Tue, Feb 8, 2011 at 3:08 PM, Greg Smith <g...@2ndquadrant.com> wrote:
> Scott Marlowe wrote:
>> Are there any settings in postgresql.conf that would make it unsafe to use -m immediate?
>
> Two concerns:
> - Clients will be killed without any review, and data related to them lost.
> - The server will have to go through recovery to start back up again, which could potentially take a long time. If you manage a successful shutdown, that doesn't happen.
>
> Shouldn't be unsafe, just has those issues.

Good, I was kinda worried about full_page_writes being off, or fsync, or something like that being a problem.
Re: [PERFORM] Really really slow select count(*)
On Mon, Feb 7, 2011 at 05:03, Craig Ringer <cr...@postnewspapers.com.au> wrote:
> What would possibly help would be if Pg could fall back to lower shared_buffers automatically, screaming about it in the logs but still launching. OTOH, many people don't check the logs, so they'd think their new setting had taken effect and it hadn't - you've traded one usability problem for another. Even if Pg issued WARNING messages to each client that connected, lots of (non-psql) clients don't display them, so many users would never know.
>
> Do you have a suggestion about how to do this better? The current approach is known to be rather unlovely, but nobody's come up with a better one that works reasonably and doesn't trample on other System V shared memory users that may exist on the system.

We could do something similar to what Apache does -- provide distros with a binary to check the configuration file in advance. This check program is launched before the "restart" command, and if it fails, the server is not restarted.

Regards,
Marti
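[PostgreSQL has no such configtest mode of its own (a postgres -C switch for reading settings only arrived later, in 9.2), but a distro script along the lines Marti sketches might look roughly like this. Entirely hypothetical, and the size math is a crude approximation -- the real shared memory request also covers the FSM, wal_buffers, locks, and per-connection bookkeeping:]

    #!/bin/sh
    # hypothetical Apache-style pre-flight check for postgresql.conf
    CONF=/var/lib/postgresql/8.3/main/postgresql.conf   # path is illustrative
    # assumes shared_buffers is expressed in 8 kB pages, old-config style
    BUFFERS=$(awk -F= '/^[[:space:]]*shared_buffers/ {print $2}' "$CONF" | tr -dc '0-9')
    REQUEST=$((BUFFERS * 8192))
    SHMMAX=$(sysctl -n kernel.shmmax)
    if [ "$REQUEST" -gt "$SHMMAX" ]; then
        echo "shared_buffers (~$REQUEST bytes) exceeds kernel.shmmax ($SHMMAX)" >&2
        exit 1
    fi
    echo "configuration looks plausible"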
Re: [PERFORM] Really really slow select count(*)
+1. this is exactly what I was looking for at the time: a -t (configtest) option to pg_ctl. and I think it should fall back to lower shared buffers and log it. SHOW ALL; would show the used value.

On Mon, Feb 7, 2011 at 11:30 AM, Marti Raudsepp <ma...@juffo.org> wrote:
> On Mon, Feb 7, 2011 at 05:03, Craig Ringer <cr...@postnewspapers.com.au> wrote:
>> What would possibly help would be if Pg could fall back to lower shared_buffers automatically, screaming about it in the logs but still launching. OTOH, many people don't check the logs, so they'd think their new setting had taken effect and it hadn't - you've traded one usability problem for another. Even if Pg issued WARNING messages to each client that connected, lots of (non-psql) clients don't display them, so many users would never know.
>>
>> Do you have a suggestion about how to do this better? The current approach is known to be rather unlovely, but nobody's come up with a better one that works reasonably and doesn't trample on other System V shared memory users that may exist on the system.
>
> We could do something similar to what Apache does -- provide distros with a binary to check the configuration file in advance. This check program is launched before the "restart" command, and if it fails, the server is not restarted.
Re: [PERFORM] Really really slow select count(*)
On Mon, Feb 7, 2011 at 8:05 AM, felix <crucialfe...@gmail.com> wrote:
> +1. this is exactly what I was looking for at the time: a -t (configtest) option to pg_ctl. and I think it should fall back to lower shared buffers and log it. SHOW ALL; would show the used value.

However, much like apache, this might not have gotten caught. In order to catch it we'd have to see how much shared memory was available, and I think you have to actually allocate it to find out if you can. Since pg is already running, allocating shared_buffers / fsm twice might fail when allocating it once would succeed.
Re: [PERFORM] Really really slow select count(*)
Craig Ringer wrote:
> What would possibly help would be if Pg could fall back to lower shared_buffers automatically, screaming about it in the logs but still launching.

This is exactly what initdb does when it produces an initial setting for shared_buffers that goes into the postgresql.conf file. It wouldn't be hard to move that same logic into a loop that executed when startup failed to allocate enough memory.

There are two problems here, one almost solved, the other more philosophical. It used to be that max_fsm_pages and wal_buffers could be large enough components of the allocation that reducing them might actually be a necessary fix, too. With the removal of the former, and a method to automatically set the latter now available, the remaining components of the shared memory sizing computation are probably possible to try and fix automatically if the kernel limits are too low.

But it's unclear whether running in a degraded mode, where performance might be terrible, with only a log message, is preferable to stopping and forcing the DBA's attention toward the mistake that was made immediately. Log files get rotated out, and it's not hard to imagine this problem coming back to haunt someone only a month or two later--by which time the change to shared_buffers is long forgotten, and the log message complaining about it lost too. Accordingly, I would expect any serious attempt to add some auto-reduction behavior to be beset with argument, and I'd never consider writing such a thing as a result. Too many non-controversial things I could work on instead.

--
Greg Smith, 2ndQuadrant US <g...@2ndquadrant.com>, Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support: www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [PERFORM] Really really slow select count(*)
On 02/07/2011 06:30 PM, Marti Raudsepp wrote:
> On Mon, Feb 7, 2011 at 05:03, Craig Ringer <cr...@postnewspapers.com.au> wrote:
>> What would possibly help would be if Pg could fall back to lower shared_buffers automatically, screaming about it in the logs but still launching. OTOH, many people don't check the logs, so they'd think their new setting had taken effect and it hadn't - you've traded one usability problem for another. Even if Pg issued WARNING messages to each client that connected, lots of (non-psql) clients don't display them, so many users would never know.
>>
>> Do you have a suggestion about how to do this better? The current approach is known to be rather unlovely, but nobody's come up with a better one that works reasonably and doesn't trample on other System V shared memory users that may exist on the system.
>
> We could do something similar to what Apache does -- provide distros with a binary to check the configuration file in advance. This check program is launched before the "restart" command, and if it fails, the server is not restarted.

That would work for config file errors (and would probably be a good idea), but it won't help with bad shared memory configuration. When Pg is already running, it's usually not possible for a test program to claim the amount of shared memory the config file says to allocate, because Pg is already using it. Nonetheless, Pg will work fine when restarted.

--
Craig Ringer
Re: [PERFORM] Really really slow select count(*)
On 02/08/2011 03:05 AM, Greg Smith wrote:
> Accordingly, I would expect any serious attempt to add some auto-reduction behavior to be beset with argument, and I'd never consider writing such a thing as a result. Too many non-controversial things I could work on instead.

Yep. I expressed my own doubts in the post I suggested that in. If Pg did auto-correct down, it'd be necessary to scream about it angrily and continuously, not just once during startup. Given that it's clear many people never even look at the logs ("what logs? where are they?"), I think Pg would also have to send notices to the client. Problem is, many clients don't process notices/warnings, so particularly slack admins won't see that either.

I'm not particularly excited about the idea.

--
Craig Ringer
Re: [PERFORM] Really really slow select count(*)
On Mon, Feb 7, 2011 at 6:05 AM, Shaun Thomas <stho...@peak6.com> wrote:
> That's one of the things I talked about. To be safe, PG will start to shut down but disallow new connections, and *that's all*. Old connections are grandfathered in until they disconnect, and when they all go away, it shuts down gracefully.

Well it said "Failed to shutdown" ... and then returned control. and then proceeded to run for about an hour. I'm not sure how graceful that is. I generally take programs at their word. "Failed" is clearly past tense.

> So far as your Django install, have you activated the memcache contrib. module? Your pages should be lazy-caching and rarely depend on the DB, if they can.

yes thanks, my web app is very finely tuned and is working splendidly. I've been working on very large sites since 1998 and this client has been with me for 10 years already. it's a fairly high traffic site. I've only been using postgres since we migrated in May, but it is one particular table on postgres that has shit the sock drawer.

> You should also rarely be doing count(*) on a 300k row table, even if everything is cached and speedy.

I'm not. this is a test query that is obviously way out of bounds for acceptable response. there is something very very wrong with this table and I need to solve it ASAP. other tables that have fewer updates but similar sizes are not having this problem. there are foreign keys pointing to this table so it's a bit tricky to just refill it, but I can think of one way. I'll have to do that.

it's only conjecture that the issue is file space bloat or free space map problems. those are overall issues that I will get to as soon as I can. but this is table specific.

> That's an application design issue you need to address before it's too late, or you have to rush and implement a hasty fix.

it is not an application design issue, though there are always improvements being made.

> Being a DBA sucks sometimes. :)

I am not a DBA, I'm just trying to query a 300k row table. though I am happy to learn more. I know an awful lot about a lot of things, but you can't specialize in everything.
Re: [PERFORM] Really really slow select count(*)
On Mon, Feb 7, 2011 at 8:17 PM, felix <crucialfe...@gmail.com> wrote:
> On Mon, Feb 7, 2011 at 6:05 AM, Shaun Thomas <stho...@peak6.com> wrote:
>> That's one of the things I talked about. To be safe, PG will start to shut down but disallow new connections, and *that's all*. Old connections are grandfathered in until they disconnect, and when they all go away, it shuts down gracefully.
>
> Well it said "Failed to shutdown" ... and then returned control. and then proceeded to run for about an hour. I'm not sure how graceful that is. I generally take programs at their word. "Failed" is clearly past tense.

I agree that here what pg_ctl said and what it did aren't exactly the same thing.

> but it is one particular table on postgres that has shit the sock drawer.

What queries are running slow, and what does explain analyze have to say about them?

>> You should also rarely be doing count(*) on a 300k row table, even if everything is cached and speedy.
>
> I'm not. this is a test query that is obviously way out of bounds for acceptable response. there is something very very wrong with this table and I need to solve it ASAP. other tables that have fewer updates but similar sizes are not having this problem.

Is this the same problem you had at the beginning and were trying to fix with clustering and increasing fsm, or is this now a different table and a different problem?

> there are foreign keys pointing to this table so it's a bit tricky to just refill it, but I can think of one way. I'll have to do that. it's only conjecture that the issue is file space bloat or free space map problems. those are overall issues that I will get to as soon as I can. but this is table specific.

What does the query you ran before that shows bloat show on this table now?

>> That's an application design issue you need to address before it's too late, or you have to rush and implement a hasty fix.
>
> it is not an application design issue, though there are always improvements being made.

If your application is doing select count(*) with either no where clause or with a very non-selective one, then it is somewhat of a design issue, and there are ways to make that faster. if it's a different query, show us what it and its explain analyze look like.

>> Being a DBA sucks sometimes. :)
>
> I am not a DBA, I'm just trying to query a 300k row table. though I am happy to learn more. I know an awful lot about a lot of things, but you can't specialize in everything.

Well, the good news is that there's a LOT less arcana involved in keeping pgsql happy than there is in keeping something like Oracle happy.
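[For anyone following along, the two requests above look roughly like this; the database and table names are stand-ins for the unnamed problem table:]

    # timing and plan for the slow query
    psql -d yourdb -c "EXPLAIN ANALYZE SELECT count(*) FROM the_slow_table;"
    # quick bloat indicator: pages on disk vs. the planner's row estimate
    psql -d yourdb -c "SELECT relname, relpages, reltuples FROM pg_class WHERE relname = 'the_slow_table';"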
Re: [PERFORM] Really really slow select count(*)
BRUTAL

http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html says of max_fsm_pages: "See Section 17.4.1 (http://www.postgresql.org/docs/8.3/interactive/kernel-resources.html#SYSVIPC) for information on how to adjust those parameters, if necessary." I see absolutely nothing in there about how to set those parameters.

several hours later (where is my data directory? 8.4 shows it in SHOW ALL; 8.3 does not. conf files? not in the data directory; they're in /etc/postgres/8.3/main. where is pg_ctl? what user do I need to be? postgres. then why was it installed in the home dir of a user that does not have permissions to use it??)

    cd /home/crucial/bin
    /home/crucial/bin/pg_ctl -D /var/lib/postgresql/8.3/main reload

reload does not reset max_fsm_pages; I need to actually restart the server.

    postgres@nestseekers:/home/crucial/bin$ /home/crucial/bin/pg_ctl -D /var/lib/postgresql/8.3/main restart
    waiting for server to shut down... failed
    pg_ctl: server does not shut down

OK, my mistake. probably I have to disconnect all clients. I don't want to do a planned maintenance right now. so I go to sleep.

the server restarts itself an hour later. but no, it fails to restart, because this memory setting you recommend is not possible without reconfiguring the kernel:

    2011-02-06 05:18:00 EST LOG: could not load root certificate file "root.crt": No such file or directory
    2011-02-06 05:18:00 EST DETAIL: Will not verify client certificates.
    2011-02-06 05:18:00 EST FATAL: could not create shared memory segment: Invalid argument
    2011-02-06 05:18:00 EST DETAIL: Failed system call was shmget(key=5432001, size=35463168, 03600).
    2011-02-06 05:18:00 EST HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter. You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 35463168 bytes), reduce PostgreSQL's shared_buffers parameter (currently 3072) and/or its max_connections parameter (currently 103). If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for. The PostgreSQL documentation contains more information about shared memory configuration.

*and the website is down for the next 6 hours while I sleep.* total disaster.

after a few tries I get it to take a max_fsm_pages of 300k:

    2011-02-06 05:19:26 EST LOG: could not load root certificate file "root.crt": No such file or directory
    2011-02-06 05:19:26 EST DETAIL: Will not verify client certificates.
    2011-02-06 05:19:26 EST LOG: database system was shut down at 2011-02-06 00:07:41 EST
    2011-02-06 05:19:27 EST LOG: autovacuum launcher started
    2011-02-06 05:19:27 EST LOG: database system is ready to accept connections

and then:

    2011-02-06 05:33:45 EST LOG: checkpoints are occurring too frequently (21 seconds apart)
    2011-02-06 05:33:45 EST HINT: Consider increasing the configuration parameter "checkpoint_segments".

??

From my perspective: the defaults for postgres 8.3 result in a database that does not scale and fails dramatically after 6 months, and changing that default is brutally difficult and can only really be done by adjusting something in the kernel.

I have clustered that table, and it's still unbelievably slow. I still don't know if this bloat due to the small free space map has anything to do with why the table is performing like this.

On Fri, Feb 4, 2011 at 5:35 PM, Shaun Thomas <stho...@peak6.com> wrote:
> You can stop the bloating by setting the right max_fsm_pages setting, but you'll either have to go through and VACUUM FULL every table in your database, or dump/restore to regain all the lost space and performance (the latter would actually be faster). Before I even touch an older PostgreSQL DB, I set it to some value over 3-million just as a starting value to be on the safe side. A little used memory is a small price to pay for stopping gradual expansion.
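[For reference, the two settings this episode touched, using the values that come up in the thread (300k for max_fsm_pages, 25 for checkpoint_segments per Shaun's later advice). A sketch only -- the sed patterns and paths are illustrative, max_fsm_pages needs a full restart, and the resulting shared memory request must still fit under kernel.shmmax:]

    # raise both parameters in postgresql.conf, then restart
    sudo -u postgres sed -i \
      -e 's/^#*max_fsm_pages.*/max_fsm_pages = 300000/' \
      -e 's/^#*checkpoint_segments.*/checkpoint_segments = 25/' \
      /etc/postgresql/8.3/main/postgresql.conf
    sudo -u postgres /home/crucial/bin/pg_ctl -D /var/lib/postgresql/8.3/main -m fast -w restart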
Re: [PERFORM] Really really slow select count(*)
On Sun, Feb 06, 2011 at 11:48:50AM +0100, felix wrote:
> BRUTAL

Did the changes work in your test environment?
Re: [PERFORM] Really really slow select count(*)
On Sun, Feb 6, 2011 at 3:48 AM, felix <crucialfe...@gmail.com> wrote:
> BRUTAL
> [snip]
> OK, my mistake. probably I have to disconnect all clients. I don't want to do a planned maintenance right now. so I go to sleep. the server restarts itself an hour later. but no, it fails to restart, because this memory setting you recommend is not possible without reconfiguring the kernel.
> [snip]
> and the website is down for the next 6 hours while I sleep. total disaster.

Let's review:

1: No test or staging system used before production.
2: DB left in an unknown state (trying to shut down, not able).
3: No monitoring software to tell you when the site is down.
4: I'm gonna just go ahead and guess no backups were taken either, or are regularly taken.

This website can't be very important, if that's the way you treat it. Number 1 up there becomes even worse because it was your first time trying to make this particular change in PostgreSQL. If it is important, you need to learn how to start treating it that way. Even the most junior of sysadmins or developers I work with know we test it a couple times outside of production before just trying it there. And my phone starts complaining a minute after the site stops responding if something does go wrong the rest of the time. Do not lay this at anyone else's feet.

> From my perspective: the defaults for postgres 8.3 result in a database that does not scale and fails dramatically after 6 months.

Agreed. Welcome to using shared memory and the ridiculously low defaults on most flavors of unix or linux.

> changing that default is brutally difficult and can only really be done by adjusting something in the kernel.

Please, that's a gross exaggeration. The sum total of changing them is:

    run sysctl -a | grep shm
    copy out the proper lines to change
    edit sysctl.conf, put the new lines in there with changes
    sudo sysctl -p   # applies changes
    edit the appropriate postgresql.conf, make changes
    sudo /etc/init.d/postgresql-8.3 stop
    sudo /etc/init.d/postgresql-8.3 start

> I have clustered that table, its still unbelievably slow.

Did you actually delete the old entries before clustering it? if it's still got 4G of old sessions or whatever in it, clustering ain't gonna help.

> I still don't know if this bloat due to the small free space map has anything to do with why the table is performing like this.

Since you haven't shown us what changes, if any, have happened to the table, neither do we :)
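[Spelled out with concrete numbers -- the values are illustrative, size them to your RAM; note kernel.shmall is counted in 4 kB pages on typical Linux:]

    # inspect the current limits
    sysctl -a | grep shm
    # persist larger limits: 1 GB max segment, 1 GB total shared memory
    echo "kernel.shmmax = 1073741824" | sudo tee -a /etc/sysctl.conf
    echo "kernel.shmall = 262144"     | sudo tee -a /etc/sysctl.conf
    sudo sysctl -p   # apply without rebooting
    # then raise shared_buffers in postgresql.conf and bounce postgres
    sudo /etc/init.d/postgresql-8.3 stop
    sudo /etc/init.d/postgresql-8.3 start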
Re: [PERFORM] Really really slow select count(*)
On Fri, Feb 4, 2011 at 1:14 PM, felix <crucialfe...@gmail.com> wrote:
>  schema_name |   table_name   |  row_count  | mb_used | total_mb_used
> -------------+----------------+-------------+---------+---------------
>  public      | django_session | 1.47843e+07 |    4122 |         18832

So does this table still have 15M rows in it? Any old ones you can delete, then run cluster on the table?
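[Roughly like this, assuming the stock Django session schema, which keys expiry on an expire_date column -- verify against your own table first; the pkey index name is the default Postgres would generate:]

    # purge expired sessions, then rewrite the table in index order
    psql -d yourdb -c "DELETE FROM django_session WHERE expire_date < now();"
    # 8.3 cluster syntax is "CLUSTER indexname ON tablename"
    psql -d yourdb -c "CLUSTER django_session_pkey ON django_session;"
    psql -d yourdb -c "ANALYZE django_session;"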
Re: [PERFORM] Really really slow select count(*)
>> I have clustered that table, its still unbelievably slow.
>
> Did you actually delete the old entries before clustering it? if it's still got 4G of old sessions or whatever in it, clustering ain't gonna help.

Also, IMHO it is a lot better to store sessions in something like memcached, rather than imposing this rather large load on the main database...

PS: if your site has been down for 6 hours, you can TRUNCATE your sessions table...
Re: [PERFORM] Really really slow select count(*)
On Sun, Feb 6, 2011 at 12:19 PM, Pierre C <li...@peufeu.com> wrote:
>>> I have clustered that table, its still unbelievably slow.
>>
>> Did you actually delete the old entries before clustering it? if it's still got 4G of old sessions or whatever in it, clustering ain't gonna help.
>
> Also, IMHO it is a lot better to store sessions in something like memcached, rather than imposing this rather large load on the main database...
>
> PS: if your site has been down for 6 hours, you can TRUNCATE your sessions table...

Agreed. When I started where I am, sessions were on pg and falling over all the time. Because I couldn't change it at the time, I was forced to make autovac MUCH more aggressive. I didn't have to crank up fsm a lot really, but did a bit. Then I just ran a vacuum full / reindex across the sessions table and everything was fine after that. But we could handle 100x the load for sessions with memcached, I bet.
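[The vacuum full / reindex pass Scott describes, roughly -- on 8.3, VACUUM FULL compacts the heap but tends to bloat the indexes, which is why the REINDEX follows:]

    psql -d yourdb -c "VACUUM FULL VERBOSE django_session;"
    psql -d yourdb -c "REINDEX TABLE django_session;"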
Re: [PERFORM] Really really slow select count(*)
On Sun, Feb 6, 2011 at 4:23 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
> Let's review:
> 1: No test or staging system used before production.

no, I do not have a full ubuntu machine replicating the exact memory and application load of the production server. this was changing one configuration parameter: something I was advised to do, read about quite a bit, tested on my development server (mac), and then proceeded to do at 6 am on Sunday morning, our slowest time.

> 2: DB left in an unknown state (trying to shut down, not able).

what? I checked the site, everything was normal. I went in via psql and tried some queries for about half an hour and continued to monitor the site. then I went to bed at 7am (EU time). Why did it shut down so much later? I have never restarted postgres before, so this was all new to me. I apologize that I wasn't born innately with such knowledge. So is it normal for postgres to report that it failed to shut down, operate for an hour, and then go ahead and restart itself?

> 3: No monitoring software to tell you when the site is down.

of course I have monitoring software, both external and internal. but it doesn't come and kick me out of bed. yes, I need an automated cell phone call. that was the first thing I saw to afterwards.

> 4: I'm gonna just go ahead and guess no backups were taken either, or are regularly taken.

WTF? of course I have backups. I just went through a very harsh down period event. I fail to see why it is now necessary for you to launch such an attack on me. Perhaps the tone of my post sounded like I was blaming you, or at least you felt that way. Why do you feel that way? Why not respond with: "ouch! did you check this ... that"? say something nice and helpful. correct my mistakes.

> This website can't be very important, if that's the way you treat it.

just to let you know, that is straight up offensive. This is a high traffic real estate site. Downtime is unacceptable. I had less downtime than this when I migrated to the new platform.

I spent rather a large amount of time reading and questioning here. I asked many questions for clarification and didn't do ANYTHING until I was sure it was the correct solution. I didn't just pull some shit off a blog and start changing settings at random. I double-checked opinions against different people and I searched for more docs on that param. Amazingly, none of the ones I found commented on the shared memory issue, and I didn't even understand the docs discussing shared memory because it didn't seem to apply to what I was doing. that's my misunderstanding. I come here to share my misunderstanding.

> And my phone starts complaining a minute after the site stops responding if something does go wrong the rest of the time. Do not lay this at anyone else's feet.

I didn't. There is not even the slightest hint of that in my post. I came here and posted the details of where I went wrong and what confused me about the documentation that I followed. That's so other people can follow it and so somebody here can comment on it.

>> changing that default is brutally difficult and can only really be done by adjusting something in the kernel.
>
> Please, that's a gross exaggeration. The sum total of changing them is:
>     run sysctl -a | grep shm
>     copy out the proper lines to change
>     edit sysctl.conf, put the new lines in there with changes
>     sudo sysctl -p   # applies changes
>     edit the appropriate postgresql.conf, make changes
>     sudo /etc/init.d/postgresql-8.3 stop
>     sudo /etc/init.d/postgresql-8.3 start

Considering how splendidly the experiment with changing fsm_max_pages went, I think you can understand that I have no desire to experiment with kernel settings. It is easy for you because you ALREADY KNOW everything involved. I am not a sysadmin and we don't have one. My apologies for that.

so does the above mean that I don't have to restart the entire server, just postgres? I assumed that changing kernel settings means rebooting the server.

>> I have clustered that table, its still unbelievably slow.
>
> Did you actually delete the old entries before clustering it? if it's still got 4G of old sessions or whatever in it, clustering ain't gonna help.

it's a different table. the problem one has only 300k rows. the problem is not the size, the problem is the speed is catastrophic.

>> I still don't know if this bloat due to the small free space map has anything to do with why the table is performing like this.
>
> Since you haven't shown us what changes, if any, have happened to the table, neither do we :)

sorry, it didn't seem to be the most important topic when I got out of bed.
Re: [PERFORM] Really really slow select count(*)
yeah, it already uses memcached with db save. nothing important in session anyway. the session table is not the issue, and I never clustered that one or ever will. thanks for the tip, also the other one about HOT.

On Sun, Feb 6, 2011 at 8:19 PM, Pierre C <li...@peufeu.com> wrote:
>>> I have clustered that table, its still unbelievably slow.
>>
>> Did you actually delete the old entries before clustering it? if it's still got 4G of old sessions or whatever in it, clustering ain't gonna help.
>
> Also, IMHO it is a lot better to store sessions in something like memcached, rather than imposing this rather large load on the main database...
>
> PS: if your site has been down for 6 hours, you can TRUNCATE your sessions table...
Re: [PERFORM] Really really slow select count(*)
On 07/02/11 09:52, felix wrote:
> So is it normal for postgres to report that it failed to shut down, operate for an hour and then go ahead and restart itself?

That's pretty wacky. Did you shut it down via pg_ctl or using an init script / "service" command in your OS? It shouldn't matter, but it'd be good to know. If the problem is with an init script, then knowing which OS and version you're on would help. If it was with pg_ctl directly, that's something that can be looked into.

> this was changing one configuration parameter. something I was advised to do, read about quite a bit, tested on my development server (mac) and then proceeded to do at 6 am on Sunday morning, our slowest time.

System V shared memory is awful - but it's really the only reasonable alternative for a multi-process (rather than multi-threaded) server. PostgreSQL could use mmap()ed temp files, but that'd add additional overheads and they'd potentially get flushed from main memory unless the memory was mlock()ed. As mlock() has similar limits and configuration methods to System V shared memory, you get back to the same problem in a slightly different form.

What would possibly help would be if Pg could fall back to lower shared_buffers automatically, screaming about it in the logs but still launching. OTOH, many people don't check the logs, so they'd think their new setting had taken effect and it hadn't - you've traded one usability problem for another. Even if Pg issued WARNING messages to each client that connected, lots of (non-psql) clients don't display them, so many users would never know.

Do you have a suggestion about how to do this better? The current approach is known to be rather unlovely, but nobody's come up with a better one that works reasonably and doesn't trample on other System V shared memory users that may exist on the system.

> so does the above mean that I don't have to restart the entire server, just postgres? I assumed that changing kernel settings means rebooting the server.

Nope. sysctl settings like shmmax may be changed on the fly.

--
Craig Ringer
System & Network Administrator
POST Newspapers
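[In other words (the value is illustrative; add the same line to /etc/sysctl.conf so it survives a reboot):]

    # takes effect immediately -- no reboot of the machine or of postgres
    sudo sysctl -w kernel.shmmax=1073741824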
Re: [PERFORM] Really really slow select count(*)
On Sun, Feb 6, 2011 at 6:52 PM, felix <crucialfe...@gmail.com> wrote:
> On Sun, Feb 6, 2011 at 4:23 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
>> Let's review:
>> 1: No test or staging system used before production.
>
> no, I do not have a full ubuntu machine replicating the exact memory and application load of the production server. this was changing one configuration parameter. something I was advised to do, read about quite a bit, tested on my development server (mac) and then proceeded to do at 6 am on Sunday morning, our slowest time.

I would strongly suggest you at least test these changes out elsewhere. It doesn't have to exactly match, but if you had a machine that was even close to test on, you'd have known what to expect. Virtual machines are dirt simple to set up now, so not having one is inexcusable.

>> 2: DB left in an unknown state (trying to shut down, not able).
>
> what?

You told it to restart, which is a stop and a start. It didn't stop. It was in an unknown state, with settings in its config file you didn't know whether or not would work, because you hadn't tested them already on something similar.

> Why did it shut down so much later?

Because that's when the last open connection from before you told it to shutdown / restart finally closed.

> I have never restarted postgres before, so this was all new to me.

Which is why you use a virtual machine to build a test lab, so you CAN make these changes somewhere other than production.

> I apologize that I wasn't born innately with such knowledge.

Guess what!? Neither was I! I do however know how to set up a test system so I don't test things on my production machine.

> So is it normal for postgres to report that it failed to shut down, operate for an hour and then go ahead and restart itself?

Yes. It eventually finished the restart you told it to do.

>> 3: No monitoring software to tell you when the site is down.
>
> of course I have monitoring software, both external and internal. but it doesn't come and kick me out of bed. yes, I need an automated cell phone call. that was the first thing I saw to afterwards.

Monitoring software that can't send you emails when things break is in need of having that feature enabled.

>> 4: I'm gonna just go ahead and guess no backups were taken either, or are regularly taken.
>
> WTF? of course I have backups. I just went through a very harsh down period event. I fail to see why it is now necessary for you to launch such an attack on me.

No, it just seemed like your admin skills were pretty sloppy, so a lack of a backup wouldn't surprise me.

> Perhaps the tone of my post sounded like I was blaming you, or at least you felt that way.

It felt more like you were blaming PostgreSQL for being overly complex, but I wasn't taking it all that personally.

> Why do you feel that way?

I don't.

> Why not respond with: "ouch! did you check this ... that"? say something nice and helpful. correct my mistakes.

I'd be glad to, but your message wasn't looking for help. Go back and read it. It's one long complaint.

>> This website can't be very important, if that's the way you treat it.
>
> just to let you know, that is straight up offensive.

Really? I'd say performing maintenance with no plan or pre-testing is far more offensive.

> This is a high traffic real estate site. Downtime is unacceptable. I had less downtime than this when I migrated to the new platform.

I expect you did more planning and testing?

> I spent rather a large amount of time reading and questioning here. I asked many questions for clarification and didn't do ANYTHING until I was sure it was the correct solution. I didn't just pull some shit off a blog and start changing settings at random.

But yet you failed to test it on even the simplest similar system setup, and so you lacked the practical knowledge of how to make this change in production safely.

> I double-checked opinions against different people and I searched for more docs on that param. Amazingly, none of the ones I found commented on the shared memory issue, and I didn't even understand the docs discussing shared memory because it didn't seem to apply to what I was doing. that's my misunderstanding. I come here to share my misunderstanding.

Well, that's useful. And I can see where there could be some changes made to the docs, or a user-friendly howto on how to increase shared memory and fsm and all that.

>> Please, that's a gross exaggeration. The sum total of changing them is:
>>     run sysctl -a | grep shm
>>     copy out the proper lines to change
>>     edit sysctl.conf, put the new lines in there with changes
>>     sudo sysctl -p   # applies changes
>>     edit the appropriate postgresql.conf, make changes
>>     sudo /etc/init.d/postgresql-8.3 stop
>>     sudo /etc/init.d/postgresql-8.3 start
>
> Considering how splendidly the experiment with changing fsm_max_pages went, I think you can understand that I have no desire to experiment with kernel settings.

Experimenting is what you do on a test machine, not on your production server.
Re: [PERFORM] Really really slow select count(*)
You really got screwed by the default settings. You don't actually need to "hack" the kernel, but you do have to make these changes, because the amount of memory PG has on your system is laughable. That might actually be the majority of your problem. In your /etc/sysctl.conf, you need these lines:

    kernel.shmmax = 68719476736
    kernel.shmall = 4294967296

Then you need to run:

    sysctl -p

These changes can only be made as root, by the way. That will give you more than enough shared memory to restart PG. But it also tells me you're using the default memory settings. If you have more than 4GB on that system, you need to set shared_buffers to 1G or so. In addition, you need to bump your effective_cache_size to something representing the remaining inode cache in your system. Run 'free' to see that.

You also need to know something about unix systems. If you're running an ubuntu system, your control files are in /etc/init.d, and you can invoke them with:

    service pg_cluster restart

or the more ghetto:

    /etc/init.d/pg_cluster restart

It may also be named postgres, postgresql, or some other variant. The problem you'll run into with this is that PG tries to play nice, so it'll wait for all connections to disconnect before it shuts down to restart. That means, of course, you need to do a fast shutdown, which forces all connections to disconnect, but the service control script won't do that. So you're left with the pg_ctl command again:

    pg_ctl -D /my/pg/dir stop -m fast

And yeah, your checkpoint segments probably are too low. Based on your session table, you should probably have that at 25 or higher. But that's part of the point. I highly recommend you scan around Google for pages on optimizing PostgreSQL installs. These are pretty much covered in all of them. Fixing the shmall and shmmax kernel settings is also pretty well known in database circles, because they really are set to ridiculously low defaults for any machine that may eventually be a server of anything. I was surprised it blocked the memory request for the max_fsm_pages setting, but that just proves your system was unoptimized in several different ways that may have been slowing down your count(*) statements, among other things.

Please, for your own sanity and the safety of your systems, look this stuff up to the point you can do most of it without looking. You can clearly do well, because you picked your way through the manuals to know about the kernel settings, and that you could call pg_ctl, and so on.
Re: [PERFORM] Really really slow select count(*)
I checked the site, everything was normal. I went in via psql and tried some queries for about half an hour and continued to monitor the site. then I went to bed at 7am (EU time). Why did it shutdown so much later ? That's one of the things I talked about. To be safe, PG will start to shut down but disallow new connections, and *that's all*. Old connections are grandfathered in until they disconnect, and when they all go away, it shuts down gracefully.

pg_ctl -D /my/pg/dir stop -m fast
pg_ctl -D /my/pg/dir start

is what you wanted. I have never restarted postgres before, so this was all new to me. I apologize that I wasn't born innately with such knowledge. Forget about it. But you need to learn your tools. Restarting the DB server is something you'll need to do occasionally, just like restarting your Django proxy or app. You need to be fully knowledgeable about every part of your tool-chain, or at least the parts you're responsible for. I double checked opinions against different people and I searched for more docs on that param. Amazingly none of the ones I found commented on the shared memory issue and I didn't even understand the docs discussing shared memory because it didn't seem to apply to what I was doing. That's no coincidence. I've seen that complaint if you increase shared_buffers, but not for max_fsm_pages. I guess I'm so used to bumping up shmmax and shmall that I forget how low default systems leave those values. But you do need to increase them. Every time. They're crippling your install in more ways than just postgres. So far as your Django install, have you activated the memcache contrib module? Your pages should be lazy-caching and rarely depend on the DB, if they can. You should also rarely be doing count(*) on a 300k row table, even if everything is cached and speedy. 300k row tables have a nasty habit of becoming 3M row tables (or more) after enough time, and no amount of cache will save you from counting that. It'll take 1 second or more every time eventually, and then you'll be in real trouble. That's an application design issue you need to address before it's too late, or you'll have to rush and implement a hasty fix. I suggest setting your log_min_duration to 1000, so every query that takes longer than 1 second to execute is logged in your postgres logs. You can use that to track down trouble spots before they get really bad. That's normally aggressive enough to catch the real problem queries without flooding your logs with too much output. Being a DBA sucks sometimes. ☺ __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
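For reference, the full name of that setting is log_min_duration_statement, in milliseconds; a minimal sketch of the change:

# postgresql.conf -- log any statement that runs for 1 second or longer
log_min_duration_statement = 1000

# pick up the change without a restart
pg_ctl -D /my/pg/dir reload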
Re: [PERFORM] Really really slow select count(*)
felix wrote: So is it normal for postgres to report that it failed to shut down, operate for an hour and then go ahead and restart itself ? You've already gotten a few explanations for why waiting for connections can cause this. I'll only add that it is critical to be watching the database log file when doing work like this with PostgreSQL. Go back and check it if you still have the data from when your problematic restart attempt happened; normally you'll get some warnings about it starting to shut down. Try to look for the actual server shutdown message and then the restart one after doing this sort of thing. If you don't see them when you do this again, you'll know something unexpected is happening, and then you can look into what that is. Also, as a general downtime commentary born from years of being on the receiving end of outages, I'd recommend against ever doing any server maintenance operation for the first time just before bedtime. While that may be convenient from a "fewer users are using the site" perspective, the downside is what you've seen here: mistakes can mean rather extended outages. Better to get up early and do this sort of thing instead, so you can watch the site afterwards for a few hours to make sure nothing is broken. For similar reasons I try to avoid ever doing major changes on a Friday. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Really really slow select count(*)
On Fri, 04 Feb 2011 21:37:56 +0100, Shaun Thomas stho...@peak6.com wrote: On 02/04/2011 02:14 PM, felix wrote: oh and there in the footnotes to django they say don't forget to run the delete expired sessions management command every once in a while. thanks guys. Oh Django... :) it won't run now because it's too big, I can delete them from psql though You might be better off deleting the inverse. You know, start a transaction, select all the sessions that *aren't* expired, truncate the table, insert them back into the session table, and commit. Note that for a session table that is updated very often, you can use postgres' HOT feature, which will create a lot fewer dead rows. Look it up in the docs. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
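A sketch of one way to give HOT room to work, assuming the django_session table name used elsewhere in this thread; note HOT also requires that the columns being updated are not indexed:

-- leave 10% of every new page free so updates can stay on the same page
ALTER TABLE django_session SET (fillfactor = 90);
-- existing pages are unaffected until the table is rewritten (e.g. by CLUSTER)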
Re: [PERFORM] Really really slow select count(*)
On Fri, Feb 04, 2011 at 03:46:35PM +0100, felix wrote: directly after REINDEX and ANALYZE:

Aggregate (cost=62348.70..62348.71 rows=1 width=0) (actual time=15830.000..15830.000 rows=1 loops=1)
  -> Seq Scan on fastadder_fastadderstatus (cost=0.00..61613.16 rows=294216 width=0) (actual time=30.000..15570.000 rows=302479 loops=1)
Total runtime: 15830.000 ms

Do run vacuum on the table. reindex doesn't matter for seq scans, and analyze, while it can help choose a different plan, will not help here anyway. Best regards, depesz -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
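That is, something along the lines of:

-- plain VACUUM: marks dead rows reusable without rewriting the table
VACUUM VERBOSE fastadder_fastadderstatus;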
Re: [PERFORM] Really really slow select count(*)
felix wrote:

explain analyze select count(*) from fastadder_fastadderstatus;

Aggregate (cost=62458.73..62458.74 rows=1 width=0) (actual time=77130.000..77130.000 rows=1 loops=1)
  -> Seq Scan on fastadder_fastadderstatus (cost=0.00..61701.18 rows=303018 width=0) (actual time=50.000..76930.000 rows=302479 loops=1)
Total runtime: *77250.000 ms*

PostgreSQL version? If you're running on 8.3 or earlier, I would be suspicious that your Free Space Map has been overrun. What you are seeing is that the table itself is much larger on disk than it's supposed to be. That can be caused by frequent UPDATEs: if you don't have vacuum cleanup working effectively, you'll get lots of dead sections left behind from UPDATEs in the middle. The best way to fix all this is to run CLUSTER on the table. That will introduce a bit of downtime while it holds a lock on the table (only a few minutes based on what you've shown here), but the copy you'll have afterwards won't be spread all over disk anymore. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
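In 8.3 that is a single statement; the index name below is the table's primary key as shown elsewhere in this thread:

-- rewrites the table in index order and rebuilds all of its indexes;
-- holds an exclusive lock for the duration
CLUSTER fastadder_fastadderstatus USING fastadder_fastadderstatus_pkey;
ANALYZE fastadder_fastadderstatus;  -- refresh planner statistics afterwards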
Re: [PERFORM] Really really slow select count(*)
On 02/04/2011 08:46 AM, felix wrote:

explain analyze select count(*) from fastadder_fastadderstatus;

Aggregate (cost=62458.73..62458.74 rows=1 width=0) (actual time=77130.000..77130.000 rows=1 loops=1)
  -> Seq Scan on fastadder_fastadderstatus (cost=0.00..61701.18 rows=303018 width=0) (actual time=50.000..76930.000 rows=302479 loops=1)
Total runtime: *77250.000 ms*

How big is this table when it's acting all bloated and ugly?

SELECT relpages*8/1024 FROM pg_class
 WHERE relname='fastadder_fastadderstatus';

That's the number of MB it's taking up that would immediately affect a count statement.

directly after REINDEX and ANALYZE:

Aggregate (cost=62348.70..62348.71 rows=1 width=0) (actual time=15830.000..15830.000 rows=1 loops=1)
  -> Seq Scan on fastadder_fastadderstatus (cost=0.00..61613.16 rows=294216 width=0) (actual time=30.000..15570.000 rows=302479 loops=1)
Total runtime: 15830.000 ms

That probably put it into cache, explaining the difference, but yeah... that is pretty darn slow. Is this the only thing running when you're doing your tests? What does your disk IO look like?

10k-50k updates per day mostly of this sort: set priority=1 where id=12345

Well... that's up to 16% turnover per day, but even then, regular vacuuming should keep it manageable.

I could rework the app to be more efficient and do updates using batches where id IN (1,2,3,4...)

No. Don't do that. You'd be better off loading everything into a temp table and doing this:

UPDATE fastadder_fastadderstatus s
   SET priority = 1
  FROM temp_statuses t
 WHERE t.id = s.id;

It's a better practice, but still doesn't really explain your performance issues.

fastadder_fastadderstatus_pkey PRIMARY KEY, btree (id)
fastadder_fastadderstatus_apt_id_key UNIQUE, btree (apt_id, service_id)
fastadder_fastadderstatus_agent_priority btree (agent_priority)
fastadder_fastadderstatus_apt_id btree (apt_id)
fastadder_fastadderstatus_built btree (built)
fastadder_fastadderstatus_last_checked btree (last_checked)
fastadder_fastadderstatus_last_validated btree (last_validated)
fastadder_fastadderstatus_position_in_queue btree (position_in_queue)
fastadder_fastadderstatus_priority btree (priority)
fastadder_fastadderstatus_running_status btree (running_status)
fastadder_fastadderstatus_service_id btree (service_id)

Whoh! Hold on, here. That looks like *way* too many indexes. Definitely will slow down your insert/update performance. The index on 'built', for example, is a boolean. If it's evenly distributed, that's 150k matches for true or false, rendering it useless, yet still requiring space and maintenance. I'm guessing the story is similar for quite a few of the others. It doesn't really explain your count speed, but it certainly isn't helping. Something seems fishy, here. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
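A fleshed-out sketch of that temp-table pattern; temp_statuses and the literal ids are illustrative, and in practice the application would COPY in the real list:

BEGIN;
CREATE TEMP TABLE temp_statuses (id integer PRIMARY KEY) ON COMMIT DROP;
INSERT INTO temp_statuses (id) VALUES (1), (2), (3), (4);
UPDATE fastadder_fastadderstatus s
   SET priority = 1
  FROM temp_statuses t
 WHERE t.id = s.id;
COMMIT;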
Re: [PERFORM] Really really slow select count(*)
On 02/04/2011 08:56 AM, Greg Smith wrote: PostgreSQL version? If you're running on 8.3 or earlier, I would be suspicious that your Free Space Map has been overrun. That's my first inclination. If he says autovacuum is running, there's no way it should be bloating the table that much. Felix, if you're running a version before 8.4, what is your max_fsm_pages setting? If it's too low, autovacuum won't save you, and your tables will continue to grow daily unless you vacuum full regularly, and I wouldn't recommend that to my worst enemy. ;) -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
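For reference, the pre-8.4 knobs in question look like this; the values are illustrative starting points rather than tuned recommendations, and both require a restart because they size shared memory at startup:

# postgresql.conf -- 8.3 and earlier only; the FSM moved to disk in 8.4
max_fsm_pages = 3000000       # page slots tracked across the whole cluster
max_fsm_relations = 2000      # tables and indexes tracked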
Re: [PERFORM] Really really slow select count(*)
On Fri, Feb 04, 2011 at 05:20:27PM +0100, felix wrote: reply was meant for the list -- Forwarded message -- From: felix crucialfe...@gmail.com Date: Fri, Feb 4, 2011 at 4:39 PM Subject: Re: [PERFORM] Really really slow select count(*) To: Greg Smith g...@2ndquadrant.com On Fri, Feb 4, 2011 at 3:56 PM, Greg Smith g...@2ndquadrant.com wrote: PostgreSQL version? If you're running on 8.3 or earlier, I would be suspicious that your Free Space Map has been overrun. 8.3 What you are seeing is that the table itself is much larger on disk than it's supposed to be. which part of the explain told you that ? shaun thomas:

SELECT relpages*8/1024 FROM pg_class
 WHERE relname='fastadder_fastadderstatus';

458MB. way too big. build_cache is text between 500-1k chars. As has been suggested, you really need to CLUSTER the table to remove dead rows. VACUUM will not do that; VACUUM FULL will, but it takes a full table lock, and then you would need to REINDEX to fix index bloat. CLUSTER will do all of this in one shot. You almost certainly have your free space map way too small, which is how you bloated in the first place. Cheers, Ken -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Really really slow select count(*)
On 02/04/2011 10:17 AM, felix wrote: How big is this table when it's acting all bloated and ugly? 458MB Wow! There's no way a table with 300k records should be that big unless it's just full of text. 70 seconds seems like a really long time to read half a gig, but that might be because it's fighting for IO with other processes. For perspective, we have several 1-2 million row tables smaller than that. Heck, I have an 11-million row table that's only 30% larger. are updates of the where id IN (1,2,3,4) generally not efficient ? how about for select queries ? Well, IN is notorious for being inefficient. It's been getting better, but even EXISTS is a better bet than using IN. We've got a lot of stuff using IN here, and we're slowly phasing it out. Every time I get rid of it, things get faster. I actually just added most of those yesterday in an attempt to improve performance. priority and agent_priority were missing indexes and that was a big mistake. Haha. Well, that can always be true. Ironically, one of the things you actually did by creating the indexes is create fast lookup values to circumvent your table bloat. It would help with anything except sequential scans, which you saw with your count query. ok, built True is in the minority. Ok, in that case, use a partial index. If a boolean value is only 1% of your table or something, why bother indexing the rest anyway? Drop the existing full index on built first (the name is taken), then:

CREATE INDEX fastadder_fastadderstatus_built
    ON fastadder_fastadderstatus (built)
 WHERE built;

But only if it really is the vast minority. Check this way:

SELECT built, count(1) FROM fastadder_fastadderstatus GROUP BY 1;

We used one of these to ignore a status that was over 90% of the table, where the other statuses combined were less than 10%. The index was 10x smaller and much faster than before. If you know both booleans are used together often, you can combine them into a single index, again using a partial where it only indexes if both values are true. Much smaller, much faster index if it's more selective than the other indexes. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
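One way to check what the partial index buys you, assuming the old and new indexes share the name prefix used above:

SELECT relname, pg_size_pretty(pg_relation_size(oid))
  FROM pg_class
 WHERE relname LIKE 'fastadder_fastadderstatus_built%';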
Re: [PERFORM] Really really slow select count(*)
On 02/04/2011 10:03 AM, felix wrote:

max_fsm_pages | 153600 | Sets the maximum number of disk pages for which free space is tracked.
max_fsm_relations | 1000 | Sets the maximum number of tables and indexes for which free space is tracked.

how do I determine the best size or if that's the problem ?

Well, the best way is to run:

vacuumdb -a -v -z 2> vacuum.log

And at the end of the log, it'll tell you how many pages it wants, and how many pages were available. From the sounds of your database, 150k is way too small. If a single table is getting 10-50k updates per day, there's a good chance a ton of other tables are getting similar traffic. With max_fsm_pages at that setting, any update beyond 150k effectively gets forgotten, and forgotten rows aren't reused by new inserts or updates. Your database has probably been slowly expanding for months without you realizing it. The tables that get the most turnover will be hit the hardest, as it sounds like what happened here. You can stop the bloating by setting the right max_fsm_pages setting, but you'll either have to go through and VACUUM FULL every table in your database, or dump/restore to regain all the lost space and performance (the latter would actually be faster). Before I even touch an older PostgreSQL DB, I set it to some value over 3-million just as a starting value to be on the safe side. A little used memory is a small price to pay for stopping gradual expansion. Your reindex was a good idea. Indexes do sometimes need that. But your base tables need work too. Unless you're on 8.4 or above, autovacuum isn't enough. Just to share an anecdote, I was with a company about five years ago and they also used the default max_fsm_pages setting. Their DB had expanded to 40GB and was filling their disk, only a couple weeks before exhausting it. I set the max_fsm_pages setting to 2-million, set up a bunch of scripts to vacuum-full the tables from smallest to largest (to make enough space for the larger tables, you see) and the database ended up at less than 20GB. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Really really slow select count(*)
On Fri, Feb 4, 2011 at 5:35 PM, Shaun Thomas stho...@peak6.com wrote: vacuumdb -a -v -z 2> vacuum.log And at the end of the log, it'll tell you how many pages it wants, and how many pages were available. this is the dev, not live. but this is after it gets done with that table:

CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing public.fastadder_fastadderstatus
INFO: fastadder_fastadderstatus: scanned 2492 of 2492 pages, containing 154378 live rows and 0 dead rows; 3 rows in sample, 154378 estimated total rows

and there's nothing at the end of the whole vacuum output about pages. actual command:

vacuumdb -U postgres -W -v -z djns4 2> vacuum.log

I tried it with all databases too. thanks
Re: [PERFORM] Really really slow select count(*)
On Fri, Feb 4, 2011 at 10:38 AM, felix crucialfe...@gmail.com wrote: On Fri, Feb 4, 2011 at 5:35 PM, Shaun Thomas stho...@peak6.com wrote: vacuumdb -a -v -z 2> vacuum.log And at the end of the log, it'll tell you how many pages it wants, and how many pages were available. this is the dev, not live. but this is after it gets done with that table:

CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing public.fastadder_fastadderstatus
INFO: fastadder_fastadderstatus: scanned 2492 of 2492 pages, containing 154378 live rows and 0 dead rows; 3 rows in sample, 154378 estimated total rows

and there's nothing at the end of the whole vacuum output about pages. actual command:

vacuumdb -U postgres -W -v -z djns4 2> vacuum.log

I tried it with all databases too I believe you have to run it on the whole db to get that output. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Really really slow select count(*)
vacuumdb -a -v -z -U postgres -W 2> vacuum.log

that's all, isn't it ? it did each db. 8.3, in case that matters. the very end:

There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing public.seo_partnerlinkcategory
INFO: seo_partnerlinkcategory: scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows

On Fri, Feb 4, 2011 at 6:40 PM, Scott Marlowe scott.marl...@gmail.com wrote: I tried it with all databases too I believe you have to run it on the whole db to get that output.
Re: [PERFORM] Really really slow select count(*)
On 02/04/2011 11:38 AM, felix wrote: CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: analyzing public.fastadder_fastadderstatus INFO: fastadder_fastadderstatus: scanned 2492 of 2492 pages, containing 154378 live rows and 0 dead rows; 3 rows in sample, 154378 estimated total rows and there's nothing at the end of the whole vacuum output about pages I'm not sure if it gives it to you if you pick a single DB, but if you use -a for all, you should get something at the very end like this:

INFO: free space map contains 1365918 pages in 1507 relations
DETAIL: A total of 1326656 page slots are in use (including overhead).
1326656 page slots are required to track all free space.
Current limits are: 300 page slots, 3500 relations, using 38784 kB.
VACUUM

That's on our dev system. Your dev table seems properly sized, but prod probably isn't. If you run an all-database vacuum after-hours, you'll see the stuff at the end. And if your 'page slots are required' is greater than your 'page slots are in use,' you've got a problem. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Really really slow select count(*)
vacuumdb -a -v -z -U postgres -W 2> vacuum.log
Password: Password: Password: Password: Password: Password: Password: Password: Password: Password: Password:
cruxnu:nsbuildout crucial$

do you think its possible that it just doesn't have anything to complain about ? or the password is affecting it ? In any case I'm not sure I want to run this even at night on production. what is the downside to estimating max_fsm_pages too high ? 3000000 should be safe, it's certainly not 150k. I have one very large table (10m) that is being analyzed before I warehouse it. that could've been the monster that ate the free map. I think today I've learned that even unused tables affect postgres performance. and do you agree that I should turn CLUSTER ON ? I have no problem to stop all tasks to this table at night and just reload it

On Fri, Feb 4, 2011 at 6:47 PM, Shaun Thomas stho...@peak6.com wrote: On 02/04/2011 11:44 AM, felix wrote: the very end:

There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing public.seo_partnerlinkcategory
INFO: seo_partnerlinkcategory: scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows

That looks to me like it didn't finish. Did you fork it off with '&' or run it and wait until it gave control back to you? It really should be telling you how many pages it wanted, and are in use. If not, something odd is going on. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
Re: [PERFORM] Really really slow select count(*)
You can run vacuum verbose on just the postgres database and get the global numbers at the end. gotta be a superuser as well.

# \c postgres postgres
postgres=# vacuum verbose;
...lots deleted...
DETAIL: A total of 7664 page slots are in use (including overhead).
7664 page slots are required to track all free space.
Current limits are: 1004800 page slots, 5000 relations, using 6426 kB.

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Really really slow select count(*)
On 02/04/2011 12:14 PM, felix wrote: do you think its possible that it just doesn't have anything to complain about ? or the password is affecting it ? Why is it asking for the password over and over again? It shouldn't be doing that. And also, are you running this as a user with superuser privileges? You might want to think about setting up a .pgpass file, or setting up local trust for the postgres user, so you can run maintenance without having to manually enter a password. In any case I'm not sure I want to run this even at night on production. You should be. Even with autovacuum turned on, all of our production systems get a nightly vacuum over the entire list of databases. It's non-destructive, and about the only thing that happens is disk IO. If your app has times where it's not very busy, say 3am, it's a good time. This is especially true since your free space map is behind. We actually turn off autovacuum because we have a very transactionally intense DB, and if autovacuum launches on a table in the middle of the day, our IO totally obliterates performance. We only run a nightly vacuum over all the databases when very few users or scripts are using anything. what is the downside to estimating max_fsm_pages too high ? Nothing, really. It uses more memory to track it, but on modern servers, it's not a concern. The only risk is that you don't know what the real setting should be, so you may not completely stop your bloating. and do you agree that I should turn CLUSTER ON ? Cluster isn't really something you turn on, but something you do. It's like vacuum full, in that it basically rebuilds the table and all indexes from scratch. The major issue you'll run into is that it reorders the table by the index you chose, so you'd best select the primary key unless you have reasons to use something else. And you have to do it table by table, which will really suck since we already know your whole db has bloated, not just one or two tables. You're going to be doing some scripting, buddy. :) Well, unless you just do a dump/restore and start over with sane postgresql.conf settings. I have no problem to stop all tasks to this table at night and just reload it That will work for this table. Just keep in mind all your tables have been suffering since you installed this database. Tables with the highest turnover were hit hardest, but they all have non-ideal sizes compared to what they would be if your maintenance was working. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
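A minimal sketch of the .pgpass idea; the password is a placeholder, and libpq ignores the file unless its permissions are tightened:

# ~/.pgpass -- format: hostname:port:database:username:password
*:*:*:postgres:changeme

chmod 0600 ~/.pgpass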
Re: [PERFORM] Really really slow select count(*)
felix wrote: and do you agree that I should turn CLUSTER ON ? I have no problem to stop all tasks to this table at night and just reload it You don't turn it on; it's a one-time operation that does a cleanup. It is by far the easiest way to clean up the mess you have right now. Moving forward, if you have max_fsm_pages set to an appropriate number, you shouldn't end up back in this position again. But VACUUM alone won't get you out of there, and VACUUM FULL is always a worse way to clean this up than CLUSTER. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Really really slow select count(*)
On Fri, Feb 4, 2011 at 11:38 AM, Greg Smith g...@2ndquadrant.com wrote: You don't turn it on; it's a one-time operation that does a cleanup. It is by far the easiest way to clean up the mess you have right now. Moving forward, if you have max_fsm_pages set to an appropriate number, you shouldn't end up back in this position again. But VACUUM alone won't get you out of there, and VACUUM FULL is always a worse way to clean this up than CLUSTER. note that for large, randomly ordered tables, cluster can be pretty slow, and you might want to do the old:

begin;
select * into temporaryholdingtable from oldtable order by somefield;
truncate oldtable;
insert into oldtable select * from temporaryholdingtable;
commit;

for fastest performance. I've had CLUSTER take hours to do what the above does in 1/4th the time. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Really really slow select count(*)
On 02/04/2011 01:01 PM, Scott Marlowe wrote:

begin;
select * into temporaryholdingtable from oldtable order by somefield;
truncate oldtable;
insert into oldtable select * from temporaryholdingtable;
commit;

That's usually how I do it, except for larger tables, I also throw in a DROP INDEX for all the indexes on the table before the insert, and CREATE INDEX statements afterwards (sketched below). Which actually brings up a question I've been wondering to myself that I may submit to [HACKERS]: Can we add a parallel option to the reindexdb command? We added one to pg_restore, so we already know it works. I have a bunch of scripts that get all the indexes in the database and order them by size (so they're distributed evenly), round-robin them into separate REINDEX sql files, and launch them all in parallel depending on how many threads you want, but that's so hacky I feel dirty every time I use it. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
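Put together, the whole rewrite with index handling looks something like this; every name below is a placeholder:

BEGIN;
SELECT * INTO holdingtable FROM oldtable ORDER BY somefield;
DROP INDEX oldtable_somefield_idx;    -- repeat for each index on oldtable
TRUNCATE oldtable;
INSERT INTO oldtable SELECT * FROM holdingtable;
COMMIT;
CREATE INDEX oldtable_somefield_idx ON oldtable (somefield);
DROP TABLE holdingtable;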
Re: [PERFORM] Really really slow select count(*)
On Fri, Feb 4, 2011 at 7:34 PM, Shaun Thomas stho...@peak6.com wrote: Why is it asking for the password over and over again? It shouldn't be doing that. because I asked it to: -W. on the production server I need to enter the password, and I'm testing on dev first. I just sudo tried it but still no report and do you agree that I should turn CLUSTER ON ? Cluster isn't really something you turn on, but something you do.

djns4=# cluster fastadder_fastadderstatus;
ERROR: there is no previously clustered index for table fastadder_fastadderstatus

http://www.postgresonline.com/journal/archives/10-How-does-CLUSTER-ON-improve-index-performance.html

djns4=# alter table fastadder_fastadderstatus CLUSTER ON fastadder_fastadderstatus_pkey;
ALTER TABLE
djns4=# CLUSTER fastadder_fastadderstatus;
CLUSTER

ok, that's why I figured I was turning something on. the table has been altered. it will be pk ordered, new entries always at the end and no deletes. but this means I have to manually run cluster from time to time, right ? not that there will be much or any reordering. or it should be fine going forward with vacuum and enlarging the free space memory map. It's like vacuum full, in that it basically rebuilds the table and all indexes from scratch. The major issue you'll run into is that it reorders the table by the index you chose, so you'd best select the primary key unless you have reasons to use something else. And you have to do it table by table, which will really suck since we already know your whole db has bloated, not just one or two tables. do we know that ? many of the tables are fairly static. only this one is seriously borked, and yet other related tables seem to be fine. You're going to be doing some scripting, buddy. :) Well, unless you just do a dump/restore and start over with sane postgresql.conf settings. well who knew the defaults were unsane ? :) scripting this is trivial, I already have the script. I have made the mistake of doing VACUUM FULL in the past. in fact on this table, and it had to be killed because it took down my entire website ! that may well be the major borking event. a credit to postgres that the table still functions if that's the case. scott marlowe:

begin;
select * into temporaryholdingtable from oldtable order by somefield;
truncate oldtable;
insert into oldtable select * from temporaryholdingtable;
commit;

that sounds like a good approach. gentlemen, 300,000 + thanks for your generous time ! (a small number, I know) -felix
Re: [PERFORM] Really really slow select count(*)
On Fri, Feb 4, 2011 at 12:26 PM, felix crucialfe...@gmail.com wrote: I just sudo tried it but still no report It's not about who you are in Unix / Linux, it's about who you are in Postgresql. \du will show you who is a superuser. psql -U username will let you connect as that user. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Really really slow select count(*)
On 02/04/2011 01:26 PM, felix wrote: because I asked it to: -W on the production server I need to enter password and I'm testing on dev first. Right. I'm just surprised it threw up the prompt so many times. I just sudo tried it but still no report Nono... you have to run the vacuum command with -U for a superuser in the database. Like the postgres user. but this means I have to manually run cluster from time to time, right ? not that there will be much or any reordering. or it should be fine going forward with vacuum and enlarging the free space memory map. It should be fine going forward. You only need to re-cluster if you want to force the table to remain in the order you chose, since it doesn't maintain the order for updates and new inserts. Since you're only doing it as a cleanup, that's not a concern for you. do we know that ? many of the tables are fairly static. only this one is seriously borked, and yet other related tables seem to be fine. Probably not in your case. I just mean that any non-static table is going to have this problem. If you know what those are, great. I don't usually have that luxury, so I err on the side of assuming the whole DB is borked. :) Also, here's a query you may find useful in the future. It reports the top 20 tables by size, but also reports the row counts and what not. It's a good way to find possibly bloated tables, or tables you could archive:

SELECT n.nspname AS schema_name, c.relname AS table_name,
       c.reltuples AS row_count,
       c.relpages*8/1024 AS mb_used,
       pg_total_relation_size(c.oid)/1024/1024 AS total_mb_used
  FROM pg_class c
  JOIN pg_namespace n ON (n.oid = c.relnamespace)
 WHERE c.relkind = 'r'
 ORDER BY total_mb_used DESC
 LIMIT 20;

The total_mb_used column is the table + all of the indexes and toast table space. The mb_used is just for the table itself. This will also help you see index bloat, or if a table has too much toasted data. well who knew the defaults were unsane ? :) Not really unsane, but for any large database, they're not ideal. This also goes for the default_statistics_target setting. If you haven't already, you may want to bump this up to 100 from the default of 10. Not enough stats can make the planner ignore indexes and do other bad things, and it sounds like your DB is big enough to benefit from that. Later versions have made 100 the default, so you'd just be catching up. :) -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
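A sketch of the statistics bump: either set default_statistics_target = 100 in postgresql.conf and reload, or raise it per column (the column here is just an example) and re-analyze:

ALTER TABLE fastadder_fastadderstatus
  ALTER COLUMN priority SET STATISTICS 100;
ANALYZE fastadder_fastadderstatus;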
Re: [PERFORM] Really really slow select count(*)
ah right, duh. yes, I did it as -U postgres, verified as a superuser just now. did it from inside psql as postgres:

\c djns4
vacuum verbose analyze;

still no advice on the pages

On Fri, Feb 4, 2011 at 8:34 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Fri, Feb 4, 2011 at 12:26 PM, felix crucialfe...@gmail.com wrote: I just sudo tried it but still no report It's not about who you are in Unix / Linux, it's about who you are in Postgresql. \du will show you who is a superuser. psql -U username will let you connect as that user.
Re: [PERFORM] Really really slow select count(*)
On 02/04/2011 01:59 PM, felix wrote: still no advice on the pages I think it just hates you. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Really really slow select count(*)
it probably has good reason to hate me.

ns=> SELECT n.nspname AS schema_name, c.relname AS table_name,
ns->        c.reltuples AS row_count,
ns->        c.relpages*8/1024 AS mb_used,
ns->        pg_total_relation_size(c.oid)/1024/1024 AS total_mb_used
ns->   FROM pg_class c
ns->   JOIN pg_namespace n ON (n.oid=c.relnamespace)
ns->  WHERE c.relkind = 'r'
ns->  ORDER BY total_mb_used DESC
ns->  LIMIT 20;

 schema_name |            table_name            |  row_count  | mb_used | total_mb_used
-------------+----------------------------------+-------------+---------+---------------
 public      | django_session                   | 1.47843e+07 |    4122 |         18832
 public      | traffic_tracking2010             | 9.81985e+06 |     811 |          1653
 public      | mailer_mailingmessagelog         | 7.20214e+06 |     441 |          1082
 public      | auth_user                        | 3.20077e+06 |     572 |           791
 public      | fastadder_fastadderstatus        |      302479 |     458 |           693
 public      | registration_registrationprofile | 3.01345e+06 |     248 |           404
 public      | reporting_dp_6c93734c            |  1.1741e+06 |      82 |           224
 public      | peoplez_contact                  |       79759 |      18 |           221
 public      | traffic_tracking201101           | 1.49972e+06 |     163 |           204
 public      | reporting_dp_a3439e2a            | 1.32739e+06 |      82 |           187
 public      | nsproperties_apthistory          |       44906 |      69 |           126
 public      | nsproperties_apt                 |       30780 |      71 |           125
 public      | clients_showingrequest           |       85175 |      77 |           103
 public      | reporting_dp_4ffe04ad            |      330252 |      26 |            63
 public      | fastadder_fastadderstatuslog     |      377402 |      28 |            60
 public      | nsmailings_officememotoagent     |      268345 |      15 |            52
 public      | celery_taskmeta                  |        5041 |      12 |            32
 public      | mailer_messagelog                |      168298 |      24 |            32
 public      | datapoints_job                   |        9167 |      12 |            23
 public      | fastadder_fastadderstatus_errors |      146314 |       7 |            21

oh and there in the footnotes to django they say don't forget to run the delete expired sessions management command every once in a while. thanks guys. it won't run now because it's too big, I can delete them from psql though. well just think how sprightly my website will run tomorrow once I fix these.

On Fri, Feb 4, 2011 at 9:00 PM, Shaun Thomas stho...@peak6.com wrote: On 02/04/2011 01:59 PM, felix wrote: still no advice on the pages I think it just hates you. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
Re: [PERFORM] Really really slow select count(*)
On 02/04/2011 02:14 PM, felix wrote: oh and there in the footnotes to django they say don't forget to run the delete expired sessions management command every once in a while. thanks guys. Oh Django... :) it won't run now because it's too big, I can delete them from psql though You might be better off deleting the inverse. You know, start a transaction, select all the sessions that *aren't* expired, truncate the table, insert them back into the session table, and commit.

BEGIN;
CREATE TEMP TABLE foo_1 AS
SELECT * FROM django_session WHERE date_expired > CURRENT_DATE;
TRUNCATE django_session;
INSERT INTO django_session SELECT * FROM foo_1;
COMMIT;

Except I don't actually know what the expired column is. You can figure that out pretty quick, I assume. That'll also have the benefit of cleaning up the indexes and the table all at once. If you just do a delete, the table won't change at all, except that it'll have fewer active records. well just think how sprightly my website will run tomorrow once I fix these. Maybe. :) -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance