Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On 11/12/2012 01:31 PM, Jeff Janes wrote:
> On Mon, Nov 12, 2012 at 10:38 AM, Lists wrote:
>> On 11/10/2012 02:21 PM, Jeff Janes wrote:
>>> On Fri, Nov 9, 2012 at 4:28 PM, Lists wrote:
>>>> 2) It was sheer chance that I discovered the need to reindex prior to
>>>> vacuum in order to get the disk space back.
>>>
>>> As of 9.0, a "vacuum full" inherently does a reindex, so doing an
>>> explicit one is neither necessary nor beneficial. I don't know if your
>>> discovery is based on a non-full vacuum, or on an older server.
>>
>> I can only state that merely doing a "vacuum full" or "vacuum full
>> $tables" sequentially did not free the space, whereas the sequential
>> "reindex $table", each followed immediately by a "vacuum full $table",
>> did.
>
> With what version?

[root@alpha ~]# rpm -qi postgresql91-server
Name        : postgresql91-server
Relocations : (not relocatable)
Version     : 9.1.5
Vendor      : (none)
Release     : 3PGDG.rhel6
Build Date  : Sun 02 Sep 2012 12:13:18 PM UTC
Install Date: Wed 12 Sep 2012 03:04:41 AM UTC
Build Host  : koji-sl6-x86-64-pg91
Group       : Applications/Databases
Source RPM  : postgresql91-9.1.5-3PGDG.rhel6.src.rpm
Size        : 15191132
License     : PostgreSQL
Signature   : DSA/SHA1, Sun 02 Sep 2012 12:13:24 PM UTC, Key ID 1f16d2e1442df0f8
URL         : http://www.postgresql.org/

>> If you'd like I can easily recreate the scenario by simply not
>> "cleaning up" one of the DB servers until it bloats up, and make
>> available (limited distribution) a binary copy of the database (e.g.
>> rsync the /var/lib/pgsql/ filesystem late at night) in order to help
>> identify why it didn't work as expected.
>
> Do you think you can make an easily script-able way to re-create the
> resistant bloat? That would be better than trying to disseminate binary
> files, I think.

It would only be better if it actually created the situation that caused
the space to not be freed. But until you know the actual cause of a
problem, I've found that it's often not productive to create simulations
that may or may not actually be related to the problem.
> What I did was just create and drop temp tables in a tight loop, with
> autovacuum off, and then once pg_attribute got good and bloated, did a
> vacuum full as the database owner or superuser.

Based on my understanding, if your loop included an intermittent schema
change from within a transaction, it might better approximate my actual
scenario. Merely creating temp tables and then dropping them would create
lots of activity "at the end" of the table, which would be freed
correctly. This still does not explain why "reindex $table" works when a
reindex is supposedly implicit in the vacuum.

> If all of your long-lived objects were created before pg_attribute got
> bloated and so the bloat was due only to short-lived objects, then
> non-full vacuum (if run often enough) should eventually be able to
> return that space as the short-lived objects near the end start to go
> away. However, if even a single long-lived object finds itself at the
> end of the table, then only a vacuum full will ever be able to reclaim
> that space.

Since the time period involved (weeks/months) would have included both a
large number of created/destroyed temp tables and occasionally altered
persistent objects, it would appear that the full option is a very good
idea, at least periodically.

> If you can prevent the extreme bloat from occurring in the first place,
> then the "end" of the table would not be so far away from its desired
> size that it needs to get reset by a vacuum full. If you find yourself
> in need of a vacuum full, then you should do one. But you should ask
> yourself what went wrong that you got into that situation in the first
> place.

I agree; this is why I asked about enabling autovacuum in a related
thread.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
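Jeff's reproduction recipe, combined with the intermittent schema change
suggested above, can be sketched roughly as follows. This is an
illustrative, untested sketch: "persistent_tbl" is an invented example
table, and the loop counts are arbitrary.

```sql
-- Sketch of the bloat reproduction discussed above. Run with
-- autovacuum off; every CREATE/DROP of a temp table inserts and then
-- dead-tuples rows in the pg_attribute catalog.
DO $$
BEGIN
  FOR i IN 1..100000 LOOP
    EXECUTE 'CREATE TEMP TABLE bloat_me (a int, b text, c timestamptz)';
    EXECUTE 'DROP TABLE bloat_me';
    -- Occasionally alter a persistent table, as suggested above, so a
    -- long-lived row may land near the "end" of pg_attribute.
    IF i % 10000 = 0 THEN
      EXECUTE 'ALTER TABLE persistent_tbl ADD COLUMN col_' || i || ' int';
    END IF;
  END LOOP;
END $$;

-- Then, connected as superuser, check the catalog size and try to
-- reclaim it:
SELECT pg_size_pretty(pg_total_relation_size('pg_attribute'));
VACUUM FULL pg_attribute;
```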
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On Mon, Nov 12, 2012 at 10:38 AM, Lists wrote:
> On 11/10/2012 02:21 PM, Jeff Janes wrote:
>> On Fri, Nov 9, 2012 at 4:28 PM, Lists wrote:
>>>
>>> 2) It was sheer chance that I discovered the need to reindex prior to
>>> vacuum in order to get the disk space back.
>>
>> As of 9.0, a "vacuum full" inherently does a reindex, so doing an
>> explicit one is neither necessary nor beneficial.
>>
>> I don't know if your discovery is based on a non-full vacuum, or on an
>> older server.
>
> I can only state that merely doing a "vacuum full" or "vacuum full
> $tables" sequentially did not free the space, whereas the sequential
> "reindex $table", each followed immediately by a "vacuum full $table",
> did.

With what version?

> If you'd like I can easily recreate the scenario by simply not
> "cleaning up" one of the DB servers until it bloats up and make
> available (limited distribution) a binary copy of the database (e.g.
> rsync the /var/lib/pgsql/ filesystem late at night) in order to help
> identify why it didn't work as expected.

Do you think you can make an easily script-able way to re-create the
resistant bloat? That would be better than trying to disseminate binary
files, I think.

What I did was just create and drop temp tables in a tight loop, with
autovacuum off, and then once pg_attribute got good and bloated, did a
vacuum full as the database owner or superuser.

>> If all of your long-lived objects were created before pg_attribute got
>> bloated and so the bloat was due only to short-lived objects, then
>> non-full vacuum (if run often enough) should eventually be able to
>> return that space as the short-lived objects near the end start to go
>> away. However, if even a single long-lived object finds itself at the
>> end of the table, then only a vacuum full will ever be able to reclaim
>> that space.
>
> Since the time period involved (weeks/months) would have included both
> a large number of created/destroyed temp tables and occasionally
> altered persistent objects, it would appear that the full option is a
> very good idea, at least periodically.

If you can prevent the extreme bloat from occurring in the first place,
then the "end" of the table would not be so far away from its desired
size that it needs to get reset by a vacuum full. If you find yourself
in need of a vacuum full, then you should do one. But you should ask
yourself what went wrong that you got into that situation in the first
place.

Cheers,

Jeff
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On 11/10/2012 02:21 PM, Jeff Janes wrote:
> On Fri, Nov 9, 2012 at 4:28 PM, Lists wrote:
> ...
>> 3) For each of the tables from #2, run the commands
>> REINDEX TABLE $table;
>> VACUUM FULL ANALYZE $table;
>>
>> The end result is a squeaky-clean database server with expected disk
>> usage.
>>
>> NOTES: ...
>>
>> 2) It was sheer chance that I discovered the need to reindex prior to
>> vacuum in order to get the disk space back.
>
> As of 9.0, a "vacuum full" inherently does a reindex, so doing an
> explicit one is neither necessary nor beneficial. I don't know if your
> discovery is based on a non-full vacuum, or on an older server.

I can only state that merely doing a "vacuum full" or "vacuum full
$tables" sequentially did not free the space, whereas the sequential
"reindex $table", each followed immediately by a "vacuum full $table",
did. If you'd like I can easily recreate the scenario by simply not
"cleaning up" one of the DB servers until it bloats up, and make
available (limited distribution) a binary copy of the database (e.g.
rsync the /var/lib/pgsql/ filesystem late at night) in order to help
identify why it didn't work as expected.

>> 5) I don't yet know if the "full" option for the vacuum is necessary
>> to free up all space. I will experiment with this and post results if
>> useful.
>
> The answer to this is mostly non-deterministic. A non-full vacuum can
> only free space from the "end" of the table. If all of your long-lived
> objects were created before pg_attribute got bloated and so the bloat
> was due only to short-lived objects, then non-full vacuum (if run often
> enough) should eventually be able to return that space as the
> short-lived objects near the end start to go away. However, if even a
> single long-lived object finds itself at the end of the table, then
> only a vacuum full will ever be able to reclaim that space.

Since the time period involved (weeks/months) would have included both a
large number of created/destroyed temp tables and occasionally altered
persistent objects, it would appear that the full option is a very good
idea, at least periodically.

-Ben
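For what it's worth, the catalogs' per-table disk usage can be checked
directly; a query along these lines (run in each suspect database) shows
whether pg_attribute really is the table carrying the bloat:

```sql
-- Ten largest system catalog tables in the current database,
-- including their indexes and TOAST data.
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pg_catalog'
  AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;
```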
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On 11/09/2012 05:26 PM, Steve Crawford wrote:
> Bloat in pg_attribute would correlate with A) (or any constant
> creation/destruction of tables). You can vacuum and/or reindex the
> system tables if you are connected as the superuser, but you are better
> off preventing bloat by appropriate adjustment of your configuration
> settings. However, note that if you do frequent bulk
> creation/destruction of tables you could end up bloating the attribute
> table between vacuum runs and may need to periodically manually shrink
> it.

Steve,

Our system divides customers into distinct databases, but customers are
often clustered (think: different locations of 7/11), and so we have to
aggregate data from different databases. We do this with dblink to get
the data and temp tables to collate it, which appears to be a cause of
the bloat we're seeing.

-Ben
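The aggregation pattern Ben describes might look something like the
following sketch. The connection strings, table, and column names are
invented for illustration; only the shape of the pattern (dblink plus a
collating temp table) comes from the thread:

```sql
-- Hypothetical sketch: pull per-customer aggregates from two customer
-- databases via dblink and collate them in a temp table. Every run
-- creates and drops a temp table, which inserts (and later dead-tuples)
-- rows in pg_attribute -- the suspected source of the bloat.
CREATE TEMP TABLE combined_sales (store text, total numeric);

INSERT INTO combined_sales
SELECT 'store_a', t.total
FROM dblink('dbname=customer_a', 'SELECT sum(amount) FROM sales')
     AS t(total numeric);

INSERT INTO combined_sales
SELECT 'store_b', t.total
FROM dblink('dbname=customer_b', 'SELECT sum(amount) FROM sales')
     AS t(total numeric);

SELECT * FROM combined_sales;

DROP TABLE combined_sales;
```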
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On 11/10/2012 02:23 PM, Scott Marlowe wrote:
>> When in doubt there are the docs :)
>>
>> http://www.postgresql.org/docs/9.1/interactive/routine-vacuuming.html#AUTOVACUUM
>>
>> "The "autovacuum daemon" actually consists of multiple processes.
>> There is a persistent daemon process, called the autovacuum launcher,
>> which is in charge of starting autovacuum worker processes for all
>> databases. The launcher will distribute the work across time,
>> attempting to start one worker within each database every
>> autovacuum_naptime seconds. (Therefore, if the installation has N
>> databases, a new worker will be launched every autovacuum_naptime/N
>> seconds.)"
>
> And apparently it wasn't always this way:
>
> http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html#AUTOVACUUM
>
> "The autovacuum daemon, when enabled, runs every autovacuum_naptime
> seconds. On each run, it selects one database to process and checks
> each table within that database. VACUUM or ANALYZE commands are issued
> as needed."

Seems to have changed in 8.3:

http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html

"Beginning in PostgreSQL 8.3, autovacuum has a multiprocess
architecture: There is a daemon process, called the autovacuum launcher,
which is in charge of starting autovacuum worker processes for all
databases. The launcher will distribute the work across time, but
attempt to start one worker on each database every autovacuum_naptime
seconds. One worker will be launched for each database, with a maximum
of autovacuum_max_workers processes running at the same time..."

--
Adrian Klaver
adrian.kla...@gmail.com
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On Sat, Nov 10, 2012 at 3:20 PM, Adrian Klaver wrote:
> On 11/10/2012 02:08 PM, Scott Marlowe wrote:
>> On Sat, Nov 10, 2012 at 2:12 PM, Jeff Janes wrote:
>>> On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe wrote:
>>>> As well, since the default nap time is 1 minute, it will take at
>>>> least 50 minutes to vacuum each db as nap time is how long autovac
>>>> waits between databases.
>>>
>>> That isn't how it works. The naptime is per database, not per
>>> cluster. If the naptime is 1 minute and there are 50 "active"
>>> databases, then it will launch a new worker every 1.2 seconds
>>> (assuming the old ones finish fast enough that doing so would not
>>> exceed autovacuum_max_workers).
>>
>> Hmmm. That was not my understanding from previous discussions on nap
>> time.
>
> When in doubt there are the docs :)
>
> http://www.postgresql.org/docs/9.1/interactive/routine-vacuuming.html#AUTOVACUUM
>
> "The "autovacuum daemon" actually consists of multiple processes. There
> is a persistent daemon process, called the autovacuum launcher, which
> is in charge of starting autovacuum worker processes for all databases.
> The launcher will distribute the work across time, attempting to start
> one worker within each database every autovacuum_naptime seconds.
> (Therefore, if the installation has N databases, a new worker will be
> launched every autovacuum_naptime/N seconds.)"

And apparently it wasn't always this way:

http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html#AUTOVACUUM

"The autovacuum daemon, when enabled, runs every autovacuum_naptime
seconds. On each run, it selects one database to process and checks each
table within that database. VACUUM or ANALYZE commands are issued as
needed."
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On Fri, Nov 9, 2012 at 4:28 PM, Lists wrote:
...
> 3) For each of the tables from #2, run the commands
> REINDEX TABLE $table;
> VACUUM FULL ANALYZE $table;
>
> The end result is a squeaky-clean database server with expected disk
> usage.
>
> NOTES: ...
>
> 2) It was sheer chance that I discovered the need to reindex prior to
> vacuum in order to get the disk space back.

As of 9.0, a "vacuum full" inherently does a reindex, so doing an
explicit one is neither necessary nor beneficial. I don't know if your
discovery is based on a non-full vacuum, or on an older server.

> 5) I don't yet know if the "full" option for the vacuum is necessary to
> free up all space. I will experiment with this and post results if
> useful.

The answer to this is mostly non-deterministic. A non-full vacuum can
only free space from the "end" of the table. If all of your long-lived
objects were created before pg_attribute got bloated and so the bloat
was due only to short-lived objects, then non-full vacuum (if run often
enough) should eventually be able to return that space as the
short-lived objects near the end start to go away. However, if even a
single long-lived object finds itself at the end of the table, then only
a vacuum full will ever be able to reclaim that space.

Cheers,

Jeff
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On 11/10/2012 02:08 PM, Scott Marlowe wrote:
> On Sat, Nov 10, 2012 at 2:12 PM, Jeff Janes wrote:
>> On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe wrote:
>>> As well, since the default nap time is 1 minute, it will take at
>>> least 50 minutes to vacuum each db as nap time is how long autovac
>>> waits between databases.
>>
>> That isn't how it works. The naptime is per database, not per cluster.
>> If the naptime is 1 minute and there are 50 "active" databases, then
>> it will launch a new worker every 1.2 seconds (assuming the old ones
>> finish fast enough that doing so would not exceed
>> autovacuum_max_workers).
>
> Hmmm. That was not my understanding from previous discussions on nap
> time.

When in doubt there are the docs :)

http://www.postgresql.org/docs/9.1/interactive/routine-vacuuming.html#AUTOVACUUM

"The "autovacuum daemon" actually consists of multiple processes. There
is a persistent daemon process, called the autovacuum launcher, which is
in charge of starting autovacuum worker processes for all databases. The
launcher will distribute the work across time, attempting to start one
worker within each database every autovacuum_naptime seconds.
(Therefore, if the installation has N databases, a new worker will be
launched every autovacuum_naptime/N seconds.)"

--
Adrian Klaver
adrian.kla...@gmail.com
Re: PG defaults and performance (was Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum))
On Sun, Nov 11, 2012 at 8:05 AM, Jeff Janes wrote:
> Totally not. With default settings and default pgbench, the easiest
> way for host B to beat host A is by lying about the durability of
> fsync.

True. Without the ability to brutally cut the power to a cloud instance
or other remote (and in some cases possibly virtualized) server, it's
practically impossible to test that. We're basically relying on
replication and hoping that three instances (master and two slaves)
don't go down simultaneously, which is hardly a guarantee.

ChrisA
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On Sat, Nov 10, 2012 at 2:12 PM, Jeff Janes wrote:
> On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe wrote:
>> As well, since the default nap time is 1 minute, it will take at least
>> 50 minutes to vacuum each db as nap time is how long autovac waits
>> between databases.
>
> That isn't how it works. The naptime is per database, not per cluster.
> If the naptime is 1 minute and there are 50 "active" databases, then it
> will launch a new worker every 1.2 seconds (assuming the old ones
> finish fast enough that doing so would not exceed
> autovacuum_max_workers).

Hmmm. That was not my understanding from previous discussions on nap
time.
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe wrote:
> As well, since the default nap time is 1 minute, it will take at least
> 50 minutes to vacuum each db as nap time is how long autovac waits
> between databases.

That isn't how it works. The naptime is per database, not per cluster.
If the naptime is 1 minute and there are 50 "active" databases, then it
will launch a new worker every 1.2 seconds (assuming the old ones finish
fast enough that doing so would not exceed autovacuum_max_workers).

Cheers,

Jeff
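The launcher arithmetic Jeff describes can be checked directly; with the
default one-minute naptime and 50 databases:

```sql
-- autovacuum_naptime / N databases = interval between worker launches,
-- per the 9.1 docs. With the default 1-minute naptime and 50 databases
-- this comes to roughly 1.2 seconds between launches:
SELECT interval '1 minute' / 50 AS launch_interval;
```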
Re: PG defaults and performance (was Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum))
On Fri, Nov 9, 2012 at 6:17 PM, Chris Angelico wrote:
> On Sat, Nov 10, 2012 at 12:26 PM, Steve Crawford wrote:
>> Don't do that. Defaults are good for ensuring that PostgreSQL will
>> start on the widest reasonable variety of systems. They are *terrible*
>> for performance and are certainly wrong for the system you describe.
>
> Tuning a PostgreSQL database is a major science, but is there a
> reasonably easy way to get a stable baseline for comparison? We've been
> exploring different hosting options recently, and one thing we want to
> know is how well Postgres will perform. To that end, we've been using
> pgbench on a default-configuration Postgres, on the expectation that
> that'll at least be consistent (that is, if a Cloud Host A instance
> does X tps and Cloud Host B does 2*X, then we can expect host B to
> deliver roughly double performance in production). How valid is this
> assumption? Broadly, or totally not?

Totally not. With default settings and default pgbench, the easiest way
for host B to beat host A is by lying about the durability of fsync.

Cheers,

Jeff
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
Scott --

<...>

> To tune autovacuum with 50 databases, start by dropping nap time to
> something much lower, like 10s. Then if you need to, drop cost delay
> until you get to 0. If you get to 0 and it's still not hitting your IO
> too hard, but not keeping up, then increase cost limit. If you get to
> something in the 5000 to 10000 range, and it's still not keeping up,
> then start bumping the thread count.

Thanks for outlining a strategy on this -- useful advice.

Greg Williamson
PG defaults and performance (was Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum))
On Sat, Nov 10, 2012 at 12:26 PM, Steve Crawford wrote:
> Don't do that. Defaults are good for ensuring that PostgreSQL will
> start on the widest reasonable variety of systems. They are *terrible*
> for performance and are certainly wrong for the system you describe.

Tuning a PostgreSQL database is a major science, but is there a
reasonably easy way to get a stable baseline for comparison? We've been
exploring different hosting options recently, and one thing we want to
know is how well Postgres will perform. To that end, we've been using
pgbench on a default-configuration Postgres, on the expectation that
that'll at least be consistent (that is, if a Cloud Host A instance does
X tps and Cloud Host B does 2*X, then we can expect host B to deliver
roughly double performance in production). How valid is this assumption?
Broadly, or totally not?

ChrisA
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On Fri, Nov 9, 2012 at 5:28 PM, Lists wrote:
> As I've spent a considerable amount of time trying to sort this out,
> I'm posting it for the benefit of other users.

SNIP

> D) concurrent use of pg_dump;

Not usually a problem, unless it's overloading your IO subsystem.

> C) use of transactions, especially prepared transactions and multiple
> savepoints;
> E) use of numerous databases on a single server, average about 50;

These two can be killers. Long-running transactions can cause autovacuum
processes to stall out or be autocancelled. As well, since the default
nap time is 1 minute, it will take at least 50 minutes to vacuum each
db, as nap time is how long autovac waits between databases.

Reducing autovacuum nap time to 5 or 10 seconds would be a good move
here, also possibly making it more aggressive by increasing max worker
threads, decreasing cost delay (possibly to zero or close to it) and/or
increasing cost limit. After making such a change, watch iostat when
vacuum is running to see how hard it's hitting your IO subsystem. I'm
guessing that with SSDs it isn't gonna be a big problem.

As Greg Smith has pointed out in the past, usually the answer to an
autovacuum problem is making it more, not less, aggressive. Unless
you're flooding your IO this is almost always the right answer. Keep in
mind that autovacuum by default is set up to be VERY unaggressive,
because it may be running on a netbook for all it knows.

To tune autovacuum with 50 databases, start by dropping nap time to
something much lower, like 10s. Then if you need to, drop cost delay
until you get to 0. If you get to 0 and it's still not hitting your IO
too hard, but not keeping up, then increase cost limit. If you get to
something in the 5000 to 10000 range, and it's still not keeping up,
then start bumping the thread count.
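Scott's tuning sequence translates into postgresql.conf settings roughly
like this. The values are illustrative starting points taken from his
description, not recommendations for any particular workload:

```
# Sketch of the tuning sequence described above (postgresql.conf).
autovacuum = on
autovacuum_naptime = 10s              # step 1: drop from the 1min default
autovacuum_vacuum_cost_delay = 0      # step 2: reduce toward zero
autovacuum_vacuum_cost_limit = 5000   # step 3: raise if still lagging
autovacuum_max_workers = 6            # step 4: bump the worker count last
```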
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On 11/09/2012 04:28 PM, Lists wrote:
> As I've spent a considerable amount of time trying to sort this out,
> I'm posting it for the benefit of other users.

Having missed the earlier conversation, a couple of comments:

> I've experienced persistent, ongoing issues with autovacuum in a mixed
> read/write environment with midrange hardware (16 core Xeon, 128 GB
> RAM, 200 GB SATA3 6 Gb SSDs for disk I/O on RHEL6) using the defaults
> in the 9.1 RPMs provided by Postgres (yum.postgresql.org).

Don't do that. Defaults are good for ensuring that PostgreSQL will start
on the widest reasonable variety of systems. They are *terrible* for
performance and are certainly wrong for the system you describe.

> The cause of this is not yet determined. It may be related to any or
> all of the combination of:
> A) extensive use of temp tables;
> B) extensive use of multiple dblink() calls in a single query;
> C) use of transactions, especially prepared transactions and multiple
> savepoints;
> D) concurrent use of pg_dump;
> E) use of numerous databases on a single server, average about 50;
>
> To offset this, we turned off autovacuum, and used an old script to
> vacuum the tables in the middle of the night when nobody was looking.
> Unfortunately, the vacuum script only vacuumed the "userland" tables,
> and tremendous amounts of disk space were being wasted, particularly in
> the pg_attribute tables.

Bloat in pg_attribute would correlate with A) (or any constant
creation/destruction of tables). You can vacuum and/or reindex the
system tables if you are connected as the superuser, but you are better
off preventing bloat by appropriate adjustment of your configuration
settings. However, note that if you do frequent bulk
creation/destruction of tables you could end up bloating the attribute
table between vacuum runs and may need to periodically manually shrink
it.

Cheers,
Steve
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
As I've spent a considerable amount of time trying to sort this out, I'm
posting it for the benefit of other users.

I've experienced persistent, ongoing issues with autovacuum in a mixed
read/write environment with midrange hardware (16 core Xeon, 128 GB RAM,
200 GB SATA3 6 Gb SSDs for disk I/O on RHEL6) using the defaults in the
9.1 RPMs provided by Postgres (yum.postgresql.org).

The cause of this is not yet determined. It may be related to any or all
of the combination of:

A) extensive use of temp tables;
B) extensive use of multiple dblink() calls in a single query;
C) use of transactions, especially prepared transactions and multiple
savepoints;
D) concurrent use of pg_dump;
E) use of numerous databases on a single server, average about 50;

To offset this, we turned off autovacuum, and used an old script to
vacuum the tables in the middle of the night when nobody was looking.
Unfortunately, the vacuum script only vacuumed the "userland" tables,
and tremendous amounts of disk space were being wasted, particularly in
the pg_attribute tables.

However, use of any of the statements "vacuum analyze", "vacuum full
analyze", or "vacuum full verbose analyze" without mentioning specific
tables did not resolve the extra disk space issue; disk usage still
remained at least 5x the expected amount in all cases. (In one case, use
of all of these open-ended vacuum queries did almost nothing.) Nor did
running any variation of "vacuum analyze $table" in a loop through all
tables (including the pg_* tables) completely resolve the issue, either.
In order to completely clean things up, we ended up writing a script to
do the following:

1) Determine the databases using excessive disk space, in descending
order of use, with this query:

SELECT d.datname as Name,
       d.datistemplate::int AS datistemplate,
       pg_catalog.pg_get_userbyid(d.datdba) as Owner,
       CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
            THEN pg_catalog.pg_database_size(d.datname)
            ELSE -1
       END as Size
FROM pg_catalog.pg_database d
ORDER BY CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
              THEN pg_catalog.pg_database_size(d.datname)
              ELSE NULL
         END DESC NULLS FIRST;

2) For each database from #1, get a list of tables to be cleaned up with
this query:

SELECT nspname || '.' || relname AS "table",
       pg_total_relation_size(C.oid) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE (true OR (nspname NOT IN ('pg_catalog', 'information_schema')))
  AND C.relkind = 'r'
  AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC;

3) For each of the tables from #2, run the commands:

REINDEX TABLE $table;
VACUUM FULL ANALYZE $table;

The end result is a squeaky-clean database server with expected disk
usage.

NOTES:

1) The above queries are derived from queries found to determine how
much disk space was used, even though the additional information
provided isn't actually used by the script.

2) It was sheer chance that I discovered the need to reindex prior to
vacuum in order to get the disk space back.

3) I'd like to get autovacuum to work. I've read suggestions to tweak
cost_delay and/or cost_limit. I haven't yet determined if the problem is
I/O based or lock/deadlock based. I'm guessing the problem is the
latter, though it's hard to tell because queries stack up quickly and
load average is sky high when autovacuum fails for us.

4) The aforementioned process is S-L-O-W. Think at least hours and
probably days depending on your databases, your server(s), and the load.
5) I don't yet know if the "full" option for the vacuum is necessary to
free up all space. I will experiment with this and post results if
useful.
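Since VACUUM cannot be run inside a function, one way to script step #3
is to have the query from step #2 generate the per-table commands and
feed the output back through psql in a second pass. A sketch, reusing
the filter from step #2:

```sql
-- Generate the REINDEX/VACUUM commands for every table, largest first;
-- save the output and execute it as a separate psql run.
SELECT 'REINDEX TABLE ' || nspname || '.' || relname || '; ' ||
       'VACUUM FULL ANALYZE ' || nspname || '.' || relname || ';'
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind = 'r'
  AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC;
```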
Re: [GENERAL] Unexpectedly high disk space usage
On Thu, Nov 8, 2012 at 6:05 PM, Lists wrote:
> On 11/07/2012 12:42 PM, Tom Lane wrote:
>>> ... because it occasionally causes transactions and queries to hang
>>> when an update causes a vacuum mid-day, effectively taking us offline
>>> randomly.
>>
>> I suspect this claim is based on ancient and no longer very relevant
>> experience.
>
> Even so, if I felt the need to keep autovacuum off, what would I need
> to run regularly in order to keep things neat and tidy under the hood?
> Would a simple "vacuum" within each database suffice? Should I be
> logged in as the database owner or as an administrative user?

Just know that most of the time people think they need to turn off
autovacuum, they usually need to tune it instead: either more or less
aggressive, depending on why they think they need to turn it off. If
it's consuming too much IO, then reduce cost limit / increase cost
delay; if it's not aggressive enough, then reverse that and increase
cost limit and decrease cost delay. If your IO subsystem can't keep up,
then turning off autovacuum or turning it down will simply delay the
problem rather than solve it (i.e. throw more IO at it).
Re: [GENERAL] Unexpectedly high disk space usage
Lists writes:
> Even so, if I felt the need to keep autovacuum off, what would I need
> to run regularly in order to keep things neat and tidy under the hood?
> Would a simple "vacuum" within each database suffice? Should I be
> logged in as the database owner or as an administrative user?

A plain "vacuum" (or probably better, "vacuum analyze") done as
superuser will suffice, as long as you do it often enough.

regards, tom lane
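As a concrete illustration of Tom's suggestion, run as superuser in each
database; the follow-up query against the standard statistics view is
just one way to confirm the vacuuming is keeping up:

```sql
-- A plain, non-blocking vacuum with a statistics refresh. No table
-- list is needed; it covers the system catalogs too.
VACUUM ANALYZE;

-- Afterwards, check which tables still carry the most dead tuples:
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```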
Re: [GENERAL] Unexpectedly high disk space usage
On 11/07/2012 12:42 PM, Tom Lane wrote:
>> ... because it occasionally causes transactions and queries to hang when an update causes a vacuum mid-day, effectively taking us offline randomly.
>
> I suspect this claim is based on ancient and no longer very relevant experience.

Even so, if I felt the need to keep autovacuum off, what would I need to run regularly in order to keep things neat and tidy under the hood? Would a simple "vacuum" within each database suffice? Should I be logged in as the database owner or as an administrative user?

Thanks, Ben
Re: [GENERAL] Unexpectedly high disk space usage
On Wed, Nov 7, 2012 at 3:15 PM, Lists wrote:
> On 11/07/2012 12:42 PM, Tom Lane wrote:
>> So you've turned off autovacuum, and are carefully not vacuuming the system catalogs. That's your problem all right. Is there a particularly good reason why this script isn't a one-liner "VACUUM"?
>
> Back in the 8.x days, we experienced "vacuum full analyze" occasionally causing other processes to hang/timeout.

That was your first mistake. By 8.0 the need for vacuum full was almost zero. Except for instances where bloat got out of hand, vacuum full should generally be avoided after 8.0. Regular vacuum should be plenty, whether it's run by the autovacuum daemon or a cron job. Until 8.3 autovacuum was single-threaded and therefore often had trouble keeping up with bloat. While vacuum full is a blocking operation, plain vacuums are not, so unless you REALLY need a vacuum full it should be avoided.

> In an attempt to minimize the impact of the locking, we updated the script to vacuum one table at a time, which seemed to work well throughout the 8.x series. I'd happily accept that this conclusion may simply have been wrong, but it worked well enough that nobody complained and life was good.

Yeah, you still had blocking, but it was probably less noticeable.

> After switching to 9.x, we read that the "full" vacuum was less useful and so the script was changed to "vacuum analyze $table" rather than "vacuum full analyze $table".

Yeah, at that point you'd have been better off tuning autovacuum to be more aggressive and letting it do the job. Generally the time to call vacuum by hand is right after you've done something like delete half the rows in a large table.
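The cost limit / cost delay knobs Scott refers to are ordinary postgresql.conf settings. The values below are illustrative starting points for making autovacuum more aggressive, not numbers recommended anywhere in this thread -- tune them against your own IO capacity:

```
# postgresql.conf (illustrative values only)
autovacuum_vacuum_cost_delay = 10ms   # 9.1 default is 20ms; lower = more aggressive
autovacuum_vacuum_cost_limit = 1000   # default -1 falls back to vacuum_cost_limit (200)
```

These can be applied with a reload (`pg_ctl reload` or `SELECT pg_reload_conf();`); no server restart is needed.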
Re: [GENERAL] Unexpectedly high disk space usage
On Wed, Nov 7, 2012 at 1:58 PM, Scott Marlowe wrote:
> On Wed, Nov 7, 2012 at 1:42 PM, Tom Lane wrote:
>> Lists writes:
>>> ... because it occasionally causes transactions and queries to hang when an update causes a vacuum mid-day, effectively taking us offline randomly.
>>
>> I suspect this claim is based on ancient and no longer very relevant experience.
>
> My experience is that if autovac is causing problems with stalled queries etc you're either A: running ancient pg versions (pre 8.3), B: Running WAY too aggressive settings in autovac (100 threads, no nap time, cost limit of 10 etc.) or C: Your IO subsystem is absolute crap.
>
> On any modern server, default autovac settings from 8.3 and on should only have the possible problem of not being tuned aggressively enough.

Oh, another failure scenario up there is that you're running DDL in production, which is stalling behind an autovac, and in turn the two are stalling other queries. This has happened for me once or twice on more modern versions (8.3 and 8.4).
Re: [GENERAL] Unexpectedly high disk space usage
Jeff Janes writes:
>> WHERE nspname NOT IN ('pg_catalog', 'information_schema')
>> AND C.relkind <> 'i'
>> AND nspname !~ '^pg_toast'

> I question the wisdom of that where clause (from the wiki)
> If the pg_catalog relations are big, then they are big and why shouldn't they get reported as such?

Agreed, please change it. (The index and toast exclusions are reasonable, since those will be accounted for in pg_total_relation_size of the parent. Personally I'd code the toast exclusion using relkind not a namespace check though.)

regards, tom lane
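Tom's relkind-based toast exclusion could look like this (a sketch of the suggestion, not text from the wiki page; in pg_class, relkind 'i' is an index and 't' is a toast table):

```sql
SELECT nspname || '.' || relname AS "relation",
       pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname <> 'information_schema'
  AND C.relkind NOT IN ('i', 't')   -- relkind check replaces the pg_toast namespace test
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;
```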
Re: [GENERAL] Unexpectedly high disk space usage
On 11/7/12 3:58 PM, Jeff Janes wrote:
>> WHERE nspname NOT IN ('pg_catalog', 'information_schema')
>
> I question the wisdom of that where clause (from the wiki). If the pg_catalog relations are big, then they are big and why shouldn't they get reported as such? And if they are not big, that is why the limit is there on the wiki page.

The idea was that in a new database with a relatively small number of tables, your own tables will be lost among the catalog data unless you filter them out. Testing against an install with a single real table, the query there will show something like this right now:

    relation    | total_size
----------------+------------
 public.t       | 3568 kB
 public.t_k_seq | 8192 bytes

But if the filter on pg_catalog is removed, you get this instead:

          relation           | total_size
-----------------------------+------------
 public.t                    | 3568 kB
 pg_catalog.pg_depend        | 808 kB
 pg_catalog.pg_proc          | 752 kB
 pg_catalog.pg_attribute     | 568 kB
 pg_catalog.pg_rewrite       | 464 kB
 pg_catalog.pg_description   | 392 kB
 pg_catalog.pg_statistic     | 328 kB
 pg_catalog.pg_operator      | 208 kB
 pg_catalog.pg_collation     | 152 kB
 pg_catalog.pg_type          | 152 kB
 pg_catalog.pg_amop          | 136 kB
 pg_catalog.pg_class         | 136 kB
 pg_catalog.pg_constraint    | 112 kB
 pg_catalog.pg_conversion    | 104 kB
 pg_catalog.pg_index         | 88 kB
 pg_catalog.pg_amproc        | 80 kB
 pg_catalog.pg_opclass       | 80 kB
 pg_catalog.pg_ts_config_map | 80 kB
 pg_catalog.pg_cast          | 80 kB
 pg_catalog.pg_authid        | 72 kB

That is overload for a lot of people, and confusing to new users. That's why I opted for the shorter version. There's no perfect answer to all use cases here. This sort of thing is why there are three sets of queries for pg_stat_user_tables, pg_stat_sys_tables, and pg_stat_all_tables. The wiki disk space queries aim to be like the user tables version from that trio. Adding a note pointing out that you might want to remove pg_catalog and see the size of those relations would be appropriate. I wouldn't make that the default case though, due to the issue highlighted above.
I'd rather optimize the initially suggested query so that new users get simple output, even if that might hide problems on larger installs where the catalog data has become big. The other way I sometimes balance these two requirements--wanting to show all the big data, but not clutter small installs with the catalog--is to make the filter size-based instead:

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
    AND relpages > 100
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;

On my trivial test install that gives me just the one user table:

 relation | total_size
----------+------------
 public.t | 3568 kB

while still showing larger catalog tables if they grow to be noticeable.

-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [GENERAL] Unexpectedly high disk space usage
On 11/07/2012 12:58 PM, Scott Marlowe wrote:
> My experience is that if autovac is causing problems with stalled queries etc you're either A: running ancient pg versions (pre 8.3), B: Running WAY too aggressive settings in autovac (100 threads, no nap time, cost limit of 10 etc.) or C: Your IO subsystem is absolute crap.
>
> On any modern server, default autovac settings from 8.3 and on should only have the possible problem of not being tuned aggressively enough.

A) We are running PG 9.1.

B) We used the default settings in the RPMs provided by yum.postgresql.org. At the bottom of this message is information about the RPMs we currently are using.

C) The I/O subsystem for PG is twin SATA III SSDs in a RAID 1 configuration, capable of tens of thousands of IO operations per second. Servers are recent, SATA III, 16-core Xeons with 128 GB ECC RAM in 1U rackmount cases.

As stated previously, we make extensive use of temp tables, transactions, and dblink, but had no trouble with catalog table bloat in 8.x; this is a new phenomenon for us.

# rpm -qi postgresql91-9.1.5-3PGDG.rhel6.x86_64
Name        : postgresql91                     Relocations: (not relocatable)
Version     : 9.1.5                            Vendor: (none)
Release     : 3PGDG.rhel6                      Build Date: Sun 02 Sep 2012 12:13:18 PM UTC
Install Date: Wed 12 Sep 2012 03:04:24 AM UTC  Build Host: koji-sl6-x86-64-pg91
Group       : Applications/Databases           Source RPM: postgresql91-9.1.5-3PGDG.rhel6.src.rpm
Size        : 5193673                          License: PostgreSQL
Signature   : DSA/SHA1, Sun 02 Sep 2012 12:13:20 PM UTC, Key ID 1f16d2e1442df0f8
URL         : http://www.postgresql.org/
Summary     : PostgreSQL client programs and libraries
Description :
PostgreSQL is an advanced Object-Relational database management system (DBMS) that supports almost all SQL constructs (including transactions, subselects and user-defined types and functions). The postgresql package includes the client programs and libraries that you'll need to access a PostgreSQL DBMS server.
These PostgreSQL client programs are programs that directly manipulate the internal structure of PostgreSQL databases on a PostgreSQL server. These client programs can be located on the same machine with the PostgreSQL server, or may be on a remote machine which accesses a PostgreSQL server over a network connection. This package contains the command-line utilities for managing PostgreSQL databases on a PostgreSQL server. If you want to manipulate a PostgreSQL database on a local or remote PostgreSQL server, you need this package. You also need to install this package if you're installing the postgresql91-server package.
Re: [GENERAL] Unexpectedly high disk space usage
On 11/07/2012 12:42 PM, Tom Lane wrote:
> So you've turned off autovacuum, and are carefully not vacuuming the system catalogs. That's your problem all right. Is there a particularly good reason why this script isn't a one-liner "VACUUM"?

Back in the 8.x days, we experienced "vacuum full analyze" occasionally causing other processes to hang/timeout. In an attempt to minimize the impact of the locking, we updated the script to vacuum one table at a time, which seemed to work well throughout the 8.x series. I'd happily accept that this conclusion may simply have been wrong, but it worked well enough that nobody complained and life was good. After switching to 9.x, we read that the "full" vacuum was less useful and so the script was changed to "vacuum analyze $table" rather than "vacuum full analyze $table".

> Are you sure that once-a-day vacuuming is sufficient, even if it was covering the system catalogs? If you've managed to bloat pg_attribute to 36GB, I suspect you've got enough churn (perhaps from temp tables) that you really need the catalogs vacuumed more often.

The only thing that I could find in the docs even mentioning the idea of vacuuming catalogs is this sentence: "(A manual VACUUM should fix the problem, as suggested by the hint; but note that the VACUUM must be performed by a superuser, else it will fail to process system catalogs and thus not be able to advance the database's datfrozenxid.)" http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html

This does NOT clearly say that the end user could vacuum catalogs, let alone that it's necessary or even a good idea. Otherwise, the only mention is of tables, and there's no mention of the idea that tables are anything but user space.

> My advice is dump, reload, and *don't* turn off autovacuum.
>
>> ... because it occasionally causes transactions and queries to hang when an update causes a vacuum mid-day, effectively taking us offline randomly.
>
> I suspect this claim is based on ancient and no longer very relevant experience.

We tried several times to turn on autovacuum with 9.1 and had problems every time. If our use case is particularly special, I'd love to work with you to get autovacuum to work in our situation too, as it would make life easier for us! But for the past few months, every time we've turned it on, we've had our phones swamped with customers who are unable to use our system while our application monitors scream bloody murder, at least weekly. From what we could tell (under extreme pressure to get it all working again ASAP, mind you), it seemed that when doing a large update within a transaction, autovacuum would get triggered before the transaction completed, causing the transaction to hang or at least slow way down, causing timeouts with the load balancers, so customers would then try again, compounding the ongoing problem. Pretty soon you have not only I/O issues but also locking issues and upset customers. This issue may be compounded because we make fairly extensive use of dblink and temp tables to aggregate data for our customers who have multiple sites.

-Ben
Re: [GENERAL] Unexpectedly high disk space usage
On Wed, Nov 7, 2012 at 1:42 PM, Tom Lane wrote:
> Lists writes:
>
>> ... because it occasionally causes transactions and queries to hang when an update causes a vacuum mid-day, effectively taking us offline randomly.
>
> I suspect this claim is based on ancient and no longer very relevant experience.

My experience is that if autovac is causing problems with stalled queries etc you're either A: running ancient pg versions (pre 8.3), B: Running WAY too aggressive settings in autovac (100 threads, no nap time, cost limit of 10 etc.) or C: Your IO subsystem is absolute crap.

On any modern server, default autovac settings from 8.3 and on should only have the possible problem of not being tuned aggressively enough.
Re: [GENERAL] Unexpectedly high disk space usage
Hi Greg, I've added you to the cc list because I'm proposing to change some wiki content which you wrote.

On Wed, Nov 7, 2012 at 11:54 AM, Lists wrote:
> On 11/07/2012 09:01 AM, Jeff Janes wrote:
>> Ben, did you ever figure out where the space was going?
>
> Now, here's where it gets weird. From the disk space usage wiki (http://wiki.postgresql.org/wiki/Disk_Usage) you'll recognize how I used it to get a total disk space used result:
>
> with mytable AS (
>     SELECT nspname || '.' || relname AS "relation",
>         pg_total_relation_size(C.oid) AS "size"
>     FROM pg_class C
>     LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
>     WHERE nspname NOT IN ('pg_catalog', 'information_schema')
>         AND C.relkind <> 'i'
>         AND nspname !~ '^pg_toast'
>     ORDER BY pg_total_relation_size(C.oid) DESC
> )
> SELECT sum(size) AS size FROM mytable

I question the wisdom of that where clause (from the wiki). If the pg_catalog relations are big, then they are big and why shouldn't they get reported as such? And if they are not big, that is why the limit is there on the wiki page. I'm tempted to go change it, but maybe there is a good reason it is there which I do not understand.

...

> Google returns this page: http://www.postgresql.org/docs/9.1/static/catalog-pg-attribute.html which doesn't help me much. So, am I doing something wrong with admin? Our current process is that every night in the middle of the night, a script connects to each database on each server and runs a query to get all tables in each database and, for each, runs
>
> "VACUUM ANALYZE $table"
>
> for each table in the database.

I take it your script that does that is not including the pg_catalog tables? Why not just run "vacuum analyze" and let it do the entire database?

> I will note that autovacuum is off because it occasionally causes transactions and queries to hang when an update causes a vacuum mid-day, effectively taking us offline randomly.

Hang as in they are blocking on locks?
Or they just get slow because the autovacuum is consuming too much IO?

Cheers, Jeff
Re: [GENERAL] Unexpectedly high disk space usage
Lists writes:
> pg_catalog.pg_attribute | 36727480320

Ouch.

> Our current process is that every night in the middle of the night, a script connects to each database on each server and runs a query to get all tables in each database and, for each, run
> "VACUUM ANALYZE $table"
> for each table in the database.
> (note: there is a database for the "postgres" user on each DB server)
> The script is a remnant from PG 8.x days, so am I missing something fundamental about 9.x? I will note that autovacuum is off ...

So you've turned off autovacuum, and are carefully not vacuuming the system catalogs. That's your problem all right. Is there a particularly good reason why this script isn't a one-liner "VACUUM"?

Are you sure that once-a-day vacuuming is sufficient, even if it was covering the system catalogs? If you've managed to bloat pg_attribute to 36GB, I suspect you've got enough churn (perhaps from temp tables) that you really need the catalogs vacuumed more often.

My advice is dump, reload, and *don't* turn off autovacuum.

> ... because it occasionally causes transactions and queries to hang when an update causes a vacuum mid-day, effectively taking us offline randomly.

I suspect this claim is based on ancient and no longer very relevant experience.

regards, tom lane
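A quick way to see whether you are in the situation Tom describes is to check the system catalog sizes directly; this is a generic diagnostic query, not one given in the thread:

```sql
-- report the largest system catalogs in the current database
SELECT C.relname,
       pg_size_pretty(pg_total_relation_size(C.oid)) AS total_size
FROM pg_class C
JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE N.nspname = 'pg_catalog'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 10;
```

If pg_attribute or pg_class dominate the output, heavy temp-table churn with no catalog vacuuming is the usual suspect.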
Re: [GENERAL] Unexpectedly high disk space usage
On 11/07/2012 09:01 AM, Jeff Janes wrote:
> Ben, did you ever figure out where the space was going?

I think we've found where the space is going, but I still don't yet know how to resolve it. I modified your query thusly in order to get a total of space used, and got an answer that matches closely:

with stuff as (
    SELECT d.datname as Name,
        pg_catalog.pg_get_userbyid(d.datdba) as Owner,
        CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
            THEN pg_catalog.pg_database_size(d.datname)
            ELSE -1
        END as Size
    FROM pg_catalog.pg_database d
    order by CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
            THEN pg_catalog.pg_database_size(d.datname)
            ELSE NULL
        END desc nulls first
)
SELECT sum(size) AS overall from stuff;

Result: 171,276,369,124

# du -sbc /var/lib/pgsql/9.1/data/*
Result: 172,087,129,512

Now, the question is, I see several databases whose disk usage is dramatically different than what I get from a dump/restore to another machine:

Production:   santarosa444 | postgres | 44 GB
Dump/Restore: santarosa444 | postgres | 685 MB

Now, here's where it gets weird. From the disk space usage wiki (http://wiki.postgresql.org/wiki/Disk_Usage) you'll recognize how I used it to get a total disk space used result:

with mytable AS (
    SELECT nspname || '.' || relname AS "relation",
        pg_total_relation_size(C.oid) AS "size"
    FROM pg_class C
    LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
    WHERE nspname NOT IN ('pg_catalog', 'information_schema')
        AND C.relkind <> 'i'
        AND nspname !~ '^pg_toast'
    ORDER BY pg_total_relation_size(C.oid) DESC
)
SELECT sum(size) AS size FROM mytable;

... but the total result is 747,569,152, which is close to the dump/restore value, not the production server value, even though I'm running this query on the production server. So there's *something* that the latter query isn't identifying that the former is. On a hunch, I ran this query:

with mytable AS (
    SELECT nspname || '.' || relname AS "relation",
        pg_total_relation_size(C.oid) AS "size"
    FROM pg_class C
    LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
    WHERE (true or (nspname NOT IN ('pg_catalog', 'information_schema')))
        AND C.relkind <> 'i'
        AND nspname !~ '^pg_toast'
    ORDER BY pg_total_relation_size(C.oid) DESC
)
SELECT sum(size) AS size FROM mytable;

And the result is 46,771,216,384! Removing the "mytable" wrapper stuff, here are the top results:

 pg_catalog.pg_attribute | 36727480320
 pg_catalog.pg_attrdef   | 3800072192
 pg_catalog.pg_depend    | 2665930752
 pg_catalog.pg_class     | 1508925440
 pg_catalog.pg_type      | 1113038848
 public.att_claims       | 451698688
 public.stgrades         | 127639552
 pg_catalog.pg_index     | 107806720

Google returns this page: http://www.postgresql.org/docs/9.1/static/catalog-pg-attribute.html which doesn't help me much. So, am I doing something wrong with admin? Our current process is that every night in the middle of the night, a script connects to each database on each server and runs a query to get all tables in each database and, for each table, runs

"VACUUM ANALYZE $table"

And then once a week:

psql -U postgres -c "\l" | grep -Po "(\w+444)" | xargs -t -i psql -U postgres {} -c "REINDEX DATABASE {};"

(note: there is a database for the "postgres" user on each DB server)

The script is a remnant from PG 8.x days, so am I missing something fundamental about 9.x? I will note that autovacuum is off because it occasionally causes transactions and queries to hang when an update causes a vacuum mid-day, effectively taking us offline randomly. Our scenario is pretty much a worst-possible case of transactions, prepared transactions, temp tables, and concurrent read/write queries.
Re: [GENERAL] Unexpectedly high disk space usage
On Tue, Nov 6, 2012 at 11:28 AM, Tom Lane wrote:
> Jeff Janes writes:
>> On Tue, Nov 6, 2012 at 10:49 AM, Lists wrote:
>>> I followed your example, the result is at the bottom. Based on this it would seem that there are 3-4 databases that seem to be the culprit. How could I get more depth/detail on what specifically is the problem?
>
>> If you have installed the contrib modules (oid2name specifically), you can use that to get the name of the bloated database:
>> oid2name | fgrep 607471
>
> Or, if you didn't install contrib, try
>
> select datname from pg_database where oid = 607471

Thanks, I knew there had to be a more direct way to do that.

>> If the name of the database doesn't give you any insight, then look for large files in the directory base/607471 whose names all start with the same digits and use oid2name to get the names of the relations for those files.
>
>> oid2name -d -o
>
> For this you can try
>
> select relname from pg_class where relfilenode =
>
> Or let the database do the work:
>
> select relname, pg_relation_size(oid) from pg_class order by 2 desc;

Ben described using something like this method originally and not finding the space, so I wanted to work backwards from certain knowledge of where the OS says the space is being used. But now I think maybe his scripts to aggregate table sizes over all databases (and also his script to load pg_dumps of those databases into a new cluster) are accidentally omitting some databases--the largest ones.

Is there a simple query for a super-user to get a list of all relation sizes over all databases cluster-wide? If "\l+" can get the size of databases other than the one currently connected to, maybe there is a way to extend that to tables in those other databases. It would at least be nice to be able to get the sizes of all databases.
Since '\l+' doesn't sort by size and I don't know how to make it do so, I pulled the query from psql source code and modified it:

SELECT d.datname as Name,
    pg_catalog.pg_get_userbyid(d.datdba) as Owner,
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
        THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
        ELSE 'No Access'
    END as Size
FROM pg_catalog.pg_database d
ORDER BY CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
        THEN pg_catalog.pg_database_size(d.datname)
        ELSE NULL
    END desc nulls first;

(And discovered a long-forgotten unused database I had sitting around taking up space.)

Ben, did you ever figure out where the space was going?

Cheers, Jeff
Re: [GENERAL] Unexpectedly high disk space usage
Jeff Janes writes:
> On Tue, Nov 6, 2012 at 10:49 AM, Lists wrote:
>> I followed your example, the result is at the bottom. Based on this it would seem that there are 3-4 databases that seem to be the culprit. How could I get more depth/detail on what specifically is the problem?

> If you have installed the contrib modules (oid2name specifically), you can use that to get the name of the bloated database:
> oid2name | fgrep 607471

Or, if you didn't install contrib, try

select datname from pg_database where oid = 607471

> If the name of the database doesn't give you any insight, then look for large files in the directory base/607471 whose names all start with the same digits and use oid2name to get the names of the relations for those files.
> oid2name -d -o

For this you can try

select relname from pg_class where relfilenode =

Or let the database do the work:

select relname, pg_relation_size(oid) from pg_class order by 2 desc;

regards, tom lane
Re: [GENERAL] Unexpectedly high disk space usage
On Tue, Nov 6, 2012 at 10:49 AM, Lists wrote:
> I followed your example, the result is at the bottom. Based on this it would seem that there are 3-4 databases that seem to be the culprit. How could I get more depth/detail on what specifically is the problem?

If you have installed the contrib modules (oid2name specifically), you can use that to get the name of the bloated database:

oid2name | fgrep 607471

If the name of the database doesn't give you any insight, then look for large files in the directory base/607471 whose names all start with the same digits and use oid2name to get the names of the relations for those files.

oid2name -d -o

Cheers, Jeff
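Filled in, the two oid2name steps look roughly like this. `<dbname>` and `<filenode>` are placeholders to substitute, and the `-f` flag (look up by filenode, per the contrib docs) is used here since the file names under base/ are filenodes rather than OIDs:

```
# 1) map the directory name (a database OID) to a database name
oid2name | grep 607471

# 2) inside that database, map a large file's name back to a relation
oid2name -d <dbname> -f <filenode>
```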
Re: [GENERAL] Unexpectedly high disk space usage
I followed your example; the result is at the bottom. Based on this it would seem that there are 3-4 databases that seem to be the culprit. How could I get more depth/detail on what specifically is the problem?

-Ben

On 11/05/2012 07:10 PM, Scott Marlowe wrote:
> What does du -sh have to say about it? Use unix tools to examine your file system and see where the usage is going. For instance, I can do this:
>
> cd /var/lib/postgresql/8.4/main/
> du -s *|sort -n
> 0       server.crt
> 0       server.key
> 4       pg_tblspc
> 4       pg_twophase
> 4       PG_VERSION
> 4       postmaster.opts
> 4       postmaster.pid
> 12      pg_clog
> 12      pg_stat_tmp
> 12      pg_subtrans
> 28      pg_multixact
> 460     global
> 16392   pg_xlog
> 16396   base
>
> which tells me that I'm using about 16MB for each of pg_xlog and base. I can then cd into base and look around:
>
> cd base
> du -s *|sort -n
> 5416    1
> 5416    11563
> 5560    11564
>
> Which shows me using about 5MB each for three different dbs. And so on. On an off guess, did you go from a SQL_ASCII encoding to UTF8? That might increase disk space usage a bit.

[root@delta ~]# cd /var/lib/pgsql/9.1/data/
[root@delta data]# du -s * | sort -n
4         pg_ident.conf
4         pg_serial
4         pg_tblspc
4         PG_VERSION
4         postmaster.opts
4         postmaster.pid
8         pg_hba.conf
12        pg_notify
12        pg_twophase
20        postgresql.300
20        postgresql.conf
20        postgresql.conf.20120903
20        postgresql.conf.300
76        pg_subtrans
104       pg_multixact
15044     pg_log
18184     global
25216     pg_stat_tmp
47916     pg_clog
671916    pg_xlog
164753204 base

[root@delta data]# cd base
[root@delta base]# du -s * | sort -n
4         pgsql_tmp
6124      12772
6388      12780
6424      1
72424     331506
72700     160676
72896     391655
73200     52389
73216     523672
74104     619675
74956     295646
76768     307580
77896     547597
80824     571547
87368     475799
90940     631604
113876    124651
123548    148525
130096    367533
149792    439726
173648    355578
175404    679545
190732    559580
225780    511706
326468    667547
352736    655477
398736    535644
469408    136582
483716    499753
513124    270926
575612    715601
590408    487780
04 463779
713208    643540
714896    583515
803216    343438
806952    427663
855156    739506
872200    197221
975692    64371
987692    775594
1005268   595488
1024812   691482
1042212   727552
1047464   379566
1260044   76601
1276756   16384
1345072   403667
1474468   209158
1477808   172604
1536168   221124
1637652   258798
1811504   88598
1963740   245588
2076748   703467
2193536   415671
2430908   801322
2552640   319552
2785212   28315
3454880   112612
3755548   451666
3929420   100666
4651876   40451
5714940   751514
6257740   233293
7313900   184735
9334796   763606
10940780  283609
20837264  788338
45285640  607471
Re: [GENERAL] Unexpectedly high disk space usage
Jeff, thanks for the feedback!

On 11/05/2012 08:51 PM, Jeff Janes wrote:
> My first guesses would be things that are not directly under the databases control, such as:
>
> 1) your server logs are accumulating and you aren't doing anything about them

I'm guessing that this is not the case:

[root@delta data]# du -shc * | grep -i log
47M     pg_clog
15M     pg_log
641M    pg_xlog

> 2) you are taking backup snapshots to somewhere in that directory and not cleaning them up

Our backup snapshots (taken with pg_dump) are taken on a different server over the network. Dumps are made several times during each day. Could this be part of the problem if (somehow) they didn't complete? And if so, would there be some cleanup I'd have to do other than restarting PG?

> 3) your archive_command is failing (which you should see reports of in the server logs) and so you are accumulating xlog files.

As I understand things, the result above under 1) demonstrates that this, also, is not the cause.
Re: [GENERAL] Unexpectedly high disk space usage
On Mon, Nov 5, 2012 at 7:01 PM, Lists wrote:
> We upgraded to 9.1 from 8.4 over the summer a few months ago, to new DB servers with more disk space and memory. Unexpectedly, the DB servers have steadily increased their disk space usage since. Reported system load doesn't seem to be affected. It's happening to all our DB servers running 9.1.
>
> When we reload all pg_dumps from our worst-affected server into an offline server, the disk space usage is about 26 GB, but the production database is using 166 GB. (# df /var/lib/pgsql;)

My first guesses would be things that are not directly under the database's control, such as:

1) your server logs are accumulating and you aren't doing anything about them

2) you are taking backup snapshots to somewhere in that directory and not cleaning them up

3) your archive_command is failing (which you should see reports of in the server logs) and so you are accumulating xlog files.

Cheers, Jeff
Re: [GENERAL] Unexpectedly high disk space usage
On Mon, Nov 5, 2012 at 8:01 PM, Lists wrote:
> We upgraded to 9.1 from 8.4 over the summer a few months ago, to new DB
> servers with more disk space and memory. Unexpectedly, the DB servers have
> steadily increased their disk space usage since. Reported system load
> doesn't seem to be affected. It's happening to all our DB servers running
> 9.1.
>
> When we reload all pg_dumps from our worst-affected server into an offline
> server, the disk space usage is about 26 GB, but the production database is
> using 166 GB. (# df /var/lib/pgsql;)
>
> To resolve this, we've tried:
>
> 1) reindexed everything (cut about 10% of disk usage temporarily)
>
> 2) tried vacuum full, and vacuum analyze on all databases. (to minimal
> effect)
>
> 3) Restarting PG (no discernible effect) including a full stop/start.
>
> 4) We've looked for stale prepared transactions (none found)
>
> 5) instructions from the wiki to try to determine what the cause of all the
> disk usage is: http://wiki.postgresql.org/wiki/Disk_Usage but when we add up
> all the results for all the different databases, tables, indexes, etc. in a
> script, we get a number very close to the usage of the freshly loaded
> server. (24 GB)

What does du -sh have to say about it? Use unix tools to examine your file system and see where the usage is going. For instance, I can do this:

cd /var/lib/postgresql/8.4/main/
du -s *|sort -n
0       server.crt
0       server.key
4       pg_tblspc
4       pg_twophase
4       PG_VERSION
4       postmaster.opts
4       postmaster.pid
12      pg_clog
12      pg_stat_tmp
12      pg_subtrans
28      pg_multixact
460     global
16392   pg_xlog
16396   base

which tells me that I'm using about 16MB each for pg_xlog and base. I can then cd into base and look around:

cd base
du -s *|sort -n
5416    1
5416    11563
5560    11564

which shows me using about 5MB each for three different dbs. And so on.

On an off guess, did you go from a SQL_ASCII encoding to UTF8? That might increase disk space usage a bit.
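The numeric directory names under base/ are database OIDs. To map them back to database names, a query along these lines works from any database in the cluster:

```sql
-- Map base/<oid> directory names to database names.
SELECT oid, datname
FROM pg_database
ORDER BY oid;
```

With that mapping, the du figures per subdirectory can be attributed to specific databases.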
[GENERAL] Unexpectedly high disk space usage
We upgraded to 9.1 from 8.4 over the summer a few months ago, to new DB servers with more disk space and memory. Unexpectedly, the DB servers have steadily increased their disk space usage since. Reported system load doesn't seem to be affected. It's happening to all our DB servers running 9.1.

When we reload all pg_dumps from our worst-affected server into an offline server, the disk space usage is about 26 GB, but the production database is using 166 GB. (# df /var/lib/pgsql;)

To resolve this, we've tried:

1) reindexed everything (cut about 10% of disk usage temporarily)

2) tried vacuum full, and vacuum analyze on all databases. (to minimal effect)

3) Restarting PG (no discernible effect) including a full stop/start.

4) We've looked for stale prepared transactions (none found)

5) instructions from the wiki to try to determine what the cause of all the disk usage is: http://wiki.postgresql.org/wiki/Disk_Usage but when we add up all the results for all the different databases, tables, indexes, etc. in a script, we get a number very close to the usage of the freshly loaded server. (24 GB)

What is Postgres doing with ~80% of its disk space usage? This is not normal, is it? I would hate to have to take the servers offline just to dump/restore in order to bring disk usage back to normal...

SYSTEM SPECS: I've attached the postgresql.conf on a RHEL6/64 Linux server with 128 GB of RAM and 16 real CPU cores. (HT turned on, 32 CPUs according to the O/S)

#FROM: sysctl.conf:
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 136365211648
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296

-Ben

# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The "=" is optional.)  Whitespace may be used.  Comments are introduced with
# "#" anywhere on a line.  The complete list of parameter names and allowed
# values can be found in the PostgreSQL documentation.
#
# The commented-out settings shown in this file represent the default values.
# Re-commenting a setting is NOT sufficient to revert it to the default value;
# you need to reload the server.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal.  If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use "pg_ctl reload".  Some
# parameters, which are marked below, require a server shutdown and restart to
# take effect.
#
# Any parameter can also be given as a command-line option to the server, e.g.,
# "postgres -c log_connections=on".  Some parameters can be changed at run time
# with the "SET" SQL command.
#
# Memory units:  kB = kilobytes        Time units:  ms  = milliseconds
#                MB = megabytes                     s   = seconds
#                GB = gigabytes                     min = minutes
#                                                   h   = hours
#                                                   d   = days


#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------

# The default values of these variables are driven from the -D command-line
# option or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'           # use data in another directory
                                        # (change requires restart)
#hba_file = 'ConfigDir/pg_hba.conf'     # host-based authentication file
                                        # (change requires restart)
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file
                                        # (change requires restart)

# If external_pid_file is not explicitly set, no extra PID file is written.
#external_pid_file = '(none)'           # write an extra PID file
                                        # (change requires restart)


#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------

# - Connection Settings -

listen_addresses = '*'                  # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost', '*' = all
                                        # (change requires restart)
port = 5432                             # (change requires restart)
max_connections = 150                   # (change requires restart)
# Note:  Increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).
#superuser_reserved_connections = 3     # (change requires restart)
#unix_socket_directory = ''             # (change requires restart)
#un
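Regarding item 5 in the original post, the wiki's per-relation accounting boils down to queries of roughly this shape (a sketch, not the wiki's exact text; run it in each database):

```sql
-- Twenty largest relations in the current database, counting
-- indexes and TOAST data via pg_total_relation_size.
SELECT n.nspname, c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;
```

If these sums match a freshly reloaded copy while df does not, as reported in the thread, the unaccounted space is sitting outside the relation files these functions measure.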