Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-12 Thread Lists

On 11/12/2012 01:31 PM, Jeff Janes wrote:

On Mon, Nov 12, 2012 at 10:38 AM, Lists  wrote:

On 11/10/2012 02:21 PM, Jeff Janes wrote:

On Fri, Nov 9, 2012 at 4:28 PM, Lists  wrote:


2) It was sheer chance that I discovered the need to reindex prior to
vacuum
in order to get the disk space back.

As of 9.0, a "vacuum full" inherently does a reindex, so doing an
explicit one is neither necessary nor beneficial.

I don't know if your discovery is based on a non-full vacuum, or on an
older server.


I can only state that merely doing a "vacuum full" or "vacuum full $tables"
sequentially did not free the space, whereas a sequential "reindex $table",
each followed immediately by a "vacuum full $table", did.

With what version?

[root@alpha ~]# rpm -qi postgresql91-server
Name        : postgresql91-server            Relocations: (not relocatable)
Version     : 9.1.5                          Vendor: (none)
Release     : 3PGDG.rhel6                    Build Date: Sun 02 Sep 2012 12:13:18 PM UTC
Install Date: Wed 12 Sep 2012 03:04:41 AM UTC   Build Host: koji-sl6-x86-64-pg91
Group       : Applications/Databases         Source RPM: postgresql91-9.1.5-3PGDG.rhel6.src.rpm
Size        : 15191132                       License: PostgreSQL
Signature   : DSA/SHA1, Sun 02 Sep 2012 12:13:24 PM UTC, Key ID 1f16d2e1442df0f8
URL         : http://www.postgresql.org/





If you'd like, I can
easily recreate the scenario by simply not "cleaning up" one of the DB
servers until it bloats up, and make available (limited distribution) a binary
copy of the database (e.g. rsync the /var/lib/pgsql/ filesystem late at
night) in order to help identify why it didn't work as expected.

Do you think you can make an easily scriptable way to re-create the
resistant bloat?  That would be better than trying to disseminate
binary files, I think.
It would only be better if it actually created the situation that caused 
the space to not be freed. But until you know the actual cause of a 
problem, I've found that it's often not productive to create simulations 
that may or may not actually be related to it.




What I did was just create and drop temp tables in a tight loop, with
autovacuum off, and then once pg_attribute got good and bloated, did a
vacuum full as the database owner or superuser.
Based on my understanding, if your loop included an intermittent schema 
change from within a transaction, it might better approximate my actual 
scenario. Merely creating temp tables and then dropping them would 
create lots of activity "at the end" of the table, which would be freed 
correctly. This still does not explain why reindex $table works when a 
reindex is supposedly implicit in vacuum full.
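A sketch of what I mean, run repeatedly with autovacuum off (illustrative only; "app_data" is a hypothetical long-lived table):

```sql
BEGIN;
-- temp-table churn: adds then deletes rows "at the end" of pg_attribute
CREATE TEMP TABLE scratch (id int, note text);
DROP TABLE scratch;
-- intermittent schema change on a persistent table, which leaves dead
-- catalog rows for a long-lived object in the middle of the catalog
ALTER TABLE app_data ADD COLUMN tmp_col int;
ALTER TABLE app_data DROP COLUMN tmp_col;
COMMIT;
```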



If all of your long-lived objects were created before pg_attribute got
bloated and so the bloat was due only to short-lived objects, then
non-full vacuum (if run often enough) should eventually be able to
return that space as the short-lived objects near the end start to go
away.  However, if even a single long-lived object finds itself at the
end of the table, then only a vacuum full will ever be able to reclaim
that space.


Since the time period involved (weeks/months) would have included both a
large number of created/destroyed temp tables and occasionally altered
persistent objects, it would appear that the full option is a very good idea, at
least periodically.

If you can prevent the extreme bloat from occurring in the first
place, then the "end" of the table would not be so far away from its
desired size that it needs to get reset by a vacuum full.

If you find yourself in need of a vacuum full, then you should do
one.  But you should ask yourself what went wrong that you got into
that situation in the first place.

I agree; this is why I asked my questions about enabling autovacuum in a 
related thread.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-12 Thread Jeff Janes
On Mon, Nov 12, 2012 at 10:38 AM, Lists  wrote:
> On 11/10/2012 02:21 PM, Jeff Janes wrote:
>>
>> On Fri, Nov 9, 2012 at 4:28 PM, Lists  wrote:
>>
>>>
>>> 2) It was sheer chance that I discovered the need to reindex prior to
>>> vacuum
>>> in order to get the disk space back.
>>
>> As of 9.0, a "vacuum full" inherently does a reindex, so doing an
>> explicit one is neither necessary nor beneficial.
>>
>> I don't know if your discovery is based on a non-full vacuum, or on an
>> older server.
>
>
> I can only state that merely doing a "vacuum full" or "vacuum full $tables"
> sequentially did not free the space, whereas a sequential "reindex $table",
> each followed immediately by a "vacuum full $table", did.

With what version?


> If you'd like, I can
> easily recreate the scenario by simply not "cleaning up" one of the DB
> servers until it bloats up, and make available (limited distribution) a binary
> copy of the database (e.g. rsync the /var/lib/pgsql/ filesystem late at
> night) in order to help identify why it didn't work as expected.

Do you think you can make an easily scriptable way to re-create the
resistant bloat?  That would be better than trying to disseminate
binary files, I think.

What I did was just create and drop temp tables in a tight loop, with
autovacuum off, and then once pg_attribute got good and bloated, did a
vacuum full as the database owner or superuser.
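In rough outline (illustrative; each iteration run from a client loop with autovacuum off):

```sql
-- each CREATE inserts one pg_attribute row per column; the DROP
-- deletes them again, leaving dead catalog tuples behind
CREATE TEMP TABLE bloat_me (a int, b int, c int, d int);
DROP TABLE bloat_me;

-- once pg_attribute is good and bloated, as owner or superuser:
VACUUM FULL pg_attribute;
```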

>>
>> If all of your long-lived objects were created before pg_attribute got
>> bloated and so the bloat was due only to short-lived objects, then
>> non-full vacuum (if run often enough) should eventually be able to
>> return that space as the short-lived objects near the end start to go
>> away.  However, if even a single long-lived object finds itself at the
>> end of the table, then only a vacuum full will ever be able to reclaim
>> that space.
>>
>
> Since the time period involved (weeks/months) would have included both a
> large number of created/destroyed temp tables and occasionally altered
> persistent objects, it would appear that the full option is a very good idea, at
> least periodically.

If you can prevent the extreme bloat from occurring in the first
place, then the "end" of the table would not be so far away from its
desired size that it needs to get reset by a vacuum full.

If you find yourself in need of a vacuum full, then you should do
one.  But you should ask yourself what went wrong that you got into
that situation in the first place.

Cheers,

Jeff




Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-12 Thread Lists

On 11/10/2012 02:21 PM, Jeff Janes wrote:

On Fri, Nov 9, 2012 at 4:28 PM, Lists  wrote:


...


3) For each of the tables from #2, run the commands
REINDEX TABLE $table;
VACUUM FULL ANALYZE $table;

The end result is a squeaky-clean database server with expected disk usage.

NOTES:

...


2) It was sheer chance that I discovered the need to reindex prior to vacuum
in order to get the disk space back.

As of 9.0, a "vacuum full" inherently does a reindex, so doing an
explicit one is neither necessary nor beneficial.

I don't know if your discovery is based on a non-full vacuum, or on an
older server.


I can only state that merely doing a "vacuum full" or "vacuum full 
$tables" sequentially did not free the space, whereas a sequential 
"reindex $table", each followed immediately by a "vacuum full $table", did. 
If you'd like, I can easily recreate the scenario by simply not "cleaning 
up" one of the DB servers until it bloats up, and make available (limited 
distribution) a binary copy of the database (e.g. rsync the 
/var/lib/pgsql/ filesystem late at night) in order to help identify why 
it didn't work as expected.





5) I don't yet know if the "full" option for the vacuum is necessary to free
up all space. I will experiment with this and post results if useful.

The answer to this is mostly non-deterministic.  A non-full vacuum can
only free space from the "end" of the table.

If all of your long-lived objects were created before pg_attribute got
bloated and so the bloat was due only to short-lived objects, then
non-full vacuum (if run often enough) should eventually be able to
return that space as the short-lived objects near the end start to go
away.  However, if even a single long-lived object finds itself at the
end of the table, then only a vacuum full will ever be able to reclaim
that space.



Since the time period involved (weeks/months) would have included both a 
large number of created/destroyed temp tables and occasionally altered 
persistent objects, it would appear that the full option is a very good 
idea, at least periodically.


-Ben




Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-11 Thread Lists

On 11/09/2012 05:26 PM, Steve Crawford wrote:
Bloat in pg_attribute would correlate with A) (or any constant 
creation/destruction of tables). You can vacuum and/or reindex the 
system tables if you are connected as the superuser but you are better 
off preventing bloat by appropriate adjustment of your configuration 
settings. However note that if you do frequent bulk 
creation/destruction of tables you could end up bloating the attribute 
table between vacuum runs and may need to periodically manually shrink 
it.





Steve,

Our system divides customers into distinct databases, however customers 
are often clustered. (Think: different locations of 7/11) and so we have 
to aggregate data from different databases. We do this with dblink to 
get the data and temp tables to collate it, which appears to be a cause 
of the bloat we're seeing.


-Ben




Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-10 Thread Adrian Klaver

On 11/10/2012 02:23 PM, Scott Marlowe wrote:




When in doubt there are the docs:)

http://www.postgresql.org/docs/9.1/interactive/routine-vacuuming.html#AUTOVACUUM

"The "autovacuum daemon" actually consists of multiple processes. There is a
persistent daemon process, called the autovacuum launcher, which is in
charge of starting autovacuum worker processes for all databases. The
launcher will distribute the work across time, attempting to start one
worker within each database every autovacuum_naptime seconds. (Therefore, if
the installation has N databases, a new worker will be launched every
autovacuum_naptime/N seconds.)"


And apparently it wasn't always this way:

http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html#AUTOVACUUM

The autovacuum daemon, when enabled, runs every autovacuum_naptime
seconds. On each run, it selects one database to process and checks
each table within that database. VACUUM or ANALYZE commands are issued
as needed.


Seems to have changed in 8.3:

http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html

"Beginning in PostgreSQL 8.3, autovacuum has a multiprocess 
architecture: There is a daemon process, called the autovacuum launcher, 
which is in charge of starting autovacuum worker processes for all 
databases. The launcher will distribute the work across time, but 
attempt to start one worker on each database every autovacuum_naptime 
seconds. One worker will be launched for each database, with a maximum 
of autovacuum_max_workers processes running at the same time..."








--
Adrian Klaver
adrian.kla...@gmail.com




Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-10 Thread Scott Marlowe
On Sat, Nov 10, 2012 at 3:20 PM, Adrian Klaver  wrote:
> On 11/10/2012 02:08 PM, Scott Marlowe wrote:
>>
>> On Sat, Nov 10, 2012 at 2:12 PM, Jeff Janes  wrote:
>>>
>>> On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe 
>>> wrote:


 As well, since the default nap time is 1 minute, it will take at least
 50 minutes to vacuum each db as nap time is how long autovac waits
 between databases.
>>>
>>>
>>> That isn't how it works.  The naptime is per database, not per
>>> cluster.  If the naptime is 1 minute and there are 50 "active"
>>> databases, then it will launch a new worker every 1.2 seconds
>>> (assuming the old ones finish fast enough that doing so would not
>>> exceed autovacuum_max_workers)
>>
>>
>> Hmmm.  That was not my understanding from previous discussions on nap
>> time.
>>
>>
>
>
> When in doubt there are the docs:)
>
> http://www.postgresql.org/docs/9.1/interactive/routine-vacuuming.html#AUTOVACUUM
>
> "The "autovacuum daemon" actually consists of multiple processes. There is a
> persistent daemon process, called the autovacuum launcher, which is in
> charge of starting autovacuum worker processes for all databases. The
> launcher will distribute the work across time, attempting to start one
> worker within each database every autovacuum_naptime seconds. (Therefore, if
> the installation has N databases, a new worker will be launched every
> autovacuum_naptime/N seconds.)"

And apparently it wasn't always this way:

http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html#AUTOVACUUM

The autovacuum daemon, when enabled, runs every autovacuum_naptime
seconds. On each run, it selects one database to process and checks
each table within that database. VACUUM or ANALYZE commands are issued
as needed.




Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-10 Thread Jeff Janes
On Fri, Nov 9, 2012 at 4:28 PM, Lists  wrote:


...

> 3) For each of the tables from #2, run the commands
> REINDEX TABLE $table;
> VACUUM FULL ANALYZE $table;
>
> The end result is a squeaky-clean database server with expected disk usage.
>
> NOTES:
...
>
>
> 2) It was sheer chance that I discovered the need to reindex prior to vacuum
> in order to get the disk space back.

As of 9.0, a "vacuum full" inherently does a reindex, so doing an
explicit one is neither necessary nor beneficial.

I don't know if your discovery is based on a non-full vacuum, or on an
older server.


> 5) I don't yet know if the "full" option for the vacuum is necessary to free
> up all space. I will experiment with this and post results if useful.

The answer to this is mostly non-deterministic.  A non-full vacuum can
only free space from the "end" of the table.

If all of your long-lived objects were created before pg_attribute got
bloated and so the bloat was due only to short-lived objects, then
non-full vacuum (if run often enough) should eventually be able to
return that space as the short-lived objects near the end start to go
away.  However, if even a single long-lived object finds itself at the
end of the table, then only a vacuum full will ever be able to reclaim
that space.

Cheers,

Jeff




Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-10 Thread Adrian Klaver

On 11/10/2012 02:08 PM, Scott Marlowe wrote:

On Sat, Nov 10, 2012 at 2:12 PM, Jeff Janes  wrote:

On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe  wrote:


As well, since the default nap time is 1 minute, it will take at least
50 minutes to vacuum each db as nap time is how long autovac waits
between databases.


That isn't how it works.  The naptime is per database, not per
cluster.  If the naptime is 1 minute and there are 50 "active"
databases, then it will launch a new worker every 1.2 seconds
(assuming the old ones finish fast enough that doing so would not
exceed autovacuum_max_workers)


Hmmm.  That was not my understanding from previous discussions on nap time.





When in doubt there are the docs:)

http://www.postgresql.org/docs/9.1/interactive/routine-vacuuming.html#AUTOVACUUM

"The "autovacuum daemon" actually consists of multiple processes. There 
is a persistent daemon process, called the autovacuum launcher, which is 
in charge of starting autovacuum worker processes for all databases. The 
launcher will distribute the work across time, attempting to start one 
worker within each database every autovacuum_naptime seconds. 
(Therefore, if the installation has N databases, a new worker will be 
launched every autovacuum_naptime/N seconds.)"


--
Adrian Klaver
adrian.kla...@gmail.com




Re: PG defaults and performance (was Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum))

2012-11-10 Thread Chris Angelico
On Sun, Nov 11, 2012 at 8:05 AM, Jeff Janes  wrote:
> Totally not.  With default settings and default pgbench, the easiest
> way for host B to beat host A is by lying about the durability of
> fsync.

True. Without the ability to brutally cut the power to a cloud
instance or other remote (and in some cases possibly virtualized)
server, it's practically impossible to test that. We're basically
relying on replication and hoping that three instances (master and two
slaves) don't go down simultaneously, which is hardly a guarantee.

ChrisA




Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-10 Thread Scott Marlowe
On Sat, Nov 10, 2012 at 2:12 PM, Jeff Janes  wrote:
> On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe  wrote:
>>
>> As well, since the default nap time is 1 minute, it will take at least
>> 50 minutes to vacuum each db as nap time is how long autovac waits
>> between databases.
>
> That isn't how it works.  The naptime is per database, not per
> cluster.  If the naptime is 1 minute and there are 50 "active"
> databases, then it will launch a new worker every 1.2 seconds
> (assuming the old ones finish fast enough that doing so would not
> exceed autovacuum_max_workers)

Hmmm.  That was not my understanding from previous discussions on nap time.




Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-10 Thread Jeff Janes
On Fri, Nov 9, 2012 at 5:56 PM, Scott Marlowe  wrote:
>
> As well, since the default nap time is 1 minute, it will take at least
> 50 minutes to vacuum each db as nap time is how long autovac waits
> between databases.

That isn't how it works.  The naptime is per database, not per
cluster.  If the naptime is 1 minute and there are 50 "active"
databases, then it will launch a new worker every 1.2 seconds
(assuming the old ones finish fast enough that doing so would not
exceed autovacuum_max_workers)
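The arithmetic there, as a quick sanity check in SQL:

```sql
-- launcher interval between new workers = autovacuum_naptime / N databases
-- with the default naptime of 1min and 50 databases:
SELECT 60.0 / 50 AS seconds_between_worker_launches;  -- 1.2
```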

Cheers,

Jeff




Re: PG defaults and performance (was Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum))

2012-11-10 Thread Jeff Janes
On Fri, Nov 9, 2012 at 6:17 PM, Chris Angelico  wrote:
> On Sat, Nov 10, 2012 at 12:26 PM, Steve Crawford
>  wrote:
>> Don't do that. Defaults are good for ensuring that PostgreSQL will start on
>> the widest reasonable variety of systems. They are *terrible* for
>> performance and are certainly wrong for the system you describe.
>
> Tuning a PostgreSQL database is a major science, but is there a
> reasonably easy way to get a stable baseline for comparison? We've
> been exploring different hosting options recently, and one thing we
> want to know is how well Postgres will perform. To that end, we've
> been using pgbench on a default configuration Postgres, on the
> expectation that that'll at least be consistent (that is, if a Cloud
> Host A instance does X tps and Cloud Host B does 2*X, then we can
> expect host B to deliver roughly double performance in production).
> How valid is this assumption? Broadly, or totally not?

Totally not.  With default settings and default pgbench, the easiest
way for host B to beat host A is by lying about the durability of
fsync.

Cheers,

Jeff




Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-09 Thread Greg Williamson
Scott -- 

<...>

>To tune autovacuum with 50 databases, start by dropping nap time to
>something much lower, like 10s.  Then if you need to, drop cost delay
>until you get to 0.  If you get to 0 and it's still not hitting your
>IO too hard, but not keeping up, then increase cost limit.  If you get
>to something in the 5000 to 10000 range, and it's still not keeping up,
>then start bumping the thread count.
>


Thanks for outlining a strategy on this -- useful advice.

Greg Williamson





PG defaults and performance (was Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum))

2012-11-09 Thread Chris Angelico
On Sat, Nov 10, 2012 at 12:26 PM, Steve Crawford
 wrote:
> Don't do that. Defaults are good for ensuring that PostgreSQL will start on
> the widest reasonable variety of systems. They are *terrible* for
> performance and are certainly wrong for the system you describe.

Tuning a PostgreSQL database is a major science, but is there a
reasonably easy way to get a stable baseline for comparison? We've
been exploring different hosting options recently, and one thing we
want to know is how well Postgres will perform. To that end, we've
been using pgbench on a default configuration Postgres, on the
expectation that that'll at least be consistent (that is, if a Cloud
Host A instance does X tps and Cloud Host B does 2*X, then we can
expect host B to deliver roughly double performance in production).
How valid is this assumption? Broadly, or totally not?

ChrisA




Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-09 Thread Scott Marlowe
On Fri, Nov 9, 2012 at 5:28 PM, Lists  wrote:
> As I've spent a considerable amount of time trying to sort this out, I'm
> posting it for the benefit of other users.
SNIP

> D) concurrent use of pg_dump;

Not usually a problem, unless it's overloading your IO subsystem.

> C) use of transactions, especially prepared transactions and multiple
> savepoints;
> E) use of numerous databases on a single server, average about 50;

These two can be killers.  Long running transactions can cause
autovacuum processes to stall out or be autocancelled.

As well, since the default nap time is 1 minute, it will take at least
50 minutes to vacuum each db as nap time is how long autovac waits
between databases.

Reducing autovacuum nap time to 5 or 10 seconds would be a good move
here, also possibly making it more aggressive by increasing max worker
threads, decreasing cost delay (possibly to zero or close to it) and /
or increasing cost limit.  After making such a change, watch
iostat while vacuum is running to see how hard it's hitting your IO
subsystem.  I'm guessing that with SSDs it isn't gonna be a big
problem.

As Greg Smith has pointed out in the past, usually the answer to an
autovacuum problem is making it more, not less aggressive.  Unless
you're flooding your IO this is almost always the right answer.  Keep
in mind that autovacuum by default is set up to be VERY unaggressive
because it may be running on a netbook for all it knows.

To tune autovacuum with 50 databases, start by dropping nap time to
something much lower, like 10s.  Then if you need to, drop cost delay
until you get to 0.  If you get to 0 and it's still not hitting your
IO too hard, but not keeping up, then increase cost limit.  If you get
to something in the 5000 to 10000 range, and it's still not keeping up,
then start bumping the thread count.
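As a postgresql.conf sketch of that strategy (values are illustrative starting points, not a recommendation; adjust while watching iostat):

```
# illustrative autovacuum starting point for ~50 databases
autovacuum = on
autovacuum_naptime = 10s               # default is 1min
autovacuum_vacuum_cost_delay = 5ms     # drop toward 0 if IO headroom remains
autovacuum_vacuum_cost_limit = 2000    # raise further only once delay is 0
autovacuum_max_workers = 3             # bump last, if still not keeping up
```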




Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-09 Thread Steve Crawford

On 11/09/2012 04:28 PM, Lists wrote:
As I've spent a considerable amount of time trying to sort this out, 
I'm posting it for the benefit of other users.


Having missed the earlier conversation, a couple of comments:

I've experienced persistent, ongoing issues with autovacuum in a mixed 
read/write environment with midrange hardware (16 core Xeon, 128 GB 
RAM, 200 GB SATA3 6 Gb SSDs for disk I/O on RHEL6) using the defaults 
in the 9.1 RPMs provided by Postgres. (yum.postgresql.org)


Don't do that. Defaults are good for ensuring that PostgreSQL will start 
on the widest reasonable variety of systems. They are *terrible* for 
performance and are certainly wrong for the system you describe.


The cause of this is not yet determined. It may be related to any 
or all of:


A) extensive use of temp tables;
B) extensive use of multiple dblink() calls in a single query;
C) use of transactions, especially prepared transactions and 
multiple savepoints;

D) concurrent use of pg_dump;
E) use of numerous databases on a single server, average about 50;

To offset this, we turned off autovacuum, and used an old script to 
vacuum the tables in the middle of the night when nobody was looking. 
Unfortunately, the vacuum script only vacuumed the "userland" tables 
and tremendous amounts of disk space were being wasted, particularly 
in the pg_attribute tables.


Bloat in pg_attribute would correlate with A) (or any constant 
creation/destruction of tables). You can vacuum and/or reindex the 
system tables if you are connected as the superuser but you are better 
off preventing bloat by appropriate adjustment of your configuration 
settings. However note that if you do frequent bulk creation/destruction 
of tables you could end up bloating the attribute table between vacuum 
runs and may need to periodically manually shrink it.


Cheers,
Steve





Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-09 Thread Lists
As I've spent a considerable amount of time trying to sort this out, I'm 
posting it for the benefit of other users.


I've experienced persistent, ongoing issues with autovacuum in a mixed 
read/write environment with midrange hardware (16 core Xeon, 128 GB RAM, 
200 GB SATA3 6 Gb SSDs for disk I/O on RHEL6) using the defaults in the 
9.1 RPMs provided by Postgres (yum.postgresql.org). The cause of this is 
not yet determined. It may be related to any or all of:


A) extensive use of temp tables;
B) extensive use of multiple dblink() calls in a single query;
C) use of transactions, especially prepared transactions and 
multiple savepoints;

D) concurrent use of pg_dump;
E) use of numerous databases on a single server, average about 50;

To offset this, we turned off autovacuum, and used an old script to 
vacuum the tables in the middle of the night when nobody was looking. 
Unfortunately, the vacuum script only vacuumed the "userland" tables and 
tremendous amounts of disk space were being wasted, particularly in the 
pg_attribute tables.
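(For reference, the bloat in a catalog like pg_attribute can be measured per database with the standard size functions:)

```sql
-- total on-disk size of pg_attribute, including its indexes and TOAST
SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute'));
```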


However, use of any of the statements "vacuum analyze", "vacuum full 
analyze", or "vacuum full verbose analyze" without mentioning specific 
tables did not resolve the extra disk space issue; disk usage still 
remained at least 5x the expected amount in all cases. (In one case, use 
of all of these open-ended vacuum queries did almost nothing.)


Nor did running any variation of "vacuum analyze $table" in a loop through 
all tables (including the pg_* tables) completely resolve the issue.


In order to completely clean things up, we ended up writing a script to do 
the following:


1) Determine the databases using excessive disk space, in descending 
order of use with this query:


SELECT
    d.datname AS name,
    d.datistemplate::int AS datistemplate,
    pg_catalog.pg_get_userbyid(d.datdba) AS owner,
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
         THEN pg_catalog.pg_database_size(d.datname)
         ELSE -1
    END AS size
FROM pg_catalog.pg_database d
ORDER BY
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
         THEN pg_catalog.pg_database_size(d.datname)
         ELSE NULL
    END DESC NULLS FIRST;

2) For each database from #1, get a list of tables to be cleaned up with 
this query:


SELECT
    nspname || '.' || relname AS "table",
    pg_total_relation_size(C.oid) AS "size"
FROM pg_class C
    LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE (true OR (nspname NOT IN ('pg_catalog', 'information_schema')))
    AND C.relkind = 'r'
    AND nspname !~ '^pg_toast'
ORDER BY
    pg_total_relation_size(C.oid) DESC;

3) For each of the tables from #2, run the commands
REINDEX TABLE $table;
VACUUM FULL ANALYZE $table;

The end result is a squeaky-clean database server with expected disk usage.
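(A condensed version of the step-1 query is handy for verifying the result afterward:)

```sql
-- per-database disk usage, largest first
SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
```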

NOTES:

1) The above queries are derived from queries I found while determining how 
much disk space was being used, which is why they return additional columns 
that aren't actually used by the script.


2) It was sheer chance that I discovered the need to reindex prior to 
vacuum in order to get the disk space back.


3) I'd like to get autovacuum to work. I've read suggestions to tweak 
cost_delay and/or cost_limit. I haven't yet determined if the problem is 
I/O based or lock/deadlock based. I'm guessing the problem is the 
latter, though it's hard to tell because queries stack up quickly and 
load average is sky high when autovacuum fails for us.


4) The aforementioned process is S-L-O-W. Think at least hours and 
probably days depending on your databases, your server(s), and the load.


5) I don't yet know if the "full" option for the vacuum is necessary to 
free up all space. I will experiment with this and post results if useful.





Re: [GENERAL] Unexpectedly high disk space usage

2012-11-08 Thread Scott Marlowe
On Thu, Nov 8, 2012 at 6:05 PM, Lists  wrote:
> On 11/07/2012 12:42 PM, Tom Lane wrote:
>>>
>>> ... because it
>>> >occasionally causes transactions and queries to hang when an update
>>> >causes a vacuum mid-day, effectively taking us offline randomly.
>>
>> I suspect this claim is based on ancient and no longer very relevant
>> experience.
>
> Even so, if I felt the need to keep autovacuum off, what would I need to run
> regularly in order to keep things neat and tidy under the hood? Would a
> simple "vacuum" within each database suffice? Should I be logged in as the
> database owner or as an administrative user?

Just know that most of the time people think they need to turn off
autovacuum, they usually need to tune it instead: either more or less
aggressive, depending on why they think they need to turn it off.  If
it's consuming too much IO then reduce the cost limit / increase the
cost delay; if it's not aggressive enough, then reverse that and
increase the cost limit and decrease the cost delay.  If your IO
subsystem can't keep up, then turning off autovacuum or turning it down
will simply be delaying the problem rather than solving it (i.e. throw
more IO at it).
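A back-of-envelope sketch (my own, not from the thread) of how those two knobs bound vacuum I/O. It assumes the stock cost constants (1 point per buffer hit, 10 per miss, 20 per dirtied page) and the 9.1-era autovacuum defaults of cost limit 200 and 20 ms delay:

```python
def max_pages_per_second(cost_limit=200, cost_delay_ms=20, page_cost=10):
    """Upper bound on pages vacuum processes per second for a given per-page cost.

    Vacuum sleeps cost_delay_ms every time it accumulates cost_limit points,
    so throughput scales with the cost limit and inversely with the delay.
    """
    rounds_per_second = 1000.0 / cost_delay_ms
    return cost_limit / page_cost * rounds_per_second

# Defaults: 200 / 10 * 50 = 1000 page misses/sec, i.e. roughly 8 MB/s of reads.
default_rate = max_pages_per_second()
# Scott's "more aggressive" direction: raise the limit, lower the delay.
aggressive_rate = max_pages_per_second(cost_limit=1000, cost_delay_ms=10)
```

Seen this way, "reduce cost limit / increase cost delay" directly caps autovacuum's read bandwidth, and the reverse raises it.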




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-08 Thread Tom Lane
Lists  writes:
> Even so, if I felt the need to keep autovacuum off, what would I need to 
> run regularly in order to keep things neat and tidy under the hood? 
> Would a simple "vacuum" within each database suffice? Should I be logged 
> in as the database owner or as an administrative user?

A plain "vacuum" (or probably better, "vacuum analyze") done as
superuser will suffice, as long as you do it often enough.
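One way to schedule that (the time-of-day and cron file name are placeholders, not from the thread) is a cron entry using vacuumdb, which runs VACUUM ANALYZE across every database in the cluster as the invoking user:

```shell
# /etc/cron.d/pg-vacuum (illustrative): nightly VACUUM ANALYZE of all
# databases, run as the postgres superuser so system catalogs are covered.
30 2 * * *  postgres  vacuumdb --all --analyze --quiet
```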

regards, tom lane




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-08 Thread Lists

On 11/07/2012 12:42 PM, Tom Lane wrote:

... because it
>occasionally causes transactions and queries to hang when an update
>causes a vacuum mid-day, effectively taking us offline randomly.

I suspect this claim is based on ancient and no longer very relevant
experience.
Even so, if I felt the need to keep autovacuum off, what would I need to 
run regularly in order to keep things neat and tidy under the hood? 
Would a simple "vacuum" within each database suffice? Should I be logged 
in as the database owner or as an administrative user?


Thanks,

Ben




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-07 Thread Scott Marlowe
On Wed, Nov 7, 2012 at 3:15 PM, Lists  wrote:
> On 11/07/2012 12:42 PM, Tom Lane wrote:
>>
>> So you've turned off autovacuum, and are carefully not vacuuming the
>> system catalogs.  That's your problem all right.  Is there a
>> particularly good reason why this script isn't a one-liner "VACUUM"?
>
>
> Back in the 8.x days, we experienced "vacuum full analyze" occasionally
> causing other processes to hang/timeout.

That was your first mistake.  By 8.0 the need for vacuum full was
almost zero. Except for instances where bloat got out of hand, vacuum
full should generally be avoided after 8.0.  Regular vacuum should be
plenty, whether it's run by the autovacuum daemon or a cron job. Until
8.3 autovacuum was single threaded and therefore often had trouble
keeping up with bloat.  While vacuum full is a blocking operation,
plain vacuums are not, so unless you REALLY need a vacuum full it
should be avoided.


> In an attempt to minimize the
> impact of the locking, we updated the script to vacuum one table at a time,
> which seemed to work well throughout the 8.x series. I'd happily accept that
> this conclusion may have simply have been wrong, but it worked well enough
> that nobody complained and life was good.

Yeah you still had blocking but it was probably less noticeable.

> After switching to 9.x, we read
> that the "full" vacuum was less useful and so the script was changed to
> "vacuum analyze $table" rather than "vacuum full analyze $table".

Yeah at that point you'd have been better off tuning autovacuum to be
more aggressive and let it do the job.  Generally the time to call
vacuum by hand is right after you've done something like delete half
the rows in a large table.




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-07 Thread Scott Marlowe
On Wed, Nov 7, 2012 at 1:58 PM, Scott Marlowe  wrote:
> On Wed, Nov 7, 2012 at 1:42 PM, Tom Lane  wrote:
>> Lists  writes:
>>
>>> ... because it
>>> occasionally causes transactions and queries to hang when an update
>>> causes a vacuum mid-day, effectively taking us offline randomly.
>>
>> I suspect this claim is based on ancient and no longer very relevant
>> experience.
>
> My experience is that if autovac is causing problems with stalled
> queries etc you're either A: running ancient pg versions (pre 8.3), B:
> Running WAY too aggressive settings in autovac (100 threads, no nap
> time, cost limit of 10 etc.) or C: Your IO subsystem is absolute
> crap.
>
> On any modern server, default autovac settings from 8.3 and on should
> only have the possible problem of not being tuned aggressively enough.

Oh another failure scenario up there is that you're running DDL in
production, which is stalling behind an autovac, and in turn the two
are stalling other queries.  This has happened for me once or twice on
more modern versions (8.3 and 8.4).




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-07 Thread Tom Lane
Jeff Janes  writes:
>> WHERE nspname NOT IN ('pg_catalog', 'information_schema')
>> AND C.relkind <> 'i'
>> AND nspname !~ '^pg_toast'

> I question the wisdom of that where clause (from the wiki)

> If the pg_catalog relations are big, then they are big and why
> shouldn't they get reported as such?

Agreed, please change it.

(The index and toast exclusions are reasonable, since those will be
accounted for in pg_total_relation_size of the parent.  Personally I'd
code the toast exclusion using relkind not a namespace check though.)

regards, tom lane




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-07 Thread Greg Smith

On 11/7/12 3:58 PM, Jeff Janes wrote:

 WHERE nspname NOT IN ('pg_catalog', 'information_schema')


I question the wisdom of that where clause (from the wiki)

If the pg_catalog relations are big, then they are big and why
shouldn't they get reported as such?  And if they are not big, that is
why the limit is there on the wiki page.


The idea was that in a new database with a relatively small number of 
tables, your own tables will be lost among the catalog data unless you 
filter them out.  Testing against an install with a single real table, 
the query there will show something like this right now:


    relation    | total_size
----------------+------------
 public.t       | 3568 kB
 public.t_k_seq | 8192 bytes

But if the filter on pg_catalog is removed, you get this instead:

          relation           | total_size
-----------------------------+------------
 public.t| 3568 kB
 pg_catalog.pg_depend| 808 kB
 pg_catalog.pg_proc  | 752 kB
 pg_catalog.pg_attribute | 568 kB
 pg_catalog.pg_rewrite   | 464 kB
 pg_catalog.pg_description   | 392 kB
 pg_catalog.pg_statistic | 328 kB
 pg_catalog.pg_operator  | 208 kB
 pg_catalog.pg_collation | 152 kB
 pg_catalog.pg_type  | 152 kB
 pg_catalog.pg_amop  | 136 kB
 pg_catalog.pg_class | 136 kB
 pg_catalog.pg_constraint| 112 kB
 pg_catalog.pg_conversion| 104 kB
 pg_catalog.pg_index | 88 kB
 pg_catalog.pg_amproc| 80 kB
 pg_catalog.pg_opclass   | 80 kB
 pg_catalog.pg_ts_config_map | 80 kB
 pg_catalog.pg_cast  | 80 kB
 pg_catalog.pg_authid| 72 kB

That is overload for a lot of people, and confusing to new users. 
That's why I opted for the shorter version.


There's no perfect answer to all use cases here.  This sort of thing is 
why there's three sets of queries for pg_stat_user_tables, 
pg_stat_sys_tables, and pg_stat_all_tables.  The wiki disk space queries 
aim to be like the user tables version from that trio.


Adding a note pointing out that you might want to remove pg_catalog and 
see the size of those relations would be appropriate.  I wouldn't make 
that the default case though, due to the issue highlighted above.  I'd 
rather optimize the initially suggested query so that new users get 
simple output, even if it means that might hide problems on larger 
installs, where the catalog data became big.


The other way I sometimes balance these two requirements--want to show 
all the big data, but not clutter small installs with the catalog--is to 
make the filter size-based instead:


SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
AND relpages > 100
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;


On my trivial test install that gives me just the one user table:

 relation | total_size
----------+------------
 public.t | 3568 kB

While still showing larger catalog tables if they grow to be noticeable.

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-07 Thread Lists

On 11/07/2012 12:58 PM, Scott Marlowe wrote:
My experience is that if autovac is causing problems with stalled 
queries etc you're either A: running ancient pg versions (pre 8.3), B: 
Running WAY too aggressive settings in autovac (100 threads, no nap 
time, cost limit of 10 etc.) or C: Your IO subsystem is absolute 
crap. On any modern server, default autovac settings from 8.3 and on 
should only have the possible problem of not being tuned aggressively 
enough. 


A) We are running PG 9.1.

B) We used the default settings in the RPMs provided by 
yum.postgresql.org. At the bottom of this message is information about 
the RPMs we currently are using.


C) I/O subsystem for PG is twin SATA III SSDs in a RAID 1 configuration, 
capable of tens of thousands of IO operations per second. Servers are 
recent, SATA III, 16-core Xeons with 128 GB ECC RAM in 1U rackmount cases.


As stated previously, we make extensive use of temp tables, 
transactions, and dblink, but had no trouble with catalog table bloat in 
8.x; this is a new phenomenon for us.


# rpm -qi postgresql91-9.1.5-3PGDG.rhel6.x86_64
Name: postgresql91 Relocations: (not relocatable)
Version : 9.1.5 Vendor: (none)
Release : 3PGDG.rhel6   Build Date: Sun 02 Sep 2012 
12:13:18 PM UTC
Install Date: Wed 12 Sep 2012 03:04:24 AM UTC  Build Host: 
koji-sl6-x86-64-pg91
Group   : Applications/DatabasesSource RPM: 
postgresql91-9.1.5-3PGDG.rhel6.src.rpm

Size: 5193673  License: PostgreSQL
Signature   : DSA/SHA1, Sun 02 Sep 2012 12:13:20 PM UTC, Key ID 
1f16d2e1442df0f8

URL : http://www.postgresql.org/
Summary : PostgreSQL client programs and libraries
Description :
PostgreSQL is an advanced Object-Relational database management system
(DBMS) that supports almost all SQL constructs (including
transactions, subselects and user-defined types and functions). The
postgresql package includes the client programs and libraries that
you'll need to access a PostgreSQL DBMS server.  These PostgreSQL
client programs are programs that directly manipulate the internal
structure of PostgreSQL databases on a PostgreSQL server. These client
programs can be located on the same machine with the PostgreSQL
server, or may be on a remote machine which accesses a PostgreSQL
server over a network connection. This package contains the command-line
utilities for managing PostgreSQL databases on a PostgreSQL server.

If you want to manipulate a PostgreSQL database on a local or remote 
PostgreSQL

server, you need this package. You also need to install this package
if you're installing the postgresql91-server package.




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-07 Thread Lists

On 11/07/2012 12:42 PM, Tom Lane wrote:

So you've turned off autovacuum, and are carefully not vacuuming the
system catalogs.  That's your problem all right.  Is there a
particularly good reason why this script isn't a one-liner "VACUUM"?


Back in the 8.x days, we experienced "vacuum full analyze" occasionally 
causing other processes to hang/timeout. In an attempt to minimize the 
impact of the locking, we updated the script to vacuum one table at a 
time, which seemed to work well throughout the 8.x series. I'd happily 
accept that this conclusion may simply have been wrong, but it 
worked well enough that nobody complained and life was good. After 
switching to 9.x, we read that the "full" vacuum was less useful and so 
the script was changed to "vacuum analyze $table" rather than "vacuum 
full analyze $table".



Are you sure that once-a-day vacuuming is sufficient, even if it was
covering the system catalogs?  If you've managed to bloat pg_attribute
to 36GB, I suspect you've got enough churn (perhaps from temp tables)
that you really need the catalogs vacuumed more often.
The only thing that I could find in the docs even mentioning the idea of 
vacuuming catalogs is this sentence:


(A manual VACUUM should fix the problem, as suggested by the hint; but 
note that the VACUUM must be performed by a superuser, else it will fail 
to process system catalogs and thus not be able to advance the 
database's datfrozenxid.)

http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html

This does NOT clearly say that the end user could vacuum catalogs, let 
alone that it's necessary or even a good idea. Otherwise, the only 
mention is of tables, and there's no mention of the idea that tables are 
anything but user space.
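For what it's worth, the wraparound risk that hint is about can be watched directly. This is my own sketch (not from the thread): it prints a query you can pipe into psql as a superuser, with the 200,000,000 threshold mirroring the default autovacuum_freeze_max_age.

```shell
# Build a wraparound check; run it with e.g.: echo "$SQL" | psql -U postgres
SQL="SELECT datname,
       age(datfrozenxid) AS xid_age,
       age(datfrozenxid) > 200000000 AS past_freeze_max_age
FROM pg_database
ORDER BY 2 DESC;"
echo "$SQL"
```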



My advice is dump, reload, and *don't* turn off autovacuum.


... because it
occasionally causes transactions and queries to hang when an update
causes a vacuum mid-day, effectively taking us offline randomly.

I suspect this claim is based on ancient and no longer very relevant
experience.

We tried several times to turn on autovacuum with 9.1 and had problems 
every time. If our use case is particularly special, I'd love to work 
with you to get autovacuum to work in our situation too as it would make 
life easier for us! But for the past few months, every time we've turned 
it on, we've had our phones swamped with customers who are unable to use 
our system while our application monitors scream bloody murder, at least 
weekly.


From what we could tell (under extreme pressure to get it all working 
again ASAP, mind you) it seemed that when doing a large update from 
within a transaction, autovacuum would get triggered before the 
transaction completed, causing the transaction to hang or at least slow 
way down, causing timeouts to occur with load balancers, so customers 
would then try again, compounding the ongoing problem. Pretty soon you 
have not only I/O issues, but also locking issues and upset customers. 
This issue may be compounded because we make fairly extensive use of 
dblink and temp tables to aggregate data for our customers who have 
multiple sites.


-Ben




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-07 Thread Scott Marlowe
On Wed, Nov 7, 2012 at 1:42 PM, Tom Lane  wrote:
> Lists  writes:
>
>> ... because it
>> occasionally causes transactions and queries to hang when an update
>> causes a vacuum mid-day, effectively taking us offline randomly.
>
> I suspect this claim is based on ancient and no longer very relevant
> experience.

My experience is that if autovac is causing problems with stalled
queries etc you're either A: running ancient pg versions (pre 8.3), B:
Running WAY too aggressive settings in autovac (100 threads, no nap
time, cost limit of 10 etc.) or C: Your IO subsystem is absolute
crap.

On any modern server, default autovac settings from 8.3 and on should
only have the possible problem of not being tuned aggressively enough.




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-07 Thread Jeff Janes
Hi Greg, I've added you to the cc list because I'm proposing to change
some wiki content which you wrote.

On Wed, Nov 7, 2012 at 11:54 AM, Lists  wrote:
> On 11/07/2012 09:01 AM, Jeff Janes wrote:
>>
>> Ben, did you ever figure out where the space was going?
>
>
>
> Now, here's where it gets weird. From the disk space usage wiki,
> (http://wiki.postgresql.org/wiki/Disk_Usage) you'll recognize how I used it
> to get a total disk space used result:
>
> with mytable AS (
> SELECT
> nspname || '.' || relname AS "relation",
> pg_total_relation_size(C.oid) AS "size"
> FROM
> pg_class C
> LEFT JOIN pg_namespace N ON
> (N.oid = C.relnamespace)
> WHERE nspname NOT IN ('pg_catalog', 'information_schema')
> AND C.relkind <> 'i'
> AND nspname !~ '^pg_toast'
> ORDER BY
> pg_total_relation_size(C.oid) DESC
> )
> SELECT sum(size) AS size FROM mytable

I question the wisdom of that where clause (from the wiki)

If the pg_catalog relations are big, then they are big and why
shouldn't they get reported as such?  And if they are not big, that is
why the limit is there on the wiki page.

I'm tempted to go change it, but maybe there is a good reason it is
there which I do not understand.



...
>
> Google returns this page:
> http://www.postgresql.org/docs/9.1/static/catalog-pg-attribute.html which
> doesn't help me much. So, am I doing something wrong with admin? Our current
> process is that every night in the middle of the night, a script connects to
> each database on each server and runs a query to get all tables in each
> database and, for each, run
>
> "VACUUM ANALYZE $table"
>
> for each table in the database.


I take it your script that does that is not including the pg_catalog tables?

Why not just run "vacuum analyze" and let it do the entire database?


> I will note that autovacuum is off because it occasionally causes
> transactions and queries to hang when an update causes a vacuum mid-day,
> effectively taking us offline randomly.

Hang as in they are blocking on locks?  Or they just get slow because
the autovacuum is consuming too much IO?
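One way to distinguish the two cases (a sketch of mine, using the 9.1-era pg_stat_activity column names; they changed in 9.2) is to look for sessions stuck waiting on locks while the stall is happening:

```shell
# Print a lock-wait check for a 9.1 server; run it with e.g.:
#   psql -U postgres -d mydb -c "$SQL"
SQL="SELECT procpid, waiting, current_query,
       now() - query_start AS runtime
FROM pg_stat_activity
WHERE waiting
ORDER BY query_start;"
echo "$SQL"
```

Rows here during a stall point at lock blocking; an empty result with high load points at I/O saturation instead.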

Cheers,

Jeff




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-07 Thread Tom Lane
Lists  writes:
>   pg_catalog.pg_attribute  | 36727480320

Ouch.

> Our current process is that every night in the middle of the night, a 
> script connects to each database on each server and runs a query to get 
> all tables in each database and, for each, run
> "VACUUM ANALYZE $table"
> for each table in the database.
> (note: there is a database for the "postgres" user on each DB server) 
> The script is a remnant from PG 8.x days, so am I missing something 
> fundamental about 9.x? I will note that autovacuum is off ...

So you've turned off autovacuum, and are carefully not vacuuming the
system catalogs.  That's your problem all right.  Is there a
particularly good reason why this script isn't a one-liner "VACUUM"?
Are you sure that once-a-day vacuuming is sufficient, even if it was
covering the system catalogs?  If you've managed to bloat pg_attribute
to 36GB, I suspect you've got enough churn (perhaps from temp tables)
that you really need the catalogs vacuumed more often.
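To illustrate the churn mechanism (my own sketch, not code from the thread): every temp table inserts one pg_attribute row per user column, plus roughly six more for the system columns, and dropping the table leaves all of those rows dead until the catalogs are vacuumed.

```python
def churn_statements(n_tables, n_cols=10):
    """Generate hypothetical create/drop churn like an app using temp tables."""
    stmts = []
    for i in range(n_tables):
        cols = ", ".join("c%d int" % j for j in range(n_cols))
        stmts.append("CREATE TEMP TABLE churn_%d (%s); DROP TABLE churn_%d;"
                     % (i, cols, i))
    return stmts

# A workload running statements like these thousands of times a day,
# vacuumed only nightly (or never, for the catalogs), accumulates dead
# pg_attribute rows at roughly (n_cols + 6) per dropped table.
sample = churn_statements(3)
```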

My advice is dump, reload, and *don't* turn off autovacuum.

> ... because it 
> occasionally causes transactions and queries to hang when an update 
> causes a vacuum mid-day, effectively taking us offline randomly.

I suspect this claim is based on ancient and no longer very relevant
experience.

regards, tom lane




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-07 Thread Lists

On 11/07/2012 09:01 AM, Jeff Janes wrote:

Ben, did you ever figure out where the space was going?


I think we've found where the space is going, but I still don't know 
how to resolve it. I modified your query as follows to get a total of 
the space used, and got an answer that matches closely:


with stuff as (SELECT d.datname as Name, 
pg_catalog.pg_get_userbyid(d.datdba) as Owner,

CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_database_size(d.datname)
ELSE -1
END as Size
FROM pg_catalog.pg_database d
order by
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_database_size(d.datname)
ELSE NULL
END desc nulls first) SELECT sum(size) AS overall from stuff;
Result: 171,276,369,124

# du -sbc /var/lib/pgsql/9.1/data/*
Result: 172,087,129,512

Now, the question is: several databases report disk usage dramatically 
different from what I get after a dump/restore to another machine:


Production:
 santarosa444| postgres | 44 GB

Dump/Restore:
 santarosa444| postgres | 685 MB

Now, here's where it gets weird. From the disk space usage wiki, 
(http://wiki.postgresql.org/wiki/Disk_Usage) you'll recognize how I used 
it to get a total disk space used result:


with mytable AS (
SELECT
nspname || '.' || relname AS "relation",
pg_total_relation_size(C.oid) AS "size"
FROM
pg_class C
LEFT JOIN pg_namespace N ON
(N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY
pg_total_relation_size(C.oid) DESC
)
SELECT sum(size) AS size FROM mytable

... but the total result is 747,569,152 which is close to the 
dump/restore value, not the production server value, even though I'm 
running this query on the production server. So there's *something* that 
the latter query isn't identifying that the former is.


On a hunch, ran this query:

with mytable AS (
SELECT
nspname || '.' || relname AS "relation",
pg_total_relation_size(C.oid) AS "size"
FROM
pg_class C
LEFT JOIN pg_namespace N ON
(N.oid = C.relnamespace) WHERE (true or 
(nspname NOT IN ('pg_catalog', 'information_schema')))

AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY
pg_total_relation_size(C.oid) DESC
)
SELECT sum(size) AS size FROM mytable;

And the result is 46,771,216,384! Removing the "mytable" wrapper stuff, 
here are the top results:


 pg_catalog.pg_attribute  | 36727480320
 pg_catalog.pg_attrdef| 3800072192
 pg_catalog.pg_depend | 2665930752
 pg_catalog.pg_class  | 1508925440
 pg_catalog.pg_type   | 1113038848
 public.att_claims| 451698688
 public.stgrades  | 127639552
 pg_catalog.pg_index  | 107806720


Google returns this page: 
http://www.postgresql.org/docs/9.1/static/catalog-pg-attribute.html 
which doesn't help me much. So, am I doing something wrong with admin? 
Our current process is that every night in the middle of the night, a 
script connects to each database on each server and runs a query to get 
all tables in each database and, for each, run


"VACUUM ANALYZE $table"

for each table in the database.

And then once a week:
psql -U postgres -c "\l" | grep -Po "(\w+444)" | xargs -t -i psql -U 
postgres {} -c "REINDEX DATABASE {};"


(note: there is a database for the "postgres" user on each DB server) 
The script is a remnant from PG 8.x days, so am I missing something 
fundamental about 9.x? I will note that autovacuum is off because it 
occasionally causes transactions and queries to hang when an update 
causes a vacuum mid-day, effectively taking us offline randomly. Our 
scenario is pretty much a worst-possible case of transactions, prepared 
transactions, temp tables, and concurrent read/write queries.





Re: [GENERAL] Unexpectedly high disk space usage

2012-11-07 Thread Jeff Janes
On Tue, Nov 6, 2012 at 11:28 AM, Tom Lane  wrote:
> Jeff Janes  writes:
>> On Tue, Nov 6, 2012 at 10:49 AM, Lists  wrote:
>>> I followed your example, the result is at the bottom. Based on this it would
>>> seem that there are 3-4 databases that seem to be the culprit. How could I
>>> get more depth/detail on what specifically is the problem?
>
>> If you have installed the contrib modules (oid2name specifically), you
>> can use that to get the name of the bloated database:
>> oid2name | fgrep 607471
>
> Or, if you didn't install contrib, try
>
> select datname from pg_database where oid = 607471

Thanks, I knew there had to be a more direct way to do that.

>
>> If the name of the database doesn't give you any insight, then look
>> for large files in the directory base/607471 that whose names all
>> start with the same digits and use oid2name to get the names of the
>> relations for those files.
>
>> oid2name -d   -o 
>
> For this you can try
>
> select relname from pg_class where relfilenode = 
>
> Or let the database do the work:
>
> select relname, pg_relation_size(oid) from pg_class order by 2 desc;

Ben described using something like this method originally and not
finding the space, so I wanted to work backwards from certain
knowledge of where the OS says the space is being used.

But now I think maybe his scripts to aggregate table sizes over all
databases (and also his script to load pg_dumps of those databases
into a new cluster) are accidentally omitting some databases--the
largest ones.

Is there a simple query for a super-user to get a list of all relation
sizes over all databases cluster-wide?

If "\l+" can get the size of databases other than the one currently
connected to, maybe there is a way to extend that to tables in those
other databases.

It would at least be nice to be able to get the sizes of all
databases.  Since '\l+' doesn't sort by size and I don't know how to
make it do so, I pulled the query from psql source code and modified
it:

SELECT d.datname as Name,  pg_catalog.pg_get_userbyid(d.datdba) as Owner,
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
ELSE 'No Access'
END as Size
FROM pg_catalog.pg_database d
order by
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_database_size(d.datname)
ELSE NULL
END desc nulls first;

(And discovered a long forgotten unused database I had sitting around
taking up space)

Ben, did you ever figure out where the space was going?

Cheers,

Jeff




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-06 Thread Tom Lane
Jeff Janes  writes:
> On Tue, Nov 6, 2012 at 10:49 AM, Lists  wrote:
>> I followed your example, the result is at the bottom. Based on this it would
>> seem that there are 3-4 databases that seem to be the culprit. How could I
>> get more depth/detail on what specifically is the problem?

> If you have installed the contrib modules (oid2name specifically), you
> can use that to get the name of the bloated database:
> oid2name | fgrep 607471

Or, if you didn't install contrib, try

select datname from pg_database where oid = 607471

> If the name of the database doesn't give you any insight, then look
> for large files in the directory base/607471 that whose names all
> start with the same digits and use oid2name to get the names of the
> relations for those files.

> oid2name -d   -o 

For this you can try

select relname from pg_class where relfilenode = 

Or let the database do the work:

select relname, pg_relation_size(oid) from pg_class order by 2 desc;

regards, tom lane




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-06 Thread Jeff Janes
On Tue, Nov 6, 2012 at 10:49 AM, Lists  wrote:
> I followed your example, the result is at the bottom. Based on this it would
> seem that there are 3-4 databases that seem to be the culprit. How could I
> get more depth/detail on what specifically is the problem?

If you have installed the contrib modules (oid2name specifically), you
can use that to get the name of the bloated database:

oid2name | fgrep 607471

If the name of the database doesn't give you any insight, then look
for large files in the directory base/607471 whose names all
start with the same digits and use oid2name to get the names of the
relations for those files.

oid2name -d   -o 

Cheers,

Jeff




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-06 Thread Lists
I followed your example, the result is at the bottom. Based on this it 
would seem that there are 3-4 databases that are the culprits. How 
could I get more depth/detail on what specifically is the problem?


-Ben

On 11/05/2012 07:10 PM, Scott Marlowe wrote:
What does du -sh have to say about it? Use unix tools to examine your
file system and see where the usage is going. For instance, I can do
this:

cd /var/lib/postgresql/8.4/main/
du -s *|sort -n
0       server.crt
0       server.key
4       pg_tblspc
4       pg_twophase
4       PG_VERSION
4       postmaster.opts
4       postmaster.pid
12      pg_clog
12      pg_stat_tmp
12      pg_subtrans
28      pg_multixact
460     global
16392   pg_xlog
16396   base

which tells me that I'm using about 16MB for each of pg_xlog and base.
I can then cd into base and look around:

cd base
du -s *|sort -n
5416    1
5416    11563
5560    11564

Which shows me using about 5MB each for three different dbs. And so on.
On an off guess, did you go from a SQL_ASCII encoding to UTF8? That
might increase disk space usage a bit.


[root@delta ~]# cd /var/lib/pgsql/9.1/data/
[root@delta data]# du -s * | sort -n
4   pg_ident.conf
4   pg_serial
4   pg_tblspc
4   PG_VERSION
4   postmaster.opts
4   postmaster.pid
8   pg_hba.conf
12  pg_notify
12  pg_twophase
20  postgresql.300
20  postgresql.conf
20  postgresql.conf.20120903
20  postgresql.conf.300
76  pg_subtrans
104 pg_multixact
15044   pg_log
18184   global
25216   pg_stat_tmp
47916   pg_clog
671916 pg_xlog
164753204   base

[root@delta data]# cd base
[root@delta base]# du -s * | sort -n
4   pgsql_tmp
6124    12772
6388    12780
6424    1
72424   331506
72700   160676
72896   391655
73200   52389
73216   523672
74104   619675
74956   295646
76768   307580
77896   547597
80824   571547
87368   475799
90940   631604
113876  124651
123548  148525
130096  367533
149792  439726
173648  355578
175404  679545
190732  559580
225780  511706
326468  667547
352736  655477
398736  535644
469408  136582
483716  499753
513124  270926
575612  715601
590408  487780
04  463779
713208  643540
714896  583515
803216  343438
806952  427663
855156  739506
872200  197221
975692  64371
987692  775594
1005268 595488
1024812 691482
1042212 727552
1047464 379566
1260044 76601
1276756 16384
1345072 403667
1474468 209158
1477808 172604
1536168 221124
1637652 258798
1811504 88598
1963740 245588
2076748 703467
2193536 415671
2430908 801322
2552640 319552
2785212 28315
3454880 112612
3755548 451666
3929420 100666
4651876 40451
5714940 751514
6257740 233293
7313900 184735
9334796 763606
10940780    283609
20837264    788338
45285640    607471
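The numeric directory names under base/ are database OIDs, so the listing above doesn't say which database owns the 45 GB directory. A minimal sketch of pairing OIDs with database names — the psql invocation in the comment assumes local superuser access, and the map and du figures below are sample stand-ins, not real data:

```shell
# Each directory under base/ is named for a database OID. The real
# OID-to-name map would come from the server, e.g. (assumption: local
# psql access):
#   psql -qAt -F ' ' -c "SELECT oid, datname FROM pg_database" > oidmap
# Sample map and sample `du -s` output stand in for live data here:
cat > oidmap <<'EOF'
1 template1
16384 mydb
EOF
while read kb oid; do
  name=$(awk -v o="$oid" '$1 == o { print $2 }' oidmap)
  echo "$oid ${name:-unknown} ${kb}KB"
done <<'EOF'
5416 1
1276756 16384
EOF
```

The same mapping can also be done with the contrib tool oid2name, where installed.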


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Unexpectedly high disk space usage

2012-11-06 Thread Lists

Jeff, thanks for the feedback!

On 11/05/2012 08:51 PM, Jeff Janes wrote:

My first guesses would be things that are not directly under the
databases control, such as:

1) your server logs are accumulating and you aren't doing anything about them


I'm guessing that this is not the case:

[root@delta data]# du -shc *  | grep -i log
47M pg_clog
15M pg_log
641Mpg_xlog


2) you are taking backup snapshots to somewhere in that directory and
not cleaning them up


Our backup snapshots (taken with pg_dump) are taken on a different 
server over the network. Dumps are made several times during each day. 
Could this be part of the problem if (somehow) they didn't complete? And 
if so, would there be some cleanup I'd have to do other than restarting PG?



3) your archive_command is failing (which you should see reports of in
the server logs) and so you are accumulating xlog files.


As I understand things, the result above under 1) shows that this also 
is not the cause.





Re: [GENERAL] Unexpectedly high disk space usage

2012-11-05 Thread Jeff Janes
On Mon, Nov 5, 2012 at 7:01 PM, Lists  wrote:
> We upgraded to 9.1 from 8.4 over the summer a few months ago, to new DB
> servers with more disk space and memory. Unexpectedly, the DB servers have
> steadily increased their disk space usage since. Reported system load
> doesn't seem to be affected. It's happening to all our DB servers running
> 9.1.
>
> When we reload all pg_dumps from our worst-affected server into an offline
> server, the disk space usage is about 26 GB, but the production database is
> using 166 GB. (# df /var/lib/pgsql;)

My first guesses would be things that are not directly under the
databases control, such as:

1) your server logs are accumulating and you aren't doing anything about them
2) you are taking backup snapshots to somewhere in that directory and
not cleaning them up
3) your archive_command is failing (which you should see reports of in
the server logs) and so you are accumulating xlog files.
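For guess 3, a quick sketch of a log check — the log directory path is an assumption based on a stock RHEL 9.1 install, so adjust it to your log_directory setting:

```shell
# Scan the server log for archive_command failures (path is an
# assumption; adjust to your log_directory setting). PostgreSQL reports
# these as "archive command failed with exit code N".
LOGDIR=${LOGDIR:-/var/lib/pgsql/9.1/data/pg_log}
grep -ih "archive command failed" "$LOGDIR"/*.log 2>/dev/null | tail -n 5
```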

Cheers,

Jeff




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-05 Thread Scott Marlowe
On Mon, Nov 5, 2012 at 8:01 PM, Lists  wrote:
> We upgraded to 9.1 from 8.4 over the summer a few months ago, to new DB
> servers with more disk space and memory. Unexpectedly, the DB servers have
> steadily increased their disk space usage since. Reported system load
> doesn't seem to be affected. It's happening to all our DB servers running
> 9.1.
>
> When we reload all pg_dumps from our worst-affected server into an offline
> server, the disk space usage is about 26 GB, but the production database is
> using 166 GB. (# df /var/lib/pgsql;)
>
> To resolve this, we've tried:
>
> 1) reindexed everything (cut about 10% of disk usage temporarily)
>
> 2) tried vacuum full, and vacuum analyze on all databases. (to minimal
> effect)
>
> 3) Restarting PG (no discernable effect) including a full stop/start.
>
> 4) We've looked for stale prepared transactions (none found)
>
> 5) instructions from the wiki to try to determine what the cause of all the
> disk usage is: http://wiki.postgresql.org/wiki/Disk_Usage but when we add up
> all the results for all the different databases, tables, indexes, etc. in a
> script, we get a number very close to the usage of the freshly loaded
> server. (24 GB)

What does du -sh have to say about it?  Use unix tools to examine your
file system and see where the usage is going.  For instance, I can do
this:

cd /var/lib/postgresql/8.4/main/
du -s *|sort -n
0   server.crt
0   server.key
4   pg_tblspc
4   pg_twophase
4   PG_VERSION
4   postmaster.opts
4   postmaster.pid
12  pg_clog
12  pg_stat_tmp
12  pg_subtrans
28  pg_multixact
460 global
16392   pg_xlog
16396   base

which tells me that I'm using about 16MB for each pg_xlog and base.  I
can then do cd into base and look around:

cd base
du -s *|sort -n
5416    1
5416    11563
5560    11564

Which shows me using about 5MB each for three different dbs.

And so on.

On an off guess, did you go from a SQL_ASCII encoding to UTF8?  That
might increase disk space usage a bit.




[GENERAL] Unexpectedly high disk space usage

2012-11-05 Thread Lists
We upgraded to 9.1 from 8.4 over the summer a few months ago, to new DB 
servers with more disk space and memory. Unexpectedly, the DB servers 
have steadily increased their disk space usage since. Reported system 
load doesn't seem to be affected. It's happening to all our DB servers 
running 9.1.


When we reload all pg_dumps from our worst-affected server into an 
offline server, the disk space usage is about 26 GB, but the production 
database is using 166 GB. (# df /var/lib/pgsql;)


To resolve this, we've tried:

1) reindexed everything (cut about 10% of disk usage temporarily)

2) tried vacuum full, and vacuum analyze on all databases. (to minimal 
effect)


3) Restarting PG (no discernable effect) including a full stop/start.

4) We've looked for stale prepared transactions (none found)

5) instructions from the wiki to try to determine what the cause of all 
the disk usage is: http://wiki.postgresql.org/wiki/Disk_Usage but when 
we add up all the results for all the different databases, tables, 
indexes, etc. in a script, we get a number very close to the usage of 
the freshly loaded server. (24 GB)
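The cross-check in 5) can be scripted — a sketch, with the data directory path an assumption for a stock 9.1 install:

```shell
# Sum per-subdirectory disk usage (KB) under the data directory and
# report the total in GB, for comparison against `df` and against the
# catalog-based totals from the wiki queries.
PGDATA=${PGDATA:-/var/lib/pgsql/9.1/data}
du -sk "$PGDATA"/* 2>/dev/null | awk '{ kb += $1 } END { printf "%.1f GB\n", kb / 1024 / 1024 }'
```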


What is Postgres doing with ~80% of its disk space usage? This is not 
normal, is it? I would hate to have to take the servers offline just to 
dump/restore in order to bring disk usage back to normal...



SYSTEM SPECS:
I've attached the postgresql.conf on a RHEL6/64 Linux server with 128 GB 
Of RAM and 16 real CPU cores. (HT turned on, 32 CPUs according to the O/S)


#FROM: sysctl.conf:
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 136365211648
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296


-Ben
# -
# PostgreSQL configuration file
# -
#
# This file consists of lines of the form:
#
#   name = value
#
# (The "=" is optional.)  Whitespace may be used.  Comments are introduced with
# "#" anywhere on a line.  The complete list of parameter names and allowed
# values can be found in the PostgreSQL documentation.
#
# The commented-out settings shown in this file represent the default values.
# Re-commenting a setting is NOT sufficient to revert it to the default value;
# you need to reload the server.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal.  If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use "pg_ctl reload".  Some
# parameters, which are marked below, require a server shutdown and restart to
# take effect.
#
# Any parameter can also be given as a command-line option to the server, e.g.,
# "postgres -c log_connections=on".  Some parameters can be changed at run time
# with the "SET" SQL command.
#
# Memory units:  kB = kilobytesTime units:  ms  = milliseconds
#MB = megabytes s   = seconds
#GB = gigabytes min = minutes
#   h   = hours
#   d   = days


#--
# FILE LOCATIONS
#--

# The default values of these variables are driven from the -D command-line
# option or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'   # use data in another directory
# (change requires restart)
#hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file
# (change requires restart)
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file
# (change requires restart)

# If external_pid_file is not explicitly set, no extra PID file is written.
#external_pid_file = '(none)'   # write an extra PID file
# (change requires restart)


#--
# CONNECTIONS AND AUTHENTICATION
#--

# - Connection Settings -

listen_addresses = '*'  # what IP address(es) to listen on;
# comma-separated list of addresses;
# defaults to 'localhost', '*' = all
# (change requires restart)
port = 5432 # (change requires restart)
# max_connections = 150 # (change requires restart)
# Note:  Increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).
#superuser_reserved_connections = 3 # (change requires restart)
#unix_socket_directory = '' # (change requires restart)
#un