Re: [PERFORM] Ineffective autovacuum

2011-09-27 Thread Marti Raudsepp
1. First things first: vacuum cannot delete tuples that are still
visible to any old running transactions. You might have some very long
queries or transactions that prevent it from cleaning properly:

select * from pg_stat_activity where xact_start  now()-interval '10 minutes';

2. On 8.3 and earlier servers with large tables, it's critical that
your max_fsm_pages and max_fsm_relations are tuned properly. Failing
that, autovacuum will permanently leak space that can only be fixed
with a VACUUM FULL (which will take an exclusive lock and run for a
very long time)

PostgreSQL version 8.4 addressed this problem, but for the
unfortunate, you have to follow the tuning advice here:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#autovacuum_max_fsm_pages.2C_max_fsm_relations

On Tue, Sep 27, 2011 at 08:08, Royce Ausburn royce...@inomial.com wrote:
 I've noticed that the same tables seem to be auto vacuum'd over and over 
 again… Some of the tables are a bit surprising in that they're updated 
 semi-regularly, but not enough (I'd think) to warrant an autovacuum every few 
 minutes… Is this unusual?

Maybe they're just auto-analyze processes? Those get triggered on
insert-only tables too, when vacuum normally wouldn't run.

 Perhaps unrelated: I've done some digging around and happened across a 
 nightly task doing:
 select pg_stat_reset()

AFAIK (but I could be wrong), vacuum uses a separate set of statistics
not affected by pg_stat_reset.

Regards,
Marti

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Ineffective autovacuum

2011-09-27 Thread Tom Lane
Royce Ausburn royce...@inomial.com writes:
 Since sending this first email I've up'd the autovacuum log level and I've 
 noticed that the same tables seem to be auto vacuum'd over and over again… 
 Some of the tables are a bit surprising in that they're updated 
 semi-regularly, but not enough (I'd think) to warrant an autovacuum every few 
 minutes… Is this unusual?

Well, that proves autovacuum isn't getting blocked anyway.  At this
point I suspect that Marti has fingered the correct issue: you likely
need to increase the FSM settings.  You should try running a manual
VACUUM VERBOSE and see if it suggests that more FSM space is needed
(there'll be some FSM stats at the end of the verbose printout).

 Perhaps unrelated: I've done some digging around and happened across a 
 nightly task doing:
 select pg_stat_reset()
 on each of the databases in the cluster…. I've no idea why we're doing that 
 (and our usual sysadmin / DBA has resigned, so I doubt I'll ever know).  
 There must have been a reason at the time, but I wonder if this might be 
 interfering with things?

Hmm, it's not helping any.  Anything that needs vacuuming, but less
often than once a day, would get missed due to the stats getting
forgotten.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Ineffective autovacuum

2011-09-27 Thread Scott Marlowe
On Tue, Sep 27, 2011 at 7:49 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Royce Ausburn royce...@inomial.com writes:
 Since sending this first email I've up'd the autovacuum log level and I've 
 noticed that the same tables seem to be auto vacuum'd over and over again… 
 Some of the tables are a bit surprising in that they're updated 
 semi-regularly, but not enough (I'd think) to warrant an autovacuum every 
 few minutes… Is this unusual?

 Well, that proves autovacuum isn't getting blocked anyway.  At this
 point I suspect that Marti has fingered the correct issue: you likely
 need to increase the FSM settings.  You should try running a manual
 VACUUM VERBOSE and see if it suggests that more FSM space is needed
 (there'll be some FSM stats at the end of the verbose printout).

That's the probably the best first step, a good second one might be to
increase the aggressiveness of autovac by lowering the delay, and
increasing the cost limit.

OP: You need to watch it a little closer during the day.  first do as
suggested and increase the max_fsm_pages.  High settings on it don't
cost a lot as they're only 6 bytes per page.  So 1M max_fsm_pages
costs 6M of shared RAM.  After that run vacuum verbose every hour or
two to keep an eye on the trend of how many pages it says are needed.
If that number doesn't stabilize, but just keeps growing then you're
not vacuuming aggressively enough.  Up autovacuum_vacuum_cost_limit by
a couple of factors, and lower autovacuum_vacuum_cost_delay to 5ms or
less.  Make sure you don't swamp your IO subsystem.  On big machines
with lots of spindles it's hard to swamp the IO.  On smaller
workstation class machines it's pretty easy.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Ineffective autovacuum

2011-09-27 Thread Royce Ausburn

On 27/09/2011, at 8:29 PM, Marti Raudsepp wrote:

 1. First things first: vacuum cannot delete tuples that are still
 visible to any old running transactions. You might have some very long
 queries or transactions that prevent it from cleaning properly:
 
 select * from pg_stat_activity where xact_start  now()-interval '10 minutes';

Thanks -- that query is very handy.  I suspect this might be the cause of our 
woes as this query results in a handful of long lived connections, however 
they're connections to databases other than the one that I'm having trouble 
with.  

I've checked up on the FSM as you suggested, I don't think that's the problem 
as there're no warnings in the verbose output nor the logs.  But another clue:

DETAIL:  93 dead row versions cannot be removed yet.

After clearing those stuffed transactions vacuum verbose manages to clear away 
all the dead rows… That's confirmation enough for me - Now to find the 
application bugs - Thanks Tom, Marti  Scott for your help!

--Royce



[PERFORM] Ineffective autovacuum

2011-09-26 Thread Royce Ausburn
Hi all,

I have a problem with autovacuum apparently not doing the job I need it to do.

I have a table named datasession that is frequently inserted, updated and 
deleted from.  Typically the table will have a few thousand rows in it.  Each 
row typically survives a few days and is updated every 5 - 10 mins.  The 
application receives unreliable, potentially duplicate data from its source, so 
this table is heavily used for synchronising application threads as well.  A 
typical access pattern is:

- tx begin
- SELECT FOR UPDATE on a single row
- Do some application processing (1 - 100 ms)
- Possibly UPDATE the row
- tx commit

In a few instances of our application we're seeing this table grow obscenely to 
the point where our monitoring servers get us out of bed to manually vacuum.  I 
like sleep, so I want to fix this =D

I've read some recent threads and found a discussion (below) on auto vacuum 
that mentions auto vacuum will be cancelled when a client requests a lock that 
auto vacuum is using… My questions:

1) Does it look like I'm affected by the same problem as in the below 
discussion?

2) Are there better solutions to this problem than a periodic task that 
vacuums/truncates-and-rebuilds the table?  


Perhaps relevant info:


# select version();
 version
  
--
 PostgreSQL 8.3.12 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.4 
(Ubuntu 4.2.4-1ubuntu3)
(1 row)

Auto vacuum and vacuum parameters are set to the factory defaults.

Cheers,

--Royce

 From: Tom Lane t...@sss.pgh.pa.us
 Subject: Re: [GENERAL] Vacuum as easily obtained locks 
 Date: 4 August 2011 1:52:02 AM AEST
 To: Michael Graham mgra...@bloxx.com
 Cc: Pavan Deolasee pavan.deola...@gmail.com, pgsql-gene...@postgresql.org
 
 On Wed, 2011-08-03 at 11:40 -0400, Tom Lane wrote:
 The other problem is that once autovacuum has gotten the lock, it has
 to keep it for long enough to re-scan the truncatable pages (to make
 sure they're still empty).  And it is set up so that any access to the
 table will kick autovacuum off the lock.  An access pattern like that
 would very likely prevent it from ever truncating, if there are a lot
 of pages that need to be truncated.  (There's been some discussion of
 modifying this behavior, but nothing's been done about it yet.) 

 Michael Graham mgra...@bloxx.com writes:
 Ah!  This looks like it is very much the issue.  Since I've got around
 150GB of data that should be truncatable and a select every ~2s.
 
 Just to confirm would postgres write:
 
 2011-08-03 16:09:55 BST ERROR:  canceling autovacuum task
 2011-08-03 16:09:55 BST CONTEXT:  automatic vacuum of table
 traffic.public.logdata5queue
 
 Under those circumstances?
 
 Yup ...
 
 If you do a manual VACUUM, it won't allow itself to get kicked off the
 lock ... but as noted upthread, that will mean your other queries get
 blocked till it's done.  Not sure there's any simple fix for this that
 doesn't involve some downtime.
 
   regards, tom lane
 
 -- 
 Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general




Re: [PERFORM] Ineffective autovacuum

2011-09-26 Thread Tom Lane
Royce Ausburn royce...@inomial.com writes:
 I have a problem with autovacuum apparently not doing the job I need it to do.

Hm, I wonder whether you're getting bit by bug #5759, which was fixed
after 8.3.12.

 I have a table named datasession that is frequently inserted, updated and 
 deleted from.  Typically the table will have a few thousand rows in it.  Each 
 row typically survives a few days and is updated every 5 - 10 mins.  The 
 application receives unreliable, potentially duplicate data from its source, 
 so this table is heavily used for synchronising application threads as well.  
 A typical access pattern is:

 - tx begin
 - SELECT FOR UPDATE on a single row
 - Do some application processing (1 - 100 ms)
 - Possibly UPDATE the row
 - tx commit

Transactions of that form would not interfere with autovacuum.  You'd
need something that wants exclusive lock, like a schema change.

 I've read some recent threads and found a discussion (below) on auto vacuum 
 that mentions auto vacuum will be cancelled when a client requests a lock 
 that auto vacuum is using… My questions:
 1) Does it look like I'm affected by the same problem as in the below 
 discussion?

Not unless you're seeing a lot of canceling autovacuum task messages
in the postmaster log.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Ineffective autovacuum

2011-09-26 Thread Royce Ausburn



On 27/09/2011, at 2:21 PM, Tom Lane wrote:

 Royce Ausburn royce...@inomial.com writes:
 I have a problem with autovacuum apparently not doing the job I need it to 
 do.
 
 Hm, I wonder whether you're getting bit by bug #5759, which was fixed
 after 8.3.12.

If this were the case, would I see lots of auto vacuum worker processes in ps 
that are essentially doing nothing because they're sleeping all the time?  If 
so, then I think perhaps not.

 
 I have a table named datasession that is frequently inserted, updated and 
 deleted from.  Typically the table will have a few thousand rows in it.  
 Each row typically survives a few days and is updated every 5 - 10 mins.  
 The application receives unreliable, potentially duplicate data from its 
 source, so this table is heavily used for synchronising application threads 
 as well.  A typical access pattern is:
 
 - tx begin
 - SELECT FOR UPDATE on a single row
 - Do some application processing (1 - 100 ms)
 - Possibly UPDATE the row
 - tx commit
 
 Transactions of that form would not interfere with autovacuum.  You'd
 need something that wants exclusive lock, like a schema change.
 
 I've read some recent threads and found a discussion (below) on auto vacuum 
 that mentions auto vacuum will be cancelled when a client requests a lock 
 that auto vacuum is using∑ My questions:
 1) Does it look like I'm affected by the same problem as in the below 
 discussion?
 
 Not unless you're seeing a lot of canceling autovacuum task messages
 in the postmaster log.

Okay - This is not the case.

Since sending this first email I've up'd the autovacuum log level and I've 
noticed that the same tables seem to be auto vacuum'd over and over again… Some 
of the tables are a bit surprising in that they're updated semi-regularly, but 
not enough (I'd think) to warrant an autovacuum every few minutes… Is this 
unusual?


Perhaps unrelated: I've done some digging around and happened across a nightly 
task doing:

select pg_stat_reset()

on each of the databases in the cluster…. I've no idea why we're doing that 
(and our usual sysadmin / DBA has resigned, so I doubt I'll ever know).  There 
must have been a reason at the time, but I wonder if this might be interfering 
with things?

At any rate, I think the logs might glean some more interesting information, 
I'll let it alone for a few hours and hopefully I'll have some more useful 
information.

--Royce


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance