Re: [HACKERS] autovacuum truncate exclusive lock round two

2013-01-23 Thread Kevin Grittner
Kevin Grittner wrote:

 Applied with trivial editing, mostly from a pgindent run against
 modified files.

Applied back as far as 9.0. Before that, the code didn't match well
enough for it to seem safe to apply without many hours of
additional testing.

I have confirmed occurrences of this problem at least as far back as
9.0 in the wild, where it is causing performance degradation severe
enough to force users to stop production usage long enough to
manually vacuum the affected tables. The use case is a lot like
what Jan described, where PostgreSQL is being used for high volume
queuing. When there is a burst of activity which increases table
size, and then the queues are drained back to a normal state, the
problem shows up.

-Kevin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-11 Thread Jan Wieck

On 12/9/2012 2:37 PM, Kevin Grittner wrote:

Jan Wieck wrote:


Based on the discussion and what I feel is a consensus I have
created an updated patch that has no GUC at all. The hard coded
parameters in include/postmaster/autovacuum.h are

 AUTOVACUUM_TRUNCATE_LOCK_CHECK_INTERVAL 20 /* ms */
 AUTOVACUUM_TRUNCATE_LOCK_WAIT_INTERVAL 50 /* ms */
 AUTOVACUUM_TRUNCATE_LOCK_TIMEOUT 5000 /* ms */


Since these really aren't part of the external API and are only
referenced in vacuumlazy.c, it seems more appropriate to define
them there.


I gave that the worst workload I can think of. A pgbench-style
application that throws about 10 transactions per second at it,
so that there is constantly a need to give up the lock due to
conflicting lock requests and then reacquire it. A
cleanup process is periodically moving old tuples from the
history table to an archive table, making history a rolling
window table. And a third job that 2-3 times per minute produces
a transaction lasting 10 seconds, forcing autovacuum to give up on
the lock reacquisition.

Even with that workload, autovacuum is slowly but steadily chopping
away at the table.


Applies with minor offsets, builds without warning, and passes
`make check-world`. My tests based on your earlier posted test
script confirm the benefit.

There are some minor white-space issues; for example git diff
--color shows some trailing spaces in comments.


Cleaned up all of those.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 5036849..1686b88 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
***
*** 52,62 
--- 52,64 
  #include "storage/bufmgr.h"
  #include "storage/freespace.h"
  #include "storage/lmgr.h"
+ #include "storage/proc.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/pg_rusage.h"
  #include "utils/timestamp.h"
  #include "utils/tqual.h"
+ #include "portability/instr_time.h"
  
  
  /*
***
*** 82,87 
--- 84,96 
   */
  #define SKIP_PAGES_THRESHOLD	((BlockNumber) 32)
  
+ /*
+  * Truncate exclusive lock configuration
+  */
+ #define AUTOVACUUM_TRUNCATE_LOCK_CHECK_INTERVAL 20 /* ms */
+ #define AUTOVACUUM_TRUNCATE_LOCK_WAIT_INTERVAL  50 /* ms */
+ #define AUTOVACUUM_TRUNCATE_LOCK_TIMEOUT        5000 /* ms */
+ 
  typedef struct LVRelStats
  {
  	/* hasindex = true means two-pass strategy; false means one-pass */
*** typedef struct LVRelStats
*** 103,108 
--- 112,118 
  	ItemPointer dead_tuples;	/* array of ItemPointerData */
  	int			num_index_scans;
  	TransactionId latestRemovedXid;
+ 	bool		lock_waiter_detected;
  } LVRelStats;
  
  
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 193,198 
--- 203,210 
  	vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
  	vacrelstats->old_rel_tuples = onerel->rd_rel->reltuples;
  	vacrelstats->num_index_scans = 0;
+ 	vacrelstats->pages_removed = 0;
+ 	vacrelstats->lock_waiter_detected = false;
  
  	/* Open all indexes of the relation */
  	vac_open_indexes(onerel, RowExclusiveLock, nindexes, Irel);
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 259,268 
  		vacrelstats->hasindex,
  		new_frozen_xid);
  
! 	/* report results to the stats collector, too */
! 	pgstat_report_vacuum(RelationGetRelid(onerel),
  		 onerel->rd_rel->relisshared,
  		 new_rel_tuples);
  
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
--- 271,288 
  		vacrelstats->hasindex,
  		new_frozen_xid);
  
! 	/*
! 	 * report results to the stats collector, too.
! 	 * An early terminated lazy_truncate_heap attempt
! 	 * suppresses the message and also cancels the
! 	 * execution of ANALYZE, if that was ordered.
! 	 */
! 	if (!vacrelstats->lock_waiter_detected)
! 		pgstat_report_vacuum(RelationGetRelid(onerel),
  		 onerel->rd_rel->relisshared,
  		 new_rel_tuples);
+ 	else
+ 		vacstmt->options &= ~VACOPT_ANALYZE;
  
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
*** lazy_truncate_heap(Relation onerel, LVRe
*** 1257,1336 
  	BlockNumber old_rel_pages = vacrelstats->rel_pages;
  	BlockNumber new_rel_pages;
  	PGRUsage	ru0;
  
  	pg_rusage_init(ru0);
  
  	/*
! 	 * We need full exclusive lock on the relation in order to do truncation.
! 	 * If we can't get it, give up rather than waiting --- we don't want to
! 	 * block other backends, and we don't want to deadlock (which is quite
! 	 * possible considering we already hold a lower-grade lock).
! 	 */
! 	if (!ConditionalLockRelation(onerel, AccessExclusiveLock))
! 		return;
! 
! 	/*
! 	 * Now that we have exclusive lock, look to see if the rel has grown
! 	 * whilst we were vacuuming with non-exclusive lock.  If so, give up; the
! 	 * 

Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-11 Thread Kevin Grittner
Jan Wieck wrote:

 Cleaned up all of those.

Applied with trivial editing, mostly from a pgindent run against
modified files.

-Kevin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-09 Thread Kevin Grittner
Jan Wieck wrote:

 Based on the discussion and what I feel is a consensus I have
 created an updated patch that has no GUC at all. The hard coded
 parameters in include/postmaster/autovacuum.h are
 
  AUTOVACUUM_TRUNCATE_LOCK_CHECK_INTERVAL 20 /* ms */
  AUTOVACUUM_TRUNCATE_LOCK_WAIT_INTERVAL 50 /* ms */
  AUTOVACUUM_TRUNCATE_LOCK_TIMEOUT 5000 /* ms */

Since these really aren't part of the external API and are only
referenced in vacuumlazy.c, it seems more appropriate to define
them there.

 I gave that the worst workload I can think of. A pgbench-style
 application that throws about 10 transactions per second at it,
 so that there is constantly a need to give up the lock due to
 conflicting lock requests and then reacquire it. A
 cleanup process is periodically moving old tuples from the
 history table to an archive table, making history a rolling
 window table. And a third job that 2-3 times per minute produces
 a transaction lasting 10 seconds, forcing autovacuum to give up on
 the lock reacquisition.
 
 Even with that workload, autovacuum is slowly but steadily chopping
 away at the table.

Applies with minor offsets, builds without warning, and passes
`make check-world`. My tests based on your earlier posted test
script confirm the benefit.

There are some minor white-space issues; for example git diff
--color shows some trailing spaces in comments.

There are no docs, but since there are no user-visible changes in
behavior other than better performance and more prompt and reliable
truncation of tables where we were already doing so, it doesn't seem
like any new docs are needed. Due to the nature of the problem,
tests are tricky to run correctly and take a long time to run, so I
don't see how any regressions tests would be appropriate, either.

This patch seems ready for committer, and I would be comfortable
with making the minor changes I mention above and committing it. 
I'll wait a day or two to allow any other comments or objections.

To summarize, there has been pathological behavior in an
infrequently-encountered corner case of autovacuum, wasting a lot
of resources indefinitely when it is encountered; this patch gives
a major performance improvement in this situation without any
other user-visible change and without requiring any new GUCs. It
adds a new public function in the locking area to allow a process
to check whether a particular lock it is holding is blocking any
other process, and another to wrap that to make it easy to check
whether the lock held on a particular table is blocking another
process. It uses this new capability to be smarter about scheduling
autovacuum's truncation work, and to avoid throwing away
incremental progress in doing so.

As such, I don't think it would be crazy to back-patch this, but I
think it would be wise to allow it to be proven on master/9.3 for a
while before considering that.

-Kevin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-08 Thread Jan Wieck

On 12/6/2012 12:45 PM, Robert Haas wrote:

On Wed, Dec 5, 2012 at 10:16 PM, Jan Wieck janwi...@yahoo.com wrote:

That sort of dynamic approach would indeed be interesting. But I fear that
it is going to be complex at best. The amount of time spent in scanning
heavily depends on the visibility map. The initial vacuum scan of a table
can take hours or more, but it does update the visibility map even if the
vacuum itself is aborted later. The next vacuum may scan that table in
almost no time at all, because it skips all blocks that are marked all
visible.


Well, if that's true, then there's little reason to worry about giving
up quickly, because the next autovacuum a minute later won't consume
many resources.


"Almost no time" is of course relative to what an actual scan and dead 
tuple removal cost. Looking at a table with 3 GB of dead tuples at the 
end, the initial vacuum scan takes hours. When vacuum comes back to this 
table, cleaning up a couple megabytes of newly deceased tuples and then 
skipping over the all visible pages may take a minute.


Based on the discussion and what I feel is a consensus I have created an 
updated patch that has no GUC at all. The hard coded parameters in 
include/postmaster/autovacuum.h are


AUTOVACUUM_TRUNCATE_LOCK_CHECK_INTERVAL  20 /* ms */
AUTOVACUUM_TRUNCATE_LOCK_WAIT_INTERVAL   50 /* ms */
AUTOVACUUM_TRUNCATE_LOCK_TIMEOUT 5000 /* ms */

I gave that the worst workload I can think of. A pgbench-style 
application that throws about 10 transactions per second at it, so that 
there is constantly a need to give up the lock due to conflicting lock 
requests and then reacquire it. A cleanup process is 
periodically moving old tuples from the history table to an archive 
table, making history a rolling window table. And a third job that 2-3 
times per minute produces a transaction lasting 10 seconds, forcing 
autovacuum to give up on the lock reacquisition.


Even with that workload, autovacuum is slowly but steadily chopping away at 
the table.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index c9253a9..fd3e91e 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
***
*** 52,62 
--- 52,64 
  #include "storage/bufmgr.h"
  #include "storage/freespace.h"
  #include "storage/lmgr.h"
+ #include "storage/proc.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/pg_rusage.h"
  #include "utils/timestamp.h"
  #include "utils/tqual.h"
+ #include "portability/instr_time.h"
  
  
  /*
*** typedef struct LVRelStats
*** 103,108 
--- 105,111 
  	ItemPointer dead_tuples;	/* array of ItemPointerData */
  	int			num_index_scans;
  	TransactionId latestRemovedXid;
+ 	bool		lock_waiter_detected;
  } LVRelStats;
  
  
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 193,198 
--- 196,203 
  	vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
  	vacrelstats->old_rel_tuples = onerel->rd_rel->reltuples;
  	vacrelstats->num_index_scans = 0;
+ 	vacrelstats->pages_removed = 0;
+ 	vacrelstats->lock_waiter_detected = false;
  
  	/* Open all indexes of the relation */
  	vac_open_indexes(onerel, RowExclusiveLock, nindexes, Irel);
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 259,268 
  		vacrelstats->hasindex,
  		new_frozen_xid);
  
! 	/* report results to the stats collector, too */
! 	pgstat_report_vacuum(RelationGetRelid(onerel),
  		 onerel->rd_rel->relisshared,
  		 new_rel_tuples);
  
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
--- 264,281 
  		vacrelstats->hasindex,
  		new_frozen_xid);
  
! 	/*
! 	 * report results to the stats collector, too.
! 	 * An early terminated lazy_truncate_heap attempt 
! 	 * suppresses the message and also cancels the
! 	 * execution of ANALYZE, if that was ordered.
! 	 */
! 	if (!vacrelstats->lock_waiter_detected)
! 		pgstat_report_vacuum(RelationGetRelid(onerel),
  		 onerel->rd_rel->relisshared,
  		 new_rel_tuples);
+ 	else
+ 		vacstmt->options &= ~VACOPT_ANALYZE;
  
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
*** lazy_truncate_heap(Relation onerel, LVRe
*** 1255,1334 
  	BlockNumber old_rel_pages = vacrelstats->rel_pages;
  	BlockNumber new_rel_pages;
  	PGRUsage	ru0;
  
  	pg_rusage_init(ru0);
  
  	/*
! 	 * We need full exclusive lock on the relation in order to do truncation.
! 	 * If we can't get it, give up rather than waiting --- we don't want to
! 	 * block other backends, and we don't want to deadlock (which is quite
! 	 * possible considering we already hold a lower-grade lock).
! 	 */
! 	if (!ConditionalLockRelation(onerel, AccessExclusiveLock))
! 		return;
! 
! 	/*
! 	 * Now that 

Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-06 Thread Robert Haas
On Wed, Dec 5, 2012 at 10:16 PM, Jan Wieck janwi...@yahoo.com wrote:
 On 12/5/2012 2:00 PM, Robert Haas wrote:

 Maybe it'd be sensible to relate the retry time to the time spent
 vacuuming the table.  Say, if the amount of time spent retrying
 exceeds 10% of the time spent vacuuming the table, with a minimum of
 1s and a maximum of 1min, give up.  That way, big tables will get a
 little more leeway than small tables, which is probably appropriate.

 That sort of dynamic approach would indeed be interesting. But I fear that
 it is going to be complex at best. The amount of time spent in scanning
 heavily depends on the visibility map. The initial vacuum scan of a table
 can take hours or more, but it does update the visibility map even if the
 vacuum itself is aborted later. The next vacuum may scan that table in
 almost no time at all, because it skips all blocks that are marked all
 visible.

Well, if that's true, then there's little reason to worry about giving
up quickly, because the next autovacuum a minute later won't consume
many resources.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-06 Thread Jan Wieck
Kevin and Robert are well aware of most of the below. I just want to put 
this out here so other people, who haven't followed the discussion too 
closely, may chime in.


Some details on the problem:

First of all, there is a minimum number of 1000 pages that the vacuum 
scan must detect as possibly being all empty at the end of a relation. 
Without at least 8MB of possible free space at the end, the code never 
calls lazy_truncate_heap(). This means we don't have to worry about tiny 
relations at all. Any relation that stays under 8MB turnover between 
autovacuum VACUUM runs can never get into this situation at all.


Relations that have higher turnover than that, but at random places or 
with a high percentage of rather static rows, don't fall into the 
problem category either. They may never accumulate that much contiguous 
free space at the end. The turnover will be reusing free space all over 
the place. So again, lazy_truncate_heap() won't be called ever.


Relations that eventually build up more than 8MB of free space at the 
end aren't automatically a problem. The autovacuum VACUUM scan just 
scanned those pages at the end, which means that the safety scan for 
truncate, done under exclusive lock, is checking exactly those pages at 
the end and most likely they are still in memory. The truncate safety 
scan will be fast due to a 99+% buffer cache hit rate.


The only actual problem case (I have found so far) is rolling window 
tables of significant size that can bloat to multiple times their normal 
size every now and then. This is indeed a rare corner case, and I have no 
idea how many users may (unknowingly) be suffering from it.


This rare corner case triggers lazy_truncate_heap() with a significant 
amount of free space to truncate. The table bloats, then all the bloat 
is deleted and the periodic 100% turnover will guarantee that all live 
tuples will shortly after circulate in lower block numbers again, with 
gigabytes of empty space at the end.


This by itself still isn't a problem. The existing code may do the job 
just fine unless there is frequent access to that very table. Only 
with this particular combination of circumstances do we actually have a problem.


Only now, with a significant amount of free space at the end and 
frequent access to the table, does the truncate safety scan take long 
enough, and actually have to read pages from disk, to interfere with 
client transactions.


At this point, the truncate safety scan may have to be interrupted to 
let the frequent other traffic go through. This is what we accomplish 
with the autovacuum_truncate_lock_check interval, where we voluntarily 
release the lock whenever someone else needs it. I agree with Kevin that 
a 20ms check interval is reasonable because the code to check this is 
even less expensive than releasing the exclusive lock we're holding.
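
To make the mechanism concrete for readers skimming the thread, the 
check-and-release behavior described above has roughly the following 
shape. This is only an illustrative sketch: the constant names come from 
the discussion above, and the LockHasWaiters() call and its exact 
signature are assumed rather than quoted from the committed code.

    /*
     * Sketch of the periodic waiter check inside the truncate safety
     * scan (count_nondeletable_pages).  "blkno", "onerel" and
     * "vacrelstats" come from the surrounding function; LockHasWaiters()
     * and its signature are assumed.  Roughly every
     * AUTOVACUUM_TRUNCATE_LOCK_CHECK_INTERVAL ms we ask the lock manager
     * whether someone is queued behind our AccessExclusiveLock; if so,
     * stop scanning so the caller can truncate what has been verified so
     * far and release the lock.
     */
    instr_time  starttime,
                currenttime,
                elapsed;

    INSTR_TIME_SET_CURRENT(starttime);

    while (blkno > vacrelstats->nonempty_pages)
    {
        INSTR_TIME_SET_CURRENT(currenttime);
        elapsed = currenttime;
        INSTR_TIME_SUBTRACT(elapsed, starttime);
        if (INSTR_TIME_GET_MILLISEC(elapsed) >= AUTOVACUUM_TRUNCATE_LOCK_CHECK_INTERVAL)
        {
            if (LockHasWaiters(onerel, AccessExclusiveLock))
            {
                vacrelstats->lock_waiter_detected = true;
                break;          /* truncate what we have, release the lock */
            }
            starttime = currenttime;    /* restart the check window */
        }

        blkno--;
        /* ... verify that block blkno is still empty ... */
    }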


At the same time, completely giving up and relying on the autovacuum 
launcher to restart another worker isn't as free as it looks, 
either. The next autovacuum worker will have to do the VACUUM scan 
first, before getting to the truncate phase. We cannot just skip blindly 
to the truncate code. If the truncate were repeatedly abandoned, the table 
would deteriorate and accumulate dead tuples again. The removal of dead 
tuples and their index tuples has priority.


As said earlier in the discussion, the VACUUM scan will skip pages that 
are marked as completely visible. So the scan won't physically read the 
majority of the empty pages at the end of the table over and over. But 
it will at least scan all pages that have been modified since the last 
VACUUM run.


To me this means that we want to be more generous to the truncate code 
about acquiring the exclusive lock. In my tests, I've seen that a 
rolling window table with a live set of just 10 MB or so, but empty 
space of 3 GB, can still have a 2 minute VACUUM scan time. Throwing that 
work away because we can't acquire the exclusive lock within 2 seconds 
is a waste of effort.


Something in between 2-60 seconds sounds more reasonable to me.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-05 Thread Robert Haas
On Tue, Dec 4, 2012 at 2:05 PM, Jan Wieck janwi...@yahoo.com wrote:
 So the question on the table is which of these three intervals
 should be GUCs, and what values to use if they aren't.

 I could live with all the above defaults, but would like to see more
 comments on them.

I largely agree with what's already been said.  The only interval that
seems to me to maybe need its own knob is the total time after which
the autovacuum worker will give up.  If we set it to 2 *
deadlock_timeout, some people might find that a reason to raise
deadlock_timeout.  Since people *already* raise deadlock_timeout to
obscenely high values (a minute?  an hour???) and then complain that
things blow up in their face, I think there's a decent argument to be
made that piggybacking anything else on that setting is unwise.

Against that, FWICT, this problem only affects a small number of
users: Jan is the only person I can ever remember reporting this
issue.  I'm not dumb enough to think he's the only person who it
affects; but my current belief is that it's not an enormously common
problem.  So the main argument I can see against adding a GUC is that
the problem is too marginal to justify a setting of its own.  What I
really see as the key issue is: suppose we hardcode this to say 2
seconds.  Is that going to fix the problem effectively for 99% of the
people who have this problem, or for 25% of the people who have this
problem?  In the former case, we probably don't need a GUC; in the
latter case, we probably do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-05 Thread Kevin Grittner
Robert Haas wrote:

 Since people *already* raise deadlock_timeout to obscenely high
 values (a minute? an hour???) and then complain that things blow
 up in their face, I think there's a decent argument to be made
 that piggybacking anything else on that setting is unwise.

If people are really doing that, then I tend to agree. I wasn't
aware of that practice.

 Against that, FWICT, this problem only affects a small number of
 users: Jan is the only person I can ever remember reporting this
 issue. I'm not dumb enough to think he's the only person who it
 affects; but my current belief is that it's not an enormously
 common problem. So the main argument I can see against adding a
 GUC is that the problem is too marginal to justify a setting of
 its own. What I really see as the key issue is: suppose we
 hardcode this to say 2 seconds. Is that going to fix the problem
 effectively for 99% of the people who have this problem, or for
 25% of the people who have this problem? In the former case, we
 probably don't need a GUC; in the latter case, we probably do.

Given the fact that autovacuum will keep throwing workers at it to
essentially loop indefinitely at an outer level, I don't think the
exact setting of this interval is all that critical either. My gut
feel is that anything in the 2 second to 5 second range would be
sane, so I won't argue over any explicit setting within that range.
Below that, I think the overhead of autovacuum coming back to the
table repeatedly would probably start to get too high; at the other
extreme, we could be causing some small, heavily-updated table to be
neglected by autovacuum -- especially if you get multiple
autovacuum workers tied up in this delay on different tables at the
same time.

Regarding how many people are affected, I have seen several reports
of situations where users claim massive impact on performance when
autovacuum kicks in. The reports have not included enough detail to
quantify the impact or in most cases to establish a cause, but this
seems like it could have a noticeable impact, especially if the
deadlock timeout was set to more than a second.

-Kevin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-05 Thread Robert Haas
On Wed, Dec 5, 2012 at 11:24 AM, Kevin Grittner kgri...@mail.com wrote:
 Robert Haas wrote:
 Since people *already* raise deadlock_timeout to obscenely high
 values (a minute? an hour???) and then complain that things blow
 up in their face, I think there's a decent argument to be made
 that piggybacking anything else on that setting is unwise.

 If people are really doing that, then I tend to agree. I wasn't
 aware of that practice.

It's probably not quite common enough to be called a practice, but I
have encountered it a number of times in support situations.  Alas, I
no longer remember the details of exactly what misery it caused, but I
do remember it wasn't good.  :-)

 Against that, FWICT, this problem only affects a small number of
 users: Jan is the only person I can ever remember reporting this
 issue. I'm not dumb enough to think he's the only person who it
 affects; but my current belief is that it's not an enormously
 common problem. So the main argument I can see against adding a
 GUC is that the problem is too marginal to justify a setting of
 its own. What I really see as the key issue is: suppose we
 hardcode this to say 2 seconds. Is that going to fix the problem
 effectively for 99% of the people who have this problem, or for
 25% of the people who have this problem? In the former case, we
 probably don't need a GUC; in the latter case, we probably do.

 Given the fact that autovacuum will keep throwing workers at it to
 essentially loop indefinitely at an outer level, I don't think the
 exact setting of this interval is all that critical either. My gut
 feel is that anything in the 2 second to 5 second range would be
 sane, so I won't argue over any explicit setting within that range.
 Below that, I think the overhead of autovacuum coming back to the
 table repeatedly would probably start to get too high; at the other
 extreme, we could be causing some small, heavily-updated table to be
 neglected by autovacuum -- especially if you get multiple
 autovacuum workers tied up in this delay on different tables at the
 same time.

I think that part of what's tricky here is that the dynamics of this
problem depend heavily on table size.  I handled one support case
where lowering autovacuum_naptime to 15s was an indispensable part of
the solution, so in that case having an autovacuum worker retry for
more than a few seconds sounds kind of insane.  OTOH, that case
involved a small, rapidly changing table.  If you've got an enormous
table where vacuum takes an hour to chug through all of it, abandoning
the effort to truncate the table after a handful of seconds might
sound equally insane.

Maybe it'd be sensible to relate the retry time to the time spent
vacuuming the table.  Say, if the amount of time spent retrying
exceeds 10% of the time spent vacuuming the table, with a minimum of
1s and a maximum of 1min, give up.  That way, big tables will get a
little more leeway than small tables, which is probably appropriate.
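
For illustration, the rule proposed here works out as a simple clamp. 
The helper below is purely hypothetical (nothing like it exists in the 
tree) and only demonstrates the arithmetic: 10% of the vacuum time, 
bounded to the range 1s to 60s.

    #include <stdio.h>

    /*
     * Hypothetical illustration of the proposed retry budget: allow the
     * truncate phase to spend up to 10% of the time the vacuum scan took,
     * but never less than 1s and never more than 1min.
     */
    static double
    truncate_retry_budget(double vacuum_seconds)
    {
        double budget = vacuum_seconds * 0.10;

        if (budget < 1.0)
            budget = 1.0;
        if (budget > 60.0)
            budget = 60.0;
        return budget;
    }

    int
    main(void)
    {
        /* a 5s vacuum gets the 1s floor; a 1h vacuum gets the 60s cap */
        printf("%.0fs -> %.1fs\n", 5.0, truncate_retry_budget(5.0));
        printf("%.0fs -> %.1fs\n", 3600.0, truncate_retry_budget(3600.0));
        return 0;
    }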

 Regarding how many people are affected, I have seen several reports
 of situations where users claim massive impact on performance when
 autovacuum kicks in. The reports have not included enough detail to
 quantify the impact or in most cases to establish a cause, but this
 seems like it could have a noticeable impact, especially if the
 deadlock timeout was set to more than a second.

Yeah, I agree this could be a cause of those types of reports, but I
don't have any concrete evidence that any of the cases I've worked
were actually due to this specific issue.  The most recent case of
this type I worked on was due to I/O saturation - which, since it
happened to be EC2, really meant network saturation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-05 Thread Jan Wieck

On 12/5/2012 2:00 PM, Robert Haas wrote:

Maybe it'd be sensible to relate the retry time to the time spent
vacuuming the table.  Say, if the amount of time spent retrying
exceeds 10% of the time spent vacuuming the table, with a minimum of
1s and a maximum of 1min, give up.  That way, big tables will get a
little more leeway than small tables, which is probably appropriate.


That sort of dynamic approach would indeed be interesting. But I fear 
that it is going to be complex at best. The amount of time spent in 
scanning heavily depends on the visibility map. The initial vacuum scan 
of a table can take hours or more, but it does update the visibility map 
even if the vacuum itself is aborted later. The next vacuum may scan 
that table in almost no time at all, because it skips all blocks that 
are marked all visible.


So the total time the scan takes is no yardstick I'd use.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-04 Thread Kevin Grittner
Jan Wieck wrote:

 Thinking about it, I'm not really happy with removing the 
 autovacuum_truncate_lock_check GUC at all.
 
 Fact is that the deadlock detection code and the configuration
 parameter for it should IMHO have nothing to do with all this in
 the first place. A properly implemented application does not
 deadlock.

I don't agree. I believe that in some cases it is possible and
practicable to set access rules which would prevent deadlocks in
application access to a database. In other cases the convolutions
required in the code, the effort in educating dozens or hundreds of
programmers maintaining the code (and keeping the training current
during staff turnover), and the staff time required for compliance
far outweigh the benefit of an occasional transaction retry.
However, it is enough for your argument that there are cases where
it can be done.

 Someone running such a properly implemented application should be
 able to safely set deadlock_timeout to minutes without the
 slightest ill side effect, but with the benefit that the deadlock
 detection code itself does not add to the lock contention. The
 only reason one cannot do so today is because autovacuum's
 truncate phase could then freeze the application with an
 exclusive lock for that long.
 
 I believe the check interval needs to be decoupled from the 
 deadlock_timeout again.

OK

 This will leave us with 2 GUCs at least.

Hmm. What problems do you see with hard-coding reasonable values?
Adding two or three GUC settings for a patch with so little
user-visible impact seems weird. And it seems to me (and also
seemed to Robert) as though the specific values of the other two
settings really aren't that critical as long as they are anywhere
within a reasonable range. Configuring PostgreSQL can be
intimidating enough without adding knobs that really don't do
anything useful. Can you show a case where special values would be
helpful?

-Kevin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-04 Thread Jan Wieck

On 12/4/2012 8:06 AM, Kevin Grittner wrote:

Jan Wieck wrote:

I believe the check interval needs to be decoupled from the
deadlock_timeout again.


OK


This will leave us with 2 GUCs at least.


Hmm. What problems do you see with hard-coding reasonable values?


The question is what is reasonable?

Let's talk about the time to (re)acquire the lock first. In the cases 
where truncating a table can hurt, we are dealing with many gigabytes. 
The original vacuumlazy scan of them can take hours if not days. During 
that scan the vacuum worker has probably spent many hours napping in the 
vacuum delay points. For me, a 50ms interval for 5 seconds would be 
reasonable for (re)acquiring that lock.


The reasoning behind it is that we need some sort of retry mechanism: 
if autovacuum just gave up the exclusive lock because someone 
needed access, it is more or less guaranteed that the immediate attempt 
to reacquire it will fail until that waiter has committed. But if it 
can't get the lock after 5 seconds, the system seems busy enough that 
autovacuum should come back much later, when the launcher kicks it off 
again.
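
Put as code, the retry behaviour sketched here would look roughly like 
this. It is illustrative only: ConditionalLockRelation() and pg_usleep() 
are existing backend calls, while the constant names and the overall 
loop shape are assumptions based on this thread, not the patch itself.

    /*
     * Sketch of reacquiring the exclusive lock after voluntarily giving
     * it up: retry every AUTOVACUUM_TRUNCATE_LOCK_WAIT_INTERVAL ms, and
     * give up for this autovacuum run once
     * AUTOVACUUM_TRUNCATE_LOCK_TIMEOUT ms have passed.
     */
    int         waited_ms = 0;
    bool        got_lock = false;

    while (waited_ms < AUTOVACUUM_TRUNCATE_LOCK_TIMEOUT)
    {
        if (ConditionalLockRelation(onerel, AccessExclusiveLock))
        {
            got_lock = true;
            break;
        }
        pg_usleep(AUTOVACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);  /* ms -> us */
        waited_ms += AUTOVACUUM_TRUNCATE_LOCK_WAIT_INTERVAL;
    }

    if (!got_lock)
        return;                 /* leave the rest to a later autovacuum run */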


I don't care much about occupying that autovacuum worker for a few 
seconds. It just spent hours vacuuming that very table. How much harm 
will a couple more seconds do?


The check interval for the LockHasWaiters() call, however, depends very 
much on the response time constraints of the application. A 200ms 
interval, for example, would cause the truncate phase to hold onto the 
exclusive lock for at least 200ms. That means that a steady stream of 
short running transactions would see 100ms of blocking on average, 
200ms max. For many applications that is probably OK. If your response 
time constraint is <=50ms on 98% of transactions, you might want to have 
that knob, though.


I admit I really have no idea what the most reasonable default for that 
value would be. Something between 50ms and deadlock_timeout/2 I guess.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-04 Thread Kevin Grittner
Jan Wieck wrote:

 [arguments for GUCs]

This is getting confusing. I thought I had already conceded the
case for autovacuum_truncate_lock_try, and you appeared to spend
most of your post arguing for it anyway. I think. It's a little
hard to tell. Perhaps the best thing is to present the issue to the
list and solicit more opinions on what to do. Please correct me if
I mis-state any of this.

The primary problem this patch is solving is that in some
workloads, autovacuum will repeatedly try to truncate the unused
pages at the end of a table, but will continually get canceled
after burning resources because another process wants to acquire a
lock on the table which conflicts with the one held by autovacuum.
This is handled by the deadlock checker, so another process must
block for the deadlock_timeout interval each time. All work done by
the truncate phase of autovacuum is lost on each interrupted
attempt. Statistical information is not updated, so another attempt
will trigger the next time autovacuum looks at whether to vacuum
the table.

It's obvious that this pattern not only fails to release
potentially large amounts of unused space back to the OS, but the
headbanging can continue to consume significant resources and for
an extended period, and the repeated blocking for deadlock_timeout
could cause latency problems.

The patch has the truncate work, which requires
AccessExclusiveLock, check at intervals for whether another process
is waiting on its lock. That interval is one of the timings we need
to determine, and for which a GUC was initially proposed. I think
that the check should be fast enough that doing it once every 20ms
as a hard-coded interval would be good enough. When it sees this
situation, it truncates the file as far as it has managed to
get, releases its lock on the table, sleeps for an interval, and
then checks to see if the lock has become available again.

How long it should sleep between tries to reacquire the lock is
another possible GUC. Again, I'm inclined to think that this could
be hard-coded. Since autovacuum was knocked off-task after doing
some significant work, I'm inclined to make this interval a little
bigger, but I don't think it matters a whole lot. Anything between
20ms and 100ms seems sane. Maybe 50ms?

At any point that it is unable to acquire the lock, there is a
check for how long this autovacuum task has been starved for the
lock. Initially I was arguing for twice the deadlock_timeout on the
basis that this would probably be short enough not to leave the
autovacuum worker sidelined for too long, but long enough for the
attempt to get past a single deadlock between two other processes.
This is the setting Jan is least willing to concede.

If the autovacuum worker does abandon the attempt, it will keep
retrying, since we go out of our way to prevent the autovacuum
process from updating the statistics based on the incomplete
processing. This last interval is not how long it will attempt to
truncate, but how long it will keep one autovacuum worker making
unsuccessful attempts to acquire the lock before it is put to other
uses. Workers will keep coming back to this table until the
truncate phase is completed, just as they do without the patch; the
difference being that anytime it gets the lock, even briefly, it is
able to persist some progress.

So the question on the table is which of these three intervals
should be GUCs, and what values to use if they aren't.

-Kevin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-04 Thread Jan Wieck

On 12/4/2012 1:51 PM, Kevin Grittner wrote:

Jan Wieck wrote:


[arguments for GUCs]


This is getting confusing. I thought I had already conceded the
case for autovacuum_truncate_lock_try, and you appeared to spend
most of your post arguing for it anyway. I think. It's a little
hard to tell. Perhaps the best thing is to present the issue to the
list and solicit more opinions on what to do. Please correct me if
I mis-state any of this.

The primary problem this patch is solving is that in some
workloads, autovacuum will repeatedly try to truncate the unused
pages at the end of a table, but will continually get canceled
after burning resources because another process wants to acquire a
lock on the table which conflicts with the one held by autovacuum.
This is handled by the deadlock checker, so another process must
block for the deadlock_timeout interval each time. All work done by
the truncate phase of autovacuum is lost on each interrupted
attempt. Statistical information is not updated, so another attempt
will trigger the next time autovacuum looks at whether to vacuum
the table.

It's obvious that this pattern not only fails to release
potentially large amounts of unused space back to the OS, but the
headbanging can continue to consume significant resources and for
an extended period, and the repeated blocking for deadlock_timeout
could cause latency problems.

The patch has the truncate work, which requires
AccessExclusiveLock, check at intervals for whether another process
is waiting on its lock. That interval is one of the timings we need
to determine, and for which a GUC was initially proposed. I think
that the check should be fast enough that doing it once every 20ms
as a hard-coded interval would be good enough. When it sees this
situation, it truncates the file as far as it has managed to
get, releases its lock on the table, sleeps for an interval, and
then checks to see if the lock has become available again.

How long it should sleep between tries to reacquire the lock is
another possible GUC. Again, I'm inclined to think that this could
be hard-coded. Since autovacuum was knocked off-task after doing
some significant work, I'm inclined to make this interval a little
bigger, but I don't think it matters a whole lot. Anything between
20ms and 100ms seems sane. Maybe 50ms?

At any point that it is unable to acquire the lock, there is a
check for how long this autovacuum task has been starved for the
lock. Initially I was arguing for twice the deadlock_timeout on the
basis that this would probably be short enough not to leave the
autovacuum worker sidelined for too long, but long enough for the
attempt to get past a single deadlock between two other processes.
This is the setting Jan is least willing to concede.

If the autovacuum worker does abandon the attempt, it will keep
retrying, since we go out of our way to prevent the autovacuum
process from updating the statistics based on the incomplete
processing. This last interval is not how long it will attempt to
truncate, but how long it will keep one autovacuum worker making
unsuccessful attempts to acquire the lock before it is put to other
uses. Workers will keep coming back to this table until the
truncate phase is completed, just as they do without the patch; the
difference being that anytime it gets the lock, even briefly, it is
able to persist some progress.


That is all correct.



So the question on the table is which of these three intervals
should be GUCs, and what values to use if they aren't.


I could live with all the above defaults, but would like to see more 
comments on them.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-03 Thread Kevin Grittner
Jan Wieck wrote:

 Attached is a new patch that addresses most of the points raised
 in discussion before.
 
 1) Most of the configuration variables are derived from
 deadlock_timeout now. The check for conflicting lock request
 interval is deadlock_timeout/10, clamped to 10ms. The try to
 acquire exclusive lock interval is deadlock_timeout/20, also
 clamped to 10ms. The only GUC variable remaining is
 autovacuum_truncate_lock_try=2000ms with a range from 0 (just try
 once) to 2ms.

If we're going to keep this GUC, we need docs for it.

 I'd like to point out that this is a significant change in
 functionality as without the config option for the check
 interval, there is no longer any possibility to disable the call
 to LockHasWaiters() and return to the original (deadlock code
 kills autovacuum) behavior.

Arguably we could simplify the deadlock resolution code a little,
but it seems like it is probably safer to leave it as a failsafe,
at least for now.

 I did not touch the part about suppressing the stats and the
 ANALYZE step of auto vacuum+analyze. The situation is no
 different today. When the deadlock code kills autovacuum, stats
 aren't updated either. And this patch is meant to cause
 autovacuum to finish the truncate in a few minutes or hours, so
 that the situation fixes itself.

Unless someone wants to argue for more aggressive updating of the
stats, I guess I'll yield to that argument.

Besides the documentation, the only thing I could spot on this
go-around was that this comment probably no longer has a reason for
being since this is no longer conditional and what it's doing is
obvious from the code:

    /* Initialize the starttime if we check for conflicting lock requests */

With docs and dropping the obsolete comment, I think this will be
ready.

-Kevin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-03 Thread Jan Wieck

On 12/3/2012 5:42 PM, Kevin Grittner wrote:

Jan Wieck wrote:


Attached is a new patch that addresses most of the points raised
in discussion before.

1) Most of the configuration variables are derived from
deadlock_timeout now. The check for conflicting lock request
interval is deadlock_timeout/10, clamped to 10ms. The try to
acquire exclusive lock interval is deadlock_timeout/20, also
clamped to 10ms. The only GUC variable remaining is
autovacuum_truncate_lock_try=2000ms with a range from 0 (just try
once) to 2ms.


If we're going to keep this GUC, we need docs for it.


Certainly. But since we're still debating which and how many GUC 
variables we want, I don't think doc-time has come yet.





I'd like to point out that this is a significant change in
functionality as without the config option for the check
interval, there is no longer any possibility to disable the call
to LockHasWaiters() and return to the original (deadlock code
kills autovacuum) behavior.


Arguably we could simplify the deadlock resolution code a little,
but it seems like it is probably safer to leave it as a failsafe,
at least for now.


Thinking about it, I'm not really happy with removing the 
autovacuum_truncate_lock_check GUC at all.


Fact is that the deadlock detection code and the configuration parameter 
for it should IMHO have nothing to do with all this in the first place. 
A properly implemented application does not deadlock. Someone running 
such a properly implemented application should be able to safely set 
deadlock_timeout to minutes without the slightest ill side effect, but 
with the benefit that the deadlock detection code itself does not add to 
the lock contention. The only reason one cannot do so today is because 
autovacuum's truncate phase could then freeze the application with an 
exclusive lock for that long.


I believe the check interval needs to be decoupled from the 
deadlock_timeout again. This will leave us with 2 GUCs at least.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-02 Thread Jan Wieck
Attached is a new patch that addresses most of the points raised in 
discussion before.


1) Most of the configuration variables are derived from deadlock_timeout 
now. The check for conflicting lock request interval is 
deadlock_timeout/10, clamped to 10ms. The try to acquire exclusive 
lock interval is deadlock_timeout/20, also clamped to 10ms. The only 
GUC variable remaining is autovacuum_truncate_lock_try=2000ms with a 
range from 0 (just try once) to 2ms.


I'd like to point out that this is a significant change in functionality 
as without the config option for the check interval, there is no longer 
any possibility to disable the call to LockHasWaiters() and return to 
the original (deadlock code kills autovacuum) behavior.


2) The partition lock in LockHasWaiters() was lowered to LW_SHARED. The 
LW_EXCLUSIVE was indeed a copy/paste result.


3) The instr_time handling was simplified as suggested.

4) Lower case TRUE/FALSE.


I did not touch the part about suppressing the stats and the ANALYZE 
step of auto vacuum+analyze. The situation is no different today. When 
the deadlock code kills autovacuum, stats aren't updated either. And 
this patch is meant to cause autovacuum to finish the truncate in a few 
minutes or hours, so that the situation fixes itself.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index c9253a9..e8f88ad 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
***
*** 52,62 
--- 52,64 
  #include "storage/bufmgr.h"
  #include "storage/freespace.h"
  #include "storage/lmgr.h"
+ #include "storage/proc.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/pg_rusage.h"
  #include "utils/timestamp.h"
  #include "utils/tqual.h"
+ #include "portability/instr_time.h"
  
  
  /*
*** typedef struct LVRelStats
*** 103,108 
--- 105,111 
  	ItemPointer dead_tuples;	/* array of ItemPointerData */
  	int			num_index_scans;
  	TransactionId latestRemovedXid;
+ 	bool		lock_waiter_detected;
  } LVRelStats;
  
  
*** static TransactionId FreezeLimit;
*** 114,119 
--- 117,126 
  
  static BufferAccessStrategy vac_strategy;
  
+ static int autovacuum_truncate_lock_check;
+ static int autovacuum_truncate_lock_retry;
+ static int autovacuum_truncate_lock_wait;
+ 
  
  /* non-export function prototypes */
  static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 193,198 
--- 200,207 
  	vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
  	vacrelstats->old_rel_tuples = onerel->rd_rel->reltuples;
  	vacrelstats->num_index_scans = 0;
+ 	vacrelstats->pages_removed = 0;
+ 	vacrelstats->lock_waiter_detected = false;
  
  	/* Open all indexes of the relation */
  	vac_open_indexes(onerel, RowExclusiveLock, nindexes, Irel);
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 259,268 
  		vacrelstats->hasindex,
  		new_frozen_xid);
  
! 	/* report results to the stats collector, too */
! 	pgstat_report_vacuum(RelationGetRelid(onerel),
  		 onerel->rd_rel->relisshared,
  		 new_rel_tuples);
  
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
--- 268,285 
  		vacrelstats->hasindex,
  		new_frozen_xid);
  
! 	/*
! 	 * report results to the stats collector, too.
! 	 * An early terminated lazy_truncate_heap attempt 
! 	 * suppresses the message and also cancels the
! 	 * execution of ANALYZE, if that was ordered.
! 	 */
! 	if (!vacrelstats->lock_waiter_detected)
! 		pgstat_report_vacuum(RelationGetRelid(onerel),
  		 onerel->rd_rel->relisshared,
  		 new_rel_tuples);
+ 	else
+ 		vacstmt->options &= ~VACOPT_ANALYZE;
  
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
*** lazy_truncate_heap(Relation onerel, LVRe
*** 1255,1334 
  	BlockNumber old_rel_pages = vacrelstats->rel_pages;
  	BlockNumber new_rel_pages;
  	PGRUsage	ru0;
  
  	pg_rusage_init(ru0);
  
  	/*
! 	 * We need full exclusive lock on the relation in order to do truncation.
! 	 * If we can't get it, give up rather than waiting --- we don't want to
! 	 * block other backends, and we don't want to deadlock (which is quite
! 	 * possible considering we already hold a lower-grade lock).
  	 */
! 	if (!ConditionalLockRelation(onerel, AccessExclusiveLock))
! 		return;
  
  	/*
! 	 * Now that we have exclusive lock, look to see if the rel has grown
! 	 * whilst we were vacuuming with non-exclusive lock.  If so, give up; the
! 	 * newly added pages presumably contain non-deletable tuples.
  	 */
! 	new_rel_pages = RelationGetNumberOfBlocks(onerel);
! 	if (new_rel_pages != old_rel_pages)
  	{
  		/*
! 		 * Note: we intentionally don't update 

Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-11-29 Thread Jan Wieck

On 11/28/2012 3:33 PM, Kevin Grittner wrote:

Kevin Grittner wrote:


I still need to review the timing calls, since I'm not familiar
with them so it wasn't immediately obvious to me whether they
were being used correctly. I have no reason to believe that they
aren't, but feel I should check.


It seems odd to me that assignment of one instr_time to another is
done with INSTR_TIME_SET_ZERO() of the target followed by
INSTR_TIME_ADD() with the target and the source. It seems to me
that simple assignment would be more readable, and I can't see any
down side.

Why shouldn't:

 INSTR_TIME_SET_ZERO(elapsed);
 INSTR_TIME_ADD(elapsed, currenttime);
 INSTR_TIME_SUBTRACT(elapsed, starttime);

instead be?:

 elapsed = currenttime;
 INSTR_TIME_SUBTRACT(elapsed, starttime);

And why shouldn't:

 INSTR_TIME_SET_ZERO(starttime);
 INSTR_TIME_ADD(starttime, currenttime);

instead be?:

 starttime = currenttime;

Resetting starttime this way seems especially odd.


instr_time is LARGE_INTEGER on Win32 but struct timeval on Unix. Is

starttime = currenttime;

portable if those are structs?




Also, I want to do another pass looking just for off-by-one
errors on blkno. Again, I have no reason to believe that there is
a problem; it just seems like it would be a particularly easy
type of mistake to make and miss when a key structure has this
field:

  BlockNumber nonempty_pages;
 /* actually, last nonempty page + 1 */


No problems found with that.


And I want to test more.


The patch worked as advertised in all my tests, but I became
uncomfortable with the games being played with the last autovacuum
timestamp and the count of dead tuples. Sure, that causes
autovacuum to kick back in until it has dealt with the truncation,
but it makes it hard for a human looking at the stat views to see
what's going on, and I'm not sure it wouldn't lead to bad plans due
to stale statistics.

Personally, I would rather see us add a boolean to indicate that
autovacuum was needed (regardless of the math on the other columns)
and use that to control the retries -- leaving the other columns
free to reflect reality.


You mean to extend the stats by yet another column? The thing is that 
this whole case happens rarely. We see this every other month or so and 
only on a rolling window table after it got severely bloated due to some 
abnormal use (purging disabled for some operational reason). The whole 
situation resolves itself after a few minutes to maybe an hour or two.


Personally I don't think living with a few wrong stats on one table for 
that time is so bad that it warrants changing that much more code.


Lower casing TRUE/FALSE will be done.

If the LW_SHARED is enough in LockHasWaiters(), that's fine too.

I think we have a consensus that the check interval should be derived 
from deadlock_timeout/10 clamped to 10ms. I'm going to remove that GUC.


About the other two, I'm just not sure. Maybe use half the value of 
the lock waiter interval as the lock retry interval, again clamped 
to 10ms, and simply leave one GUC that controls how long autovacuum 
should retry this. I'm not too worried about the retry interval. 
However, the problem with that overall retry duration is that this value 
very much depends on the usage pattern of the database. If long running 
transactions (like 5 seconds) are the norm, then a hard coded value of 
2 seconds before autovacuum gives up isn't going to help much.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-11-29 Thread Tom Lane
Jan Wieck janwi...@yahoo.com writes:
 On 11/28/2012 3:33 PM, Kevin Grittner wrote:
 Resetting starttime this way seems especially odd.

 instr_time is LARGE_INTEGER on Win32 but struct timeval on Unix. Is
  starttime = currenttime;
 portable if those are structs?

Sure.  We rely on struct assignment in lots of places.
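
A minimal standalone illustration of that point, using plain struct 
timeval (which is what instr_time is on Unix); the values here are 
arbitrary and only show that assignment copies both fields:

    #include <stdio.h>
    #include <sys/time.h>

    int
    main(void)
    {
        struct timeval currenttime = {12, 345678};
        struct timeval starttime;

        starttime = currenttime;    /* plain struct assignment copies both fields */

        printf("%ld.%06ld\n", (long) starttime.tv_sec, (long) starttime.tv_usec);
        return 0;
    }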

regards, tom lane




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-11-29 Thread Jan Wieck

On 11/29/2012 9:46 AM, Tom Lane wrote:

Jan Wieck janwi...@yahoo.com writes:

On 11/28/2012 3:33 PM, Kevin Grittner wrote:

Resetting starttime this way seems especially odd.



instr_time is LARGE_INTEGER on Win32 but struct timeval on Unix. Is
 starttime = currenttime;
portable if those are structs?


Sure.  We rely on struct assignment in lots of places.


Will be done then.


Thanks,
Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-11-23 Thread Alvaro Herrera
Jan,

Are you posting an updated patch?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-11-23 Thread Kevin Grittner
Alvaro Herrera wrote:

 Are you posting an updated patch?

Well, there wasn't exactly a consensus on what should change, so I'll
throw some initial review comments out even though I still need to
check some things in the code and do more testing.

The patch applied cleanly, compiled without warnings, and passed all
tests in `make check-world`.

The previous discussion seemed to me to achieve consensus on the need
for the feature, but a general dislike for adding so many new GUC
settings to configure it. I concur on that. I agree with the feelings
of others that just using deadlock_timeout / 10 (probably clamped to
a minimum of 10ms) would be good instead of
autovacuum_truncate_lock_check. I agree that the values of the other
two settings probably aren't too critical as long as they are fairly
reasonable, which I would define as being 20ms to 100ms, with
retries lasting no more than 2 seconds.  I'm inclined to suggest a
total time of deadlock_timeout * 2 clamped to 2 seconds, but maybe
that is over-engineering it.
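
In code form the two bounds I have in mind would amount to roughly the 
following sketch (DeadlockTimeout being the GUC variable in milliseconds, 
and reading "clamped" as a floor for the check interval and a cap for the 
total time):

    int  check_interval_ms = Max(DeadlockTimeout / 10, 10);    /* at least 10ms */
    int  total_retry_ms    = Min(DeadlockTimeout * 2, 2000);   /* at most 2 seconds */

That is only the arithmetic, not a proposal for exact variable names.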

There was also a mention of possibly inserting a vacuum_delay_point()
call outside of the AccessExclusiveLock. I don't feel strongly about
it, but if the page scanning cost can be tracked easily, I guess it
is better to do that.

Other than simplifying the code based on eliminating the new GUCs,
the coding issues I found were:

TRUE and FALSE were used as literals.  Since true and false are used
about 10 times as often and current code in the modified files is
mixed, I would recommend the lowercase form. We decided it wasn't
worth the trouble to convert the minority usage over, but I don't see
any reason to add new instances of the minority case.

In LockHasWaiters() the partition lock is acquired using
LW_EXCLUSIVE. I don't see why that can't be LW_SHARED. Was there a
reason for this, or was it just not changed after copy/paste?
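
To illustrate the point (this is not the patch's code, and the variable 
names are only placeholders): the check merely reads the shared lock 
table entry, so a shared hold on the partition lock should give a 
consistent enough view.

    /* assumes 'partitionLock' and 'lock' were looked up the same way the
     * lock release code does it; nothing here modifies the lock table */
    LWLockAcquire(partitionLock, LW_SHARED);
    hasWaiters = (lock->nRequested > lock->nGranted);   /* someone is queued */
    LWLockRelease(partitionLock);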

I still need to review the timing calls, since I'm not familiar with
them so it wasn't immediately obvious to me whether they were being
used correctly. I have no reason to believe that they aren't, but
feel I should check.

Also, I want to do another pass looking just for off-by-one errors on
blkno. Again, I have no reason to believe that there is a problem; it
just seems like it would be a particularly easy type of mistake to
make and miss when a key structure has this field:

  BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */

And I want to test more.

But if a new version could be created based on the above, that would
be great. Chances seem relatively slim at this point that there will
be much else to change, if anything.

-Kevin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-11-15 Thread Dimitri Fontaine
Jan Wieck janwi...@yahoo.com writes:
 Use this lmgr feature inside count_nondeletable_pages() of vacuumlazy.c to
 periodically check, if there is a conflicting lock request waiting. If not,
 keep going. If there is a waiter, truncate the relation to the point checked
 thus far, release the AccessExclusiveLock, then loop back to where we
 acquire this lock in the first place and continue checking/truncating.

I think that maybe we could just bail out after releasing the
AccessExclusiveLock and trust autovacuum to get back to truncating that
relation later. That would allow removing 2 of the 3 GUCs below:

 autovacuum_truncate_lock_check = 100ms # how frequent to check
# for conflicting locks

This is the one remaining. Could we maybe check for a lock conflict after
every page we move backward over, or some multiple thereof? The goal would be
to ensure that progress is made, while also being aware of concurrent
activity, à la CHECK_FOR_INTERRUPTS().

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support





Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-11-15 Thread Alvaro Herrera
Dimitri Fontaine wrote:
 Jan Wieck janwi...@yahoo.com writes:
  Use this lmgr feature inside count_nondeletable_pages() of vacuumlazy.c to
  periodically check, if there is a conflicting lock request waiting. If not,
  keep going. If there is a waiter, truncate the relation to the point checked
  thus far, release the AccessExclusiveLock, then loop back to where we
  acquire this lock in the first place and continue checking/truncating.
 
 I think that maybe we could just bail out after releasing the
 AccessExclusiveLock and trust autovacuum to get back to truncating that
 relation later.

That doesn't work, because the truncating code is not reached unless
vacuuming actually took place.  So if you interrupt it, it will just not
get called again later.  Maybe we could have autovacuum somehow invoke
that separately, but that would require that the fact that truncation
was aborted is kept track of somewhere.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-11-15 Thread Amit Kapila
On Friday, November 16, 2012 4:09 AM Alvaro Herrera wrote:
 Dimitri Fontaine wrote:
  Jan Wieck janwi...@yahoo.com writes:
   Use this lmgr feature inside count_nondeletable_pages() of
 vacuumlazy.c to
   periodically check, if there is a conflicting lock request waiting.
 If not,
   keep going. If there is a waiter, truncate the relation to the point
 checked
   thus far, release the AccessExclusiveLock, then loop back to where
 we
   acquire this lock in the first place and continue
 checking/truncating.
 
  I think that maybe we could just bail out after releasing the
  AccessExclusiveLock and trust autovacuum to get back to truncating
 that
  relation later.
 
 That doesn't work, because the truncating code is not reached unless
 vacuuming actually took place.  So if you interrupt it, it will just not
 get called again later.  Maybe we could have autovacuum somehow invoke
 that separately, but that would require that the fact that truncation
 was aborted is kept track of somewhere.

Won't it have a chance to be handled the next time vacuum triggers due
to updates/deletes on some other pages?

OTOH, maybe the same thing happens again next time and it is still not able to
complete the truncate.
So I think it's better to complete it the first time, but maybe using some
heuristic wait-and-retry timing rather than
configuration variables.

With Regards,
Amit Kapila






Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-10-29 Thread Robert Haas
On Wed, Oct 24, 2012 at 4:20 PM, Jan Wieck janwi...@yahoo.com wrote:
 This patch does introduce three new postgresql.conf parameters, which I
 would be happy to get rid of if we could derive them from something else.
 Something based on the deadlock timeout may be possible.

 autovacuum_truncate_lock_check = 100ms # how frequent to check
# for conflicting locks
 autovacuum_truncate_lock_retry = 50# how often to try acquiring
# the exclusive lock
 autovacuum_truncate_lock_wait = 20ms   # nap in between attempts

+1 for this general approach.

As you suggested downthread, I think that hard-coding
autovacuum_truncate_lock_check to one-tenth of the deadlock timeout
should be just fine.  For the other two parameters, I doubt we need to
make them configurable at all.  It's not exactly clear what to set
them to, but it does seem clear that the down side of setting them
incorrectly isn't very much as long as the defaults are roughly sane.
Personally, I'd be inclined to retry less frequently but over a
slightly longer time period - say twenty retries, one after every
100ms.  But I wouldn't be upset if we settled on what you've got here,
either.  We just don't want to let the total time we spend waiting for
the lock get too long, because that means pinning down an auto-vacuum
worker that might be critically needed elsewhere.  So the product of
autovacuum_truncate_lock_retry and autovacuum_truncate_lock_wait
probably should not be more than a couple of seconds.
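
(Worked out, both variants stay within that budget: the posted defaults 
give 50 retries * 20ms = 1 second of retrying, while twenty retries at 
100ms apiece give 2 seconds.)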

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-10-25 Thread Jan Wieck

Steven,

On 10/24/2012 10:46 PM, Stephen Frost wrote:

Jan,

* Jan Wieck (janwi...@yahoo.com) wrote:

This problem has been discussed before. Those familiar with the
subject please skip the next paragraph.


Apologies if this was already thought-of and ruled out for some reason,
but...


Because all the scanning had been done in parallel to normal DB
activity, it needs to verify that all those blocks are still empty.


Would it be possible to use the FSM to figure out if things have changed
since the last scan..?  Does that scan update the FSM, which would then
be updated by another backend in the event that it decided to write
something there?  Or do we consider the FSM to be completely
untrustworthy wrt this (and if so, I don't suppose there's any hope to
using the visibility map...)?


I honestly don't know if we can trust the FSM enough when it comes to 
throwing away heap pages. Can we?




The notion of having to double-scan and the AccessExclusiveLock on the
relation are telling me this work-around, while completely possible,
isn't exactly ideal...


Under normal circumstances, with just a few pages to trim off the end, 
this is no problem. Those pages were the last pages just scanned by this 
very autovacuum, so they are found in the shared buffers anyway. All the 
second scan does in that case is fetch each page once more from shared 
buffers to be 100% sure we are not truncating off new tuples. We 
definitely need the AccessExclusiveLock to prevent someone from 
extending the relation at the end between our check of the relation size 
and the truncate. Fetching 50 empty blocks from the buffer cache while 
at it isn't that big of a deal, and that is what it normally looks like.
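
For readers following along, the per-page re-check is essentially the 
following (a simplified sketch of what count_nondeletable_pages() has to 
do; buffer strategy and locking details omitted):

    Buffer      buf;
    Page        page;
    OffsetNumber offnum,
                maxoff;
    bool        hastup = false;

    buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno, RBM_NORMAL, vac_strategy);
    LockBuffer(buf, BUFFER_LOCK_SHARE);
    page = BufferGetPage(buf);
    if (!PageIsNew(page) && !PageIsEmpty(page))
    {
        maxoff = PageGetMaxOffsetNumber(page);
        for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum = OffsetNumberNext(offnum))
        {
            if (ItemIdIsUsed(PageGetItemId(page, offnum)))
            {
                hastup = true;      /* the page is not empty after all */
                break;
            }
        }
    }
    UnlockReleaseBuffer(buf);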


The problem case this patch is dealing with is rolling window tables 
that experienced some bloat. The typical example is a log table, that 
has new data constantly added and the oldest data constantly purged out. 
This data normally rotates through some blocks like a rolling window. If 
for some reason (purging turned off for example) this table bloats by 
several GB and later shrinks back to its normal content, soon all the 
used blocks are at the beginning of the heap and we find tens of 
thousands of empty pages at the end. Only now does the second scan take 
more than 1000ms, and autovacuum is at risk of getting killed while at it.


Since we have experienced this problem several times now on our 
production systems, something clearly needs to be done. But IMHO it 
doesn't happen often enough to take any risk here.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-10-25 Thread Tom Lane
Jan Wieck janwi...@yahoo.com writes:
 On 10/24/2012 10:46 PM, Stephen Frost wrote:
 Would it be possible to use the FSM to figure out if things have changed
 since the last scan..?  Does that scan update the FSM, which would then
 be updated by another backend in the event that it decided to write
 something there?  Or do we consider the FSM to be completely
 untrustworthy wrt this (and if so, I don't suppose there's any hope to
 using the visibility map...)?

 I honestly don't know if we can trust the FSM enough when it comes to 
 throwing away heap pages. Can we?

No.  Backends are under no obligation to update FSM for each individual
tuple insertion, and typically don't do so.

More to the point, you have to take AccessExclusiveLock *anyway*,
because this is interlocking not only against new insertions but plain
read-only seqscans: if a seqscan falls off the end of the table it will
be very unhappy.  So I don't see where we'd buy anything by consulting
the FSM.

regards, tom lane




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-10-25 Thread Jan Wieck

On 10/25/2012 9:45 AM, Tom Lane wrote:

Jan Wieck janwi...@yahoo.com writes:

On 10/24/2012 10:46 PM, Stephen Frost wrote:

Would it be possible to use the FSM to figure out if things have changed
since the last scan..?  Does that scan update the FSM, which would then
be updated by another backend in the event that it decided to write
something there?  Or do we consider the FSM to be completely
untrustworthy wrt this (and if so, I don't suppose there's any hope to
using the visibility map...)?



I honestly don't know if we can trust the FSM enough when it comes to
throwing away heap pages. Can we?


No.  Backends are under no obligation to update FSM for each individual
tuple insertion, and typically don't do so.

More to the point, you have to take AccessExclusiveLock *anyway*,
because this is interlocking not only against new insertions but plain
read-only seqscans: if a seqscan falls off the end of the table it will
be very unhappy.  So I don't see where we'd buy anything by consulting
the FSM.


Thank you.

One thing that I haven't mentioned yet is that with this patch, we could 
actually insert a vacuum_delay_point() into the loop in 
count_nondeletable_pages(). We no longer cling to the exclusive lock but 
rather get out of the way as soon as somebody needs the table. Under 
this condition we no longer need to do the second scan full bore.
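
As a sketch (not part of the patch as posted), the body of that backward 
scan could then simply include the standard throttle:

    /* per page of the backward scan in count_nondeletable_pages() */
    vacuum_delay_point();       /* existing cost-based vacuum throttling */
    blkno--;
    /* ... re-check that the page still contains no tuples ... */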



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-10-25 Thread Stephen Frost
Jan,

* Jan Wieck (janwi...@yahoo.com) wrote:
 The problem case this patch is dealing with is rolling window tables
 that experienced some bloat. The typical example is a log table,
 that has new data constantly added and the oldest data constantly
 purged out. This data normally rotates through some blocks like a
 rolling window. If for some reason (purging turned off for example)
 this table bloats by several GB and later shrinks back to its normal
 content, soon all the used blocks are at the beginning of the heap
 and we find tens of thousands of empty pages at the end. Only now
 does the second scan take more than 1000ms and autovacuum is at risk
 to get killed while at it.

My concern is that this could certainly also happen to a heavily updated
table in an OLTP type of environment where the requirement to take a
heavy lock to clean it up might prevent it from ever happening..  I was
simply hoping we could find a mechanism to lock just those pages we're
getting ready to nuke rather than the entire relation.  Perhaps we can
consider how to make those changes alongside of changes to eliminate or
reduce the extent locking that has been painful (for me at least) when
doing massive parallel loads into a table.

 Since we have experienced this problem several times now on our
 production systems, something clearly needs to be done. But IMHO it
 doesn't happen often enough to take any risk here.

I'm not advocating a 'do-nothing' approach, was just looking for another
option that might allow for this work to happen on the heap in parallel
with regular access.  Since we havn't got any way to do that currently,
+1 for moving forward with this as it clearly improves the current
situation.

Thanks,

Stephen




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-10-25 Thread Jan Wieck

On 10/25/2012 10:12 AM, Stephen Frost wrote:

Jan,

* Jan Wieck (janwi...@yahoo.com) wrote:

The problem case this patch is dealing with is rolling window tables
that experienced some bloat. The typical example is a log table,
that has new data constantly added and the oldest data constantly
purged out. This data normally rotates through some blocks like a
rolling window. If for some reason (purging turned off for example)
this table bloats by several GB and later shrinks back to its normal
content, soon all the used blocks are at the beginning of the heap
and we find tens of thousands of empty pages at the end. Only now
does the second scan take more than 1000ms and autovacuum is at risk
to get killed while at it.


My concern is that this could certainly also happen to a heavily updated
table in an OLTP type of environment where the requirement to take a
heavy lock to clean it up might prevent it from ever happening..  I was
simply hoping we could find a mechanism to lock just those pages we're
getting ready to nuke rather than the entire relation.  Perhaps we can
consider how to make those changes alongside of changes to eliminate or
reduce the extent locking that has been painful (for me at least) when
doing massive parallel loads into a table.


I've been testing this with a load of 20 writes/s to that bloated table. 
Preventing not only the cleanup, but the following ANALYZE as well, is 
precisely what happens. There may be multiple ways to get into this 
situation, but once you're there the symptoms are the same. Vacuum fails 
to truncate the table and causes a 1 second hiccup every minute, while it 
is holding the exclusive lock until the deadlock detection code of 
another transaction kills it.


My patch doesn't change the logic of how we ensure that we don't zap any 
data by accident with the truncate, and Tom's comments suggest we should 
stick to it. It only makes autovacuum check frequently if the 
AccessExclusiveLock is actually blocking anyone and then get out of the 
way.


I would rather like to discuss any ideas for how to do all this without 3 
new GUCs.


In the original code, the maximum delay that autovacuum can cause by 
holding the exclusive lock is one deadlock_timeout (default 1s). It 
would appear reasonable to me to use max(deadlock_timeout/10,10ms) as 
the interval to check for a conflicting lock request. For another 
transaction that needs to access the table this is 10 times faster than 
it is now and still guarantees that autovacuum will make some progress 
with the truncate.


The other two GUCs control how often and how fast autovacuum tries to 
acquire the exclusive lock in the first place. Since we actively release 
the lock *because someone needs it*, it is pretty much guaranteed that 
the immediately following lock attempt fails. We use 
ConditionalLockRelation() on purpose because there is a chance of 
deadlocking. The current code only makes one lock attempt and gives up 
immediately. I don't know what to derive a good value for the total retry 
time from, but the nap time in between tries could be a hardcoded 20ms, 
or we could use the cost-based vacuum nap time (which defaults to 20ms).


Any other ideas are welcome.


Thanks,
Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-10-25 Thread Alvaro Herrera
Jan Wieck wrote:

 In the original code, the maximum delay that autovacuum can cause by
 holding the exclusive lock is one deadlock_timeout (default 1s). It
 would appear reasonable to me to use max(deadlock_timeout/10,10ms)
 as the interval to check for a conflicting lock request. For another
 transaction that needs to access the table this is 10 times faster
 than it is now and still guarantees that autovacuum will make some
 progress with the truncate.

So you would be calling GetCurrentTimestamp() continuously?  Since you
mentioned adding a vacuum delay point I wonder if it would make sense to
test for lockers each time it would consider going to sleep, instead.
(One hazard to keep in mind is the case where no vacuum delay is
configured.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] autovacuum truncate exclusive lock round two

2012-10-24 Thread Jan Wieck
This problem has been discussed before. Those familiar with the subject 
please skip the next paragraph.


When autovacuum finds a substantial number of empty pages at the end of 
a relation, it attempts to truncate it in lazy_truncate_heap(). Because 
all the scanning had been done in parallel to normal DB activity, it 
needs to verify that all those blocks are still empty. To do that 
autovacuum grabs an AccessExclusiveLock on the relation, then scans 
backwards to the last non-empty page. If any other backend needs to 
access that table during this time, it will kill the autovacuum from the 
deadlock detection code, which by default is done after a 1000ms 
timeout. The autovacuum launcher will start another vacuum after 
(default) 60 seconds, which most likely is getting killed again, and 
again, and again. The net result of this is that the table is never 
truncated and every 60 seconds there is a 1 second hiccup before the 
autovacuum is killed.


Proposal:

Add functions to lmgr that are derived from the lock release code, but 
instead of releasing the lock and waking up waiters, just return a 
boolean telling if there are any waiters that would be woken up if this 
lock was released.


Use this lmgr feature inside count_nondeletable_pages() of vacuumlazy.c 
to periodically check if there is a conflicting lock request waiting. 
If not, keep going. If there is a waiter, truncate the relation to the 
point checked thus far, release the AccessExclusiveLock, then loop back 
to where we acquire this lock in the first place and continue 
checking/truncating.
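
In rough C terms, and ignoring the retry/nap details, the intended control 
flow is something like the sketch below. LockHasWaitersRelation() stands 
for the new lmgr check described above (the exact function name may differ 
in the patch); everything else is existing infrastructure.

    /* sketch of the lazy_truncate_heap() loop -- not the actual patch */
    do
    {
        if (!ConditionalLockRelation(onerel, AccessExclusiveLock))
            break;                              /* retry-with-nap logic omitted */

        new_rel_pages = RelationGetNumberOfBlocks(onerel);
        if (new_rel_pages != old_rel_pages)
        {
            UnlockRelation(onerel, AccessExclusiveLock);
            break;                              /* somebody extended the relation */
        }

        /* backward scan; returns early if a conflicting lock waiter shows up */
        new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);

        if (new_rel_pages < old_rel_pages)
            RelationTruncate(onerel, new_rel_pages);

        UnlockRelation(onerel, AccessExclusiveLock);
        old_rel_pages = new_rel_pages;
    } while (vacrelstats->lock_waiter_detected && new_rel_pages > 0);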


I have a working patch here:

https://github.com/wieck/postgres/tree/autovacuum-truncate-lock

This patch does introduce three new postgresql.conf parameters, which I 
would be happy to get rid of if we could derive them from something 
else. Something based on the deadlock timeout may be possible.


autovacuum_truncate_lock_check = 100ms # how frequent to check
   # for conflicting locks
autovacuum_truncate_lock_retry = 50# how often to try acquiring
   # the exclusive lock
autovacuum_truncate_lock_wait = 20ms   # nap in between attempts

With these settings, I see the truncate of a bloated table progressing 
at a rate of 3 minutes per GB, while that table is accessed 20 times per 
second.


The original kill-autovacuum mechanism in the deadlock code is still 
there. All this code really does is 10 lmgr lookups per second, releasing 
the AccessExclusiveLock if there are any waiters. I don't think it can 
get any cheaper than this.


I am attaching a script that uses pgbench to demonstrate the actual 
problem of a bloated table with significant empty pages at the end.



Comments?


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


t1.autovac-lock-issue.tgz
Description: application/compressed



Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-10-24 Thread Jan Wieck

Here is the patch for it.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index c9253a9..9f880f0 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
***
*** 57,62 
--- 57,63 
  #include <utils/pg_rusage.h>
  #include <utils/timestamp.h>
  #include <utils/tqual.h>
+ #include <portability/instr_time.h>
  
  
  /*
*** typedef struct LVRelStats
*** 103,108 
--- 104,110 
  	ItemPointer dead_tuples;	/* array of ItemPointerData */
  	int			num_index_scans;
  	TransactionId latestRemovedXid;
+ 	bool		lock_waiter_detected;
  } LVRelStats;
  
  
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 193,198 
--- 195,202 
  	vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
  	vacrelstats->old_rel_tuples = onerel->rd_rel->reltuples;
  	vacrelstats->num_index_scans = 0;
+ 	vacrelstats->pages_removed = 0;
+ 	vacrelstats->lock_waiter_detected = false;
  
  	/* Open all indexes of the relation */
  	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
*** lazy_vacuum_rel(Relation onerel, VacuumS
*** 259,268 
  		vacrelstats->hasindex,
  		new_frozen_xid);
  
! 	/* report results to the stats collector, too */
! 	pgstat_report_vacuum(RelationGetRelid(onerel),
  		 onerel->rd_rel->relisshared,
  		 new_rel_tuples);
  
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
--- 263,280 
  		vacrelstats->hasindex,
  		new_frozen_xid);
  
! 	/*
! 	 * report results to the stats collector, too.
! 	 * An early terminated lazy_truncate_heap attempt 
! 	 * suppresses the message and also cancels the
! 	 * execution of ANALYZE, if that was ordered.
! 	 */
! 	if (!vacrelstats->lock_waiter_detected)
! 		pgstat_report_vacuum(RelationGetRelid(onerel),
  		 onerel->rd_rel->relisshared,
  		 new_rel_tuples);
+ 	else
+ 		vacstmt->options &= ~VACOPT_ANALYZE;
  
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
*** lazy_truncate_heap(Relation onerel, LVRe
*** 1255,1334 
  	BlockNumber old_rel_pages = vacrelstats->rel_pages;
  	BlockNumber new_rel_pages;
  	PGRUsage	ru0;
  
  	pg_rusage_init(&ru0);
  
  	/*
! 	 * We need full exclusive lock on the relation in order to do truncation.
! 	 * If we can't get it, give up rather than waiting --- we don't want to
! 	 * block other backends, and we don't want to deadlock (which is quite
! 	 * possible considering we already hold a lower-grade lock).
! 	 */
! 	if (!ConditionalLockRelation(onerel, AccessExclusiveLock))
! 		return;
! 
! 	/*
! 	 * Now that we have exclusive lock, look to see if the rel has grown
! 	 * whilst we were vacuuming with non-exclusive lock.  If so, give up; the
! 	 * newly added pages presumably contain non-deletable tuples.
  	 */
! 	new_rel_pages = RelationGetNumberOfBlocks(onerel);
! 	if (new_rel_pages != old_rel_pages)
  	{
  		/*
! 		 * Note: we intentionally don't update vacrelstats->rel_pages with the
! 		 * new rel size here.  If we did, it would amount to assuming that the
! 		 * new pages are empty, which is unlikely.	Leaving the numbers alone
! 		 * amounts to assuming that the new pages have the same tuple density
! 		 * as existing ones, which is less unlikely.
  		 */
! 		UnlockRelation(onerel, AccessExclusiveLock);
! 		return;
! 	}
  
! 	/*
! 	 * Scan backwards from the end to verify that the end pages actually
! 	 * contain no tuples.  This is *necessary*, not optional, because other
! 	 * backends could have added tuples to these pages whilst we were
! 	 * vacuuming.
! 	 */
! 	new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
  
! 	if (new_rel_pages >= old_rel_pages)
! 	{
! 		/* can't do anything after all */
! 		UnlockRelation(onerel, AccessExclusiveLock);
! 		return;
! 	}
  
! 	/*
! 	 * Okay to truncate.
! 	 */
! 	RelationTruncate(onerel, new_rel_pages);
  
! 	/*
! 	 * We can release the exclusive lock as soon as we have truncated.	Other
! 	 * backends can't safely access the relation until they have processed the
! 	 * smgr invalidation that smgrtruncate sent out ... but that should happen
! 	 * as part of standard invalidation processing once they acquire lock on
! 	 * the relation.
! 	 */
! 	UnlockRelation(onerel, AccessExclusiveLock);
  
! 	/*
! 	 * Update statistics.  Here, it *is* correct to adjust rel_pages without
! 	 * also touching reltuples, since the tuple count wasn't changed by the
! 	 * truncation.
! 	 */
! 	vacrelstats->rel_pages = new_rel_pages;
! 	vacrelstats->pages_removed = old_rel_pages - new_rel_pages;
  
! 	ereport(elevel,
! 			(errmsg("\"%s\": truncated %u to %u pages",
! 	RelationGetRelationName(onerel),
! 	old_rel_pages, new_rel_pages),
! 			 errdetail("%s.",
! 	   pg_rusage_show(&ru0))));
  }
  
  /*
--- 1267,1388 
 

Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-10-24 Thread Stephen Frost
Jan,

* Jan Wieck (janwi...@yahoo.com) wrote:
 This problem has been discussed before. Those familiar with the
 subject please skip the next paragraph.

Apologies if this was already thought-of and ruled out for some reason,
but...

 Because all the scanning had been done in parallel to normal DB
 activity, it needs to verify that all those blocks are still empty.

Would it be possible to use the FSM to figure out if things have changed
since the last scan..?  Does that scan update the FSM, which would then
be updated by another backend in the event that it decided to write
something there?  Or do we consider the FSM to be completely
untrustworthy wrt this (and if so, I don't suppose there's any hope to
using the visibility map...)?

The notion of having to double-scan and the AccessExclusiveLock on the
relation are telling me this work-around, while completely possible,
isn't exactly ideal...

Perhaps another option would be a page-level or something which is
larger than per-row (strikes me as a lot of overhead for this and it's
not clear how we'd do it), but less than an entire relation, but there
are certainly pain points there too.

Thanks,

Stephen

