At Fri, 5 Oct 2018 15:35:04 +0900, Michael Paquier <mich...@paquier.xyz> wrote 
in <20181005063504.gb14...@paquier.xyz>
> On Fri, Oct 05, 2018 at 12:16:03PM +0900, Michael Paquier wrote:
> > So, I have come back to this stuff, and finished with the attached
> > instead, so as the assertion is in a single place.  I find that
> > clearer.  The comments have also been improved.  Thoughts?
> 
> And so...  I have been looking at committing this thing, and while
> testing in-depth I have been able to trigger a case where an autovacuum
> has been able to be not aggressive but anti-wraparound, which is exactly
> what should not be possible, no?  I have simply created an instance with
> autovacuum_freeze_max_age = 200000, then ran pgbench with
> autovacuum_freeze_table_age=200000 set for each table, and also ran
> installcheck-world in parallel.  This has been able to trigger the
> assertion pretty quickly.

I investigated it and in short, it can happen.

It is a kind of race consdition between two autovacuum
processes. do_autovacuum() looks into pg_class (using a snapshot)
and vacuum_set_xid_limits() looks into relcache. If concurrent
vacuum happens and one has finished the relation, another gets
relcache invalidation and relfrozenxid is updated. If this
happens between do_autovacuum() and vacuum_set_xid_limits(), the
latter sees newer relfrozenxid than the former. The problem
happens when it moves by more than 5% of
autovacuum_freeze_max_age.

If lazy_vacuum_rel() sees the situation, the relation is already
aggressively vacuumed by a cocurrent worker. We can just ingore
the state safely but also we know that the vacuum is useless.

1. Just allow the case there (and add comment).
   Causes redundant anti-wraparound vacuum.

2. Skip the relation by the condition.

   I think that we can safely skip the relation in the
   case. (attached)

3. Ensure that do_autovacuum always sees the same relfrozenxid
   with vacuum_set_xid_limits().

4. Prevent concurrent acuuming of the same relation rigorously,
  somehow.

Thoughts?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 8996d366e9..436d573454 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -249,6 +249,21 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	if (options & VACOPT_DISABLE_PAGE_SKIPPING)
 		aggressive = true;
 
+	/*
+	 * When is_wraparound, relfrozenxid is old enough and aggressive must be
+	 * true here. However, if another concurrent vacuum process has processed
+	 * this relation, relfrozenxid in relcache can be moved forward enough so
+	 * that aggressive is calculated as false here. This relation no longer
+	 * needs to be vacuumed. Bail out.
+	 */
+	if (params->is_wraparound && !aggressive)
+	{
+		elog(DEBUG1, "relation %d has been vacuumd ocncurrently, skip",
+			onerel->rd_id);
+		return;
+	}
+
+
 	vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
 
 	vacrelstats->old_rel_pages = onerel->rd_rel->relpages;

Reply via email to