[Thanks to Robert to stepping up to keep this moving while I was down yesterday with a minor injury. I'm back on it today.]
On Wed, Jun 8, 2016 at 3:11 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Wed, Jun 8, 2016 at 4:04 PM, Kevin Grittner <kgri...@gmail.com> wrote: >> -- connection 1 >> drop table if exists t1; >> create table t1 (c1 int not null); >> insert into t1 select generate_series(1, 1000000); >> begin transaction isolation level repeatable read; >> select 1; >> >> -- connection 2 >> insert into t2 values (1); >> delete from t1 where c1 between 200000 and 299999; >> delete from t1 where c1 = 1000000; >> vacuum analyze verbose t1; >> select pg_sleep_for('2min'); >> vacuum analyze verbose t1; -- repeat if needed until dead rows vacuumed >> >> -- connection 1 >> select c1 from t1 where c1 = 100; >> select c1 from t1 where c1 = 250000; >> >> The problem occurs when an index is built while an old snapshot >> exists which can't see the effects of early pruning/vacuuming. The >> fix prevents use of such an index until all snapshots early enough >> to have a problem have been released. > > This example doesn't seem to involve any CREATE INDEX or REINDEX > operations, so I'm a little confused. Sorry; pasto -- the index build is supposed to be on connection 2 right after the dead rows are confirmed vacuumed away: create index t1_c1 on t1 (c1); > Generally, I think I see the hazard you're concerned about: I had > failed to realize, as you mentioned upthread, that new index pages > would have an LSN of 0. So if a tuple is pruned early and then the > index is reindexed, old snapshots won't realize that data is missing. > What I'm less certain about is whether you can actually get by with > reusing ii_BrokenHotChain to handle this case. v2 and later does not do that. v1 did, but that was a more blunt instrument. > For example, note this comment: > > * However, when reindexing an existing index, we should do nothing here. > * Any HOT chains that are broken with respect to the index must predate > * the index's original creation, so there is no need to change the > * index's usability horizon. Moreover, we *must not* try to change the > * index's pg_index entry while reindexing pg_index itself, and this > * optimization nicely prevents that. > > This logic doesn't apply to the old snapshot case; there, you'd need > to distrust the index whether it was an initial build or a REINDEX, > but it doesn't look like that's what the patch does. I'm worried > there could be other places where we rely on ii_BrokenHotChain to > detect only one specific condition that isn't quite the same as what > you're trying to use it for here. Well spotted. I had used a lot of discreet calls to get that reindex logic right, but it was verbose and ugly, so I had just added the new macros in this patch and started to implement them before I knocked off for the day. At handover I was too distracted to remember where I was with it. :-( See if it looks right to you now. Attached is v3. I will commit this patch to resolve this issue tomorrow, barring any objections before then. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 31a1438..449a665 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -2040,18 +2040,24 @@ index_build(Relation heapRelation, /* * If we found any potentially broken HOT chains, mark the index as not * being usable until the current transaction is below the event horizon. - * See src/backend/access/heap/README.HOT for discussion. + * See src/backend/access/heap/README.HOT for discussion. Also set this + * if early pruning/vacuuming is enabled for the heap relation. While it + * might become safe to use the index earlier based on actual cleanup + * activity and other active transactions, the test for that would be much + * more complex and would require some form of blocking, so keep it simple + * and fast by just using the current transaction. * * However, when reindexing an existing index, we should do nothing here. * Any HOT chains that are broken with respect to the index must predate * the index's original creation, so there is no need to change the * index's usability horizon. Moreover, we *must not* try to change the * index's pg_index entry while reindexing pg_index itself, and this - * optimization nicely prevents that. + * optimization nicely prevents that. The more complex rules needed for a + * reindex are handled separately after this function returns. * * We also need not set indcheckxmin during a concurrent index build, * because we won't set indisvalid true until all transactions that care - * about the broken HOT chains are gone. + * about the broken HOT chains or early pruning/vacuuming are gone. * * Therefore, this code path can only be taken during non-concurrent * CREATE INDEX. Thus the fact that heap_update will set the pg_index @@ -2060,7 +2066,8 @@ index_build(Relation heapRelation, * about any concurrent readers of the tuple; no other transaction can see * it yet. */ - if (indexInfo->ii_BrokenHotChain && !isreindex && + if ((indexInfo->ii_BrokenHotChain || EarlyVacuumEnabled(heapRelation)) && + !isreindex && !indexInfo->ii_Concurrent) { Oid indexId = RelationGetRelid(indexRelation); @@ -3389,6 +3396,11 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence, * reindexing pg_index itself, we must not try to update tuples in it. * pg_index's indexes should always have these flags in their clean state, * so that won't happen. + * + * If early pruning/vacuuming is enabled for the heap relation, the + * usability horizon must be advanced to the current transaction on every + * build or rebuild. pg_index is OK in this regard because catalog tables + * are not subject to early cleanup. */ if (!skipped_constraint) { @@ -3396,6 +3408,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence, HeapTuple indexTuple; Form_pg_index indexForm; bool index_bad; + bool early_vacuum_enabled = EarlyVacuumEnabled(heapRelation); pg_index = heap_open(IndexRelationId, RowExclusiveLock); @@ -3409,11 +3422,12 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence, !indexForm->indisready || !indexForm->indislive); if (index_bad || - (indexForm->indcheckxmin && !indexInfo->ii_BrokenHotChain)) + (indexForm->indcheckxmin && !indexInfo->ii_BrokenHotChain) || + early_vacuum_enabled) { - if (!indexInfo->ii_BrokenHotChain) + if (!indexInfo->ii_BrokenHotChain && !early_vacuum_enabled) indexForm->indcheckxmin = false; - else if (index_bad) + else if (index_bad || early_vacuum_enabled) indexForm->indcheckxmin = true; indexForm->indisvalid = true; indexForm->indisready = true; diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 8a830d4..4e50b19 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -4290,8 +4290,7 @@ IssuePendingWritebacks(WritebackContext *context) void TestForOldSnapshot_impl(Snapshot snapshot, Relation relation) { - if (!IsCatalogRelation(relation) - && !RelationIsAccessibleInLogicalDecoding(relation) + if (RelationAllowsEarlyVacuum(relation) && (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp()) ereport(ERROR, (errcode(ERRCODE_SNAPSHOT_TOO_OLD), diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c index f8a2a83..2eabd1c 100644 --- a/src/backend/utils/time/snapmgr.c +++ b/src/backend/utils/time/snapmgr.c @@ -1597,10 +1597,7 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin, { if (TransactionIdIsNormal(recentXmin) && old_snapshot_threshold >= 0 - && RelationNeedsWAL(relation) - && !IsCatalogRelation(relation) - && !RelationIsAccessibleInLogicalDecoding(relation) - && !RelationHasUnloggedIndex(relation)) + && RelationAllowsEarlyVacuum(relation)) { int64 ts = GetSnapshotCurrentTimestamp(); TransactionId xlimit = recentXmin; diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h index 4270696..effdb0c 100644 --- a/src/include/utils/snapmgr.h +++ b/src/include/utils/snapmgr.h @@ -31,6 +31,19 @@ #define OLD_SNAPSHOT_PADDING_ENTRIES 10 #define OLD_SNAPSHOT_TIME_MAP_ENTRIES (old_snapshot_threshold + OLD_SNAPSHOT_PADDING_ENTRIES) +/* + * Common definition of relation properties that allow early pruning/vacuuming + * when old_snapshot_threshold >= 0. + */ +#define RelationAllowsEarlyVacuum(rel) \ +( \ + RelationNeedsWAL(rel) \ + && !IsCatalogRelation(rel) \ + && !RelationIsAccessibleInLogicalDecoding(rel) \ + && !RelationHasUnloggedIndex(rel) \ +) + +#define EarlyVacuumEnabled(rel) (old_snapshot_threshold >= 0 && RelationAllowsEarlyVacuum(rel)) /* GUC variables */ extern PGDLLIMPORT int old_snapshot_threshold;
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers