Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On 2021-Feb-23, Masahiko Sawada wrote: > I've looked at the v3 patch and it looks good to me. A minor comment is: > > + /* Catalog tables need to consider all backends. */ > + h->catalog_oldest_nonremovable = > + TransactionIdOlder(h->catalog_oldest_nonremovable, xmin); > + > > I think that the above comment is not accurate since we consider only > backends on the same database in some cases. You're right. Adjusted. > the patch makes the following change: > > @@ -1847,7 +1871,6 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) > h->shared_oldest_nonremovable = > TransactionIdOlder(h->shared_oldest_nonremovable, >h->slot_catalog_xmin); > - h->catalog_oldest_nonremovable = h->data_oldest_nonremovable; > h->catalog_oldest_nonremovable = > TransactionIdOlder(h->catalog_oldest_nonremovable, >h->slot_catalog_xmin); > > In the original code, catalog_oldest_nonremovable is initialized with > data_oldest_nonremovable that considered slot_xmin. Therefore, IIUC > catalog_oldest_nonremovable is the oldest transaction id among > data_oldest_nonremovable, slot_xmin, and slot_catalog_xmin. But with > the patch, catalog_oldest_nonremovable doesn't consider slot_xmin. It > seems okay to me as catalog_oldest_nonremovable is interested in only > slot_catalog_xmin. I tend to agree -- it seems safe to ignore slot_xmin in that case. However, tracing through the maze of xmin vs. catalog_xmin transmission across the whole system is not trivial. I changed the patch so that we adjust to slot_xmin first, then to slot_catalog_xmin. I *think* the end effect should be the same. But if we want to make that claim to save that one assignment, we can make it in a separate commit. However, given that this comparison and assignment is outside the locked area, I don't think it's worth bothering with. Thanks for the input! I have pushed this patch. -- Álvaro Herrera Valdivia, Chile "El sentido de las cosas no viene de las cosas, sino de las inteligencias que las aplican a sus problemas diarios en busca del progreso." (Ernesto Hernández-Novich)
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On Tue, Feb 23, 2021 at 9:15 AM Álvaro Herrera wrote: > > On 2021-Feb-22, Álvaro Herrera wrote: > > > Here's an updated patch. > > > > I haven't added commentary or documentation to account for the new > > assumption, per Matthias' comment and Robert's discussion thereof. > > Done that. I also restructured the code a bit, since one line in > ComputedXidHorizons was duplicative. I've looked at the v3 patch and it looks good to me. A minor comment is: + /* Catalog tables need to consider all backends. */ + h->catalog_oldest_nonremovable = + TransactionIdOlder(h->catalog_oldest_nonremovable, xmin); + I think that the above comment is not accurate since we consider only backends on the same database in some cases. FWIW, just for the record, around the following original code, /* * Check whether there are replication slots requiring an older xmin. */ h->shared_oldest_nonremovable = TransactionIdOlder(h->shared_oldest_nonremovable, h->slot_xmin); h->data_oldest_nonremovable = TransactionIdOlder(h->data_oldest_nonremovable, h->slot_xmin); /* * The only difference between catalog / data horizons is that the slot's * catalog xmin is applied to the catalog one (so catalogs can be accessed * for logical decoding). Initialize with data horizon, and then back up * further if necessary. Have to back up the shared horizon as well, since * that also can contain catalogs. */ h->shared_oldest_nonremovable_raw = h->shared_oldest_nonremovable; h->shared_oldest_nonremovable = TransactionIdOlder(h->shared_oldest_nonremovable, h->slot_catalog_xmin); h->catalog_oldest_nonremovable = h->data_oldest_nonremovable; h->catalog_oldest_nonremovable = TransactionIdOlder(h->catalog_oldest_nonremovable, h->slot_catalog_xmin); the patch makes the following change: @@ -1847,7 +1871,6 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) h->shared_oldest_nonremovable = TransactionIdOlder(h->shared_oldest_nonremovable, h->slot_catalog_xmin); - h->catalog_oldest_nonremovable = h->data_oldest_nonremovable; h->catalog_oldest_nonremovable = TransactionIdOlder(h->catalog_oldest_nonremovable, h->slot_catalog_xmin); In the original code, catalog_oldest_nonremovable is initialized with data_oldest_nonremovable that considered slot_xmin. Therefore, IIUC catalog_oldest_nonremovable is the oldest transaction id among data_oldest_nonremovable, slot_xmin, and slot_catalog_xmin. But with the patch, catalog_oldest_nonremovable doesn't consider slot_xmin. It seems okay to me as catalog_oldest_nonremovable is interested in only slot_catalog_xmin. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On 2021-Feb-22, Álvaro Herrera wrote: > Here's an updated patch. > > I haven't added commentary or documentation to account for the new > assumption, per Matthias' comment and Robert's discussion thereof. Done that. I also restructured the code a bit, since one line in ComputedXidHorizons was duplicative. -- Álvaro Herrera39°49'30"S 73°17'W Tom: There seems to be something broken here. Teodor: I'm in sackcloth and ashes... Fixed. http://archives.postgresql.org/message-id/482d1632.8010...@sigaev.ru >From 98fa9ae33cb890e1d24a0c9be3a14139f15fdafe Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 15 Jan 2021 14:03:14 -0300 Subject: [PATCH v3] Teach VACUUM to ignore CIC Discussion: https://postgr.es/m/20210115133858.GA18931@alvherre.pgsql --- src/backend/storage/ipc/procarray.c | 39 +++-- src/backend/utils/misc/guc.c| 2 +- 2 files changed, 32 insertions(+), 9 deletions(-) diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index a5daea8957..9c88f58c7f 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -1610,7 +1610,13 @@ TransactionIdIsActive(TransactionId xid) * relations that's not required, since only backends in my own database could * ever see the tuples in them. Also, we can ignore concurrently running lazy * VACUUMs because (a) they must be working on other tables, and (b) they - * don't need to do snapshot-based lookups. + * don't need to do snapshot-based lookups. Similarly, for the non-catalog + * horizon, we can ignore CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY + * when they are working on non-partial, non-expressional indexes, for the + * same reasons and because they can't run in transaction blocks. (They are + * not possible to ignore for catalogs, because CIC and RC do some catalog + * operations.) Do note that this means that CIC and RC must use a lock level + * that conflicts with VACUUM. * * This also computes a horizon used to truncate pg_subtrans. For that * backends in all databases have to be considered, and concurrently running @@ -1660,9 +1666,6 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) bool in_recovery = RecoveryInProgress(); TransactionId *other_xids = ProcGlobal->xids; - /* inferred after ProcArrayLock is released */ - h->catalog_oldest_nonremovable = InvalidTransactionId; - LWLockAcquire(ProcArrayLock, LW_SHARED); h->latest_completed = ShmemVariableCache->latestCompletedXid; @@ -1682,6 +1685,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) h->oldest_considered_running = initial; h->shared_oldest_nonremovable = initial; + h->catalog_oldest_nonremovable = initial; h->data_oldest_nonremovable = initial; /* @@ -1752,7 +1756,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) if (statusFlags & (PROC_IN_VACUUM | PROC_IN_LOGICAL_DECODING)) continue; - /* shared tables need to take backends in all database into account */ + /* shared tables need to take backends in all databases into account */ h->shared_oldest_nonremovable = TransactionIdOlder(h->shared_oldest_nonremovable, xmin); @@ -1773,11 +1777,26 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) MyDatabaseId == InvalidOid || proc->databaseId == MyDatabaseId || proc->databaseId == 0) /* always include WalSender */ { - h->data_oldest_nonremovable = -TransactionIdOlder(h->data_oldest_nonremovable, xmin); + /* + * We can ignore this backend if it's running CREATE INDEX + * CONCURRENTLY or REINDEX CONCURRENTLY on a "safe" index -- but + * only on vacuums of user-defined tables. + */ + if (!(statusFlags & PROC_IN_SAFE_IC)) +h->data_oldest_nonremovable = + TransactionIdOlder(h->data_oldest_nonremovable, xmin); + + /* Catalog tables need to consider all backends. */ + h->catalog_oldest_nonremovable = +TransactionIdOlder(h->catalog_oldest_nonremovable, xmin); + } } + /* catalog horizon should never be later than data */ + Assert(TransactionIdPrecedesOrEquals(h->catalog_oldest_nonremovable, + h->data_oldest_nonremovable)); + /* * If in recovery fetch oldest xid in KnownAssignedXids, will be applied * after lock is released. @@ -1799,6 +1818,8 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) TransactionIdOlder(h->shared_oldest_nonremovable, kaxmin); h->data_oldest_nonremovable = TransactionIdOlder(h->data_oldest_nonremovable, kaxmin); + h->catalog_oldest_nonremovable = + TransactionIdOlder(h->catalog_oldest_nonremovable, kaxmin); /* temp relations cannot be accessed in recovery */ } else @@ -1825,6 +1846,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) h->data_oldest_nonremovable = TransactionIdRetreatedBy(h->data_oldest_nonremovable, vacuum_defer_cleanup_age); + h->catalog_oldest_nonremovable = + TransactionIdRetreatedBy(h->catalog_oldest_nonremovable, +
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Here's an updated patch. I haven't added commentary or documentation to account for the new assumption, per Matthias' comment and Robert's discussion thereof. -- Álvaro Herrera39°49'30"S 73°17'W "Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat) >From 4e1cf2ee0e6c9fb49fc2bc5932d3a59d65ee9ea7 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 15 Jan 2021 14:03:14 -0300 Subject: [PATCH v2] Teach VACUUM to ignore CIC Discussion: https://postgr.es/m/20210115133858.GA18931@alvherre.pgsql --- src/backend/storage/ipc/procarray.c | 40 +++-- src/backend/utils/misc/guc.c| 2 +- 2 files changed, 33 insertions(+), 9 deletions(-) diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index a5daea8957..88b8271c22 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -1610,7 +1610,12 @@ TransactionIdIsActive(TransactionId xid) * relations that's not required, since only backends in my own database could * ever see the tuples in them. Also, we can ignore concurrently running lazy * VACUUMs because (a) they must be working on other tables, and (b) they - * don't need to do snapshot-based lookups. + * don't need to do snapshot-based lookups. Also, for the non-catalog + * horizon, we can ignore CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY + * when they are working on non-partial, non-expressional indexes, for the + * same reasons and because they can't run in transaction blocks. (They are + * not possible to ignore for catalogs, because CIC and RC do some catalog + * operations.) * * This also computes a horizon used to truncate pg_subtrans. For that * backends in all databases have to be considered, and concurrently running @@ -1660,9 +1665,6 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) bool in_recovery = RecoveryInProgress(); TransactionId *other_xids = ProcGlobal->xids; - /* inferred after ProcArrayLock is released */ - h->catalog_oldest_nonremovable = InvalidTransactionId; - LWLockAcquire(ProcArrayLock, LW_SHARED); h->latest_completed = ShmemVariableCache->latestCompletedXid; @@ -1682,6 +1684,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) h->oldest_considered_running = initial; h->shared_oldest_nonremovable = initial; + h->catalog_oldest_nonremovable = initial; h->data_oldest_nonremovable = initial; /* @@ -1752,7 +1755,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) if (statusFlags & (PROC_IN_VACUUM | PROC_IN_LOGICAL_DECODING)) continue; - /* shared tables need to take backends in all database into account */ + /* shared tables need to take backends in all databases into account */ h->shared_oldest_nonremovable = TransactionIdOlder(h->shared_oldest_nonremovable, xmin); @@ -1768,16 +1771,33 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) * to prune still needed data away). If the current backend never * connects to a database that is harmless, because * data_oldest_nonremovable will never be utilized. + * + * Additionally, processes doing CREATE INDEX CONCURRENTLY and REINDEX + * CONCURRENTLY on "safe" indexes can be ignored for non-catalog + * horizon. (But not for catalogs: some transactions in CIC/RC do + * catalog updates.) */ if (in_recovery || MyDatabaseId == InvalidOid || proc->databaseId == MyDatabaseId || proc->databaseId == 0) /* always include WalSender */ { - h->data_oldest_nonremovable = -TransactionIdOlder(h->data_oldest_nonremovable, xmin); + if (statusFlags & PROC_IN_SAFE_IC) +h->catalog_oldest_nonremovable = + TransactionIdOlder(h->catalog_oldest_nonremovable, xmin); + else + { +h->data_oldest_nonremovable = + TransactionIdOlder(h->data_oldest_nonremovable, xmin); +h->catalog_oldest_nonremovable = + TransactionIdOlder(h->catalog_oldest_nonremovable, xmin); + } } } + /* catalog horizon should never be later than data */ + Assert(TransactionIdPrecedesOrEquals(h->catalog_oldest_nonremovable, + h->data_oldest_nonremovable)); + /* * If in recovery fetch oldest xid in KnownAssignedXids, will be applied * after lock is released. @@ -1799,6 +1819,8 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) TransactionIdOlder(h->shared_oldest_nonremovable, kaxmin); h->data_oldest_nonremovable = TransactionIdOlder(h->data_oldest_nonremovable, kaxmin); + h->catalog_oldest_nonremovable = + TransactionIdOlder(h->catalog_oldest_nonremovable, kaxmin); /* temp relations cannot be accessed in recovery */ } else @@ -1825,6 +1847,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) h->data_oldest_nonremovable = TransactionIdRetreatedBy(h->data_oldest_nonremovable, vacuum_defer_cleanup_age); + h->catalog_oldest_nonremovable = + TransactionIdRetreatedBy(h->catalog_oldest_nonremovable, + vacuum_defer_cleanup_age); /* defer
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On Thu, Jan 21, 2021 at 3:08 PM Matthias van de Meent wrote: > Re-thinking this, and after some research: > > Is the behaviour of "any process that invalidates TIDs in this table > (that could be in an index on this table) always holds a lock that > conflicts with CIC/RiC on that table" a requirement of tableams, or is > it an implementation-detail? > > If it is a requirement, then this patch is a +1 for me (and that > requirement should be documented in such case), otherwise a -1 while > there is no mechanism in place to remove concurrently-invalidated TIDs > from CIC-ed/RiC-ed indexes. I don't believe that the table AM machinery presumes that every kind of table AM necessarily has to support VACUUM; or at least, I don't think the original version presumed that. However, if you want it to work some other way, there's not really any infrastructure for that, either. Like, if you want to run your own background workers to clean up your table, you could, but you'd have to arrange that yourself. If you want a SWEEP command or an OPTIMIZE TABLE command instead of VACUUM, you'd have to hack the grammar. Likewise, I don't know that there's any guarantee that CLUSTER would work on a table of any old AM either, or that it would do anything useful. But having said that, if you have a table AM that *does* support VACUUM and CLUSTER, I think it would have to treat TIDs pretty much just as we do today. Almost by definition, they have to live with the way the rest of the system works. TIDs have to be 6 bytes, and lock levels can't be changed, because there's nothing in table AM that would let you tinker with that stuff. The table AM can opt out of that machinery altogether in favor of just throwing an error, but it can't make it work differently unless somebody extends the API. It's possible that this might suck a lot for some AMs. For instance, if your AM is an in-memory btree, you might want CLUSTER to work in place, rather than by copying the whole relation file, but I doubt it's possible to make that work using the APIs as they exist. Maybe someday we'll have better ones, but right now we don't. As far as the specific question asked here, I don't really understand how it could ever work any differently. If you want to invalidate some TIDs in such a way that they can be reused by unrelated tuples - in the case of the heap this would be by making the line pointers LP_UNUSED - that has to be coordinated with the indexing machinery. I can imagine a table AM that never reuses TIDs - especially if we eventually have TIDs > 6 bytes - but I can't imagine a table AM that reuses TIDs that might still be present in the index without telling the index. And that seems to be what is being proposed here: if CIC or RiC could be putting TIDs into indexes while at the same time some of those very same TIDs were being made available for reuse, that would be chaos, and right now we have no mechanism other than the lock level to prevent that. We could invent one, maybe, but it doesn't exist today, and zheap never tried to invent anything like that. I agree that, if we do this, the comment should make clear that the CIC/RiC lock has to conflict with VACUUM to avoid indexing tuples that VACUUM is busy marking LP_UNUSED, and that this is precisely because other backends will ignore the CIC/RiC snapshot while determining whether a tuple is live. I don't think this is the case. Hypothetically speaking, suppose we rejected this patch. Then, suppose someone proposed to replace ShareUpdateExclusiveLock with two new lock levels VacuumTuplesLock and IndexTheRelationLock each of which conflicts with itself and all other lock levels with which ShareUpdateExclusiveLock conflicts, but the two don't conflict with each other. AFAIK, that patch would be acceptable today and unacceptable after this change. While I'm not arguing that as a reason to reject this patch, it's not impossible that somebody could: they'd just need to argue that the separated lock levels would have greater value than this patch. I don't think I believe that, but it's not a totally crazy suggestion. My main point though is that this meaningfully changing some assumptions about how the system works, and we should be sure to be clear about that. I looked at the patch but don't really understand it; the code in this area seems to have changed a lot since I last looked at it. So, I'm just commenting mostly on the specific question that was asked, and a little bit on the theory of the patch, but without expressing an opinion on the implementation. -- Robert Haas EDB: http://www.enterprisedb.com
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On Tue, 19 Jan 2021 at 21:59, Matthias van de Meent wrote: > > On Mon, 18 Jan 2021, 21:25 Álvaro Herrera, wrote: > > > > On 2021-Jan-18, Matthias van de Meent wrote: > > > > > Example: > > > > > > 1.) RI starts > > > 2.) PHASE 2: filling the index: > > > 2.1.) scanning the heap (live tuple is cached) > > > < tuple is deleted > > > < last transaction other than RI commits, only snapshot of RI exists > > > < vacuum drops the tuple, and cannot remove it from the new index > > > because this new index is not yet populated. > > > 2.2.) sorting tuples > > > 2.3.) index filled with tuples, incl. deleted tuple > > > 3.) PHASE 3: wait for transactions > > > 4.) PHASE 4: validate does not remove the tuple from the index, > > > because it is not built to do so: it will only insert new tuples. > > > Tuples that are marked for deletion are removed from the index only > > > through VACUUM (and optimistic ALL_DEAD detection). > > > > > > According to my limited knowledge of RI, it requires VACUUM to not run > > > on the table during the initial index build process (which is > > > currently guaranteed through the use of a snapshot). > > > > VACUUM cannot run concurrently with CIC or RI in a table -- both acquire > > ShareUpdateExclusiveLock, which conflicts with itself, so this cannot > > occur. > > Yes, you are correct. Vacuum indeed has a ShareUpdateExclusiveLock. > Are there no other ways that pages are optimistically pruned? > > But the base case still stands, ignoring CIC snapshots in would give > the semantic of all_dead to tuples that are actually still considered > alive in some context, and should not yet be deleted (you're deleting > data from an in-use snapshot). Any local pruning optimizations using > all_dead mechanics now cannot be run on the table unless they hold an > ShareUpdateExclusiveLock; though I'm unaware of any such mechanisms > (other than below). Re-thinking this, and after some research: Is the behaviour of "any process that invalidates TIDs in this table (that could be in an index on this table) always holds a lock that conflicts with CIC/RiC on that table" a requirement of tableams, or is it an implementation-detail? If it is a requirement, then this patch is a +1 for me (and that requirement should be documented in such case), otherwise a -1 while there is no mechanism in place to remove concurrently-invalidated TIDs from CIC-ed/RiC-ed indexes. This concurrently-invalidated check could be done through e.g. updating validate_index to have one more phase that removes unknown / incorrect TIDs from the index. As a note: index insertion logic would then also have to be able to handle duplicate TIDs in the index. Regards, Matthias van de Meent Regards, Matthias van de Meent
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On Mon, 18 Jan 2021, 21:25 Álvaro Herrera, wrote: > > On 2021-Jan-18, Matthias van de Meent wrote: > > > Example: > > > > 1.) RI starts > > 2.) PHASE 2: filling the index: > > 2.1.) scanning the heap (live tuple is cached) > > < tuple is deleted > > < last transaction other than RI commits, only snapshot of RI exists > > < vacuum drops the tuple, and cannot remove it from the new index > > because this new index is not yet populated. > > 2.2.) sorting tuples > > 2.3.) index filled with tuples, incl. deleted tuple > > 3.) PHASE 3: wait for transactions > > 4.) PHASE 4: validate does not remove the tuple from the index, > > because it is not built to do so: it will only insert new tuples. > > Tuples that are marked for deletion are removed from the index only > > through VACUUM (and optimistic ALL_DEAD detection). > > > > According to my limited knowledge of RI, it requires VACUUM to not run > > on the table during the initial index build process (which is > > currently guaranteed through the use of a snapshot). > > VACUUM cannot run concurrently with CIC or RI in a table -- both acquire > ShareUpdateExclusiveLock, which conflicts with itself, so this cannot > occur. Yes, you are correct. Vacuum indeed has a ShareUpdateExclusiveLock. Are there no other ways that pages are optimistically pruned? But the base case still stands, ignoring CIC snapshots in would give the semantic of all_dead to tuples that are actually still considered alive in some context, and should not yet be deleted (you're deleting data from an in-use snapshot). Any local pruning optimizations using all_dead mechanics now cannot be run on the table unless they hold an ShareUpdateExclusiveLock; though I'm unaware of any such mechanisms (other than below). > I do wonder if the problem you suggest (or something similar) can occur > via HOT pruning, though. It could not, at least not at the current HEAD, as only one tuple in a HOT-chain can be alive at one point, and all indexes point to the root of the HOT-chain, which is never HOT-pruned. See also the src/backend/access/heap/README.HOT. Regards, Matthias van de Meent
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On 2021-Jan-18, Matthias van de Meent wrote: > On Fri, 15 Jan 2021 at 15:29, Álvaro Herrera wrote: > Would this not need to be the following? Right now, it resets > potentially older h->catalog_oldest_nonremovable (which is set in the > PROC_IN_SAFE_IC branch). > > > +if (statusFlags & PROC_IN_SAFE_IC) > > +h->catalog_oldest_nonremovable = > > +TransactionIdOlder(h->catalog_oldest_nonremovable, > > xmin); > > +else > > +{ > > +h->data_oldest_nonremovable = > > +TransactionIdOlder(h->data_oldest_nonremovable, xmin); > > +h->catalog_oldest_nonremovable = > > +TransactionIdOlder(h->catalog_oldest_nonremovable, > > xmin); > > +} Oops, you're right. -- Álvaro Herrera Valdivia, Chile
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On 2021-Jan-18, Matthias van de Meent wrote: > Example: > > 1.) RI starts > 2.) PHASE 2: filling the index: > 2.1.) scanning the heap (live tuple is cached) > < tuple is deleted > < last transaction other than RI commits, only snapshot of RI exists > < vacuum drops the tuple, and cannot remove it from the new index > because this new index is not yet populated. > 2.2.) sorting tuples > 2.3.) index filled with tuples, incl. deleted tuple > 3.) PHASE 3: wait for transactions > 4.) PHASE 4: validate does not remove the tuple from the index, > because it is not built to do so: it will only insert new tuples. > Tuples that are marked for deletion are removed from the index only > through VACUUM (and optimistic ALL_DEAD detection). > > According to my limited knowledge of RI, it requires VACUUM to not run > on the table during the initial index build process (which is > currently guaranteed through the use of a snapshot). VACUUM cannot run concurrently with CIC or RI in a table -- both acquire ShareUpdateExclusiveLock, which conflicts with itself, so this cannot occur. I do wonder if the problem you suggest (or something similar) can occur via HOT pruning, though. -- Álvaro Herrera Valdivia, Chile
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On Fri, 15 Jan 2021 at 15:29, Álvaro Herrera wrote: > > So one last remaining improvement was to have VACUUM ignore processes > doing CIC and RC to compute the Xid horizon of tuples to remove. I > think we can do something simple like the attached patch. Regarding the patch: > +if (statusFlags & PROC_IN_SAFE_IC) > +h->catalog_oldest_nonremovable = > +TransactionIdOlder(h->catalog_oldest_nonremovable, xmin); > +else > +h->data_oldest_nonremovable = h->catalog_oldest_nonremovable > = > +TransactionIdOlder(h->data_oldest_nonremovable, xmin); Would this not need to be the following? Right now, it resets potentially older h->catalog_oldest_nonremovable (which is set in the PROC_IN_SAFE_IC branch). > +if (statusFlags & PROC_IN_SAFE_IC) > +h->catalog_oldest_nonremovable = > +TransactionIdOlder(h->catalog_oldest_nonremovable, xmin); > +else > +{ > +h->data_oldest_nonremovable = > +TransactionIdOlder(h->data_oldest_nonremovable, xmin); > +h->catalog_oldest_nonremovable = > +TransactionIdOlder(h->catalog_oldest_nonremovable, xmin); > +} Prior to reading the rest of my response: I do not understand the intricacies of the VACUUM process, nor the systems of db snapshots, so it's reasonably possible I misunderstand this. But would this patch not allow for tuples to be created, deleted and vacuumed away from a table whilst rebuilding an index on that table, resulting in invalid pointers in the index? Example: 1.) RI starts 2.) PHASE 2: filling the index: 2.1.) scanning the heap (live tuple is cached) < tuple is deleted < last transaction other than RI commits, only snapshot of RI exists < vacuum drops the tuple, and cannot remove it from the new index because this new index is not yet populated. 2.2.) sorting tuples 2.3.) index filled with tuples, incl. deleted tuple 3.) PHASE 3: wait for transactions 4.) PHASE 4: validate does not remove the tuple from the index, because it is not built to do so: it will only insert new tuples. Tuples that are marked for deletion are removed from the index only through VACUUM (and optimistic ALL_DEAD detection). According to my limited knowledge of RI, it requires VACUUM to not run on the table during the initial index build process (which is currently guaranteed through the use of a snapshot). Regards, Matthias van de Meent
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
So one last remaining improvement was to have VACUUM ignore processes doing CIC and RC to compute the Xid horizon of tuples to remove. I think we can do something simple like the attached patch. -- Álvaro Herrera Valdivia, Chile "Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio) diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index cf12eda504..f584230b79 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -1601,7 +1601,7 @@ TransactionIdIsActive(TransactionId xid) * well as "internally" by GlobalVisUpdate() (see comment above struct * GlobalVisState). * - * See the definition of ComputedXidHorizonsResult for the various computed + * See the definition of ComputeXidHorizonsResult for the various computed * horizons. * * For VACUUM separate horizons (used to decide which deleted tuples must @@ -1610,7 +1610,12 @@ TransactionIdIsActive(TransactionId xid) * relations that's not required, since only backends in my own database could * ever see the tuples in them. Also, we can ignore concurrently running lazy * VACUUMs because (a) they must be working on other tables, and (b) they - * don't need to do snapshot-based lookups. + * don't need to do snapshot-based lookups. Also, for the non-catalog + * horizon, we can ignore CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY + * when they are working on non-partial, non-expressional indexes, for the + * same reasons and because they can't run in transaction blocks. (They are + * not possible to ignore for catalogs, because CIC and RC do some catalog + * operations.) * * This also computes a horizon used to truncate pg_subtrans. For that * backends in all databases have to be considered, and concurrently running @@ -1660,9 +1665,6 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) bool in_recovery = RecoveryInProgress(); TransactionId *other_xids = ProcGlobal->xids; - /* inferred after ProcArrayLock is released */ - h->catalog_oldest_nonremovable = InvalidTransactionId; - LWLockAcquire(ProcArrayLock, LW_SHARED); h->latest_completed = ShmemVariableCache->latestCompletedXid; @@ -1682,6 +1684,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) h->oldest_considered_running = initial; h->shared_oldest_nonremovable = initial; + h->catalog_oldest_nonremovable = initial; h->data_oldest_nonremovable = initial; /* @@ -1752,7 +1755,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) if (statusFlags & (PROC_IN_VACUUM | PROC_IN_LOGICAL_DECODING)) continue; - /* shared tables need to take backends in all database into account */ + /* shared tables need to take backends in all databases into account */ h->shared_oldest_nonremovable = TransactionIdOlder(h->shared_oldest_nonremovable, xmin); @@ -1768,16 +1771,29 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) * to prune still needed data away). If the current backend never * connects to a database that is harmless, because * data_oldest_nonremovable will never be utilized. + * + * Additionally, processes doing CREATE INDEX CONCURRENTLY and REINDEX + * CONCURRENTLY on "safe" indexes can be ignored for non-catalog + * horizon. (But not for catalogs: some transactions in CIC/RC do + * catalog updates.) */ if (in_recovery || MyDatabaseId == InvalidOid || proc->databaseId == MyDatabaseId || proc->databaseId == 0) /* always include WalSender */ { - h->data_oldest_nonremovable = -TransactionIdOlder(h->data_oldest_nonremovable, xmin); + if (statusFlags & PROC_IN_SAFE_IC) +h->catalog_oldest_nonremovable = + TransactionIdOlder(h->catalog_oldest_nonremovable, xmin); + else +h->data_oldest_nonremovable = h->catalog_oldest_nonremovable = + TransactionIdOlder(h->data_oldest_nonremovable, xmin); } } + /* catalog horizon should never be later than data */ + Assert(TransactionIdPrecedesOrEquals(h->catalog_oldest_nonremovable, + h->data_oldest_nonremovable)); + /* * If in recovery fetch oldest xid in KnownAssignedXids, will be applied * after lock is released. @@ -1799,6 +1815,8 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) TransactionIdOlder(h->shared_oldest_nonremovable, kaxmin); h->data_oldest_nonremovable = TransactionIdOlder(h->data_oldest_nonremovable, kaxmin); + h->catalog_oldest_nonremovable = + TransactionIdOlder(h->catalog_oldest_nonremovable, kaxmin); /* temp relations cannot be accessed in recovery */ } else @@ -1825,6 +1843,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) h->data_oldest_nonremovable = TransactionIdRetreatedBy(h->data_oldest_nonremovable, vacuum_defer_cleanup_age); + h->catalog_oldest_nonremovable = + TransactionIdRetreatedBy(h->catalog_oldest_nonremovable, + vacuum_defer_cleanup_age); /* defer doesn't apply to temp relations */ } @@ -1847,7 +1868,6 @@
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On 2021-Jan-12, Álvaro Herrera wrote: > On 2021-Jan-12, Álvaro Herrera wrote: > > > > For the 0001 patch, since ReindexIndexInfo is used only within > > > ReindexRelationConcurrently() I think it’s a function-local structure > > > type. So we can declare it within the function. What do you think? > > > > That's a good idea. Pushed with that change, thanks. > > Here's the other patch, with comments fixed per reviews, and rebased to > account for the scope change. Pushed. At the last minute I decided to back off on reverting the flag set that DefineIndex has, because there was no point in doing that. I also updated the comment in (two places of) ReindexRelationConcurrently about why we don't do it there. The previous submission had this: > + /* > + * This transaction doesn't need to set the PROC_IN_SAFE_IC flag, > because > + * it only acquires an Xid to do some catalog manipulations, after the > + * wait is over. > + */ but this fails to point out that the main reason not to do it, is to avoid having to decide whether to do it or not when some indexes are safe and others aren't. So I changed to: + /* +* While we could set PROC_IN_SAFE_IC if all indexes qualified, there's no +* real need for that, because we only acquire an Xid after the wait is +* done, and that lasts for a very short period. +*/ Thanks! -- Álvaro Herrera Valdivia, Chile "The eagle never lost so much time, as when he submitted to learn of the crow." (William Blake)
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On 2021-Jan-12, Álvaro Herrera wrote: > > For the 0001 patch, since ReindexIndexInfo is used only within > > ReindexRelationConcurrently() I think it’s a function-local structure > > type. So we can declare it within the function. What do you think? > > That's a good idea. Pushed with that change, thanks. Here's the other patch, with comments fixed per reviews, and rebased to account for the scope change. -- Álvaro Herrera Valdivia, Chile Tulio: oh, para qué servirá este boton, Juan Carlos? Policarpo: No, aléjense, no toquen la consola! Juan Carlos: Lo apretaré una y otra vez. >From f2476636a358b77aca4b7adb36ca8e383e78af9b Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 30 Nov 2020 16:01:46 -0300 Subject: [PATCH v2] set PROC_IN_SAFE_IC during REINDEX CONCURRENTLY --- src/backend/commands/indexcmds.c | 56 +--- src/include/storage/proc.h | 1 + 2 files changed, 53 insertions(+), 4 deletions(-) diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 8c9c39a467..35c3d20eae 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -385,7 +385,7 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts) * lazy VACUUMs, because they won't be fazed by missing index entries * either. (Manual ANALYZEs, however, can't be excluded because they * might be within transactions that are going to do arbitrary operations - * later.) Processes running CREATE INDEX CONCURRENTLY + * later.) Processes running CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY * on indexes that are neither expressional nor partial are also safe to * ignore, since we know that those processes won't examine any data * outside the table they're indexing. @@ -1566,9 +1566,11 @@ DefineIndex(Oid relationId, CommitTransactionCommand(); StartTransactionCommand(); - /* Tell concurrent index builds to ignore us, if index qualifies */ - if (safe_index) - set_indexsafe_procflags(); + /* + * This transaction doesn't need to set the PROC_IN_SAFE_IC flag, because + * it only acquires an Xid to do some catalog manipulations, after the + * wait is over. + */ /* We should now definitely not be advertising any xmin. */ Assert(MyProc->xmin == InvalidTransactionId); @@ -3066,6 +3068,7 @@ ReindexRelationConcurrently(Oid relationOid, int options) Oid indexId; Oid tableId; Oid amId; + bool safe; /* for set_indexsafe_procflags */ } ReindexIndexInfo; List *heapRelationIds = NIL; List *indexIds = NIL; @@ -3377,6 +3380,9 @@ ReindexRelationConcurrently(Oid relationOid, int options) heapRel = table_open(indexRel->rd_index->indrelid, ShareUpdateExclusiveLock); + /* determine safety of this index for set_indexsafe_procflags */ + idx->safe = (indexRel->rd_indexprs == NIL && + indexRel->rd_indpred == NIL); idx->tableId = RelationGetRelid(heapRel); idx->amId = indexRel->rd_rel->relam; @@ -3418,6 +3424,7 @@ ReindexRelationConcurrently(Oid relationOid, int options) newidx = palloc(sizeof(ReindexIndexInfo)); newidx->indexId = newIndexId; + newidx->safe = idx->safe; newidx->tableId = idx->tableId; newidx->amId = idx->amId; @@ -3485,6 +3492,11 @@ ReindexRelationConcurrently(Oid relationOid, int options) CommitTransactionCommand(); StartTransactionCommand(); + /* + * Because we don't take a snapshot in this transaction, there's no need + * to set the PROC_IN_SAFE_IC flag here. + */ + /* * Phase 2 of REINDEX CONCURRENTLY * @@ -3514,6 +3526,10 @@ ReindexRelationConcurrently(Oid relationOid, int options) */ CHECK_FOR_INTERRUPTS(); + /* Tell concurrent indexing to ignore us, if index qualifies */ + if (newidx->safe) + set_indexsafe_procflags(); + /* Set ActiveSnapshot since functions in the indexes may need it */ PushActiveSnapshot(GetTransactionSnapshot()); @@ -3534,8 +3550,14 @@ ReindexRelationConcurrently(Oid relationOid, int options) PopActiveSnapshot(); CommitTransactionCommand(); } + StartTransactionCommand(); + /* + * Because we don't take a snapshot in this transaction, there's no need + * to set the PROC_IN_SAFE_IC flag here. + */ + /* * Phase 3 of REINDEX CONCURRENTLY * @@ -3564,6 +3586,10 @@ ReindexRelationConcurrently(Oid relationOid, int options) */ CHECK_FOR_INTERRUPTS(); + /* Tell concurrent indexing to ignore us, if index qualifies */ + if (newidx->safe) + set_indexsafe_procflags(); + /* * Take the "reference snapshot" that will be used by validate_index() * to filter candidate tuples. @@ -3607,6 +3633,9 @@ ReindexRelationConcurrently(Oid relationOid, int options) * interesting tuples. But since it might not contain tuples deleted * just before the reference snap was taken, we have to wait out any * transactions that might have older snapshots. + * + * Because we don't take a snapshot in this transaction, there's no + * need to set the
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On 2021-Jan-04, Masahiko Sawada wrote: > On Tue, Dec 8, 2020 at 5:18 PM Hamid Akhtar wrote: > > > > The following review has been posted through the commitfest application: > > make installcheck-world: tested, passed > > Implements feature: not tested > > Spec compliant: not tested > > Documentation:not tested > > > > The patch looks good to me. With regards to the code comments, I had pretty > > similar concerns as already raised by Dmitry; and those have already been > > answered by you. So this patch is good to go from my side once you have > > updated the comments per your last email. > > For the 0001 patch, since ReindexIndexInfo is used only within > ReindexRelationConcurrently() I think it’s a function-local structure > type. So we can declare it within the function. What do you think? That's a good idea. Pushed with that change, thanks. Thanks also to Dmitry and Hamid for the reviews. -- Álvaro Herrera39°49'30"S 73°17'W
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On Tue, Dec 8, 2020 at 5:18 PM Hamid Akhtar wrote: > > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: not tested > Spec compliant: not tested > Documentation:not tested > > The patch looks good to me. With regards to the code comments, I had pretty > similar concerns as already raised by Dmitry; and those have already been > answered by you. So this patch is good to go from my side once you have > updated the comments per your last email. For the 0001 patch, since ReindexIndexInfo is used only within ReindexRelationConcurrently() I think it’s a function-local structure type. So we can declare it within the function. What do you think? Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: not tested Spec compliant: not tested Documentation:not tested The patch looks good to me. With regards to the code comments, I had pretty similar concerns as already raised by Dmitry; and those have already been answered by you. So this patch is good to go from my side once you have updated the comments per your last email.
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On 2020-Dec-04, Dmitry Dolgov wrote: > * This one is mostly for me to understand. There are couple of places > with a commentary that 'PROC_IN_SAFE_IC is not necessary, because the > transaction only takes a snapshot to do some catalog manipulation'. > But for some of them I don't immediately see in the relevant code > anything related to snapshots. E.g. one in DefineIndex is followed by > WaitForOlderSnapshots (which seems to only do waiting, not taking a > snapshot), index_set_state_flags and CacheInvalidateRelcacheByRelid. > Is taking a snapshot hidden somewhere there inside? Well, they're actually going to acquire an Xid, not a snapshot, so the comment is slightly incorrect; I'll fix it, thanks for pointing that out. The point stands: because those transactions are of very short duration (at least of very short duration after the point where the XID is obtained), it's not necessary to set the PROC_IN_SAFE_IC flag, since it won't cause any disruption to other processes. It is possible that I copied the wrong comment in DefineIndex. (I only noticed that one after I had mulled over the ones in ReindexRelationConcurrently, each of which is skipping setting the flag for slightly different reasons.)
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
> On Mon, Nov 30, 2020 at 04:54:39PM -0300, Alvaro Herrera wrote: > > In a previous thread [1], we added smarts so that processes running > CREATE INDEX CONCURRENTLY would not wait for each other. > > One is adding the same to REINDEX CONCURRENTLY. I've attached patch > 0002 here which does that. > > Why 0002, you ask? That's because preparatory patch 0001 simplifies the > ReindexRelationConcurrently somewhat by adding a struct to be used of > indexes that are going to be processed, instead of just a list of Oids. > This is a good change in itself because it let us get rid of duplicative > open/close of the index rels in order to obtain some info that's already > known at the start. Thanks! The patch looks pretty good to me, after reading it I only have a few minor comments/questions: * ReindexIndexInfo sounds a bit weird for me because of the repeating part, although I see there is already a similar ReindexIndexCallbackState so probably it's fine. * This one is mostly for me to understand. There are couple of places with a commentary that 'PROC_IN_SAFE_IC is not necessary, because the transaction only takes a snapshot to do some catalog manipulation'. But for some of them I don't immediately see in the relevant code anything related to snapshots. E.g. one in DefineIndex is followed by WaitForOlderSnapshots (which seems to only do waiting, not taking a snapshot), index_set_state_flags and CacheInvalidateRelcacheByRelid. Is taking a snapshot hidden somewhere there inside?