Re: [DOC] Document concurrent index builds waiting on each other
James Coleman writes: > On Wed, Jan 13, 2021 at 5:00 PM Tom Lane wrote: >> [ raised eyebrow ] Surely REINDEX and VACUUM can't run on the same >> table at the same time. > + Like any long-running transaction, CREATE INDEX on a > + table can affect which tuples can be removed by concurrent > + VACUUM on any other table. > The "on a table" is the table on which the REINDEX/CREATE INDEX is > occurring. The "any other table" is where VACUUM might run. I still think it'd be just as clear without the auxiliary clauses, say + Like any long-running transaction, CREATE INDEX + can affect which tuples can be removed by concurrent + VACUUM operations. regards, tom lane
Re: [DOC] Document concurrent index builds waiting on each other
On Wed, Jan 13, 2021 at 5:00 PM Tom Lane wrote: > > James Coleman writes: > > On Wed, Jan 13, 2021 at 4:29 PM Tom Lane wrote: > >> but the antecedent of "a table" is a bit unclear; people might > >> wonder if it means the table being reindexed. > > > It does mean the table being reindexed; the last phrase says "any > > table" meaning "any other table". > > [ raised eyebrow ] Surely REINDEX and VACUUM can't run on the same > table at the same time. + Like any long-running transaction, CREATE INDEX on a + table can affect which tuples can be removed by concurrent + VACUUM on any other table. The "on a table" is the table on which the REINDEX/CREATE INDEX is occurring. The "any other table" is where VACUUM might run. James
Re: [DOC] Document concurrent index builds waiting on each other
James Coleman writes: > On Wed, Jan 13, 2021 at 4:29 PM Tom Lane wrote: >> but the antecedent of "a table" is a bit unclear; people might >> wonder if it means the table being reindexed. > It does mean the table being reindexed; the last phrase says "any > table" meaning "any other table". [ raised eyebrow ] Surely REINDEX and VACUUM can't run on the same table at the same time. regards, tom lane
Re: [DOC] Document concurrent index builds waiting on each other
On Wed, Jan 13, 2021 at 4:29 PM Tom Lane wrote: > > Alvaro Herrera writes: > > On 2021-Jan-13, James Coleman wrote: > This is true. So I propose > Like any long-running transaction, REINDEX can > affect which tuples can be removed by concurrent > VACUUM > on any table. > > >> Looks like what got committed is "REINDEX on a table" not "on any", > >> but I'm not sure that matters too much. > > > Ouch. The difference seems slight enough that it doesn't matter; is it > > ungrammatical? > > I'd personally have written "on other tables" or "on another table", > or left out that clause altogether and just said "concurrent > VACUUM". I'm not sure it's ungrammatical exactly, > but the antecedent of "a table" is a bit unclear; people might > wonder if it means the table being reindexed. It does mean the table being reindexed; the last phrase says "any table" meaning "any other table". James
Re: [DOC] Document concurrent index builds waiting on each other
Alvaro Herrera writes: > On 2021-Jan-13, James Coleman wrote: This is true. So I propose Like any long-running transaction, REINDEX can affect which tuples can be removed by concurrent VACUUM on any table. >> Looks like what got committed is "REINDEX on a table" not "on any", >> but I'm not sure that matters too much. > Ouch. The difference seems slight enough that it doesn't matter; is it > ungrammatical? I'd personally have written "on other tables" or "on another table", or left out that clause altogether and just said "concurrent VACUUM". I'm not sure it's ungrammatical exactly, but the antecedent of "a table" is a bit unclear; people might wonder if it means the table being reindexed. regards, tom lane
Re: [DOC] Document concurrent index builds waiting on each other
On 2021-Jan-13, James Coleman wrote: > On Wed, Jan 13, 2021 at 4:05 PM Alvaro Herrera > wrote: > > > > On 2021-Jan-13, James Coleman wrote: > > > > > On Wed, Jan 13, 2021 at 12:33 PM Alvaro Herrera > > > wrote: > > > > > > This is true. So I propose > > > > > > > > Like any long-running transaction, REINDEX can > > > > affect which tuples can be removed by concurrent > > > > VACUUM > > > > on any table. > > > > > > That sounds good to me. > > > > Great, pushed with one more wording tweak: "REINDEX on any table can > > affect ... on any other table". To pg12 and up. > > Looks like what got committed is "REINDEX on a table" not "on any", > but I'm not sure that matters too much. Ouch. The difference seems slight enough that it doesn't matter; is it ungrammatical? Either way I'm gonna close this CF entry now, finally. Thank you for your patience! -- Álvaro Herrera Valdivia, Chile "I can't go to a restaurant and order food because I keep looking at the fonts on the menu. Five minutes later I realize that it's also talking about food" (Donald Knuth)
Re: [DOC] Document concurrent index builds waiting on each other
On 2021-Jan-13, Alvaro Herrera wrote: > I wondered about noting whether only processes in the current database > are affected, but then I noticed that the current code since commit > dc7420c2c927 uses a completely different algorithm than what we had with > GetOldestXmin() and does not consider database boundaries at all. > This doesn't sound great to me, since a misbehaved database can now > affect others ... Maybe I misunderstand that code. This appears to be false, per ComputeXidHorizons. -- Álvaro Herrera39°49'30"S 73°17'W "Ni aún el genio muy grande llegaría muy lejos si tuviera que sacarlo todo de su propio interior" (Goethe)
Re: [DOC] Document concurrent index builds waiting on each other
On Wed, Jan 13, 2021 at 4:05 PM Alvaro Herrera wrote: > > On 2021-Jan-13, James Coleman wrote: > > > On Wed, Jan 13, 2021 at 12:33 PM Alvaro Herrera > > wrote: > > > > This is true. So I propose > > > > > > Like any long-running transaction, REINDEX can > > > affect which tuples can be removed by concurrent > > > VACUUM > > > on any table. > > > > That sounds good to me. > > Great, pushed with one more wording tweak: "REINDEX on any table can > affect ... on any other table". To pg12 and up. Looks like what got committed is "REINDEX on a table" not "on any", but I'm not sure that matters too much. James
Re: [DOC] Document concurrent index builds waiting on each other
On 2021-Jan-13, James Coleman wrote: > On Wed, Jan 13, 2021 at 12:33 PM Alvaro Herrera > wrote: > > This is true. So I propose > > > > Like any long-running transaction, REINDEX can > > affect which tuples can be removed by concurrent > > VACUUM > > on any table. > > That sounds good to me. Great, pushed with one more wording tweak: "REINDEX on any table can affect ... on any other table". To pg12 and up. I wondered about noting whether only processes in the current database are affected, but then I noticed that the current code since commit dc7420c2c927 uses a completely different algorithm than what we had with GetOldestXmin() and does not consider database boundaries at all. This doesn't sound great to me, since a misbehaved database can now affect others ... Maybe I misunderstand that code. -- Álvaro Herrera39°49'30"S 73°17'W "This is what I like so much about PostgreSQL. Most of the surprises are of the "oh wow! That's cool" Not the "oh shit!" kind. :)" Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php
Re: [DOC] Document concurrent index builds waiting on each other
On Wed, Jan 13, 2021 at 12:33 PM Alvaro Herrera wrote: > > On 2021-Jan-13, James Coleman wrote: > > > On Wed, Jan 13, 2021 at 12:58 AM Michael Paquier > > wrote: > > > > > > On Tue, Jan 12, 2021 at 04:51:39PM -0300, Alvaro Herrera wrote: > > > > I looked into this again, and I didn't like what I had added to > > > > maintenance.sgml at all. It seems out of place where I put it; and I > > > > couldn't find any great spots. Going back to your original proposal, > > > > what about something like this? It's just one more para in the "notes" > > > > section in CREATE INDEX and REINDEX pages, without any additions to the > > > > VACUUM pages. > > > > > > +1. > > > > I think one more para in the notes is good. But shouldn't we still > > clarify the issue is specific to CONCURRENTLY? > > How is it specific to concurrent builds? What we're documenting here is > the behavior of vacuum, and that one is identical in both regular builds > and concurrent builds (since vacuum has to avoid removing rows from > under either of them). The only reason concurrent builds are > interesting is because they take longer. > > What was specific to concurrent builds was the fact that you can't have > more than one at a time, and that one is what was added in 58ebe967f. Ah, right. I've mixed those up at least once on this thread already. > > Also that it's not just the table being indexed seems fairly significant. > > This is true. So I propose > > Like any long-running transaction, REINDEX can > affect which tuples can be removed by concurrent VACUUM > on any table. That sounds good to me. James
Re: [DOC] Document concurrent index builds waiting on each other
On 2021-Jan-13, James Coleman wrote: > On Wed, Jan 13, 2021 at 12:58 AM Michael Paquier wrote: > > > > On Tue, Jan 12, 2021 at 04:51:39PM -0300, Alvaro Herrera wrote: > > > I looked into this again, and I didn't like what I had added to > > > maintenance.sgml at all. It seems out of place where I put it; and I > > > couldn't find any great spots. Going back to your original proposal, > > > what about something like this? It's just one more para in the "notes" > > > section in CREATE INDEX and REINDEX pages, without any additions to the > > > VACUUM pages. > > > > +1. > > I think one more para in the notes is good. But shouldn't we still > clarify the issue is specific to CONCURRENTLY? How is it specific to concurrent builds? What we're documenting here is the behavior of vacuum, and that one is identical in both regular builds and concurrent builds (since vacuum has to avoid removing rows from under either of them). The only reason concurrent builds are interesting is because they take longer. What was specific to concurrent builds was the fact that you can't have more than one at a time, and that one is what was added in 58ebe967f. > Also that it's not just the table being indexed seems fairly significant. This is true. So I propose Like any long-running transaction, REINDEX can affect which tuples can be removed by concurrent VACUUM on any table. -- Álvaro Herrera39°49'30"S 73°17'W
Re: [DOC] Document concurrent index builds waiting on each other
On Wed, Jan 13, 2021 at 12:58 AM Michael Paquier wrote: > > On Tue, Jan 12, 2021 at 04:51:39PM -0300, Alvaro Herrera wrote: > > I looked into this again, and I didn't like what I had added to > > maintenance.sgml at all. It seems out of place where I put it; and I > > couldn't find any great spots. Going back to your original proposal, > > what about something like this? It's just one more para in the "notes" > > section in CREATE INDEX and REINDEX pages, without any additions to the > > VACUUM pages. > > +1. I think one more para in the notes is good. But shouldn't we still clarify the issue is specific to CONCURRENTLY? Also that it's not just the table being indexed seems fairly significant. How about something like: --- Like any long-running transaction, REINDEX CONCURRENTLY can affect which tuples can be removed by concurrent VACUUM on any table. --- James
Re: [DOC] Document concurrent index builds waiting on each other
On Tue, Jan 12, 2021 at 04:51:39PM -0300, Alvaro Herrera wrote: > I looked into this again, and I didn't like what I had added to > maintenance.sgml at all. It seems out of place where I put it; and I > couldn't find any great spots. Going back to your original proposal, > what about something like this? It's just one more para in the "notes" > section in CREATE INDEX and REINDEX pages, without any additions to the > VACUUM pages. +1. -- Michael signature.asc Description: PGP signature
Re: [DOC] Document concurrent index builds waiting on each other
On 2020-Dec-01, James Coleman wrote: > On Tue, Dec 1, 2020 at 6:51 PM Alvaro Herrera wrote: > > Makes sense. ISTM that if we want to have a cautionary blurb CIC docs, > > it should go in REINDEX CONCURRENTLY as well. > > Agreed. Or, alternatively, a blurb something like "Please note how CIC > interacts with VACUUM ...", and then the primary language in > maintenance.sgml. That would have the benefit of maintaining the core > language in only one place. I looked into this again, and I didn't like what I had added to maintenance.sgml at all. It seems out of place where I put it; and I couldn't find any great spots. Going back to your original proposal, what about something like this? It's just one more para in the "notes" section in CREATE INDEX and REINDEX pages, without any additions to the VACUUM pages. -- Álvaro Herrera39°49'30"S 73°17'W >From 8b0aa591f5d3f34ccdc2cd0fc6031dd421f229c2 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 30 Nov 2020 18:50:06 -0300 Subject: [PATCH v6] Highlight vacuum consideration in create index docs Per James Coleman --- doc/src/sgml/ref/create_index.sgml | 5 + doc/src/sgml/ref/reindex.sgml | 5 + 2 files changed, 10 insertions(+) diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 2054d5d943..d951f14b09 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -829,6 +829,11 @@ Indexes: to remove an index. + + Like any long-running transaction, CREATE INDEX can + affect which tuples can be removed by concurrent VACUUM. + + Prior releases of PostgreSQL also had an R-tree index method. This method has been removed because diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml index 6e1cf06713..ef553f6481 100644 --- a/doc/src/sgml/ref/reindex.sgml +++ b/doc/src/sgml/ref/reindex.sgml @@ -436,6 +436,11 @@ Indexes: CONCURRENTLY cannot. + +Like any long-running transaction, REINDEX can +affect which tuples can be removed by concurrent VACUUM. + + REINDEX SYSTEM does not support CONCURRENTLY since system catalogs cannot be reindexed -- 2.20.1
Re: [DOC] Document concurrent index builds waiting on each other
On Tue, Dec 1, 2020 at 8:05 PM James Coleman wrote: > > On Tue, Dec 1, 2020 at 6:51 PM Alvaro Herrera wrote: > > > > On 2020-Nov-30, James Coleman wrote: > > > > > On Mon, Nov 30, 2020 at 4:53 PM Alvaro Herrera > > > wrote: > > > > > > > > On 2020-Sep-30, Michael Paquier wrote: > > > > > > Yeah, I think it might be more sensible to document this in > > > > maintenance.sgml, as part of the paragraph that discusses removing > > > > tuples "to save space". But making it inline with the rest of the flow, > > > > it seems to distract from higher-level considerations, so I suggest to > > > > make it a footnote instead. > > > > > > I have mixed feelings about wholesale moving it; users aren't likely > > > to read the vacuum doc when considering how running CIC might impact > > > their system, though I do understand why it otherwise fits there. > > > > Makes sense. ISTM that if we want to have a cautionary blurb CIC docs, > > it should go in REINDEX CONCURRENTLY as well. > > Agreed. Or, alternatively, a blurb something like "Please note how CIC > interacts with VACUUM ...", and then the primary language in > maintenance.sgml. That would have the benefit of maintaining the core > language in only one place. Any thoughts on this? James
Re: [DOC] Document concurrent index builds waiting on each other
On Tue, Dec 1, 2020 at 6:51 PM Alvaro Herrera wrote: > > On 2020-Nov-30, James Coleman wrote: > > > On Mon, Nov 30, 2020 at 4:53 PM Alvaro Herrera > > wrote: > > > > > > On 2020-Sep-30, Michael Paquier wrote: > > > > Yeah, I think it might be more sensible to document this in > > > maintenance.sgml, as part of the paragraph that discusses removing > > > tuples "to save space". But making it inline with the rest of the flow, > > > it seems to distract from higher-level considerations, so I suggest to > > > make it a footnote instead. > > > > I have mixed feelings about wholesale moving it; users aren't likely > > to read the vacuum doc when considering how running CIC might impact > > their system, though I do understand why it otherwise fits there. > > Makes sense. ISTM that if we want to have a cautionary blurb CIC docs, > it should go in REINDEX CONCURRENTLY as well. Agreed. Or, alternatively, a blurb something like "Please note how CIC interacts with VACUUM ...", and then the primary language in maintenance.sgml. That would have the benefit of maintaining the core language in only one place. > > > I'm not sure on the wording to use; what about this? > > > > The wording seems fine to me. > > Great, thanks. > > > This is a replacement for what was 0002 earlier? And 0001 from earlier > > still seems to be a useful standalone patch? > > 0001 is the one that I got pushed yesterday, I think -- correct? > src/tools/git_changelog says: > > Author: Alvaro Herrera > Branch: master [58ebe967f] 2020-11-30 18:24:55 -0300 > Branch: REL_13_STABLE [3fe0e7c3f] 2020-11-30 18:24:55 -0300 > Branch: REL_12_STABLE [b2603f16a] 2020-11-30 18:24:55 -0300 > Branch: REL_11_STABLE [ed9c9b033] 2020-11-30 18:24:55 -0300 > Branch: REL_10_STABLE [d3bd36a63] 2020-11-30 18:24:55 -0300 > Branch: REL9_6_STABLE [b3d33bf59] 2020-11-30 18:24:55 -0300 > Branch: REL9_5_STABLE [968a537b4] 2020-11-30 18:24:55 -0300 > > Document concurrent indexes waiting on each other > > Because regular CREATE INDEX commands are independent, and there's no > logical data dependency, it's not immediately obvious that transactions > held by concurrent index builds on one table will block the second phase > of concurrent index creation on an unrelated table, so document this > caveat. > > Backpatch this all the way back. In branch master, mention that only > some indexes are involved. > > Author: James Coleman > Reviewed-by: David Johnston > Discussion: > https://postgr.es/m/CAAaqYe994=purn8cjz4ueo_s-ffrr_3ogeryhtdghab2wg_...@mail.gmail.com Ah, yes, somehow I'd missed that that had been pushed. James
Re: [DOC] Document concurrent index builds waiting on each other
On 2020-Nov-30, James Coleman wrote: > On Mon, Nov 30, 2020 at 4:53 PM Alvaro Herrera > wrote: > > > > On 2020-Sep-30, Michael Paquier wrote: > > Yeah, I think it might be more sensible to document this in > > maintenance.sgml, as part of the paragraph that discusses removing > > tuples "to save space". But making it inline with the rest of the flow, > > it seems to distract from higher-level considerations, so I suggest to > > make it a footnote instead. > > I have mixed feelings about wholesale moving it; users aren't likely > to read the vacuum doc when considering how running CIC might impact > their system, though I do understand why it otherwise fits there. Makes sense. ISTM that if we want to have a cautionary blurb CIC docs, it should go in REINDEX CONCURRENTLY as well. > > I'm not sure on the wording to use; what about this? > > The wording seems fine to me. Great, thanks. > This is a replacement for what was 0002 earlier? And 0001 from earlier > still seems to be a useful standalone patch? 0001 is the one that I got pushed yesterday, I think -- correct? src/tools/git_changelog says: Author: Alvaro Herrera Branch: master [58ebe967f] 2020-11-30 18:24:55 -0300 Branch: REL_13_STABLE [3fe0e7c3f] 2020-11-30 18:24:55 -0300 Branch: REL_12_STABLE [b2603f16a] 2020-11-30 18:24:55 -0300 Branch: REL_11_STABLE [ed9c9b033] 2020-11-30 18:24:55 -0300 Branch: REL_10_STABLE [d3bd36a63] 2020-11-30 18:24:55 -0300 Branch: REL9_6_STABLE [b3d33bf59] 2020-11-30 18:24:55 -0300 Branch: REL9_5_STABLE [968a537b4] 2020-11-30 18:24:55 -0300 Document concurrent indexes waiting on each other Because regular CREATE INDEX commands are independent, and there's no logical data dependency, it's not immediately obvious that transactions held by concurrent index builds on one table will block the second phase of concurrent index creation on an unrelated table, so document this caveat. Backpatch this all the way back. In branch master, mention that only some indexes are involved. Author: James Coleman Reviewed-by: David Johnston Discussion: https://postgr.es/m/CAAaqYe994=purn8cjz4ueo_s-ffrr_3ogeryhtdghab2wg_...@mail.gmail.com
Re: [DOC] Document concurrent index builds waiting on each other
On Mon, Nov 30, 2020 at 4:53 PM Alvaro Herrera wrote: > > On 2020-Sep-30, Michael Paquier wrote: > > > + > > + CREATE INDEX (including the > > CONCURRENTLY > > + option) commands are included when VACUUM calculates > > what > > + dead tuples are safe to remove even on tables other than the one being > > indexed. > > + > > FWIW, this is true as well for REINDEX CONCURRENTLY because both use > > the same code paths for index builds and validation, with basically > > the same waiting phases. But is CREATE INDEX the correct place for > > that? Wouldn't it be better to tell about such things on the VACUUM > > doc? > > Yeah, I think it might be more sensible to document this in > maintenance.sgml, as part of the paragraph that discusses removing > tuples "to save space". But making it inline with the rest of the flow, > it seems to distract from higher-level considerations, so I suggest to > make it a footnote instead. I have mixed feelings about wholesale moving it; users aren't likely to read the vacuum doc when considering how running CIC might impact their system, though I do understand why it otherwise fits there. Even if the primary details are in the vacuum, I tend to think a reference note (or link to the vacuum docs) in the create index docs would be useful. The principle here is that 1.) vacuum is automatic/part of the background of the system, not just something people trigger manually, and 2.) we ought to document things where the user action triggering the behavior is documented. > I'm not sure on the wording to use; what about this? The wording seems fine to me. This is a replacement for what was 0002 earlier? And 0001 from earlier still seems to be a useful standalone patch? James
Re: [DOC] Document concurrent index builds waiting on each other
On 2020-Sep-30, Michael Paquier wrote: > + > + CREATE INDEX (including the > CONCURRENTLY > + option) commands are included when VACUUM calculates > what > + dead tuples are safe to remove even on tables other than the one being > indexed. > + > FWIW, this is true as well for REINDEX CONCURRENTLY because both use > the same code paths for index builds and validation, with basically > the same waiting phases. But is CREATE INDEX the correct place for > that? Wouldn't it be better to tell about such things on the VACUUM > doc? Yeah, I think it might be more sensible to document this in maintenance.sgml, as part of the paragraph that discusses removing tuples "to save space". But making it inline with the rest of the flow, it seems to distract from higher-level considerations, so I suggest to make it a footnote instead. I'm not sure on the wording to use; what about this? >From 6c9ad72e4e61dbf05f34146cb67706dd675a38f0 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 30 Nov 2020 18:50:06 -0300 Subject: [PATCH v5] Note CIC and RC in vacuum's doc Per James Coleman --- doc/src/sgml/maintenance.sgml | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml index 4d8ad754f8..d68d7f936e 100644 --- a/doc/src/sgml/maintenance.sgml +++ b/doc/src/sgml/maintenance.sgml @@ -159,7 +159,17 @@ concurrency control (MVCC, see ): the row version must not be deleted while it is still potentially visible to other transactions. But eventually, an outdated or deleted row version is no -longer of interest to any transaction. The space it occupies must then be +longer of interest to any transaction. + + + Note that VACUUM needs to retain tuples that are + nominally visible to transactions running + CREATE INDEX CONCURRENTLY or + REINDEX CONCURRENTLY, even when run on tables + other than the tables being indexed. + + +The space it occupies must then be reclaimed for reuse by new rows, to avoid unbounded growth of disk space requirements. This is done by running VACUUM. -- 2.20.1
Re: [DOC] Document concurrent index builds waiting on each other
On 2020-Nov-30, Anastasia Lubennikova wrote: > The commitfest is nearing the end and I wonder what is this discussion > waiting for. > It looks like the proposed patch received its fair share of review, so > I mark it as ReadyForCommitter and lay responsibility for the final > decision on them. I'll get these pushed now, thanks for the reminder.
Re: [DOC] Document concurrent index builds waiting on each other
Status update for a commitfest entry. The commitfest is nearing the end and I wonder what is this discussion waiting for. It looks like the proposed patch received its fair share of review, so I mark it as ReadyForCommitter and lay responsibility for the final decision on them. The new status of this patch is: Ready for Committer
Re: [DOC] Document concurrent index builds waiting on each other
On Wed, Oct 21, 2020 at 3:25 PM David G. Johnston < david.g.johns...@gmail.com> wrote: > > v3-0002 needs a rebase over the create_index.sgml page due to the change > of the nearby xref to link. Attached as v4-0002 along with the original > v3-0001. > > attached... Reading the commit message on 0002 - vacuum isn't a transaction-taking command so it wouldn't interfere with itself, create index does use transactions and thus it's not surprising that it interferes with vacuum - which looks at transactions, not commands (as most of the internals would I'd presume). David J. v3-0001-Document-concurrent-indexes-waiting-on-each-other.patch Description: Binary data v4-0002-Document-vacuum-on-one-table-depending-on-concurr.patch Description: Binary data
Re: [DOC] Document concurrent index builds waiting on each other
On Wed, Sep 30, 2020 at 2:10 AM Michael Paquier wrote: > On Tue, Sep 08, 2020 at 01:25:21PM -0400, James Coleman wrote: > > Álvaro's patch confused the current state of this thread, so I'm > > reattaching (rebased) v2 as v3. > > + > + CREATE INDEX (including the > CONCURRENTLY > + option) commands are included when VACUUM > calculates what > + dead tuples are safe to remove even on tables other than the one being > indexed. > + > FWIW, this is true as well for REINDEX CONCURRENTLY because both use > the same code paths for index builds and validation, with basically > the same waiting phases. But is CREATE INDEX the correct place for > that? Wouldn't it be better to tell about such things on the VACUUM > doc? > > 0001 sounds fine to me. > > v3-0002 needs a rebase over the create_index.sgml page due to the change of the nearby xref to link. Attached as v4-0002 along with the original v3-0001. I resisted the temptation to commit my word-smithing thoughts to the affected paragraph. The word "phase" appearing out of nowhere struck me a bit oddly. "Then finally the" feels like it is missing a couple of commas - or just drop the finally. "then two table scans occur in separate transactions" reads better than "two more transactions" IMO. For 0002 maybe focus on the fact that CREATE INDEX is a global concern even though it only names a single table in any one invocation. As a consequence, while it is running, vacuum cannot bring the system's oldest xid more current than the oldest xid on any index-in-progress table (I don't know exactly how this works). And, rehasing 0001, all concurrent indexing will finish at the same time. In short maybe focus less on procedure and specific waiting states and more on the user-visible consequences. 0001 didn't really clear things up much in that regard. It reads like we are introducing a deadlock situation even though that evidently is not the case. I concur that vacuum's perspective on the create index global reach needs to be addressed there if it is not already. I'm a bit confused as to why/whether create index transactions are somehow special in this regard, compared to other transactions. I infer from the existence of 0002 that they somehow are... My conclusion thus far is that with respect to the original complaint: On 2019-09-18 13:51:00 -0400, James Coleman wrote: > In my experience it's not immediately obvious (even after reading the > documentation) the implications of how concurrent index builds manage > transactions with respect to multiple concurrent index builds in > flight at the same time. These two limited scope patches have not materially moved the needle in understanding. They are too technical when the underlying issue is comprehension by non-technical people in terms of how they see their system behave. David J.
Re: [DOC] Document concurrent index builds waiting on each other
On Tue, Sep 08, 2020 at 01:25:21PM -0400, James Coleman wrote: > Álvaro's patch confused the current state of this thread, so I'm > reattaching (rebased) v2 as v3. + + CREATE INDEX (including the CONCURRENTLY + option) commands are included when VACUUM calculates what + dead tuples are safe to remove even on tables other than the one being indexed. + FWIW, this is true as well for REINDEX CONCURRENTLY because both use the same code paths for index builds and validation, with basically the same waiting phases. But is CREATE INDEX the correct place for that? Wouldn't it be better to tell about such things on the VACUUM doc? 0001 sounds fine to me. -- Michael signature.asc Description: PGP signature
Re: [DOC] Document concurrent index builds waiting on each other
On Fri, Jul 31, 2020 at 2:51 PM James Coleman wrote: > > On Thu, Jul 16, 2020 at 7:34 PM David Johnston > wrote: > > > > The following review has been posted through the commitfest application: > > make installcheck-world: not tested > > Implements feature: not tested > > Spec compliant: not tested > > Documentation:tested, passed > > > > James, > > > > I'm on board with the point of pointing out explicitly the "concurrent > > index builds on multiple tables at the same time will not return on any one > > table until all have completed", with back-patching. I do not believe the > > new paragraph is necessary though. I'd suggest trying to weave it into the > > existing paragraph ending "Even then, however, the index may not be > > immediately usable for queries: in the worst case, it cannot be used as > > long as transactions exist that predate the start of the index build." > > Adding "Notably, " in front of the existing sentence fragment above and > > tacking it onto the end probably suffices. > > I'm not sure "the index may not be immediately usable for queries" is > really accurate/sufficient: it seems to imply the CREATE INDEX has > returned but for some reason the index isn't yet valid. The issue I'm > trying to describe here is that the CREATE INDEX query itself will not > return until all preceding queries have completed *including* > concurrent index creations on unrelated tables. > > > I don't actually don't whether this is true behavior though. Is it > > something our tests do, or could, demonstrate? > > It'd take tests that exercise parallelism, but it's pretty simple to > demonstrate (but you do have to catch the first index build in a scan > phase, so you either need lots of data or a hack). Here's an example > that uses a bit of a hack to simulate a slow scan phase: > > Setup: > create table items(i int); > create table others(i int); > create function slow_expr() returns text as $$ select pg_sleep(15); > select '5'; $$ language sql immutable; > insert into items(i) values (1), (2); > insert into others(i) values (1), (2); > > Then the following in order: > 1. In session A: create index concurrently on items((i::text || slow_expr())); > 2. In session B (at the same time): create index concurrently on others(i); > > You'll notice that the 2nd command, which should be practically > instantaneous, waits on the first ~30s scan phase of (1) before it > returns. The same is true if after (2) completes you immediately run > it again -- it waits on the second ~30s scan phase of (1). > > That does reveal a bit of complexity though that that the current > patch doesn't address, which is that this can be phase dependent (and > that complexity gets a lot more non-obvious when there's real live > activity (particularly long-running transactions) in the system as > well. > > I've attached a new patch series with two items: > 1. A simpler (and I believe more correct) doc changes for "cic blocks > cic on other tables". > 2. A patch to document that all index builds can prevent tuples from > being vacuumed away on other tables. > > If it's preferable we could commit the first and discuss the second > separately, but since that limitation was also discussed up-thread, I > decided to include them both here for now. Álvaro's patch confused the current state of this thread, so I'm reattaching (rebased) v2 as v3. James v3-0002-Document-vacuum-on-one-table-depending-on-concurr.patch Description: Binary data v3-0001-Document-concurrent-indexes-waiting-on-each-other.patch Description: Binary data
Re: [DOC] Document concurrent index builds waiting on each other
On 2020-Aug-04, Alvaro Herrera wrote: > diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h > index b20e2ad4f6..43c8ea3e31 100644 > --- a/src/include/storage/proc.h > +++ b/src/include/storage/proc.h > @@ -53,6 +53,8 @@ struct XidCache > #define PROC_IS_AUTOVACUUM 0x01/* is it an autovac > worker? */ > #define PROC_IN_VACUUM 0x02/* currently running > lazy vacuum */ > #define PROC_IN_ANALYZE 0x04/* currently running > analyze */ > +#define PROC_IN_CIC 0x40/* currently > running CREATE INDEX > + >CONCURRENTLY */ > #define PROC_VACUUM_FOR_WRAPAROUND 0x08/* set by > autovac only */ > #define PROC_IN_LOGICAL_DECODING0x10/* currently > doing logical > > * decoding outside xact */ Hah, missed to add new bit to PROC_VACUUM_STATE_MASK here. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [DOC] Document concurrent index builds waiting on each other
On 2020-Mar-25, Andres Freund wrote: > What I was thinking of was a new flag, with a distinct value from > PROC_IN_VACUUM. It'd currently just be specified in the > GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid > needing to wait for other CICs on different relations. Since CIC is not > permitted on system tables, and CIC doesn't do DML on normal tables, it > seems fairly obviously correct to exclude other CICs. Hmm, that does work, and seems a pretty small patch -- attached. Of course, some more commentary is necessary, but the theory of operation is as Andres says. (It does not solve the vacuuming problem I was describing in the other thread, only the spurious waiting that James is complaining about in this thread.) I'm going to try and poke holes on this now ... (Expression indexes with falsely immutable functions?) -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From a691c48ddd5d9ff99e2f17b91777028bd5fbf36b Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Tue, 4 Aug 2020 22:04:57 -0400 Subject: [PATCH] Flag CREATE INDEX CONCURRENTLY to avoid spurious waiting --- src/backend/commands/indexcmds.c | 13 +++-- src/include/storage/proc.h | 2 ++ src/include/storage/procarray.h | 6 +- 3 files changed, 18 insertions(+), 3 deletions(-) diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index a3cb3cd47f..241c8a337e 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -393,7 +393,7 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress) VirtualTransactionId *old_snapshots; old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false, - PROC_IS_AUTOVACUUM | PROC_IN_VACUUM, + PROC_IS_AUTOVACUUM | PROC_IN_VACUUM | PROC_IN_CIC, &n_old_snapshots); if (progress) pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots); @@ -413,7 +413,7 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress) newer_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false, - PROC_IS_AUTOVACUUM | PROC_IN_VACUUM, + PROC_IS_AUTOVACUUM | PROC_IN_VACUUM | PROC_IN_CIC, &n_newer_snapshots); for (j = i; j < n_old_snapshots; j++) { @@ -1420,6 +1420,9 @@ DefineIndex(Oid relationId, CommitTransactionCommand(); StartTransactionCommand(); + /* Set flag for other concurrent index builds to ignore us */ + MyPgXact->vacuumFlags |= PROC_IN_CIC; + /* * The index is now visible, so we can report the OID. */ @@ -1482,6 +1485,9 @@ DefineIndex(Oid relationId, CommitTransactionCommand(); StartTransactionCommand(); + /* Set flag for other concurrent index builds to ignore us */ + MyPgXact->vacuumFlags |= PROC_IN_CIC; + /* * Phase 3 of concurrent index build * @@ -1541,6 +1547,9 @@ DefineIndex(Oid relationId, CommitTransactionCommand(); StartTransactionCommand(); + /* Set flag for other concurrent index builds to ignore us */ + MyPgXact->vacuumFlags |= PROC_IN_CIC; + /* We should now definitely not be advertising any xmin. */ Assert(MyPgXact->xmin == InvalidTransactionId); diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index b20e2ad4f6..43c8ea3e31 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -53,6 +53,8 @@ struct XidCache #define PROC_IS_AUTOVACUUM 0x01 /* is it an autovac worker? */ #define PROC_IN_VACUUM 0x02 /* currently running lazy vacuum */ #define PROC_IN_ANALYZE 0x04 /* currently running analyze */ +#define PROC_IN_CIC 0x40 /* currently running CREATE INDEX + CONCURRENTLY */ #define PROC_VACUUM_FOR_WRAPAROUND 0x08 /* set by autovac only */ #define PROC_IN_LOGICAL_DECODING 0x10 /* currently doing logical * decoding outside xact */ diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h index a5c7d0c064..0255949307 100644 --- a/src/include/storage/procarray.h +++ b/src/include/storage/procarray.h @@ -36,6 +36,9 @@ #define PROCARRAY_SLOTS_XMIN 0x20 /* replication slot xmin, * catalog_xmin */ +#define PROCARRAY_CIC_FLAG0x40 /* currently running CREATE INDEX + * CONCURRENTLY */ + /* * Only flags in PROCARRAY_PROC_FLAGS_MASK are considered when matching * PGXACT->vacuumFlags. Other flags are used for different purposes and @@ -43,7 +46,8 @@ */ #define PROCARRAY_PROC_FLAGS_MASK (PROCARRAY_VACUUM_FLAG | \ PROCARRAY_ANALYZE_FLAG | \ - PROCARRAY_LOGICAL_DECODING_FLAG) + PROCARRAY_LOGICAL_DECODING_FLAG | \ + PROCARRAY_CIC_FLAG) /* Use the following flags as an input "flags" to GetOldestXmin function */ /* Consider all backends except for logical decoding ones which manage xmin separately */ -- 2.20.1
Re: [DOC] Document concurrent index builds waiting on each other
On Thu, Jul 16, 2020 at 7:34 PM David Johnston wrote: > > The following review has been posted through the commitfest application: > make installcheck-world: not tested > Implements feature: not tested > Spec compliant: not tested > Documentation:tested, passed > > James, > > I'm on board with the point of pointing out explicitly the "concurrent index > builds on multiple tables at the same time will not return on any one table > until all have completed", with back-patching. I do not believe the new > paragraph is necessary though. I'd suggest trying to weave it into the > existing paragraph ending "Even then, however, the index may not be > immediately usable for queries: in the worst case, it cannot be used as long > as transactions exist that predate the start of the index build." Adding > "Notably, " in front of the existing sentence fragment above and tacking it > onto the end probably suffices. I'm not sure "the index may not be immediately usable for queries" is really accurate/sufficient: it seems to imply the CREATE INDEX has returned but for some reason the index isn't yet valid. The issue I'm trying to describe here is that the CREATE INDEX query itself will not return until all preceding queries have completed *including* concurrent index creations on unrelated tables. > I don't actually don't whether this is true behavior though. Is it something > our tests do, or could, demonstrate? It'd take tests that exercise parallelism, but it's pretty simple to demonstrate (but you do have to catch the first index build in a scan phase, so you either need lots of data or a hack). Here's an example that uses a bit of a hack to simulate a slow scan phase: Setup: create table items(i int); create table others(i int); create function slow_expr() returns text as $$ select pg_sleep(15); select '5'; $$ language sql immutable; insert into items(i) values (1), (2); insert into others(i) values (1), (2); Then the following in order: 1. In session A: create index concurrently on items((i::text || slow_expr())); 2. In session B (at the same time): create index concurrently on others(i); You'll notice that the 2nd command, which should be practically instantaneous, waits on the first ~30s scan phase of (1) before it returns. The same is true if after (2) completes you immediately run it again -- it waits on the second ~30s scan phase of (1). That does reveal a bit of complexity though that that the current patch doesn't address, which is that this can be phase dependent (and that complexity gets a lot more non-obvious when there's real live activity (particularly long-running transactions) in the system as well. I've attached a new patch series with two items: 1. A simpler (and I believe more correct) doc changes for "cic blocks cic on other tables". 2. A patch to document that all index builds can prevent tuples from being vacuumed away on other tables. If it's preferable we could commit the first and discuss the second separately, but since that limitation was also discussed up-thread, I decided to include them both here for now. James v2-0001-Document-concurrent-indexes-waiting-on-each-other.patch Description: Binary data v2-0002-Document-vacuum-on-one-table-depending-on-concurr.patch Description: Binary data
Re: [DOC] Document concurrent index builds waiting on each other
The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: not tested Spec compliant: not tested Documentation:tested, passed James, I'm on board with the point of pointing out explicitly the "concurrent index builds on multiple tables at the same time will not return on any one table until all have completed", with back-patching. I do not believe the new paragraph is necessary though. I'd suggest trying to weave it into the existing paragraph ending "Even then, however, the index may not be immediately usable for queries: in the worst case, it cannot be used as long as transactions exist that predate the start of the index build." Adding "Notably, " in front of the existing sentence fragment above and tacking it onto the end probably suffices. I don't actually don't whether this is true behavior though. Is it something our tests do, or could, demonstrate? It is sorta weird to say "one will not return until all have completed, though, since usually people think return means completed". That whole paragraph is a bit unclear for the inexperienced DBA, in particular marked ready to use but isn't usable. That isn't really on this patch to fix though, and the clarity around concurrent CIC seems worthwhile to add, even if imprecise - IMO it doesn't make that whole section any less clear and points out what seems to be a unique dynamic. IOW I would send the simple fix (inline, not a new paragraph) to a committer. The bigger doc reworking or actual behavioral improvements shouldn't hold up such a simple improvement. David J. The new status of this patch is: Waiting on Author
Re: [DOC] Document concurrent index builds waiting on each other
On Thu, Apr 16, 2020 at 6:12 PM Andres Freund wrote: > > Hi, > > On 2020-04-15 21:44:48 -0400, James Coleman wrote: > > On Wed, Apr 15, 2020 at 6:31 PM Andres Freund wrote: > > > If it's about the xmin horizon for vacuum: I think we could probably > > > avoid that using the same flag. As vacuum cannot be run against a table > > > that has a CIC running (although it'd theoretically be possible to allow > > > that), it should be safe to ignore PROC_IN_CIC backends in vacuum's > > > GetOldestXmin() call. That might not be true for system relations, but > > > we don't allow CIC on those. > > > > Yeah, I mean that if I have a CIC running on table X then vacuum can't > > remove dead tuples (from after the CIC's snapshot) on table Y. > > For me "blocking" evokes waiting for a lock, which is why I thought > you'd not mean that issue. It was sloppy choice of language on my part; for better or worse at work we've taken to talking about "blocking vacuum" when that's really shorthand for "blocking [or you'd prefer preventing] vacuuming dead tuples". > > That's a pretty significant danger, given the combination of: > > 1. Index builds on very large tables can take many days, and > > We at least don't hold a single snapshot over the multiple phases... For sure. And text sorting improvements have made this better also, still, as you often point out re: xid size, databases are only getting larger (and more TPS). > > 2. The well understood problems of high update tables with dead tuples > > and poor plans. > > Which specific problem are you referring to? The planner probing the end > of the index for values outside of the histogram? I'd hope > 3ca930fc39ccf987c1c22fd04a1e7463b5dd0dfd improved the situation there a > bit? Yes, and other commits too, IIRC from the time we spent debugging exactly the scenario mentioned in that commit. But by "poor plans" I don't mean specifically "poor planning time" but that we can still end up choosing the "wrong" plan, right? And dead tuples can make an index scan be significantly worse than it would otherwise be. Same for a seq scan: you can end up looking at millions of dead tuples in a nominally 500 row table. > > > [description why we could ignore CIC for vacuum horizon on other tables ] > > > I've previously discussed this with other hackers and the reasoning > > they'd understood way that we couldn't always safely ignore > > PROC_IN_CIC backends in the vacuum's oldest xmin call because of > > function indexes, and the fact that (despite clear recommendations to > > the contrary), there's nothing actually preventing someone from adding > > a function index on table X that queries table Y. > > Well, even if we consider this an actual problem, we could still use > PROC_IN_CIC for non-expression non-partial indexes (index operator > themselves better ensure this isn't a problem, or they're ridiculously > broken already - they can get called during vacuum). Agreed. It'd be unfortunate to have to limit it though. > Even when expressions are involved, I don't think that necessarily would > have to mean that we need to use the same snapshot to run expressions in > for the hole scan. So we could occasionally take a new snapshot for the > purpose of computing expressions. > > The hard part presumably would be that we'd need to advertise one xmin > for the expression snapshot to protect tuples potentially accessed from > being removed, but at the same time we also need to advertise the xmin > of the snapshot used by CIC, to avoid HOT pruning in other session from > removing tuple versions from the table the index is being created > on. > > There's not really infrastructure for doing so. I think we'd basically > have to start publicizing multiple xmin values (as long as PGXACT->xmin > is <= new xmin for expressions, only GetOldestXmin() would need to care, > and it's not that performance critical). Not pretty. In other words, pretty invasive. > > I'm not sure I buy that we should care about people doing something > > clearly so dangerous, but...I grant that it'd be nice not to cause new > > crashes. > > I don't think it's just dangerous expressions that would be > affected. Normal expression indexes need to be able to do syscache > lookups etc, and they can't safely do so if tuple versions can be > removed in the middle of a scan. We could avoid that by not ignoring > PROC_IN_CIC backend in GetOldestXmin() calls for catalog tables (yuck). At first glance this sounds a lot less invasive, but I also agree it's gross. James
Re: [DOC] Document concurrent index builds waiting on each other
Hi, On 2020-04-15 21:44:48 -0400, James Coleman wrote: > On Wed, Apr 15, 2020 at 6:31 PM Andres Freund wrote: > > If it's about the xmin horizon for vacuum: I think we could probably > > avoid that using the same flag. As vacuum cannot be run against a table > > that has a CIC running (although it'd theoretically be possible to allow > > that), it should be safe to ignore PROC_IN_CIC backends in vacuum's > > GetOldestXmin() call. That might not be true for system relations, but > > we don't allow CIC on those. > > Yeah, I mean that if I have a CIC running on table X then vacuum can't > remove dead tuples (from after the CIC's snapshot) on table Y. For me "blocking" evokes waiting for a lock, which is why I thought you'd not mean that issue. > That's a pretty significant danger, given the combination of: > 1. Index builds on very large tables can take many days, and We at least don't hold a single snapshot over the multiple phases... > 2. The well understood problems of high update tables with dead tuples > and poor plans. Which specific problem are you referring to? The planner probing the end of the index for values outside of the histogram? I'd hope 3ca930fc39ccf987c1c22fd04a1e7463b5dd0dfd improved the situation there a bit? > > [description why we could ignore CIC for vacuum horizon on other tables ] > I've previously discussed this with other hackers and the reasoning > they'd understood way that we couldn't always safely ignore > PROC_IN_CIC backends in the vacuum's oldest xmin call because of > function indexes, and the fact that (despite clear recommendations to > the contrary), there's nothing actually preventing someone from adding > a function index on table X that queries table Y. Well, even if we consider this an actual problem, we could still use PROC_IN_CIC for non-expression non-partial indexes (index operator themselves better ensure this isn't a problem, or they're ridiculously broken already - they can get called during vacuum). Even when expressions are involved, I don't think that necessarily would have to mean that we need to use the same snapshot to run expressions in for the hole scan. So we could occasionally take a new snapshot for the purpose of computing expressions. The hard part presumably would be that we'd need to advertise one xmin for the expression snapshot to protect tuples potentially accessed from being removed, but at the same time we also need to advertise the xmin of the snapshot used by CIC, to avoid HOT pruning in other session from removing tuple versions from the table the index is being created on. There's not really infrastructure for doing so. I think we'd basically have to start publicizing multiple xmin values (as long as PGXACT->xmin is <= new xmin for expressions, only GetOldestXmin() would need to care, and it's not that performance critical). Not pretty. > I'm not sure I buy that we should care about people doing something > clearly so dangerous, but...I grant that it'd be nice not to cause new > crashes. I don't think it's just dangerous expressions that would be affected. Normal expression indexes need to be able to do syscache lookups etc, and they can't safely do so if tuple versions can be removed in the middle of a scan. We could avoid that by not ignoring PROC_IN_CIC backend in GetOldestXmin() calls for catalog tables (yuck). Regards, Andres
Re: [DOC] Document concurrent index builds waiting on each other
On Wed, Apr 15, 2020 at 6:31 PM Andres Freund wrote: > > Hi, > > On 2020-04-15 09:31:58 -0400, James Coleman wrote: > > On Wed, Mar 25, 2020 at 3:58 PM Andres Freund wrote: > > > On 2020-03-25 16:30:10 -0300, Alvaro Herrera wrote: > > > > I posted this in November > > > > https://postgr.es/m/20191101203310.GA12239@alvherre.pgsql but I didn't > > > > put time to go through the issues there. > > > > > > Oh, missed that. > > > > > > > > > > I don't know if my approach is exactly what Andres has in mind > > > > > > Not quite. I don't think it's generally correct for CIC to set > > > PROC_IN_VACUUM. I'm doubtful it's the case even just for plain indexes - > > > we don't want rows to be pruned away from under us. I also think we'd > > > want to set such a flag during all of the CIC phases? > > > > > > What I was thinking of was a new flag, with a distinct value from > > > PROC_IN_VACUUM. It'd currently just be specified in the > > > GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid > > > needing to wait for other CICs on different relations. Since CIC is not > > > permitted on system tables, and CIC doesn't do DML on normal tables, it > > > seems fairly obviously correct to exclude other CICs. > > > > That would keep CIC from blocking other CICs, but it wouldn't solve > > the problem of CIC blocking vacuum on unrelated tables, right? Perhaps > > that's orthogonal though. > > I am not sure what blocking you are referring to here? CIC shouldn't > block vacuum on other tables from running? Or do you just mean that > vacuum will not be able to remove some rows due to the snapshot from the > CIC? That'd be an orthogonal problem, yes. > > If it's about the xmin horizon for vacuum: I think we could probably > avoid that using the same flag. As vacuum cannot be run against a table > that has a CIC running (although it'd theoretically be possible to allow > that), it should be safe to ignore PROC_IN_CIC backends in vacuum's > GetOldestXmin() call. That might not be true for system relations, but > we don't allow CIC on those. Yeah, I mean that if I have a CIC running on table X then vacuum can't remove dead tuples (from after the CIC's snapshot) on table Y. That's a pretty significant danger, given the combination of: 1. Index builds on very large tables can take many days, and 2. The well understood problems of high update tables with dead tuples and poor plans. I've previously discussed this with other hackers and the reasoning they'd understood way that we couldn't always safely ignore PROC_IN_CIC backends in the vacuum's oldest xmin call because of function indexes, and the fact that (despite clear recommendations to the contrary), there's nothing actually preventing someone from adding a function index on table X that queries table Y. I'm not sure I buy that we should care about people doing something clearly so dangerous, but...I grant that it'd be nice not to cause new crashes. James
Re: [DOC] Document concurrent index builds waiting on each other
Hi, On 2020-04-15 09:31:58 -0400, James Coleman wrote: > On Wed, Mar 25, 2020 at 3:58 PM Andres Freund wrote: > > On 2020-03-25 16:30:10 -0300, Alvaro Herrera wrote: > > > I posted this in November > > > https://postgr.es/m/20191101203310.GA12239@alvherre.pgsql but I didn't > > > put time to go through the issues there. > > > > Oh, missed that. > > > > > > > I don't know if my approach is exactly what Andres has in mind > > > > Not quite. I don't think it's generally correct for CIC to set > > PROC_IN_VACUUM. I'm doubtful it's the case even just for plain indexes - > > we don't want rows to be pruned away from under us. I also think we'd > > want to set such a flag during all of the CIC phases? > > > > What I was thinking of was a new flag, with a distinct value from > > PROC_IN_VACUUM. It'd currently just be specified in the > > GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid > > needing to wait for other CICs on different relations. Since CIC is not > > permitted on system tables, and CIC doesn't do DML on normal tables, it > > seems fairly obviously correct to exclude other CICs. > > That would keep CIC from blocking other CICs, but it wouldn't solve > the problem of CIC blocking vacuum on unrelated tables, right? Perhaps > that's orthogonal though. I am not sure what blocking you are referring to here? CIC shouldn't block vacuum on other tables from running? Or do you just mean that vacuum will not be able to remove some rows due to the snapshot from the CIC? That'd be an orthogonal problem, yes. If it's about the xmin horizon for vacuum: I think we could probably avoid that using the same flag. As vacuum cannot be run against a table that has a CIC running (although it'd theoretically be possible to allow that), it should be safe to ignore PROC_IN_CIC backends in vacuum's GetOldestXmin() call. That might not be true for system relations, but we don't allow CIC on those. Greetings, Andres Freund
Re: [DOC] Document concurrent index builds waiting on each other
On Wed, Mar 25, 2020 at 3:58 PM Andres Freund wrote: > > Hi, > > On 2020-03-25 16:30:10 -0300, Alvaro Herrera wrote: > > I posted this in November > > https://postgr.es/m/20191101203310.GA12239@alvherre.pgsql but I didn't > > put time to go through the issues there. > > Oh, missed that. > > > > I don't know if my approach is exactly what Andres has in mind > > Not quite. I don't think it's generally correct for CIC to set > PROC_IN_VACUUM. I'm doubtful it's the case even just for plain indexes - > we don't want rows to be pruned away from under us. I also think we'd > want to set such a flag during all of the CIC phases? > > What I was thinking of was a new flag, with a distinct value from > PROC_IN_VACUUM. It'd currently just be specified in the > GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid > needing to wait for other CICs on different relations. Since CIC is not > permitted on system tables, and CIC doesn't do DML on normal tables, it > seems fairly obviously correct to exclude other CICs. That would keep CIC from blocking other CICs, but it wouldn't solve the problem of CIC blocking vacuum on unrelated tables, right? Perhaps that's orthogonal though. James
Re: [DOC] Document concurrent index builds waiting on each other
On Wed, Mar 25, 2020 at 05:12:48PM -0300, Alvaro Herrera wrote: > Hmm, that sounds more promising. Haven't looked at that myself in details. But as I doubt that this would be backpatched, wouldn't it be better to document the issue for now? -- Michael signature.asc Description: PGP signature
Re: [DOC] Document concurrent index builds waiting on each other
On 2020-Mar-25, Andres Freund wrote: > > I don't know if my approach is exactly what Andres has in mind > > Not quite. I don't think it's generally correct for CIC to set > PROC_IN_VACUUM. I'm doubtful it's the case even just for plain indexes - > we don't want rows to be pruned away from under us. I also think we'd > want to set such a flag during all of the CIC phases? > > What I was thinking of was a new flag, with a distinct value from > PROC_IN_VACUUM. It'd currently just be specified in the > GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid > needing to wait for other CICs on different relations. Since CIC is not > permitted on system tables, and CIC doesn't do DML on normal tables, it > seems fairly obviously correct to exclude other CICs. Hmm, that sounds more promising. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [DOC] Document concurrent index builds waiting on each other
Hi, On 2020-03-25 16:30:10 -0300, Alvaro Herrera wrote: > I posted this in November > https://postgr.es/m/20191101203310.GA12239@alvherre.pgsql but I didn't > put time to go through the issues there. Oh, missed that. > I don't know if my approach is exactly what Andres has in mind Not quite. I don't think it's generally correct for CIC to set PROC_IN_VACUUM. I'm doubtful it's the case even just for plain indexes - we don't want rows to be pruned away from under us. I also think we'd want to set such a flag during all of the CIC phases? What I was thinking of was a new flag, with a distinct value from PROC_IN_VACUUM. It'd currently just be specified in the GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid needing to wait for other CICs on different relations. Since CIC is not permitted on system tables, and CIC doesn't do DML on normal tables, it seems fairly obviously correct to exclude other CICs. Greetings, Andres Freund
Re: [DOC] Document concurrent index builds waiting on each other
Hi, On 2020-03-25 15:24:44 -0400, James Coleman wrote: > Andres: If we got this fixed in current PG would you be opposed to > documenting the caveat in previous versions? Not really. I'm just not confident it's going to be useful, given the amount of details needed to be provided to really make sense of the issue (the earlier CIC phases don't wait for snapshots, but just relation locks etc). Greetings, Andres Freund
Re: [DOC] Document concurrent index builds waiting on each other
On 2020-Mar-25, James Coleman wrote: > Alvaro: I think you had some ideas on this too; any chance you've know > of a patch that anyone's got cooking? I posted this in November https://postgr.es/m/20191101203310.GA12239@alvherre.pgsql but I didn't put time to go through the issues there. I don't know if my approach is exactly what Andres has in mind, but I was discouraged by the number of gotchas for which the optimization I propose has to be turned off. Maybe that preliminary patch can serve as a discussion starter, if nothing else. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [DOC] Document concurrent index builds waiting on each other
On Wed, Mar 25, 2020 at 3:19 PM Andres Freund wrote: > > Hi, > > On 2019-09-18 13:51:00 -0400, James Coleman wrote: > > In my experience it's not immediately obvious (even after reading the > > documentation) the implications of how concurrent index builds manage > > transactions with respect to multiple concurrent index builds in > > flight at the same time. > > > > Specifically, as I understand multiple concurrent index builds running > > at the same time will all return at the same time as the longest > > running one. > > > > I've attached a small patch to call this caveat out specifically in > > the documentation. I think the description in the patch is accurate, > > but please let me know if there's some intricacies around how the > > various stages might change the results. > > > > James Coleman > > I'd much rather see effort spent fixing this issue as far as it relates > to concurrent CICs. For the snapshot waits we can add a procarray flag > (alongside PROCARRAY_VACUUM_FLAG) indicating that the backend is > doing. Which WaitForOlderSnapshots() can then use to ignore those CICs, > which is safe, because those transactions definitely don't insert into > relations targeted by CIC. The change to WaitForOlderSnapshots() would > just be to pass the new flag to GetCurrentVirtualXIDs, I think. Alvaro: I think you had some ideas on this too; any chance you've know of a patch that anyone's got cooking? Andres: If we got this fixed in current PG would you be opposed to documenting the caveat in previous versions? Thanks, James
Re: [DOC] Document concurrent index builds waiting on each other
Hi, On 2019-09-18 13:51:00 -0400, James Coleman wrote: > In my experience it's not immediately obvious (even after reading the > documentation) the implications of how concurrent index builds manage > transactions with respect to multiple concurrent index builds in > flight at the same time. > > Specifically, as I understand multiple concurrent index builds running > at the same time will all return at the same time as the longest > running one. > > I've attached a small patch to call this caveat out specifically in > the documentation. I think the description in the patch is accurate, > but please let me know if there's some intricacies around how the > various stages might change the results. > > James Coleman I'd much rather see effort spent fixing this issue as far as it relates to concurrent CICs. For the snapshot waits we can add a procarray flag (alongside PROCARRAY_VACUUM_FLAG) indicating that the backend is doing. Which WaitForOlderSnapshots() can then use to ignore those CICs, which is safe, because those transactions definitely don't insert into relations targeted by CIC. The change to WaitForOlderSnapshots() would just be to pass the new flag to GetCurrentVirtualXIDs, I think. Greetings, Andres Freund
Re: [DOC] Document concurrent index builds waiting on each other
On Sun, Sep 29, 2019 at 9:24 PM Michael Paquier wrote: > > On Sat, Sep 28, 2019 at 10:22:28PM -0300, Alvaro Herrera wrote: > > I always thought that create index concurrently was prevented from > > running concurrently in a table by the ShareUpdateExclusive lock that's > > held during the operation. > > REINDEX CONCURRENTLY and CIC can deadlock while waiting for each other > to finish after their validation phase, see: > https://www.postgresql.org/message-id/20190507030756.gd1...@paquier.xyz > https://www.postgresql.org/message-id/20190507032543.gh1...@paquier.xyz Michael, Thanks for the cross-link. Do you think this would be valuable to document at the same time? Or did you just want to ensure we were also aware of this particular downfall? If the latter, I appreciate it, it's helpful info. If the latter, let me know, and I'll try to update the patch. Thanks, James
Re: [DOC] Document concurrent index builds waiting on each other
I went ahead and registered this in the current only CF as https://commitfest.postgresql.org/27/2454/ Alvaro: Would you like to be added as a reviewer? James
Re: [DOC] Document concurrent index builds waiting on each other
On Sat, Sep 28, 2019 at 10:22:28PM -0300, Alvaro Herrera wrote: > I always thought that create index concurrently was prevented from > running concurrently in a table by the ShareUpdateExclusive lock that's > held during the operation. REINDEX CONCURRENTLY and CIC can deadlock while waiting for each other to finish after their validation phase, see: https://www.postgresql.org/message-id/20190507030756.gd1...@paquier.xyz https://www.postgresql.org/message-id/20190507032543.gh1...@paquier.xyz -- Michael signature.asc Description: PGP signature
Re: [DOC] Document concurrent index builds waiting on each other
On 2019-Sep-28, James Coleman wrote: > I believe caveats like this are worth calling out rather than > expecting users to have to understand the implementation details an > work out the implications on their own. I agree. > I read Alvaro as referring to the fact that the docs already call out > the following: > > > Regular index builds permit other regular index builds on the same > > table to occur simultaneously, but only one concurrent index build > > can occur on a table at a time. Yeah, that's what I was understanding. BTW I think there's an approach that could alleviate part of this problem, at least some of the time: whenever CIC runs for an index that's not on expression and not partial, we could set the PROC_IN_VACUUM flag. That would cause it to get ignored by other processes for snapshot purposes (including CIC itself), as well as by vacuum. I need to take some time to research the safety of this, but intuitively it seems safe. Even further, I think we could also do it for regular CREATE INDEX (under the same conditions) provided that it's not run in a transaction block. But that requires even more research/proof. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [DOC] Document concurrent index builds waiting on each other
On Sat, Sep 28, 2019 at 9:56 PM Bruce Momjian wrote: > > On Sat, Sep 28, 2019 at 09:54:48PM -0400, James Coleman wrote: > > On Sat, Sep 28, 2019 at 9:22 PM Alvaro Herrera > > wrote: > > > > > > On 2019-Sep-28, Bruce Momjian wrote: > > > > > > > The CREATE INDEX docs already say: > > > > > > > > In a concurrent index build, the index is actually entered into > > > > the system catalogs in one transaction, then two table scans occur > > > > in > > > > two more transactions. Before each table scan, the index build must > > > > wait for existing transactions that have modified the table to > > > > terminate. > > > > After the second scan, the index build must wait for any > > > > transactions > > > > --> that have a snapshot (see ) predating the > > > > second > > > > --> scan to terminate. Then finally the index can be marked ready for > > > > use, > > > > > > > > So, having multiple concurrent index scans is just a special case of > > > > having to "wait for any transactions that have a snapshot", no? I am > > > > not sure adding a doc mention of other index builds really is helpful. While that may be technically true, as a co-worker of mine likes to point out, being "technically correct" is the worst kind of correct. Here's what I mean: First, I believe the docs should aim to be as useful as possible to even those with more entry-level understanding of PostgreSQL. The fact the paragraph you cite actually links to the entire chapter on concurrency control in Postgres demonstrates that there's some not-so-immediate stuff here to consider. For one: is it obvious to all users that the transaction held by CIC (or even that all transactions) has an open snapshot? Second, this is a difference from a regular CREATE INDEX, and we already call out as caveats differences between CREATE INDEX CONCURRENTLY and regular CREATE INDEX as I point out below re: Alvaro's comment. Third, related to the above point, many DDL commands only block DDL against the table being operated on. The fact that CIC here is different is, in my opinion, a fairly surprising break from that pattern, and as such likely to catch users off guard. I can attest that this surprised at least one entire database team a while back :) including many people who've been operating Postgres at a large scale for a long time. I believe caveats like this are worth calling out rather than expecting users to have to understand the implementation details an work out the implications on their own. > > > I always thought that create index concurrently was prevented from > > > running concurrently in a table by the ShareUpdateExclusive lock that's > > > held during the operation. > > > > You mean multiple CICs on a single table at the same time? Yes, that > > (unfortunately) isn't possible, but I'm concerned in the patch with > > the fact that CIC on table X blocks CIC on table Y. > > I think any open transaction will block CIC, which is my point. I read Alvaro as referring to the fact that the docs already call out the following: > Regular index builds permit other regular index builds on the same table to > occur simultaneously, but only one concurrent index build can occur on a > table at a time. James
Re: [DOC] Document concurrent index builds waiting on each other
On Sat, Sep 28, 2019 at 09:54:48PM -0400, James Coleman wrote: > On Sat, Sep 28, 2019 at 9:22 PM Alvaro Herrera > wrote: > > > > On 2019-Sep-28, Bruce Momjian wrote: > > > > > The CREATE INDEX docs already say: > > > > > > In a concurrent index build, the index is actually entered into > > > the system catalogs in one transaction, then two table scans occur in > > > two more transactions. Before each table scan, the index build must > > > wait for existing transactions that have modified the table to > > > terminate. > > > After the second scan, the index build must wait for any transactions > > > --> that have a snapshot (see ) predating the second > > > --> scan to terminate. Then finally the index can be marked ready for > > > use, > > > > > > So, having multiple concurrent index scans is just a special case of > > > having to "wait for any transactions that have a snapshot", no? I am > > > not sure adding a doc mention of other index builds really is helpful. > > > > I always thought that create index concurrently was prevented from > > running concurrently in a table by the ShareUpdateExclusive lock that's > > held during the operation. > > You mean multiple CICs on a single table at the same time? Yes, that > (unfortunately) isn't possible, but I'm concerned in the patch with > the fact that CIC on table X blocks CIC on table Y. I think any open transaction will block CIC, which is my point. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: [DOC] Document concurrent index builds waiting on each other
On Sat, Sep 28, 2019 at 9:22 PM Alvaro Herrera wrote: > > On 2019-Sep-28, Bruce Momjian wrote: > > > The CREATE INDEX docs already say: > > > > In a concurrent index build, the index is actually entered into > > the system catalogs in one transaction, then two table scans occur in > > two more transactions. Before each table scan, the index build must > > wait for existing transactions that have modified the table to > > terminate. > > After the second scan, the index build must wait for any transactions > > --> that have a snapshot (see ) predating the second > > --> scan to terminate. Then finally the index can be marked ready for use, > > > > So, having multiple concurrent index scans is just a special case of > > having to "wait for any transactions that have a snapshot", no? I am > > not sure adding a doc mention of other index builds really is helpful. > > I always thought that create index concurrently was prevented from > running concurrently in a table by the ShareUpdateExclusive lock that's > held during the operation. You mean multiple CICs on a single table at the same time? Yes, that (unfortunately) isn't possible, but I'm concerned in the patch with the fact that CIC on table X blocks CIC on table Y. James
Re: [DOC] Document concurrent index builds waiting on each other
On 2019-Sep-28, Bruce Momjian wrote: > The CREATE INDEX docs already say: > > In a concurrent index build, the index is actually entered into > the system catalogs in one transaction, then two table scans occur in > two more transactions. Before each table scan, the index build must > wait for existing transactions that have modified the table to terminate. > After the second scan, the index build must wait for any transactions > --> that have a snapshot (see ) predating the second > --> scan to terminate. Then finally the index can be marked ready for use, > > So, having multiple concurrent index scans is just a special case of > having to "wait for any transactions that have a snapshot", no? I am > not sure adding a doc mention of other index builds really is helpful. I always thought that create index concurrently was prevented from running concurrently in a table by the ShareUpdateExclusive lock that's held during the operation. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [DOC] Document concurrent index builds waiting on each other
On Wed, Sep 18, 2019 at 01:51:00PM -0400, James Coleman wrote: > In my experience it's not immediately obvious (even after reading the > documentation) the implications of how concurrent index builds manage > transactions with respect to multiple concurrent index builds in > flight at the same time. > > Specifically, as I understand multiple concurrent index builds running > at the same time will all return at the same time as the longest > running one. > > I've attached a small patch to call this caveat out specifically in > the documentation. I think the description in the patch is accurate, > but please let me know if there's some intricacies around how the > various stages might change the results. The CREATE INDEX docs already say: In a concurrent index build, the index is actually entered into the system catalogs in one transaction, then two table scans occur in two more transactions. Before each table scan, the index build must wait for existing transactions that have modified the table to terminate. After the second scan, the index build must wait for any transactions --> that have a snapshot (see ) predating the second --> scan to terminate. Then finally the index can be marked ready for use, So, having multiple concurrent index scans is just a special case of having to "wait for any transactions that have a snapshot", no? I am not sure adding a doc mention of other index builds really is helpful. --- > commit 9e28e704820eebb81ff94c1c3cbfb7db087b2c45 > Author: James Coleman > Date: Wed Sep 18 13:36:22 2019 -0400 > > Document concurrent indexes waiting on each other > > It's not immediately obvious that because concurrent index building > waits on previously running transactions to complete, running multiple > concurrent index builds at the same time will result in each of them > taking as long to return as the longest takes, so, document this caveat. > > diff --git a/doc/src/sgml/ref/create_index.sgml > b/doc/src/sgml/ref/create_index.sgml > index 629a31ef79..35f15abb0e 100644 > --- a/doc/src/sgml/ref/create_index.sgml > +++ b/doc/src/sgml/ref/create_index.sgml > @@ -616,6 +616,13 @@ Indexes: > cannot. > > > + > +Because the second table scan must wait for any transactions having a > +snapshot preceding the start of that scan to finish before completing the > +scan, concurrent index builds on multiple tables at the same time will > +not return on any one table until all have completed. > + > + > > Concurrent builds for indexes on partitioned tables are currently not > supported. However, you may concurrently build the index on each -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
[DOC] Document concurrent index builds waiting on each other
In my experience it's not immediately obvious (even after reading the documentation) the implications of how concurrent index builds manage transactions with respect to multiple concurrent index builds in flight at the same time. Specifically, as I understand multiple concurrent index builds running at the same time will all return at the same time as the longest running one. I've attached a small patch to call this caveat out specifically in the documentation. I think the description in the patch is accurate, but please let me know if there's some intricacies around how the various stages might change the results. James Coleman commit 9e28e704820eebb81ff94c1c3cbfb7db087b2c45 Author: James Coleman Date: Wed Sep 18 13:36:22 2019 -0400 Document concurrent indexes waiting on each other It's not immediately obvious that because concurrent index building waits on previously running transactions to complete, running multiple concurrent index builds at the same time will result in each of them taking as long to return as the longest takes, so, document this caveat. diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 629a31ef79..35f15abb0e 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -616,6 +616,13 @@ Indexes: cannot. + +Because the second table scan must wait for any transactions having a +snapshot preceding the start of that scan to finish before completing the +scan, concurrent index builds on multiple tables at the same time will +not return on any one table until all have completed. + + Concurrent builds for indexes on partitioned tables are currently not supported. However, you may concurrently build the index on each