Re: Intermittent buildfarm failures on wrasse
I was looking at backpatching this to pg13. That made me realize that commit dc7420c2c927 changed things in 14; and before that commit, the bitmask that is checked is PROCARRAY_FLAGS_VACUUM, which is defined independently of whatever proc.h says. As far as I can tell, there's no problem with the patches I post here (the backpatched versions for pg13 and pg14). But it's something to be aware of; and if we do want to add the additional bits to the bitmask, we should do that in a separate master-only commit. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ >From 6595af9af208de33ab431b502e2f178f7d0f8007 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Sat, 14 May 2022 16:51:23 +0200 Subject: [PATCH] Repurpose PROC_COPYABLE_FLAGS as PROC_XMIN_FLAGS This is a slight, convenient semantics change from what commit 0f0cfb494004 introduced that lets us simplify the coding in the one place where it is used. --- src/backend/storage/ipc/procarray.c | 11 ++- src/include/storage/proc.h | 7 +++ 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index 6ff8d8699b..447b6e3de7 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -1916,12 +1916,13 @@ ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc) TransactionIdIsNormal(xid) && TransactionIdPrecedesOrEquals(xid, xmin)) { - /* Install xmin */ + /* + * Install xmin and propagate the vacuumFlags that affect how the + * value is interpreted by vacuum. + */ MyPgXact->xmin = TransactionXmin = xmin; - - /* Flags being copied must be valid copy-able flags. */ - Assert((pgxact->vacuumFlags & (~PROC_COPYABLE_FLAGS)) == 0); - MyPgXact->vacuumFlags = pgxact->vacuumFlags; + MyPgXact->vacuumFlags = (MyPgXact->vacuumFlags & ~PROC_XMIN_FLAGS) | + (pgxact->vacuumFlags & PROC_XMIN_FLAGS); result = true; } diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index 9cf9684b41..7c85b5645b 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -63,11 +63,10 @@ struct XidCache (PROC_IN_VACUUM | PROC_IN_ANALYZE | PROC_VACUUM_FOR_WRAPAROUND) /* - * Flags that are valid to copy from another proc, the parallel leader - * process in practice. Currently, a flag that is set during parallel - * vacuum is allowed. + * Xmin-related flags. Make sure any flags that affect how the process' Xmin + * value is interpreted by VACUUM are included here. */ -#define PROC_COPYABLE_FLAGS (PROC_IN_VACUUM) +#define PROC_XMIN_FLAGS (PROC_IN_VACUUM) /* * We allow a small number of "weak" relation locks (AccessShareLock, -- 2.30.2 >From 35197d187fe3e7c9e4bd563396c9c7459e39a7ff Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Sat, 14 May 2022 16:51:23 +0200 Subject: [PATCH] Repurpose PROC_COPYABLE_FLAGS as PROC_XMIN_FLAGS This is a slight, convenient semantics change from what commit 0f0cfb494004 introduced that lets us simplify the coding in the one place where it is used.
--- src/backend/storage/ipc/procarray.c | 11 ++- src/include/storage/proc.h | 7 +++ 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index 127be9c017..08053a7e8f 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -2686,12 +2686,13 @@ ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc) TransactionIdIsNormal(xid) && TransactionIdPrecedesOrEquals(xid, xmin)) { - /* Install xmin */ + /* + * Install xmin and propagate the statusFlags that affect how the + * value is interpreted by vacuum. + */ MyProc->xmin = TransactionXmin = xmin; - - /* Flags being copied must be valid copy-able flags. */ - Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); - MyProc->statusFlags = proc->statusFlags; + MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) | + (proc->statusFlags & PROC_XMIN_FLAGS); ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags; result = true; diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index 11b514c9ae..1464fad9b9 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -66,11 +66,10 @@ struct XidCache (PROC_IN_VACUUM | PROC_IN_SAFE_IC | PROC_VACUUM_FOR_WRAPAROUND) /* - * Flags that are valid to copy from another proc, the parallel leader - * process in practice. Currently, flags that are set during parallel - * vacuum and parallel index creation are allowed. + * Xmin-related flags. Make sure any flags that affect how the process' Xmin + * value is interpreted by VACUUM are included here. */ -#define PROC_COPYABLE_FLAGS (PROC_IN_VACUUM | PROC_IN_SAFE_IC) +#define PROC_XMIN_FLAGS (PROC_IN_VACUUM | PROC_IN_SAFE_IC) /* * We allow a small number of "weak" relation locks
Re: Intermittent buildfarm failures on wrasse
On Sun, May 15, 2022 at 12:29 AM Alvaro Herrera wrote: > > On 2022-Apr-20, Masahiko Sawada wrote: > > > > MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) | > > > (proc->statusFlags & PROC_XMIN_FLAGS); > > > > > > Perhaps the latter is more future-proof. > > > Copying only xmin-related flags in this way also makes sense to me and > > there is no problem at least for now. A note would be that when we > > introduce a new flag that needs to be copied in the future, we need to > > make sure to add it to PROC_XMIN_FLAGS so it is copied. Otherwise a > > similar issue we fixed by 0f0cfb494004befb0f6e could happen again. > > OK, done this way -- patch attached. Thank you for updating the patch. > > Reading the comment I wrote about it, I wonder if flags > PROC_AFFECTS_ALL_HORIZONS and PROC_IN_LOGICAL_DECODING should also be > included. I think the only reason we don't care at this point is that > walsenders (logical or otherwise) do not engage in snapshot copying. > But if we were to implement usage of parallel workers sharing a common > snapshot to do table sync in parallel, then it ISTM it would be > important to copy at least the latter. Not sure there are any cases > were we might care about the former. Yeah, it seems to be inconsistent between the comment (and the new name) and the flags actually included. I think we can include all xmin-related flags to PROC_XMIN_FLAGS as the comment says. That way, it would be useful also for other use cases, and I don't see any downside for now. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
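If PROC_XMIN_FLAGS were widened along those lines, it might look something like the sketch below (hypothetical only; no such definition was settled in this thread, and the exact flag set would need checking against proc.h):

    /*
     * Hypothetical sketch: fold in every statusFlags bit discussed above that
     * changes how a process' xmin is interpreted, not just the two flags the
     * posted patch copies.
     */
    #define PROC_XMIN_FLAGS (PROC_IN_VACUUM | PROC_IN_SAFE_IC | \
                             PROC_IN_LOGICAL_DECODING | PROC_AFFECTS_ALL_HORIZONS)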
Re: Intermittent buildfarm failures on wrasse
On 2022-Apr-20, Masahiko Sawada wrote: > > MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) | > > (proc->statusFlags & PROC_XMIN_FLAGS); > > > > Perhaps the latter is more future-proof. > Copying only xmin-related flags in this way also makes sense to me and > there is no problem at least for now. A note would be that when we > introduce a new flag that needs to be copied in the future, we need to > make sure to add it to PROC_XMIN_FLAGS so it is copied. Otherwise a > similar issue we fixed by 0f0cfb494004befb0f6e could happen again. OK, done this way -- patch attached. Reading the comment I wrote about it, I wonder if flags PROC_AFFECTS_ALL_HORIZONS and PROC_IN_LOGICAL_DECODING should also be included. I think the only reason we don't care at this point is that walsenders (logical or otherwise) do not engage in snapshot copying. But if we were to implement usage of parallel workers sharing a common snapshot to do table sync in parallel, then it ISTM it would be important to copy at least the latter. Not sure there are any cases were we might care about the former. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Every machine is a smoke machine if you operate it wrong enough." https://twitter.com/libseybieda/status/1541673325781196801 >From 95bd3bf62992987e1d6e078520ff76133248579e Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Sat, 14 May 2022 16:51:23 +0200 Subject: [PATCH v2] Repurpose PROC_COPYABLE_FLAGS as PROC_XMIN_FLAGS This is a slight, convenient semantics change from what commit 0f0cfb494004 introduced that lets us simplify the coding in the one place where it is used. --- src/backend/storage/ipc/procarray.c | 17 +++-- src/include/storage/proc.h | 7 +++ 2 files changed, 10 insertions(+), 14 deletions(-) diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index ca22336e35..cd58c5faf0 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -2685,17 +2685,14 @@ ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc) TransactionIdIsNormal(xid) && TransactionIdPrecedesOrEquals(xid, xmin)) { - /* Install xmin */ + /* + * Install xmin and propagate the statusFlags that affect how the + * value is interpreted by vacuum. + */ MyProc->xmin = TransactionXmin = xmin; - - /* walsender cheats by passing proc == MyProc, don't check its flags */ - if (proc != MyProc) - { - /* Flags being copied must be valid copy-able flags. */ - Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); - MyProc->statusFlags = proc->statusFlags; - ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags; - } + MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) | + (proc->statusFlags & PROC_XMIN_FLAGS); + ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags; result = true; } diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index 15be21c00a..2579e619eb 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -69,11 +69,10 @@ struct XidCache (PROC_IN_VACUUM | PROC_IN_SAFE_IC | PROC_VACUUM_FOR_WRAPAROUND) /* - * Flags that are valid to copy from another proc, the parallel leader - * process in practice. Currently, flags that are set during parallel - * vacuum and parallel index creation are allowed. + * Xmin-related flags. Make sure any flags that affect how the process' Xmin + * value is interpreted by VACUUM are included here. 
*/ -#define PROC_COPYABLE_FLAGS (PROC_IN_VACUUM | PROC_IN_SAFE_IC) +#define PROC_XMIN_FLAGS (PROC_IN_VACUUM | PROC_IN_SAFE_IC) /* * We allow a small number of "weak" relation locks (AccessShareLock, -- 2.30.2
Re: Intermittent buildfarm failures on wrasse
On Wed, Apr 20, 2022 at 3:29 AM Tom Lane wrote: > > Alvaro Herrera writes: > > On 2022-Apr-15, Tom Lane wrote: > >> Here's a WIP patch for that. The only exciting thing in it is that > >> because of some undocumented cowboy programming in walsender.c, the > >> Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); > >> in ProcArrayInstallRestoredXmin fires unless we skip that. > > > Hmm, maybe a better use of that define is to use to select which flags > > to copy, rather than to ensure we they are the only ones set. What > > about this? > > Yeah, I thought about that too, but figured that the author probably > had a reason for writing the assertion the way it was. The motivation behind the assertion was that when we copy whole statusFlags from the leader process to the worker process we want to make sure that the flags we're copying is a known subset of the flags that are valid to copy from the leader. > If we want > to redefine PROC_COPYABLE_FLAGS as "flags associated with xmin", > that's fine by me. But I'd suggest that both the name of the macro > and the comment for it in proc.h should be revised to match that > definition. > > Another point is that as you've coded it, the code doesn't so much > copy those flags as union them with whatever the recipient had, > which seems wrong. I could go with either > > Assert(!(MyProc->statusFlags & PROC_XMIN_FLAGS)); > MyProc->statusFlags |= (proc->statusFlags & PROC_XMIN_FLAGS); > > or > > MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) | > (proc->statusFlags & PROC_XMIN_FLAGS); > > Perhaps the latter is more future-proof. Copying only xmin-related flags in this way also makes sense to me and there is no problem at least for now. A note would be that when we introduce a new flag that needs to be copied in the future, we need to make sure to add it to PROC_XMIN_FLAGS so it is copied. Otherwise a similar issue we fixed by 0f0cfb494004befb0f6e could happen again. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Intermittent buildfarm failures on wrasse
Alvaro Herrera writes: > On 2022-Apr-15, Tom Lane wrote: >> Here's a WIP patch for that. The only exciting thing in it is that >> because of some undocumented cowboy programming in walsender.c, the >> Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); >> in ProcArrayInstallRestoredXmin fires unless we skip that. > Hmm, maybe a better use of that define is to use to select which flags > to copy, rather than to ensure we they are the only ones set. What > about this? Yeah, I thought about that too, but figured that the author probably had a reason for writing the assertion the way it was. If we want to redefine PROC_COPYABLE_FLAGS as "flags associated with xmin", that's fine by me. But I'd suggest that both the name of the macro and the comment for it in proc.h should be revised to match that definition. Another point is that as you've coded it, the code doesn't so much copy those flags as union them with whatever the recipient had, which seems wrong. I could go with either Assert(!(MyProc->statusFlags & PROC_XMIN_FLAGS)); MyProc->statusFlags |= (proc->statusFlags & PROC_XMIN_FLAGS); or MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) | (proc->statusFlags & PROC_XMIN_FLAGS); Perhaps the latter is more future-proof. regards, tom lane
Re: Intermittent buildfarm failures on wrasse
On 2022-Apr-15, Tom Lane wrote: > Here's a WIP patch for that. The only exciting thing in it is that > because of some undocumented cowboy programming in walsender.c, the > Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); > in ProcArrayInstallRestoredXmin fires unless we skip that. Hmm, maybe a better use of that define is to use it to select which flags to copy, rather than to ensure they are the only ones set. What about this? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "What do the years matter? What really matters is realizing that, in the end, the best age in life is being alive" (Mafalda) diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index 25c310f675..4347941568 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -2685,17 +2685,14 @@ ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc) TransactionIdIsNormal(xid) && TransactionIdPrecedesOrEquals(xid, xmin)) { - /* Install xmin */ + /* + * Install xmin. In addition, propagate statusFlags that affect how + * the value is interpreted by vacuum. + */ MyProc->xmin = TransactionXmin = xmin; - /* walsender cheats by passing proc == MyProc, don't check its flags */ - if (proc != MyProc) - { - /* Flags being copied must be valid copy-able flags. */ - Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); - MyProc->statusFlags = proc->statusFlags; - ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags; - } + MyProc->statusFlags |= (proc->statusFlags & PROC_COPYABLE_FLAGS); + ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags; result = true; }
Re: Intermittent buildfarm failures on wrasse
On Sun, Apr 17, 2022 at 7:36 AM Andres Freund wrote: > > Part of the problem here is that we determine VACUUM's FreezeLimit by > > calculating `OldestXmin - vacuum_freeze_min_age` (more or less [1]). > > What the message outputs is OldestXmin and not FreezeLimit though. My higher level point is that there is a general tendency to assume that OldestXmin is the same thing as NextXID, which it isn't. It's an easy enough mistake to make, though, in part because they're usually quite close together. The "Routine Vacuuming" docs seem to suggest that they're the same thing, or at least that's what I take away from the following sentence: "This implies that if a table is not otherwise vacuumed, autovacuum will be invoked on it approximately once every autovacuum_freeze_max_age minus vacuum_freeze_min_age transactions". > And FreezeLimit doesn't affect "dead but not yet removable". But OldestXmin affects FreezeLimit. Anyway, I'm not opposed to showing the age at the start as well. But from the point of view of issues like this tenk1 issue, it would be more useful to just report on new_rel_allvisible. It would also be more useful to users. -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Hi, On 2022-04-15 11:12:34 -0700, Peter Geoghegan wrote: > On Fri, Apr 15, 2022 at 10:43 AM Andres Freund wrote: > > I think it'd be interesting - particularly for large relations or when > > looking to adjust autovac cost limits. > > > Something like: > > removable cutoff: %u, age at start: %u, age at end: %u... > > Part of the problem here is that we determine VACUUM's FreezeLimit by > calculating `OldestXmin - vacuum_freeze_min_age` (more or less [1]). What the message outputs is OldestXmin and not FreezeLimit though. And FreezeLimit doesn't affect "dead but not yet removable". IOW, the following might be right, but that seems independent of improving the output of diff = (int32) (ReadNextTransactionId() - OldestXmin); appendStringInfo(&buf, _("removable cutoff: %u, which was %d XIDs old when operation ended\n"), OldestXmin, diff); > Why should we do less freezing due to the presence of an old snapshot? > Sure, that has to happen with those XIDs that are fundamentally > ineligible for freezing due to the presence of the old snapshot -- but > what about those XIDs that *are* eligible, and still don't get frozen > at first? > > We should determine FreezeLimit by calculating `NextXID - > vacuum_freeze_min_age ` instead (and then clamp, to make sure that > it's always <= OldestXmin). That approach would make our final > FreezeLimit "strictly age-based". > > [1] We do something a bit like this when OldestXmin is already very > old -- then FreezeLimit is the same value as OldestXmin (see WARNING > from vacuum_set_xid_limits() function). That's better than nothing, > but doesn't change the fact that our general approach to calculating > FreezeLimit makes little sense. Greetings, Andres Freund
Re: Intermittent buildfarm failures on wrasse
Andres Freund writes: > On April 15, 2022 2:14:47 PM EDT, Tom Lane wrote: >> I could use some help filling in the XXX comments, because it's far >> from clear to me *why* walsenders need this to happen. > If you want to commit before: The reason is that walsenders use their xmin > to represent the xmin of standbys when using hot_standby_feedback. Since > we're only transmitting global horizons up from standbys, it has to influence > globally (and it would be hard to represent per db horizons anyway). Got it. I rewrote the comments and pushed. Noah, it should be safe to turn wrasse back on. regards, tom lane
Re: Intermittent buildfarm failures on wrasse
Hi, On April 15, 2022 2:14:47 PM EDT, Tom Lane wrote: >Andres Freund writes: >> On 2022-04-15 12:36:52 -0400, Tom Lane wrote: >>> Yeah, I was also thinking about a flag in PGPROC being a more reliable >>> way to do this. Is there anything besides walsenders that should set >>> that flag? > >> Not that I can think of. It's only because of hs_feedback that we need >> to. I guess it's possible that somebody build some extension that needs >> something similar, but then they'd need to set that flag... > >Here's a WIP patch for that. The only exciting thing in it is that >because of some undocumented cowboy programming in walsender.c, the > Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); >in ProcArrayInstallRestoredXmin fires unless we skip that. > >I could use some help filling in the XXX comments, because it's far >from clear to me *why* walsenders need this to happen. I'm out for the rest of the day due to family events (visiting my girlfriend's parents till Wednesday), I can take a stab at formulating something after. If you want to commit before: The reason is that walsenders use their xmin to represent the xmin of standbys when using hot_standby_feedback. Since we're only transmitting global horizons up from standbys, it has to influence globally (and it would be hard to represent per db horizons anyway). Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Intermittent buildfarm failures on wrasse
Andres Freund writes: > On 2022-04-15 12:36:52 -0400, Tom Lane wrote: >> Yeah, I was also thinking about a flag in PGPROC being a more reliable >> way to do this. Is there anything besides walsenders that should set >> that flag? > Not that I can think of. It's only because of hs_feedback that we need > to. I guess it's possible that somebody build some extension that needs > something similar, but then they'd need to set that flag... Here's a WIP patch for that. The only exciting thing in it is that because of some undocumented cowboy programming in walsender.c, the Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); in ProcArrayInstallRestoredXmin fires unless we skip that. I could use some help filling in the XXX comments, because it's far from clear to me *why* walsenders need this to happen. regards, tom lane diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index be40261393..a04a4a8e5d 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -285,6 +285,15 @@ InitWalSender(void) MarkPostmasterChildWalSender(); SendPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE); + /* + * Also, mark ourselves in PGPROC as affecting vacuum horizons in all + * databases. XXX explain why + */ + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); + MyProc->statusFlags |= PROC_AFFECTS_ALL_HORIZONS; + ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags; + LWLockRelease(ProcArrayLock); + /* Initialize empty timestamp buffer for lag tracking. */ lag_tracker = MemoryContextAllocZero(TopMemoryContext, sizeof(LagTracker)); } diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index f6e98aae29..8b867bd56c 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -1821,10 +1821,12 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) * to prune still needed data away). If the current backend never * connects to a database that is harmless, because * data_oldest_nonremovable will never be utilized. + * + * XXX explain PROC_AFFECTS_ALL_HORIZONS, too */ if (in_recovery || MyDatabaseId == InvalidOid || proc->databaseId == MyDatabaseId || - proc->databaseId == 0) /* always include WalSender */ + (statusFlags & PROC_AFFECTS_ALL_HORIZONS)) { /* * We can ignore this backend if it's running CREATE INDEX @@ -2681,10 +2683,14 @@ ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc) /* Install xmin */ MyProc->xmin = TransactionXmin = xmin; - /* Flags being copied must be valid copy-able flags. */ - Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); - MyProc->statusFlags = proc->statusFlags; - ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags; + /* walsender cheats by passing proc == MyProc, don't check flags */ + if (proc != MyProc) + { + /* Flags being copied must be valid copy-able flags. 
*/ + Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); + MyProc->statusFlags = proc->statusFlags; + ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags; + } result = true; } diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index 809b74db5e..e132665938 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -60,6 +60,9 @@ struct XidCache #define PROC_VACUUM_FOR_WRAPAROUND 0x08 /* set by autovac only */ #define PROC_IN_LOGICAL_DECODING 0x10 /* currently doing logical * decoding outside xact */ +#define PROC_AFFECTS_ALL_HORIZONS 0x20 /* proc's xmin must be + * included in vacuum horizons + * in all databases */ /* flags reset at EOXact */ #define PROC_VACUUM_STATE_MASK \
Re: Intermittent buildfarm failures on wrasse
On Fri, Apr 15, 2022 at 10:43 AM Andres Freund wrote: > I think it'd be interesting - particularly for large relations or when > looking to adjust autovac cost limits. > Something like: > removable cutoff: %u, age at start: %u, age at end: %u... Part of the problem here is that we determine VACUUM's FreezeLimit by calculating `OldestXmin - vacuum_freeze_min_age` (more or less [1]). Why should we do less freezing due to the presence of an old snapshot? Sure, that has to happen with those XIDs that are fundamentally ineligible for freezing due to the presence of the old snapshot -- but what about those XIDs that *are* eligible, and still don't get frozen at first? We should determine FreezeLimit by calculating `NextXID - vacuum_freeze_min_age ` instead (and then clamp, to make sure that it's always <= OldestXmin). That approach would make our final FreezeLimit "strictly age-based". [1] We do something a bit like this when OldestXmin is already very old -- then FreezeLimit is the same value as OldestXmin (see WARNING from vacuum_set_xid_limits() function). That's better than nothing, but doesn't change the fact that our general approach to calculating FreezeLimit makes little sense. -- Peter Geoghegan
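A minimal sketch of the strictly age-based FreezeLimit described above (an illustration only, not the code in vacuum_set_xid_limits(); wraparound handling is simplified):

    /*
     * Sketch: derive FreezeLimit from the next XID rather than from
     * OldestXmin, then clamp so it never exceeds OldestXmin.
     */
    TransactionId nextxid = ReadNextTransactionId();
    TransactionId freeze_limit = nextxid - vacuum_freeze_min_age;

    if (!TransactionIdIsNormal(freeze_limit))
        freeze_limit = FirstNormalTransactionId;
    if (TransactionIdPrecedes(OldestXmin, freeze_limit))
        freeze_limit = OldestXmin;      /* keep FreezeLimit <= OldestXmin */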
Re: Intermittent buildfarm failures on wrasse
Hi, On 2022-04-15 10:23:56 -0700, Peter Geoghegan wrote: > On Fri, Apr 15, 2022 at 10:15 AM Tom Lane wrote: > > > As well as the age of OldestXmin at the start of VACUUM. > > > > Is it worth capturing and logging both of those numbers? Why is > > the age at the end more interesting than the age at the start? > > As Andres said, that's often more interesting because most of the time > OldestXmin is not held back by much (not enough to matter). I think it'd be interesting - particularly for large relations or when looking to adjust autovac cost limits. It's not rare for autovac to take long enough that another autovac is necessary immediately again. Also helps to interpret the "dead but not yet removable" counts. Something like: removable cutoff: %u, age at start: %u, age at end: %u... Greetings, Andres Freund
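A rough sketch of what such a message could look like in the vacuum log output code (assumes a hypothetical NextXIDAtStart value captured via ReadNextTransactionId() when the operation began; not a committed change):

    /*
     * Sketch only: report the age of OldestXmin both at the start and at the
     * end of the operation.  NextXIDAtStart is a hypothetical value saved at
     * the beginning of vacuum.
     */
    appendStringInfo(&buf,
                     _("removable cutoff: %u, age at start: %d, age at end: %d\n"),
                     OldestXmin,
                     (int32) (NextXIDAtStart - OldestXmin),
                     (int32) (ReadNextTransactionId() - OldestXmin));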
Re: Intermittent buildfarm failures on wrasse
On Fri, Apr 15, 2022 at 10:15 AM Tom Lane wrote: > > As well as the age of OldestXmin at the start of VACUUM. > > Is it worth capturing and logging both of those numbers? Why is > the age at the end more interesting than the age at the start? As Andres said, that's often more interesting because most of the time OldestXmin is not held back by much (not enough to matter). Users will often look at the output of successive related VACUUM operations. Often the way things change over time is much more interesting than the details at any particular point in time. Especially in the kinds of extreme cases I'm thinking about. -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Peter Geoghegan writes: > On Fri, Apr 15, 2022 at 10:05 AM Andres Freund wrote: >> The other shows >> the age of OldestXmin at the end of the vacuum. Which is influenced by >> what's currently running. > As well as the age of OldestXmin at the start of VACUUM. Is it worth capturing and logging both of those numbers? Why is the age at the end more interesting than the age at the start? regards, tom lane
Re: Intermittent buildfarm failures on wrasse
On Fri, Apr 15, 2022 at 10:05 AM Andres Freund wrote: > I don't think they're actually that comparable. One shows how much > relfrozenxid advanced, to a large degree influenced by the time between > aggressive (or "unintentionally aggressive") vacuums. It matters more in the extreme cases. The most recent possible value for our new relfrozenxid is OldestXmin/removable cutoff. So when something holds back OldestXmin, it also holds back new relfrozenxid values. > The other shows > the age of OldestXmin at the end of the vacuum. Which is influenced by > what's currently running. As well as the age of OldestXmin at the start of VACUUM. -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Hi, On 2022-04-15 09:29:20 -0700, Peter Geoghegan wrote: > On Fri, Apr 15, 2022 at 8:14 AM Tom Lane wrote: > > BTW, before I forget: the wording of this log message is just awful. > > On first sight, I thought that it meant that we'd computed OldestXmin > > a second time and discovered that it advanced by 26 xids while the VACUUM > > was running. > > > "removable cutoff: %u, which was %d xids old when operation ended\n" > > How the output appears when placed right before the output describing > how VACUUM advanced relfrozenxid is an important consideration. I want > the format and wording that we use to imply a relationship between > these two things. Right now, that other line looks like this: > > "new relfrozenxid: %u, which is %d xids ahead of previous value\n" > > Do you think that this juxtaposition works well? I don't think they're actually that comparable. One shows how much relfrozenxid advanced, to a large degree influenced by the time between aggressive (or "unintentionally aggressive") vacuums. The other shows the age of OldestXmin at the end of the vacuum. Which is influenced by what's currently running. Greetings, Andres Freund
Re: Intermittent buildfarm failures on wrasse
I wrote: > Um, this is the logical replication launcher, not the autovac launcher. > Your observation that a sleep in get_database_list() reproduces it > confirms that, and I don't entirely see why the timing of the LR launcher > would have changed. Oh, to clarify: I misread "get_database_list()" as "get_subscription_list()", which is the part of the LR launcher startup that causes the problem for me. So what we actually have confirmed is that BOTH of those launchers are problematic for this. And AFAICS neither of them needs to be causing horizon adjustments for non-shared tables. (It's possible that the AV launcher is responsible in some of wrasse's reports, but it's been the LR launcher in five out of five sufficiently-instrumented failures for me.) regards, tom lane
Re: Intermittent buildfarm failures on wrasse
Hi, (Sent again, somehow my editor started to sometimes screw up mail headers, and ate the From:, sorry for the duplicate) On 2022-04-15 12:36:52 -0400, Tom Lane wrote: > Andres Freund writes: > > On April 15, 2022 11:23:40 AM EDT, Tom Lane wrote: > >> The something is the logical replication launcher. In the failing runs, > >> it is advertising xmin = 724 (the post-initdb NextXID) and continues to > >> do so well past the point where tenk1 gets vacuumed. > > > That explains it. Before shmstat autovac needed to wait for the stats > > collector to write out stats. Now it's near instantaneous. So the issue > > probably existed before, just unlikely to ever be reached. > > Um, this is the logical replication launcher, not the autovac > launcher. Short term confusion... > Your observation that a sleep in get_database_list() reproduces it > confirms that I don't understand what you mean here? get_database_list() is autovac launcher code? So being able to reproduce the problem by putting in a sleep there doesn't seem like a confirm anything about the logical rep launcher? > , and I don't entirely see why the timing of the LR launcher > would have changed. Could still be related to the autovac launcher not requesting / pgstats not writing / launcher not reading the stats file(s). That obviously is going to have some scheduler impact. > > We can't just ignore database less xmins for non-shared rels, because > > walsender propagates hot_standby_feedback that way. But we can probably add > > a flag somewhere indicating whether a database less PGPROC has to be > > accounted in the horizon for non-shared rels. > > Yeah, I was also thinking about a flag in PGPROC being a more reliable > way to do this. Is there anything besides walsenders that should set > that flag? Not that I can think of. It's only because of hs_feedback that we need to. I guess it's possible that somebody build some extension that needs something similar, but then they'd need to set that flag... Greetings, Andres Freund
Re: Intermittent buildfarm failures on wrasse
On Fri, Apr 15, 2022 at 9:40 AM Tom Lane wrote: > > Do you think that this juxtaposition works well? > > Seems all right to me; do you have a better suggestion? No. At first I thought that mixing "which is" and "which was" wasn't quite right. I changed my mind, though. Your new wording is fine. I'll update the log output some time today. -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Peter Geoghegan writes: > On Fri, Apr 15, 2022 at 8:14 AM Tom Lane wrote: >> BTW, before I forget: the wording of this log message is just awful. >> [ so how about ] >> "removable cutoff: %u, which was %d xids old when operation ended\n" > How the output appears when placed right before the output describing > how VACUUM advanced relfrozenxid is an important consideration. I want > the format and wording that we use to imply a relationship between > these two things. Right now, that other line looks like this: > "new relfrozenxid: %u, which is %d xids ahead of previous value\n" > Do you think that this juxtaposition works well? Seems all right to me; do you have a better suggestion? regards, tom lane
Re: Intermittent buildfarm failures on wrasse
Andres Freund writes: > On April 15, 2022 11:23:40 AM EDT, Tom Lane wrote: >> The something is the logical replication launcher. In the failing runs, >> it is advertising xmin = 724 (the post-initdb NextXID) and continues to >> do so well past the point where tenk1 gets vacuumed. > That explains it. Before shmstat autovac needed to wait for the stats > collector to write out stats. Now it's near instantaneous. So the issue > probably existed before, just unlikely to ever be reached. Um, this is the logical replication launcher, not the autovac launcher. Your observation that a sleep in get_database_list() reproduces it confirms that, and I don't entirely see why the timing of the LR launcher would have changed. (On thinking about it, I suppose the AV launcher might trigger this too, but that is not the PID I saw in testing.) > We can't just ignore database less xmins for non-shared rels, because > walsender propagates hot_standby_feedback that way. But we can probably add a > flag somewhere indicating whether a database less PGPROC has to be accounted > in the horizon for non-shared rels. Yeah, I was also thinking about a flag in PGPROC being a more reliable way to do this. Is there anything besides walsenders that should set that flag? regards, tom lane
Re: Intermittent buildfarm failures on wrasse
On Fri, Apr 15, 2022 at 8:14 AM Tom Lane wrote: > BTW, before I forget: the wording of this log message is just awful. > On first sight, I thought that it meant that we'd computed OldestXmin > a second time and discovered that it advanced by 26 xids while the VACUUM > was running. > "removable cutoff: %u, which was %d xids old when operation ended\n" How the output appears when placed right before the output describing how VACUUM advanced relfrozenxid is an important consideration. I want the format and wording that we use to imply a relationship between these two things. Right now, that other line looks like this: "new relfrozenxid: %u, which is %d xids ahead of previous value\n" Do you think that this juxtaposition works well? > Also, is it really our practice to spell XID in lower-case in > user-facing messages? There are examples of both. This could easily be changed to "XIDs". -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Hi, On April 15, 2022 11:23:40 AM EDT, Tom Lane wrote: >I wrote: >> So there's no longer any doubt that something is holding back OldestXmin. >> I will go put some instrumentation into the code that's computing that. > >The something is the logical replication launcher. In the failing runs, >it is advertising xmin = 724 (the post-initdb NextXID) and continues to >do so well past the point where tenk1 gets vacuumed. > >Discuss. That explains it. Before shmstat autovac needed to wait for the stats collector to write out stats. Now it's near instantaneous. So the issue probably existed before, just unlikely to ever be reached. We can't just ignore database less xmins for non-shared rels, because walsender propagates hot_standby_feedback that way. But we can probably add a flag somewhere indicating whether a database less PGPROC has to be accounted in the horizon for non-shared rels. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Intermittent buildfarm failures on wrasse
Andres Freund writes: > On 2022-04-15 10:15:32 -0400, Tom Lane wrote: >> removable cutoff: 724, older by 26 xids when operation ended > The horizon advancing by 26 xids during tenk1's vacuum seems like quite > a bit, given there's no normal concurrent activity during test_setup. Hah, so you were taken in by this wording too. See my complaint about it downthread. regards, tom lane
Re: Intermittent buildfarm failures on wrasse
Andres Freund writes: > Off for a bit, but I realized that we likely don't exclude the launcher > because it's not database associated... Yeah. I think this bit in ComputeXidHorizons needs rethinking: /* * Normally queries in other databases are ignored for anything but * the shared horizon. ... */ if (in_recovery || MyDatabaseId == InvalidOid || proc->databaseId == MyDatabaseId || proc->databaseId == 0)/* always include WalSender */ { The "proc->databaseId == 0" business apparently means to include only walsender processes, and it's broken because that condition doesn't include only walsender processes. At this point we have the following conclusions: 1. A slow transaction in the launcher's initial get_database_list() call fully explains these failures. (I had been thinking that the launcher's xact would have to persist as far as the create_index script, but that's not so: it only has to last until test_setup begins vacuuming tenk1. The CREATE INDEX steps are not doing any visibility map changes of their own, but what they are doing is updating relallvisible from the results of visibilitymap_count(). That's why they undid the effects of manually poking relallvisible, without actually inserting any data better than what the initial VACUUM computed.) 2. We can probably explain why only wrasse sees this as some quirk of the Solaris scheduler. I'm satisfied to blame it-happens-in- installcheck-but-not-check on that too. 3. It remains unclear why we suddenly started seeing this last week. I suppose it has to be a side-effect of the pgstats changes, but the mechanism is obscure. Probably not worth the effort to pin down exactly why. As for fixing it, what I think would be the preferable answer is to fix the above-quoted logic so that it indeed includes only walsenders and not random other background workers. (Why does it need to include walsenders, anyway? The commentary sucks.) Alternatively, or perhaps also, we could do what was discussed previously and make a hack to allow delaying vacuum until the system is quiescent. regards, tom lane
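For reference, the shape such a fix could take in ComputeXidHorizons is sketched below, using the PROC_AFFECTS_ALL_HORIZONS bit from the WIP patch elsewhere in this thread in place of the databaseId == 0 test (a sketch of the idea, not the exact committed hunk):

    /*
     * Sketch: only backends that explicitly advertise it (walsenders setting
     * a dedicated statusFlags bit) are counted in every database's horizon,
     * instead of every database-less background worker.
     */
    if (in_recovery ||
        MyDatabaseId == InvalidOid ||
        proc->databaseId == MyDatabaseId ||
        (statusFlags & PROC_AFFECTS_ALL_HORIZONS))
    {
        /* ... account for this proc's xmin in the per-database horizons ... */
    }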
Re: Intermittent buildfarm failures on wrasse
I wrote: > So there's no longer any doubt that something is holding back OldestXmin. > I will go put some instrumentation into the code that's computing that. The something is the logical replication launcher. In the failing runs, it is advertising xmin = 724 (the post-initdb NextXID) and continues to do so well past the point where tenk1 gets vacuumed. Discuss. regards, tom lane
Re: Intermittent buildfarm failures on wrasse
Hi, On April 15, 2022 11:12:10 AM EDT, Andres Freund wrote: >Hi, > >On 2022-04-15 10:15:32 -0400, Tom Lane wrote: >> The morning's first result is that during a failing run, >> the vacuum in test_setup sees >> >> 2022-04-15 16:01:43.064 CEST [4436:75] pg_regress/test_setup LOG: >> statement: VACUUM ANALYZE tenk1; >> 2022-04-15 16:01:43.064 CEST [4436:76] pg_regress/test_setup LOG: vacuuming >> "regression.public.tenk1" >> 2022-04-15 16:01:43.064 CEST [4436:77] pg_regress/test_setup STATEMENT: >> VACUUM ANALYZE tenk1; >> 2022-04-15 16:01:43.071 CEST [4436:78] pg_regress/test_setup LOG: finished >> vacuuming "regression.public.tenk1": index scans: 0 >> pages: 0 removed, 345 remain, 345 scanned (100.00% of total) >> tuples: 0 removed, 1 remain, 0 are dead but not yet removable >> removable cutoff: 724, older by 26 xids when operation ended >> index scan not needed: 0 pages from table (0.00% of total) had 0 >> dead item identifiers removed >> avg read rate: 2.189 MB/s, avg write rate: 2.189 MB/s >> buffer usage: 695 hits, 2 misses, 2 dirtied >> WAL usage: 1 records, 0 full page images, 188 bytes >> system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s >> 2022-04-15 16:01:43.071 CEST [4436:79] pg_regress/test_setup STATEMENT: >> VACUUM ANALYZE tenk1; > >The horizon advancing by 26 xids during tenk1's vacuum seems like quite >a bit, given there's no normal concurrent activity during test_setup. > > >> In fact, right after initdb pg_controldata shows >> Latest checkpoint's NextXID: 0:724 >> Latest checkpoint's oldestXID:716 > >So that's the xmin that e.g. the autovac launcher ends up with during >start... > >If I make get_database_list() sleep for 5s within the scan, I can >reproduce on x86-64. Off for a bit, but I realized that we likely don't exclude the launcher because it's not database associated... -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Intermittent buildfarm failures on wrasse
I wrote: > the vacuum in test_setup sees > ... > removable cutoff: 724, older by 26 xids when operation ended > ... BTW, before I forget: the wording of this log message is just awful. On first sight, I thought that it meant that we'd computed OldestXmin a second time and discovered that it advanced by 26 xids while the VACUUM was running. Looking at the code, I see that's not so: diff = (int32) (ReadNextTransactionId() - OldestXmin); appendStringInfo(&buf, _("removable cutoff: %u, older by %d xids when operation ended\n"), OldestXmin, diff); but good luck understanding what it actually means from the message text alone. I think more appropriate wording would be something like "removable cutoff: %u, which was %d xids old when operation ended\n" Also, is it really our practice to spell XID in lower-case in user-facing messages? Thoughts, better ideas? regards, tom lane
Re: Intermittent buildfarm failures on wrasse
Hi, On 2022-04-15 10:15:32 -0400, Tom Lane wrote: > The morning's first result is that during a failing run, > the vacuum in test_setup sees > > 2022-04-15 16:01:43.064 CEST [4436:75] pg_regress/test_setup LOG: statement: > VACUUM ANALYZE tenk1; > 2022-04-15 16:01:43.064 CEST [4436:76] pg_regress/test_setup LOG: vacuuming > "regression.public.tenk1" > 2022-04-15 16:01:43.064 CEST [4436:77] pg_regress/test_setup STATEMENT: > VACUUM ANALYZE tenk1; > 2022-04-15 16:01:43.071 CEST [4436:78] pg_regress/test_setup LOG: finished > vacuuming "regression.public.tenk1": index scans: 0 > pages: 0 removed, 345 remain, 345 scanned (100.00% of total) > tuples: 0 removed, 1 remain, 0 are dead but not yet removable > removable cutoff: 724, older by 26 xids when operation ended > index scan not needed: 0 pages from table (0.00% of total) had 0 dead > item identifiers removed > avg read rate: 2.189 MB/s, avg write rate: 2.189 MB/s > buffer usage: 695 hits, 2 misses, 2 dirtied > WAL usage: 1 records, 0 full page images, 188 bytes > system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s > 2022-04-15 16:01:43.071 CEST [4436:79] pg_regress/test_setup STATEMENT: > VACUUM ANALYZE tenk1; The horizon advancing by 26 xids during tenk1's vacuum seems like quite a bit, given there's no normal concurrent activity during test_setup. > In fact, right after initdb pg_controldata shows > Latest checkpoint's NextXID: 0:724 > Latest checkpoint's oldestXID:716 So that's the xmin that e.g. the autovac launcher ends up with during start... If I make get_database_list() sleep for 5s within the scan, I can reproduce on x86-64. Greetings, Andres Freund
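For anyone reproducing it the same way, the hack amounts to something like this inside get_database_list()'s scan loop in autovacuum.c (a throwaway debugging aid, not a proposed change):

    /*
     * Reproduction hack only: hold the launcher's startup transaction (and
     * therefore its advertised xmin) open long enough for test_setup's
     * VACUUM of tenk1 to run.
     */
    pg_usleep(5000000L);        /* sleep 5 seconds mid-scan */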
Re: Intermittent buildfarm failures on wrasse
The morning's first result is that during a failing run, the vacuum in test_setup sees 2022-04-15 16:01:43.064 CEST [4436:75] pg_regress/test_setup LOG: statement: VACUUM ANALYZE tenk1; 2022-04-15 16:01:43.064 CEST [4436:76] pg_regress/test_setup LOG: vacuuming "regression.public.tenk1" 2022-04-15 16:01:43.064 CEST [4436:77] pg_regress/test_setup STATEMENT: VACUUM ANALYZE tenk1; 2022-04-15 16:01:43.071 CEST [4436:78] pg_regress/test_setup LOG: finished vacuuming "regression.public.tenk1": index scans: 0 pages: 0 removed, 345 remain, 345 scanned (100.00% of total) tuples: 0 removed, 1 remain, 0 are dead but not yet removable removable cutoff: 724, older by 26 xids when operation ended index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed avg read rate: 2.189 MB/s, avg write rate: 2.189 MB/s buffer usage: 695 hits, 2 misses, 2 dirtied WAL usage: 1 records, 0 full page images, 188 bytes system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s 2022-04-15 16:01:43.071 CEST [4436:79] pg_regress/test_setup STATEMENT: VACUUM ANALYZE tenk1; OldestXmin = 724 is too old to consider tenk1's contents as all-visible: regression=# select distinct xmin from tenk1; xmin -- 749 (1 row) In fact, right after initdb pg_controldata shows Latest checkpoint's NextXID: 0:724 Latest checkpoint's oldestXID:716 So there's no longer any doubt that something is holding back OldestXmin. I will go put some instrumentation into the code that's computing that. regards, tom lane
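That lines up with the all-visible test VACUUM applies while scanning; roughly (a paraphrase of the check in lazy_scan_prune(), not a verbatim excerpt):

    /*
     * Paraphrase: a live tuple only lets its page remain all-visible if the
     * inserting XID precedes the removable cutoff.  With OldestXmin = 724 and
     * every tenk1 tuple at xmin = 749, this fails on every page, so nothing
     * gets marked all-visible and relallvisible stays at 0.
     */
    xmin = HeapTupleHeaderGetXmin(tuple.t_data);
    if (!TransactionIdPrecedes(xmin, vacrel->OldestXmin))
        all_visible = false;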
Re: Intermittent buildfarm failures on wrasse
On Thu, Apr 14, 2022 at 10:12:05PM -0700, Andres Freund wrote: > On 2022-04-14 19:45:15 -0700, Noah Misch wrote: > > I suspect the failure is somehow impossible in "check". Yesterday, I > > cranked > > up the number of locales, so there are now a lot more installcheck. Before > > that, each farm run had one "check" and two "installcheck". Those days saw > > ten installcheck failures, zero check failures. > > I notice that the buildfarm appears to run initdb with syncing enabled > ("syncing data to disk ... ok" in the initdb steps). Whereas pg_regress > uses --no-sync. Yep. > I wonder if that's what makes the difference? Now that you reproduced > it, does it still reproduce with --no-sync added? It does; the last version of my script used "initdb -N ...". > Also worth noting that pg_regress doesn't go through pg_ctl... Hmmm.
Re: Intermittent buildfarm failures on wrasse
Andres Freund writes: > On 2022-04-14 23:56:15 -0400, Tom Lane wrote: >> Well, damn. I changed my script that way and it failed on the tenth >> iteration (versus a couple hundred successful iterations the other >> way). > Just to make sure: This is also on wrasse? Right, gcc211 with a moderately close approximation to wrasse's build details. Why that shows the problem when we've not seen it elsewhere remains to be seen. > What DSM backend do we end up with on solaris? With shared memory stats > we're using DSM a lot earlier and more commonly than before. That ... is an interesting point. It seems to be just "posix" though. >> So somehow this is related to time-since-initdb, not >> time-since-postmaster-start. Any ideas? > Perhaps it makes a difference that we start with a "young" database xid > age wise? We've had bugs around subtracting xids and ending up on some > special one in the past. It does seem like it's got to be related to small XID and/or small LSN values. No clue right now, but more news tomorrow, I hope. regards, tom lane
Re: Intermittent buildfarm failures on wrasse
Hi, On 2022-04-14 19:45:15 -0700, Noah Misch wrote: > I suspect the failure is somehow impossible in "check". Yesterday, I cranked > up the number of locales, so there are now a lot more installcheck. Before > that, each farm run had one "check" and two "installcheck". Those days saw > ten installcheck failures, zero check failures. I notice that the buildfarm appears to run initdb with syncing enabled ("syncing data to disk ... ok" in the initdb steps). Whereas pg_regress uses --no-sync. I wonder if that's what makes the difference? Now that you reproduced it, does it still reproduce with --no-sync added? Also worth noting that pg_regress doesn't go through pg_ctl... Greetings, Andres Freund
Re: Intermittent buildfarm failures on wrasse
Hi, On 2022-04-14 23:56:15 -0400, Tom Lane wrote: > I wrote: > > One thing I'm eyeing now is that it looks like Noah is re-initdb'ing > > each time, whereas I'd just stopped and started the postmaster of > > an existing installation. That does not seem like it could matter > > but ... > > Well, damn. I changed my script that way and it failed on the tenth > iteration (versus a couple hundred successful iterations the other > way). Just to make sure: This is also on wrasse? What DSM backend do we end up with on solaris? With shared memory stats we're using DSM a lot earlier and more commonly than before. Another thing that might be worth trying is to enable checksums. I've caught weird bugs with that in the past. And it's possible that bgwriter writes out a page that we then read back in quickly after, or something like that. > So somehow this is related to time-since-initdb, not > time-since-postmaster-start. Any ideas? Perhaps it makes a difference that we start with a "young" database xid age wise? We've had bugs around subtracting xids and ending up on some special one in the past. Greetings, Andres Freund
Re: Intermittent buildfarm failures on wrasse
On Thu, Apr 14, 2022 at 11:56:15PM -0400, Tom Lane wrote: > Anyway, I'm too tired to do more tonight, but now that I can reproduce it > I will stick some debugging logic in tomorrow. I no longer think we > should clutter the git repo with any more short-term hacks. Sounds good. I've turned off the wrasse buildfarm client for the moment. It's less useful than your local setup, and they'd compete for resources.
Re: Intermittent buildfarm failures on wrasse
Hi, On 2022-04-14 22:40:51 -0400, Tom Lane wrote: > Peter Geoghegan writes: > > Suppose that the bug was actually in 06f5295af6, "Add single-item > > cache when looking at topmost XID of a subtrans XID". Doesn't that fit > > your timeline just as well? > > I'd dismissed that on the grounds that there are no subtrans XIDs > involved in tenk1's contents. However, if that patch was faulty > enough, maybe it affected other cases besides the advertised one? > I've not read it. I was planning to complain about that commit, fwiw. Without so much as an assertion verifying the cache is correct it seems quite dangerous to me. And looking at it, it has obvious wraparound issues... But that can't matter here, obviously. We also reach SubTransGetTopmostTransaction() from XactLockTableWait() but I don't quite see how we reach that here either... Greetings, Andres Freund
Re: Intermittent buildfarm failures on wrasse
I wrote: > One thing I'm eyeing now is that it looks like Noah is re-initdb'ing > each time, whereas I'd just stopped and started the postmaster of > an existing installation. That does not seem like it could matter > but ... Well, damn. I changed my script that way and it failed on the tenth iteration (versus a couple hundred successful iterations the other way). So somehow this is related to time-since-initdb, not time-since-postmaster-start. Any ideas? Anyway, I'm too tired to do more tonight, but now that I can reproduce it I will stick some debugging logic in tomorrow. I no longer think we should clutter the git repo with any more short-term hacks. regards, tom lane PS a bit later: I've not yet reproduced it a second time, so the failure rate is unfortunately a lot less than one-in-ten. Still, this eliminates the idea that there's some secret sauce in Noah's build details.
Re: Intermittent buildfarm failures on wrasse
On Thu, Apr 14, 2022 at 11:06:04PM -0400, Tom Lane wrote: > Noah Misch writes: > > But 24s after that email, it did reproduce the problem. > > Ain't that always the way. Quite so. > > Same symptoms as the > > last buildfarm runs, including visfrac=0. I'm attaching my script. (It has > > various references to my home directory, so it's not self-contained.) > > That's interesting, because I see you used installcheck-parallel, > which I'd not been using on the grounds that wrasse isn't parallelizing > these runs. That puts a big hole in my working assumption that the > problem is one of timing. With "make installcheck-tests TESTS='test_setup create_index'" it remains reproducible. The attached script reproduced it in 103s and then in 703s. > I'm now suspecting that the key issue is > something about how wrasse is building the executables that I did > not exactly reproduce. I did not try to copy the build details > involving stuff under your home directory (like the private openldap > version), mainly because it didn't seem like openldap or uuid or > perl could be involved at all. But maybe somehow? Can't rule it out entirely. I think I've now put world-read on all the directories referenced in the script, in the event you'd like to use them. repro-wrasse-vis.sh Description: Bourne shell script
Re: Intermittent buildfarm failures on wrasse
Peter Geoghegan writes: > Well, Noah is running wrasse with 'fsync = off'. And did so in the > script as well. As am I. I duplicated wrasse's config to the extent of cat >>$PGDATA/postgresql.conf <
Re: Intermittent buildfarm failures on wrasse
On Thu, Apr 14, 2022 at 7:54 PM Tom Lane wrote: > This is far from the first time that I've failed to reproduce a buildfarm > result manually, even on the very machine hosting the animal. I would > like to identify the cause(s) of that. One obvious theory is that the > environment under a cron job is different --- but the only thing I know > of that should be different is possibly nice'ing the job priorities. > I did try a fair number of test cycles under "nice" in this case. > Anybody have other ideas? Well, Noah is running wrasse with 'fsync = off'. And did so in the script as well. That seems like it definitely could matter. -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Noah Misch writes: > But 24s after that email, it did reproduce the problem. Ain't that always the way. > Same symptoms as the > last buildfarm runs, including visfrac=0. I'm attaching my script. (It has > various references to my home directory, so it's not self-contained.) That's interesting, because I see you used installcheck-parallel, which I'd not been using on the grounds that wrasse isn't parallelizing these runs. That puts a big hole in my working assumption that the problem is one of timing. I'm now suspecting that the key issue is something about how wrasse is building the executables that I did not exactly reproduce. I did not try to copy the build details involving stuff under your home directory (like the private openldap version), mainly because it didn't seem like openldap or uuid or perl could be involved at all. But maybe somehow? regards, tom lane
Re: Intermittent buildfarm failures on wrasse
Noah Misch writes: > Like Tom, I'm failing to reproduce this outside the buildfarm client. This is far from the first time that I've failed to reproduce a buildfarm result manually, even on the very machine hosting the animal. I would like to identify the cause(s) of that. One obvious theory is that the environment under a cron job is different --- but the only thing I know of that should be different is possibly nice'ing the job priorities. I did try a fair number of test cycles under "nice" in this case. Anybody have other ideas? regards, tom lane
Re: Intermittent buildfarm failures on wrasse
On Thu, Apr 14, 2022 at 07:45:15PM -0700, Noah Misch wrote: > On Thu, Apr 14, 2022 at 06:52:49PM -0700, Andres Freund wrote: > > On 2022-04-14 21:32:27 -0400, Tom Lane wrote: > > > Peter Geoghegan writes: > > > > Are you aware of Andres' commit 02fea8fd? That work prevented exactly > > > > the same set of symptoms (the same index-only scan create_index > > > > regressions), > > > > > > Hm. I'm starting to get the feeling that the real problem here is > > > we've "optimized" the system to the point where repeatable results > > > from VACUUM are impossible :-( > > > > The synchronous_commit issue is an old one. It might actually be worth > > addressing it by flushing out pending async commits out instead. It just > > started to be noticeable when tenk1 load and vacuum were moved closer. > > > > > > What do you think about applying a polished version of what I posted in > > https://postgr.es/m/20220414164830.63rk5zqsvtqqk7qz%40alap3.anarazel.de > > ? That'd tell us a bit more about the horizon etc. > > No objection. > > > It's also interesting that it only happens in the installcheck cases, > > afaics, not the check ones. Although that might just be because there's > > more of them... > > I suspect the failure is somehow impossible in "check". Yesterday, I cranked > up the number of locales, so there are now a lot more installcheck. Before > that, each farm run had one "check" and two "installcheck". Those days saw > ten installcheck failures, zero check failures. > > Like Tom, I'm failing to reproduce this outside the buildfarm client. I wrote > a shell script to closely resemble the buildfarm installcheck sequence, but > it's lasted a dozen runs without failing. But 24s after that email, it did reproduce the problem. Same symptoms as the last buildfarm runs, including visfrac=0. I'm attaching my script. (It has various references to my home directory, so it's not self-contained.) repro-wrasse-vis.sh Description: Bourne shell script
Re: Intermittent buildfarm failures on wrasse
On Thu, Apr 14, 2022 at 06:52:49PM -0700, Andres Freund wrote: > On 2022-04-14 21:32:27 -0400, Tom Lane wrote: > > Peter Geoghegan writes: > > > Are you aware of Andres' commit 02fea8fd? That work prevented exactly > > > the same set of symptoms (the same index-only scan create_index > > > regressions), > > > > Hm. I'm starting to get the feeling that the real problem here is > > we've "optimized" the system to the point where repeatable results > > from VACUUM are impossible :-( > > The synchronous_commit issue is an old one. It might actually be worth > addressing it by flushing out pending async commits out instead. It just > started to be noticeable when tenk1 load and vacuum were moved closer. > > > What do you think about applying a polished version of what I posted in > https://postgr.es/m/20220414164830.63rk5zqsvtqqk7qz%40alap3.anarazel.de > ? That'd tell us a bit more about the horizon etc. No objection. > It's also interesting that it only happens in the installcheck cases, > afaics, not the check ones. Although that might just be because there's > more of them... I suspect the failure is somehow impossible in "check". Yesterday, I cranked up the number of locales, so there are now a lot more installcheck. Before that, each farm run had one "check" and two "installcheck". Those days saw ten installcheck failures, zero check failures. Like Tom, I'm failing to reproduce this outside the buildfarm client. I wrote a shell script to closely resemble the buildfarm installcheck sequence, but it's lasted a dozen runs without failing.
Re: Intermittent buildfarm failures on wrasse
Peter Geoghegan writes: > Suppose that the bug was actually in 06f5295af6, "Add single-item > cache when looking at topmost XID of a subtrans XID". Doesn't that fit > your timeline just as well? I'd dismissed that on the grounds that there are no subtrans XIDs involved in tenk1's contents. However, if that patch was faulty enough, maybe it affected other cases besides the advertised one? I've not read it. regards, tom lane
Re: Intermittent buildfarm failures on wrasse
On Thu, Apr 14, 2022 at 7:20 PM Tom Lane wrote: > Oh! You mean that maybe the OldestXmin horizon was fine, but something > decided not to update hint bits (and therefore also not the all-visible > bit) anyway? Worth investigating I guess. Yes. That is starting to seem like a plausible alternative explanation. > > I'd really like to know what the removable cutoff > > looks like for these VACUUM operations, which is something like > > Andres' VACUUM VERBOSE debug patch should tell us. > > Yeah. I'd hoped to investigate this manually and not have to clutter > the main repo with debugging commits. Suppose that the bug was actually in 06f5295af6, "Add single-item cache when looking at topmost XID of a subtrans XID". Doesn't that fit your timeline just as well? I haven't really started to investigate that theory (just putting dinner on here). Just a wild guess at this point. -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Peter Geoghegan writes: > That was the intent, but that in itself doesn't mean that it isn't > something to do with setting hint bits (not the OldestXmin horizon > being held back). Oh! You mean that maybe the OldestXmin horizon was fine, but something decided not to update hint bits (and therefore also not the all-visible bit) anyway? Worth investigating I guess. > I'd really like to know what the removable cutoff > looks like for these VACUUM operations, which is something like > Andres' VACUUM VERBOSE debug patch should tell us. Yeah. I'd hoped to investigate this manually and not have to clutter the main repo with debugging commits. However, since I've still utterly failed to reproduce the problem on wrasse's host, I think we're going to be forced to do that. regards, tom lane
Re: Intermittent buildfarm failures on wrasse
On Thu, Apr 14, 2022 at 6:53 PM Tom Lane wrote: > That band-aid only addressed the situation of someone having turned > off synchronous_commit in the first place; which is not the case > on wrasse or most/all other buildfarm animals. Whatever we're > dealing with here is something independent of that. That was the intent, but that in itself doesn't mean that it isn't something to do with setting hint bits (not the OldestXmin horizon being held back). I'd really like to know what the removable cutoff looks like for these VACUUM operations, which is something like Andres' VACUUM VERBOSE debug patch should tell us. > > Is there any patch that could plausibly have had that effect, whose > > commit fits with our timeline for the problems seen on wrasse? > > I already enumerated my suspects, back at the top of this thread. Right, but I thought that the synchronous_commit thing was new information that made that worth revisiting. -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Peter Geoghegan writes: > Anyway, I suppose it's possible that problems reappeared here due to > some other patch. Something else could have broken Andres' earlier > band aid solution (which was to set synchronous_commit=on in > test_setup). That band-aid only addressed the situation of someone having turned off synchronous_commit in the first place; which is not the case on wrasse or most/all other buildfarm animals. Whatever we're dealing with here is something independent of that. > Is there any patch that could plausibly have had that effect, whose > commit fits with our timeline for the problems seen on wrasse? I already enumerated my suspects, back at the top of this thread. regards, tom lane
Re: Intermittent buildfarm failures on wrasse
Hi, On 2022-04-14 21:32:27 -0400, Tom Lane wrote: > Peter Geoghegan writes: > > Are you aware of Andres' commit 02fea8fd? That work prevented exactly > > the same set of symptoms (the same index-only scan create_index > > regressions), > > Hm. I'm starting to get the feeling that the real problem here is > we've "optimized" the system to the point where repeatable results > from VACUUM are impossible :-( The synchronous_commit issue is an old one. It might actually be worth addressing it by flushing out pending async commits instead. It just started to be noticeable when tenk1 load and vacuum were moved closer. What do you think about applying a polished version of what I posted in https://postgr.es/m/20220414164830.63rk5zqsvtqqk7qz%40alap3.anarazel.de ? That'd tell us a bit more about the horizon etc. It doesn't have to be the autovacuum launcher. I think it shouldn't even be taken into account - it's not database connected, and tenk1 isn't a shared relation. All very odd. It's also interesting that it only happens in the installcheck cases, afaics, not the check ones. Although that might just be because there's more of them... Greetings, Andres Freund
Re: Intermittent buildfarm failures on wrasse
On Thu, Apr 14, 2022 at 6:32 PM Tom Lane wrote: > Hm. I'm starting to get the feeling that the real problem here is > we've "optimized" the system to the point where repeatable results > from VACUUM are impossible :-( I don't think that there is any fundamental reason why VACUUM cannot have repeatable results. Anyway, I suppose it's possible that problems reappeared here due to some other patch. Something else could have broken Andres' earlier band aid solution (which was to set synchronous_commit=on in test_setup). Is there any patch that could plausibly have had that effect, whose commit fits with our timeline for the problems seen on wrasse? -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Peter Geoghegan writes: > Are you aware of Andres' commit 02fea8fd? That work prevented exactly > the same set of symptoms (the same index-only scan create_index > regressions), Hm. I'm starting to get the feeling that the real problem here is we've "optimized" the system to the point where repeatable results from VACUUM are impossible :-( regards, tom lane
Re: Intermittent buildfarm failures on wrasse
On Thu, Apr 14, 2022 at 3:28 PM Peter Geoghegan wrote: > A bunch of autovacuums that ran between "2022-04-14 22:49:16.274" and > "2022-04-14 22:49:19.088" all have the same "removable cutoff". Are you aware of Andres' commit 02fea8fd? That work prevented exactly the same set of symptoms (the same index-only scan create_index regressions), which was apparently necessary following the rearrangements to the regression tests to remove cross-script dependencies (Tom's commit cc50080a82). This was the thread that led to Andres' commit, which was just over a month ago: https://www.postgresql.org/message-id/flat/caj7c6tpjnof1q+vjsy3qebgbpgxdu2erpvykbdhd6_ckv5e...@mail.gmail.com -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
On Thu, Apr 14, 2022 at 3:23 PM Tom Lane wrote: > If we captured equivalent output from the manual VACUUM in test_setup, > maybe something could be learned. However, it seems virtually certain > to me that the problematic xmin is in some background process > (eg autovac launcher) and thus wouldn't show up in the postmaster log, > log_line_prefix or no. A bunch of autovacuums that ran between "2022-04-14 22:49:16.274" and "2022-04-14 22:49:19.088" all have the same "removable cutoff". The logs from this time show a period of around three seconds (likely more) where something held back OldestXmin generally. That does seem a bit fishy to me, even though it happened about a minute after the failure itself took place. -- Peter Geoghegan
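If something really was holding the horizon back during that window, it ought to show up as a process advertising an old xmin. A query along these lines, run while the problem window is open, would point at the culprit (a sketch only; nothing like this was actually captured from wrasse):

    SELECT pid, backend_type, state,
           backend_xid, backend_xmin, age(backend_xmin) AS xmin_age,
           left(query, 60) AS query
      FROM pg_stat_activity
     WHERE backend_xmin IS NOT NULL
     ORDER BY age(backend_xmin) DESC;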
Re: Intermittent buildfarm failures on wrasse
Peter Geoghegan writes: > Have you looked at the autovacuum log output in more detail? I don't think there's anything to be learned there. The first autovacuum in wrasse's log happens long after things went south: 2022-04-14 22:49:15.177 CEST [9427:1] LOG: automatic vacuum of table "regression.pg_catalog.pg_type": index scans: 1 pages: 0 removed, 49 remain, 49 scanned (100.00% of total) tuples: 539 removed, 1112 remain, 0 are dead but not yet removable removable cutoff: 8915, older by 1 xids when operation ended index scan needed: 34 pages from table (69.39% of total) had 1107 dead item identifiers removed index "pg_type_oid_index": pages: 14 in total, 0 newly deleted, 0 currently deleted, 0 reusable index "pg_type_typname_nsp_index": pages: 13 in total, 0 newly deleted, 0 currently deleted, 0 reusable avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s buffer usage: 193 hits, 0 misses, 0 dirtied WAL usage: 116 records, 0 full page images, 14113 bytes system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s If we captured equivalent output from the manual VACUUM in test_setup, maybe something could be learned. However, it seems virtually certain to me that the problematic xmin is in some background process (eg autovac launcher) and thus wouldn't show up in the postmaster log, log_line_prefix or no. regards, tom lane
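Getting the equivalent numbers for the manual vacuum is easy enough to do by hand; on this development branch, VACUUM VERBOSE should report the same kind of "removable cutoff" line as the autovacuum log entry above, assuming one runs it interactively against the regression database:

    VACUUM (VERBOSE, ANALYZE) tenk1;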
Re: Intermittent buildfarm failures on wrasse
On Thu, Apr 14, 2022 at 2:33 PM Tom Lane wrote: > Meanwhile, wrasse did fail with my relallvisible check in place [1], > and what that shows is that relallvisible is *zero* to start with > and remains so throughout the CREATE INDEX sequence. That pretty > definitively proves that it's not a page-skipping problem but > an xmin-horizon-too-old problem. We're no closer to understanding > where that horizon value is coming from, though. Have you looked at the autovacuum log output in more detail? It might be possible to debug further, but looks like there are no XIDs to work off of in the log_line_prefix that's in use on wrasse. The CITester log_line_prefix is pretty useful -- I wonder if we can standardize on that within the buildfarm, too. -- Peter Geoghegan
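For what it's worth, adding transaction information to the prefix is just a matter of including the %x (and perhaps %v) escapes; one possible setting, not wrasse's actual configuration:

    ALTER SYSTEM SET log_line_prefix = '%m [%p] %q%a %x/%v ';
    SELECT pg_reload_conf();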
Re: Intermittent buildfarm failures on wrasse
I wrote: > Andres Freund writes: >> Thanks! Can you repro the problem manually on wrasse, perhaps even >> outside the buildfarm script? > I'm working on that right now, actually... So far, reproducing it manually has been a miserable failure: I've run about 180 cycles of the core regression tests with no error. Not sure what's different between my test scenario and wrasse's. Meanwhile, wrasse did fail with my relallvisible check in place [1], and what that shows is that relallvisible is *zero* to start with and remains so throughout the CREATE INDEX sequence. That pretty definitively proves that it's not a page-skipping problem but an xmin-horizon-too-old problem. We're no closer to understanding where that horizon value is coming from, though. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse=2022-04-14%2019%3A28%3A12
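The visibility map can also be inspected directly with the pg_visibility extension, to confirm that no pages of tenk1 ever became all-visible rather than relallvisible merely being stale; a sketch, assuming the extension can be installed in the regression database:

    CREATE EXTENSION IF NOT EXISTS pg_visibility;
    SELECT relpages, relallvisible FROM pg_class WHERE relname = 'tenk1';
    SELECT * FROM pg_visibility_map_summary('tenk1');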
Re: Intermittent buildfarm failures on wrasse
On Thu, Apr 14, 2022 at 10:07 AM Peter Geoghegan wrote: > It looks like you're changing the elevel convention for these "extra" > messages with this patch. That might be fine, but don't forget about > similar ereports() in vacuumparallel.c. I think that the elevel should > probably remain uniform across all of these messages. Though I don't > particularly care if it's DEBUG2 or DEBUG5. Also, don't forget to do something here, with the assertion and with the message:

    if (verbose)
    {
        /*
         * Aggressiveness already reported earlier, in dedicated
         * VACUUM VERBOSE ereport
         */
        Assert(!params->is_wraparound);
        msgfmt = _("finished vacuuming \"%s.%s.%s\": index scans: %d\n");
    }
    else if (params->is_wraparound)
    {

Presumably we will need to report on antiwraparound-ness in the verbose-debug-elevel-for-autovacuum case (and not allow this assertion to fail). -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
On Thu, Apr 14, 2022 at 9:48 AM Andres Freund wrote: > I think it might be worth leaving in, but let's debate that separately? > I'm thinking of something like the attached. The current convention for the "extra" ereport()s that VACUUM VERBOSE outputs at INFO elevel is to use DEBUG2 elevel in all other cases (these extra messages are considered part of VACUUM VERBOSE output, but are *not* considered part of the autovacuum log output). It looks like you're changing the elevel convention for these "extra" messages with this patch. That might be fine, but don't forget about similar ereports() in vacuumparallel.c. I think that the elevel should probably remain uniform across all of these messages. Though I don't particularly care if it's DEBUG2 or DEBUG5. -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Hi, On 2022-04-14 12:26:20 -0400, Tom Lane wrote: > Andres Freund writes: > > Thanks! Can you repro the problem manually on wrasse, perhaps even > > outside the buildfarm script? Ah, cool. > I'm working on that right now, actually... > > > I wonder if we should make VACUUM log the VERBOSE output at DEBUG1 > > unconditionally. This is like the third bug where we needed that > > information, and it's practically impossible to include in regression > > output. Then we'd know what the xid horizon is, whether pages were > > skipped, etc. > > Right at the moment it seems like we also need visibility into what > CREATE INDEX is doing. > I'm not sure I'd buy into permanent changes here (at least not ones made > in haste), but temporarily adding more logging seems perfectly reasonable. I think it might be worth leaving in, but let's debate that separately? I'm thinking of something like the attached. Greetings, Andres Freund

diff --git i/src/backend/access/heap/vacuumlazy.c w/src/backend/access/heap/vacuumlazy.c
index 788db569b2d..eaee473754d 100644
--- i/src/backend/access/heap/vacuumlazy.c
+++ w/src/backend/access/heap/vacuumlazy.c
@@ -186,6 +186,7 @@ typedef struct LVRelState
 	OffsetNumber offnum;		/* used only for heap operations */
 	VacErrPhase phase;
 	bool		verbose;		/* VACUUM VERBOSE? */
+	int			elevel;
 
 	/*
 	 * dead_items stores TIDs whose index tuples are deleted by index
@@ -330,8 +331,24 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	WalUsage	walusage_start = pgWalUsage;
 	ErrorContextCallback errcallback;
 	char	  **indnames = NULL;
+	int			elevel;
+
+	if (params->options & VACOPT_VERBOSE)
+	{
+		verbose = true;
+		elevel = INFO;
+	}
+	else if (message_level_is_interesting(DEBUG1))
+	{
+		verbose = true;
+		elevel = DEBUG1;
+	}
+	else
+	{
+		verbose = false;
+		elevel = DEBUG5;
+	}
 
-	verbose = (params->options & VACOPT_VERBOSE) != 0;
 	instrument = (verbose || (IsAutoVacuumWorkerProcess() &&
 							  params->log_min_duration >= 0));
 	if (instrument)
@@ -392,20 +409,20 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	vacrel->indname = NULL;
 	vacrel->phase = VACUUM_ERRCB_PHASE_UNKNOWN;
 	vacrel->verbose = verbose;
+	vacrel->elevel = elevel;
 	errcallback.callback = vacuum_error_callback;
 	errcallback.arg = vacrel;
 	errcallback.previous = error_context_stack;
 	error_context_stack = &errcallback;
 
-	if (verbose)
+	if (verbose && !IsAutoVacuumWorkerProcess())
 	{
-		Assert(!IsAutoVacuumWorkerProcess());
 		if (aggressive)
-			ereport(INFO,
+			ereport(elevel,
 					(errmsg("aggressively vacuuming \"%s.%s.%s\"",
 							get_database_name(MyDatabaseId),
 							vacrel->relnamespace, vacrel->relname)));
 		else
-			ereport(INFO,
+			ereport(elevel,
 					(errmsg("vacuuming \"%s.%s.%s\"",
 							get_database_name(MyDatabaseId),
 							vacrel->relnamespace, vacrel->relname)));
@@ -788,7 +805,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 							 (unsigned long long) walusage.wal_bytes);
 			appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
 
-			ereport(verbose ? INFO : LOG,
+			ereport(elevel,
 					(errmsg_internal("%s", buf.data)));
 			pfree(buf.data);
 		}
@@ -2944,7 +2961,7 @@ lazy_truncate_heap(LVRelState *vacrel)
 		vacrel->removed_pages += orig_rel_pages - new_rel_pages;
 		vacrel->rel_pages = new_rel_pages;
 
-		ereport(vacrel->verbose ? INFO : DEBUG2,
+		ereport(vacrel->elevel,
 				(errmsg("table \"%s\": truncated %u to %u pages",
 						vacrel->relname,
 						orig_rel_pages, new_rel_pages)));
@@ -3006,7 +3023,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
 	{
 		if (LockHasWaitersRelation(vacrel->rel, AccessExclusiveLock))
 		{
-			ereport(vacrel->verbose ? INFO : DEBUG2,
+			ereport(vacrel->elevel,
 					(errmsg("table \"%s\": suspending truncate due to conflicting lock request",
 							vacrel->relname)));
Re: Intermittent buildfarm failures on wrasse
Andres Freund writes: > Thanks! Can you repro the problem manually on wrasse, perhaps even > outside the buildfarm script? I'm working on that right now, actually... > I wonder if we should make VACUUM log the VERBOSE output at DEBUG1 > unconditionally. This is like the third bug where we needed that > information, and it's practically impossible to include in regression > output. Then we'd know what the xid horizon is, whether pages were > skipped, etc. Right at the moment it seems like we also need visibility into what CREATE INDEX is doing. I'm not sure I'd buy into permanent changes here (at least not ones made in haste), but temporarily adding more logging seems perfectly reasonable. regards, tom lane
Re: Intermittent buildfarm failures on wrasse
On Thu, Apr 14, 2022 at 9:18 AM Andres Freund wrote: > I wonder if we should make VACUUM log the VERBOSE output at DEBUG1 > unconditionally. This is like the third bug where we needed that > information, and it's practically impossible to include in regression > output. Then we'd know what the xid horizon is, whether pages were > skipped, etc. I like the idea of making VACUUM log the VERBOSE output as a configurable user-visible feature. We'll then be able to log all VACUUM statements (not just autovacuum worker VACUUMs). -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Hi, On 2022-04-14 12:01:23 -0400, Tom Lane wrote: > Noah Misch writes: > > On Wed, Apr 13, 2022 at 06:51:12PM -0700, Andres Freund wrote: > >> Noah, any chance you could enable log_autovacuum_min_duration=0 on > >> wrasse? > > > Done. Also forced hourly builds. Thanks! Can you repro the problem manually on wrasse, perhaps even outside the buildfarm script? That might be simpler than debugging via the BF... > Thanks! We now have two failing runs with the additional info [1][2], > and in both, it's clear that the first autovac worker doesn't launch > until 1 minute after postmaster start, by which time we're long done > with the test scripts of interest. So whatever is breaking this is > not an autovac worker. I did some experiments around that too, and didn't find any related problems. For a second I was wondering if it's caused by the time of initdb (which ends up with a working pgstat snapshot now, but didn't before), but that's just a few more seconds. While the BF scripts don't show timestamps for initdb, the previous step's log output confirms that it's just a few seconds... > I think I'm going to temporarily add a couple of queries to check > what tenk1's relallvisible actually is, just so we can confirm > positively that that's what's causing the plan change. (I'm also > curious about whether the CREATE INDEX steps manage to change it > at all.) I wonder if we should make VACUUM log the VERBOSE output at DEBUG1 unconditionally. This is like the third bug where we needed that information, and it's practically impossible to include in regression output. Then we'd know what the xid horizon is, whether pages were skipped, etc. It also just generally seems like a good thing. Greetings, Andres Freund
Re: Intermittent buildfarm failures on wrasse
Noah Misch writes: > On Wed, Apr 13, 2022 at 06:51:12PM -0700, Andres Freund wrote: >> Noah, any chance you could enable log_autovacuum_min_duration=0 on >> wrasse? > Done. Also forced hourly builds. Thanks! We now have two failing runs with the additional info [1][2], and in both, it's clear that the first autovac worker doesn't launch until 1 minute after postmaster start, by which time we're long done with the test scripts of interest. So whatever is breaking this is not an autovac worker. I think I'm going to temporarily add a couple of queries to check what tenk1's relallvisible actually is, just so we can confirm positively that that's what's causing the plan change. (I'm also curious about whether the CREATE INDEX steps manage to change it at all.) regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse=2022-04-14%2013%3A28%3A14 [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse=2022-04-14%2004%3A48%3A13
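The temporary check being described amounts to something like this, run once after test_setup's VACUUM and again after the CREATE INDEX steps (a sketch; the queries actually committed may differ):

    SELECT relallvisible, relpages,
           relallvisible = relpages AS fully_visible
      FROM pg_class
     WHERE relname = 'tenk1';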
Re: Intermittent buildfarm failures on wrasse
On Wed, Apr 13, 2022 at 06:51:12PM -0700, Andres Freund wrote: > Noah, any chance you could enable log_autovacuum_min_duration=0 on > wrasse? Done. Also forced hourly builds.
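For reference, the equivalent of that buildfarm change on an ordinary installation is just:

    ALTER SYSTEM SET log_autovacuum_min_duration = 0;
    SELECT pg_reload_conf();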
Re: Intermittent buildfarm failures on wrasse
Andres Freund writes: > Noah, any chance you could enable log_autovacuum_min_duration=0 on > wrasse? +1 > Does sparc have wider alignment rules for some types? Perhaps that'd be > enough to put some tables to be sufficiently larger to trigger parallel > vacuum? No, the configure results on wrasse look pretty ordinary: checking size of void *... 8 checking size of size_t... 8 checking size of long... 8 checking alignment of short... 2 checking alignment of int... 4 checking alignment of long... 8 checking alignment of double... 8 I wondered for a moment about force_parallel_mode, but wrasse doesn't appear to be setting that, and in any case I'm pretty sure it only affects plannable statements. regards, tom lane
Re: Intermittent buildfarm failures on wrasse
Hi, Noah, any chance you could enable log_autovacuum_min_duration=0 on wrasse? On 2022-04-13 21:23:12 -0400, Tom Lane wrote: > I'm still suspicious of the pgstat changes, though. I checked into > things here by doing > > initdb > edit postgresql.conf to set log_autovacuum_min_duration = 0 > pg_ctl start && make installcheck-parallel > > and what I see is that the first reported autovacuum activity begins > exactly one minute after the postmaster starts, which is what I'd > expect given the autovacuum naptime rules. It'd not necessarily have to be autovacuum. A CREATE INDEX or VACUUM using parallelism, could also cause this, I think. It'd be a narrow window, of course... Does sparc have wider alignment rules for some types? Perhaps that'd be enough to put some tables to be sufficiently larger to trigger parallel vacuum? Greetings, Andres Freund
Re: Intermittent buildfarm failures on wrasse
Andres Freund writes: > On 2022-04-13 20:35:50 -0400, Tom Lane wrote: >> It seems like a SQL-accessible function could be written >> and then called before any problematic VACUUM. I like this better >> for something we're thinking of jamming in post-feature-freeze; >> we'd not be committing to the feature quite as much as if we >> added a VACUUM option. > We could otherwise just disable IOS for that query, for now. The entire point of that test case is to verify the shape of the IOS plan, so no that's not an acceptable answer. But if we're looking for quick hacks, we could do

    update pg_class set relallvisible = relpages where relname = 'tenk1';

just before that test. I'm still suspicious of the pgstat changes, though. I checked into things here by doing

    initdb
    edit postgresql.conf to set log_autovacuum_min_duration = 0
    pg_ctl start && make installcheck-parallel

and what I see is that the first reported autovacuum activity begins exactly one minute after the postmaster starts, which is what I'd expect given the autovacuum naptime rules. On my machine, of course, the installcheck-parallel run is long gone by then. But even on the much slower wrasse, we should be well past create_index by the time any autovac worker launches --- you can see from wrasse's reported test runtimes that only about 10 seconds have elapsed when it gets to the end of create_index. This suggests to me that what is holding the (presumed) conflicting snapshot must be the autovac launcher, because what else could it be? So I'm suspicious that we broke something in that area, though I am baffled why only wrasse would be telling us so. regards, tom lane
Re: Intermittent buildfarm failures on wrasse
On Wed, Apr 13, 2022 at 6:05 PM Andres Freund wrote: > I think most of those we've ended up replacing by using temp tables in > those tests instead, since they're not affected by the global horizon > anymore. Maybe, but it's a pain to have to work that way. You can't do it in cases like this, because a temp table is not workable. So that's not an ideal long term solution. > > We'd not necessarily have to embed wait-for-horizon into VACUUM > > itself. > > I'm not sure it'd be quite reliable outside of vacuum though, due to the > horizon potentially going backwards (in otherwise harmless ways)? I agree, since vacuumlazy.c would need to either be given its own OldestXmin, or knowledge of a wait-up-to XID. Either way we have to make non-trivial changes to vacuumlazy.c. -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
On Wed, Apr 13, 2022 at 6:03 PM Peter Geoghegan wrote: > I think that it's more likely that FREEZE will correct problems, out of the > two: > > * FREEZE forces an aggressive VACUUM whose FreezeLimit is as recent a > cutoff value as possible (FreezeLimit will be equal to OldestXmin). The reason why that might have helped (at least in the past) is that it's enough to force us to wait for a cleanup lock to prune and freeze, if necessary. Which was never something that DISABLE_PAGE_SKIPPING could do. -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Hi, On 2022-04-13 20:35:50 -0400, Tom Lane wrote: > Peter Geoghegan writes: > > On Wed, Apr 13, 2022 at 4:51 PM Tom Lane wrote: > >> Yeah, we have band-aided around this type of problem repeatedly. > >> Making a fix that's readily accessible from any test script > >> seems like a good idea. > > > We might even be able to consistently rely on this new option, given > > *any* problem involving test stability and VACUUM. Having a > > one-size-fits-all solution to these kinds of stability problems would > > be nice -- no more DISABLE_PAGE_SKIPPING bandaids. > > My guess is that you'd need both this new wait-for-horizon behavior > *and* DISABLE_PAGE_SKIPPING. But the two together ought to make > for pretty reproducible behavior. I noticed while scanning the > commit log that some patches have tried adding a FREEZE option, > which seems more like waving a dead chicken than anything that'd > improve stability. I think most of those we've ended up replacing by using temp tables in those tests instead, since they're not affected by the global horizon anymore. > We'd not necessarily have to embed wait-for-horizon into VACUUM > itself. I'm not sure it'd be quite reliable outside of vacuum though, due to the horizon potentially going backwards (in otherwise harmless ways)? > It seems like a SQL-accessible function could be written > and then called before any problematic VACUUM. I like this better > for something we're thinking of jamming in post-feature-freeze; > we'd not be committing to the feature quite as much as if we > added a VACUUM option. We could otherwise just disable IOS for that query, for now. Greetings, Andres Freund
Re: Intermittent buildfarm failures on wrasse
On Wed, Apr 13, 2022 at 5:35 PM Tom Lane wrote: > My guess is that you'd need both this new wait-for-horizon behavior > *and* DISABLE_PAGE_SKIPPING. But the two together ought to make > for pretty reproducible behavior. I noticed while scanning the > commit log that some patches have tried adding a FREEZE option, > which seems more like waving a dead chicken than anything that'd > improve stability. I think that it's more likely that FREEZE will correct problems, out of the two: * FREEZE forces an aggressive VACUUM whose FreezeLimit is as recent a cutoff value as possible (FreezeLimit will be equal to OldestXmin). * DISABLE_PAGE_SKIPPING also forces an aggressive VACUUM. But unlike FREEZE it makes VACUUM not use the visibility map, even in the case of all-frozen pages. And it changes nothing about FreezeLimit. It's also a certainty that VACUUM(FREEZE, DISABLE_PAGE_SKIPPING) is not a sensible remedy for any problem with test stability, but there are still some examples of that combination in the regression tests. The only way it could make sense is if the visibility map might be corrupt, but surely we're not expecting that anyway (and if we were we'd be testing it more directly). I recently argued that DISABLE_PAGE_SKIPPING should have nothing to do with aggressive vacuuming -- that should all be left up to VACUUM FREEZE. It seems more logical to make DISABLE_PAGE_SKIPPING mean "don't use the visibility map to skip anything", without bringing aggressiveness into it at all. That would be less confusing. > We'd not necessarily have to embed wait-for-horizon into VACUUM > itself. It seems like a SQL-accessible function could be written > and then called before any problematic VACUUM. I like this better > for something we're thinking of jamming in post-feature-freeze; > we'd not be committing to the feature quite as much as if we > added a VACUUM option. Hmm. I would say that the feature has zero appeal to users anyway. Maybe it can and should be done through an SQL function for other reasons, though. Users already think that there are several different flavors of VACUUM, which isn't really true. -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Hi, On 2022-04-13 18:54:06 -0400, Tom Lane wrote: > We used to have a rate limit on how often stats reports would be sent > to the collector, which'd ensure half a second or so delay before a > transaction's change counts became visible to the autovac daemon. Just for posterity: That's not actually gone. But what is gone is the rate limiting in autovacuum about requesting recent stats for a table / autovac seeing slightly older stats. Greetings, Andres Freund
Re: Intermittent buildfarm failures on wrasse
Peter Geoghegan writes: > On Wed, Apr 13, 2022 at 4:51 PM Tom Lane wrote: >> Yeah, we have band-aided around this type of problem repeatedly. >> Making a fix that's readily accessible from any test script >> seems like a good idea. > We might even be able to consistently rely on this new option, given > *any* problem involving test stability and VACUUM. Having a > one-size-fits-all solution to these kinds of stability problems would > be nice -- no more DISABLE_PAGE_SKIPPING bandaids. My guess is that you'd need both this new wait-for-horizon behavior *and* DISABLE_PAGE_SKIPPING. But the two together ought to make for pretty reproducible behavior. I noticed while scanning the commit log that some patches have tried adding a FREEZE option, which seems more like waving a dead chicken than anything that'd improve stability. We'd not necessarily have to embed wait-for-horizon into VACUUM itself. It seems like a SQL-accessible function could be written and then called before any problematic VACUUM. I like this better for something we're thinking of jamming in post-feature-freeze; we'd not be committing to the feature quite as much as if we added a VACUUM option. regards, tom lane
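A rough sketch of what such a SQL-accessible helper could look like — purely illustrative, not a proposed patch; it polls pg_stat_activity until no other backend advertises an xmin, which is a blunter condition than what VACUUM itself would really want:

    CREATE FUNCTION wait_for_quiescent_horizon(max_tries int DEFAULT 100)
    RETURNS void LANGUAGE plpgsql AS
    $$
    BEGIN
        FOR i IN 1 .. max_tries LOOP
            PERFORM 1
              FROM pg_stat_activity
             WHERE backend_xmin IS NOT NULL
               AND pid <> pg_backend_pid();
            EXIT WHEN NOT FOUND;   -- no lingering snapshots, safe to VACUUM
            PERFORM pg_sleep(0.1);
        END LOOP;
    END
    $$;

A test script would then run SELECT wait_for_quiescent_horizon(); immediately before the VACUUM whose results it depends on.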
Re: Intermittent buildfarm failures on wrasse
Hi, On 2022-04-13 16:45:44 -0700, Peter Geoghegan wrote: > On Wed, Apr 13, 2022 at 4:38 PM Tom Lane wrote: > > So what seems to be happening on wrasse is that a background > > autovacuum (or really autoanalyze?) is preventing pages from > > being marked all-visible not only during test_setup.sql but > > also create_index.sql; but it's gone by the time sanity_check.sql > > runs. > > I agree that it would need to be an autoanalyze (due to the > PROC_IN_VACUUM optimization). That's not a reliable protection - the snapshot is established normally at first, only after a while we set PROC_IN_VACUUM... Greetings, Andres Freund
Re: Intermittent buildfarm failures on wrasse
On Wed, Apr 13, 2022 at 4:51 PM Tom Lane wrote: > Yeah, we have band-aided around this type of problem repeatedly. > Making a fix that's readily accessible from any test script > seems like a good idea. We might even be able to consistently rely on this new option, given *any* problem involving test stability and VACUUM. Having a one-size-fits-all solution to these kinds of stability problems would be nice -- no more DISABLE_PAGE_SKIPPING bandaids. That general approach will be possible provided an inability to acquire a cleanup lock during VACUUM (which can more or less occur at random in most environments) doesn't ever lead to unexpected test results. There is good reason to think that it might work out that way. Simulating problems with acquiring cleanup locks during VACUUM left me with the impression that that could really work: https://postgr.es/m/cah2-wzkib-qcsbmwrpzp0nxvrqexouts1d7tyshg_drkohe...@mail.gmail.com -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Peter Geoghegan writes: > On Wed, Apr 13, 2022 at 4:38 PM Tom Lane wrote: >> It seems like a reliable fix might require test_setup to wait >> for any background autovac to exit before it does its own >> vacuums. Ick. > This is hardly a new problem, really. I wonder if it's worth inventing > a comprehensive solution. Yeah, we have band-aided around this type of problem repeatedly. Making a fix that's readily accessible from any test script seems like a good idea. regards, tom lane
Re: Intermittent buildfarm failures on wrasse
On Wed, Apr 13, 2022 at 4:38 PM Tom Lane wrote: > So what seems to be happening on wrasse is that a background > autovacuum (or really autoanalyze?) is preventing pages from > being marked all-visible not only during test_setup.sql but > also create_index.sql; but it's gone by the time sanity_check.sql > runs. I agree that it would need to be an autoanalyze (due to the PROC_IN_VACUUM optimization). > It seems like a reliable fix might require test_setup to wait > for any background autovac to exit before it does its own > vacuums. Ick. This is hardly a new problem, really. I wonder if it's worth inventing a comprehensive solution. Some kind of infrastructure that makes VACUUM establish a next XID up-front (by calling ReadNextTransactionId()), and then find a way to run with an OldestXmin that's >= the earleir "next" XID value. If necessary by waiting. -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Peter Geoghegan writes: > On Wed, Apr 13, 2022 at 4:13 PM Andres Freund wrote: >> IIRC the problem in matter isn't skipped pages, but that the horizon simply >> isn't new enough to mark pages as all visible. > Sometimes OldestXmin can go backwards in VACUUM operations that are > run in close succession against the same table, due to activity from > other databases in the same cluster (perhaps other factors are > involved at times). I've been doing some testing locally by inserting commands to manually set tenk1's relallvisible to zero. I first did that in test_setup.sql ... and it had no effect whatsoever. Further experimentation showed that the "CREATE INDEX ON tenk1" steps in create_index.sql itself generally suffice to fix relallvisible; although if you force it back to zero after the last such command, you get the same plan diffs wrasse is showing. And you don't get any others, which I thought curious until I realized that sanity_check.sql's database-wide VACUUM offers yet another opportunity to heal the incorrect value. If you force it back to zero again after that, a bunch of later tests start to show plan differences, which is what I'd been expecting. So what seems to be happening on wrasse is that a background autovacuum (or really autoanalyze?) is preventing pages from being marked all-visible not only during test_setup.sql but also create_index.sql; but it's gone by the time sanity_check.sql runs. Which is odd in itself because not that much time elapses between create_index and sanity_check, certainly less than the time from test_setup to create_index. It seems like a reliable fix might require test_setup to wait for any background autovac to exit before it does its own vacuums. Ick. And we still lack an explanation of why this only now broke. I remain suspicious that pgstats is behaving unexpectedly. regards, tom lane
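The local experiment described here boils down to something like the following (the exact placement of the commands within the test scripts is what was being varied):

    -- simulate a VACUUM that could not mark anything all-visible
    UPDATE pg_class SET relallvisible = 0 WHERE relname = 'tenk1';

    -- then check whether the index-only scan plan degrades
    EXPLAIN (COSTS OFF)
    SELECT unique1 FROM tenk1 WHERE unique1 IN (1,42,7) ORDER BY unique1;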
Re: Intermittent buildfarm failures on wrasse
On Wed, Apr 13, 2022 at 4:13 PM Andres Freund wrote: > IIRC the problem in matter isn't skipped pages, but that the horizon simply > isn't new enough to mark pages as all visible. Sometimes OldestXmin can go backwards in VACUUM operations that are run in close succession against the same table, due to activity from other databases in the same cluster (perhaps other factors are involved at times). That's why the following assertion that I recently added to vacuumlazy.c will fail pretty quickly without the "vacrel->NewRelfrozenXid == OldestXmin" part of its test: Assert(vacrel->NewRelfrozenXid == OldestXmin || TransactionIdPrecedesOrEquals(aggressive ? FreezeLimit : vacrel->relfrozenxid, vacrel->NewRelfrozenXid)); If you remove "vacrel->NewRelfrozenXid == OldestXmin", and run the regression tests, the remaining assertion will fail quite easily. Though perhaps not with a serial "make check". -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
Hi, On April 13, 2022 7:06:33 PM EDT, David Rowley wrote: >On Thu, 14 Apr 2022 at 10:54, Tom Lane wrote: >> After a bit more navel-contemplation I see a way that the pgstats >> work could have changed timing in this area. We used to have a >> rate limit on how often stats reports would be sent to the >> collector, which'd ensure half a second or so delay before a >> transaction's change counts became visible to the autovac daemon. >> I've not looked at the new code, but I'm betting that that's gone >> and the autovac launcher might start a worker nearly immediately >> after some foreground process finishes inserting some rows. >> So that could result in autovac activity occurring concurrently >> with test_setup where it didn't before. > >It's not quite clear to me why the manual vacuum wouldn't just cancel >the autovacuum and complete the job. I can't quite see how there's >room for competing page locks here. Also, see [1]. One of the >reported failing tests there is the same as one of the failing tests >on wrasse. My investigation for the AIO branch found that >relallvisible was not equal to relpages. I don't recall the reason why >that was happening now. > >> As to what to do about it ... maybe apply the FREEZE and >> DISABLE_PAGE_SKIPPING options in test_setup's vacuums? >> It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps >> not sufficient. > >We should likely try and confirm it's due to relallvisible first. We had this issue before, and not just on the aio branch. On my phone right now, so won't look up references. IIRC the problem in matter isn't skipped pages, but that the horizon simply isn't new enough to mark pages as all visible. An independent autovac worker starting is enough for that, for example. Previously the data load and vacuum were further apart, preventing this kind of issue. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Intermittent buildfarm failures on wrasse
On Wed, Apr 13, 2022 at 3:54 PM Tom Lane wrote: > After a bit more navel-contemplation I see a way that the pgstats > work could have changed timing in this area. We used to have a > rate limit on how often stats reports would be sent to the > collector, which'd ensure half a second or so delay before a > transaction's change counts became visible to the autovac daemon. > I've not looked at the new code, but I'm betting that that's gone > and the autovac launcher might start a worker nearly immediately > after some foreground process finishes inserting some rows. > So that could result in autovac activity occurring concurrently > with test_setup where it didn't before. But why should it matter? The test_setup.sql VACUUM of tenk1 should leave relallvisible and relpages in the same state, either way (or very close to it). The only way that it seems like it could matter is if OldestXmin was held back during test_setup.sql's execution of the VACUUM command. > As to what to do about it ... maybe apply the FREEZE and > DISABLE_PAGE_SKIPPING options in test_setup's vacuums? > It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps > not sufficient. BTW, the work on VACUUM for Postgres 15 probably makes VACUUM test flappiness issues less of a problem -- unless they're issues involving something holding back OldestXmin when it shouldn't (in which case it won't have any effect on test stability). I would expect that to be the case, at least, since VACUUM now does almost all of the same work for any individual page that it cannot get a cleanup lock on. There is surprisingly little difference between a page that gets processed by lazy_scan_prune and a page that gets processed by lazy_scan_noprune. -- Peter Geoghegan
Re: Intermittent buildfarm failures on wrasse
On Thu, 14 Apr 2022 at 10:54, Tom Lane wrote: > After a bit more navel-contemplation I see a way that the pgstats > work could have changed timing in this area. We used to have a > rate limit on how often stats reports would be sent to the > collector, which'd ensure half a second or so delay before a > transaction's change counts became visible to the autovac daemon. > I've not looked at the new code, but I'm betting that that's gone > and the autovac launcher might start a worker nearly immediately > after some foreground process finishes inserting some rows. > So that could result in autovac activity occurring concurrently > with test_setup where it didn't before. It's not quite clear to me why the manual vacuum wouldn't just cancel the autovacuum and complete the job. I can't quite see how there's room for competing page locks here. Also, see [1]. One of the reported failing tests there is the same as one of the failing tests on wrasse. My investigation for the AIO branch found that relallvisible was not equal to relpages. I don't recall the reason why that was happening now. > As to what to do about it ... maybe apply the FREEZE and > DISABLE_PAGE_SKIPPING options in test_setup's vacuums? > It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps > not sufficient. We should likely try and confirm it's due to relallvisible first. David [1] https://www.postgresql.org/message-id/20220224153339.pqn64kseb5gpg...@alap3.anarazel.de
Re: Intermittent buildfarm failures on wrasse
Peter Geoghegan writes: > On Wed, Apr 13, 2022 at 3:08 PM Tom Lane wrote: >> I'm tempted to add something like >> SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1'; >> so that we can confirm or refute the theory that relallvisible is >> the driving factor. > It would be fairly straightforward to commit a temporary debugging > patch that has the autovacuum logging stuff report directly on how > VACUUM set new_rel_allvisible in pg_class. We should probably be doing > that already, just because it's useful information that is already > close at hand. Doesn't look like wrasse has autovacuum logging enabled, though. After a bit more navel-contemplation I see a way that the pgstats work could have changed timing in this area. We used to have a rate limit on how often stats reports would be sent to the collector, which'd ensure half a second or so delay before a transaction's change counts became visible to the autovac daemon. I've not looked at the new code, but I'm betting that that's gone and the autovac launcher might start a worker nearly immediately after some foreground process finishes inserting some rows. So that could result in autovac activity occurring concurrently with test_setup where it didn't before. As to what to do about it ... maybe apply the FREEZE and DISABLE_PAGE_SKIPPING options in test_setup's vacuums? It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps not sufficient. regards, tom lane
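Concretely, that would mean making test_setup's vacuum of tenk1 look something like this (a sketch; whether ANALYZE stays folded into the same command is incidental):

    VACUUM (FREEZE, DISABLE_PAGE_SKIPPING, ANALYZE) tenk1;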
Re: Intermittent buildfarm failures on wrasse
On Wed, Apr 13, 2022 at 3:08 PM Tom Lane wrote: > I'm tempted to add something like > > SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1'; > > so that we can confirm or refute the theory that relallvisible is > the driving factor. It would be fairly straightforward to commit a temporary debugging patch that has the autovacuum logging stuff report directly on how VACUUM set new_rel_allvisible in pg_class. We should probably be doing that already, just because it's useful information that is already close at hand. Might be a bit trickier to make sure that wrasse reliably reported on all relevant VACUUMs, since that would have to include manual VACUUMs (which would really have to use VACUUM VERBOSE), as well as autovacuums. -- Peter Geoghegan
Intermittent buildfarm failures on wrasse
For the past five days or so, wrasse has been intermittently failing due to unexpectedly not using an Index Only Scan plan in the create_index test [1], eg

@@ -1910,11 +1910,15 @@
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
 ORDER BY unique1;
-                      QUERY PLAN
--------------------------------------------------------
- Index Only Scan using tenk1_unique1 on tenk1
-   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+                        QUERY PLAN
+---------------------------------------------------------------
+ Sort
+   Sort Key: unique1
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+         ->  Bitmap Index Scan on tenk1_unique1
+               Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)

 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)

The most probable explanation for this seems to be that tenk1's pg_class.relallvisible value hasn't been set high enough to make an IOS look cheaper than the alternatives. Where that ought to be getting set is the "VACUUM ANALYZE tenk1" step in test_setup.sql. It's plausible I guess that a background autovacuum is preventing that command from setting relallvisible as high as it ought to be --- but if so, why are we only seeing two plans changing, on only one animal? But what I'm really confused about is that this test arrangement has been stable since early February. Why has wrasse suddenly started showing a 25% failure rate when it never failed this way before that? Somebody has to have recently committed a change that affects this. Checking the commit log up to the onset of the failures on 8 April, I only see two plausible candidates:

* shared-memory pgstats
* Peter's recent VACUUM changes

Any connection to pgstats is, um, pretty obscure. I'd finger the VACUUM changes as a more likely trigger except that the last interesting-looking one was f3c15cbe5 on 3 April, and wrasse got through "make check" 38 times after that before its first failure of this kind. That doesn't square with the 25% failure rate since then, so I'm kind of forced to the conclusion that the pgstats work changed some behavior that it should not have. Any ideas? I'm tempted to add something like

SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1';

so that we can confirm or refute the theory that relallvisible is the driving factor. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse=2022-04-08%2003%3A48%3A30