Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Fri, 25 Feb 2022 16:47:01 +0900 (JST), Kyotaro Horiguchi wrote in
> So, these are the patches for pg12-10. 11 can share the same patch with
> 12. 10 has differences in two points.
>
> 10 has ControlFile->prevCheckPoint.
>
> The DETAILS of the "recovery restart point at" message is not
> capitalized. But I suppose it is so close to EOL that we don't want to
> "fix" it risking existing use cases.

Ugh! Wait a moment. Something's wrong.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Fri, 25 Feb 2022 16:06:31 +0900 (JST), Kyotaro Horiguchi wrote in
> At Fri, 25 Feb 2022 15:31:12 +0900 (JST), Kyotaro Horiguchi wrote in
> > While making patches for v12, I see a test failure of pg_rewind for
> > an uncertain reason. I'm investigating that, but I post this for
> > discussion.
>
> Hmm. Too stupid. Somehow I overly removed the ratchet condition for
> minRecoveryPoint. So the same patch worked for v12.

So, these are the patches for pg12-10. 11 can share the same patch with 12.
10 has differences in two points.

10 has ControlFile->prevCheckPoint.

The DETAILS of the "recovery restart point at" message is not capitalized.
But I suppose it is so close to EOL that we don't want to "fix" it risking
existing use cases.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From c89e2b509723b68897f2af49a154af2a69f0747b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 25 Feb 2022 15:04:00 +0900
Subject: [PATCH v3] Correctly update control file at the end of archive
 recovery

CreateRestartPoint runs WAL file cleanup based on the checkpoint that has
just finished in the function. If the database has exited the
DB_IN_ARCHIVE_RECOVERY state by the time the function is about to update
the control file, the function refrains from updating the file at all, then
proceeds to WAL cleanup using the latest REDO LSN, which is now
inconsistent with the control file. As a result, the succeeding cleanup
procedure removes WAL files that are still needed according to the control
file and leaves the database unrecoverable until the next checkpoint
finishes.

Along with that fix, we remove a dead code path for the case where some
other process ran a simultaneous checkpoint. It seems like just a
preventive measure, but it's no longer useful because we are sure that
checkpoints are performed only by the checkpointer except in single-process
mode.
---
 src/backend/access/transam/xlog.c | 71 +++
 1 file changed, 44 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 885558f291..2b2568c475 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9334,7 +9334,7 @@ CreateRestartPoint(int flags)
 	/* Also update the info_lck-protected copy */
 	SpinLockAcquire(&XLogCtl->info_lck);
-	XLogCtl->RedoRecPtr = lastCheckPoint.redo;
+	XLogCtl->RedoRecPtr = RedoRecPtr;
 	SpinLockRelease(&XLogCtl->info_lck);

 	/*
@@ -9350,7 +9350,10 @@
 	if (log_checkpoints)
 		LogCheckpointStart(flags, true);

-	CheckPointGuts(lastCheckPoint.redo, flags);
+	CheckPointGuts(RedoRecPtr, flags);
+
+	/* Update pg_control */
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);

 	/*
 	 * Remember the prior checkpoint's redo ptr for
@@ -9358,31 +9361,28 @@
 	 */
 	PriorRedoPtr = ControlFile->checkPointCopy.redo;

+	Assert (PriorRedoPtr < RedoRecPtr);
+
+	ControlFile->checkPoint = lastCheckPointRecPtr;
+	ControlFile->checkPointCopy = lastCheckPoint;
+
+	/* Update control file using current time */
+	ControlFile->time = (pg_time_t) time(NULL);
+
 	/*
-	 * Update pg_control, using current time. Check that it still shows
-	 * IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
-	 * this is a quick hack to make sure nothing really bad happens if somehow
-	 * we get here after the end-of-recovery checkpoint.
+	 * Ensure minRecoveryPoint is past the checkpoint record while archive
+	 * recovery is still ongoing. Normally, this will have happened already
+	 * while writing out dirty buffers, but not necessarily - e.g. because no
+	 * buffers were dirtied. We do this because a non-exclusive base backup
+	 * uses minRecoveryPoint to determine which WAL files must be included in
+	 * the backup, and the file (or files) containing the checkpoint record
+	 * must be included, at a minimum. Note that for an ordinary restart of
+	 * recovery there's no value in having the minimum recovery point any
+	 * earlier than this anyway, because redo will begin just after the
+	 * checkpoint record.
 	 */
-	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
-	if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
-		ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
+	if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
 	{
-		ControlFile->checkPoint = lastCheckPointRecPtr;
-		ControlFile->checkPointCopy = lastCheckPoint;
-		ControlFile->time = (pg_time_t) time(NULL);
-
-		/*
-		 * Ensure minRecoveryPoint is past the checkpoint record. Normally,
-		 * this will have happened already while writing out dirty buffers,
-		 * but not necessarily - e.g. because
Re: [Proposal] Global temporary tables
Hi,

This is a huge thread. Realistically reviewers and committers can't reread
it. I think there needs to be more of a description of how this works
included in the patchset, and *why* it works that way. The README does a
bit of that, but not particularly well.

On 2022-02-25 14:26:47 +0800, Wenjing Zeng wrote:
> +++ b/README.gtt.txt
> @@ -0,0 +1,172 @@
> +Global Temporary Table(GTT)
> +=
> +
> +Feature description
> +-
> +
> +Previously, temporary tables are defined once and automatically
> +exist (starting with empty contents) in every session before using them.

I think for a README "previously" etc. isn't good language - if it were
committed, it'd not be understandable anymore. It makes more sense for
commit messages etc.

> +Main design ideas
> +-
> +In general, GTT and LTT use the same storage and buffer design and
> +implementation. The storage files for both types of temporary tables are
> +named as t_backendid_relfilenode, and the local buffer is used to cache
> +the data.

What does "named as t_backendid_relfilenode" mean?

> +The schema of GTTs is shared among sessions while their data are not. We
> +build a new mechanisms to manage those non-shared data and their
> +statistics. Here is the summary of changes:
> +
> +1) CATALOG
> +GTTs store session-specific data. The storage information of GTTs' data,
> +their transaction information, and their statistics are not stored in
> +the catalog.
> +
> +2) STORAGE INFO & STATISTICS INFO & TRANSACTION INFO
> +In order to maintain durability and availability of GTTs'
> +session-specific data, their storage information, statistics, and
> +transaction information is managed in a local hash table
> +tt_storage_local_hash.

"maintain durability"? Durable across what? In the context of databases
it's typically about crash safety, but that can't be the case here.

> +3) DDL
> +Currently, GTT supports almost all table DDL except CLUSTER/VACUUM FULL.
> +Part of the DDL behavior is limited by shared definitions and multiple
> +copies of local data, and we added some structures to handle this.
> +A shared hash table active_gtt_shared_hash is added to track the state
> +of the GTT in a different session. This information is recorded in the
> +hash table during the DDL execution of the GTT.
> +The data stored in a GTT can only be modified or accessed by the owning
> +session. The statements that only modify data in a GTT do not need a
> +high level of table locking. The operations making those changes include
> +truncate GTT, reindex GTT, and lock GTT.

I think you need to introduce a bit more terminology for any of this to
make sense. Sometimes GTT means the global catalog entity; sometimes, like
here, it appears to mean the session-specific contents of a GTT.

What state of a GTT in another session?

How do GTTs handle something like BEGIN; TRUNCATE some_gtt_table; ROLLBACK;?

> +1.2 on commit clause
> +LTT's status associated with on commit DELETE ROWS and on commit
> +PRESERVE ROWS is not stored in catalog. Instead, GTTs need a bool value
> +on_commit_delete_rows in reloptions which is shared among sessions.

Why?

> +2.3 statistics info
> +1) relpages reltuples relallvisible relfilenode

?

> +3 DDL
> +3.1. active_gtt_shared_hash
> +This is the hash table created in shared memory to trace the GTT files
> +initialized in each session. Each hash entry contains a bitmap that
> +records the backendid of the initialized GTT file. With this hash table,
> +we know which backend/session is using this GTT. Such information is
> +used during GTT's DDL operations.

So there's a separate locking protocol for GTTs that doesn't use the normal
locking infrastructure? Why?

> +3.7 CLUSTER GTT/VACUUM FULL GTT
> +The current version does not support.

Why?

> +4 MVCC commit log(clog) cleanup
> +
> +The GTT storage file contains transaction information. Queries for GTT
> +data rely on transaction information such as clog. The transaction
> +information required by each session may be completely different.

Why is transaction information different between sessions? Or does this
just mean that different transaction ids will be accessed?

0003-gtt-v67-implementation.patch
 71 files changed, 3167 insertions(+), 195 deletions(-)

This needs to be broken into smaller chunks to be reviewable.

> @@ -677,6 +678,14 @@ _bt_getrootheight(Relation rel)
>  {
>  	Buffer		metabuf;
>
> +	/*
> +	 * If a global temporary table storage file is not initialized in the
> +	 * this session, its index does not have a root page, just returns 0.
> +	 */
> +	if (RELATION_IS_GLOBAL_TEMP(rel) &&
> +		!gtt_storage_attached(RelationGetRelid(rel)))
> +		return 0;
> +
> 	metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
>
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Hi,

On 2022-02-24 20:53:08 -0800, Peter Geoghegan wrote:
> 0002 makes page-level freezing a first class thing.
> heap_prepare_freeze_tuple now has some (limited) knowledge of how this
> works. heap_prepare_freeze_tuple's cutoff_xid argument is now always
> the VACUUM caller's OldestXmin (not its FreezeLimit, as before). We
> still have to pass FreezeLimit to heap_prepare_freeze_tuple, which
> helps us to respect FreezeLimit as a backstop, and so now it's passed
> via the new backstop_cutoff_xid argument instead.

I am not a fan of the backstop terminology. It's still the reason we need
to do freezing for correctness reasons. It'd make more sense to me to turn
it around and call the "non-backstop" freezing opportunistic freezing or
such.

> Whenever we opt to "freeze a page", the new page-level algorithm
> *always* uses the most recent possible XID and MXID values (OldestXmin
> and oldestMxact) to decide what XIDs/XMIDs need to be replaced. That
> might sound like it'd be too much, but it only applies to those pages
> that we actually decide to freeze (since page-level characteristics
> drive everything now). FreezeLimit is only one way of triggering that
> now (and one of the least interesting and rarest).

That largely makes sense to me and doesn't seem weird.

I'm a tad concerned about replacing mxids that have some members that are
older than OldestXmin but not older than FreezeLimit. It's not too hard to
imagine that accelerating mxid consumption considerably. But we can
probably, if not already done, special case that.

> It seems that heap_prepare_freeze_tuple allocates new MXIDs (when
> freezing old ones) in large part so it can NOT freeze XIDs that it
> would have been useful (and much cheaper) to remove anyway.

Well, we may have to allocate a new mxid because some members are older
than FreezeLimit but others are still running. When do we not remove xids
that would have been cheaper to remove once we decide to actually do work?

> On HEAD, FreezeMultiXactId() doesn't get passed down the VACUUM
> operation's OldestXmin at all (it actually just gets FreezeLimit passed
> as its cutoff_xid argument). It cannot possibly recognize any of this
> for itself.

It does recognize something like OldestXmin in a more precise and expensive
way - MultiXactIdIsRunning() and TransactionIdIsCurrentTransactionId().

> Does that theory about MultiXacts sound plausible? I'm not claiming
> that the patch makes it impossible that FreezeMultiXactId() will have
> to allocate a new MultiXact to freeze during VACUUM -- the
> freeze-the-dead isolation tests already show that that's not true. I
> just think that page-level freezing based on page characteristics with
> oldestXmin and oldestMxact (not FreezeLimit and MultiXactCutoff)
> cutoffs might make it a lot less likely in practice.

Hm. I guess I'll have to look at the code for it. It doesn't immediately
"feel" quite right.

> oldestXmin and oldestMxact map to the same wall clock time, more or less
> -- that seems like it might be an important distinction, independent of
> everything else.

Hm. Multis can be kept alive by fairly "young" member xids. So it may not
be removable (without creating a newer multi) until much later than its
creation time. So I don't think that's really true.

> From 483bc8df203f9df058fcb53e7972e3912e223b30 Mon Sep 17 00:00:00 2001
> From: Peter Geoghegan
> Date: Mon, 22 Nov 2021 10:02:30 -0800
> Subject: [PATCH v9 1/4] Loosen coupling between relfrozenxid and freezing.
>
> When VACUUM set relfrozenxid before now, it set it to whatever value was
> used to determine which tuples to freeze -- the FreezeLimit cutoff.
> This approach was very naive: the relfrozenxid invariant only requires
> that new relfrozenxid values be <= the oldest extant XID remaining in
> the table (at the point that the VACUUM operation ends), which in
> general might be much more recent than FreezeLimit. There is no fixed
> relationship between the amount of physical work performed by VACUUM to
> make it safe to advance relfrozenxid (freezing and pruning), and the
> actual number of XIDs that relfrozenxid can be advanced by (at least in
> principle) as a result. VACUUM might have to freeze all of the tuples
> from a hundred million heap pages just to enable relfrozenxid to be
> advanced by no more than one or two XIDs. On the other hand, VACUUM
> might end up doing little or no work, and yet still be capable of
> advancing relfrozenxid by hundreds of millions of XIDs as a result.
>
> VACUUM now sets relfrozenxid (and relminmxid) using the exact oldest
> extant XID (and oldest extant MultiXactId) from the table, including
> XIDs from the table's remaining/unfrozen MultiXacts. This requires that
> VACUUM carefully track the oldest unfrozen XID/MultiXactId as it goes.
> This optimization doesn't require any changes to the definition of
> relfrozenxid, nor does it require changes to the core design of
> freezing.
>
> Final relfrozenxid values must still be >=
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Fri, 25 Feb 2022 15:31:12 +0900 (JST), Kyotaro Horiguchi wrote in
> While making patches for v12, I see a test failure of pg_rewind for
> an uncertain reason. I'm investigating that, but I post this for
> discussion.

Hmm. Too stupid. Somehow I overly removed the ratchet condition for
minRecoveryPoint. So the same patch worked for v12.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Hi,

On Fri, Feb 25, 2022 at 12:23:27AM +0530, Nitin Jadhav wrote:
> > I think the change to ImmediateCheckpointRequested() makes no sense.
> > Before this patch, that function merely inquires whether there's an
> > immediate checkpoint queued. After this patch, it ... changes a
> > progress-reporting flag? I think it would make more sense to make the
> > progress-report flag change in whatever is the place that *requests* an
> > immediate checkpoint rather than here.
>
> > > diff --git a/src/backend/postmaster/checkpointer.c
> > > b/src/backend/postmaster/checkpointer.c
> > > +ImmediateCheckpointRequested(int flags)
> > > if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE)
> > > +{
> > > +updated_flags |= CHECKPOINT_IMMEDIATE;
> >
> > I don't think that these changes are expected behaviour. Under this
> > condition, the currently running checkpoint is still not 'immediate',
> > but it is going to hurry up for a new, actually immediate checkpoint.
> > Those are different kinds of checkpoint handling; and I don't think
> > you should modify the reported flags to show that we're going to do
> > stuff faster than usual. Maybe maintain a separate 'upcoming
> > checkpoint flags' field instead?
>
> Thank you Alvaro and Matthias for your views. I understand your point
> of not updating the progress-report flag here, as it just checks
> whether CHECKPOINT_IMMEDIATE is set or not and takes an action based on
> that, but it doesn't change the checkpoint flags. I will modify the
> code, but I am a bit confused here. As per Alvaro, we need to make the
> progress-report flag change in whatever is the place that *requests* an
> immediate checkpoint. I feel this gives information about the upcoming
> checkpoint, not the current one. So updating here provides wrong
> details in the view. The flags available during CreateCheckPoint() will
> remain the same for the entire checkpoint operation and we should show
> the same information in the view till it completes.

I'm not sure what Matthias meant, but as far as I know there's no
fundamental difference between a checkpoint with and without the
CHECKPOINT_IMMEDIATE flag, and there's also no scheduling for multiple
checkpoints.

Yes, the flags will remain the same, but checkpoint.c will test both the
passed flags and the shmem flags to see whether a delay should be added or
not, which is the only difference in checkpoint processing for this flag.
See the call to ImmediateCheckpointRequested() which will look at the value
in shmem:

	/*
	 * Perform the usual duties and take a nap, unless we're behind schedule,
	 * in which case we just try to catch up as quickly as possible.
	 */
	if (!(flags & CHECKPOINT_IMMEDIATE) &&
		!ShutdownRequestPending &&
		!ImmediateCheckpointRequested() &&
		IsCheckpointOnSchedule(progress))
[...]
Re: Typo in pgbench messages.
On Thu, 24 Feb 2022 14:44:05 +0100, Daniel Gustafsson wrote:
> > On 24 Feb 2022, at 13:58, Fabien COELHO wrote:
>
> >> One argument against a backpatch is that this would be disruptive with
> >> tools that parse and analyze the output generated by pgbench. Fabien,
> >> don't you have some tools and/or wrappers doing exactly that?
> >
> > Yep, I like to "| cut -d' ' -fx" and other "line.split(' ')" or
> > whatever.
> >
> > I think that the break of typographical rules is intentional to allow
> > such simplistic low-level stream handling through pipes or scripts. I'd
> > prefer that the format is not changed. Maybe a comment could be added
> > to explain the reason behind it.
>
> That doesn't sound like an overwhelmingly convincing argument to print
> some messages with 'X %' and others with 'X%'.

I agree with Daniel. If inserting a space in front of % was intentional for
handling the result in such tools, we should fix other places where 'X%' is
used in the pgbench output.

Regards,
Yugo Nagata

--
Yugo NAGATA
Re: [PATCH] add relation and block-level filtering to pg_waldump
On Fri, 25 Feb 2022 at 03:02, David Christensen wrote:
> Greetings,
>
> This patch adds the ability to specify a RelFileNode and optional
> BlockNum to limit output of pg_waldump records to only those which match
> the given criteria. This should be more performant than
> `pg_waldump | grep` as well as more reliable given specific variations
> in output style depending on how the blocks are specified.
>
> This currently affects only the main fork, but we could presumably add
> the option to filter by fork as well, if that is considered useful.

Cool. I think we can report an error instead of reading WAL files if the
given tablespace, database, or relation is invalid. Is there any WAL record
that has an invalid tablespace, database, or relation OID?

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
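For what it's worth, the core of such a filter could be quite small. A
rough sketch of the matching predicate (the helper name is made up here,
and the exact reader fields are an assumption; the patch itself may well
differ):

    /* Does any block reference of this record touch the given relation/block? */
    static bool
    record_matches_relation_block(XLogReaderState *record,
                                  RelFileNode target, BlockNumber target_blk)
    {
        for (int block_id = 0; block_id <= record->max_block_id; block_id++)
        {
            RelFileNode rnode;
            ForkNumber  forknum;
            BlockNumber blkno;

            if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno))
                continue;

            /* main fork only, matching relation, and (optionally) block */
            if (forknum == MAIN_FORKNUM &&
                RelFileNodeEquals(rnode, target) &&
                (target_blk == InvalidBlockNumber || target_blk == blkno))
                return true;
        }
        return false;
    }

Records that carry no block references at all (and hence no
tablespace/database/relation OIDs to validate) would simply never match.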
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Tue, 22 Feb 2022 17:44:01 +0900 (JST), Kyotaro Horiguchi wrote in
> At Tue, 22 Feb 2022 01:59:45 +0900, Fujii Masao wrote in
> > > Of the following, I think we should do (a) and (b) to make future
> > > backpatchings easier.
> > > a) Use RedoRecPtr and PriorRedoPtr after they are assigned.
> > > b) Move assignment to PriorRedoPtr into the ControlFileLock section.
> >
> > I failed to understand how (a) and (b) can make the backpatching
> > easier. How easy to backpatch seems the same whether we apply (a) and
> > (b) or not...
>
> That premises that the patch applied to master contains (a) and (b).
> So if it doesn't, those are not needed by older branches.

I was once going to remove them. But according to the discussion below, the
patch for back-patching is now quite close to that for the master branch.
So I left them alone.

> > > d) Skip call to UpdateCheckPointDistanceEstimate() when RedoRecPtr <=
> > >    PriorRedoPtr.
> >
> > But "RedoRecPtr <= PriorRedoPtr" will never happen, will it? Because a
>
> I didn't believe that it happens. (So, it came from my conservativeness,
> or laziness, or both:p) The code dates from 2009, when StartupXLOG made a
> concurrent checkpoint with bgwriter. But as of at least 9.5, StartupXLOG
> doesn't directly call CreateCheckPoint. So I think that won't happen.
>
> So, in short, I agree to remove it or turn it into Assert().

That was a bit beside the point. If we assume RedoRecPtr is always larger
than PriorRedoPtr and we therefore don't need to check that there, we
should also remove the "if (PriorRedoPtr < RedoRecPtr)" branch just above,
which means the patch for back-branches gets very close to that for the
master. Do we make such a large change on back branches? Anyway, this
version takes that way for now.

> > if (flags & CHECKPOINT_IS_SHUTDOWN)
> > 	ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
> >
> > Same as above. IMO it's safer and better to always update the state
> > (whether the state is DB_IN_ARCHIVE_RECOVERY or not) if the
> > CHECKPOINT_IS_SHUTDOWN flag is passed.
>
> That means we may exit recovery mode after ShutdownXLOG called
> CreateRestartPoint. I don't think that may happen. So I'd rather add
> Assert((flags & CHECKPOINT_IS_SHUTDOWN) == 0) there instead.

So this version for v14 is updated in the following points:

- Completely removed the code path for the case where some other process
  runs a simultaneous checkpoint.
- Removed the condition (RedoRecPtr > PriorRedoPtr) for the
  UpdateCheckPointDistanceEstimate() call.
- Added an assertion to the recovery-end path.

# Honestly I feel this is a bit too much for back-patching, though.

While making patches for v12, I see a test failure of pg_rewind for an
uncertain reason. I'm investigating that, but I post this for discussion.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From e983f3d4c2dbeea742aed0ef1e209e7821f6687f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Mon, 14 Feb 2022 13:04:33 +0900
Subject: [PATCH v2] Correctly update control file at the end of archive
 recovery

CreateRestartPoint runs WAL file cleanup based on the checkpoint that has
just finished in the function. If the database has exited the
DB_IN_ARCHIVE_RECOVERY state by the time the function is about to update
the control file, the function refrains from updating the file at all, then
proceeds to WAL cleanup using the latest REDO LSN, which is now
inconsistent with the control file. As a result, the succeeding cleanup
procedure removes WAL files that are still needed according to the control
file and leaves the database unrecoverable until the next checkpoint
finishes.

Along with that fix, we remove a dead code path for the case where some
other process ran a simultaneous checkpoint. It seems like just a
preventive measure, but it's no longer useful because we are sure that
checkpoints are performed only by the checkpointer except in single-process
mode.
---
 src/backend/access/transam/xlog.c | 73 ---
 1 file changed, 47 insertions(+), 26 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6208e123e5..ff4a90eacc 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9587,6 +9587,9 @@ CreateRestartPoint(int flags)
 	XLogSegNo	_logSegNo;
 	TimestampTz xtime;

+	/* we don't assume concurrent checkpoint/restartpoint to run */
+	Assert (!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
+
 	/* Get a local copy of the last safe checkpoint record. */
 	SpinLockAcquire(&XLogCtl->info_lck);
 	lastCheckPointRecPtr = XLogCtl->lastCheckPointRecPtr;
@@ -9653,7 +9656,7 @@ CreateRestartPoint(int flags)
 	/* Also update the info_lck-protected copy */
 	SpinLockAcquire(&XLogCtl->info_lck);
-	XLogCtl->RedoRecPtr = lastCheckPoint.redo;
+	XLogCtl->RedoRecPtr = RedoRecPtr;
 	SpinLockRelease(&XLogCtl->info_lck);

 	/*
@@ -9672,7 +9675,10 @@
 	/* Update
Re: [PATCH] Expose port->authn_id to extensions and triggers
Hi,

On Thu, Feb 24, 2022 at 09:18:26PM -0800, Andres Freund wrote:
> On 2022-02-25 13:01:26 +0800, Julien Rouhaud wrote:
> > On Thu, Feb 24, 2022 at 08:44:08PM +0000, Jacob Champion wrote:
> > >
> > > Yeah... I was following a similar track with the initial work last
> > > year, but I dropped it when the cost of implementation started to
> > > grow considerably. At the time, though, it looked like some overhauls
> > > to the stats framework were being pursued? I should read up on that
> > > thread.
> >
> > Do you mean the shared memory stats patchset? IIUC this is unrelated,
> > as the beentry stuff Michael was talking about is a different
> > infrastructure (backend_status.[ch]), and I don't think there are any
> > plans to move that to dynamic shared memory.
>
> Until a year ago the backend_status.c stuff was in pgstat.c - I just
> split them out because of the shared memory status work - so it'd not be
> surprising for them to mentally be thrown in one bucket.

Right.

> Basically the type of stats we're trying to move to dynamic shared
> memory is about counters that should persist for a while and are
> accumulated across connections etc. Whereas backend_status.c is more
> about tracking the current state (what query is a backend running, what
> user, etc).

But would it be acceptable to use dynamic shared memory in backend_status
and e.g. have a dsa_pointer rather than a fixed-length array? That seems
like a bad idea for query text in general, but authn_id for instance seems
less likely to hold gigantic strings, and would also have a more or less
consistent size when provided.
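To make the idea concrete, a sketch of the shape being floated (the field
name and its placement are assumptions, not a worked-out proposal;
PgBackendStatus itself lives in backend_status.h):

    typedef struct PgBackendStatus
    {
        /* ... existing fixed-size fields ... */

        /*
         * Variable-length authn_id stored in dynamic shared memory,
         * replacing a fixed char array; InvalidDsaPointer when unset.
         */
        dsa_pointer st_authn_id;
    } PgBackendStatus;

The question then becomes who allocates and frees the DSA chunk, and how
readers cope with it changing underneath them, which is where the
implementation cost discussed above comes in.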
Re: Add id's to various elements in protocol.sgml
On 24.02.2022 at 17:07, Brar Piening wrote:
> On 24.02.2022 at 16:46, Alvaro Herrera wrote:
> > Would it be possible to create such anchor links as part of the XSL
> > stylesheets for HTML?
>
> I'll investigate our options and report back.

Yes, that would be possible. In fact, appending a link and optionally
adding a tiny bit of CSS like I show below does the trick. The major
problem in that regard would probably be my lack of XSLT/docbook skills,
but if no one can jump in here, I can see if I can make it work.

Obviously adding the links via javascript would also work (and even be
easier for me personally), but that seems like the second-best solution to
me since it involves javascript where no javascript is needed.

Personally I consider having ids to link to, and making them more
comfortable to use/find, as orthogonal problems in that case (mostly
developer documentation), so IMHO solving this doesn't necessarily need to
hold back the original patch.

Insert # ...

.variablelist a.anchor { visibility: hidden; }
.variablelist *:hover > a.anchor,
.variablelist a.anchor:focus { visibility: visible; }
Re: Design of pg_stat_subscription_workers vs pgstats
On Thu, Feb 24, 2022 at 9:20 PM Masahiko Sawada wrote:
> > 6. src/backend/postmaster/pgstat.c - function name
> >
> > +/* --
> > + * pgstat_reset_subscription_counter() -
> > + *
> > + * Tell the statistics collector to reset a single subscription
> > + * counter, or all subscription counters (when subid is InvalidOid).
> > + *
> > + * Permission checking for this function is managed through the normal
> > + * GRANT system.
> > + * --
> > + */
> > +void
> > +pgstat_reset_subscription_counter(Oid subid)
> >
> > SUGGESTION (function name)
> > "pgstat_reset_subscription_counter" -->
> > "pgstat_reset_subscription_counters" (plural)
>
> Fixed.

We don't use the plural form in other similar cases like
pgstat_reset_replslot_counter and pgstat_reset_slru_counter, so why do it
differently here?

--
With Regards,
Amit Kapila.
Re: [PATCH] Expose port->authn_id to extensions and triggers
Hi,

On 2022-02-25 13:01:26 +0800, Julien Rouhaud wrote:
> On Thu, Feb 24, 2022 at 08:44:08PM +0000, Jacob Champion wrote:
> >
> > Yeah... I was following a similar track with the initial work last
> > year, but I dropped it when the cost of implementation started to grow
> > considerably. At the time, though, it looked like some overhauls to the
> > stats framework were being pursued? I should read up on that thread.
>
> Do you mean the shared memory stats patchset? IIUC this is unrelated, as
> the beentry stuff Michael was talking about is a different infrastructure
> (backend_status.[ch]), and I don't think there are any plans to move that
> to dynamic shared memory.

Until a year ago the backend_status.c stuff was in pgstat.c - I just split
them out because of the shared memory status work - so it'd not be
surprising for them to mentally be thrown in one bucket.

Basically the type of stats we're trying to move to dynamic shared memory
is about counters that should persist for a while and are accumulated
across connections etc. Whereas backend_status.c is more about tracking the
current state (what query is a backend running, what user, etc).

They're not unrelated though: backend_status.c feeds pgstat.c with some
information occasionally.

Greetings,

Andres Freund
Re: [PATCH] Expose port->authn_id to extensions and triggers
On Thu, Feb 24, 2022 at 08:44:08PM +0000, Jacob Champion wrote:
>
> Yeah... I was following a similar track with the initial work last year,
> but I dropped it when the cost of implementation started to grow
> considerably. At the time, though, it looked like some overhauls to the
> stats framework were being pursued? I should read up on that thread.

Do you mean the shared memory stats patchset? IIUC this is unrelated, as
the beentry stuff Michael was talking about is a different infrastructure
(backend_status.[ch]), and I don't think there are any plans to move that
to dynamic shared memory.
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Sun, Feb 20, 2022 at 12:27 PM Peter Geoghegan wrote:
> You've given me a lot of high quality feedback on all of this, which
> I'll work through soon. It's hard to get the balance right here, but
> it's made much easier by this kind of feedback.

Attached is v9. Lots of changes. Highlights:

* Much improved 0001 ("loosen coupling" dynamic relfrozenxid tracking
patch). Some of the improvements are due to recent feedback from Robert.

* Much improved 0002 ("Make page-level characteristics drive freezing"
patch). Whole new approach to the implementation, though the same algorithm
as before.

* No more FSM patch -- that was totally separate work, that I shouldn't
have attached to this project.

* There are 2 new patches (these are now 0003 and 0004), both of which are
concerned with allowing non-aggressive VACUUM to consistently advance
relfrozenxid. I think that 0003 makes sense on general principle, but I'm
much less sure about 0004. These aren't too important.

While working on the new approach to freezing taken by v9-0002, I had some
insight about the issues that Robert raised around 0001, too. I wasn't
expecting that to happen.

0002 makes page-level freezing a first class thing.
heap_prepare_freeze_tuple now has some (limited) knowledge of how this
works. heap_prepare_freeze_tuple's cutoff_xid argument is now always the
VACUUM caller's OldestXmin (not its FreezeLimit, as before). We still have
to pass FreezeLimit to heap_prepare_freeze_tuple, which helps us to respect
FreezeLimit as a backstop, and so now it's passed via the new
backstop_cutoff_xid argument instead. Whenever we opt to "freeze a page",
the new page-level algorithm *always* uses the most recent possible XID and
MXID values (OldestXmin and oldestMxact) to decide what XIDs/XMIDs need to
be replaced. That might sound like it'd be too much, but it only applies to
those pages that we actually decide to freeze (since page-level
characteristics drive everything now). FreezeLimit is only one way of
triggering that now (and one of the least interesting and rarest).

0002 also adds an alternative set of relfrozenxid/relminmxid tracker
variables, to make the "don't freeze the page" path within lazy_scan_prune
simpler (if you don't want to freeze the page, then use the set of tracker
variables that go with that choice, which heap_prepare_freeze_tuple knows
about and helps with). With page-level freezing, lazy_scan_prune wants to
make a decision about the page as a whole, at the last minute, after all
heap_prepare_freeze_tuple calls have already been made. So I think that
heap_prepare_freeze_tuple needs to know about that aspect of
lazy_scan_prune's behavior. When we *don't* want to freeze the page, we
more or less need everything related to freezing inside lazy_scan_prune to
behave like lazy_scan_noprune, which never freezes the page (that's mostly
the point of lazy_scan_noprune). And that's almost what we actually do --
heap_prepare_freeze_tuple now outsources maintenance of this alternative
set of "don't freeze the page" relfrozenxid/relminmxid tracker variables to
its sibling function, heap_tuple_needs_freeze. That is the same function
that lazy_scan_noprune itself actually calls.

Now back to Robert's feedback on 0001, which had very complicated comments
in the last version. This approach seems to make the "being versus
becoming" or "going to freeze versus not going to freeze" distinctions much
clearer. This is less true if you assume that 0002 won't be committed but
0001 will be.

Even if that happens with Postgres 15, I have to imagine that adding
something like 0002 must be the real goal, long term. Without 0002, the
value from 0001 is far more limited. You need both together to get the
virtuous cycle I've described.

The approach with always using OldestXmin as cutoff_xid and oldestMxact as
our cutoff_multi makes a lot of sense to me, in part because I think that
it might well cut down on the tendency of VACUUM to allocate new MultiXacts
in order to be able to freeze old ones. AFAICT the only reason that
heap_prepare_freeze_tuple does that is because it has no flexibility on
FreezeLimit and MultiXactCutoff. These are derived from
vacuum_freeze_min_age and vacuum_multixact_freeze_min_age, respectively,
and so they're two independent though fairly meaningless cutoffs. On the
other hand, OldestXmin and OldestMxact are not independent in the same way.
We get both of them at the same time and the same place, in
vacuum_set_xid_limits. OldestMxact really is very close to OldestXmin --
only the units differ.

It seems that heap_prepare_freeze_tuple allocates new MXIDs (when freezing
old ones) in large part so it can NOT freeze XIDs that it would have been
useful (and much cheaper) to remove anyway. On HEAD, FreezeMultiXactId()
doesn't get passed down the VACUUM operation's OldestXmin at all (it
actually just gets FreezeLimit passed as its cutoff_xid argument). It
cannot possibly recognize any of this for itself.

Does that theory about MultiXacts
Re: Separate the result of \watch for each query execution (psql)
pá 25. 2. 2022 v 5:23 odesílatel Noboru Saito napsal:
> Hi,
>
> Pavel Stehule:
> > I strongly agree. It was a lot of work to find a workable solution for
> > pspg. Special chars that start a result, and maybe others that end a
> > result, can significantly increase robustness and can reduce code. I
> > think it can be better to use a form feed at the end of the form -
> > like the semantics of form feed. You know, at this moment, the result
> > is complete. https://en.wikipedia.org/wiki/Page_break
>
> It's easier to print a form feed before the result, but it's okay at the
> end.
>
> > I don't think using it by default can be the best. Lots of people
> > don't use specialized pagers, but it can be set by \pset. Form feed
> > should be used at the end
> >
> > \pset formfeed [on, off]
>
> I think it's a good idea to be able to switch with \pset.
>
> I have created a patch that allows you to turn it on and off in \pset.
> The attached patch adds the following features.
>
> Formfeed can be turned on with the command line option or \pset.
> Formfeed (\f\n) is output after the query execution result by \watch.
>
> I think the considerations are as follows.
>
> * Is formfeed output after the result, not before?

We are talking about the first iteration. In the second and later
iterations this question has no sense. You know the starting point. You
don't know the endpoint. So I think using a form feed at the end is a good
idea.

> * Is the formfeed output only "\f\n"?

Yes.

> * Is the formfeed output only executed by \watch?

This is a good question. I think the implementation for \watch is a good
start. But it can help with normal results too. In pspg it can be very
useful for incremental load or for streaming mode. But if it will be used
everywhere, then it should be used just for some specified pagers.

> * Is the name "formfeed" appropriate?

If it will do the work of a form feed, then formfeed is a good name.

> If the formfeed is output before the query result,
> it will be better if the screen is reset when the formfeed is read.

I think you are proposing another feature - a reset-terminal sequence - it
can be a good idea. Not for pspg, but generally, why not.

Regards

Pavel

> Furthermore, if the terminal clear string can be set instead of the
> formfeed character, the \watch output can be fixedly displayed without a
> pager.
>
> (I noticed that if I set it in the title, it would behave similarly in
> the current version.
> # \C '\033[2J;\033[0;0H'
> # SELECT now();\watch 1
> )
>
> Also, it may be good to output formfeed when outputting a file with
> `\o result.txt` other than \watch.
Re: BufferAlloc: don't take two simultaneous locks
On Mon, 21 Feb 2022 at 08:06, Yura Sokolov wrote:
>
> Good day, Kyotaro Horiguchi and hackers.
>
> On Thu, 17/02/2022 at 14:16 +0900, Kyotaro Horiguchi wrote:
> > At Wed, 16 Feb 2022 10:40:56 +0300, Yura Sokolov wrote in
> > > Hello, all.
> > >
> > > I thought about patch simplification, and tested a version
> > > without BufTable and dynahash API changes at all.
> > >
> > > It performs surprisingly well. It is just a bit worse
> > > than v1 since there is more contention around dynahash's
> > > freelist, but most of the improvement remains.
> > >
> > > I'll finish benchmarking and will attach graphs with the
> > > next message. Patch is attached here.
> >
> > Thanks for the new patch. The patch as a whole looks fine to me. But
> > some comments need to be revised.
>
> Thank you for review and remarks.

v3 gets the buffer partition locking right, well done, great results!

In v3, the comment at line 1279 still implies we take both locks together,
which is not now the case.

Dynahash actions are still possible. You now have the BufTableDelete before
the BufTableInsert, which opens up the possibility I discussed here:
http://postgr.es/m/CANbhV-F0H-8oB_A+m=55hP0e0QRL=rdddqusxmtft6jprdx...@mail.gmail.com
(Apologies for raising a similar topic, I hadn't noticed this thread
before; thanks to Horiguchi-san for pointing this out).

v1 had a horrible API (sorry!) where you returned the entry and then
explicitly re-used it. I think we *should* make changes to dynahash, but
not with the API you proposed.

Proposal for new BufTable API:

BufTableReuse() - similar to BufTableDelete() but does NOT put the entry
back on the freelist; we remember it in a private single-item cache in
dynahash.

BufTableAssign() - similar to BufTableInsert() but can only be executed
directly after BufTableReuse(), fails with ERROR otherwise. Takes the entry
from the single-item cache and re-assigns it to the new tag.

In dynahash we have two new modes that match the above:

HASH_REUSE - used by BufTableReuse(), similar to HASH_REMOVE, but places
the entry in the single-item cache, avoiding the freelist.

HASH_ASSIGN - used by BufTableAssign(), similar to HASH_ENTER, but uses the
entry from the single-item cache, rather than asking the freelist. This
last call can fail if someone else already inserted the tag, in which case
it adds the single-item cache entry back onto the freelist.

Notice that the single-item cache is not in shared memory, so on abort we
should give it back, so we probably need an extra API call for that also to
avoid leaking an entry.

Doing it this way allows us to
* avoid touching freelists altogether in the common path - we know we are
about to reassign the entry, so we just remember it - no contention from
other backends, no borrowing etc.
* avoid sharing the private details outside of the dynahash module
* use the same technique elsewhere that we have partitioned hash tables

This approach is cleaner than v1, but should also perform better because
there will be a 1:1 relationship between a buffer and its dynahash entry,
most of the time.

With these changes, I think we will be able to *reduce* the number of
freelists for partitioned dynahash from 32 to maybe 8, as originally
speculated by Robert in 2016:
https://www.postgresql.org/message-id/CA%2BTgmoZkg-04rcNRURt%3DjAG0Cs5oPyB-qKxH4wqX09e-oXy-nw%40mail.gmail.com
since the freelists will be much less contended with the above approach.

It would be useful to see performance with a higher number of connections,
above 400.

--
Simon Riggs
http://www.EnterpriseDB.com/
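To make that proposal concrete, a minimal sketch of how the eviction path
would use it (signatures assumed by analogy with BufTableDelete and
BufTableInsert; none of this exists yet):

    /* detach the victim's mapping entry, but keep it in the private cache */
    BufTableReuse(&oldTag, oldHash);        /* HASH_REUSE under the covers */

    /* ... flush and invalidate the victim buffer ... */

    /* re-attach the cached entry under the new tag */
    existing_id = BufTableAssign(&newTag, newHash, victim_buf_id);
    if (existing_id >= 0)
    {
        /*
         * Somebody else inserted newTag first; the cached entry has been
         * returned to the freelist and we use the existing buffer instead.
         */
    }

    /* on transaction abort, a cleanup call returns any still-cached entry */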
Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
> Nice catch! However, I'm not sure I like the patch.
>
> 	 * made it through and start writing after the portion that persisted.
> 	 * (It's critical to first write an OVERWRITE_CONTRECORD message, which
> 	 * we'll do as soon as we're open for writing new WAL.)
> +	 *
> +	 * If the last wal record is ahead of the missing contrecord, this is a
> +	 * recently promoted primary and we should not write an overwrite
> +	 * contrecord.
>
> Before the part, the comment follows the part shown below.
>
> >	 * Actually, if WAL ended in an incomplete record, skip the parts that
>
> So, actually WAL did not end in an incomplete record. I think
> FinishWalRecover is the last place to do that. (But it could be earlier.)

Thanks for the feedback.

##
On the primary, an OVERWRITE_CONTRECORD is written. The aborted record
(abortedRecPtr) and the insert position of the missing contrecord
(missingContrecPtr) are set during FinishWalRecovery, and after recovery
but before writes are accepted, the OVERWRITE_CONTRECORD is written.

## on the primary logs
2022-02-25 02:25:16.208 UTC [7397] LOG: redo done at 0/1FFD2B8 system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s
2022-02-25 02:25:16.208 UTC [7397] DEBUG: resetting unlogged relations: cleanup 0 init 1
2022-02-25 02:25:16.209 UTC [7397] DEBUG: creating and filling new WAL file
2022-02-25 02:25:16.217 UTC [7397] DEBUG: done creating and filling new WAL file
2022-02-25 02:25:16.217 UTC [7397] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2022-02-25 02:25:16.217 UTC [7397] DEBUG: MultiXact member stop limit is now 4294914944 based on MultiXact 1
2022-02-25 02:25:16.217 UTC [7397] DEBUG: OVERWRITE_CONTRECORD inserted at 0/228 for aborted record 0/1FFD2E0  <<--- Attached V3 of patch that adds logging which shows the overwrite record created on the primary
2022-02-25 02:25:16.217 UTC [7395] LOG: checkpoint starting: end-of-recovery immediate wait
2022-02-25 02:25:16.217 UTC [7395] DEBUG: performing replication slot checkpoint
2022-02-25 02:25:16.218 UTC [7395] DEBUG: attempting to remove WAL segments older than log file
2022-02-25 02:25:16.218 UTC [7395] LOG: checkpoint complete: wrote 131 buffers (102.3%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s; distance=11381 kB, estimate=11381 kB
2022-02-25 02:25:16.219 UTC [7394] DEBUG: starting background worker process "logical replication launcher"
2022-02-25 02:25:16.219 UTC [7394] LOG: database system is ready to accept connections

##
A downstream standby now skips over this OVERWRITE_CONTRECORD, but the
value for missingContrecPtr is not invalidated afterwards.

## on the standby logs
2022-02-25 02:25:16.616 UTC [7413] DEBUG: sending hot standby feedback xmin 0 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
2022-02-25 02:25:16.616 UTC [7387] LOG: successfully skipped missing contrecord at 0/1FFD2E0, overwritten at 2022-02-25 02:25:16.2175+00
2022-02-25 02:25:16.616 UTC [7387] CONTEXT: WAL redo at 0/228 for XLOG/OVERWRITE_CONTRECORD: lsn 0/1FFD2E0; time 2022-02-25 02:25:16.2175+00

##
After promotion, the standby attempts to write the overwrite_contrecord
again using the missingContrecPtr LSN, which is now in the past.

## on the standby logs
2022-02-25 02:25:16.646 UTC [7387] LOG: invalid record length at 0/201EC70: wanted 24, got 0
2022-02-25 02:25:16.646 UTC [7387] LOG: redo done at 0/201EC48 system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.55 s
2022-02-25 02:25:16.646 UTC [7387] LOG: last completed transaction was at log time 2022-02-25 02:25:16.301554+00
2022-02-25 02:25:16.646 UTC [7387] DEBUG: resetting unlogged relations: cleanup 0 init 1
2022-02-25 02:25:16.646 UTC [7387] LOG: selected new timeline ID: 2
2022-02-25 02:25:16.646 UTC [7387] DEBUG: updated min recovery point to 0/201EC70 on timeline 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG: could not remove file "pg_wal/00020002": No such file or directory
2022-02-25 02:25:16.656 UTC [7387] LOG: archive recovery complete
2022-02-25 02:25:16.656 UTC [7387] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG: MultiXact member stop limit is now 4294914944 based on MultiXact 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG: OVERWRITE_CONTRECORD inserted at 0/228 for aborted record 0/1FFD2E0  <<--- The same overwrite record is written on the recently promoted standby
2022-02-25 02:25:16.656 UTC [7387] DEBUG: attempting to remove WAL segments newer than log file 00020001
2022-02-25 02:25:16.656 UTC [7387] DEBUG: recycled write-ahead log file "00010002"
2022-02-25 02:25:16.656 UTC [7387] DEBUG: release all standby locks
2022-02-25
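FWIW, another possible shape of a fix would be to simply forget the
aborted-contrecord state once the OVERWRITE_CONTRECORD record has been
written, so that a later promotion cannot replay it (the variable names are
the real ones from xlog.c; the exact placement is an assumption):

    /* after successfully inserting the OVERWRITE_CONTRECORD record */
    abortedRecPtr = InvalidXLogRecPtr;
    missingContrecPtr = InvalidXLogRecPtr;

That would make the promoted standby's end-of-recovery code skip the "write
an overwrite contrecord" path entirely, because the state no longer looks
like an aborted record.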
Re: Separate the result of \watch for each query execution (psql)
Hi,

Pavel Stehule:
> I strongly agree. It was a lot of work to find a workable solution for
> pspg. Special chars that start a result, and maybe others that end a
> result, can significantly increase robustness and can reduce code. I
> think it can be better to use a form feed at the end of the form - like
> the semantics of form feed. You know, at this moment, the result is
> complete. https://en.wikipedia.org/wiki/Page_break

It's easier to print a form feed before the result, but it's okay at the
end.

> I don't think using it by default can be the best. Lots of people don't
> use specialized pagers, but it can be set by \pset. Form feed should be
> used at the end
>
> \pset formfeed [on, off]

I think it's a good idea to be able to switch with \pset.

I have created a patch that allows you to turn it on and off in \pset.
The attached patch adds the following features.

Formfeed can be turned on with the command line option or \pset.
Formfeed (\f\n) is output after the query execution result by \watch.

I think the considerations are as follows.

* Is formfeed output after the result, not before?
* Is the formfeed output only "\f\n"?
* Is the formfeed output only executed by \watch?
* Is the name "formfeed" appropriate?

If the formfeed is output before the query result, it will be better if the
screen is reset when the formfeed is read.

Furthermore, if the terminal clear string can be set instead of the
formfeed character, the \watch output can be fixedly displayed without a
pager.

(I noticed that if I set it in the title, it would behave similarly in the
current version.
# \C '\033[2J;\033[0;0H'
# SELECT now();\watch 1
)

Also, it may be good to output formfeed when outputting a file with
`\o result.txt` other than \watch.

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index caabb06c53..68a876c8bd 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -256,6 +256,16 @@ EOF
+
+      --formfeed
+
+      Turn on the formfeed. This is equivalent to
+      \pset formfeed.
+
+
       -h hostname
       --host=hostname
@@ -2879,6 +2889,14 @@ lo_import 152801
       Unique abbreviations are allowed.
+
+      formfeed
+
+      Outputs formfeed after every result when using the
+      \watch command.
+
       aligned format is the standard, human-readable,
       nicely formatted text output; this is the default.

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 292cff5df9..2a1fd6a22d 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -2242,8 +2242,8 @@ exec_command_pset(PsqlScanState scan_state, bool active_branch)
 		int			i;
 		static const char *const my_list[] = {
 			"border", "columns", "csv_fieldsep", "expanded", "fieldsep",
-			"fieldsep_zero", "footer", "format", "linestyle", "null",
-			"numericlocale", "pager", "pager_min_lines",
+			"fieldsep_zero", "footer", "format", "formfeed", "linestyle",
+			"null", "numericlocale", "pager", "pager_min_lines",
 			"recordsep", "recordsep_zero", "tableattr", "title",
 			"tuples_only", "unicode_border_linestyle",
@@ -4532,6 +4532,16 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 		if (value)
 			popt->topt.columns = atoi(value);
 	}
+
+	/* toggle output formfeed */
+	else if (strcmp(param, "formfeed") == 0)
+	{
+		if (value)
+			return ParseVariableBool(value, param, &popt->topt.formfeed);
+		else
+			popt->topt.formfeed = !popt->topt.formfeed;
+	}
+
 	else
 	{
 		pg_log_error("\\pset: unknown option: %s", param);
@@ -4701,6 +4711,15 @@ printPsetInfo(const char *param, printQueryOpt *popt)
 			printf(_("Tuples only is off.\n"));
 	}
+
+	/* show toggle output formfeed */
+	else if (strcmp(param, "formfeed") == 0)
+	{
+		if (popt->topt.formfeed)
+			printf(_("formfeed is on.\n"));
+		else
+			printf(_("formfeed is off.\n"));
+	}
+
 	/* Unicode style formatting */
 	else if (strcmp(param, "unicode_border_linestyle") == 0)
 	{
@@ -4895,6 +4914,8 @@ pset_value_string(const char *param, printQueryOpt *popt)
 		return popt->title ? pset_quoted_string(popt->title) : pstrdup("");
 	else if (strcmp(param, "tuples_only") == 0)
 		return pstrdup(pset_bool_string(popt->topt.tuples_only));
+	else if (strcmp(param, "formfeed") == 0)
+		return pstrdup(pset_bool_string(popt->topt.formfeed));
 	else if (strcmp(param, "unicode_border_linestyle") == 0)
 		return pstrdup(_unicode_linestyle2string(popt->topt.unicode_border_linestyle));
 	else if (strcmp(param, "unicode_column_linestyle") == 0)
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index d65b9a124f..63eff20c55 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -647,6 +647,8 @@ PSQLexecWatch(const char *query, const printQueryOpt *opt, FILE *printQueryFout)
 {
 	case
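A usage sketch, going by the option names in the patch above (the status
line is what the patched printPsetInfo() would print):

    $ psql --formfeed
    ...
    postgres=# \pset formfeed on
    formfeed is on.
    postgres=# SELECT now();\watch 1

Each \watch result would then be followed by "\f\n", which a pager such as
pspg could treat as an end-of-result marker.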
Re: Optionally automatically disable logical replication subscriptions on error
On Thu, Feb 24, 2022 at 6:30 PM Masahiko Sawada wrote:
> On Thu, Feb 24, 2022 at 8:08 PM Amit Kapila wrote:
> > On Thu, Feb 24, 2022 at 1:20 PM Masahiko Sawada wrote:
> > >
> > > Here are some comments:
> > >
> > > Why do we need SyncTableStartWrapper() and ApplyLoopWrapper()?
> >
> > I have given this comment to move the related code to separate
> > functions to slightly simplify the ApplyWorkerMain() code, but if you
> > don't like it we can move it back. I am not sure I like the new
> > function names in the patch though.
>
> Okay, I'm fine with moving this code but perhaps we can find a better
> function name, as "Wrapper" seems slightly odd to me.

Agreed.

> For example, start_table_sync_start() and start_apply_changes() or
> something (it seems we use the snake case for static functions in
> worker.c).

I am fine with something on these lines; how about start_table_sync() and
start_apply() respectively?

--
With Regards,
Amit Kapila.
Re: Proposal: Support custom authentication methods using hooks
Hi,

On 2022-02-24 17:02:45 -0800, Jeff Davis wrote:
> On Thu, 2022-02-17 at 11:25 -0800, samay sharma wrote:
> One caveat is that this only works given information available from
> existing authentication methods, because that's all the client
> supports. In practice, it seems to only be useful with plaintext
> password authentication over an SSL connection.

Why is it restricted to that? You could do SASL negotiation as well from
what I can see? And that'd theoretically also allow negotiating whether the
client supports different ways of doing auth. Not saying that that's easy,
but I don't think it's a fundamental restriction.

I also can imagine things like using selinux labeling of connections.

We have several useful authentication technologies built on top of a
plaintext exchange. RADIUS, LDAP, and PAM afaics could be implemented as an
extension?

Greetings,

Andres Freund
Re: Buffer Manager and Contention
On Fri, 25 Feb 2022 at 01:20, Kyotaro Horiguchi wrote:
> Yura Sokolov is proposing a patch to separate the two partition locks.
>
> https://www.postgresql.org/message-id/1edbb61981fe1d99c3f20e3d56d6c88999f4227c.camel%40postgrespro.ru
>
> And it seems to me viable for me and a benchmarking in the thread
> showed a good result. I'd appreciate your input on that thread.

Certainly, I will stop this thread and contribute to Yura's thread.

--
Simon Riggs
http://www.EnterpriseDB.com/
Re: why do hash index builds use smgrextend() for new splitpoint pages
On Fri, Feb 25, 2022 at 8:54 AM Amit Kapila wrote:
> On Fri, Feb 25, 2022 at 4:41 AM Melanie Plageman wrote:
> >
> > I'm trying to understand why hash indexes are built primarily in shared
> > buffers except when allocating a new splitpoint's worth of bucket pages
> > -- which is done with smgrextend() directly in _hash_alloc_buckets().
> >
> > Is this just so that the value returned by smgrnblocks() includes the
> > new splitpoint's worth of bucket pages?
> >
> > All writes of tuple data to pages in this new splitpoint will go
> > through shared buffers (via hash_getnewbuf()).
> >
> > I asked this and got some thoughts from Robert in [1], but I still
> > don't really get it.
> >
> > When a new page is needed during the hash index build, why can't
> > _hash_expandtable() just call ReadBufferExtended() with P_NEW instead
> > of _hash_getnewbuf()? Does it have to do with the BUCKET_TO_BLKNO
> > mapping?
>
> We allocate the chunk of pages (power-of-2 groups) at the time of
> split which allows them to appear consecutively in an index.

I think allocating chunks of pages via "ReadBufferExtended() with P_NEW"
will be time-consuming compared to what we do now.

--
With Regards,
Amit Kapila.
Re: Buffer Manager and Contention
At Fri, 25 Feb 2022 10:20:25 +0900 (JST), Kyotaro Horiguchi wrote in
> (I added Yura, as the author of a related patch)
>
> At Thu, 24 Feb 2022 12:58:23 +0000, Simon Riggs wrote in
> > Thinking about poor performance in the case where the data fits in
> > RAM, but the working set is too big for shared_buffers, I notice a
> > couple of things that seem bad in BufMgr, but don't understand why
> > they are like that.
> >
> > 1. If we need to allocate a buffer to a new block we do this in one
> > step, while holding both partition locks for the old and the new tag.
> > Surely it would cause less contention to make the old block/tag
> > invalid (after flushing), drop the old partition lock and then switch
> > to the new one? i.e. just hold one mapping partition lock at a time.
> > Is there a specific reason we do it this way?
>
> I'm not sure, but I guess the developer wanted to make the operation
> atomic.
>
> Yura Sokolov is proposing a patch to separate the two partition locks.
>
> https://www.postgresql.org/message-id/1edbb61981fe1d99c3f20e3d56d6c88999f4227c.camel%40postgrespro.ru
>
> And it seems viable to me, and a benchmark in the thread showed a good
> result. I'd appreciate your input on that thread.
>
> > 2. Possibly connected to the above, we issue BufTableInsert() BEFORE
> > we issue BufTableDelete(). That means we need extra entries in the
> > buffer mapping hash table to allow us to hold both the old and the new
> > at the same time, for a short period. The way dynahash.c works, we try
>
> Yes.
>
> > to allocate an entry from the freelist and if that doesn't work, we
> > begin searching ALL the freelists for free entries to steal. So if we
> > get enough people trying to do virtual I/O at the same time, then we
> > will hit a "freelist storm" where everybody is chasing the last few
> > entries. It would make more sense if we could do BufTableDelete()
>
> To reduce that overhead, Yura proposed a surgical modification of
> dynahash, but I didn't like that and the latest patch doesn't contain
> that part.
>
> > first, then hold onto the buffer mapping entry rather than add it to
> > the freelist, so we can use it again when we do BufTableInsert() very
> > shortly afterwards. That way we wouldn't need to search the freelist
> > at all. What is the benefit or reason of doing the Delete after the
> > Insert?
>
> Hmm, something like hash_swap_key() or hash_reinsert_entry()? That
> sounds reasonable. (Yura's proposal was taking out an entry from the
> hash then attaching it with a new key again.)
>
> > Put that another way, it looks like BufTable functions are used in a
> > way that mismatches against the way dynahash is designed.
> >
> > Thoughts?
>
> On the first part, I think Yura's patch works. On the second point,
> Yura already showed it gives a certain amount of gain if we do that.

On second thought, even if we have a new dynahash API to atomically replace
a hash key, we would need to hold two partition locks to do that. That
contradicts our objective.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: pg_stat_get_replication_slot and pg_stat_get_subscription_worker incorrectly marked as proretset
On Thu, Feb 24, 2022 at 12:03 PM Michael Paquier wrote: > > On Wed, Feb 23, 2022 at 09:52:02AM +0530, Amit Kapila wrote: > > Thanks, so you are okay with me pushing that patch just to HEAD. > > Yes, I am fine with that. I am wondering about patching the second > function though, to avoid any risk of forgetting it, but I am fine to > leave that to your judgement. > The corresponding patch with other changes is not very far from being ready to commit. So, will do it along with that. > > We don't want to backpatch this to 14 as this is a catalog change and > > won't cause any user-visible issue, is that correct? > > Yup, that's a HEAD-only cleanup, I am afraid. > Thanks, Done! -- With Regards, Amit Kapila.
Re: why do hash index builds use smgrextend() for new splitpoint pages
On Fri, Feb 25, 2022 at 4:41 AM Melanie Plageman wrote: > > I'm trying to understand why hash indexes are built primarily in shared > buffers except when allocating a new splitpoint's worth of bucket pages > -- which is done with smgrextend() directly in _hash_alloc_buckets(). > > Is this just so that the value returned by smgrnblocks() includes the > new splitpoint's worth of bucket pages? > > All writes of tuple data to pages in this new splitpoint will go > through shared buffers (via hash_getnewbuf()). > > I asked this and got some thoughts from Robert in [1], but I still don't > really get it. > > When a new page is needed during the hash index build, why can't > _hash_expandtable() just call ReadBufferExtended() with P_NEW instead of > _hash_getnewbuf()? Does it have to do with the BUCKET_TO_BLKNO mapping? > We allocate the chunk of pages (power-of-2 groups) at the time of split which allows them to appear consecutively in an index. This helps us to compute the physical block number from bucket number easily (BUCKET_TO_BLKNO mapping) with some minimal control information. -- With Regards, Amit Kapila.
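As a toy, self-contained illustration of that mapping (not the real BUCKET_TO_BLKNO macro, which additionally accounts for bitmap and overflow pages via the hashm_spares array in the metapage):

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    static uint32_t
    toy_bucket_to_blkno(uint32_t bucket)
    {
        /*
         * Block 0 is the metapage; bucket pages follow consecutively
         * because every split reserves a full power-of-2 group at once.
         * The real macro also adds the count of overflow/bitmap pages
         * allocated before this bucket's splitpoint (hashm_spares).
         */
        return bucket + 1;
    }

    int
    main(void)
    {
        for (uint32_t bucket = 0; bucket < 8; bucket++)
            printf("bucket %" PRIu32 " -> block %" PRIu32 "\n",
                   bucket, toy_bucket_to_blkno(bucket));
        return 0;
    }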
Re: Add id's to various elements in protocol.sgml
On Thu, Feb 24, 2022, 16:52 Kyotaro Horiguchi wrote: > At Tue, 21 Dec 2021 08:47:27 +0100, Brar Piening wrote in > > On 20.12.2021 at 16:09, Robert Haas wrote: > > > As a data point, this is something I have also wanted to do, from time > > > to time. I am generally of the opinion that any place the > > +1 from me. When I put a URL in the answer for inquiries, I always > look into the html for name/id tags so that the inquirer quickly finds > the information source (or the backing or reference point) on the > page. +1 here as well. I often do the exact same thing. It's not hard, but it's a little tedious, especially considering most modern doc systems support linkable sections.
Re: Design of pg_stat_subscription_workers vs pgstats
Below are my review comments for the v3 patch. == 1. Commit message (An earlier review comment [Peter-v2] #2 was only partly fixed) "there are no longer any relation information" --> "there is no longer any relation information" ~~~ 2. doc/src/sgml/monitoring.sgml + pg_stat_subscription_activitypg_stat_subscription_activity + One row per subscription, showing statistics about subscription + activity. + See + pg_stat_subscription_activity for details. Currently these stats are only about errors. These are not really statistics about "activity" though. Probably it is better just to avoid that word altogether? SUGGESTIONS e.g.1. "One row per subscription, showing statistics about subscription activity." --> "One row per subscription, showing statistics about errors." e.g.2. "One row per subscription, showing statistics about subscription activity." --> "One row per subscription, showing statistics about that subscription." - [Peter-v2] https://www.postgresql.org/message-id/CAHut%2BPv%3DVmXtHmPKp4fg8VDF%2BTQP6xWgL91Jn-hrqg5QObfCZA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
Hi, On February 24, 2022 5:44:57 PM PST, Tom Lane wrote: >Tomas Vondra writes: >> I wonder if we could check the index tuple length, and adjust the siglen >> based on that, somehow ... > >In an already-corrupted index, there won't be a unique answer. If we're breaking every ltree gist index, can we detect that it's not an index generated by the new version? Most people don't read release notes, and this is just about guaranteed to break them all. -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Proposal: Support custom authentication methods using hooks
Jeff Davis writes: > On Thu, 2022-02-17 at 11:25 -0800, samay sharma wrote: >> To enable this, I've proposed adding a new authentication method >> "custom" which can be specified in pg_hba.conf and takes a mandatory >> argument "provider" specifying which authentication provider to use. > One caveat is that this only works given information available from > existing authentication methods, because that's all the client > supports. In practice, it seems to only be useful with plaintext > password authentication over an SSL connection. ... and, since we can't readily enforce that the client only sends those cleartext passwords over suitably-encrypted connections, this could easily be a net negative for security. Not sure that I think it's a good idea. regards, tom lane
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
Tomas Vondra writes: > I wonder if we could check the index tuple length, and adjust the siglen > based on that, somehow ... In an already-corrupted index, there won't be a unique answer. regards, tom lane
Re: Buffer Manager and Contention
(I added Yura, as the author of a related patch) At Thu, 24 Feb 2022 12:58:23 +, Simon Riggs wrote in > Thinking about poor performance in the case where the data fits in > RAM, but the working set is too big for shared_buffers, I notice a > couple of things that seem bad in BufMgr, but don't understand why > they are like that. > > 1. If we need to allocate a buffer to a new block we do this in one > step, while holding both partition locks for the old and the new tag. > Surely it would cause less contention to make the old block/tag > invalid (after flushing), drop the old partition lock and then switch > to the new one? i.e. just hold one mapping partition lock at a time. > Is there a specific reason we do it this way? I'm not sure but I guess the developer wanted to make the operation atomic. Yura Sokolov is proposing a patch to separate the two partition locks. https://www.postgresql.org/message-id/1edbb61981fe1d99c3f20e3d56d6c88999f4227c.camel%40postgrespro.ru And it seems viable to me, and a benchmark in the thread showed a good result. I'd appreciate your input on that thread. > 2. Possibly connected to the above, we issue BufTableInsert() BEFORE > we issue BufTableDelete(). That means we need extra entries in the > buffer mapping hash table to allow us to hold both the old and the new > at the same time, for a short period. The way dynahash.c works, we try Yes. > to allocate an entry from the freelist and if that doesn't work, we > begin searching ALL the freelists for free entries to steal. So if we > get enough people trying to do virtual I/O at the same time, then we > will hit a "freelist storm" where everybody is chasing the last few > entries. It would make more sense if we could do BufTableDelete() To reduce that overhead, Yura proposed a surgical modification of dynahash, but I didn't like that and the latest patch doesn't contain that part. > first, then hold onto the buffer mapping entry rather than add it to > the freelist, so we can use it again when we do BufTableInsert() very > shortly afterwards. That way we wouldn't need to search the freelist > at all. What is the benefit or reason of doing the Delete after the > Insert? Hmm. Something like hash_swap_key() or hash_reinsert_entry()? That sounds reasonable. (Yura's proposal was taking an entry out of the hash and then attaching it with a new key again.) > Put that another way, it looks like BufTable functions are used in a > way that mismatches against the way dynahash is designed. > > Thoughts? On the first part, I think Yura's patch works. On the second point, Yura already showed it gives a certain amount of gain if we do that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
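The reordering suggested in point 2 might look roughly like the following (a purely hypothetical sketch -- dynahash has no "detached entry" API today, which is exactly what would need to be added; the *_Detached names are invented for illustration):

    BufferLookupEnt *ent;

    /* unhook the old mapping, but keep the entry instead of freeing it */
    ent = BufTableDeleteDetached(&oldTag, oldHash);        /* hypothetical */

    /* ... flush and invalidate the victim buffer as today ... */

    /* reuse the very same entry for the new tag: no freelist search */
    BufTableInsertDetached(ent, &newTag, newHash, buf_id); /* hypothetical */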
Re: set TESTDIR from perl rather than Makefile
On Mon, Feb 21, 2022 at 07:00:54AM -0500, Andrew Dunstan wrote: > On 2/19/22 18:53, Justin Pryzby wrote: > > On Sat, Feb 19, 2022 at 05:41:49PM -0600, Justin Pryzby wrote: > >> I rebased and fixed the check-guc script to work, made it work with vpath > >> builds, and cleaned it up some. > > I also meant to attach it. > > This is going to break a bunch of stuff as written. > > First, it's not doing the same thing. The current system sets TESTDIR to > be the parent of the directory that holds the test. e.g. for > src/bin/pg_ctl/t/001_start_stop.pl it's src/bin/pg_ctl in the build > tree, not the 't' subdirectory. This patch apparently sets it to the 't' > subdirectory. That will break anything that goes looking for log files > in the current location, like the buildfarm client, and possibly some CI > setups as well. Yes, thanks. I changed the patch to use ENV{CURDIR} || dirname(dirname($0)). If I'm not wrong, that seems to be doing the right thing. > Also, unless I'm mistaken it appears to do the wrong thing for vpath > builds: > > my $test_dir = File::Spec->rel2abs(dirname($0)); > > is completely wrong for vpaths, since that will place it in the source > tree, not the build tree. > > Last, and for no explained reason that I can see, the patch undoes > commit f4ce6c4d3a, but only for msvc builds. Even if that's justified it > appears to have no relevance to this patch. Andres' idea is that perl should set TESTDIR and PATH. Here I commented out PATH, and had the improbable issue that nothing seemed to be breaking, including the pipeline test under msvc. It'd be helpful to know which configuration that breaks so I can test that it's broken and then test that it's fixed when set from within perl... I got busy here, and may not be able to progress this for a while.
>From b46b405565a2fce3f96a1dcddf6d35ae7f3acc6d Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 19 Feb 2022 13:06:52 -0600 Subject: [PATCH] wip: set TESTDIR from src/test/perl rather than Makefile/vcregress These seem most likely to break: make check -C src/bin/psql make check -C src/bin/pgbench make check -C src/test/modules/test_misc make check -C src/test/modules/libpq_pipeline PROVE_TESTS=t/027_stream_regress.pl make check -C src/test/recovery --- src/Makefile.global.in | 9 ++--- src/test/perl/PostgreSQL/Test/Utils.pm | 17 ++--- src/tools/msvc/vcregress.pl| 4 +--- 3 files changed, 17 insertions(+), 13 deletions(-) diff --git a/src/Makefile.global.in b/src/Makefile.global.in index c980444233..92649d0193 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -451,8 +451,9 @@ define prove_installcheck rm -rf '$(CURDIR)'/tmp_check $(MKDIR_P) '$(CURDIR)'/tmp_check cd $(srcdir) && \ - TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \ + PATH="$(bindir):$(CURDIR):$$PATH" \ PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)/$(top_builddir)' \ + PG_SUBDIR='$(CURDIR)' \ PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' \ $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl) endef @@ -461,8 +462,9 @@ define prove_installcheck rm -rf '$(CURDIR)'/tmp_check $(MKDIR_P) '$(CURDIR)'/tmp_check cd $(srcdir) && \ - TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \ + PATH="$(bindir):$(CURDIR):$$PATH" \ PGPORT='6$(DEF_PGPORT)' top_builddir='$(top_builddir)' \ + PG_SUBDIR='$(CURDIR)' \ PG_REGRESS='$(top_builddir)/src/test/regress/pg_regress' \ $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl) endef @@ -472,7 +474,8 @@ define prove_check rm -rf '$(CURDIR)'/tmp_check $(MKDIR_P) '$(CURDIR)'/tmp_check cd $(srcdir) && \ - TESTDIR='$(CURDIR)' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' \ + $(with_temp_install) PGPORT='6$(DEF_PGPORT)' \ + PG_SUBDIR='$(CURDIR)' \ PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' \ $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl) endef diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm index 46cd746796..74ccaa08d9 100644 --- a/src/test/perl/PostgreSQL/Test/Utils.pm +++ b/src/test/perl/PostgreSQL/Test/Utils.pm @@ -184,19 +184,22 @@ INIT # test may still fail, but it's more likely to report useful facts. $SIG{PIPE} = 'IGNORE'; - # Determine output directories, and create them. The base path is the - # TESTDIR environment variable, which is normally set by the invoking - # Makefile. - $tmp_check = $ENV{TESTDIR} ? "$ENV{TESTDIR}/tmp_check" : "tmp_check"; + my $test_dir = File::Spec->rel2abs($ENV{PG_SUBDIR} || dirname(dirname($0))); + + my $test_name = basename($0); + $test_name =~ s/\.[^.]+$//; + + # Determine output directories, and create them. + # TODO: set srcdir? + $tmp_check = "$test_dir/tmp_check"; $log_path = "$tmp_check/log"; + $ENV{TESTDIR} = $test_dir; mkdir $tmp_check; mkdir
Re: Proposal: Support custom authentication methods using hooks
On Thu, 2022-02-17 at 11:25 -0800, samay sharma wrote: > To enable this, I've proposed adding a new authentication method > "custom" which can be specified in pg_hba.conf and takes a mandatory > argument "provider" specifying which authentication provider to use. > I've also moved a couple static functions to headers so that > extensions can call them. > > Sample pg_hba.conf line to use a custom provider: > > hostall all ::1/128 > custom provider=test One caveat is that this only works given information available from existing authentication methods, because that's all the client supports. In practice, it seems to only be useful with plaintext password authentication over an SSL connection. I still like the approach though. There's a lot of useful stuff you can do at authentication time with only the connection information and a password. It could be useful to authenticate against different services, or some kind of attack detection, etc. Regards, Jeff Davis
Re: Add id's to various elements in protocol.sgml
On 02/24/22 19:52, Kyotaro Horiguchi wrote: > FWIW in that perspecive, there's no requirement from me that it should > be human-readable. I'm fine with automatically-generated ids. One thing I would be −many on, though, would be automatically-generated ids that are not, somehow, stable. I've been burned too many times making links to auto-generated anchors in other projects' docs that change every time they republish the wretched things. Regards, -Chap
Re: Add id's to various elements in protocol.sgml
At Tue, 21 Dec 2021 08:47:27 +0100, Brar Piening wrote in > On 20.12.2021 at 16:09, Robert Haas wrote: > > As a data point, this is something I have also wanted to do, from time > > to time. I am generally of the opinion that any place the +1 from me. When I put a URL in the answer for inquiries, I always look into the html for name/id tags so that the inquirer quickly finds the information source (or the backing or reference point) on the page. If not found, I place a snippet instead. > > documentation has a long list of things, which should add ids, so that > > people can link to the particular thing in the list to which they want > > to draw someone's attention. > > > Thank you. > > If there is consensus on generally adding links to long lists I'd take > suggestions for other places where people think that this would make > sense and amend my patch. I don't think there is. I remember sometimes wanting ids on some sections (x.x) and items (x.x.x or lower) (or even clauses, ignoring costs:p) FWIW from that perspective, there's no requirement from me that it should be human-readable. I'm fine with automatically-generated ids. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Proposal: Support custom authentication methods using hooks
Hi Aleksander, On Thu, Feb 24, 2022 at 1:17 AM Aleksander Alekseev < aleksan...@timescale.com> wrote: > Hi Samay, > > > I wanted to submit a patch to expose 2 new hooks (one for the > authentication check and another one for error reporting) in auth.c. These > will allow users to implement their own authentication methods for Postgres > or add custom logic around authentication. > > I like the idea - PostgreSQL is all about extendability. Also, well > done with TAP tests and an example extension. This being said, I > didn't look at the code yet, but cfbot seems to be happy with it: > http://cfbot.cputube.org/ Thank you! > > > > One constraint in the current implementation is that we allow only one > authentication provider to be loaded at a time. In the future, we can add > more functionality to maintain an array of hooks and call the appropriate > one based on the provider name in the pg_hba line. > > This sounds like a pretty severe and unnecessary limitation to me. Do > you think it would be difficult to bypass it in the first > implementation? > Just to clarify, the limitation is around usage of multiple custom providers, but it does allow users to use the existing methods alongside one custom authentication method. The reasons I thought this is ok to start with are: * It allows users to plug in a custom authentication method, which they can't do today. * Users *generally* have an authentication method for their database (e.g., we use LDAP for authentication, or we use md5 passwords for authentication). While it is possible, I'm not sure how many users use completely different authentication methods (other than standard ones like password and trust) for different IPs/databases on the same instance. So, I thought this should be good enough for a number of use cases. I thought about adding support for multiple custom providers and the approach I came up with is: maintaining a list of all providers (their names, check functions and error functions), making sure they are all valid while parsing pg_hba.conf, and calling the right one when we see the custom keyword in pg_hba.conf based on name. I'm not sure (I haven't yet checked) if we have other such hooks in Postgres where multiple extensions use them and Postgres calls the right hook based on input (instead of just calling whoever has the hook). Therefore, I wanted to start with something simple so users can begin using auth methods of their choice, and add multiple providers as an enhancement after doing more research and coming up with the right way to implement it. That being said, any thoughts on the approach I mentioned above? I'll look into it and see if it's not too difficult to implement this. Regards, Samay > -- > Best regards, > Aleksander Alekseev >
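For reference, a provider registered through such a list might end up looking roughly like this. This is a sketch under the assumptions above: RegisterAuthProvider() and my_service_check_password() are illustrative names invented here, not the actual patch's API, and recv_password_packet() is assumed to be one of the formerly static functions the patch exposes in a header.

    #include "postgres.h"
    #include "fmgr.h"
    #include "libpq/auth.h"
    #include "libpq/libpq-be.h"

    PG_MODULE_MAGIC;

    /* stand-in for whatever external validation the provider performs */
    extern bool my_service_check_password(const char *user, const char *password);

    static int
    test_auth_check(Port *port)
    {
        char       *passwd = recv_password_packet(port);
        int         result = STATUS_ERROR;

        if (passwd == NULL)
            return STATUS_EOF;

        if (my_service_check_password(port->user_name, passwd))
            result = STATUS_OK;

        pfree(passwd);
        return result;
    }

    void
    _PG_init(void)
    {
        /* hypothetical registration matching "custom provider=test" */
        RegisterAuthProvider("test", test_auth_check, NULL);
    }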
why do hash index builds use smgrextend() for new splitpoint pages
I'm trying to understand why hash indexes are built primarily in shared buffers except when allocating a new splitpoint's worth of bucket pages -- which is done with smgrextend() directly in _hash_alloc_buckets(). Is this just so that the value returned by smgrnblocks() includes the new splitpoint's worth of bucket pages? All writes of tuple data to pages in this new splitpoint will go through shared buffers (via hash_getnewbuf()). I asked this and got some thoughts from Robert in [1], but I still don't really get it. When a new page is needed during the hash index build, why can't _hash_expandtable() just call ReadBufferExtended() with P_NEW instead of _hash_getnewbuf()? Does it have to do with the BUCKET_TO_BLKNO mapping? - Melanie [1] https://www.postgresql.org/message-id/CA%2BTgmoa%2BQFFhkHgPxyxv6t8aVU0r7GZmu7z8io4vGG7RHPpGzA%40mail.gmail.com
RE: Failed transaction statistics to measure the logical replication progress
On Wednesday, February 23, 2022 3:30 PM Amit Kapila > On Tue, Feb 22, 2022 at 6:45 AM tanghy.f...@fujitsu.com > wrote: > > > > I found a problem when using it. When a replication workers exits, the > > transaction stats should be sent to stats collector if they were not > > sent before because it didn't reach PGSTAT_STAT_INTERVAL. But I saw > > that the stats weren't updated as expected. > > > > I looked into it and found that the replication worker would send the > > transaction stats (if any) before it exits. But it got invalid subid > > in pgstat_send_subworker_xact_stats(), which led to the following result: > > > > postgres=# select pg_stat_get_subscription_worker(0, null); > > pg_stat_get_subscription_worker > > - > > (0,,2,0,00,"",) > > (1 row) > > > > I think that's because subid has already been cleaned when trying to > > send the stats. I printed the value of before_shmem_exit_list, the > > functions in this list would be called in shmem_exit() when the worker > > exits. > > logicalrep_worker_onexit() would clean up the worker info (including > > subid), and > > pgstat_shutdown_hook() would send stats if any. > > logicalrep_worker_onexit() was called before calling > pgstat_shutdown_hook(). > > > > Yeah, I think that is a problem and maybe we can think of solving it by > sending > the stats via logicalrep_worker_onexit before subid is cleared but not sure > that > is a good idea. I feel we need to go back to the idea of v21 for sending stats > instead of using pgstat_report_stat. > I think the one thing which we could improve is to avoid trying to send it > each > time before receiving each message by walrcv_receive and rather try to send it > before we try to wait (WaitLatchOrSocket). > Trying after each message doesn't seem to be required and could lead to some > overhead as well. What do you think? I agree. Fixed. Kindly have a look at v24 shared in [1]. [1] - https://www.postgresql.org/message-id/TYCPR01MB8373A3E1BE237BAF38185BF2ED3D9%40TYCPR01MB8373.jpnprd01.prod.outlook.com Best Regards, Takamichi Osumi
RE: Failed transaction statistics to measure the logical replication progress
On Thursday, February 24, 2022 11:07 AM Masahiko Sawada wrote: > I have some comments on the v23 patch: > > @@ -66,6 +66,12 @@ typedef struct LogicalRepWorker > TimestampTz last_recv_time; > XLogRecPtr reply_lsn; > TimestampTz reply_time; > + > + /* > +* Transaction statistics of subscription worker > +*/ > + int64 commit_count; > + int64 abort_count; > } LogicalRepWorker; > > I think that adding these statistics to the struct whose data is allocated in > shared memory is not a good idea since they don't need to be shared. We might > want to add more statistics for subscriptions such as insert_count and > update_count in the future. I think it's better to track these statistics in > local > memory either in worker.c or pgstat.c. Fixed. > +/* -- > + * pgstat_report_subworker_xact_end() - > + * > + * Update the statistics of subscription worker and have > + * pgstat_report_stat send a message to stats collector > + * after count increment. > + * -- > + */ > +void > +pgstat_report_subworker_xact_end(bool is_commit) { > +if (is_commit) > +MyLogicalRepWorker->commit_count++; > +else > +MyLogicalRepWorker->abort_count++; > +} > > It's slightly odd and it seems unnecessary to me that we modify fields of > MyLogicalRepWorker in pgstat.c. Although this function has “report” > in its name, it just increments the counter. I think we can do that in > worker.c. Fixed. Also, I brought the timing adjustment logic back and now have an independent one, as Amit-san suggested in [1]. Kindly have a look at v24. [1] - https://www.postgresql.org/message-id/CAA4eK1LWYc15%3DASj1tMTEFsXtxu%3D02aGoMwq9YanUVr9-QMhdQ%40mail.gmail.com Best Regards, Takamichi Osumi v24-0001-Extend-pg_stat_subscription_workers-to-include-g.patch Description: v24-0001-Extend-pg_stat_subscription_workers-to-include-g.patch
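In rough outline, the v24 arrangement amounts to something like this (an illustrative sketch only; the actual function names and the signature of the flush call in the patch may differ):

    /* worker-local; nothing here lives in shared memory */
    static int64 subworker_commit_count = 0;
    static int64 subworker_abort_count = 0;

    static void
    subworker_count_xact_end(bool is_commit)
    {
        if (is_commit)
            subworker_commit_count++;
        else
            subworker_abort_count++;
    }

    /* called before the worker blocks in WaitLatchOrSocket, and at exit */
    static void
    subworker_flush_xact_stats(void)
    {
        if (subworker_commit_count == 0 && subworker_abort_count == 0)
            return;             /* nothing to report */

        pgstat_send_subworker_xact_stats(subworker_commit_count,
                                         subworker_abort_count);
        subworker_commit_count = 0;
        subworker_abort_count = 0;
    }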
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
On 2/24/22 23:13, Tomas Vondra wrote: > > > On 2/24/22 23:06, Andres Freund wrote: >> Hi, >> >> On 2022-02-24 22:15:20 +0100, Tomas Vondra wrote: >>> After thinking about this I only see two options: >>> >>> 1) Don't apply the patch and tell everyone using ltree_gist they need to >>> rebuild the index after pg_upgrade from 12 to 13+. The downside of this >>> is larger indexes (because some tuples are 20B larger). >>> >>> 2) Apply the patch + tell those who already upgraded from 12 to rebuild >>> ltree_gist indexes, because those might be broken due to new inserts. >>> >>> >>> My opinion is to do (2), because at least those who'll upgrade later >>> (which is a lot of people) will be fine without a rebuild. And it also >>> makes the indexes a bit smaller, thanks to saving 20B. >> >> Won't 2) also break indexes created without a pg_upgrade? "already upgraded >> from 12" sounds like it wouldn't but I don't see why? >> > > It will, unfortunately - that's why I wrote "upgrade" in that sentence. > I should have been more explicit, sorry. But any new index tuples formed > after starting the 13+ cluster are/may be corrupted. > I wonder if we could check the index tuple length, and adjust the siglen based on that, somehow ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
On 2/24/22 23:06, Andres Freund wrote: > Hi, > > On 2022-02-24 22:15:20 +0100, Tomas Vondra wrote: >> After thinking about this I only see two options: >> >> 1) Don't apply the patch and tell everyone using ltree_gist they need to >> rebuild the index after pg_upgrade from 12 to 13+. The downside of this >> is larger indexes (because some tuples are 20B larger). >> >> 2) Apply the patch + tell those who already upgraded from 12 to rebuild >> ltree_gist indexes, because those might be broken due to new inserts. >> >> >> My opinion is to do (2), because at least those who'll upgrade later >> (which is a lot of people) will be fine without a rebuild. And it also >> makes the indexes a bit smaller, thanks to saving 20B. > > Won't 2) also break indexes created without a pg_upgrade? "already upgraded > from 12" sounds like it wouldn't but I don't see why? > It will, unfortunately - that's why I wrote "upgrade" in that sentence. I should have been more explicit, sorry. But any new index tuples formed after starting the 13+ cluster are/may be corrupted. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
Hi, On 2022-02-24 22:15:20 +0100, Tomas Vondra wrote: > After thinking about this I only see two options: > > 1) Don't apply the patch and tell everyone using ltree_gist they need to > rebuild the index after pg_upgrade from 12 to 13+. The downside of this > is larger indexes (because some tuples are 20B larger). > > 2) Apply the patch + tell those who already upgraded from 12 to rebuild > ltree_gist indexes, because those might be broken due to new inserts. > > > My opinion is to do (2), because at least those who'll upgrade later > (which is a lot of people) will be fine without a rebuild. And it also > makes the indexes a bit smaller, thanks to saving 20B. Won't 2) also break indexes created without a pg_upgrade? "already upgraded from 12" sounds like it wouldn't but I don't see why? Greetings, Andres Freund
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
Tomas Vondra writes: > After thinking about this I only see two options: > 1) Don't apply the patch and tell everyone using ltree_gist they need to > rebuild the index after pg_upgrade from 12 to 13+. The downside of this > is larger indexes (because some tuples are 20B larger). > 2) Apply the patch + tell those who already upgraded from 12 to rebuild > ltree_gist indexes, because those might be broken due to new inserts. Yeah, let's fix it and advise people to reindex those indexes. Won't be the first time release notes have contained such advice. Plus, it sounds like we'd have to advise people to reindex *anyway*, just in a slightly different set of cases. regards, tom lane
Re: remove more archiving overhead
I tested this again with many more WAL files and a much larger machine (r5d.24xlarge with data directory on an NVMe SSD instance store volume). As before, I am using a bare-bones archive module that does nothing but return true. Without the patch, archiving ~120k WAL files took about 109 seconds. With the patch, it took around 62 seconds, which amounts to a ~43% reduction in overhead. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
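For context, a do-nothing archive module of the kind used for this test can be as small as the sketch below. It is written against the archive-module callback API from this patch series (callback and header names follow the basic_archive work and may differ in detail from what ultimately lands):

    #include "postgres.h"
    #include "fmgr.h"
    #include "postmaster/pgarch.h"

    PG_MODULE_MAGIC;

    /* report success without touching the segment at all */
    static bool
    noop_archive_file(const char *file, const char *path)
    {
        return true;
    }

    void
    _PG_archive_module_init(ArchiveModuleCallbacks *cb)
    {
        cb->archive_file_cb = noop_archive_file;
    }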
real/float example for testlibpq3
Hi everyone, Would adding additional examples to testlibpq3.c on handling more data types be helpful? I've attached a patch adding how to handle a REAL to the current example. Regards, Mark diff --git a/src/test/examples/testlibpq3.c b/src/test/examples/testlibpq3.c index 4f7b791388..0c1739fcbb 100644 --- a/src/test/examples/testlibpq3.c +++ b/src/test/examples/testlibpq3.c @@ -11,19 +11,21 @@ * CREATE SCHEMA testlibpq3; * SET search_path = testlibpq3; * SET standard_conforming_strings = ON; - * CREATE TABLE test1 (i int4, t text, b bytea); - * INSERT INTO test1 values (1, 'joe''s place', '\000\001\002\003\004'); - * INSERT INTO test1 values (2, 'ho there', '\004\003\002\001\000'); + * CREATE TABLE test1 (i int4, r real, t text, b bytea); + * INSERT INTO test1 values (1, 2.3, 'joe''s place', '\000\001\002\003\004'); + * INSERT INTO test1 values (2, 4.5, 'ho there', '\004\003\002\001\000'); * * The expected output is: * * tuple 0: got * i = (4 bytes) 1 + * r = (4 bytes) 2.3 * t = (11 bytes) 'joe's place' * b = (5 bytes) \000\001\002\003\004 * * tuple 0: got * i = (4 bytes) 2 + * r = (4 bytes) 4.5 * t = (8 bytes) 'ho there' * b = (5 bytes) \004\003\002\001\000 */ @@ -62,32 +64,41 @@ show_binary_results(PGresult *res) int i, j; int i_fnum, + r_fnum, t_fnum, b_fnum; /* Use PQfnumber to avoid assumptions about field order in result */ i_fnum = PQfnumber(res, "i"); + r_fnum = PQfnumber(res, "r"); t_fnum = PQfnumber(res, "t"); b_fnum = PQfnumber(res, "b"); for (i = 0; i < PQntuples(res); i++) { char *iptr; + char *rptr; char *tptr; char *bptr; int blen; int ival; + union { + int ival; + float fval; + } rval; /* Get the field values (we ignore possibility they are null!) */ iptr = PQgetvalue(res, i, i_fnum); + rptr = PQgetvalue(res, i, r_fnum); tptr = PQgetvalue(res, i, t_fnum); bptr = PQgetvalue(res, i, b_fnum); /* -* The binary representation of INT4 is in network byte order, which -* we'd better coerce to the local byte order. +* The binary representation of INT4 and REAL are in network byte +* order, which we'd better coerce to the local byte order. */ ival = ntohl(*((uint32_t *) iptr)); + rval.ival = ntohl(*((uint32_t *) rptr)); /* * The binary representation of TEXT is, well, text, and since libpq @@ -102,6 +113,8 @@ show_binary_results(PGresult *res) printf("tuple %d: got\n", i); printf(" i = (%d bytes) %d\n", PQgetlength(res, i, i_fnum), ival); + printf(" r = (%d bytes) %f\n", + PQgetlength(res, i, r_fnum), rval.fval); printf(" t = (%d bytes) '%s'\n", PQgetlength(res, i, t_fnum), tptr); printf(" b = (%d bytes) ", blen); diff --git a/src/test/examples/testlibpq3.sql b/src/test/examples/testlibpq3.sql index 35a95ca347..e3a70cec27 100644 --- a/src/test/examples/testlibpq3.sql +++ b/src/test/examples/testlibpq3.sql @@ -1,6 +1,6 @@ CREATE SCHEMA testlibpq3; SET search_path = testlibpq3; SET standard_conforming_strings = ON; -CREATE TABLE test1 (i int4, t text, b bytea); -INSERT INTO test1 values (1, 'joe''s place', '\000\001\002\003\004'); -INSERT INTO test1 values (2, 'ho there', '\004\003\002\001\000'); +CREATE TABLE test1 (i int4, r real, t text, b bytea); +INSERT INTO test1 values (1, 2.3, 'joe''s place', '\000\001\002\003\004'); +INSERT INTO test1 values (2, 4.5, 'ho there', '\004\003\002\001\000');
ltree_gist indexes broken after pg_upgrade from 12 to 13
Hi, Victor Yegorov reported a crash related to a GiST index on ltree [1], following a pg_upgrade from 12.x to 14.x, with a data set reproducing this. I spent some time investigating this, and it seems this is a silly bug in commit 911e70207703799605f5a0e8aad9f06cff067c63 (HEAD) Author: Alexander Korotkov Date: Mon Mar 30 19:17:11 2020 +0300 Implement operator class parameters ... in PG13, which modified ltree_gist so that siglen is an opclass parameter (and not hard-coded). But the procedures use the LTREE_GET_ASIGLEN macro to extract the value, which either extracts the value from the catalog (if the index has an opclass parameter) or uses a default value - but it always uses LTREE_ASIGLEN_DEFAULT, which belongs to the _ltree_gist opclass (i.e. for arrays of ltree). And that's 28 instead of 8, as it should be. Attached is a simple patch tweaking the macros, which resolves the issue. Maybe there should be a separate macro though, because "ASIGLEN" probably refers to "array siglen". But the bigger issue is that this fixes indexes created before the upgrade, but it breaks indexes created after it. Because those were created with siglen 28B, reverting to 8B breaks them. A bit of a symmetry :-/ I'm not sure what to do about this. There's no way to distinguish indexes, and I think an index may actually be broken both with and without the patch - imagine an index created before the upgrade (so using siglen 8), but then receiving a couple of new rows (siglen 28). After thinking about this I only see two options: 1) Don't apply the patch and tell everyone using ltree_gist they need to rebuild the index after pg_upgrade from 12 to 13+. The downside of this is larger indexes (because some tuples are 20B larger). 2) Apply the patch + tell those who already upgraded from 12 to rebuild ltree_gist indexes, because those might be broken due to new inserts. My opinion is to do (2), because at least those who'll upgrade later (which is a lot of people) will be fine without a rebuild. And it also makes the indexes a bit smaller, thanks to saving 20B.
regards [1] https://www.postgresql.org/message-id/17406-71e02820ae79bb40%40postgresql.org -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom c6c5471aebf49da96e47deb5286a6f7065068bd7 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Thu, 24 Feb 2022 19:22:32 +0100 Subject: [PATCH] ltree gist siglen fix --- contrib/ltree/_ltree_gist.c | 12 ++-- contrib/ltree/ltree.h | 4 ++-- contrib/ltree/ltree_gist.c | 10 +- 3 files changed, 13 insertions(+), 13 deletions(-) diff --git a/contrib/ltree/_ltree_gist.c b/contrib/ltree/_ltree_gist.c index 72516c3b6b9..1f361416ac9 100644 --- a/contrib/ltree/_ltree_gist.c +++ b/contrib/ltree/_ltree_gist.c @@ -49,7 +49,7 @@ _ltree_compress(PG_FUNCTION_ARGS) { GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0); GISTENTRY *retval = entry; - int siglen = LTREE_GET_ASIGLEN(); + int siglen = LTREE_GET_ASIGLEN(LTREE_ASIGLEN_DEFAULT); if (entry->leafkey) { /* ltree */ @@ -108,7 +108,7 @@ _ltree_same(PG_FUNCTION_ARGS) ltree_gist *a = (ltree_gist *) PG_GETARG_POINTER(0); ltree_gist *b = (ltree_gist *) PG_GETARG_POINTER(1); bool *result = (bool *) PG_GETARG_POINTER(2); - int siglen = LTREE_GET_ASIGLEN(); + int siglen = LTREE_GET_ASIGLEN(LTREE_ASIGLEN_DEFAULT); if (LTG_ISALLTRUE(a) && LTG_ISALLTRUE(b)) *result = true; @@ -154,7 +154,7 @@ _ltree_union(PG_FUNCTION_ARGS) { GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0); int *size = (int *) PG_GETARG_POINTER(1); - int siglen = LTREE_GET_ASIGLEN(); + int siglen = LTREE_GET_ASIGLEN(LTREE_ASIGLEN_DEFAULT); int32 i; ltree_gist *result = ltree_gist_alloc(false, NULL, siglen, NULL, NULL); BITVECP base = LTG_SIGN(result); @@ -219,7 +219,7 @@ _ltree_penalty(PG_FUNCTION_ARGS) ltree_gist *origval = (ltree_gist *) DatumGetPointer(((GISTENTRY *) PG_GETARG_POINTER(0))->key); ltree_gist *newval = (ltree_gist *) DatumGetPointer(((GISTENTRY *) PG_GETARG_POINTER(1))->key); float *penalty = (float *) PG_GETARG_POINTER(2); - int siglen = LTREE_GET_ASIGLEN(); + int siglen = LTREE_GET_ASIGLEN(LTREE_ASIGLEN_DEFAULT); *penalty = hemdist(origval, newval, siglen); PG_RETURN_POINTER(penalty); @@ -242,7 +242,7 @@ _ltree_picksplit(PG_FUNCTION_ARGS) { GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0); GIST_SPLITVEC *v = (GIST_SPLITVEC *) PG_GETARG_POINTER(1); - int siglen = LTREE_GET_ASIGLEN(); + int siglen = LTREE_GET_ASIGLEN(LTREE_ASIGLEN_DEFAULT); OffsetNumber k, j; ltree_gist *datum_l, @@ -508,7 +508,7 @@ _ltree_consistent(PG_FUNCTION_ARGS) /* Oid subtype = PG_GETARG_OID(3); */ bool *recheck = (bool *) PG_GETARG_POINTER(4); - int siglen = LTREE_GET_ASIGLEN(); + int siglen = LTREE_GET_ASIGLEN(LTREE_ASIGLEN_DEFAULT); ltree_gist *key = (ltree_gist *) DatumGetPointer(entry->key); bool
Re: [PATCH] Expose port->authn_id to extensions and triggers
On Fri, 2022-02-25 at 01:15 +0800, Julien Rouhaud wrote: > On Thu, Feb 24, 2022 at 04:50:59PM +, Jacob Champion wrote: > > On Thu, 2022-02-24 at 20:39 +0900, Michael Paquier wrote: > > > I don't quite see the additional value that this API brings as > > > MyProcPort is directly accessible, and contrib modules like > > > postgres_fdw and sslinfo just use that to find the data of the current > > > backend. > > > > Right -- I just didn't know if third-party modules were actually able > > to rely on the internal layout of struct Port. Is that guaranteed to > > remain constant for a major release line? If so, this new API is > > superfluous. > > Yes, third-party can rely on Port layout. We don't break ABI between minor > release. In some occasions we can add additional fields at the end of a > struct, but nothing more. That simplifies things. PFA a smaller v2; if pgaudit can't make use of libpq-be.h for some reason, then I guess we can tailor a fix to that use case. > > > I could still see a use case for that at a more global level with > > > beentrys, but it looked like there was not much interest the last time > > > I dropped this idea. > > > > I agree that this would be useful to have in the stats. From my outside > > perspective, it seems like it's difficult to get strings of arbitrary > > length in there; is that accurate? > > Yes, as it's all in shared memory. The only workaround is using something > like > track_activity_query_size, but it's not great. Yeah... I was following a similar track with the initial work last year, but I dropped it when the cost of implementation started to grow considerably. At the time, though, it looked like some overhauls to the stats framework were being pursued? I should read up on that thread. --Jacob From 92679a2109be5ba4d81bf58e8fb091c2d0020828 Mon Sep 17 00:00:00 2001 From: Jacob Champion Date: Mon, 14 Feb 2022 08:10:53 -0800 Subject: [PATCH v2] Add API to retrieve authn_id from SQL The authn_id field in MyProcPort is currently only accessible to the backend itself. Add a SQL function, session_authn_id(), to expose the field to triggers that may want to make use of it. 
--- src/backend/utils/adt/name.c | 12 +++- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_proc.dat | 3 +++ src/test/authentication/t/001_password.pl | 11 +++ 4 files changed, 26 insertions(+), 2 deletions(-) diff --git a/src/backend/utils/adt/name.c b/src/backend/utils/adt/name.c index e8bba3670c..ff0e2829bb 100644 --- a/src/backend/utils/adt/name.c +++ b/src/backend/utils/adt/name.c @@ -23,6 +23,7 @@ #include "catalog/namespace.h" #include "catalog/pg_collation.h" #include "catalog/pg_type.h" +#include "libpq/libpq-be.h" #include "libpq/pqformat.h" #include "mb/pg_wchar.h" #include "miscadmin.h" @@ -257,7 +258,7 @@ namestrcmp(Name name, const char *str) /* - * SQL-functions CURRENT_USER, SESSION_USER + * SQL-functions CURRENT_USER, SESSION_USER, SESSION_AUTHN_ID */ Datum current_user(PG_FUNCTION_ARGS) @@ -271,6 +272,15 @@ session_user(PG_FUNCTION_ARGS) PG_RETURN_DATUM(DirectFunctionCall1(namein, CStringGetDatum(GetUserNameFromId(GetSessionUserId(), false; } +Datum +session_authn_id(PG_FUNCTION_ARGS) +{ + if (!MyProcPort || !MyProcPort->authn_id) + PG_RETURN_NULL(); + + PG_RETURN_CSTRING(MyProcPort->authn_id); +} + /* * SQL-functions CURRENT_SCHEMA, CURRENT_SCHEMAS diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h index 1addb568ef..ca52e6889c 100644 --- a/src/include/catalog/catversion.h +++ b/src/include/catalog/catversion.h @@ -53,6 +53,6 @@ */ /* mmddN */ -#define CATALOG_VERSION_NO 202202221 +#define CATALOG_VERSION_NO 202202231 #endif diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 7f1ee97f55..b76d357bee 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -1508,6 +1508,9 @@ { oid => '746', descr => 'session user name', proname => 'session_user', provolatile => 's', prorettype => 'name', proargtypes => '', prosrc => 'session_user' }, +{ oid => '9774', descr => 'session authenticated identity', + proname => 'session_authn_id', provolatile => 's', prorettype => 'cstring', + proargtypes => '', prosrc => 'session_authn_id' }, { oid => '744', proname => 'array_eq', prorettype => 'bool', diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl index 3e3079c824..2aa28ed547 100644 --- a/src/test/authentication/t/001_password.pl +++ b/src/test/authentication/t/001_password.pl @@ -82,6 +82,11 @@ test_role($node, 'scram_role', 'trust', 0, test_role($node, 'md5_role', 'trust', 0, log_unlike => [qr/connection authenticated:/]); +my $res = $node->safe_psql('postgres', + "SELECT session_authn_id() IS NULL;" +); +is($res, 't', "users with trust authentication have NULL authn_id"); + # For plain "password" method, all
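Given that Port layout is stable within a major release (per the discussion above), a third-party module can already read the field directly; a minimal sketch of that non-SQL route, for comparison, which an audit extension could use in place of the new built-in:

    #include "postgres.h"
    #include "fmgr.h"
    #include "libpq/libpq-be.h"
    #include "miscadmin.h"
    #include "utils/builtins.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(my_authn_id);

    Datum
    my_authn_id(PG_FUNCTION_ARGS)
    {
        /* same null-safety dance as the built-in */
        if (!MyProcPort || !MyProcPort->authn_id)
            PG_RETURN_NULL();

        PG_RETURN_TEXT_P(cstring_to_text(MyProcPort->authn_id));
    }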
Re: Logical insert/update/delete WAL records for custom table AMs
On Mon, 17 Jan 2022 at 09:05, Jeff Davis wrote: > > On Wed, 2022-01-05 at 20:19 +, Simon Riggs wrote: > > I spoke with Jeff in detail about this patch in NYC Dec 21, and I now > > think it is worth pursuing. It seems much more likely that this would > > be acceptable than fully extensible rmgr. > > Thank you. I had some conversations with others who felt this approach > is wasteful, which it is. But if this patch still has potential, I'm > happy to pursue it along with the extensible rmgr approach. It's self-contained and generally useful for a range of applications, so the code would be long-lived. Calling it wasteful presumes the way you'd use it. If you make logical log entries you don't need to make physical ones, so it's actually the physical log entries that would be wasteful. Logical log entries don't need to be decoded, so it's actually more efficient, plus we could skip index entries altogether. I would argue that this is the way we should be doing WAL, with occasional divergence to physical records for performance, rather than the other way around. > > So I'm signing up as a reviewer and we'll see if this is good to go. > > Great, I attached a rebased version. The approach is perfectly fine; it just needs to be finished and rebased. -- Simon Riggs http://www.EnterpriseDB.com/
Re: reallocing without oom check in pg_regress
> On 23 Feb 2022, at 23:05, Tom Lane wrote: > > Daniel Gustafsson writes: >> In pg_regress we realloc() with the destination and source pointer being >> equal, >> without checking for OOM. While a fairly unlikely source of errors, is >> there a >> reason not to use pg_realloc() there for hygiene? > > Yeah, looks like oversight to me. Thanks for confirming, I've pushed this now after taking it for a spin on the CI just in case. -- Daniel Gustafsson https://vmware.com/
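For frontend programs like pg_regress the swap is mechanical (pg_malloc/pg_realloc come in via postgres_fe.h); a minimal before/after sketch:

    char       *buf = pg_malloc(1024);

    /* before: plain realloc(), whose NULL-on-OOM result went unchecked */
    /* buf = realloc(buf, 2048); */

    /* after: pg_realloc() reports "out of memory" and exits on failure */
    buf = pg_realloc(buf, 2048);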
Re: document that brin's autosummarize parameter is off by default
Ten months ago, Jaime Casanova wrote: > Hi everyone, > > Just noted that the default value of autosummarize reloption for brin > indexes is not documented, or at least not well documented. > > I added the default value in create_index.sgml where other options > mention their own defaults, also made a little change in brin.sgml to > make it more clear that it is disabled by default (at least the way it > was written made no sense for me, but it could be that my English is > not that good). It seems like "This last trigger" in the current text is intended to mean "The second condition". Your change improves that. Should we also consider enabling autosummarize by default? It was added in v10, after BRIN was added in v9.5. For us, BRIN wasn't usable without autosummarize. Also, note that vacuums are now triggered by insertions, since v13, so it might be that autosummarize is needed much less. -- Justin PS. I hope there's a faster response to your pg_upgrade patch.
Re: fix crash with Python 3.11
I wrote: > * I'm not satisfied with using static storage for > SaveTransactionCharacteristics/RestoreTransactionCharacteristics. Looking closer at this, I was not too amused to discover that of the three existing SaveTransactionCharacteristics calls, two already conflict with each other: _SPI_commit calls SaveTransactionCharacteristics and then calls CommitTransactionCommand, which again calls SaveTransactionCharacteristics, overwriting the static storage. I *think* there's no live bug there, because the state probably wouldn't have changed in between; but considering we run arbitrary user-defined code between those two points I sure wouldn't bet on it. 0001 attached is the same code patch as before, but now with spi.sgml updates; 0002 changes the API for Save/RestoreTransactionCharacteristics. If anyone's really worried about backpatching 0002, we could perhaps get away with doing that only in HEAD. I found in 0002 that I had to make CommitTransactionCommand call SaveTransactionCharacteristics unconditionally, else I got warnings about possibly-uninitialized local variables. It's cheap enough that I'm not too fussed about that. I don't get any warnings from the similar code in spi.c, probably because those functions can't be inlined there. regards, tom lane diff --git a/doc/src/sgml/spi.sgml b/doc/src/sgml/spi.sgml index d710e2d0df..7581661fc4 100644 --- a/doc/src/sgml/spi.sgml +++ b/doc/src/sgml/spi.sgml @@ -99,10 +99,9 @@ int SPI_connect_ext(int options) Sets the SPI connection to be nonatomic, which - means that transaction control calls SPI_commit, - SPI_rollback, and - SPI_start_transaction are allowed. Otherwise, - calling these functions will result in an immediate error. + means that transaction control calls (SPI_commit, + SPI_rollback) are allowed. Otherwise, + calling those functions will result in an immediate error. @@ -5040,15 +5039,17 @@ void SPI_commit_and_chain(void) SPI_commit commits the current transaction. It is approximately equivalent to running the SQL - command COMMIT. After a transaction is committed, a new - transaction has to be started - using SPI_start_transaction before further database - actions can be executed. + command COMMIT. After the transaction is committed, a + new transaction is automatically started using default transaction + characteristics, so that the caller can continue using SPI facilities. + If there is a failure during commit, the current transaction is instead + rolled back and a new transaction is started, after which the error is + thrown in the usual way. - SPI_commit_and_chain is the same, but a new - transaction is immediately started with the same transaction + SPI_commit_and_chain is the same, but the new + transaction is started with the same transaction characteristics as the just finished one, like with the SQL command COMMIT AND CHAIN. @@ -5093,14 +5094,13 @@ void SPI_rollback_and_chain(void) SPI_rollback rolls back the current transaction. It is approximately equivalent to running the SQL - command ROLLBACK. After a transaction is rolled back, a - new transaction has to be started - using SPI_start_transaction before further database - actions can be executed. + command ROLLBACK. After the transaction is rolled back, + a new transaction is automatically started using default transaction + characteristics, so that the caller can continue using SPI facilities. 
- SPI_rollback_and_chain is the same, but a new - transaction is immediately started with the same transaction + SPI_rollback_and_chain is the same, but the new + transaction is started with the same transaction characteristics as the just finished one, like with the SQL command ROLLBACK AND CHAIN. @@ -5124,7 +5124,7 @@ void SPI_rollback_and_chain(void) SPI_start_transaction - start a new transaction + obsolete function @@ -5137,17 +5137,12 @@ void SPI_start_transaction(void) Description - SPI_start_transaction starts a new transaction. It - can only be called after SPI_commit - or SPI_rollback, as there is no transaction active at - that point. Normally, when an SPI-using procedure is called, there is already a - transaction active, so attempting to start another one before closing out - the current one will result in an error. - - - - This function can only be executed if the SPI connection has been set as - nonatomic in the call to SPI_connect_ext. + SPI_start_transaction does nothing, and exists + only for code compatibility with + earlier PostgreSQL releases. It used to + be required after calling SPI_commit + or SPI_rollback, but now those functions start + a new transaction automatically. diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c index
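For readers skimming the thread, the shape of the 0002 API change is roughly this (a sketch; names follow the patch's general intent and may not match it exactly): the caller supplies the storage, so nested save/restore pairs can no longer clobber each other.

    typedef struct SavedTransactionCharacteristics
    {
        int         save_XactIsoLevel;
        bool        save_XactReadOnly;
        bool        save_XactDeferrable;
    } SavedTransactionCharacteristics;

    extern void SaveTransactionCharacteristics(SavedTransactionCharacteristics *s);
    extern void RestoreTransactionCharacteristics(const SavedTransactionCharacteristics *s);

    /* usage, e.g. in _SPI_commit: */
    SavedTransactionCharacteristics savetc;

    SaveTransactionCharacteristics(&savetc);
    /* ... CommitTransactionCommand() may save/restore its own copy ... */
    RestoreTransactionCharacteristics(&savetc);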
Re: [PATCH] add relation and block-level filtering to pg_waldump
Added to commitfest as: https://commitfest.postgresql.org/37/3565/
Re: [PATCH] add relation and block-level filtering to pg_waldump
On Thu, Feb 24, 2022 at 11:06 AM David Christensen wrote: > This patch adds the ability to specify a RelFileNode and optional BlockNum to > limit output of > pg_waldump records to only those which match the given criteria. This should > be more performant > than `pg_waldump | grep` as well as more reliable given specific variations > in output style > depending on how the blocks are specified. Sounds useful to me. -- Peter Geoghegan
[PATCH] add relation and block-level filtering to pg_waldump
Greetings, This patch adds the ability to specify a RelFileNode and optional BlockNum to limit output of pg_waldump records to only those which match the given criteria. This should be more performant than `pg_waldump | grep` as well as more reliable given specific variations in output style depending on how the blocks are specified. This currently affects only the main fork, but we could presumably add the option to filter by fork as well, if that is considered useful. Best, David >From 9194b2cb07172e636030b9b4e979b7f2caf7cbc0 Mon Sep 17 00:00:00 2001 From: David Christensen Date: Thu, 24 Feb 2022 11:00:46 -0600 Subject: [PATCH] Add relation/block filtering to pg_waldump This feature allows you to only output records that are targeting a specific RelFileNode and optional BlockNumber within this relation. Currently only applies this filter to the relation's main fork. --- doc/src/sgml/ref/pg_waldump.sgml | 23 ++ src/bin/pg_waldump/pg_waldump.c | 74 +++- 2 files changed, 96 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml index 5735a161ce..c953703bc8 100644 --- a/doc/src/sgml/ref/pg_waldump.sgml +++ b/doc/src/sgml/ref/pg_waldump.sgml @@ -100,6 +100,29 @@ PostgreSQL documentation + + -k block + --block=block + + +Display only records touching the given block. (Requires also +providing the relation via --relation.) + + + + + + -l tbl/db/rel + --relation=tbl/db/rel + + +Display only records touching the given relation. The relation is +specified via tablespace oid, database oid, and relfilenode separated +by slashes. + + + + -n limit --limit=limit diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index a6251e1a96..faae547a5c 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -55,6 +55,10 @@ typedef struct XLogDumpConfig bool filter_by_rmgr_enabled; TransactionId filter_by_xid; bool filter_by_xid_enabled; + RelFileNode filter_by_relation; + bool filter_by_relation_enabled; + BlockNumber filter_by_relation_block; + bool filter_by_relation_block_enabled; } XLogDumpConfig; typedef struct Stats @@ -394,6 +398,34 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen, return count; } +/* + * Boolean to return whether the given WAL record matches a specific relation and optional block + */ +static bool +XLogRecordMatchesRelationBlock(XLogReaderState *record, RelFileNode matchRnode, BlockNumber matchBlock) +{ + RelFileNode rnode; + ForkNumber forknum; + BlockNumber blk; + int block_id; + + for (block_id = 0; block_id <= record->max_block_id; block_id++) + { + if (!XLogRecHasBlockRef(record, block_id)) + continue; + + XLogRecGetBlockTag(record, block_id, , , ); + + if (forknum == MAIN_FORKNUM && + matchRnode.spcNode == rnode.spcNode && + matchRnode.dbNode == rnode.dbNode && + matchRnode.relNode == rnode.relNode && + (matchBlock == InvalidBlockNumber || matchBlock == blk)) + return true; + } + return false; +} + /* * Calculate the size of a record, split into !FPI and FPI parts. 
*/ @@ -767,6 +799,8 @@ usage(void) printf(_(" -b, --bkp-details output detailed information about backup blocks\n")); printf(_(" -e, --end=RECPTR stop reading at WAL location RECPTR\n")); printf(_(" -f, --follow keep retrying after reaching end of WAL\n")); + printf(_(" -k, --block=N only show records matching a given relation block (requires -l)\n")); + printf(_(" -l, --relation=N/N/N only show records that touch a specific relation\n")); printf(_(" -n, --limit=N number of records to display\n")); printf(_(" -p, --path=PATHdirectory in which to find log segment files or a\n" " directory with a ./pg_wal that contains such files\n" @@ -802,12 +836,14 @@ main(int argc, char **argv) static struct option long_options[] = { {"bkp-details", no_argument, NULL, 'b'}, + {"block", required_argument, NULL, 'k'}, {"end", required_argument, NULL, 'e'}, {"follow", no_argument, NULL, 'f'}, {"help", no_argument, NULL, '?'}, {"limit", required_argument, NULL, 'n'}, {"path", required_argument, NULL, 'p'}, {"quiet", no_argument, NULL, 'q'}, + {"relation", required_argument, NULL, 'l'}, {"rmgr", required_argument, NULL, 'r'}, {"start", required_argument, NULL, 's'}, {"timeline", required_argument, NULL, 't'}, @@ -860,6 +896,8 @@ main(int argc, char **argv) config.filter_by_rmgr_enabled = false; config.filter_by_xid = InvalidTransactionId; config.filter_by_xid_enabled = false; + config.filter_by_relation_enabled = false; + config.filter_by_relation_block_enabled = false; config.stats = false; config.stats_per_record =
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
> I think the change to ImmediateCheckpointRequested() makes no sense. > Before this patch, that function merely inquires whether there's an > immediate checkpoint queued. After this patch, it ... changes a > progress-reporting flag? I think it would make more sense to make the > progress-report flag change in whatever is the place that *requests* an > immediate checkpoint rather than here. > > > diff --git a/src/backend/postmaster/checkpointer.c > > b/src/backend/postmaster/checkpointer.c > > +ImmediateCheckpointRequested(int flags) > > if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE) > > +{ > > +updated_flags |= CHECKPOINT_IMMEDIATE; > > I don't think that these changes are expected behaviour. Under this > condition, the currently running checkpoint is still not 'immediate', > but it is going to hurry up for a new, actually immediate checkpoint. > Those are different kinds of checkpoint handling; and I don't think > you should modify the reported flags to show that we're going to do > stuff faster than usual. Maybe maintain a separate 'upcoming > checkpoint flags' field instead? Thank you Alvaro and Matthias for your views. I understand your point of not updating the progress-report flag here, as it just checks whether CHECKPOINT_IMMEDIATE is set or not and takes an action based on that, but it doesn't change the checkpoint flags. I will modify the code, but I am a bit confused here. As per Alvaro, we need to make the progress-report flag change in whatever is the place that *requests* an immediate checkpoint. I feel this gives information about the upcoming checkpoint, not the current one. So updating here provides wrong details in the view. The flags available during CreateCheckPoint() will remain the same for the entire checkpoint operation and we should show the same information in the view till it completes. So just removing the above piece of code (modified in ImmediateCheckpointRequested()) in the patch will make it correct. My opinion about maintaining a separate field to show upcoming checkpoint flags is that it makes the view complex. Please share your thoughts. Thanks & Regards, On Thu, Feb 24, 2022 at 10:45 PM Matthias van de Meent wrote: > > On Wed, 23 Feb 2022 at 14:28, Nitin Jadhav > wrote: > > > > Sharing the v2 patch. Kindly have a look and share your comments. > > Thanks for updating. > > > diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml > > With the new pg_stat_progress_checkpoint, you should also add a > backreference to this progress reporting in the CHECKPOINT sql command > documentation located in checkpoint.sgml, and maybe in wal.sgml and/or > backup.sgml too. See e.g. cluster.sgml around line 195 for an example. > > > diff --git a/src/backend/postmaster/checkpointer.c > > b/src/backend/postmaster/checkpointer.c > > +ImmediateCheckpointRequested(int flags) > > if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE) > > +{ > > +updated_flags |= CHECKPOINT_IMMEDIATE; > > I don't think that these changes are expected behaviour. Under this > condition, the currently running checkpoint is still not 'immediate', > but it is going to hurry up for a new, actually immediate checkpoint. > Those are different kinds of checkpoint handling; and I don't think > you should modify the reported flags to show that we're going to do > stuff faster than usual. Maybe maintain a separate 'upcoming > checkpoint flags' field instead?
> > diff --git a/src/backend/catalog/system_views.sql
> > b/src/backend/catalog/system_views.sql
> > +( SELECT '0/0'::pg_lsn +
> > +    ((CASE
> > +        WHEN stat.lsn_int64 < 0 THEN pow(2::numeric, 64::numeric)::numeric
> > +        ELSE 0::numeric
> > +      END) +
> > +      stat.lsn_int64::numeric)
> > +  FROM (SELECT s.param3::bigint) AS stat(lsn_int64)
> > +) AS start_lsn,
>
> My LSN select statement was an example that could be run directly in
> psql, so you didn't have to embed the SELECT into the view query.
> The following should be sufficient (and save the planner a few cycles
> otherwise spent in inlining):
>
> +('0/0'::pg_lsn +
> +    ((CASE
> +        WHEN s.param3 < 0 THEN pow(2::numeric, 64::numeric)::numeric
> +        ELSE 0::numeric
> +      END) +
> +      s.param3::numeric)
> +) AS start_lsn,
>
> > diff --git a/src/backend/access/transam/xlog.c
> > b/src/backend/access/transam/xlog.c
> > +checkpoint_progress_start(int flags)
> > [...]
> > +checkpoint_progress_update_param(int index, int64 val)
> > [...]
> > +checkpoint_progress_end(void)
> > +{
> > +    /* In bootstrap mode, we don't actually record anything. */
> > +    if (IsBootstrapProcessingMode())
> > +        return;
>
> Disabling pgstat progress reporting when in bootstrap processing mode
> / startup/end-of-recovery makes very little sense (see upthread) and
>
Re: JSON path decimal literal syntax
On 18.02.22 11:17, Peter Eisentraut wrote:
> I noticed that the JSON path lexer does not support the decimal literal
> syntax forms .1 and 1. (that is, there are no digits before or after the
> decimal point). This is allowed by the relevant ECMAScript standard
> (https://262.ecma-international.org/5.1/#sec-7.8.3) and of course SQL
> allows it as well. Is there a reason for this? I didn't find any code
> comments or documentation about this.

It has come to my attention that there are syntactic differences between JavaScript, which is what JSON path is built on, and JSON itself. Presumably, the JSON path lexer was originally built with the JSON syntax in mind.

Attached is an updated patch that implements the JavaScript-based JSON path numeric literal syntax more correctly. Besides the above-mentioned syntax forms, it now also rejects trailing junk after numeric literals more correctly, similar to how the main SQL lexer does it.

From 13e4a16e7b06c6109b8c975ea5d1b57f7dfbe8b9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Thu, 24 Feb 2022 17:56:47 +0100
Subject: [PATCH v2] Make JSON path numeric literals more correct

Per the ECMAScript standard (ECMA-262, referenced by the SQL standard),
the syntax forms .1 and 1. should be allowed for decimal numeric
literals, but the existing implementation rejected them. Also, by the
same standard, reject trailing junk after numeric literals.

Note that the ECMAScript standard for numeric literals is, in respects
like these, slightly different from the JSON standard, which might be
the original cause for this discrepancy.
---
 src/backend/utils/adt/jsonpath_scan.l  |  24 +--
 src/test/regress/expected/jsonpath.out | 200 ++++++++++++------------
 src/test/regress/sql/jsonpath.sql      |   7 +
 3 files changed, 149 insertions(+), 82 deletions(-)

diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l
index 827a9e44cb..1f08e7c51f 100644
--- a/src/backend/utils/adt/jsonpath_scan.l
+++ b/src/backend/utils/adt/jsonpath_scan.l
@@ -82,11 +82,13 @@ other			[^\?\%\$\.\[\]\{\}\(\)\|\&\!\=\<\>\@\#\,\*:\-\+\/\\\" \t\n\r\f]
 
 digit			[0-9]
 integer			(0|[1-9]{digit}*)
-decimal			{integer}\.{digit}+
-decimalfail		{integer}\.
+decimal			({integer}\.{digit}*|\.{digit}+)
 real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
+realfail		({integer}|{decimal})[Ee][-+]
+
+integer_junk	{integer}{other}
+decimal_junk	{decimal}{other}
+real_junk		{real}{other}
 
 hex_dig			[0-9A-Fa-f]
 unicode			\\u({hex_dig}{4}|\{{hex_dig}{1,6}\})
@@ -242,16 +244,10 @@ hex_fail		\\x{hex_dig}{0,1}
 					return INT_P;
 				}
 
-{decimalfail}	{
-					/* throw back the ., and treat as integer */
-					yyless(yyleng - 1);
-					addstring(true, yytext, yyleng);
-					addchar(false, '\0');
-					yylval->str = scanstring;
-					return INT_P;
-				}
-
-({realfail1}|{realfail2})	{ yyerror(NULL, "invalid floating point number"); }
+{realfail}		{ yyerror(NULL, "invalid numeric literal"); }
+{integer_junk}	{ yyerror(NULL, "trailing junk after numeric literal"); }
+{decimal_junk}	{ yyerror(NULL, "trailing junk after numeric literal"); }
+{real_junk}		{ yyerror(NULL, "trailing junk after numeric literal"); }
 
 \"				{
 					addchar(true, '\0');
diff --git a/src/test/regress/expected/jsonpath.out b/src/test/regress/expected/jsonpath.out
index e399fa9631..66c5160978 100644
--- a/src/test/regress/expected/jsonpath.out
+++ b/src/test/regress/expected/jsonpath.out
@@ -354,11 +354,9 @@ select 'null.type()'::jsonpath;
 (1 row)
 
 select '1.type()'::jsonpath;
- jsonpath 
----------
- 1.type()
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "1.t" of jsonpath input
+LINE 1: select '1.type()'::jsonpath;
+               ^
 select '(1).type()'::jsonpath;
   jsonpath  
 ------------
@@ -569,17 +567,23 @@ select '$ ? (@.a < +1)'::jsonpath;
 (1 row)
 
 select '$ ? (@.a < .1)'::jsonpath;
-ERROR:
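To make the behavior change concrete, a few examples (a sketch based on the regression diffs above, with the error wording per the patch):

    select '$ ? (@.a < .1)'::jsonpath;   -- previously an error, now a valid decimal literal
    select '$ ? (@.a < 1.)'::jsonpath;   -- likewise now accepted
    select '1.type()'::jsonpath;         -- now: ERROR: trailing junk after numeric literal
    select '(1).type()'::jsonpath;       -- still the supported way to apply a method to a number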
Re: fix crash with Python 3.11
I wrote: > * It might be a good idea to add parallel test cases for the other PLs. As I suspected, plperl and pltcl show exactly the same problems when trapping a failure at commit, reinforcing my opinion that this is a SPI bug that needs to be fixed in SPI. (plpgsql is not subject to this problem, because its only mechanism for trapping errors is a BEGIN block, ie a subtransaction, so it won't get to the interesting part.) They do have logic to catch the thrown error, though, so no additional fix is needed in either once we fix the core code. v2 attached adds those test cases. regards, tom lane diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c index c93f90de9b..7971050746 100644 --- a/src/backend/executor/spi.c +++ b/src/backend/executor/spi.c @@ -156,7 +156,8 @@ SPI_connect_ext(int options) * XXX It could be better to use PortalContext as the parent context in * all cases, but we may not be inside a portal (consider deferred-trigger * execution). Perhaps CurTransactionContext could be an option? For now - * it doesn't matter because we clean up explicitly in AtEOSubXact_SPI(). + * it doesn't matter because we clean up explicitly in AtEOSubXact_SPI(); + * but see also AtEOXact_SPI(). */ _SPI_current->procCxt = AllocSetContextCreate(_SPI_current->atomic ? TopTransactionContext : PortalContext, "SPI Proc", @@ -214,13 +215,13 @@ SPI_finish(void) return SPI_OK_FINISH; } +/* + * SPI_start_transaction is a no-op, kept for backwards compatibility. + * SPI callers are *always* inside a transaction. + */ void SPI_start_transaction(void) { - MemoryContext oldcontext = CurrentMemoryContext; - - StartTransactionCommand(); - MemoryContextSwitchTo(oldcontext); } static void @@ -228,6 +229,12 @@ _SPI_commit(bool chain) { MemoryContext oldcontext = CurrentMemoryContext; + /* + * Complain if we are in a context that doesn't permit transaction + * termination. (Note: here and _SPI_rollback should be the only places + * that throw ERRCODE_INVALID_TRANSACTION_TERMINATION, so that callers can + * test for that with security that they know what happened.) + */ if (_SPI_current->atomic) ereport(ERROR, (errcode(ERRCODE_INVALID_TRANSACTION_TERMINATION), @@ -240,40 +247,74 @@ _SPI_commit(bool chain) * top-level transaction in such a block violates that idea. A future PL * implementation might have different ideas about this, in which case * this restriction would have to be refined or the check possibly be - * moved out of SPI into the PLs. + * moved out of SPI into the PLs. Note however that the code below relies + * on not being within a subtransaction. */ if (IsSubTransaction()) ereport(ERROR, (errcode(ERRCODE_INVALID_TRANSACTION_TERMINATION), errmsg("cannot commit while a subtransaction is active"))); - /* - * Hold any pinned portals that any PLs might be using. We have to do - * this before changing transaction state, since this will run - * user-defined code that might throw an error. - */ - HoldPinnedPortals(); + /* XXX this ain't re-entrant enough for my taste */ + if (chain) + SaveTransactionCharacteristics(); - /* Start the actual commit */ - _SPI_current->internal_xact = true; + /* Catch any error occurring during the COMMIT */ + PG_TRY(); + { + /* Protect current SPI stack entry against deletion */ + _SPI_current->internal_xact = true; - /* Release snapshots associated with portals */ - ForgetPortalSnapshots(); + /* + * Hold any pinned portals that any PLs might be using. 
We have to do + * this before changing transaction state, since this will run + * user-defined code that might throw an error. + */ + HoldPinnedPortals(); - if (chain) - SaveTransactionCharacteristics(); + /* Release snapshots associated with portals */ + ForgetPortalSnapshots(); - CommitTransactionCommand(); + /* Do the deed */ + CommitTransactionCommand(); - if (chain) - { + /* Immediately start a new transaction */ StartTransactionCommand(); - RestoreTransactionCharacteristics(); + if (chain) + RestoreTransactionCharacteristics(); + + MemoryContextSwitchTo(oldcontext); + + _SPI_current->internal_xact = false; } + PG_CATCH(); + { + ErrorData *edata; - MemoryContextSwitchTo(oldcontext); + /* Save error info in caller's context */ + MemoryContextSwitchTo(oldcontext); + edata = CopyErrorData(); + FlushErrorState(); - _SPI_current->internal_xact = false; + /* + * Abort the failed transaction. If this fails too, we'll just + * propagate the error out ... there's not that much we can do. + */ + AbortCurrentTransaction(); + + /* ... and start a new one */ + StartTransactionCommand(); + if (chain) + RestoreTransactionCharacteristics(); + + MemoryContextSwitchTo(oldcontext); + + _SPI_current->internal_xact = false; + + /* Now that we've cleaned up the transaction, re-throw the error */ + ReThrowError(edata); + } + PG_END_TRY(); }
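The failure-at-commit case these tests exercise can be sketched from the PL side; a minimal illustration (assumes plpython3u is installed, and uses a deferred constraint to stand in for whatever makes COMMIT itself throw):

    DO $$
    plpy.execute("CREATE TABLE t_commit_fail (x int UNIQUE DEFERRABLE INITIALLY DEFERRED)")
    plpy.execute("INSERT INTO t_commit_fail VALUES (1), (1)")  # no error yet: the check is deferred
    try:
        plpy.commit()   # the deferred uniqueness check runs here and fails at COMMIT
    except Exception as e:
        plpy.notice("commit failed: %s" % e)  # reachable only if SPI cleaned up the aborted transaction
    $$ LANGUAGE plpython3u;

With the fix, the failed commit is rolled back and a fresh transaction started before the error is re-thrown, so the PL's exception handler runs in a sane state instead of crashing or leaving the session wedged.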
Re: remove more archiving overhead
Thanks for taking a look!

On Thu, Feb 24, 2022 at 02:13:49PM +0900, Kyotaro Horiguchi wrote:
> https://www.postgresql.org/docs/14/continuous-archiving.html
> | The archive command should generally be designed to refuse to
> | overwrite any pre-existing archive file. This is an important safety
> | feature to preserve the integrity of your archive in case of
> | administrator error (such as sending the output of two different
> | servers to the same archive directory).
>
> The recommended behavior of archive_command above is to avoid this
> kind of corruption. If we officially admit that a WAL segment can be
> archived twice, we need to revise that part (and the following part) of
> the doc.

Yeah, we might want to recommend succeeding if the archive already exists with identical contents (like basic_archive does). IMO this would best be handled in a separate thread that is solely focused on documentation updates. AFAICT this is an existing problem.

>> IIUC this means that in theory, a crash at an unfortunate moment could
>> leave you with a .ready file, the file to archive, and another link to that
>> file with a "future" WAL filename. If you re-archive the file after it has
>> been reused, you could end up with corrupt WAL archives. I think this
>
> IMHO the difference would be that the single system call (maybe) cannot
> be interrupted by sigkill. So I'm not sure that that is worth
> considering. The sigkill window can be eliminated if we could use
> renameat(RENAME_NOREPLACE). But finally power-loss can cause that.

Perhaps durable_rename_excl() could use renameat2() for systems that support it. However, the problem would still exist for systems that have link() but not renameat2(). In this particular case, there really shouldn't be an existing file in pg_wal.

>> might already be possible today. Syncing the directory after every rename
>> probably reduces the likelihood quite substantially, but if
>> RemoveOldXlogFiles() quickly picks up the .done file and attempts
>> recycling before durable_rename() calls fsync() on the directory,
>> presumably the same problem could occur.
>
> If we don't fsync at every rename, we don't even need RemoveOldXlogFiles
> to pick up .done very quickly. Isn't that a big difference?

Yeah, it probably increases the chances quite a bit.

> I think we don't want to make the checkpointer slower in exchange for
> making the archiver faster. Perhaps we need to work using condensed
> information rather than repeatedly scanning over the distributed
> information like archive_status?

v2 avoids giving the checkpointer more work.

> At Tue, 22 Feb 2022 11:52:29 -0800, Nathan Bossart wrote in
>> On Tue, Feb 22, 2022 at 09:37:11AM -0800, Nathan Bossart wrote:
>> > In my testing, I found that when I killed the server just before unlink()
>> > during WAL recycling, I ended up with links to the same file in pg_wal after
>> > restarting. My latest test produced links to the same file for the current
>> > WAL file and the next one. Maybe WAL recycling should use durable_rename()
>> > instead of durable_rename_excl().
>>
>> Here is an updated patch.
>
> Is it intentional that v2 loses the xlogarchive.c part?

Yes. I found that a crash at an unfortunate moment can produce multiple links to the same file in pg_wal, which seemed bad independent of archival. By fixing that (i.e., switching from durable_rename_excl() to durable_rename()), we not only avoid this problem, but we also avoid trying to archive a file the server is concurrently writing. Then, after a crash, the WAL file to archive should either not exist (which is handled by the archiver) or contain the same contents as any preexisting archives.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
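For reference, the renameat2() route mentioned above might look like this on Linux (a sketch only; assumes glibc 2.28+ and a filesystem that supports the flag, and is not portable):

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>      /* AT_FDCWD */
    #include <stdio.h>      /* renameat2(), RENAME_NOREPLACE */

    /*
     * Rename oldpath to newpath, atomically refusing to clobber an existing
     * target.  Returns 0 on success, -1 with errno set; EEXIST means the
     * target was already present.
     */
    static int
    rename_noreplace(const char *oldpath, const char *newpath)
    {
        return renameat2(AT_FDCWD, oldpath, AT_FDCWD, newpath, RENAME_NOREPLACE);
    }

Unlike the link()-then-unlink() dance, there is no window in which two names point at the same file, though as noted above a power loss can still defeat any of these schemes.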
Re: convert libpq uri-regress tests to tap test
Hi, On 2022-02-24 17:03:33 +, Jacob Champion wrote: > On Thu, 2022-02-24 at 08:46 -0800, Andres Freund wrote: > > One annoying bit is that our current tap invocation infrastructure for msvc > > won't know how to deal with that. We put the build directory containing t/ > > onto PATH, but that won't work for test/. But we also don't want to install > > test binaries. Not sure what the solution for that is. > > Would it help if the C executable, not Perl, was the thing actually > producing the TAP output? The binaries built from test/ could be placed > into t/. Or does that just open up a new set of problems? I don't think it would help that much. And for the libpq pipeline test it doesn't really make sense anyway, because we intentionally use it with different traces and such. I've thought about a few C tap tests that I'd like, but I think that's a separate discussion. Greetings, Andres Freund
Re: libpq async duplicate error results
Actually ... it now seems to me that there's live bugs in the interaction between pipeline mode and error-buffer clearing. Namely, that places like PQsendQueryStart will unconditionally clear the buffer even though we may be in the middle of a pipeline sequence, and the buffer might hold an error that's yet to be reported to the application. So I think we need something more like the attached. regards, tom lane diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c index 45dddaf556..0c39bc9abf 100644 --- a/src/interfaces/libpq/fe-exec.c +++ b/src/interfaces/libpq/fe-exec.c @@ -1380,10 +1380,7 @@ pqAppendCmdQueueEntry(PGconn *conn, PGcmdQueueEntry *entry) * state, we don't have to do anything. */ if (conn->asyncStatus == PGASYNC_IDLE) - { -pqClearConnErrorState(conn); pqPipelineProcessQueue(conn); - } break; } } @@ -1730,8 +1727,10 @@ PQsendQueryStart(PGconn *conn, bool newQuery) /* * If this is the beginning of a query cycle, reset the error state. + * However, in pipeline mode with something already queued, the error + * buffer belongs to that command and we shouldn't clear it. */ - if (newQuery) + if (newQuery && conn->cmd_queue_head == NULL) pqClearConnErrorState(conn); /* Don't try to send if we know there's no live connection. */ @@ -2149,11 +2148,8 @@ PQgetResult(PGconn *conn) /* * We're about to return the NULL that terminates the round of * results from the current query; prepare to send the results - * of the next query when we're called next. Also, since this - * is the start of the results of the next query, clear any - * prior error message. + * of the next query when we're called next. */ -pqClearConnErrorState(conn); pqPipelineProcessQueue(conn); } break; @@ -2362,6 +2358,14 @@ PQexecStart(PGconn *conn) if (!conn) return false; + /* + * Since this is the beginning of a query cycle, reset the error state. + * However, in pipeline mode with something already queued, the error + * buffer belongs to that command and we shouldn't clear it. + */ + if (conn->cmd_queue_head == NULL) + pqClearConnErrorState(conn); + if (conn->pipelineStatus != PQ_PIPELINE_OFF) { appendPQExpBufferStr(>errorMessage, @@ -2369,11 +2373,6 @@ PQexecStart(PGconn *conn) return false; } - /* - * Since this is the beginning of a query cycle, reset the error state. - */ - pqClearConnErrorState(conn); - /* * Silently discard any prior query result that application didn't eat. * This is probably poor design, but it's here for backward compatibility. @@ -2928,8 +2927,11 @@ PQfn(PGconn *conn, /* * Since this is the beginning of a query cycle, reset the error state. + * However, in pipeline mode with something already queued, the error + * buffer belongs to that command and we shouldn't clear it. */ - pqClearConnErrorState(conn); + if (conn->cmd_queue_head == NULL) + pqClearConnErrorState(conn); if (conn->pipelineStatus != PQ_PIPELINE_OFF) { @@ -3099,6 +3101,12 @@ pqPipelineProcessQueue(PGconn *conn) conn->cmd_queue_head == NULL) return; + /* + * Reset the error state. This and the next couple of steps correspond to + * what PQsendQueryStart didn't do for this query. + */ + pqClearConnErrorState(conn); + /* Initialize async result-accumulation state */ pqClearAsyncResult(conn); @@ -3809,9 +3817,11 @@ PQsetnonblocking(PGconn *conn, int arg) * behavior. this is ok because either they are making a transition _from_ * or _to_ blocking mode, either way we can block them. * - * Clear error state in case pqFlush adds to it. 
+ * Clear error state in case pqFlush adds to it, unless we're actively + * pipelining, in which case it seems best not to. */ - pqClearConnErrorState(conn); + if (conn->cmd_queue_head == NULL) + pqClearConnErrorState(conn); /* if we are going from blocking to non-blocking flush here */ if (pqFlush(conn)) @@ -4003,7 +4013,8 @@ PQescapeStringConn(PGconn *conn, return 0; } - pqClearConnErrorState(conn); + if (conn->cmd_queue_head == NULL) + pqClearConnErrorState(conn); return PQescapeStringInternal(conn, to, from, length, error, conn->client_encoding, @@ -4041,7 +4052,8 @@ PQescapeInternal(PGconn *conn, const char *str, size_t len, bool as_ident) if (!conn) return NULL; - pqClearConnErrorState(conn); + if (conn->cmd_queue_head == NULL) + pqClearConnErrorState(conn); /* Scan the string for characters that must be escaped. */ for (s = str; (s - str) < len && *s != '\0'; ++s) @@ -4306,7 +4318,8 @@ PQescapeByteaConn(PGconn *conn, if (!conn) return NULL; - pqClearConnErrorState(conn); + if (conn->cmd_queue_head == NULL) + pqClearConnErrorState(conn); return PQescapeByteaInternal(conn, from, from_length, to_length, conn->std_strings,
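To illustrate the hazard being fixed, a sketch (assumes conn is an open connection): the error produced by the first queued command must survive in conn->errorMessage until PQgetResult() reports it, so later calls must not wipe the buffer.

    PQenterPipelineMode(conn);
    PQsendQueryParams(conn, "SELECT 1/0", 0, NULL, NULL, NULL, NULL, 0);  /* will fail */
    PQsendQueryParams(conn, "SELECT 2", 0, NULL, NULL, NULL, NULL, 0);    /* must not clear the pending error */
    PQpipelineSync(conn);
    /* ... PQgetResult() eventually returns PGRES_FATAL_ERROR for the first
     * query; with the old code its message could already have been cleared
     * by the second PQsendQueryParams() call. */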
Re: [PATCH] Expose port->authn_id to extensions and triggers
Hi,

On Thu, Feb 24, 2022 at 04:50:59PM +0000, Jacob Champion wrote:
> On Thu, 2022-02-24 at 20:39 +0900, Michael Paquier wrote:
> > I don't quite see the additional value that this API brings as
> > MyProcPort is directly accessible, and contrib modules like
> > postgres_fdw and sslinfo just use that to find the data of the current
> > backend.
>
> Right -- I just didn't know if third-party modules were actually able
> to rely on the internal layout of struct Port. Is that guaranteed to
> remain constant for a major release line? If so, this new API is
> superfluous.

Yes, third-party code can rely on the Port layout. We don't break ABI between minor releases. On some occasions we can add additional fields at the end of a struct, but nothing more.

> > I could still see a use case for that at a more global level with
> > beentrys, but it looked like there was not much interest the last time
> > I dropped this idea.
>
> I agree that this would be useful to have in the stats. From my outside
> perspective, it seems like it's difficult to get strings of arbitrary
> length in there; is that accurate?

Yes, as it's all in shared memory. The only workaround is using something like track_activity_query_size, but it's not great.
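In other words, something along these lines already works from a third-party module (a sketch; MyProcPort is NULL in processes without a client connection):

    #include "postgres.h"
    #include "miscadmin.h"
    #include "libpq/libpq-be.h"

    static const char *
    current_authn_id(void)
    {
        if (MyProcPort == NULL)         /* e.g. background workers */
            return NULL;
        return MyProcPort->authn_id;    /* NULL if not authenticated */
    }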
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
On Wed, 23 Feb 2022 at 14:28, Nitin Jadhav wrote:
>
> Sharing the v2 patch. Kindly have a look and share your comments.

Thanks for updating.

> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml

With the new pg_stat_progress_checkpoint, you should also add a backreference to this progress reporting in the CHECKPOINT sql command documentation located in checkpoint.sgml, and maybe in wal.sgml and/or backup.sgml too. See e.g. cluster.sgml around line 195 for an example.

> diff --git a/src/backend/postmaster/checkpointer.c
> b/src/backend/postmaster/checkpointer.c
> +ImmediateCheckpointRequested(int flags)
> if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE)
> +{
> +updated_flags |= CHECKPOINT_IMMEDIATE;

I don't think that these changes are expected behaviour. Under this condition, the currently running checkpoint is still not 'immediate', but it is going to hurry up for a new, actually immediate checkpoint. Those are different kinds of checkpoint handling, and I don't think you should modify the reported flags to show that we're going to do stuff faster than usual. Maybe maintain a separate 'upcoming checkpoint flags' field instead?

> diff --git a/src/backend/catalog/system_views.sql
> b/src/backend/catalog/system_views.sql
> +( SELECT '0/0'::pg_lsn +
> +    ((CASE
> +        WHEN stat.lsn_int64 < 0 THEN pow(2::numeric, 64::numeric)::numeric
> +        ELSE 0::numeric
> +      END) +
> +      stat.lsn_int64::numeric)
> +  FROM (SELECT s.param3::bigint) AS stat(lsn_int64)
> +) AS start_lsn,

My LSN select statement was an example that could be run directly in psql, so you didn't have to embed the SELECT into the view query. The following should be sufficient (and save the planner a few cycles otherwise spent in inlining):

+('0/0'::pg_lsn +
+    ((CASE
+        WHEN s.param3 < 0 THEN pow(2::numeric, 64::numeric)::numeric
+        ELSE 0::numeric
+      END) +
+      s.param3::numeric)
+) AS start_lsn,

> diff --git a/src/backend/access/transam/xlog.c
> b/src/backend/access/transam/xlog.c
> +checkpoint_progress_start(int flags)
> [...]
> +checkpoint_progress_update_param(int index, int64 val)
> [...]
> +checkpoint_progress_end(void)
> +{
> +    /* In bootstrap mode, we don't actually record anything. */
> +    if (IsBootstrapProcessingMode())
> +        return;

Disabling pgstat progress reporting when in bootstrap processing mode / startup/end-of-recovery makes very little sense (see upthread) and should be removed, regardless of whether separate functions stay.

> diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
> +#define PROGRESS_CHECKPOINT_PHASE_INIT 0

Generally, enum-like values in a stat_progress field are 1-indexed, to differentiate between empty/uninitialized (0) and states that have been set by the progress reporting infrastructure.

Kind regards,

Matthias van de Meent
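For anyone wanting to sanity-check the wraparound arithmetic directly, a sketch (the -1 stands in for a progress value whose unsigned LSN is above 2^63):

    SELECT '0/0'::pg_lsn
           + ((CASE WHEN v < 0 THEN pow(2::numeric, 64::numeric) ELSE 0::numeric END)
              + v::numeric) AS lsn
    FROM (VALUES (-1::bigint)) AS t(v);
    -- expected: FFFFFFFF/FFFFFFFF, i.e. 2^64 - 1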
Re: convert libpq uri-regress tests to tap test
On Thu, 2022-02-24 at 08:46 -0800, Andres Freund wrote: > One annoying bit is that our current tap invocation infrastructure for msvc > won't know how to deal with that. We put the build directory containing t/ > onto PATH, but that won't work for test/. But we also don't want to install > test binaries. Not sure what the solution for that is. Would it help if the C executable, not Perl, was the thing actually producing the TAP output? The binaries built from test/ could be placed into t/. Or does that just open up a new set of problems? --Jacob
Re: [PATCH] Expose port->authn_id to extensions and triggers
On Thu, 2022-02-24 at 20:39 +0900, Michael Paquier wrote: > I don't quite see the additional value that this API brings as > MyProcPort is directly accessible, and contrib modules like > postgres_fdw and sslinfo just use that to find the data of the current > backend. Right -- I just didn't know if third-party modules were actually able to rely on the internal layout of struct Port. Is that guaranteed to remain constant for a major release line? If so, this new API is superfluous. > Cannot a module like pgaudit, through the elog hook, do its > work with what we have already in place? Given the above, I would hope so. Stephen mentioned that pgaudit only had access to the logged-in role, and when I proposed a miscadmin.h API he said that would help. CC'ing him to see what he meant; I don't know if pgaudit has additional constraints. > What's the use case for a given session to be able to report back > only its own authn through SQL? That's for triggers to be able to grab the current ID for logging/auditing. Is there a better way to expose this for that use case? > I could still see a use case for that at a more global level with > beentrys, but it looked like there was not much interest the last time > I dropped this idea. I agree that this would be useful to have in the stats. From my outside perspective, it seems like it's difficult to get strings of arbitrary length in there; is that accurate? Thanks, --Jacob
Re: convert libpq uri-regress tests to tap test
Hi,

On 2022-02-24 10:17:28 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2022-02-24 13:31:40 +0100, Peter Eisentraut wrote:
> >> I think the TAP scripts should be in src/interfaces/libpq/t/, as usual. The
> >> supporting code snippets could live in some other directory under
> >> src/interfaces/libpq/, which might be called "test" or something else, not
> >> that important.
>
> > Why not in t/? We can't easily build the test programs in libpq/ itself, but
> > libpq/t should be fairly doable.
>
> I think that having t/ directories contain only Perl test scripts
> is a good convention that we should stick to. Peter's proposal
> of a separate test/ subdirectory for C test scaffolding is
> probably fine.

That exists today and continues to exist in the patch upthread, so it's easy ;). I just need to move libpq/test/t to libpq/t and adjust the binary path.

One annoying bit is that our current tap invocation infrastructure for msvc won't know how to deal with that. We put the build directory containing t/ onto PATH, but that won't work for test/. But we also don't want to install test binaries. Not sure what the solution for that is.

Even on !windows, we only know how to find "test executables" in tap tests via PATH. We're in the source dir, so we can't just do test/executable. I probably just need another coffee, but right now I don't even see how to add anything to PATH given $(prove_check)'s definition - it ends up with multiple shells. We can fix that by using && in the definition, which might be a good thing anyway?

Attached three patches:
0001: Convert src/interfaces/libpq/test to a tap test
0002: Add tap test infrastructure to src/interfaces/libpq
0003: Move libpq_pipeline test into src/interfaces/libpq

I did 0001 and 0002 in that order because prove errors out with a stacktrace if no tap tests exist... It might make more sense to commit them together, but for review it's easier to keep them separate, I think.

Andrew, do you have an idea about the feasibility of supporting any of this with the msvc build? I'm mildly inclined to only do 0001 and 0002 for now. We'd not lose msvc coverage, because it already doesn't build the test. Once we've ironed that stuff out, we could do 0003?

Greetings,

Andres Freund

>From 71fa1532a1540e8bbf8bdee3ec0b64e863f212f2 Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Wed, 23 Feb 2022 12:22:56 -0800
Subject: [PATCH v2 1/3] Convert src/interfaces/libpq/test to a tap test.

The invocation of the tap test will be added in the next commit.
--- src/interfaces/libpq/t/001_uri.pl | 265 + src/interfaces/libpq/test/.gitignore | 2 - src/interfaces/libpq/test/Makefile | 7 +- src/interfaces/libpq/test/README | 7 - src/interfaces/libpq/test/expected.out | 171 src/interfaces/libpq/test/regress.in | 57 -- src/interfaces/libpq/test/regress.pl | 65 -- 7 files changed, 267 insertions(+), 307 deletions(-) create mode 100644 src/interfaces/libpq/t/001_uri.pl delete mode 100644 src/interfaces/libpq/test/README delete mode 100644 src/interfaces/libpq/test/expected.out delete mode 100644 src/interfaces/libpq/test/regress.in delete mode 100644 src/interfaces/libpq/test/regress.pl diff --git a/src/interfaces/libpq/t/001_uri.pl b/src/interfaces/libpq/t/001_uri.pl new file mode 100644 index 000..0225666d9f6 --- /dev/null +++ b/src/interfaces/libpq/t/001_uri.pl @@ -0,0 +1,265 @@ +# Copyright (c) 2021-2022, PostgreSQL Global Development Group +use strict; +use warnings; + +use PostgreSQL::Test::Utils; +use Test::More; +use IPC::Run; + +my @tests = ( + [q{postgresql://uri-user:secret@host:12345/db}, + q{user='uri-user' password='secret' dbname='db' host='host' port='12345' (inet)}, + q{}, +], + [q{postgresql://uri-user@host:12345/db}, + q{user='uri-user' dbname='db' host='host' port='12345' (inet)}, + q{}, +], + [q{postgresql://uri-user@host/db}, + q{user='uri-user' dbname='db' host='host' (inet)}, + q{}, +], + [q{postgresql://host:12345/db}, + q{dbname='db' host='host' port='12345' (inet)}, + q{}, +], + [q{postgresql://host/db}, + q{dbname='db' host='host' (inet)}, + q{}, +], + [q{postgresql://uri-user@host:12345/}, + q{user='uri-user' host='host' port='12345' (inet)}, + q{}, +], + [q{postgresql://uri-user@host/}, + q{user='uri-user' host='host' (inet)}, + q{}, +], + [q{postgresql://uri-user@}, + q{user='uri-user' (local)}, + q{}, +], + [q{postgresql://host:12345/}, + q{host='host' port='12345' (inet)}, + q{}, +], + [q{postgresql://host:12345}, + q{host='host' port='12345' (inet)}, + q{}, +], + [q{postgresql://host/db}, + q{dbname='db' host='host' (inet)}, + q{}, +], + [q{postgresql://host/}, + q{host='host' (inet)}, + q{}, +], + [q{postgresql://host}, + q{host='host' (inet)}, + q{}, +], + [q{postgresql://}, + q{(local)}, + q{}, +], +
Re: Time to drop plpython2?
On Wed, Feb 23, 2022 at 07:59:01AM -0800, Mark Wong wrote: > On Tue, Feb 22, 2022 at 08:50:07PM -0500, Tom Lane wrote: > > Mark Wong writes: > > > Take 3. :) > > > > > I've upgraded everyone to the v14 buildfarm scripts and made sure the > > > --test passed on HEAD on each one. So I hopefully didn't miss any > > > (other than the one EOL OpenSUSE version that I will plan on upgrading.) > > > > Thanks! > > > > However ... it seems like most of your animals haven't run at all > > in the last couple of days. Did you maybe forget to re-enable > > their cron jobs after messing with them, or something like that? > > Uh oh, more user error. I tested run_build but run_branches wasn't > happy. I'll start working through that... I think I have most of them operational again. I see some animals are still failing on some branches, but still better overall? I discovered that clang for gadwall and pintail got purged when I removed python2, so I disabled those two animals. Regards, Mark
Re: Extract epoch from Interval weird behavior
Joseph Koshakow writes: > I do want to briefly mention, if I'm understanding the history of > EXTRACT correctly, that the previous behavior > actually was to multiply by 365.25, not 365. However The commit that > changed the return type from numeric [1] > changed that behavior. Looking through the discussions [2], I don't > see any mention of it, which makes me think > it was a mistake. Indeed ... Peter? regards, tom lane
Re: Add id's to various elements in protocol.sgml
On 24.02.2022 at 16:46, Alvaro Herrera wrote:
> On 2022-Feb-24, Dagfinn Ilmari Mannsåker wrote:
> > Peter Eisentraut writes:
> > > Is there a way to obtain those URLs other than going into the HTML
> > > sources and checking if there is an anchor near where you want to go?
> >
> > I use the jump-to-anchor extension: https://github.com/brettz9/jump-to-anchor/
> >
> > Some sites have javascript that adds a link next to the element that
> > becomes visible when hovering, e.g. the NAME and other headings on
> > https://metacpan.org/pod/perl.
>
> Would it be possible to create such anchor links as part of the XSL
> stylesheets for HTML?

Initially I thought that most use cases would involve developers who would be perfectly capable of extracting the id they need from the html sources, but I agree that making that a bit more comfortable (especially given the fact that others do that too) seems worthwhile. I'll investigate our options and report back.
Re: Design of pg_stat_subscription_workers vs pgstats
Hi, Thank you for the comments! On Thu, Feb 24, 2022 at 4:20 PM tanghy.f...@fujitsu.com wrote: > > On Thu, Feb 24, 2022 9:33 AM Masahiko Sawada wrote: > > > > Thank you for the comments! I've attached the latest version patch > > that incorporated all comments I got so far. The primary change from > > the previous version is that the subscription statistics live globally > > rather than per-database. > > > > Thanks for updating the patch. > > Few comments: > > 1. > I think we should add some doc for column stats_reset in > pg_stat_subscription_activity view. Added. > > 2. > +CREATE VIEW pg_stat_subscription_activity AS > SELECT > -w.subid, > +a.subid, > s.subname, > ... > +a.apply_error_count, > +a.sync_error_count, > + a.stats_reset > +FROM pg_subscription as s, > +pg_stat_get_subscription_activity(oid) as a; > > The line "a.stats_reset" uses a Tab, and we'd better use spaces here. Fixed. On Thu, Feb 24, 2022 at 5:54 PM Peter Smith wrote: > > Hi. Below are my review comments for the v2 patch. > > == > > 1. Commit message > > This patch changes the pg_stat_subscription_workers view (introduced > by commit 8d74fc9) so that it stores only statistics counters: > apply_error_count and sync_error_count, and has one entry for > subscription. > > SUGGESTION > "and has one entry for subscription." --> "and has one entry for each > subscription." Fixed. > > ~~~ > > 2. Commit message > > After removing these error details, there are no longer relation > information, so the subscription statistics are now a cluster-wide > statistics. > > SUGGESTION > "there are no longer relation information," --> "there is no longer > any relation information," Fixed. > > ~~~ > > 3. doc/src/sgml/monitoring.sgml > > - > - The error message > + Number of times the error occurred during the application of changes > > > > > > - last_error_time timestamp > with time zone > + sync_error_count bigint > > > - Last time at which this error occurred > + Number of times the error occurred during the initial table > + synchronization > > > SUGGESTION (both places) > "Number of times the error occurred ..." --> "Number of times an error > occurred ..." Fixed. > > ~~~ > > 4. doc/src/sgml/monitoring.sgml - missing column > > (duplicate - also reported by [Tang-v2]) > > The PG docs for the new "stats_reset" column are missing. > > ~~~ > > 5. src/backend/catalog/system_views.sql - whitespace > > (duplicate - also reported by [Tang-v2]) > > - JOIN pg_subscription s ON (w.subid = s.oid); > +a.apply_error_count, > +a.sync_error_count, > + a.stats_reset > +FROM pg_subscription as s, > +pg_stat_get_subscription_activity(oid) as a; > > inconsistent tab/space indenting for 'a.stats_reset'. > > ~~~ > > 6. src/backend/postmaster/pgstat.c - function name > > +/* -- > + * pgstat_reset_subscription_counter() - > + * > + * Tell the statistics collector to reset a single subscription > + * counter, or all subscription counters (when subid is InvalidOid). > + * > + * Permission checking for this function is managed through the normal > + * GRANT system. > + * -- > + */ > +void > +pgstat_reset_subscription_counter(Oid subid) > > SUGGESTION (function name) > "pgstat_reset_subscription_counter" --> > "pgstat_reset_subscription_counters" (plural) Fixed. > > ~~ > > 7. src/backend/postmaster/pgstat.c - pgstat_recv_resetsubcounter > > @@ -5645,6 +5598,51 @@ > pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg, > } > } > > +/* -- > + * pgstat_recv_resetsubcounter() - > + * > + * Reset some subscription statistics of the cluster. 
> + * -- > + */ > +static void > +pgstat_recv_resetsubcounter(PgStat_MsgResetsubcounter *msg, int len) > > > "Reset some" seems a bit vague. Why not describe that it is all or > none according to the msg->m_subid? I think it reset none, one, or all statistics, actually. Given other pgstat_recv_reset* functions also have similar comments, I think we can use it rather than elaborating. > > ~~~ > > 8. src/backend/postmaster/pgstat.c - pgstat_recv_resetsubcounter > > + if (!OidIsValid(msg->m_subid)) > + { > + HASH_SEQ_STATUS sstat; > + > + /* Clear all subscription counters */ > + hash_seq_init(, subscriptionStatHash); > + while ((subentry = (PgStat_StatSubEntry *) hash_seq_search()) != NULL) > + pgstat_reset_subscription(subentry, ts); > + } > + else > + { > + /* Get the subscription statistics to reset */ > + subentry = pgstat_get_subscription_entry(msg->m_subid, false); > + > + /* > + * Nothing to do if the given subscription entry is not found. This > + * could happen when the subscription with the subid is removed and > + * the corresponding statistics entry is also removed before receiving > + * the reset message. > + */ > + if (!subentry) > + return; > + > + /* Reset
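As a usage sketch of the one-or-all reset semantics discussed above (the SQL-callable name used here, pg_stat_reset_subscription_stats, is an assumption; the patch may spell it differently):

    -- reset the counters of one subscription
    SELECT pg_stat_reset_subscription_stats(oid)
      FROM pg_subscription WHERE subname = 'mysub';
    -- reset the counters of all subscriptions
    SELECT pg_stat_reset_subscription_stats(NULL);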
Re: create_index test fails when synchronous_commit = off @ master
Hi,

On 2022-02-24 07:33:39 -0800, Andres Freund wrote:
> I added the
>   SELECT relpages, reltuples, relallvisible FROM pg_class WHERE oid = 'tenk1'::regclass;
> just after the
>   VACUUM ANALYZE tenk1;
>
> synchronous_commit=on
> + relpages | reltuples | relallvisible
> +----------+-----------+---------------
> +      345 |     10000 |           345
> +(1 row)
>
> synchronous_commit=off
> + relpages | reltuples | relallvisible
> +----------+-----------+---------------
> +      345 |     10000 |             0
> +(1 row)
>
> So it clearly is the explanation for the issue.
>
> Obviously we can locally work around it by adding a
>   SET LOCAL synchronous_commit = local;
> to the COPY. But I'd like to fully understand what's going on.

It is the hint-bit setting delayed by asynchronous commit. I traced execution, and we do end up not setting all-visible due to reaching the !HeapTupleHeaderXminCommitted() path in lazy_scan_prune():

	case HEAPTUPLE_LIVE:
		...
		/*
		 * Is the tuple definitely visible to all transactions?
		 *
		 * NB: Like with per-tuple hint bits, we can't set the
		 * PD_ALL_VISIBLE flag if the inserter committed
		 * asynchronously. See SetHintBits for more info. Check that
		 * the tuple is hinted xmin-committed because of that.
		 */
		if (prunestate->all_visible)
		{
			TransactionId xmin;

			if (!HeapTupleHeaderXminCommitted(tuple.t_data))

So it might be reasonable to use synchronous_commit=on in test_setup.sql? It's not super satisfying, but I don't immediately see what else could prevent all-visible from being set in this case. There are no dead rows, so concurrent snapshots shouldn't prevent cleanup.

I guess we could alternatively try doing something like flushing pending async commits at the start of vacuum. But that probably isn't warranted.

Greetings,

Andres Freund
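A minimal way to observe this outside the regression suite, as a sketch (timing-dependent, so the 0 is likely rather than guaranteed):

    SET synchronous_commit = off;
    CREATE TABLE av_test AS SELECT g FROM generate_series(1, 10000) g;
    VACUUM av_test;
    SELECT relallvisible FROM pg_class WHERE oid = 'av_test'::regclass;
    -- typically 0: the inserting transaction's commit record may not be
    -- flushed yet, so VACUUM can't hint xmin-committed or set all-visible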
Re: Add id's to various elements in protocol.sgml
On 2022-Feb-24, Dagfinn Ilmari Mannsåker wrote: > Peter Eisentraut writes: > > Is there a way to obtain those URLs other than going into the HTML > > sources and checking if there is an anchor near where you want go? > > I use the jump-to-anchor extension: https://github.com/brettz9/jump-to-anchor/ > > Some sites have javascript that adds a link next to the element that > becomes visible when hovering, e.g. the NAME and other headings on > https://metacpan.org/pod/perl. Would it be possible to create such anchor links as part of the XSL stylesheets for HTML? -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Re: create_index test fails when synchronous_commit = off @ master
Hi,

On 2022-02-24 16:47:25 +0300, Aleksander Alekseev wrote:
> -                      QUERY PLAN
> ------------------------------------------------------
> - Index Only Scan using tenk1_thous_tenthous on tenk1
> -   Index Cond: (thousand < 2)
> -   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
> -(3 rows)
> +                              QUERY PLAN
> +--------------------------------------------------------------------------
> + Sort
> +   Sort Key: thousand
> +   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
> +         Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
> +(4 rows)

Heh. We've been having a lot of fights with exactly this plan change in the AIO branch, before cc50080a82, and without synchronous_commit = off. Interestingly, near-exclusively with the regression run within pg_upgrade's tests.

For AIO we (David did a lot of that, IIRC) finally hunted it down to being due to vacuum skipping pages because of an inability to get a cleanup lock. If that happens enough, pg_class.relallvisible changes enough to lead to the different plan.

I first was going to suggest that we should just use VACUUM FREEZE to prevent the issue. But in this instance the cause isn't cleanup locks; it's probably that we can't yet set hint bits due to synchronous_commit=off? But I don't *fully* understand how it leads to this.

I added the
  SELECT relpages, reltuples, relallvisible FROM pg_class WHERE oid = 'tenk1'::regclass;
just after the
  VACUUM ANALYZE tenk1;

synchronous_commit=on
+ relpages | reltuples | relallvisible
+----------+-----------+---------------
+      345 |     10000 |           345
+(1 row)

synchronous_commit=off
+ relpages | reltuples | relallvisible
+----------+-----------+---------------
+      345 |     10000 |             0
+(1 row)

So it clearly is the explanation for the issue.

Obviously we can locally work around it by adding a
  SET LOCAL synchronous_commit = local;
to the COPY. But I'd like to fully understand what's going on.

> I didn't investigate further. Do we assume that `make installcheck` is
> supposed to pass with different postgresql.conf options?

Depends on the option, I think... There are some where it's interesting to run tests with different options and where the effort required is reasonable, and some cases where it's not... synchronous_commit=off worked until recently, and I think we should keep it working.

Greetings,

Andres Freund
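Spelled out, the local workaround would be something like this sketch (the COPY source path is a placeholder):

    BEGIN;
    SET LOCAL synchronous_commit = local;  -- make the commit wait for local WAL flush
    COPY tenk1 FROM '/path/to/tenk.data';  -- placeholder path
    COMMIT;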
Re: Extract epoch from Interval weird behavior
On Thu, Feb 24, 2022 at 4:47 AM Aleksander Alekseev wrote:
> Extracting an epoch from an interval is quite a strange case since intervals
> are not connected to any specific dates.

I agree, I think it's a weird use case and that it's probably not worth fixing. Though it was fun for me to try.

> All in all, I don't think that the benefit of the proposed change outweighs
> the fact that it will break the previous behavior for the users who may rely
> on it. I suggest keeping it simple, i.e. the way it is now. What I think we
> could do instead is explicitly document this behavior in [1].
>
> [1]: https://www.postgresql.org/docs/current/functions-datetime.html

I do want to briefly mention, if I'm understanding the history of EXTRACT correctly, that the previous behavior actually was to multiply by 365.25, not 365. However, the commit that changed the return type to numeric [1] changed that behavior. Looking through the discussions [2], I don't see any mention of it, which makes me think it was a mistake. However, there is a lot of discussion around numeric performance and being able to optimize numeric division because every divisor was a power of 10. Fixing this issue would break that assumption and cause some performance degradation, which probably isn't worth it.

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2da77cdb4661826482ebf2ddba1f953bc74afe4
[2]: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce1...@phystech.edu

- Joe Koshakow
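To make the suspected regression concrete, a sketch (outputs as I read the two implementations; treat them as illustrative):

    SELECT extract(epoch FROM interval '1 year');
    -- float8 era (<= v13): 31557600, i.e. 365.25 days/year
    -- numeric era (v14+):  31536000, because the (int64) cast of
    --                      DAYS_PER_YEAR truncates 365.25 to 365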
Re: convert libpq uri-regress tests to tap test
Andres Freund writes: > On 2022-02-24 13:31:40 +0100, Peter Eisentraut wrote: >> I think the TAP scripts should be in src/interfaces/libpq/t/, as usual. The >> supporting code snippets could live in some other directory under >> src/interfaces/libpq/, which might be called "test" or something else, not >> that important. > Why not in t/? We can't easily build the test programs in libpq/ itself, but > libpq/t should be fairly doable. I think that having t/ directories contain only Perl test scripts is a good convention that we should stick to. Peter's proposal of a separate test/ subdirectory for C test scaffolding is probably fine. regards, tom lane
Re: Frontend error logging style
Peter Eisentraut writes: > This patch already illustrates a couple of things that are wrong with > this approach: All of these objections are a bit moot with my followup proposal, no? > - It doesn't allow any other way of exiting. For example, in pg_dump, > you have removed a few exit_nicely() calls. It's not clear why that is > valid or whether it would always be valid for all call sites. As the patch stood, I'd hacked it so that pg_log_fatal called exit_nicely() not exit() within pg_dump. I agree that's not a great solution, but I think that the correct fix is to get rid of exit_nicely in favor of doing the requisite cleanup in an atexit hook. pg_dump is already at great hazard of somebody calling exit() not exit_nicely(), either through failure to pay attention to that undocumented convention, or because some code it imports from someplace like src/common calls exit() directly. You need not look any further than pg_malloc() to see that there are live problems there. I figured this could be addressed in a separate patch, though, because to the extent that missing the exit-nicely callbacks is a real bug, it's already a bug. > My suggestion is to just get rid of pg_log_fatal() and replace them all > with pg_log_error(). I'm on board with dropping the separate FATAL log level if there's consensus to do so; I think it adds more confusion than anything else. I still want to have something like pg_fatal(), though, because it's impossible to deny the usefulness of such an abbreviation. It's already been reinvented three or four times independently. Independently of either of those points, I still want to make the changes I proposed w.r.t. making explicit concepts of "detail" and "hint" addendums. What we have right now in the frontend code is an impossible mishmash of three or four ways of filling that lack, all inconsistent. regards, tom lane
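For concreteness, the abbreviation being defended amounts to something like this sketch (not a committed spelling):

    /* frontend convenience: report an error, then terminate the program */
    #define pg_fatal(...) \
        do { \
            pg_log_error(__VA_ARGS__); \
            exit(1); \
        } while (0)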
Re: Readd use of TAP subtests
Hi,

On 2022-02-24 11:16:03 +0100, Peter Eisentraut wrote:
> Now that we have switched everything to done_testing(), the subtests feature
> isn't that relevant anymore, but it might still be useful to get better
> output when running with PROVE_FLAGS=--verbose.

I incidentally played with subtests yesterday, when porting src/interfaces/libpq/test/regress.pl to a tap test. Unfortunately it seems that subtests aren't actually specified in the tap format, and different libraries generate different output formats.

The reason this matters somewhat is that meson's testrunner can parse tap and give nicer progress / error reports. But since subtests aren't in the spec it can't currently parse them... Open issue since 2015: https://github.com/TestAnything/Specification/issues/2

The perl ecosystem is so moribund :(.

> t/001_basic.pl ..
> # Subtest: vacuumlo --help
>     ok 1 - exit code 0
>     ok 2 - goes to stdout
>     ok 3 - nothing to stderr
>     1..3

It's clearly better. We can approximate some of it by using is_deeply() and comparing exit, stdout, and stderr at once, particularly for helpers like program_help() that are used in a lot of places.

Greetings,

Andres Freund
Re: Frontend error logging style
Hi,

On 2022-02-24 14:06:18 +0100, Peter Eisentraut wrote:
> My suggestion is to just get rid of pg_log_fatal() and replace them all with
> pg_log_error().

-1. This ignores the fact that several places have already come up with their own slightly different versions of fatal exit handlers. We don't gain anything by not standardizing on one notion of a fatal error wrapper.

Greetings,

Andres Freund
Re: convert libpq uri-regress tests to tap test
Hi, On 2022-02-24 13:31:40 +0100, Peter Eisentraut wrote: > I think the TAP scripts should be in src/interfaces/libpq/t/, as usual. The > supporting code snippets could live in some other directory under > src/interfaces/libpq/, which might be called "test" or something else, not > that important. Why not in t/? We can't easily build the test programs in libpq/ itself, but libpq/t should be fairly doable. Greetings, Andres Freund
Re: Design of pg_stat_subscription_workers vs pgstats
Hi, On 2022-02-24 13:23:55 +0100, Peter Eisentraut wrote: > On 24.02.22 12:46, Masahiko Sawada wrote: > > > We have a view called pg_stat_activity, which is very well known. From > > > that perspective, "activity" means what is happening right now or what > > > has happened most recently. The reworked view in this patch does not > > > contain that (we already have pg_stat_subscription for that), but it > > > contains accumulated counters. > > Right. > > > > What pg_stat_subscription shows is rather suitable for the name > > pg_stat_subscription_activity than the reworked view. But switching > > these names would also not be a good idea. I think it's better to use > > "subscription" in the view name since it shows actually statistics for > > subscriptions and subscription OID is the key. I personally prefer > > pg_stat_subscription_counters among the ideas that have been proposed > > so far, but I'd like to hear opinions and votes. > > _counters will fail if there is something not a counter (such as > last-timestamp-of-something). > > Earlier, pg_stat_subscription_stats was mentioned, which doesn't have that > problem. We really should try to fix this in a more general way at some point. We have way too many different things mixed up in pg_stat_*. I'd like to get something like the patch in soon though, we can still change the name later. I've been blocked behind this stuff for weeks, and it's getting really painful... Greetings, Andres Freund
Re: GiST index build missing smgrimmedsync()?
On 23/02/2022 23:30, Melanie Plageman wrote: I brought this up in [1] but wanted to start a dedicated thread. Since 16fa9b2b30a357 GiST indexes are not built in shared buffers. However, smgrimmedsync() is not done anywhere and skipFsync=true is always passed to smgrwrite() and smgrextend(). So, if a checkpoint starts and finishes after WAL-logging some of the index build pages and there is a crash sometime before the dirty pages make it to permanent storage, won't that data be lost? Yes, good catch! Seems like the following would address this: Committed essentially that, except that I put the smgrimmedsync in a separate if-block, and copied the comment from the similar piece of code from nbtsort.c to explain the issue. Thanks! - Heikki
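The committed fix is roughly of this shape (a sketch reconstructed from the description above, not the exact code):

    /*
     * The sorted build wrote pages with smgrwrite()/smgrextend() and
     * skipFsync = true, bypassing shared buffers, so a checkpoint during
     * the build does not cover those pages; fsync them ourselves before
     * the build is considered done.
     */
    if (RelationNeedsWAL(index))
        smgrimmedsync(RelationGetSmgr(index), MAIN_FORKNUM);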
Re: Add id's to various elements in protocol.sgml
Peter Eisentraut writes:
> On 18.12.21 00:53, Brar Piening wrote:
> > The purpose is that you can directly link to the id in the public html
> > docs which still gets generated (e. g.
> > https://www.postgresql.org/docs/14/protocol-replication.html#PROTOCOL-REPLICATION-BASE-BACKUP).
> >
> > Essentially it gives people discussing the protocol and pointing to a
> > certain command or message format the chance to link to the very thing
> > they are discussing instead of the top of the lengthy html page.
>
> Is there a way to obtain those URLs other than going into the HTML
> sources and checking if there is an anchor near where you want to go?

I use the jump-to-anchor extension: https://github.com/brettz9/jump-to-anchor/

Some sites have javascript that adds a link next to the element that becomes visible when hovering, e.g. the NAME and other headings on https://metacpan.org/pod/perl.

- ilmari
Re: Patch a potential memory leak in describeOneTableDetails()
> On 24 Feb 2022, at 07:32, Kyotaro Horiguchi wrote: > > At Thu, 24 Feb 2022 14:44:56 +0900 (JST), Kyotaro Horiguchi > wrote in >> At Tue, 22 Feb 2022 09:59:09 +0100, Daniel Gustafsson >> wrote in >>> The section in question was added to our docs 22 years ago, to make it >>> reflect >>> the current operating systems in use maybe just not mentioning more(1) is >>> the >>> easiest?: >>> >>>"The text browsing tool less can be invoked as less -x4 to make it show >>>tabs appropriately." >>> >>> Or perhaps remove that section altogether? >> >> I think the former is better. > > So the attached does that. I think this is reasonable, since it's pretty clear that more(1) supporting "-x" is fairly rare. I'll await others commenting. -- Daniel Gustafsson https://vmware.com/
create_index test fails when synchronous_commit = off @ master
Hi hackers,

I noticed that the create_index test (make installcheck) fails on my laptop because different query plans are chosen:

-                      QUERY PLAN
-------------------------------------------------------
- Index Only Scan using tenk1_unique1 on tenk1
-   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+                           QUERY PLAN
+---------------------------------------------------------------
+ Sort
+   Sort Key: unique1
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+         ->  Bitmap Index Scan on tenk1_unique1
+               Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)

...

-                      QUERY PLAN
-------------------------------------------------------
- Index Only Scan using tenk1_thous_tenthous on tenk1
-   Index Cond: (thousand < 2)
-   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+                              QUERY PLAN
+--------------------------------------------------------------------------
+ Sort
+   Sort Key: thousand
+   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
+(4 rows)

Investigation showed that it happens only with `synchronous_commit = off`. Only the master branch is affected, starting from cc50080a82. This is a debug build. I'm running MacOS Monterey 12.2.1 @ x64.

I didn't investigate further. Do we assume that `make installcheck` is supposed to pass with different postgresql.conf options?

--
Best regards,
Aleksander Alekseev
Re: PATCH: add "--config-file=" option to pg_rewind
> On 24 Feb 2022, at 14:43, Gunnar Nick Bluth wrote:
>
> That looks just as good to me, and it already has tests, so I'll happily step
> down!

Actually, I think this looks like a saner approach. Putting a config setting in two places (postgresql.conf and on the command line for pg_rewind) is a recipe for them diverging.

--
Daniel Gustafsson
https://vmware.com/
Re: Typo in pgbench messages.
> On 24 Feb 2022, at 13:58, Fabien COELHO wrote: >> One argument against a backpatch is that this would be disruptive with >> tools that parse and analyze the output generated by pgbench. Fabien, >> don't you have some tools and/or wrappers doing exactly that? > > Yep, I like to "| cut -d' ' -fx" and other "line.split(' ')" or whatever. > > I think that the break of typographical rules is intentional to allow such > simplistic low-level stream handling through pipes or scripts. I'd prefer > that the format is not changed. Maybe a comment could be added to explain the > reason behind it. That doesn't sound like an overwhelmingly convincing argument to print some messages with 'X %' and others with 'X%'. -- Daniel Gustafsson https://vmware.com/
Re: PATCH: add "--config-file=" option to pg_rewind
Am 24.02.2022 um 14:08 schrieb Daniel Gustafsson: On 24 Feb 2022, at 10:27, Alexander Kukushkin wrote: Hello Gunnar, On Thu, 24 Feb 2022 at 10:03, Aleksander Alekseev wrote: wants to use the "-c" option on a typical Debian/Ubuntu installation (where the config resides below /etc/postgresql/), pg_rewind needs a way to be told where the postgresql.conf actually is. The attached patch implements this as an option. [...] Good catch! Yeah, it is a known problem, and there was already another patch trying to address it [1]. There is yet another related patch which provides a parameter to pass in restore_command in cases where postgresq.conf isn't reachable: https://commitfest.postgresql.org/37/3213/ That looks just as good to me, and it already has tests, so I'll happily step down! (Note to myself: check the CF first next time ;-) -- Gunnar "Nick" Bluth Eimermacherweg 106 D-48159 Münster Mobil +49 172 8853339 Email: gunnar.bl...@pro-open.de __ "Ceterum censeo SystemD esse delendam" - Cato
Re: Readd use of TAP subtests
> On 24 Feb 2022, at 11:16, Peter Eisentraut wrote:
>
> I think for deeply nested tests and test routines defined in other modules,
> this helps structure the test output more like the test source code is laid
> out, so it makes following the tests and locating failing test code easier.

I don't have any strong opinions on this, but if we go ahead with it I think there should be a short note in src/test/perl/README about when subtests could be considered.

--
Daniel Gustafsson
https://vmware.com/
Re: PATCH: add "--config-file=" option to pg_rewind
> On 24 Feb 2022, at 10:27, Alexander Kukushkin wrote: > > Hello Gunnar, > > On Thu, 24 Feb 2022 at 10:03, Aleksander Alekseev > wrote: > > > wants to use the "-c" option on a typical Debian/Ubuntu installation > > (where the config resides below /etc/postgresql/), pg_rewind needs a way > > to be told where the postgresql.conf actually is. > > > > The attached patch implements this as an option. > > > > [...] > > Good catch! > > Yeah, it is a known problem, and there was already another patch trying to > address it [1]. There is yet another related patch which provides a parameter to pass in restore_command in cases where postgresq.conf isn't reachable: https://commitfest.postgresql.org/37/3213/ -- Daniel Gustafsson https://vmware.com/
Re: Frontend error logging style
On 23.02.22 00:23, Tom Lane wrote: This conversation seems to have tailed off without full resolution, but I observe that pretty much everyone except Peter is on board with defining pg_log_fatal as pg_log_error + exit(1). So I think we should just do that, unless Peter wants to produce a finished alternative proposal. I've now gone through and fleshed out the patch I sketched upthread.

This patch already illustrates a couple of things that are wrong with this approach:

- It doesn't allow any other way of exiting. For example, in pg_dump, you have removed a few exit_nicely() calls. It's not clear why that is valid or whether it would always be valid for all call sites.

- It doesn't allow other exit codes. For example, in psql, you have changed a pg_log_fatal() call to pg_log_error() because it needed another exit code.

This slides us right back into that annoying situation where in the backend we have to log error messages using elog(LOG) because the flow control is tangled up with the log level. My suggestion is to just get rid of pg_log_fatal() and replace them all with pg_log_error().
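For reference, a minimal sketch of the definition being proposed (an assumed shape based on this discussion, not the committed code):

    /* Hypothetical sketch: report at error level, then exit(1). */
    #include <stdlib.h>          /* exit() */
    #include "common/logging.h"  /* pg_log_error() */

    #define pg_log_fatal(...) \
        do { \
            pg_log_error(__VA_ARGS__); \
            exit(1); \
        } while (0)

Both objections above fall directly out of this shape: the macro hard-codes the exit path as well as the exit code.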
Re: Optionally automatically disable logical replication subscriptions on error
On Thu, Feb 24, 2022 at 8:08 PM Amit Kapila wrote: > > On Thu, Feb 24, 2022 at 1:20 PM Masahiko Sawada wrote: > > > > Here are some comments: > > > > Why do we need SyncTableStartWrapper() and ApplyLoopWrapper()? > > > > I have given this comment to move the related code to separate > functions to slightly simplify ApplyWorkerMain() code but if you don't > like we can move it back. I am not sure I like the new function names > in the patch though. Okay, I'm fine with moving this code but perhaps we can find a better function name as "Wrapper" seems slightly odd to me. For example, start_table_sync_start() and start_apply_changes() or something (it seems we use the snake case for static functions in worker.c). Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Typo in pgbench messages.
Bonjour Michaël, I think it's better to back-patch this to stable branches if there's no objection. Thoughts? That's only cosmetic, so I would just bother about HEAD if I were to change something like that (I would not bother at all, personally). One argument against a backpatch is that this would be disruptive with tools that parse and analyze the output generated by pgbench. Fabien, don't you have some tools and/or wrappers doing exactly that? Yep, I like to "| cut -d' ' -fx" and other "line.split(' ')" or whatever. I think that the break of typographical rules is intentional to allow such simplistic low-level stream handling through pipes or scripts. I'd prefer that the format is not changed. Maybe a comment could be added to explain the reason behind it. -- Fabien.
Buffer Manager and Contention
Thinking about poor performance in the case where the data fits in RAM, but the working set is too big for shared_buffers, I notice a couple of things that seem bad in BufMgr, but don't understand why they are like that.

1. If we need to allocate a buffer to a new block we do this in one step, while holding both partition locks for the old and the new tag. Surely it would cause less contention to make the old block/tag invalid (after flushing), drop the old partition lock and then switch to the new one? i.e. just hold one mapping partition lock at a time (a sketch of this sequence is below). Is there a specific reason we do it this way?

2. Possibly connected to the above, we issue BufTableInsert() BEFORE we issue BufTableDelete(). That means we need extra entries in the buffer mapping hash table to allow us to hold both the old and the new at the same time, for a short period. The way dynahash.c works, we try to allocate an entry from the freelist and if that doesn't work, we begin searching ALL the freelists for free entries to steal. So if we get enough people trying to do virtual I/O at the same time, then we will hit a "freelist storm" where everybody is chasing the last few entries. It would make more sense if we could do BufTableDelete() first, then hold onto the buffer mapping entry rather than add it to the freelist, so we can use it again when we do BufTableInsert() very shortly afterwards. That way we wouldn't need to search the freelist at all. What is the benefit or reason of doing the Delete after the Insert?

Put that another way, it looks like BufTable functions are used in a way that mismatches against the way dynahash is designed. Thoughts?

-- Simon Riggs http://www.EnterpriseDB.com/
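To make point 1 concrete, here is a simplified sketch of the proposed one-lock-at-a-time sequence. This is not the current BufferAlloc() code; pins, buffer headers, flushing, and error handling are all omitted:

    /* Sketch of the proposal: invalidate the old mapping first,
     * holding only one buffer mapping partition lock at a time. */
    uint32   oldHash = BufTableHashCode(&oldTag);
    LWLock  *oldLock = BufMappingPartitionLock(oldHash);

    LWLockAcquire(oldLock, LW_EXCLUSIVE);
    BufTableDelete(&oldTag, oldHash);   /* old tag becomes invalid */
    LWLockRelease(oldLock);             /* drop before taking the new lock */

    uint32   newHash = BufTableHashCode(&newTag);
    LWLock  *newLock = BufMappingPartitionLock(newHash);

    LWLockAcquire(newLock, LW_EXCLUSIVE);
    (void) BufTableInsert(&newTag, newHash, victim_buf_id);
    LWLockRelease(newLock);

If, as point 2 suggests, BufTableDelete() also handed its hash entry back for reuse by the following BufTableInsert() instead of returning it to a freelist, the freelist search would be avoided entirely.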
Re: Design of pg_stat_subscription_workers vs pgstats
On Thu, Feb 24, 2022 at 9:23 PM Peter Eisentraut wrote: > > > On 24.02.22 12:46, Masahiko Sawada wrote: > >> We have a view called pg_stat_activity, which is very well known. From > >> that perspective, "activity" means what is happening right now or what > >> has happened most recently. The reworked view in this patch does not > >> contain that (we already have pg_stat_subscription for that), but it > >> contains accumulated counters. > > Right. > > > > What pg_stat_subscription shows is more suitable for the name > > pg_stat_subscription_activity than the reworked view. But switching > > these names would also not be a good idea. I think it's better to use > > "subscription" in the view name since it shows actual statistics for > > subscriptions and subscription OID is the key. I personally prefer > > pg_stat_subscription_counters among the ideas that have been proposed > > so far, but I'd like to hear opinions and votes. > > _counters will fail if there is something not a counter (such as > last-timestamp-of-something). > > Earlier, pg_stat_subscription_stats was mentioned, which doesn't have > that problem. Ah, I had misunderstood your comment. Right, _counters could be a blocker for future changes. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Proposal: Support custom authentication methods using hooks
On 2/24/22 04:16, Aleksander Alekseev wrote: > Hi Samay, > >> I wanted to submit a patch to expose 2 new hooks (one for the authentication >> check and another one for error reporting) in auth.c. These will allow users >> to implement their own authentication methods for Postgres or add custom >> logic around authentication. > I like the idea - PostgreSQL is all about extendability. Also, well > done with TAP tests and an example extension. This being said, I > didn't look at the code yet, but cfbot seems to be happy with it: > http://cfbot.cputube.org/ > >> One constraint in the current implementation is that we allow only one >> authentication provider to be loaded at a time. In the future, we can add >> more functionality to maintain an array of hooks and call the appropriate >> one based on the provider name in the pg_hba line. > This sounds like a pretty severe and unnecessary limitation to me. Do > you think it would be difficult to bypass it in the first > implementation? Yeah, I think we would want a set of providers that could be looked up at runtime. I think this is likely to be material for release 16, so there's plenty of time to get it right. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
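To make the idea concrete, the two hooks might look roughly like the following. The names and signatures here are illustrative assumptions, not the actual API from the patch:

    /* Hypothetical hook types for auth.c; illustrative only. */
    typedef int  (*CustomAuthenticationCheck_hook_type) (Port *port);
    typedef void (*CustomAuthenticationError_hook_type) (Port *port, int status);

    extern CustomAuthenticationCheck_hook_type CustomAuthenticationCheck_hook;
    extern CustomAuthenticationError_hook_type CustomAuthenticationError_hook;

    /* A provider extension would install its callbacks in _PG_init(). */
    void
    _PG_init(void)
    {
        CustomAuthenticationCheck_hook = my_auth_check;   /* hypothetical */
        CustomAuthenticationError_hook = my_auth_error;   /* hypothetical */
    }

Supporting multiple providers at once would then amount to replacing these single pointers with a registry keyed by the provider name given in the pg_hba.conf line.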
Re: convert libpq uri-regress tests to tap test
On 24.02.22 02:52, Tom Lane wrote: Peter Eisentraut writes: On 23.02.22 23:58, Tom Lane wrote: Peter Eisentraut writes: libpq TAP tests should be in src/interfaces/libpq/t/. That's failing to account for the fact that a libpq test can't really be a pure-perl TAP test; you need some C code to drive the library. Such things could be put under src/interfaces/libpq/test, or some other subdirectory. We already have src/interfaces/ecpg/test. OK, but then the TAP scripts are under src/interfaces/libpq/test/t, which isn't what you said. I have no great objection to moving src/test/modules/libpq_pipeline/ to src/interfaces/libpq/test/, though, as long as the buildfarm will cope. I think the TAP scripts should be in src/interfaces/libpq/t/, as usual. The supporting code snippets could live in some other directory under src/interfaces/libpq/, which might be called "test" or something else, not that important. I think we should pick a layout that is proper and future-proof and then adjust the buildfarm client as necessary. The issue of writing libpq-specific tests has come up a few times recently; I think it would be worth finding a proper solution to this that would facilitate that work in the future.
Re: Design of pg_stat_subscription_workers vs pgstats
On 24.02.22 12:46, Masahiko Sawada wrote: We have a view called pg_stat_activity, which is very well known. From that perspective, "activity" means what is happening right now or what has happened most recently. The reworked view in this patch does not contain that (we already have pg_stat_subscription for that), but it contains accumulated counters. Right. What pg_stat_subscription shows is more suitable for the name pg_stat_subscription_activity than the reworked view. But switching these names would also not be a good idea. I think it's better to use "subscription" in the view name since it shows actual statistics for subscriptions and subscription OID is the key. I personally prefer pg_stat_subscription_counters among the ideas that have been proposed so far, but I'd like to hear opinions and votes. _counters will fail if there is something not a counter (such as last-timestamp-of-something). Earlier, pg_stat_subscription_stats was mentioned, which doesn't have that problem.
Re: convert libpq uri-regress tests to tap test
On 2/23/22 20:52, Tom Lane wrote: > Peter Eisentraut writes: >> On 23.02.22 23:58, Tom Lane wrote: >>> Peter Eisentraut writes: libpq TAP tests should be in src/interfaces/libpq/t/. >>> That's failing to account for the fact that a libpq test can't >>> really be a pure-perl TAP test; you need some C code to drive the >>> library. >> Such things could be put under src/interfaces/libpq/test, or some other >> subdirectory. We already have src/interfaces/ecpg/test. > OK, but then the TAP scripts are under src/interfaces/libpq/test/t, > which isn't what you said. I have no great objection to moving > src/test/modules/libpq_pipeline/ to src/interfaces/libpq/test/, > though, as long as the buildfarm will cope. > > It won't without some adjustment. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Add id's to various elements in protocol.sgml
On 18.12.21 00:53, Brar Piening wrote: The purpose is that you can directly link to the id in the public HTML docs, which still get generated (e.g. https://www.postgresql.org/docs/14/protocol-replication.html#PROTOCOL-REPLICATION-BASE-BACKUP). Essentially it gives people discussing the protocol and pointing to a certain command or message format the chance to link to the very thing they are discussing instead of to the top of the lengthy HTML page. Is there a way to obtain those URLs other than going into the HTML sources and checking if there is an anchor near where you want to go?
Re: Design of pg_stat_subscription_workers vs pgstats
On Thu, Feb 24, 2022 at 9:05 PM Amit Kapila wrote: > > On Thu, Feb 24, 2022 at 2:24 PM Peter Smith wrote: > > > > 10. src/backend/replication/logical/worker.c > > > > (from my previous [Peter-v1] #9) > > > > >> + /* Report the worker failed during table synchronization */ > > >> + pgstat_report_subscription_error(MyLogicalRepWorker->subid, false); > > >> > > >> and > > >> > > >> + /* Report the worker failed during the application of the change */ > > >> + pgstat_report_subscription_error(MyLogicalRepWorker->subid, true); > > >> > > >> > > >> Why don't these use MySubscription->oid instead of > > >> MyLogicalRepWorker->subid? > > > > > It's just because we used to use MyLogicalRepWorker->subid; is there > > > any particular reason why we should use MySubscription->oid here? > > > > I felt MySubscription->oid is a more natural and more direct way of > > expressing the same thing. > > > > Consider: "the oid of the current subscription" versus "the oid of > > the subscription of the current worker". IMO the first one is simpler. > > YMMV. > > > > I think we can use either, but maybe MySubscription->oid would be > slightly better here as the same is used in nearby code as well. Okay, will change. > > > > 13. src/test/subscription/t/026_worker_stats.pl - missing test? > > > > Shouldn't there also be some test to reset the counters to confirm > > that they really do get reset to zero? > > > > ~~~ > > > > I think we avoid writing tests for stats for each and every case as they > are not reliable in nature (the message can be lost). If we can find a > reliable way then it is okay. Yeah, the messages can even arrive out of order. In particular, since the apply worker and table sync worker in this test keep reporting messages, it's quite possible that the test becomes unstable. I remember we removed unstable tests of resetting statistics before (e.g., see fc6950913). Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/