Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Fri, 25 Feb 2022 16:47:01 +0900 (JST), Kyotaro Horiguchi wrote in
> So, these are the patches for pg12-10. 11 can share the same patch with
> 12. 10 has differences in two points.
>
> 10 has ControlFile->prevCheckPoint.
>
> The DETAILS of the "recovery restart point at" message is not
> capitalized. But I suppose it is so close to EOL that we don't want to
> "fix" it risking existing use cases.

Ugh! Wait a moment. Something's wrong.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Fri, 25 Feb 2022 16:06:31 +0900 (JST), Kyotaro Horiguchi wrote in
> At Fri, 25 Feb 2022 15:31:12 +0900 (JST), Kyotaro Horiguchi wrote in
> > While making patches for v12, I see a test failure of pg_rewind for
> > an uncertain reason. I'm investigating that, but I post this for
> > discussion.
>
> Hmm. Too stupid. Somehow I overly removed the ratchet condition for
> minRecoveryPoint. So the same patch worked for v12.

So, these are the patches for pg12-10. 11 can share the same patch with 12.
10 has differences in two points.

10 has ControlFile->prevCheckPoint.

The DETAILS of the "recovery restart point at" message is not capitalized.
But I suppose it is so close to EOL that we don't want to "fix" it risking
existing use cases.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From c89e2b509723b68897f2af49a154af2a69f0747b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 25 Feb 2022 15:04:00 +0900
Subject: [PATCH v3] Correctly update control file at the end of archive
 recovery

CreateRestartPoint runs WAL file cleanup based on the checkpoint that has
just finished in the function. If the database has exited the
DB_IN_ARCHIVE_RECOVERY state by the time the function is about to update
the control file, the function refrains from updating the file at all, then
proceeds to WAL cleanup using the latest REDO LSN, which is now
inconsistent with the control file. As a result, the succeeding cleanup
procedure removes WAL files that are still needed according to the control
file and leaves the database unrecoverable until the next checkpoint
finishes.

Along with that fix, we remove a dead code path for the case where some
other process ran a simultaneous checkpoint. It seems like just a
preventive measure, but it's no longer useful because we are sure that
checkpoints are performed only by the checkpointer except in single-process
mode.
---
 src/backend/access/transam/xlog.c | 71 +++
 1 file changed, 44 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 885558f291..2b2568c475 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9334,7 +9334,7 @@ CreateRestartPoint(int flags)
 	/* Also update the info_lck-protected copy */
 	SpinLockAcquire(&XLogCtl->info_lck);
-	XLogCtl->RedoRecPtr = lastCheckPoint.redo;
+	XLogCtl->RedoRecPtr = RedoRecPtr;
 	SpinLockRelease(&XLogCtl->info_lck);

 	/*
@@ -9350,7 +9350,10 @@
 	if (log_checkpoints)
 		LogCheckpointStart(flags, true);

-	CheckPointGuts(lastCheckPoint.redo, flags);
+	CheckPointGuts(RedoRecPtr, flags);
+
+	/* Update pg_control */
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);

 	/*
 	 * Remember the prior checkpoint's redo ptr for
@@ -9358,31 +9361,28 @@
 	 */
 	PriorRedoPtr = ControlFile->checkPointCopy.redo;

+	Assert (PriorRedoPtr < RedoRecPtr);
+
+	ControlFile->checkPoint = lastCheckPointRecPtr;
+	ControlFile->checkPointCopy = lastCheckPoint;
+
+	/* Update control file using current time */
+	ControlFile->time = (pg_time_t) time(NULL);
+
 	/*
-	 * Update pg_control, using current time. Check that it still shows
-	 * IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
-	 * this is a quick hack to make sure nothing really bad happens if somehow
-	 * we get here after the end-of-recovery checkpoint.
+	 * Ensure minRecoveryPoint is past the checkpoint record while archive
+	 * recovery is still ongoing. Normally, this will have happened already
+	 * while writing out dirty buffers, but not necessarily - e.g. because no
+	 * buffers were dirtied. We do this because a non-exclusive base backup
+	 * uses minRecoveryPoint to determine which WAL files must be included in
+	 * the backup, and the file (or files) containing the checkpoint record
+	 * must be included, at a minimum. Note that for an ordinary restart of
+	 * recovery there's no value in having the minimum recovery point any
+	 * earlier than this anyway, because redo will begin just after the
+	 * checkpoint record.
 	 */
-	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
-	if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
-		ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
+	if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
 	{
-		ControlFile->checkPoint = lastCheckPointRecPtr;
-		ControlFile->checkPointCopy = lastCheckPoint;
-		ControlFile->time = (pg_time_t) time(NULL);
-
-		/*
-		 * Ensure minRecoveryPoint is past the checkpoint record. Normally,
-		 * this will have happened already while writing out dirty buffers,
-		 * but not necessarily - e.g. because
Re: [Proposal] Global temporary tables
Hi,

This is a huge thread. Realistically reviewers and committers can't reread
it. I think there needs to be more of a description of how this works
included in the patchset, and *why* it works that way. The README does a
bit of that, but not particularly well.

On 2022-02-25 14:26:47 +0800, Wenjing Zeng wrote:
> +++ b/README.gtt.txt
> @@ -0,0 +1,172 @@
> +Global Temporary Table(GTT)
> +=
> +
> +Feature description
> +-
> +
> +Previously, temporary tables are defined once and automatically
> +exist (starting with empty contents) in every session before using them.

I think for a README "previously" etc. isn't good language - if it were
committed, it'd not be understandable anymore. It makes more sense for
commit messages etc.

> +Main design ideas
> +-
> +In general, GTT and LTT use the same storage and buffer design and
> +implementation. The storage files for both types of temporary tables are
> +named as t_backendid_relfilenode, and the local buffer is used to cache
> +the data.

What does "named as t_backendid_relfilenode" mean?

> +The schema of GTTs is shared among sessions while their data are not. We
> +build a new mechanisms to manage those non-shared data and their
> +statistics. Here is the summary of changes:
> +
> +1) CATALOG
> +GTTs store session-specific data. The storage information of GTTs' data,
> +their transaction information, and their statistics are not stored in
> +the catalog.
> +
> +2) STORAGE INFO & STATISTICS INFO & TRANSACTION INFO
> +In order to maintain durability and availability of GTTs'
> +session-specific data, their storage information, statistics, and
> +transaction information is managed in a local hash table
> +tt_storage_local_hash.

"maintain durability"? Durable across what? In the context of databases
it's typically about crash safety, but that can't be the case here.

> +3) DDL
> +Currently, GTT supports almost all table DDL except CLUSTER/VACUUM FULL.
> +Part of the DDL behavior is limited by shared definitions and multiple
> +copies of local data, and we added some structures to handle this.
> +A shared hash table active_gtt_shared_hash is added to track the state
> +of the GTT in a different session. This information is recorded in the
> +hash table during the DDL execution of the GTT.
> +The data stored in a GTT can only be modified or accessed by the owning
> +session. The statements that only modify data in a GTT do not need a
> +high level of table locking. The operations making those changes include
> +truncate GTT, reindex GTT, and lock GTT.

I think you need to introduce a bit more terminology for any of this to
make sense. Sometimes GTT means the global catalog entity; sometimes, like
here, it appears to mean the session-specific contents of a GTT.

What state of a GTT in another session?

How do GTTs handle something like BEGIN; TRUNCATE some_gtt_table; ROLLBACK;?

> +1.2 on commit clause
> +LTT's status associated with on commit DELETE ROWS and on commit
> +PRESERVE ROWS is not stored in catalog. Instead, GTTs need a bool value
> +on_commit_delete_rows in reloptions which is shared among sessions.

Why?

> +2.3 statistics info
> +1) relpages reltuples relallvisible relfilenode

?

> +3 DDL
> +3.1. active_gtt_shared_hash
> +This is the hash table created in shared memory to trace the GTT files
> +initialized in each session. Each hash entry contains a bitmap that
> +records the backendid of the initialized GTT file. With this hash table,
> +we know which backend/session is using this GTT. Such information is
> +used during GTT's DDL operations.

So there's a separate locking protocol for GTTs that doesn't use the normal
locking infrastructure? Why?

> +3.7 CLUSTER GTT/VACUUM FULL GTT
> +The current version does not support.

Why?

> +4 MVCC commit log(clog) cleanup
> +
> +The GTT storage file contains transaction information. Queries for GTT
> +data rely on transaction information such as clog. The transaction
> +information required by each session may be completely different.

Why is transaction information different between sessions? Or does this
just mean that different transaction ids will be accessed?

0003-gtt-v67-implementation.patch
 71 files changed, 3167 insertions(+), 195 deletions(-)

This needs to be broken into smaller chunks to be reviewable.

> @@ -677,6 +678,14 @@ _bt_getrootheight(Relation rel)
>  {
>  	Buffer		metabuf;
>
> +	/*
> +	 * If a global temporary table storage file is not initialized in the
> +	 * this session, its index does not have a root page, just returns 0.
> +	 */
> +	if (RELATION_IS_GLOBAL_TEMP(rel) &&
> +		!gtt_storage_attached(RelationGetRelid(rel)))
> +		return 0;
> +
> 	metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
>
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Hi,

On 2022-02-24 20:53:08 -0800, Peter Geoghegan wrote:
> 0002 makes page-level freezing a first class thing.
> heap_prepare_freeze_tuple now has some (limited) knowledge of how this
> works. heap_prepare_freeze_tuple's cutoff_xid argument is now always
> the VACUUM caller's OldestXmin (not its FreezeLimit, as before). We
> still have to pass FreezeLimit to heap_prepare_freeze_tuple, which
> helps us to respect FreezeLimit as a backstop, and so now it's passed
> via the new backstop_cutoff_xid argument instead.

I am not a fan of the backstop terminology. It's still the reason we need
to do freezing for correctness reasons. It'd make more sense to me to turn
it around and call the "non-backstop" freezing opportunistic freezing or
such.

> Whenever we opt to "freeze a page", the new page-level algorithm
> *always* uses the most recent possible XID and MXID values (OldestXmin
> and oldestMxact) to decide what XIDs/XMIDs need to be replaced. That
> might sound like it'd be too much, but it only applies to those pages
> that we actually decide to freeze (since page-level characteristics
> drive everything now). FreezeLimit is only one way of triggering that
> now (and one of the least interesting and rarest).

That largely makes sense to me and doesn't seem weird.

I'm a tad concerned about replacing mxids that have some members that are
older than OldestXmin but not older than FreezeLimit. It's not too hard to
imagine that accelerating mxid consumption considerably. But we can
probably, if not already done, special case that.

> It seems that heap_prepare_freeze_tuple allocates new MXIDs (when
> freezing old ones) in large part so it can NOT freeze XIDs that it
> would have been useful (and much cheaper) to remove anyway.

Well, we may have to allocate a new mxid because some members are older
than FreezeLimit but others are still running. When do we not remove xids
that would have been cheaper to remove once we decide to actually do work?

> On HEAD, FreezeMultiXactId() doesn't get passed down the VACUUM
> operation's OldestXmin at all (it actually just gets FreezeLimit passed
> as its cutoff_xid argument). It cannot possibly recognize any of this
> for itself.

It does recognize something like OldestXmin in a more precise and expensive
way - MultiXactIdIsRunning() and TransactionIdIsCurrentTransactionId().

> Does that theory about MultiXacts sound plausible? I'm not claiming
> that the patch makes it impossible that FreezeMultiXactId() will have
> to allocate a new MultiXact to freeze during VACUUM -- the
> freeze-the-dead isolation tests already show that that's not true. I
> just think that page-level freezing based on page characteristics with
> oldestXmin and oldestMxact (not FreezeLimit and MultiXactCutoff)
> cutoffs might make it a lot less likely in practice.

Hm. I guess I'll have to look at the code for it. It doesn't immediately
"feel" quite right.

> oldestXmin and oldestMxact map to the same wall clock time, more or less
> -- that seems like it might be an important distinction, independent of
> everything else.

Hm. Multis can be kept alive by fairly "young" member xids. So it may not
be removable (without creating a newer multi) until much later than its
creation time. So I don't think that's really true.

> From 483bc8df203f9df058fcb53e7972e3912e223b30 Mon Sep 17 00:00:00 2001
> From: Peter Geoghegan
> Date: Mon, 22 Nov 2021 10:02:30 -0800
> Subject: [PATCH v9 1/4] Loosen coupling between relfrozenxid and freezing.
>
> When VACUUM set relfrozenxid before now, it set it to whatever value was
> used to determine which tuples to freeze -- the FreezeLimit cutoff.
> This approach was very naive: the relfrozenxid invariant only requires
> that new relfrozenxid values be <= the oldest extant XID remaining in
> the table (at the point that the VACUUM operation ends), which in
> general might be much more recent than FreezeLimit. There is no fixed
> relationship between the amount of physical work performed by VACUUM to
> make it safe to advance relfrozenxid (freezing and pruning), and the
> actual number of XIDs that relfrozenxid can be advanced by (at least in
> principle) as a result. VACUUM might have to freeze all of the tuples
> from a hundred million heap pages just to enable relfrozenxid to be
> advanced by no more than one or two XIDs. On the other hand, VACUUM
> might end up doing little or no work, and yet still be capable of
> advancing relfrozenxid by hundreds of millions of XIDs as a result.
>
> VACUUM now sets relfrozenxid (and relminmxid) using the exact oldest
> extant XID (and oldest extant MultiXactId) from the table, including
> XIDs from the table's remaining/unfrozen MultiXacts. This requires that
> VACUUM carefully track the oldest unfrozen XID/MultiXactId as it goes.
> This optimization doesn't require any changes to the definition of
> relfrozenxid, nor does it require changes to the core design of
> freezing.
>
> Final relfrozenxid values must still be >=
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Fri, 25 Feb 2022 15:31:12 +0900 (JST), Kyotaro Horiguchi wrote in
> While making patches for v12, I see a test failure of pg_rewind for
> an uncertain reason. I'm investigating that, but I post this for
> discussion.

Hmm. Too stupid. Somehow I overly removed the ratchet condition for
minRecoveryPoint. So the same patch worked for v12.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Hi,

On Fri, Feb 25, 2022 at 12:23:27AM +0530, Nitin Jadhav wrote:
> > I think the change to ImmediateCheckpointRequested() makes no sense.
> > Before this patch, that function merely inquires whether there's an
> > immediate checkpoint queued. After this patch, it ... changes a
> > progress-reporting flag? I think it would make more sense to make the
> > progress-report flag change in whatever is the place that *requests* an
> > immediate checkpoint rather than here.
>
> > > diff --git a/src/backend/postmaster/checkpointer.c
> > > b/src/backend/postmaster/checkpointer.c
> > > +ImmediateCheckpointRequested(int flags)
> > > if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE)
> > > +{
> > > +updated_flags |= CHECKPOINT_IMMEDIATE;
> >
> > I don't think that these changes are expected behaviour. Under this
> > condition, the currently running checkpoint is still not 'immediate',
> > but it is going to hurry up for a new, actually immediate checkpoint.
> > Those are different kinds of checkpoint handling; and I don't think
> > you should modify the reported flags to show that we're going to do
> > stuff faster than usual. Maybe maintain a separate 'upcoming
> > checkpoint flags' field instead?
>
> Thank you Alvaro and Matthias for your views. I understand your point
> of not updating the progress-report flag here, as it just checks
> whether CHECKPOINT_IMMEDIATE is set or not and takes an action based on
> that, but it doesn't change the checkpoint flags. I will modify the
> code, but I am a bit confused here. As per Alvaro, we need to make the
> progress-report flag change in whatever is the place that *requests* an
> immediate checkpoint. I feel this gives information about the upcoming
> checkpoint, not the current one. So updating here provides wrong
> details in the view. The flags available during CreateCheckPoint() will
> remain the same for the entire checkpoint operation and we should show
> the same information in the view till it completes.

I'm not sure what Matthias meant, but as far as I know there's no
fundamental difference between a checkpoint with and without the
CHECKPOINT_IMMEDIATE flag, and there's also no scheduling for multiple
checkpoints.

Yes, the flags will remain the same, but checkpoint.c will test both the
passed flags and the shmem flags to see whether a delay should be added or
not, which is the only difference in checkpoint processing for this flag.
See the call to ImmediateCheckpointRequested() which will look at the value
in shmem:

	/*
	 * Perform the usual duties and take a nap, unless we're behind schedule,
	 * in which case we just try to catch up as quickly as possible.
	 */
	if (!(flags & CHECKPOINT_IMMEDIATE) &&
		!ShutdownRequestPending &&
		!ImmediateCheckpointRequested() &&
		IsCheckpointOnSchedule(progress))
[...]
Re: Typo in pgbench messages.
On Thu, 24 Feb 2022 14:44:05 +0100, Daniel Gustafsson wrote:
> > On 24 Feb 2022, at 13:58, Fabien COELHO wrote:
>
> >> One argument against a backpatch is that this would be disruptive with
> >> tools that parse and analyze the output generated by pgbench. Fabien,
> >> don't you have some tools and/or wrappers doing exactly that?
> >
> > Yep, I like to "| cut -d' ' -fx" and other "line.split(' ')" or
> > whatever.
> >
> > I think that the break of typographical rules is intentional to allow
> > such simplistic low-level stream handling through pipes or scripts. I'd
> > prefer that the format is not changed. Maybe a comment could be added
> > to explain the reason behind it.
>
> That doesn't sound like an overwhelmingly convincing argument to print
> some messages with 'X %' and others with 'X%'.

I agree with Daniel. If inserting a space in front of % was intentional for
handling the result in such tools, we should fix other places where 'X%' is
used in the pgbench output.

Regards,
Yugo Nagata

--
Yugo NAGATA
Re: [PATCH] add relation and block-level filtering to pg_waldump
On Fri, 25 Feb 2022 at 03:02, David Christensen wrote:
> Greetings,
>
> This patch adds the ability to specify a RelFileNode and optional
> BlockNum to limit output of pg_waldump records to only those which match
> the given criteria. This should be more performant than
> `pg_waldump | grep` as well as more reliable given specific variations
> in output style depending on how the blocks are specified.
>
> This currently affects only the main fork, but we could presumably add
> the option to filter by fork as well, if that is considered useful.

Cool. I think we can report an error instead of reading WAL files if the
given tablespace, database, or relation is invalid. Is there any WAL record
that has an invalid tablespace, database, or relation OID?

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
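For what it's worth, the core of such a filter could be quite small. A
rough sketch of the matching predicate (the helper name is made up here,
and the exact reader fields are an assumption; the patch itself may well
differ):

    /* Does any block reference of this record touch the given relation/block? */
    static bool
    record_matches_relation_block(XLogReaderState *record,
                                  RelFileNode target, BlockNumber target_blk)
    {
        for (int block_id = 0; block_id <= record->max_block_id; block_id++)
        {
            RelFileNode rnode;
            ForkNumber  forknum;
            BlockNumber blkno;

            if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno))
                continue;

            /* main fork only, matching relation, and (optionally) block */
            if (forknum == MAIN_FORKNUM &&
                RelFileNodeEquals(rnode, target) &&
                (target_blk == InvalidBlockNumber || target_blk == blkno))
                return true;
        }
        return false;
    }

Records that carry no block references at all (and hence no
tablespace/database/relation OIDs to validate) would simply never match.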
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Tue, 22 Feb 2022 17:44:01 +0900 (JST), Kyotaro Horiguchi wrote in
> At Tue, 22 Feb 2022 01:59:45 +0900, Fujii Masao wrote in
> > > Of the following, I think we should do (a) and (b) to make future
> > > backpatchings easier.
> > > a) Use RedoRecPtr and PriorRedoPtr after they are assigned.
> > > b) Move assignment to PriorRedoPtr into the ControlFileLock section.
> >
> > I failed to understand how (a) and (b) can make the backpatching
> > easier. How easy to backpatch seems the same whether we apply (a) and
> > (b) or not...
>
> That premises that the patch applied to master contains (a) and (b).
> So if it doesn't, those are not needed by older branches.

I was once going to remove them. But according to the discussion below, the
patch for back-patching is now quite close to that for the master branch.
So I left them alone.

> > > d) Skip call to UpdateCheckPointDistanceEstimate() when RedoRecPtr <=
> > >    PriorRedoPtr.
> >
> > But "RedoRecPtr <= PriorRedoPtr" will never happen, will it? Because a
>
> I didn't believe that it happens. (So, it came from my conservativeness,
> or laziness, or both:p) The code dates from 2009, when StartupXLOG made a
> concurrent checkpoint with bgwriter. But as of at least 9.5, StartupXLOG
> doesn't directly call CreateCheckPoint. So I think that won't happen.
>
> So, in short, I agree to remove it or turn it into Assert().

That was a bit beside the point. If we assume RedoRecPtr is always larger
than PriorRedoPtr and we therefore don't need to check that there, we
should also remove the "if (PriorRedoPtr < RedoRecPtr)" branch just above,
which means the patch for back-branches gets very close to that for the
master. Do we make such a large change on back branches? Anyway, this
version takes that way for now.

> > if (flags & CHECKPOINT_IS_SHUTDOWN)
> > 	ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
> >
> > Same as above. IMO it's safer and better to always update the state
> > (whether the state is DB_IN_ARCHIVE_RECOVERY or not) if the
> > CHECKPOINT_IS_SHUTDOWN flag is passed.
>
> That means we may exit recovery mode after ShutdownXLOG called
> CreateRestartPoint. I don't think that may happen. So I'd rather add
> Assert((flags & CHECKPOINT_IS_SHUTDOWN) == 0) there instead.

So this version for v14 is updated in the following points:

- Completely removed the code path for the case where some other process
  runs a simultaneous checkpoint.
- Removed the condition (RedoRecPtr > PriorRedoPtr) for the
  UpdateCheckPointDistanceEstimate() call.
- Added an assertion to the recovery-end path.

# Honestly I feel this is a bit too much for back-patching, though.

While making patches for v12, I see a test failure of pg_rewind for an
uncertain reason. I'm investigating that, but I post this for discussion.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From e983f3d4c2dbeea742aed0ef1e209e7821f6687f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Mon, 14 Feb 2022 13:04:33 +0900
Subject: [PATCH v2] Correctly update control file at the end of archive
 recovery

CreateRestartPoint runs WAL file cleanup based on the checkpoint that has
just finished in the function. If the database has exited the
DB_IN_ARCHIVE_RECOVERY state by the time the function is about to update
the control file, the function refrains from updating the file at all, then
proceeds to WAL cleanup using the latest REDO LSN, which is now
inconsistent with the control file. As a result, the succeeding cleanup
procedure removes WAL files that are still needed according to the control
file and leaves the database unrecoverable until the next checkpoint
finishes.

Along with that fix, we remove a dead code path for the case where some
other process ran a simultaneous checkpoint. It seems like just a
preventive measure, but it's no longer useful because we are sure that
checkpoints are performed only by the checkpointer except in single-process
mode.
---
 src/backend/access/transam/xlog.c | 73 ---
 1 file changed, 47 insertions(+), 26 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6208e123e5..ff4a90eacc 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9587,6 +9587,9 @@ CreateRestartPoint(int flags)
 	XLogSegNo	_logSegNo;
 	TimestampTz xtime;

+	/* we don't assume concurrent checkpoint/restartpoint to run */
+	Assert (!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
+
 	/* Get a local copy of the last safe checkpoint record. */
 	SpinLockAcquire(&XLogCtl->info_lck);
 	lastCheckPointRecPtr = XLogCtl->lastCheckPointRecPtr;
@@ -9653,7 +9656,7 @@ CreateRestartPoint(int flags)
 	/* Also update the info_lck-protected copy */
 	SpinLockAcquire(&XLogCtl->info_lck);
-	XLogCtl->RedoRecPtr = lastCheckPoint.redo;
+	XLogCtl->RedoRecPtr = RedoRecPtr;
 	SpinLockRelease(&XLogCtl->info_lck);

 	/*
@@ -9672,7 +9675,10 @@
 	/* Update
Re: [PATCH] Expose port->authn_id to extensions and triggers
Hi,

On Thu, Feb 24, 2022 at 09:18:26PM -0800, Andres Freund wrote:
> On 2022-02-25 13:01:26 +0800, Julien Rouhaud wrote:
> > On Thu, Feb 24, 2022 at 08:44:08PM +0000, Jacob Champion wrote:
> > >
> > > Yeah... I was following a similar track with the initial work last
> > > year, but I dropped it when the cost of implementation started to
> > > grow considerably. At the time, though, it looked like some overhauls
> > > to the stats framework were being pursued? I should read up on that
> > > thread.
> >
> > Do you mean the shared memory stats patchset? IIUC this is unrelated,
> > as the beentry stuff Michael was talking about is a different
> > infrastructure (backend_status.[ch]), and I don't think there are any
> > plans to move that to dynamic shared memory.
>
> Until a year ago the backend_status.c stuff was in pgstat.c - I just
> split them out because of the shared memory status work - so it'd not be
> surprising for them to mentally be thrown in one bucket.

Right.

> Basically the type of stats we're trying to move to dynamic shared
> memory is about counters that should persist for a while and are
> accumulated across connections etc. Whereas backend_status.c is more
> about tracking the current state (what query is a backend running, what
> user, etc).

But would it be acceptable to use dynamic shared memory in backend_status
and e.g. have a dsa_pointer rather than a fixed-length array? That seems
like a bad idea for query text in general, but authn_id for instance seems
less likely to hold gigantic strings, and would also have a more or less
consistent size when provided.
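To make the idea concrete, a sketch of the shape being floated (the field
name and its placement are assumptions, not a worked-out proposal;
PgBackendStatus itself lives in backend_status.h):

    typedef struct PgBackendStatus
    {
        /* ... existing fixed-size fields ... */

        /*
         * Variable-length authn_id stored in dynamic shared memory,
         * replacing a fixed char array; InvalidDsaPointer when unset.
         */
        dsa_pointer st_authn_id;
    } PgBackendStatus;

The question then becomes who allocates and frees the DSA chunk, and how
readers cope with it changing underneath them, which is where the
implementation cost discussed above comes in.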
Re: Add id's to various elements in protocol.sgml
On 24.02.2022 at 17:07, Brar Piening wrote:
> On 24.02.2022 at 16:46, Alvaro Herrera wrote:
> > Would it be possible to create such anchor links as part of the XSL
> > stylesheets for HTML?
>
> I'll investigate our options and report back.

Yes, that would be possible. In fact, appending a link and optionally
adding a tiny bit of CSS like I show below does the trick. The major
problem in that regard would probably be my lack of XSLT/docbook skills,
but if no one can jump in here, I can see if I can make it work.

Obviously adding the links via javascript would also work (and even be
easier for me personally), but that seems like the second-best solution to
me since it involves javascript where no javascript is needed.

Personally I consider having ids to link to, and making them more
comfortable to use/find, as orthogonal problems in that case (mostly
developer documentation), so IMHO solving this doesn't necessarily need to
hold back the original patch.

Insert # ...

.variablelist a.anchor { visibility: hidden; }
.variablelist *:hover > a.anchor,
.variablelist a.anchor:focus { visibility: visible; }
Re: Design of pg_stat_subscription_workers vs pgstats
On Thu, Feb 24, 2022 at 9:20 PM Masahiko Sawada wrote:
> > 6. src/backend/postmaster/pgstat.c - function name
> >
> > +/* --
> > + * pgstat_reset_subscription_counter() -
> > + *
> > + * Tell the statistics collector to reset a single subscription
> > + * counter, or all subscription counters (when subid is InvalidOid).
> > + *
> > + * Permission checking for this function is managed through the normal
> > + * GRANT system.
> > + * --
> > + */
> > +void
> > +pgstat_reset_subscription_counter(Oid subid)
> >
> > SUGGESTION (function name)
> > "pgstat_reset_subscription_counter" -->
> > "pgstat_reset_subscription_counters" (plural)
>
> Fixed.

We don't use the plural form in other similar cases like
pgstat_reset_replslot_counter and pgstat_reset_slru_counter, so why do it
differently here?

--
With Regards,
Amit Kapila.
Re: [PATCH] Expose port->authn_id to extensions and triggers
Hi,

On 2022-02-25 13:01:26 +0800, Julien Rouhaud wrote:
> On Thu, Feb 24, 2022 at 08:44:08PM +0000, Jacob Champion wrote:
> >
> > Yeah... I was following a similar track with the initial work last
> > year, but I dropped it when the cost of implementation started to grow
> > considerably. At the time, though, it looked like some overhauls to the
> > stats framework were being pursued? I should read up on that thread.
>
> Do you mean the shared memory stats patchset? IIUC this is unrelated, as
> the beentry stuff Michael was talking about is a different infrastructure
> (backend_status.[ch]), and I don't think there are any plans to move that
> to dynamic shared memory.

Until a year ago the backend_status.c stuff was in pgstat.c - I just split
them out because of the shared memory status work - so it'd not be
surprising for them to mentally be thrown in one bucket.

Basically the type of stats we're trying to move to dynamic shared memory
is about counters that should persist for a while and are accumulated
across connections etc. Whereas backend_status.c is more about tracking the
current state (what query is a backend running, what user, etc).

They're not unrelated though: backend_status.c feeds pgstat.c with some
information occasionally.

Greetings,

Andres Freund
Re: [PATCH] Expose port->authn_id to extensions and triggers
On Thu, Feb 24, 2022 at 08:44:08PM +0000, Jacob Champion wrote:
>
> Yeah... I was following a similar track with the initial work last year,
> but I dropped it when the cost of implementation started to grow
> considerably. At the time, though, it looked like some overhauls to the
> stats framework were being pursued? I should read up on that thread.

Do you mean the shared memory stats patchset? IIUC this is unrelated, as
the beentry stuff Michael was talking about is a different infrastructure
(backend_status.[ch]), and I don't think there are any plans to move that
to dynamic shared memory.
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Sun, Feb 20, 2022 at 12:27 PM Peter Geoghegan wrote:
> You've given me a lot of high quality feedback on all of this, which
> I'll work through soon. It's hard to get the balance right here, but
> it's made much easier by this kind of feedback.

Attached is v9. Lots of changes. Highlights:

* Much improved 0001 ("loosen coupling" dynamic relfrozenxid tracking
patch). Some of the improvements are due to recent feedback from Robert.

* Much improved 0002 ("Make page-level characteristics drive freezing"
patch). Whole new approach to the implementation, though the same algorithm
as before.

* No more FSM patch -- that was totally separate work, that I shouldn't
have attached to this project.

* There are 2 new patches (these are now 0003 and 0004), both of which are
concerned with allowing non-aggressive VACUUM to consistently advance
relfrozenxid. I think that 0003 makes sense on general principle, but I'm
much less sure about 0004. These aren't too important.

While working on the new approach to freezing taken by v9-0002, I had some
insight about the issues that Robert raised around 0001, too. I wasn't
expecting that to happen.

0002 makes page-level freezing a first class thing.
heap_prepare_freeze_tuple now has some (limited) knowledge of how this
works. heap_prepare_freeze_tuple's cutoff_xid argument is now always the
VACUUM caller's OldestXmin (not its FreezeLimit, as before). We still have
to pass FreezeLimit to heap_prepare_freeze_tuple, which helps us to respect
FreezeLimit as a backstop, and so now it's passed via the new
backstop_cutoff_xid argument instead. Whenever we opt to "freeze a page",
the new page-level algorithm *always* uses the most recent possible XID and
MXID values (OldestXmin and oldestMxact) to decide what XIDs/XMIDs need to
be replaced. That might sound like it'd be too much, but it only applies to
those pages that we actually decide to freeze (since page-level
characteristics drive everything now). FreezeLimit is only one way of
triggering that now (and one of the least interesting and rarest).

0002 also adds an alternative set of relfrozenxid/relminmxid tracker
variables, to make the "don't freeze the page" path within lazy_scan_prune
simpler (if you don't want to freeze the page, then use the set of tracker
variables that go with that choice, which heap_prepare_freeze_tuple knows
about and helps with). With page-level freezing, lazy_scan_prune wants to
make a decision about the page as a whole, at the last minute, after all
heap_prepare_freeze_tuple calls have already been made. So I think that
heap_prepare_freeze_tuple needs to know about that aspect of
lazy_scan_prune's behavior. When we *don't* want to freeze the page, we
more or less need everything related to freezing inside lazy_scan_prune to
behave like lazy_scan_noprune, which never freezes the page (that's mostly
the point of lazy_scan_noprune). And that's almost what we actually do --
heap_prepare_freeze_tuple now outsources maintenance of this alternative
set of "don't freeze the page" relfrozenxid/relminmxid tracker variables to
its sibling function, heap_tuple_needs_freeze. That is the same function
that lazy_scan_noprune itself actually calls.

Now back to Robert's feedback on 0001, which had very complicated comments
in the last version. This approach seems to make the "being versus
becoming" or "going to freeze versus not going to freeze" distinctions much
clearer. This is less true if you assume that 0002 won't be committed but
0001 will be.

Even if that happens with Postgres 15, I have to imagine that adding
something like 0002 must be the real goal, long term. Without 0002, the
value from 0001 is far more limited. You need both together to get the
virtuous cycle I've described.

The approach with always using OldestXmin as cutoff_xid and oldestMxact as
our cutoff_multi makes a lot of sense to me, in part because I think that
it might well cut down on the tendency of VACUUM to allocate new MultiXacts
in order to be able to freeze old ones. AFAICT the only reason that
heap_prepare_freeze_tuple does that is because it has no flexibility on
FreezeLimit and MultiXactCutoff. These are derived from
vacuum_freeze_min_age and vacuum_multixact_freeze_min_age, respectively,
and so they're two independent though fairly meaningless cutoffs. On the
other hand, OldestXmin and OldestMxact are not independent in the same way.
We get both of them at the same time and the same place, in
vacuum_set_xid_limits. OldestMxact really is very close to OldestXmin --
only the units differ.

It seems that heap_prepare_freeze_tuple allocates new MXIDs (when freezing
old ones) in large part so it can NOT freeze XIDs that it would have been
useful (and much cheaper) to remove anyway. On HEAD, FreezeMultiXactId()
doesn't get passed down the VACUUM operation's OldestXmin at all (it
actually just gets FreezeLimit passed as its cutoff_xid argument). It
cannot possibly recognize any of this for itself.

Does that theory about MultiXacts
Re: Separate the result of \watch for each query execution (psql)
pá 25. 2. 2022 v 5:23 odesílatel Noboru Saito napsal:
> Hi,
>
> Pavel Stehule:
> > I strongly agree. It was a lot of work to find a workable solution for
> > pspg. Special chars that start a result, and maybe others that end a
> > result, can significantly increase robustness and can reduce code. I
> > think it can be better to use a form feed at the end of the form -
> > like the semantics of form feed. You know, at this moment, the result
> > is complete. https://en.wikipedia.org/wiki/Page_break
>
> It's easier to print a form feed before the result, but it's okay at the
> end.
>
> > I don't think using it by default can be the best. Lots of people
> > don't use specialized pagers, but it can be set by \pset. Form feed
> > should be used at the end
> >
> > \pset formfeed [on, off]
>
> I think it's a good idea to be able to switch with \pset.
>
> I have created a patch that allows you to turn it on and off in \pset.
> The attached patch adds the following features.
>
> Formfeed can be turned on with the command line option or \pset.
> Formfeed (\f\n) is output after the query execution result by \watch.
>
> I think the considerations are as follows.
>
> * Is formfeed output after the result, not before?

We are talking about the first iteration. In the second and later
iterations this question has no sense. You know the starting point. You
don't know the endpoint. So I think using a form feed at the end is a good
idea.

> * Is the formfeed output only "\f\n"?

Yes.

> * Is the formfeed output only executed by \watch?

This is a good question. I think the implementation for \watch is a good
start. But it can help with normal results too. In pspg it can be very
useful for incremental load or for streaming mode. But if it will be used
everywhere, then it should be used just for some specified pagers.

> * Is the name "formfeed" appropriate?

If it will do the work of a form feed, then formfeed is a good name.

> If the formfeed is output before the query result,
> it will be better if the screen is reset when the formfeed is read.

I think you are proposing another feature - a reset-terminal sequence - it
can be a good idea. Not for pspg, but generally, why not.

Regards

Pavel

> Furthermore, if the terminal clear string can be set instead of the
> formfeed character, the \watch output can be fixedly displayed without a
> pager.
>
> (I noticed that if I set it in the title, it would behave similarly in
> the current version.
> # \C '\033[2J;\033[0;0H'
> # SELECT now();\watch 1
> )
>
> Also, it may be good to output formfeed when outputting a file with
> `\o result.txt` other than \watch.
Re: BufferAlloc: don't take two simultaneous locks
On Mon, 21 Feb 2022 at 08:06, Yura Sokolov wrote:
>
> Good day, Kyotaro Horiguchi and hackers.
>
> On Thu, 17/02/2022 at 14:16 +0900, Kyotaro Horiguchi wrote:
> > At Wed, 16 Feb 2022 10:40:56 +0300, Yura Sokolov wrote in
> > > Hello, all.
> > >
> > > I thought about patch simplification, and tested a version
> > > without BufTable and dynahash API changes at all.
> > >
> > > It performs surprisingly well. It is just a bit worse
> > > than v1 since there is more contention around dynahash's
> > > freelist, but most of the improvement remains.
> > >
> > > I'll finish benchmarking and will attach graphs with the
> > > next message. Patch is attached here.
> >
> > Thanks for the new patch. The patch as a whole looks fine to me. But
> > some comments need to be revised.
>
> Thank you for review and remarks.

v3 gets the buffer partition locking right, well done, great results!

In v3, the comment at line 1279 still implies we take both locks together,
which is not now the case.

Dynahash actions are still possible. You now have the BufTableDelete before
the BufTableInsert, which opens up the possibility I discussed here:
http://postgr.es/m/CANbhV-F0H-8oB_A+m=55hP0e0QRL=rdddqusxmtft6jprdx...@mail.gmail.com
(Apologies for raising a similar topic, I hadn't noticed this thread
before; thanks to Horiguchi-san for pointing this out).

v1 had a horrible API (sorry!) where you returned the entry and then
explicitly re-used it. I think we *should* make changes to dynahash, but
not with the API you proposed.

Proposal for new BufTable API:

BufTableReuse() - similar to BufTableDelete() but does NOT put the entry
back on the freelist; we remember it in a private single-item cache in
dynahash.

BufTableAssign() - similar to BufTableInsert() but can only be executed
directly after BufTableReuse(), fails with ERROR otherwise. Takes the entry
from the single-item cache and re-assigns it to the new tag.

In dynahash we have two new modes that match the above:

HASH_REUSE - used by BufTableReuse(), similar to HASH_REMOVE, but places
the entry in the single-item cache, avoiding the freelist.

HASH_ASSIGN - used by BufTableAssign(), similar to HASH_ENTER, but uses the
entry from the single-item cache, rather than asking the freelist. This
last call can fail if someone else already inserted the tag, in which case
it adds the single-item cache entry back onto the freelist.

Notice that the single-item cache is not in shared memory, so on abort we
should give it back, so we probably need an extra API call for that also to
avoid leaking an entry.

Doing it this way allows us to
* avoid touching freelists altogether in the common path - we know we are
about to reassign the entry, so we just remember it - no contention from
other backends, no borrowing etc.
* avoid sharing the private details outside of the dynahash module
* use the same technique elsewhere that we have partitioned hash tables

This approach is cleaner than v1, but should also perform better because
there will be a 1:1 relationship between a buffer and its dynahash entry,
most of the time.

With these changes, I think we will be able to *reduce* the number of
freelists for partitioned dynahash from 32 to maybe 8, as originally
speculated by Robert in 2016:
https://www.postgresql.org/message-id/CA%2BTgmoZkg-04rcNRURt%3DjAG0Cs5oPyB-qKxH4wqX09e-oXy-nw%40mail.gmail.com
since the freelists will be much less contended with the above approach.

It would be useful to see performance with a higher number of connections,
above 400.

--
Simon Riggs
http://www.EnterpriseDB.com/
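To make that proposal concrete, a minimal sketch of how the eviction path
would use it (signatures assumed by analogy with BufTableDelete and
BufTableInsert; none of this exists yet):

    /* detach the victim's mapping entry, but keep it in the private cache */
    BufTableReuse(&oldTag, oldHash);        /* HASH_REUSE under the covers */

    /* ... flush and invalidate the victim buffer ... */

    /* re-attach the cached entry under the new tag */
    existing_id = BufTableAssign(&newTag, newHash, victim_buf_id);
    if (existing_id >= 0)
    {
        /*
         * Somebody else inserted newTag first; the cached entry has been
         * returned to the freelist and we use the existing buffer instead.
         */
    }

    /* on transaction abort, a cleanup call returns any still-cached entry */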
Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
> Nice catch! However, I'm not sure I like the patch.
>
> 	 * made it through and start writing after the portion that persisted.
> 	 * (It's critical to first write an OVERWRITE_CONTRECORD message, which
> 	 * we'll do as soon as we're open for writing new WAL.)
> +	 *
> +	 * If the last wal record is ahead of the missing contrecord, this is a
> +	 * recently promoted primary and we should not write an overwrite
> +	 * contrecord.
>
> Before the part, the comment follows the part shown below.
>
> >	 * Actually, if WAL ended in an incomplete record, skip the parts that
>
> So, actually WAL did not end in an incomplete record. I think
> FinishWalRecover is the last place to do that. (But it could be earlier.)

Thanks for the feedback.

##
On the primary, an OVERWRITE_CONTRECORD is written. The aborted record
(abortedRecPtr) and the insert position of the missing contrecord
(missingContrecPtr) are set during FinishWalRecovery, and after recovery
but before writes are accepted, the OVERWRITE_CONTRECORD is written.

## on the primary logs
2022-02-25 02:25:16.208 UTC [7397] LOG: redo done at 0/1FFD2B8 system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s
2022-02-25 02:25:16.208 UTC [7397] DEBUG: resetting unlogged relations: cleanup 0 init 1
2022-02-25 02:25:16.209 UTC [7397] DEBUG: creating and filling new WAL file
2022-02-25 02:25:16.217 UTC [7397] DEBUG: done creating and filling new WAL file
2022-02-25 02:25:16.217 UTC [7397] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2022-02-25 02:25:16.217 UTC [7397] DEBUG: MultiXact member stop limit is now 4294914944 based on MultiXact 1
2022-02-25 02:25:16.217 UTC [7397] DEBUG: OVERWRITE_CONTRECORD inserted at 0/228 for aborted record 0/1FFD2E0  <<--- Attached V3 of patch that adds logging which shows the overwrite record created on the primary
2022-02-25 02:25:16.217 UTC [7395] LOG: checkpoint starting: end-of-recovery immediate wait
2022-02-25 02:25:16.217 UTC [7395] DEBUG: performing replication slot checkpoint
2022-02-25 02:25:16.218 UTC [7395] DEBUG: attempting to remove WAL segments older than log file
2022-02-25 02:25:16.218 UTC [7395] LOG: checkpoint complete: wrote 131 buffers (102.3%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s; distance=11381 kB, estimate=11381 kB
2022-02-25 02:25:16.219 UTC [7394] DEBUG: starting background worker process "logical replication launcher"
2022-02-25 02:25:16.219 UTC [7394] LOG: database system is ready to accept connections

##
A downstream standby now skips over this OVERWRITE_CONTRECORD, but the
value for missingContrecPtr is not invalidated afterwards.

## on the standby logs
2022-02-25 02:25:16.616 UTC [7413] DEBUG: sending hot standby feedback xmin 0 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
2022-02-25 02:25:16.616 UTC [7387] LOG: successfully skipped missing contrecord at 0/1FFD2E0, overwritten at 2022-02-25 02:25:16.2175+00
2022-02-25 02:25:16.616 UTC [7387] CONTEXT: WAL redo at 0/228 for XLOG/OVERWRITE_CONTRECORD: lsn 0/1FFD2E0; time 2022-02-25 02:25:16.2175+00

##
After promotion, the standby attempts to write the overwrite_contrecord
again using the missingContrecPtr LSN, which is now in the past.

## on the standby logs
2022-02-25 02:25:16.646 UTC [7387] LOG: invalid record length at 0/201EC70: wanted 24, got 0
2022-02-25 02:25:16.646 UTC [7387] LOG: redo done at 0/201EC48 system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.55 s
2022-02-25 02:25:16.646 UTC [7387] LOG: last completed transaction was at log time 2022-02-25 02:25:16.301554+00
2022-02-25 02:25:16.646 UTC [7387] DEBUG: resetting unlogged relations: cleanup 0 init 1
2022-02-25 02:25:16.646 UTC [7387] LOG: selected new timeline ID: 2
2022-02-25 02:25:16.646 UTC [7387] DEBUG: updated min recovery point to 0/201EC70 on timeline 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG: could not remove file "pg_wal/00020002": No such file or directory
2022-02-25 02:25:16.656 UTC [7387] LOG: archive recovery complete
2022-02-25 02:25:16.656 UTC [7387] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG: MultiXact member stop limit is now 4294914944 based on MultiXact 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG: OVERWRITE_CONTRECORD inserted at 0/228 for aborted record 0/1FFD2E0  <<--- The same overwrite record is written on the recently promoted standby
2022-02-25 02:25:16.656 UTC [7387] DEBUG: attempting to remove WAL segments newer than log file 00020001
2022-02-25 02:25:16.656 UTC [7387] DEBUG: recycled write-ahead log file "00010002"
2022-02-25 02:25:16.656 UTC [7387] DEBUG: release all standby locks
2022-02-25
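FWIW, another possible shape of a fix would be to simply forget the
aborted-contrecord state once the OVERWRITE_CONTRECORD record has been
written, so that a later promotion cannot replay it (the variable names are
the real ones from xlog.c; the exact placement is an assumption):

    /* after successfully inserting the OVERWRITE_CONTRECORD record */
    abortedRecPtr = InvalidXLogRecPtr;
    missingContrecPtr = InvalidXLogRecPtr;

That would make the promoted standby's end-of-recovery code skip the "write
an overwrite contrecord" path entirely, because the state no longer looks
like an aborted record.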
Re: Separate the result of \watch for each query execution (psql)
Hi,

Pavel Stehule:
> I strongly agree. It was a lot of work to find a workable solution for
> pspg. Special chars that start a result, and maybe others that end a
> result, can significantly increase robustness and can reduce code. I
> think it can be better to use a form feed at the end of the form - like
> the semantics of form feed. You know, at this moment, the result is
> complete. https://en.wikipedia.org/wiki/Page_break

It's easier to print a form feed before the result, but it's okay at the
end.

> I don't think using it by default can be the best. Lots of people don't
> use specialized pagers, but it can be set by \pset. Form feed should be
> used at the end
>
> \pset formfeed [on, off]

I think it's a good idea to be able to switch with \pset.

I have created a patch that allows you to turn it on and off in \pset.
The attached patch adds the following features.

Formfeed can be turned on with the command line option or \pset.
Formfeed (\f\n) is output after the query execution result by \watch.

I think the considerations are as follows.

* Is formfeed output after the result, not before?
* Is the formfeed output only "\f\n"?
* Is the formfeed output only executed by \watch?
* Is the name "formfeed" appropriate?

If the formfeed is output before the query result, it will be better if the
screen is reset when the formfeed is read.

Furthermore, if the terminal clear string can be set instead of the
formfeed character, the \watch output can be fixedly displayed without a
pager.

(I noticed that if I set it in the title, it would behave similarly in the
current version.
# \C '\033[2J;\033[0;0H'
# SELECT now();\watch 1
)

Also, it may be good to output formfeed when outputting a file with
`\o result.txt` other than \watch.

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index caabb06c53..68a876c8bd 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -256,6 +256,16 @@ EOF
+
+      --formfeed
+
+      Turn on the formfeed. This is equivalent to
+      \pset formfeed.
+
+
       -h hostname
       --host=hostname
@@ -2879,6 +2889,14 @@ lo_import 152801
       Unique abbreviations are allowed.
+
+      formfeed
+
+      Outputs formfeed after every result when using the
+      \watch command.
+
       aligned format is the standard, human-readable,
       nicely formatted text output; this is the default.

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 292cff5df9..2a1fd6a22d 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -2242,8 +2242,8 @@ exec_command_pset(PsqlScanState scan_state, bool active_branch)
 		int			i;
 		static const char *const my_list[] = {
 			"border", "columns", "csv_fieldsep", "expanded", "fieldsep",
-			"fieldsep_zero", "footer", "format", "linestyle", "null",
-			"numericlocale", "pager", "pager_min_lines",
+			"fieldsep_zero", "footer", "format", "formfeed", "linestyle",
+			"null", "numericlocale", "pager", "pager_min_lines",
 			"recordsep", "recordsep_zero", "tableattr", "title",
 			"tuples_only", "unicode_border_linestyle",
@@ -4532,6 +4532,16 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 		if (value)
 			popt->topt.columns = atoi(value);
 	}
+
+	/* toggle output formfeed */
+	else if (strcmp(param, "formfeed") == 0)
+	{
+		if (value)
+			return ParseVariableBool(value, param, &popt->topt.formfeed);
+		else
+			popt->topt.formfeed = !popt->topt.formfeed;
+	}
+
 	else
 	{
 		pg_log_error("\\pset: unknown option: %s", param);
@@ -4701,6 +4711,15 @@ printPsetInfo(const char *param, printQueryOpt *popt)
 			printf(_("Tuples only is off.\n"));
 	}
+
+	/* show toggle output formfeed */
+	else if (strcmp(param, "formfeed") == 0)
+	{
+		if (popt->topt.formfeed)
+			printf(_("formfeed is on.\n"));
+		else
+			printf(_("formfeed is off.\n"));
+	}
+
 	/* Unicode style formatting */
 	else if (strcmp(param, "unicode_border_linestyle") == 0)
 	{
@@ -4895,6 +4914,8 @@ pset_value_string(const char *param, printQueryOpt *popt)
 		return popt->title ? pset_quoted_string(popt->title) : pstrdup("");
 	else if (strcmp(param, "tuples_only") == 0)
 		return pstrdup(pset_bool_string(popt->topt.tuples_only));
+	else if (strcmp(param, "formfeed") == 0)
+		return pstrdup(pset_bool_string(popt->topt.formfeed));
 	else if (strcmp(param, "unicode_border_linestyle") == 0)
 		return pstrdup(_unicode_linestyle2string(popt->topt.unicode_border_linestyle));
 	else if (strcmp(param, "unicode_column_linestyle") == 0)
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index d65b9a124f..63eff20c55 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -647,6 +647,8 @@ PSQLexecWatch(const char *query, const printQueryOpt *opt, FILE *printQueryFout)
 {
 	case
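A usage sketch, going by the option names in the patch above (the status
line is what the patched printPsetInfo() would print):

    $ psql --formfeed
    ...
    postgres=# \pset formfeed on
    formfeed is on.
    postgres=# SELECT now();\watch 1

Each \watch result would then be followed by "\f\n", which a pager such as
pspg could treat as an end-of-result marker.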
Re: Optionally automatically disable logical replication subscriptions on error
On Thu, Feb 24, 2022 at 6:30 PM Masahiko Sawada wrote:
> On Thu, Feb 24, 2022 at 8:08 PM Amit Kapila wrote:
> > On Thu, Feb 24, 2022 at 1:20 PM Masahiko Sawada wrote:
> > >
> > > Here are some comments:
> > >
> > > Why do we need SyncTableStartWrapper() and ApplyLoopWrapper()?
> >
> > I have given this comment to move the related code to separate
> > functions to slightly simplify the ApplyWorkerMain() code, but if you
> > don't like it we can move it back. I am not sure I like the new
> > function names in the patch though.
>
> Okay, I'm fine with moving this code but perhaps we can find a better
> function name, as "Wrapper" seems slightly odd to me.

Agreed.

> For example, start_table_sync_start() and start_apply_changes() or
> something (it seems we use the snake case for static functions in
> worker.c).

I am fine with something on these lines; how about start_table_sync() and
start_apply() respectively?

--
With Regards,
Amit Kapila.
Re: Proposal: Support custom authentication methods using hooks
Hi,

On 2022-02-24 17:02:45 -0800, Jeff Davis wrote:
> On Thu, 2022-02-17 at 11:25 -0800, samay sharma wrote:
> One caveat is that this only works given information available from
> existing authentication methods, because that's all the client
> supports. In practice, it seems to only be useful with plaintext
> password authentication over an SSL connection.

Why is it restricted to that? You could do SASL negotiation as well from
what I can see? And that'd theoretically also allow negotiating whether the
client supports different ways of doing auth. Not saying that that's easy,
but I don't think it's a fundamental restriction.

I also can imagine things like using selinux labeling of connections.

We have several useful authentication technologies built on top of a
plaintext exchange. RADIUS, LDAP, and PAM afaics could be implemented as an
extension?

Greetings,

Andres Freund
Re: Buffer Manager and Contention
On Fri, 25 Feb 2022 at 01:20, Kyotaro Horiguchi wrote:
> Yura Sokolov is proposing a patch to separate the two partition locks.
>
> https://www.postgresql.org/message-id/1edbb61981fe1d99c3f20e3d56d6c88999f4227c.camel%40postgrespro.ru
>
> And it seems to me viable for me and a benchmarking in the thread
> showed a good result. I'd appreciate your input on that thread.

Certainly, I will stop this thread and contribute to Yura's thread.

--
Simon Riggs
http://www.EnterpriseDB.com/
Re: why do hash index builds use smgrextend() for new splitpoint pages
On Fri, Feb 25, 2022 at 8:54 AM Amit Kapila wrote:
> On Fri, Feb 25, 2022 at 4:41 AM Melanie Plageman wrote:
> >
> > I'm trying to understand why hash indexes are built primarily in shared
> > buffers except when allocating a new splitpoint's worth of bucket pages
> > -- which is done with smgrextend() directly in _hash_alloc_buckets().
> >
> > Is this just so that the value returned by smgrnblocks() includes the
> > new splitpoint's worth of bucket pages?
> >
> > All writes of tuple data to pages in this new splitpoint will go
> > through shared buffers (via hash_getnewbuf()).
> >
> > I asked this and got some thoughts from Robert in [1], but I still
> > don't really get it.
> >
> > When a new page is needed during the hash index build, why can't
> > _hash_expandtable() just call ReadBufferExtended() with P_NEW instead
> > of _hash_getnewbuf()? Does it have to do with the BUCKET_TO_BLKNO
> > mapping?
>
> We allocate the chunk of pages (power-of-2 groups) at the time of
> split which allows them to appear consecutively in an index.

I think allocating chunks of pages via "ReadBufferExtended() with P_NEW"
will be time-consuming compared to what we do now.

--
With Regards,
Amit Kapila.
Re: Buffer Manager and Contention
At Fri, 25 Feb 2022 10:20:25 +0900 (JST), Kyotaro Horiguchi wrote in
> (I added Yura, as the author of a related patch)
>
> At Thu, 24 Feb 2022 12:58:23 +0000, Simon Riggs wrote in
> > Thinking about poor performance in the case where the data fits in
> > RAM, but the working set is too big for shared_buffers, I notice a
> > couple of things that seem bad in BufMgr, but don't understand why
> > they are like that.
> >
> > 1. If we need to allocate a buffer to a new block we do this in one
> > step, while holding both partition locks for the old and the new tag.
> > Surely it would cause less contention to make the old block/tag
> > invalid (after flushing), drop the old partition lock and then switch
> > to the new one? i.e. just hold one mapping partition lock at a time.
> > Is there a specific reason we do it this way?
>
> I'm not sure, but I guess the developer wanted to make the operation
> atomic.
>
> Yura Sokolov is proposing a patch to separate the two partition locks.
>
> https://www.postgresql.org/message-id/1edbb61981fe1d99c3f20e3d56d6c88999f4227c.camel%40postgrespro.ru
>
> And it seems viable to me, and a benchmark in the thread showed a good
> result. I'd appreciate your input on that thread.
>
> > 2. Possibly connected to the above, we issue BufTableInsert() BEFORE
> > we issue BufTableDelete(). That means we need extra entries in the
> > buffer mapping hash table to allow us to hold both the old and the new
> > at the same time, for a short period. The way dynahash.c works, we try
>
> Yes.
>
> > to allocate an entry from the freelist and if that doesn't work, we
> > begin searching ALL the freelists for free entries to steal. So if we
> > get enough people trying to do virtual I/O at the same time, then we
> > will hit a "freelist storm" where everybody is chasing the last few
> > entries. It would make more sense if we could do BufTableDelete()
>
> To reduce that overhead, Yura proposed a surgical modification of
> dynahash, but I didn't like that and the latest patch doesn't contain
> that part.
>
> > first, then hold onto the buffer mapping entry rather than add it to
> > the freelist, so we can use it again when we do BufTableInsert() very
> > shortly afterwards. That way we wouldn't need to search the freelist
> > at all. What is the benefit or reason of doing the Delete after the
> > Insert?
>
> Hmm, something like hash_swap_key() or hash_reinsert_entry()? That
> sounds reasonable. (Yura's proposal was taking out an entry from the
> hash then attaching it with a new key again.)
>
> > Put that another way, it looks like BufTable functions are used in a
> > way that mismatches against the way dynahash is designed.
> >
> > Thoughts?
>
> On the first part, I think Yura's patch works. On the second point,
> Yura already showed it gives a certain amount of gain if we do that.

On second thought, even if we have a new dynahash API to atomically replace
a hash key, we would need to hold two partition locks to do that. That
contradicts our objective.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: pg_stat_get_replication_slot and pg_stat_get_subscription_worker incorrectly marked as proretset
On Thu, Feb 24, 2022 at 12:03 PM Michael Paquier wrote: > > On Wed, Feb 23, 2022 at 09:52:02AM +0530, Amit Kapila wrote: > > Thanks, so you are okay with me pushing that patch just to HEAD. > > Yes, I am fine with that. I am wondering about patching the second > function though, to avoid any risk of forgetting it, but I am fine to > leave that to your judgement. > The corresponding patch with other changes is not very far from being ready to commit. So, will do it along with that. > > We don't want to backpatch this to 14 as this is a catalog change and > > won't cause any user-visible issue, is that correct? > > Yup, that's a HEAD-only cleanup, I am afraid. > Thanks, Done! -- With Regards, Amit Kapila.
Re: why do hash index builds use smgrextend() for new splitpoint pages
On Fri, Feb 25, 2022 at 4:41 AM Melanie Plageman wrote: > > I'm trying to understand why hash indexes are built primarily in shared > buffers except when allocating a new splitpoint's worth of bucket pages > -- which is done with smgrextend() directly in _hash_alloc_buckets(). > > Is this just so that the value returned by smgrnblocks() includes the > new splitpoint's worth of bucket pages? > > All writes of tuple data to pages in this new splitpoint will go > through shared buffers (via hash_getnewbuf()). > > I asked this and got some thoughts from Robert in [1], but I still don't > really get it. > > When a new page is needed during the hash index build, why can't > _hash_expandtable() just call ReadBufferExtended() with P_NEW instead of > _hash_getnewbuf()? Does it have to do with the BUCKET_TO_BLKNO mapping? > We allocate the chunk of pages (power-of-2 groups) at the time of split which allows them to appear consecutively in an index. This helps us to compute the physical block number from bucket number easily (BUCKET_TO_BLKNO mapping) with some minimal control information. -- With Regards, Amit Kapila.
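As a toy, self-contained illustration of that mapping (not the real BUCKET_TO_BLKNO macro, which additionally accounts for bitmap and overflow pages via the hashm_spares array in the metapage):

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    static uint32_t
    toy_bucket_to_blkno(uint32_t bucket)
    {
        /*
         * Block 0 is the metapage; bucket pages follow consecutively
         * because every split reserves a full power-of-2 group at once.
         * The real macro also adds the count of overflow/bitmap pages
         * allocated before this bucket's splitpoint (hashm_spares).
         */
        return bucket + 1;
    }

    int
    main(void)
    {
        for (uint32_t bucket = 0; bucket < 8; bucket++)
            printf("bucket %" PRIu32 " -> block %" PRIu32 "\n",
                   bucket, toy_bucket_to_blkno(bucket));
        return 0;
    }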
Re: Add id's to various elements in protocol.sgml
On Thu, Feb 24, 2022, 16:52 Kyotaro Horiguchi wrote: > At Tue, 21 Dec 2021 08:47:27 +0100, Brar Piening wrote in > > On 20.12.2021 at 16:09, Robert Haas wrote: > > > As a data point, this is something I have also wanted to do, from time > > > to time. I am generally of the opinion that any place the > > +1 from me. When I put a URL in the answer for inquiries, I always > look into the html for name/id tags so that the inquirer quickly finds > the information source (or the backing or reference point) on the > page. +1 here as well. I often do the exact same thing. It's not hard, but it's a little tedious, especially considering most modern doc systems support linkable sections.
Re: Design of pg_stat_subscription_workers vs pgstats
Below are my review comments for the v3 patch. == 1. Commit message (An earlier review comment [Peter-v2] #2 was only partly fixed) "there are no longer any relation information" --> "there is no longer any relation information" ~~~ 2. doc/src/sgml/monitoring.sgml + pg_stat_subscription_activitypg_stat_subscription_activity + One row per subscription, showing statistics about subscription + activity. + See + pg_stat_subscription_activity for details. Currently these stats are only about errors. These are not really statistics about "activity" though. Probably it is better just to avoid that word altogether? SUGGESTIONS e.g.1. "One row per subscription, showing statistics about subscription activity." --> "One row per subscription, showing statistics about errors." e.g.2. "One row per subscription, showing statistics about subscription activity." --> "One row per subscription, showing statistics about that subscription." - [Peter-v2] https://www.postgresql.org/message-id/CAHut%2BPv%3DVmXtHmPKp4fg8VDF%2BTQP6xWgL91Jn-hrqg5QObfCZA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
Hi, On February 24, 2022 5:44:57 PM PST, Tom Lane wrote: >Tomas Vondra writes: >> I wonder if we could check the index tuple length, and adjust the siglen >> based on that, somehow ... > >In an already-corrupted index, there won't be a unique answer. If we're breaking every ltree gist index, can we detect that it's not an index generated by the new version? Most people don't read release notes, and this is just about guaranteed to break them all. -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Proposal: Support custom authentication methods using hooks
Jeff Davis writes: > On Thu, 2022-02-17 at 11:25 -0800, samay sharma wrote: >> To enable this, I've proposed adding a new authentication method >> "custom" which can be specified in pg_hba.conf and takes a mandatory >> argument "provider" specifying which authentication provider to use. > One caveat is that this only works given information available from > existing authentication methods, because that's all the client > supports. In practice, it seems to only be useful with plaintext > password authentication over an SSL connection. ... and, since we can't readily enforce that the client only sends those cleartext passwords over suitably-encrypted connections, this could easily be a net negative for security. Not sure that I think it's a good idea. regards, tom lane
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
Tomas Vondra writes: > I wonder if we could check the index tuple length, and adjust the siglen > based on that, somehow ... In an already-corrupted index, there won't be a unique answer. regards, tom lane
Re: Buffer Manager and Contention
(I added Yura, as the author of a related patch) At Thu, 24 Feb 2022 12:58:23 +, Simon Riggs wrote in > Thinking about poor performance in the case where the data fits in > RAM, but the working set is too big for shared_buffers, I notice a > couple of things that seem bad in BufMgr, but don't understand why > they are like that. > > 1. If we need to allocate a buffer to a new block we do this in one > step, while holding both partition locks for the old and the new tag. > Surely it would cause less contention to make the old block/tag > invalid (after flushing), drop the old partition lock and then switch > to the new one? i.e. just hold one mapping partition lock at a time. > Is there a specific reason we do it this way? I'm not sure but I guess the developer wanted to make the operation atomic. Yura Sokolov is proposing a patch to separate the two partition locks. https://www.postgresql.org/message-id/1edbb61981fe1d99c3f20e3d56d6c88999f4227c.camel%40postgrespro.ru And it seems viable to me, and a benchmark in the thread showed a good result. I'd appreciate your input on that thread. > 2. Possibly connected to the above, we issue BufTableInsert() BEFORE > we issue BufTableDelete(). That means we need extra entries in the > buffer mapping hash table to allow us to hold both the old and the new > at the same time, for a short period. The way dynahash.c works, we try Yes. > to allocate an entry from the freelist and if that doesn't work, we > begin searching ALL the freelists for free entries to steal. So if we > get enough people trying to do virtual I/O at the same time, then we > will hit a "freelist storm" where everybody is chasing the last few > entries. It would make more sense if we could do BufTableDelete() To reduce that overhead, Yura proposed a surgical modification of dynahash, but I didn't like that and the latest patch doesn't contain that part. > first, then hold onto the buffer mapping entry rather than add it to > the freelist, so we can use it again when we do BufTableInsert() very > shortly afterwards. That way we wouldn't need to search the freelist > at all. What is the benefit or reason of doing the Delete after the > Insert? Hmm. Something like hash_swap_key() or hash_reinsert_entry()? That sounds reasonable. (Yura's proposal was taking an entry out of the hash and then attaching it with a new key again.) > Put that another way, it looks like BufTable functions are used in a > way that mismatches against the way dynahash is designed. > > Thoughts? On the first part, I think Yura's patch works. On the second point, Yura already showed it gives a certain amount of gain if we do that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
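The reordering suggested in point 2 might look roughly like the following (a purely hypothetical sketch -- dynahash has no "detached entry" API today, which is exactly what would need to be added; the *_Detached names are invented for illustration):

    BufferLookupEnt *ent;

    /* unhook the old mapping, but keep the entry instead of freeing it */
    ent = BufTableDeleteDetached(&oldTag, oldHash);        /* hypothetical */

    /* ... flush and invalidate the victim buffer as today ... */

    /* reuse the very same entry for the new tag: no freelist search */
    BufTableInsertDetached(ent, &newTag, newHash, buf_id); /* hypothetical */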
Re: set TESTDIR from perl rather than Makefile
On Mon, Feb 21, 2022 at 07:00:54AM -0500, Andrew Dunstan wrote: > On 2/19/22 18:53, Justin Pryzby wrote: > > On Sat, Feb 19, 2022 at 05:41:49PM -0600, Justin Pryzby wrote: > >> I rebased and fixed the check-guc script to work, made it work with vpath > >> builds, and cleaned it up some. > > I also meant to attach it. > > This is going to break a bunch of stuff as written. > > First, it's not doing the same thing. The current system sets TESTDIR to > be the parent of the directory that holds the test. e.g. for > src/bin/pg_ctl/t/001_start_stop.pl it's src/bin/pg_ctl in the build > tree, not the 't' subdirectory. This patch apparently sets it to the 't' > subdirectory. That will break anything that goes looking for log files > in the current location, like the buildfarm client, and possibly some CI > setups as well. Yes, thanks. I changed the patch to use ENV{CURDIR} || dirname(dirname($0)). If I'm not wrong, that seems to be doing the right thing. > Also, unless I'm mistaken it appears to do the wrong thing for vpath > builds: > > my $test_dir = File::Spec->rel2abs(dirname($0)); > > is completely wrong for vpaths, since that will place it in the source > tree, not the build tree. > > Last, and for no explained reason that I can see, the patch undoes > commit f4ce6c4d3a, but only for msvc builds. Even if that's justified it > appears to have no relevance to this patch. Andres' idea is that perl should set TESTDIR and PATH. Here I commented out PATH, and had the improbable issue that nothing seemed to be breaking, including the pipeline test under msvc. It'd be helpful to know which configuration that breaks so I can test that it's broken and then test that it's fixed when set from within perl... I got busy here, and may not be able to progress this for a while.
>From b46b405565a2fce3f96a1dcddf6d35ae7f3acc6d Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 19 Feb 2022 13:06:52 -0600 Subject: [PATCH] wip: set TESTDIR from src/test/perl rather than Makefile/vcregress These seem most likely to break: make check -C src/bin/psql make check -C src/bin/pgbench make check -C src/test/modules/test_misc make check -C src/test/modules/libpq_pipeline PROVE_TESTS=t/027_stream_regress.pl make check -C src/test/recovery --- src/Makefile.global.in | 9 ++--- src/test/perl/PostgreSQL/Test/Utils.pm | 17 ++--- src/tools/msvc/vcregress.pl| 4 +--- 3 files changed, 17 insertions(+), 13 deletions(-) diff --git a/src/Makefile.global.in b/src/Makefile.global.in index c980444233..92649d0193 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -451,8 +451,9 @@ define prove_installcheck rm -rf '$(CURDIR)'/tmp_check $(MKDIR_P) '$(CURDIR)'/tmp_check cd $(srcdir) && \ - TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \ + PATH="$(bindir):$(CURDIR):$$PATH" \ PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)/$(top_builddir)' \ + PG_SUBDIR='$(CURDIR)' \ PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' \ $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl) endef @@ -461,8 +462,9 @@ define prove_installcheck rm -rf '$(CURDIR)'/tmp_check $(MKDIR_P) '$(CURDIR)'/tmp_check cd $(srcdir) && \ - TESTDIR='$(CURDIR)' PATH="$(bindir):$(CURDIR):$$PATH" \ + PATH="$(bindir):$(CURDIR):$$PATH" \ PGPORT='6$(DEF_PGPORT)' top_builddir='$(top_builddir)' \ + PG_SUBDIR='$(CURDIR)' \ PG_REGRESS='$(top_builddir)/src/test/regress/pg_regress' \ $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl) endef @@ -472,7 +474,8 @@ define prove_check rm -rf '$(CURDIR)'/tmp_check $(MKDIR_P) '$(CURDIR)'/tmp_check cd $(srcdir) && \ - TESTDIR='$(CURDIR)' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' \ + $(with_temp_install) PGPORT='6$(DEF_PGPORT)' \ + PG_SUBDIR='$(CURDIR)' \ PG_REGRESS='$(CURDIR)/$(top_builddir)/src/test/regress/pg_regress' \ $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) $(if $(PROVE_TESTS),$(PROVE_TESTS),t/*.pl) endef diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm index 46cd746796..74ccaa08d9 100644 --- a/src/test/perl/PostgreSQL/Test/Utils.pm +++ b/src/test/perl/PostgreSQL/Test/Utils.pm @@ -184,19 +184,22 @@ INIT # test may still fail, but it's more likely to report useful facts. $SIG{PIPE} = 'IGNORE'; - # Determine output directories, and create them. The base path is the - # TESTDIR environment variable, which is normally set by the invoking - # Makefile. - $tmp_check = $ENV{TESTDIR} ? "$ENV{TESTDIR}/tmp_check" : "tmp_check"; + my $test_dir = File::Spec->rel2abs($ENV{PG_SUBDIR} || dirname(dirname($0))); + + my $test_name = basename($0); + $test_name =~ s/\.[^.]+$//; + + # Determine output directories, and create them. + # TODO: set srcdir? + $tmp_check = "$test_dir/tmp_check"; $log_path = "$tmp_check/log"; + $ENV{TESTDIR} = $test_dir; mkdir $tmp_check; mkdir
Re: Proposal: Support custom authentication methods using hooks
On Thu, 2022-02-17 at 11:25 -0800, samay sharma wrote: > To enable this, I've proposed adding a new authentication method > "custom" which can be specified in pg_hba.conf and takes a mandatory > argument "provider" specifying which authentication provider to use. > I've also moved a couple static functions to headers so that > extensions can call them. > > Sample pg_hba.conf line to use a custom provider: > > hostall all ::1/128 > custom provider=test One caveat is that this only works given information available from existing authentication methods, because that's all the client supports. In practice, it seems to only be useful with plaintext password authentication over an SSL connection. I still like the approach though. There's a lot of useful stuff you can do at authentication time with only the connection information and a password. It could be useful to authenticate against different services, or some kind of attack detection, etc. Regards, Jeff Davis
Re: Add id's to various elements in protocol.sgml
On 02/24/22 19:52, Kyotaro Horiguchi wrote: > FWIW in that perspecive, there's no requirement from me that it should > be human-readable. I'm fine with automatically-generated ids. One thing I would be −many on, though, would be automatically-generated ids that are not, somehow, stable. I've been burned too many times making links to auto-generated anchors in other projects' docs that change every time they republish the wretched things. Regards, -Chap
Re: Add id's to various elements in protocol.sgml
At Tue, 21 Dec 2021 08:47:27 +0100, Brar Piening wrote in > On 20.12.2021 at 16:09, Robert Haas wrote: > > As a data point, this is something I have also wanted to do, from time > > to time. I am generally of the opinion that any place the +1 from me. When I put a URL in the answer for inquiries, I always look into the html for name/id tags so that the inquirer quickly finds the information source (or the backing or reference point) on the page. If not found, I place a snippet instead. > > documentation has a long list of things, which should add ids, so that > > people can link to the particular thing in the list to which they want > > to draw someone's attention. > > > Thank you. > > If there is consensus on generally adding links to long lists I'd take > suggestions for other places where people think that this would make > sense and amend my patch. I don't think there is. I remember sometimes wanting ids on some sections (x.x) and items (x.x.x or lower) (or even clauses, ignoring costs:p) FWIW from that perspective, there's no requirement from me that it should be human-readable. I'm fine with automatically-generated ids. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Proposal: Support custom authentication methods using hooks
Hi Aleksander, On Thu, Feb 24, 2022 at 1:17 AM Aleksander Alekseev < aleksan...@timescale.com> wrote: > Hi Samay, > > > I wanted to submit a patch to expose 2 new hooks (one for the > authentication check and another one for error reporting) in auth.c. These > will allow users to implement their own authentication methods for Postgres > or add custom logic around authentication. > > I like the idea - PostgreSQL is all about extendability. Also, well > done with TAP tests and an example extension. This being said, I > didn't look at the code yet, but cfbot seems to be happy with it: > http://cfbot.cputube.org/ Thank you! > > > > One constraint in the current implementation is that we allow only one > authentication provider to be loaded at a time. In the future, we can add > more functionality to maintain an array of hooks and call the appropriate > one based on the provider name in the pg_hba line. > > This sounds like a pretty severe and unnecessary limitation to me. Do > you think it would be difficult to bypass it in the first > implementation? > Just to clarify, the limitation is around usage of multiple custom providers, but it does allow users to use the existing methods alongside one custom authentication method. The reasons I thought this is ok to start with are: * It allows users to plug in a custom authentication method, which they can't do today. * Users *generally* have an authentication method for their database (e.g., we use LDAP for authentication, or we use md5 passwords for authentication). While it is possible, I'm not sure how many users use completely different authentication methods (other than standard ones like password and trust) for different IPs/databases on the same instance. So, I thought this should be good enough for a number of use cases. I thought about adding support for multiple custom providers and the approach I came up with is: maintaining a list of all providers (their names, check functions and error functions), making sure they are all valid while parsing pg_hba.conf, and calling the right one when we see the custom keyword in pg_hba.conf based on name. I'm not sure (I haven't yet checked) if we have other such hooks in Postgres where multiple extensions use them and Postgres calls the right hook based on input (instead of just calling whoever has the hook). Therefore, I wanted to start with something simple so users can begin using auth methods of their choice, and add multiple providers as an enhancement after doing more research and coming up with the right way to implement it. That being said, any thoughts on the approach I mentioned above? I'll look into it and see if it's not too difficult to implement this. Regards, Samay > -- > Best regards, > Aleksander Alekseev >
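For reference, a provider registered through such a list might end up looking roughly like this. This is a sketch under the assumptions above: RegisterAuthProvider() and my_service_check_password() are illustrative names invented here, not the actual patch's API, and recv_password_packet() is assumed to be one of the formerly static functions the patch exposes in a header.

    #include "postgres.h"
    #include "fmgr.h"
    #include "libpq/auth.h"
    #include "libpq/libpq-be.h"

    PG_MODULE_MAGIC;

    /* stand-in for whatever external validation the provider performs */
    extern bool my_service_check_password(const char *user, const char *password);

    static int
    test_auth_check(Port *port)
    {
        char       *passwd = recv_password_packet(port);
        int         result = STATUS_ERROR;

        if (passwd == NULL)
            return STATUS_EOF;

        if (my_service_check_password(port->user_name, passwd))
            result = STATUS_OK;

        pfree(passwd);
        return result;
    }

    void
    _PG_init(void)
    {
        /* hypothetical registration matching "custom provider=test" */
        RegisterAuthProvider("test", test_auth_check, NULL);
    }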
why do hash index builds use smgrextend() for new splitpoint pages
I'm trying to understand why hash indexes are built primarily in shared buffers except when allocating a new splitpoint's worth of bucket pages -- which is done with smgrextend() directly in _hash_alloc_buckets(). Is this just so that the value returned by smgrnblocks() includes the new splitpoint's worth of bucket pages? All writes of tuple data to pages in this new splitpoint will go through shared buffers (via hash_getnewbuf()). I asked this and got some thoughts from Robert in [1], but I still don't really get it. When a new page is needed during the hash index build, why can't _hash_expandtable() just call ReadBufferExtended() with P_NEW instead of _hash_getnewbuf()? Does it have to do with the BUCKET_TO_BLKNO mapping? - Melanie [1] https://www.postgresql.org/message-id/CA%2BTgmoa%2BQFFhkHgPxyxv6t8aVU0r7GZmu7z8io4vGG7RHPpGzA%40mail.gmail.com
RE: Failed transaction statistics to measure the logical replication progress
On Wednesday, February 23, 2022 3:30 PM Amit Kapila > On Tue, Feb 22, 2022 at 6:45 AM tanghy.f...@fujitsu.com > wrote: > > > > I found a problem when using it. When a replication workers exits, the > > transaction stats should be sent to stats collector if they were not > > sent before because it didn't reach PGSTAT_STAT_INTERVAL. But I saw > > that the stats weren't updated as expected. > > > > I looked into it and found that the replication worker would send the > > transaction stats (if any) before it exits. But it got invalid subid > > in pgstat_send_subworker_xact_stats(), which led to the following result: > > > > postgres=# select pg_stat_get_subscription_worker(0, null); > > pg_stat_get_subscription_worker > > - > > (0,,2,0,00,"",) > > (1 row) > > > > I think that's because subid has already been cleaned when trying to > > send the stats. I printed the value of before_shmem_exit_list, the > > functions in this list would be called in shmem_exit() when the worker > > exits. > > logicalrep_worker_onexit() would clean up the worker info (including > > subid), and > > pgstat_shutdown_hook() would send stats if any. > > logicalrep_worker_onexit() was called before calling > pgstat_shutdown_hook(). > > > > Yeah, I think that is a problem and maybe we can think of solving it by > sending > the stats via logicalrep_worker_onexit before subid is cleared but not sure > that > is a good idea. I feel we need to go back to the idea of v21 for sending stats > instead of using pgstat_report_stat. > I think the one thing which we could improve is to avoid trying to send it > each > time before receiving each message by walrcv_receive and rather try to send it > before we try to wait (WaitLatchOrSocket). > Trying after each message doesn't seem to be required and could lead to some > overhead as well. What do you think? I agree. Fixed. Kindly have a look at v24 shared in [1]. [1] - https://www.postgresql.org/message-id/TYCPR01MB8373A3E1BE237BAF38185BF2ED3D9%40TYCPR01MB8373.jpnprd01.prod.outlook.com Best Regards, Takamichi Osumi
RE: Failed transaction statistics to measure the logical replication progress
On Thursday, February 24, 2022 11:07 AM Masahiko Sawada wrote: > I have some comments on the v23 patch: > > @@ -66,6 +66,12 @@ typedef struct LogicalRepWorker > TimestampTz last_recv_time; > XLogRecPtr reply_lsn; > TimestampTz reply_time; > + > + /* > +* Transaction statistics of subscription worker > +*/ > + int64 commit_count; > + int64 abort_count; > } LogicalRepWorker; > > I think that adding these statistics to the struct whose data is allocated in > shared memory is not a good idea since they don't need to be shared. We might > want to add more statistics for subscriptions such as insert_count and > update_count in the future. I think it's better to track these statistics in > local > memory either in worker.c or pgstat.c. Fixed. > +/* -- > + * pgstat_report_subworker_xact_end() - > + * > + * Update the statistics of subscription worker and have > + * pgstat_report_stat send a message to stats collector > + * after count increment. > + * -- > + */ > +void > +pgstat_report_subworker_xact_end(bool is_commit) { > +if (is_commit) > +MyLogicalRepWorker->commit_count++; > +else > +MyLogicalRepWorker->abort_count++; > +} > > It's slightly odd and it seems unnecessary to me that we modify fields of > MyLogicalRepWorker in pgstat.c. Although this function has “report” > in its name, it just increments the counter. I think we can do that in > worker.c. Fixed. Also, I brought the timing adjustment logic back and now have an independent one, as Amit-san suggested in [1]. Kindly have a look at v24. [1] - https://www.postgresql.org/message-id/CAA4eK1LWYc15%3DASj1tMTEFsXtxu%3D02aGoMwq9YanUVr9-QMhdQ%40mail.gmail.com Best Regards, Takamichi Osumi v24-0001-Extend-pg_stat_subscription_workers-to-include-g.patch Description: v24-0001-Extend-pg_stat_subscription_workers-to-include-g.patch
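In rough outline, the v24 arrangement amounts to something like this (an illustrative sketch only; the actual function names and the signature of the flush call in the patch may differ):

    /* worker-local; nothing here lives in shared memory */
    static int64 subworker_commit_count = 0;
    static int64 subworker_abort_count = 0;

    static void
    subworker_count_xact_end(bool is_commit)
    {
        if (is_commit)
            subworker_commit_count++;
        else
            subworker_abort_count++;
    }

    /* called before the worker blocks in WaitLatchOrSocket, and at exit */
    static void
    subworker_flush_xact_stats(void)
    {
        if (subworker_commit_count == 0 && subworker_abort_count == 0)
            return;             /* nothing to report */

        pgstat_send_subworker_xact_stats(subworker_commit_count,
                                         subworker_abort_count);
        subworker_commit_count = 0;
        subworker_abort_count = 0;
    }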
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
On 2/24/22 23:13, Tomas Vondra wrote: > > > On 2/24/22 23:06, Andres Freund wrote: >> Hi, >> >> On 2022-02-24 22:15:20 +0100, Tomas Vondra wrote: >>> After thinking about this I only see two options: >>> >>> 1) Don't apply the patch and tell everyone using ltree_gist they need to >>> rebuild the index after pg_upgrade from 12 to 13+. The downside of this >>> is larger indexes (because some tuples are 20B larger). >>> >>> 2) Apply the patch + tell those who already upgraded from 12 to rebuild >>> ltree_gist indexes, because those might be broken due to new inserts. >>> >>> >>> My opinion is to do (2), because at least those who'll upgrade later >>> (which is a lot of people) will be fine without a rebuild. And it also >>> makes the indexes a bit smaller, thanks to saving 20B. >> >> Won't 2) also break indexes created without a pg_upgrade? "already upgraded >> from 12" sounds like it wouldn't but I don't see why? >> > > It will, unfortunately - that's why I wrote "upgrade" in that sentence. > I should have been more explicit, sorry. But any new index tuples formed > after starting the 13+ cluster are/may be corrupted. > I wonder if we could check the index tuple length, and adjust the siglen based on that, somehow ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
On 2/24/22 23:06, Andres Freund wrote: > Hi, > > On 2022-02-24 22:15:20 +0100, Tomas Vondra wrote: >> After thinking about this I only see two options: >> >> 1) Don't apply the patch and tell everyone using ltree_gist they need to >> rebuild the index after pg_upgrade from 12 to 13+. The downside of this >> is larger indexes (because some tuples are 20B larger). >> >> 2) Apply the patch + tell those who already upgraded from 12 to rebuild >> ltree_gist indexes, because those might be broken due to new inserts. >> >> >> My opinion is to do (2), because at least those who'll upgrade later >> (which is a lot of people) will be fine without a rebuild. And it also >> makes the indexes a bit smaller, thanks to saving 20B. > > Won't 2) also break indexes created without a pg_upgrade? "already upgraded > from 12" sounds like it wouldn't but I don't see why? > It will, unfortunately - that's why I wrote "upgrade" in that sentence. I should have been more explicit, sorry. But any new index tuples formed after starting the 13+ cluster are/may be corrupted. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
Hi, On 2022-02-24 22:15:20 +0100, Tomas Vondra wrote: > After thinking about this I only see two options: > > 1) Don't apply the patch and tell everyone using ltree_gist they need to > rebuild the index after pg_upgrade from 12 to 13+. The downside of this > is larger indexes (because some tuples are 20B larger). > > 2) Apply the patch + tell those who already upgraded from 12 to rebuild > ltree_gist indexes, because those might be broken due to new inserts. > > > My opinion is to do (2), because at least those who'll upgrade later > (which is a lot of people) will be fine without a rebuild. And it also > makes the indexes a bit smaller, thanks to saving 20B. Won't 2) also break indexes created without a pg_upgrade? "already upgraded from 12" sounds like it wouldn't but I don't see why? Greetings, Andres Freund
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
Tomas Vondra writes: > After thinking about this I only see two options: > 1) Don't apply the patch and tell everyone using ltree_gist they need to > rebuild the index after pg_upgrade from 12 to 13+. The downside of this > is larger indexes (because some tuples are 20B larger). > 2) Apply the patch + tell those who already upgraded from 12 to rebuild > ltree_gist indexes, because those might be broken due to new inserts. Yeah, let's fix it and advise people to reindex those indexes. Won't be the first time release notes have contained such advice. Plus, it sounds like we'd have to advise people to reindex *anyway*, just in a slightly different set of cases. regards, tom lane
Re: remove more archiving overhead
I tested this again with many more WAL files and a much larger machine (r5d.24xlarge with data directory on an NVMe SSD instance store volume). As before, I am using a bare-bones archive module that does nothing but return true. Without the patch, archiving ~120k WAL files took about 109 seconds. With the patch, it took around 62 seconds, which amounts to a ~43% reduction in overhead. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
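For context, a do-nothing archive module of the kind used for this test can be as small as the sketch below. It is written against the archive-module callback API from this patch series (callback and header names follow the basic_archive work and may differ in detail from what ultimately lands):

    #include "postgres.h"
    #include "fmgr.h"
    #include "postmaster/pgarch.h"

    PG_MODULE_MAGIC;

    /* report success without touching the segment at all */
    static bool
    noop_archive_file(const char *file, const char *path)
    {
        return true;
    }

    void
    _PG_archive_module_init(ArchiveModuleCallbacks *cb)
    {
        cb->archive_file_cb = noop_archive_file;
    }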
real/float example for testlibpq3
Hi everyone, Would adding additional examples to testlibpq3.c on handling more data types be helpful? I've attached a patch adding how to handle a REAL to the current example. Regards, Mark diff --git a/src/test/examples/testlibpq3.c b/src/test/examples/testlibpq3.c index 4f7b791388..0c1739fcbb 100644 --- a/src/test/examples/testlibpq3.c +++ b/src/test/examples/testlibpq3.c @@ -11,19 +11,21 @@ * CREATE SCHEMA testlibpq3; * SET search_path = testlibpq3; * SET standard_conforming_strings = ON; - * CREATE TABLE test1 (i int4, t text, b bytea); - * INSERT INTO test1 values (1, 'joe''s place', '\000\001\002\003\004'); - * INSERT INTO test1 values (2, 'ho there', '\004\003\002\001\000'); + * CREATE TABLE test1 (i int4, r real, t text, b bytea); + * INSERT INTO test1 values (1, 2.3, 'joe''s place', '\000\001\002\003\004'); + * INSERT INTO test1 values (2, 4.5, 'ho there', '\004\003\002\001\000'); * * The expected output is: * * tuple 0: got * i = (4 bytes) 1 + * r = (4 bytes) 2.3 * t = (11 bytes) 'joe's place' * b = (5 bytes) \000\001\002\003\004 * * tuple 0: got * i = (4 bytes) 2 + * r = (4 bytes) 4.5 * t = (8 bytes) 'ho there' * b = (5 bytes) \004\003\002\001\000 */ @@ -62,32 +64,41 @@ show_binary_results(PGresult *res) int i, j; int i_fnum, + r_fnum, t_fnum, b_fnum; /* Use PQfnumber to avoid assumptions about field order in result */ i_fnum = PQfnumber(res, "i"); + r_fnum = PQfnumber(res, "r"); t_fnum = PQfnumber(res, "t"); b_fnum = PQfnumber(res, "b"); for (i = 0; i < PQntuples(res); i++) { char *iptr; + char *rptr; char *tptr; char *bptr; int blen; int ival; + union { + int ival; + float fval; + } rval; /* Get the field values (we ignore possibility they are null!) */ iptr = PQgetvalue(res, i, i_fnum); + rptr = PQgetvalue(res, i, r_fnum); tptr = PQgetvalue(res, i, t_fnum); bptr = PQgetvalue(res, i, b_fnum); /* -* The binary representation of INT4 is in network byte order, which -* we'd better coerce to the local byte order. +* The binary representation of INT4 and REAL are in network byte +* order, which we'd better coerce to the local byte order. */ ival = ntohl(*((uint32_t *) iptr)); + rval.ival = ntohl(*((uint32_t *) rptr)); /* * The binary representation of TEXT is, well, text, and since libpq @@ -102,6 +113,8 @@ show_binary_results(PGresult *res) printf("tuple %d: got\n", i); printf(" i = (%d bytes) %d\n", PQgetlength(res, i, i_fnum), ival); + printf(" r = (%d bytes) %f\n", + PQgetlength(res, i, r_fnum), rval.fval); printf(" t = (%d bytes) '%s'\n", PQgetlength(res, i, t_fnum), tptr); printf(" b = (%d bytes) ", blen); diff --git a/src/test/examples/testlibpq3.sql b/src/test/examples/testlibpq3.sql index 35a95ca347..e3a70cec27 100644 --- a/src/test/examples/testlibpq3.sql +++ b/src/test/examples/testlibpq3.sql @@ -1,6 +1,6 @@ CREATE SCHEMA testlibpq3; SET search_path = testlibpq3; SET standard_conforming_strings = ON; -CREATE TABLE test1 (i int4, t text, b bytea); -INSERT INTO test1 values (1, 'joe''s place', '\000\001\002\003\004'); -INSERT INTO test1 values (2, 'ho there', '\004\003\002\001\000'); +CREATE TABLE test1 (i int4, r real, t text, b bytea); +INSERT INTO test1 values (1, 2.3, 'joe''s place', '\000\001\002\003\004'); +INSERT INTO test1 values (2, 4.5, 'ho there', '\004\003\002\001\000');
ltree_gist indexes broken after pg_upgrade from 12 to 13
Hi, Victor Yegorov reported a crash related to a GiST index on ltree [1], following a pg_upgrade from 12.x to 14.x, with a data set reproducing this. I spent some time investigating this, and it seems this is a silly bug in commit 911e70207703799605f5a0e8aad9f06cff067c63 (HEAD) Author: Alexander Korotkov Date: Mon Mar 30 19:17:11 2020 +0300 Implement operator class parameters ... in PG13, which modified ltree_gist so that siglen is an opclass parameter (and not hard-coded). But the procedures use the LTREE_GET_ASIGLEN macro to extract the value, which either extracts the value from the catalog (if the index has an opclass parameter) or uses a default value - but it always uses LTREE_ASIGLEN_DEFAULT, which belongs to the _ltree_gist opclass (i.e. for arrays of ltree). And that's 28 instead of 8, as it should be. Attached is a simple patch tweaking the macros, which resolves the issue. Maybe there should be a separate macro though, because "ASIGLEN" probably refers to "array siglen". But the bigger issue is that this fixes indexes created before the upgrade, but it breaks indexes created after it. Because those were created with siglen 28B, reverting to 8B breaks them. A bit of a symmetry :-/ I'm not sure what to do about this. There's no way to distinguish indexes, and I think an index may actually be broken both with and without the patch - imagine an index created before the upgrade (so using siglen 8), but then receiving a couple of new rows (siglen 28). After thinking about this I only see two options: 1) Don't apply the patch and tell everyone using ltree_gist they need to rebuild the index after pg_upgrade from 12 to 13+. The downside of this is larger indexes (because some tuples are 20B larger). 2) Apply the patch + tell those who already upgraded from 12 to rebuild ltree_gist indexes, because those might be broken due to new inserts. My opinion is to do (2), because at least those who'll upgrade later (which is a lot of people) will be fine without a rebuild. And it also makes the indexes a bit smaller, thanks to saving 20B.
regards [1] https://www.postgresql.org/message-id/17406-71e02820ae79bb40%40postgresql.org -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom c6c5471aebf49da96e47deb5286a6f7065068bd7 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Thu, 24 Feb 2022 19:22:32 +0100 Subject: [PATCH] ltree gist siglen fix --- contrib/ltree/_ltree_gist.c | 12 ++-- contrib/ltree/ltree.h | 4 ++-- contrib/ltree/ltree_gist.c | 10 +- 3 files changed, 13 insertions(+), 13 deletions(-) diff --git a/contrib/ltree/_ltree_gist.c b/contrib/ltree/_ltree_gist.c index 72516c3b6b9..1f361416ac9 100644 --- a/contrib/ltree/_ltree_gist.c +++ b/contrib/ltree/_ltree_gist.c @@ -49,7 +49,7 @@ _ltree_compress(PG_FUNCTION_ARGS) { GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0); GISTENTRY *retval = entry; - int siglen = LTREE_GET_ASIGLEN(); + int siglen = LTREE_GET_ASIGLEN(LTREE_ASIGLEN_DEFAULT); if (entry->leafkey) { /* ltree */ @@ -108,7 +108,7 @@ _ltree_same(PG_FUNCTION_ARGS) ltree_gist *a = (ltree_gist *) PG_GETARG_POINTER(0); ltree_gist *b = (ltree_gist *) PG_GETARG_POINTER(1); bool *result = (bool *) PG_GETARG_POINTER(2); - int siglen = LTREE_GET_ASIGLEN(); + int siglen = LTREE_GET_ASIGLEN(LTREE_ASIGLEN_DEFAULT); if (LTG_ISALLTRUE(a) && LTG_ISALLTRUE(b)) *result = true; @@ -154,7 +154,7 @@ _ltree_union(PG_FUNCTION_ARGS) { GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0); int *size = (int *) PG_GETARG_POINTER(1); - int siglen = LTREE_GET_ASIGLEN(); + int siglen = LTREE_GET_ASIGLEN(LTREE_ASIGLEN_DEFAULT); int32 i; ltree_gist *result = ltree_gist_alloc(false, NULL, siglen, NULL, NULL); BITVECP base = LTG_SIGN(result); @@ -219,7 +219,7 @@ _ltree_penalty(PG_FUNCTION_ARGS) ltree_gist *origval = (ltree_gist *) DatumGetPointer(((GISTENTRY *) PG_GETARG_POINTER(0))->key); ltree_gist *newval = (ltree_gist *) DatumGetPointer(((GISTENTRY *) PG_GETARG_POINTER(1))->key); float *penalty = (float *) PG_GETARG_POINTER(2); - int siglen = LTREE_GET_ASIGLEN(); + int siglen = LTREE_GET_ASIGLEN(LTREE_ASIGLEN_DEFAULT); *penalty = hemdist(origval, newval, siglen); PG_RETURN_POINTER(penalty); @@ -242,7 +242,7 @@ _ltree_picksplit(PG_FUNCTION_ARGS) { GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0); GIST_SPLITVEC *v = (GIST_SPLITVEC *) PG_GETARG_POINTER(1); - int siglen = LTREE_GET_ASIGLEN(); + int siglen = LTREE_GET_ASIGLEN(LTREE_ASIGLEN_DEFAULT); OffsetNumber k, j; ltree_gist *datum_l, @@ -508,7 +508,7 @@ _ltree_consistent(PG_FUNCTION_ARGS) /* Oid subtype = PG_GETARG_OID(3); */ bool *recheck = (bool *) PG_GETARG_POINTER(4); - int siglen = LTREE_GET_ASIGLEN(); + int siglen = LTREE_GET_ASIGLEN(LTREE_ASIGLEN_DEFAULT); ltree_gist *key = (ltree_gist *) DatumGetPointer(entry->key); bool
Re: [PATCH] Expose port->authn_id to extensions and triggers
On Fri, 2022-02-25 at 01:15 +0800, Julien Rouhaud wrote: > On Thu, Feb 24, 2022 at 04:50:59PM +, Jacob Champion wrote: > > On Thu, 2022-02-24 at 20:39 +0900, Michael Paquier wrote: > > > I don't quite see the additional value that this API brings as > > > MyProcPort is directly accessible, and contrib modules like > > > postgres_fdw and sslinfo just use that to find the data of the current > > > backend. > > > > Right -- I just didn't know if third-party modules were actually able > > to rely on the internal layout of struct Port. Is that guaranteed to > > remain constant for a major release line? If so, this new API is > > superfluous. > > Yes, third-party can rely on Port layout. We don't break ABI between minor > release. In some occasions we can add additional fields at the end of a > struct, but nothing more. That simplifies things. PFA a smaller v2; if pgaudit can't make use of libpq-be.h for some reason, then I guess we can tailor a fix to that use case. > > > I could still see a use case for that at a more global level with > > > beentrys, but it looked like there was not much interest the last time > > > I dropped this idea. > > > > I agree that this would be useful to have in the stats. From my outside > > perspective, it seems like it's difficult to get strings of arbitrary > > length in there; is that accurate? > > Yes, as it's all in shared memory. The only workaround is using something > like > track_activity_query_size, but it's not great. Yeah... I was following a similar track with the initial work last year, but I dropped it when the cost of implementation started to grow considerably. At the time, though, it looked like some overhauls to the stats framework were being pursued? I should read up on that thread. --Jacob From 92679a2109be5ba4d81bf58e8fb091c2d0020828 Mon Sep 17 00:00:00 2001 From: Jacob Champion Date: Mon, 14 Feb 2022 08:10:53 -0800 Subject: [PATCH v2] Add API to retrieve authn_id from SQL The authn_id field in MyProcPort is currently only accessible to the backend itself. Add a SQL function, session_authn_id(), to expose the field to triggers that may want to make use of it. 
--- src/backend/utils/adt/name.c | 12 +++- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_proc.dat | 3 +++ src/test/authentication/t/001_password.pl | 11 +++ 4 files changed, 26 insertions(+), 2 deletions(-) diff --git a/src/backend/utils/adt/name.c b/src/backend/utils/adt/name.c index e8bba3670c..ff0e2829bb 100644 --- a/src/backend/utils/adt/name.c +++ b/src/backend/utils/adt/name.c @@ -23,6 +23,7 @@ #include "catalog/namespace.h" #include "catalog/pg_collation.h" #include "catalog/pg_type.h" +#include "libpq/libpq-be.h" #include "libpq/pqformat.h" #include "mb/pg_wchar.h" #include "miscadmin.h" @@ -257,7 +258,7 @@ namestrcmp(Name name, const char *str) /* - * SQL-functions CURRENT_USER, SESSION_USER + * SQL-functions CURRENT_USER, SESSION_USER, SESSION_AUTHN_ID */ Datum current_user(PG_FUNCTION_ARGS) @@ -271,6 +272,15 @@ session_user(PG_FUNCTION_ARGS) PG_RETURN_DATUM(DirectFunctionCall1(namein, CStringGetDatum(GetUserNameFromId(GetSessionUserId(), false; } +Datum +session_authn_id(PG_FUNCTION_ARGS) +{ + if (!MyProcPort || !MyProcPort->authn_id) + PG_RETURN_NULL(); + + PG_RETURN_CSTRING(MyProcPort->authn_id); +} + /* * SQL-functions CURRENT_SCHEMA, CURRENT_SCHEMAS diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h index 1addb568ef..ca52e6889c 100644 --- a/src/include/catalog/catversion.h +++ b/src/include/catalog/catversion.h @@ -53,6 +53,6 @@ */ /* mmddN */ -#define CATALOG_VERSION_NO 202202221 +#define CATALOG_VERSION_NO 202202231 #endif diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 7f1ee97f55..b76d357bee 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -1508,6 +1508,9 @@ { oid => '746', descr => 'session user name', proname => 'session_user', provolatile => 's', prorettype => 'name', proargtypes => '', prosrc => 'session_user' }, +{ oid => '9774', descr => 'session authenticated identity', + proname => 'session_authn_id', provolatile => 's', prorettype => 'cstring', + proargtypes => '', prosrc => 'session_authn_id' }, { oid => '744', proname => 'array_eq', prorettype => 'bool', diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl index 3e3079c824..2aa28ed547 100644 --- a/src/test/authentication/t/001_password.pl +++ b/src/test/authentication/t/001_password.pl @@ -82,6 +82,11 @@ test_role($node, 'scram_role', 'trust', 0, test_role($node, 'md5_role', 'trust', 0, log_unlike => [qr/connection authenticated:/]); +my $res = $node->safe_psql('postgres', + "SELECT session_authn_id() IS NULL;" +); +is($res, 't', "users with trust authentication have NULL authn_id"); + # For plain "password" method, all
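Given that Port layout is stable within a major release (per the discussion above), a third-party module can already read the field directly; a minimal sketch of that non-SQL route, for comparison, which an audit extension could use in place of the new built-in:

    #include "postgres.h"
    #include "fmgr.h"
    #include "libpq/libpq-be.h"
    #include "miscadmin.h"
    #include "utils/builtins.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(my_authn_id);

    Datum
    my_authn_id(PG_FUNCTION_ARGS)
    {
        /* same null-safety dance as the built-in */
        if (!MyProcPort || !MyProcPort->authn_id)
            PG_RETURN_NULL();

        PG_RETURN_TEXT_P(cstring_to_text(MyProcPort->authn_id));
    }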
Re: Logical insert/update/delete WAL records for custom table AMs
On Mon, 17 Jan 2022 at 09:05, Jeff Davis wrote: > > On Wed, 2022-01-05 at 20:19 +, Simon Riggs wrote: > > I spoke with Jeff in detail about this patch in NYC Dec 21, and I now > > think it is worth pursuing. It seems much more likely that this would > > be acceptable than fully extensible rmgr. > > Thank you. I had some conversations with others who felt this approach > is wasteful, which it is. But if this patch still has potential, I'm > happy to pursue it along with the extensible rmgr approach. It's self-contained and generally useful for a range of applications, so the code would be long-lived. Calling it wasteful presumes the way you'd use it. If you make logical log entries you don't need to make physical ones, so it's actually the physical log entries that would be wasteful. Logical log entries don't need to be decoded, so it's actually more efficient, plus we could skip index entries altogether. I would argue that this is the way we should be doing WAL, with occasional divergence to physical records for performance, rather than the other way around. > > So I'm signing up as a reviewer and we'll see if this is good to go. > > Great, I attached a rebased version. The approach is perfectly fine; it just needs to be finished and rebased. -- Simon Riggs http://www.EnterpriseDB.com/
Re: reallocing without oom check in pg_regress
> On 23 Feb 2022, at 23:05, Tom Lane wrote: > > Daniel Gustafsson writes: >> In pg_regress we realloc() with the destination and source pointer being >> equal, >> without checking for OOM. While a fairly unlikely source of errors, is >> there a >> reason not to use pg_realloc() there for hygiene? > > Yeah, looks like oversight to me. Thanks for confirming, I've pushed this now after taking it for a spin on the CI just in case. -- Daniel Gustafsson https://vmware.com/
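For frontend programs like pg_regress the swap is mechanical (pg_malloc/pg_realloc come in via postgres_fe.h); a minimal before/after sketch:

    char       *buf = pg_malloc(1024);

    /* before: plain realloc(), whose NULL-on-OOM result went unchecked */
    /* buf = realloc(buf, 2048); */

    /* after: pg_realloc() reports "out of memory" and exits on failure */
    buf = pg_realloc(buf, 2048);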
Re: document that brin's autosummarize parameter is off by default
Ten months ago, Jaime Casanova wrote: > Hi everyone, > > Just noted that the default value of autosummarize reloption for brin > indexes is not documented, or at least not well documented. > > I added the default value in create_index.sgml where other options > mention their own defaults, also made a little change in brin.sgml to > make it more clear that it is disabled by default (at least the way it > was written made no sense for me, but it could be that my English is > not that good). It seems like "This last trigger" in the current text is intended to mean "The second condition". Your change improves that. Should we also consider enabling autosummarize by default? It was added in v10, after BRIN was added in v9.5. For us, BRIN wasn't usable without autosummarize. Also, note that vacuums are now triggered by insertions, since v13, so it might be that autosummarize is needed much less. -- Justin PS. I hope there's a faster response to your pg_upgrade patch.
Re: fix crash with Python 3.11
I wrote: > * I'm not satisfied with using static storage for > SaveTransactionCharacteristics/RestoreTransactionCharacteristics. Looking closer at this, I was not too amused to discover that of the three existing SaveTransactionCharacteristics calls, two already conflict with each other: _SPI_commit calls SaveTransactionCharacteristics and then calls CommitTransactionCommand, which again calls SaveTransactionCharacteristics, overwriting the static storage. I *think* there's no live bug there, because the state probably wouldn't have changed in between; but considering we run arbitrary user-defined code between those two points I sure wouldn't bet on it. 0001 attached is the same code patch as before, but now with spi.sgml updates; 0002 changes the API for Save/RestoreTransactionCharacteristics. If anyone's really worried about backpatching 0002, we could perhaps get away with doing that only in HEAD. I found in 0002 that I had to make CommitTransactionCommand call SaveTransactionCharacteristics unconditionally, else I got warnings about possibly-uninitialized local variables. It's cheap enough that I'm not too fussed about that. I don't get any warnings from the similar code in spi.c, probably because those functions can't be inlined there. regards, tom lane diff --git a/doc/src/sgml/spi.sgml b/doc/src/sgml/spi.sgml index d710e2d0df..7581661fc4 100644 --- a/doc/src/sgml/spi.sgml +++ b/doc/src/sgml/spi.sgml @@ -99,10 +99,9 @@ int SPI_connect_ext(int options) Sets the SPI connection to be nonatomic, which - means that transaction control calls SPI_commit, - SPI_rollback, and - SPI_start_transaction are allowed. Otherwise, - calling these functions will result in an immediate error. + means that transaction control calls (SPI_commit, + SPI_rollback) are allowed. Otherwise, + calling those functions will result in an immediate error. @@ -5040,15 +5039,17 @@ void SPI_commit_and_chain(void) SPI_commit commits the current transaction. It is approximately equivalent to running the SQL - command COMMIT. After a transaction is committed, a new - transaction has to be started - using SPI_start_transaction before further database - actions can be executed. + command COMMIT. After the transaction is committed, a + new transaction is automatically started using default transaction + characteristics, so that the caller can continue using SPI facilities. + If there is a failure during commit, the current transaction is instead + rolled back and a new transaction is started, after which the error is + thrown in the usual way. - SPI_commit_and_chain is the same, but a new - transaction is immediately started with the same transaction + SPI_commit_and_chain is the same, but the new + transaction is started with the same transaction characteristics as the just finished one, like with the SQL command COMMIT AND CHAIN. @@ -5093,14 +5094,13 @@ void SPI_rollback_and_chain(void) SPI_rollback rolls back the current transaction. It is approximately equivalent to running the SQL - command ROLLBACK. After a transaction is rolled back, a - new transaction has to be started - using SPI_start_transaction before further database - actions can be executed. + command ROLLBACK. After the transaction is rolled back, + a new transaction is automatically started using default transaction + characteristics, so that the caller can continue using SPI facilities. 
- SPI_rollback_and_chain is the same, but a new - transaction is immediately started with the same transaction + SPI_rollback_and_chain is the same, but the new + transaction is started with the same transaction characteristics as the just finished one, like with the SQL command ROLLBACK AND CHAIN. @@ -5124,7 +5124,7 @@ void SPI_rollback_and_chain(void) SPI_start_transaction - start a new transaction + obsolete function @@ -5137,17 +5137,12 @@ void SPI_start_transaction(void) Description - SPI_start_transaction starts a new transaction. It - can only be called after SPI_commit - or SPI_rollback, as there is no transaction active at - that point. Normally, when an SPI-using procedure is called, there is already a - transaction active, so attempting to start another one before closing out - the current one will result in an error. - - - - This function can only be executed if the SPI connection has been set as - nonatomic in the call to SPI_connect_ext. + SPI_start_transaction does nothing, and exists + only for code compatibility with + earlier PostgreSQL releases. It used to + be required after calling SPI_commit + or SPI_rollback, but now those functions start + a new transaction automatically. diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c index
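For readers skimming the thread, the shape of the 0002 API change is roughly this (a sketch; names follow the patch's general intent and may not match it exactly): the caller supplies the storage, so nested save/restore pairs can no longer clobber each other.

    typedef struct SavedTransactionCharacteristics
    {
        int         save_XactIsoLevel;
        bool        save_XactReadOnly;
        bool        save_XactDeferrable;
    } SavedTransactionCharacteristics;

    extern void SaveTransactionCharacteristics(SavedTransactionCharacteristics *s);
    extern void RestoreTransactionCharacteristics(const SavedTransactionCharacteristics *s);

    /* usage, e.g. in _SPI_commit: */
    SavedTransactionCharacteristics savetc;

    SaveTransactionCharacteristics(&savetc);
    /* ... CommitTransactionCommand() may save/restore its own copy ... */
    RestoreTransactionCharacteristics(&savetc);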
Re: [PATCH] add relation and block-level filtering to pg_waldump
Added to commitfest as: https://commitfest.postgresql.org/37/3565/
Re: [PATCH] add relation and block-level filtering to pg_waldump
On Thu, Feb 24, 2022 at 11:06 AM David Christensen wrote: > This patch adds the ability to specify a RelFileNode and optional BlockNum to > limit output of > pg_waldump records to only those which match the given criteria. This should > be more performant > than `pg_waldump | grep` as well as more reliable given specific variations > in output style > depending on how the blocks are specified. Sounds useful to me. -- Peter Geoghegan
[PATCH] add relation and block-level filtering to pg_waldump
Greetings, This patch adds the ability to specify a RelFileNode and optional BlockNum to limit output of pg_waldump records to only those which match the given criteria. This should be more performant than `pg_waldump | grep` as well as more reliable given specific variations in output style depending on how the blocks are specified. This currently affects only the main fork, but we could presumably add the option to filter by fork as well, if that is considered useful. Best, David >From 9194b2cb07172e636030b9b4e979b7f2caf7cbc0 Mon Sep 17 00:00:00 2001 From: David Christensen Date: Thu, 24 Feb 2022 11:00:46 -0600 Subject: [PATCH] Add relation/block filtering to pg_waldump This feature allows you to only output records that are targeting a specific RelFileNode and optional BlockNumber within this relation. Currently only applies this filter to the relation's main fork. --- doc/src/sgml/ref/pg_waldump.sgml | 23 ++ src/bin/pg_waldump/pg_waldump.c | 74 +++- 2 files changed, 96 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml index 5735a161ce..c953703bc8 100644 --- a/doc/src/sgml/ref/pg_waldump.sgml +++ b/doc/src/sgml/ref/pg_waldump.sgml @@ -100,6 +100,29 @@ PostgreSQL documentation + + -k block + --block=block + + +Display only records touching the given block. (Requires also +providing the relation via --relation.) + + + + + + -l tbl/db/rel + --relation=tbl/db/rel + + +Display only records touching the given relation. The relation is +specified via tablespace oid, database oid, and relfilenode separated +by slashes. + + + + -n limit --limit=limit diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index a6251e1a96..faae547a5c 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -55,6 +55,10 @@ typedef struct XLogDumpConfig bool filter_by_rmgr_enabled; TransactionId filter_by_xid; bool filter_by_xid_enabled; + RelFileNode filter_by_relation; + bool filter_by_relation_enabled; + BlockNumber filter_by_relation_block; + bool filter_by_relation_block_enabled; } XLogDumpConfig; typedef struct Stats @@ -394,6 +398,34 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen, return count; } +/* + * Boolean to return whether the given WAL record matches a specific relation and optional block + */ +static bool +XLogRecordMatchesRelationBlock(XLogReaderState *record, RelFileNode matchRnode, BlockNumber matchBlock) +{ + RelFileNode rnode; + ForkNumber forknum; + BlockNumber blk; + int block_id; + + for (block_id = 0; block_id <= record->max_block_id; block_id++) + { + if (!XLogRecHasBlockRef(record, block_id)) + continue; + + XLogRecGetBlockTag(record, block_id, , , ); + + if (forknum == MAIN_FORKNUM && + matchRnode.spcNode == rnode.spcNode && + matchRnode.dbNode == rnode.dbNode && + matchRnode.relNode == rnode.relNode && + (matchBlock == InvalidBlockNumber || matchBlock == blk)) + return true; + } + return false; +} + /* * Calculate the size of a record, split into !FPI and FPI parts. 
*/ @@ -767,6 +799,8 @@ usage(void) printf(_(" -b, --bkp-details output detailed information about backup blocks\n")); printf(_(" -e, --end=RECPTR stop reading at WAL location RECPTR\n")); printf(_(" -f, --follow keep retrying after reaching end of WAL\n")); + printf(_(" -k, --block=N only show records matching a given relation block (requires -l)\n")); + printf(_(" -l, --relation=N/N/N only show records that touch a specific relation\n")); printf(_(" -n, --limit=N number of records to display\n")); printf(_(" -p, --path=PATHdirectory in which to find log segment files or a\n" " directory with a ./pg_wal that contains such files\n" @@ -802,12 +836,14 @@ main(int argc, char **argv) static struct option long_options[] = { {"bkp-details", no_argument, NULL, 'b'}, + {"block", required_argument, NULL, 'k'}, {"end", required_argument, NULL, 'e'}, {"follow", no_argument, NULL, 'f'}, {"help", no_argument, NULL, '?'}, {"limit", required_argument, NULL, 'n'}, {"path", required_argument, NULL, 'p'}, {"quiet", no_argument, NULL, 'q'}, + {"relation", required_argument, NULL, 'l'}, {"rmgr", required_argument, NULL, 'r'}, {"start", required_argument, NULL, 's'}, {"timeline", required_argument, NULL, 't'}, @@ -860,6 +896,8 @@ main(int argc, char **argv) config.filter_by_rmgr_enabled = false; config.filter_by_xid = InvalidTransactionId; config.filter_by_xid_enabled = false; + config.filter_by_relation_enabled = false; + config.filter_by_relation_block_enabled = false; config.stats = false; config.stats_per_record =
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
> I think the change to ImmediateCheckpointRequested() makes no sense. > Before this patch, that function merely inquires whether there's an > immediate checkpoint queued. After this patch, it ... changes a > progress-reporting flag? I think it would make more sense to make the > progress-report flag change in whatever is the place that *requests* an > immediate checkpoint rather than here. > > > diff --git a/src/backend/postmaster/checkpointer.c > > b/src/backend/postmaster/checkpointer.c > > +ImmediateCheckpointRequested(int flags) > > if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE) > > +{ > > +updated_flags |= CHECKPOINT_IMMEDIATE; > > I don't think that these changes are expected behaviour. Under this > condition, the currently running checkpoint is still not 'immediate', > but it is going to hurry up for a new, actually immediate checkpoint. > Those are different kinds of checkpoint handling; and I don't think > you should modify the reported flags to show that we're going to do > stuff faster than usual. Maybe maintain a separate 'upcoming > checkpoint flags' field instead? Thank you Alvaro and Matthias for your views. I understand your point of not updating the progress-report flag here, as it just checks whether CHECKPOINT_IMMEDIATE is set or not and takes an action based on that, but it doesn't change the checkpoint flags. I will modify the code, but I am a bit confused here. As per Alvaro, we need to make the progress-report flag change in whatever is the place that *requests* an immediate checkpoint. I feel this gives information about the upcoming checkpoint, not the current one. So updating here provides wrong details in the view. The flags available during CreateCheckPoint() will remain the same for the entire checkpoint operation and we should show the same information in the view till it completes. So just removing the above piece of code (modified in ImmediateCheckpointRequested()) in the patch will make it correct. My opinion about maintaining a separate field to show upcoming checkpoint flags is that it makes the view complex. Please share your thoughts. Thanks & Regards, On Thu, Feb 24, 2022 at 10:45 PM Matthias van de Meent wrote: > > On Wed, 23 Feb 2022 at 14:28, Nitin Jadhav > wrote: > > > > Sharing the v2 patch. Kindly have a look and share your comments. > > Thanks for updating. > > > diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml > > With the new pg_stat_progress_checkpoint, you should also add a > backreference to this progress reporting in the CHECKPOINT sql command > documentation located in checkpoint.sgml, and maybe in wal.sgml and/or > backup.sgml too. See e.g. cluster.sgml around line 195 for an example. > > > diff --git a/src/backend/postmaster/checkpointer.c > > b/src/backend/postmaster/checkpointer.c > > +ImmediateCheckpointRequested(int flags) > > if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE) > > +{ > > +updated_flags |= CHECKPOINT_IMMEDIATE; > > I don't think that these changes are expected behaviour. Under this > condition, the currently running checkpoint is still not 'immediate', > but it is going to hurry up for a new, actually immediate checkpoint. > Those are different kinds of checkpoint handling; and I don't think > you should modify the reported flags to show that we're going to do > stuff faster than usual. Maybe maintain a separate 'upcoming > checkpoint flags' field instead?
> > diff --git a/src/backend/catalog/system_views.sql
> > b/src/backend/catalog/system_views.sql
> > +( SELECT '0/0'::pg_lsn +
> > +    ((CASE
> > +        WHEN stat.lsn_int64 < 0 THEN pow(2::numeric, 64::numeric)::numeric
> > +        ELSE 0::numeric
> > +      END) +
> > +      stat.lsn_int64::numeric)
> > +  FROM (SELECT s.param3::bigint) AS stat(lsn_int64)
> > +) AS start_lsn,
>
> My LSN select statement was an example that could be run directly in
> psql, so you didn't have to embed the SELECT into the view query.
> The following should be sufficient (and save the planner a few cycles
> otherwise spent in inlining):
>
> +('0/0'::pg_lsn +
> +    ((CASE
> +        WHEN s.param3 < 0 THEN pow(2::numeric, 64::numeric)::numeric
> +        ELSE 0::numeric
> +      END) +
> +      s.param3::numeric)
> +) AS start_lsn,
>
> > diff --git a/src/backend/access/transam/xlog.c
> > b/src/backend/access/transam/xlog.c
> > +checkpoint_progress_start(int flags)
> > [...]
> > +checkpoint_progress_update_param(int index, int64 val)
> > [...]
> > +checkpoint_progress_end(void)
> > +{
> > +    /* In bootstrap mode, we don't actually record anything. */
> > +    if (IsBootstrapProcessingMode())
> > +        return;
>
> Disabling pgstat progress reporting when in bootstrap processing mode
> / startup/end-of-recovery makes very little sense (see upthread) and
>
Re: JSON path decimal literal syntax
On 18.02.22 11:17, Peter Eisentraut wrote:
> I noticed that the JSON path lexer does not support the decimal literal
> syntax forms .1 and 1. (that is, there are no digits before or after the
> decimal point). This is allowed by the relevant ECMAScript standard
> (https://262.ecma-international.org/5.1/#sec-7.8.3) and of course SQL
> allows it as well. Is there a reason for this? I didn't find any code
> comments or documentation about this.

It has come to my attention that there are syntactic differences between JavaScript, which is what JSON path is built on, and JSON itself. Presumably, the JSON path lexer was originally built with the JSON syntax in mind.

Attached is an updated patch that implements the JavaScript-based JSON path numeric literal syntax more correctly. Besides the above-mentioned syntax forms, it now also rejects trailing junk after numeric literals more correctly, similar to how the main SQL lexer does it.

From 13e4a16e7b06c6109b8c975ea5d1b57f7dfbe8b9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Thu, 24 Feb 2022 17:56:47 +0100
Subject: [PATCH v2] Make JSON path numeric literals more correct

Per the ECMAScript standard (ECMA-262, referenced by the SQL standard),
the syntax forms .1 and 1. should be allowed for decimal numeric
literals, but the existing implementation rejected them. Also, by the
same standard, reject trailing junk after numeric literals.

Note that the ECMAScript standard for numeric literals is, in respects
like these, slightly different from the JSON standard, which might be
the original cause for this discrepancy.
---
 src/backend/utils/adt/jsonpath_scan.l  |  24 +--
 src/test/regress/expected/jsonpath.out | 200 ++++++++++++------------
 src/test/regress/sql/jsonpath.sql      |   7 +
 3 files changed, 149 insertions(+), 82 deletions(-)

diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l
index 827a9e44cb..1f08e7c51f 100644
--- a/src/backend/utils/adt/jsonpath_scan.l
+++ b/src/backend/utils/adt/jsonpath_scan.l
@@ -82,11 +82,13 @@ other			[^\?\%\$\.\[\]\{\}\(\)\|\&\!\=\<\>\@\#\,\*:\-\+\/\\\" \t\n\r\f]
 
 digit			[0-9]
 integer			(0|[1-9]{digit}*)
-decimal			{integer}\.{digit}+
-decimalfail		{integer}\.
+decimal			({integer}\.{digit}*|\.{digit}+)
 real			({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1		({integer}|{decimal})[Ee]
-realfail2		({integer}|{decimal})[Ee][-+]
+realfail		({integer}|{decimal})[Ee][-+]
+
+integer_junk	{integer}{other}
+decimal_junk	{decimal}{other}
+real_junk		{real}{other}
 
 hex_dig			[0-9A-Fa-f]
 unicode			\\u({hex_dig}{4}|\{{hex_dig}{1,6}\})
@@ -242,16 +244,10 @@ hex_fail		\\x{hex_dig}{0,1}
 					return INT_P;
 				}
 
-{decimalfail}	{
-					/* throw back the ., and treat as integer */
-					yyless(yyleng - 1);
-					addstring(true, yytext, yyleng);
-					addchar(false, '\0');
-					yylval->str = scanstring;
-					return INT_P;
-				}
-
-({realfail1}|{realfail2})	{ yyerror(NULL, "invalid floating point number"); }
+{realfail}		{ yyerror(NULL, "invalid numeric literal"); }
+{integer_junk}	{ yyerror(NULL, "trailing junk after numeric literal"); }
+{decimal_junk}	{ yyerror(NULL, "trailing junk after numeric literal"); }
+{real_junk}		{ yyerror(NULL, "trailing junk after numeric literal"); }
 
 \"				{
 					addchar(true, '\0');
diff --git a/src/test/regress/expected/jsonpath.out b/src/test/regress/expected/jsonpath.out
index e399fa9631..66c5160978 100644
--- a/src/test/regress/expected/jsonpath.out
+++ b/src/test/regress/expected/jsonpath.out
@@ -354,11 +354,9 @@ select 'null.type()'::jsonpath;
 (1 row)
 
 select '1.type()'::jsonpath;
- jsonpath 
----------
- 1.type()
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "1.t" of jsonpath input
+LINE 1: select '1.type()'::jsonpath;
+               ^
 select '(1).type()'::jsonpath;
   jsonpath  
 ------------
@@ -569,17 +567,23 @@ select '$ ? (@.a < +1)'::jsonpath;
 (1 row)
 
 select '$ ? (@.a < .1)'::jsonpath;
-ERROR:
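To make the behavior change concrete, a few examples (a sketch based on the regression diffs above, with the error wording per the patch):

    select '$ ? (@.a < .1)'::jsonpath;   -- previously an error, now a valid decimal literal
    select '$ ? (@.a < 1.)'::jsonpath;   -- likewise now accepted
    select '1.type()'::jsonpath;         -- now: ERROR: trailing junk after numeric literal
    select '(1).type()'::jsonpath;       -- still the supported way to apply a method to a number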
Re: fix crash with Python 3.11
I wrote: > * It might be a good idea to add parallel test cases for the other PLs. As I suspected, plperl and pltcl show exactly the same problems when trapping a failure at commit, reinforcing my opinion that this is a SPI bug that needs to be fixed in SPI. (plpgsql is not subject to this problem, because its only mechanism for trapping errors is a BEGIN block, ie a subtransaction, so it won't get to the interesting part.) They do have logic to catch the thrown error, though, so no additional fix is needed in either once we fix the core code. v2 attached adds those test cases. regards, tom lane diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c index c93f90de9b..7971050746 100644 --- a/src/backend/executor/spi.c +++ b/src/backend/executor/spi.c @@ -156,7 +156,8 @@ SPI_connect_ext(int options) * XXX It could be better to use PortalContext as the parent context in * all cases, but we may not be inside a portal (consider deferred-trigger * execution). Perhaps CurTransactionContext could be an option? For now - * it doesn't matter because we clean up explicitly in AtEOSubXact_SPI(). + * it doesn't matter because we clean up explicitly in AtEOSubXact_SPI(); + * but see also AtEOXact_SPI(). */ _SPI_current->procCxt = AllocSetContextCreate(_SPI_current->atomic ? TopTransactionContext : PortalContext, "SPI Proc", @@ -214,13 +215,13 @@ SPI_finish(void) return SPI_OK_FINISH; } +/* + * SPI_start_transaction is a no-op, kept for backwards compatibility. + * SPI callers are *always* inside a transaction. + */ void SPI_start_transaction(void) { - MemoryContext oldcontext = CurrentMemoryContext; - - StartTransactionCommand(); - MemoryContextSwitchTo(oldcontext); } static void @@ -228,6 +229,12 @@ _SPI_commit(bool chain) { MemoryContext oldcontext = CurrentMemoryContext; + /* + * Complain if we are in a context that doesn't permit transaction + * termination. (Note: here and _SPI_rollback should be the only places + * that throw ERRCODE_INVALID_TRANSACTION_TERMINATION, so that callers can + * test for that with security that they know what happened.) + */ if (_SPI_current->atomic) ereport(ERROR, (errcode(ERRCODE_INVALID_TRANSACTION_TERMINATION), @@ -240,40 +247,74 @@ _SPI_commit(bool chain) * top-level transaction in such a block violates that idea. A future PL * implementation might have different ideas about this, in which case * this restriction would have to be refined or the check possibly be - * moved out of SPI into the PLs. + * moved out of SPI into the PLs. Note however that the code below relies + * on not being within a subtransaction. */ if (IsSubTransaction()) ereport(ERROR, (errcode(ERRCODE_INVALID_TRANSACTION_TERMINATION), errmsg("cannot commit while a subtransaction is active"))); - /* - * Hold any pinned portals that any PLs might be using. We have to do - * this before changing transaction state, since this will run - * user-defined code that might throw an error. - */ - HoldPinnedPortals(); + /* XXX this ain't re-entrant enough for my taste */ + if (chain) + SaveTransactionCharacteristics(); - /* Start the actual commit */ - _SPI_current->internal_xact = true; + /* Catch any error occurring during the COMMIT */ + PG_TRY(); + { + /* Protect current SPI stack entry against deletion */ + _SPI_current->internal_xact = true; - /* Release snapshots associated with portals */ - ForgetPortalSnapshots(); + /* + * Hold any pinned portals that any PLs might be using. 
We have to do + * this before changing transaction state, since this will run + * user-defined code that might throw an error. + */ + HoldPinnedPortals(); - if (chain) - SaveTransactionCharacteristics(); + /* Release snapshots associated with portals */ + ForgetPortalSnapshots(); - CommitTransactionCommand(); + /* Do the deed */ + CommitTransactionCommand(); - if (chain) - { + /* Immediately start a new transaction */ StartTransactionCommand(); - RestoreTransactionCharacteristics(); + if (chain) + RestoreTransactionCharacteristics(); + + MemoryContextSwitchTo(oldcontext); + + _SPI_current->internal_xact = false; } + PG_CATCH(); + { + ErrorData *edata; - MemoryContextSwitchTo(oldcontext); + /* Save error info in caller's context */ + MemoryContextSwitchTo(oldcontext); + edata = CopyErrorData(); + FlushErrorState(); - _SPI_current->internal_xact = false; + /* + * Abort the failed transaction. If this fails too, we'll just + * propagate the error out ... there's not that much we can do. + */ + AbortCurrentTransaction(); + + /* ... and start a new one */ + StartTransactionCommand(); + if (chain) + RestoreTransactionCharacteristics(); + + MemoryContextSwitchTo(oldcontext); + + _SPI_current->internal_xact = false; + + /* Now that we've cleaned up the transaction, re-throw the error */ + ReThrowError(edata); + } + PG_END_TRY(); }
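The failure-at-commit case these tests exercise can be sketched from the PL side; a minimal illustration (assumes plpython3u is installed, and uses a deferred constraint to stand in for whatever makes COMMIT itself throw):

    DO $$
    plpy.execute("CREATE TABLE t_commit_fail (x int UNIQUE DEFERRABLE INITIALLY DEFERRED)")
    plpy.execute("INSERT INTO t_commit_fail VALUES (1), (1)")  # no error yet: the check is deferred
    try:
        plpy.commit()   # the deferred uniqueness check runs here and fails at COMMIT
    except Exception as e:
        plpy.notice("commit failed: %s" % e)  # reachable only if SPI cleaned up the aborted transaction
    $$ LANGUAGE plpython3u;

With the fix, the failed commit is rolled back and a fresh transaction started before the error is re-thrown, so the PL's exception handler runs in a sane state instead of crashing or leaving the session wedged.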
Re: remove more archiving overhead
Thanks for taking a look!

On Thu, Feb 24, 2022 at 02:13:49PM +0900, Kyotaro Horiguchi wrote:
> https://www.postgresql.org/docs/14/continuous-archiving.html
> | The archive command should generally be designed to refuse to
> | overwrite any pre-existing archive file. This is an important safety
> | feature to preserve the integrity of your archive in case of
> | administrator error (such as sending the output of two different
> | servers to the same archive directory).
>
> The recommended behavior of archive_command above is to avoid this
> kind of corruption. If we officially admit that a WAL segment can be
> archived twice, we need to revise that part (and the following part) of
> the doc.

Yeah, we might want to recommend succeeding if the archive already exists with identical contents (like basic_archive does). IMO this would best be handled in a separate thread that is solely focused on documentation updates. AFAICT this is an existing problem.

>> IIUC this means that in theory, a crash at an unfortunate moment could
>> leave you with a .ready file, the file to archive, and another link to that
>> file with a "future" WAL filename. If you re-archive the file after it has
>> been reused, you could end up with corrupt WAL archives. I think this
>
> IMHO the difference would be that the single system call (maybe) cannot
> be interrupted by sigkill. So I'm not sure that that is worth
> considering. The sigkill window can be eliminated if we could use
> renameat(RENAME_NOREPLACE). But finally power-loss can cause that.

Perhaps durable_rename_excl() could use renameat2() for systems that support it. However, the problem would still exist for systems that have link() but not renameat2(). In this particular case, there really shouldn't be an existing file in pg_wal.

>> might already be possible today. Syncing the directory after every rename
>> probably reduces the likelihood quite substantially, but if
>> RemoveOldXlogFiles() quickly picks up the .done file and attempts
>> recycling before durable_rename() calls fsync() on the directory,
>> presumably the same problem could occur.
>
> If we don't fsync at every rename, we don't even need RemoveOldXlogFiles
> to pick up .done very quickly. Isn't that a big difference?

Yeah, it probably increases the chances quite a bit.

> I think we don't want to make the checkpointer slower in exchange for
> making the archiver faster. Perhaps we need to work using condensed
> information rather than repeatedly scanning over the distributed
> information like archive_status?

v2 avoids giving the checkpointer more work.

> At Tue, 22 Feb 2022 11:52:29 -0800, Nathan Bossart wrote in
>> On Tue, Feb 22, 2022 at 09:37:11AM -0800, Nathan Bossart wrote:
>> > In my testing, I found that when I killed the server just before unlink()
>> > during WAL recycling, I ended up with links to the same file in pg_wal after
>> > restarting. My latest test produced links to the same file for the current
>> > WAL file and the next one. Maybe WAL recycling should use durable_rename()
>> > instead of durable_rename_excl().
>>
>> Here is an updated patch.
>
> Is it intentional that v2 loses the xlogarchive.c part?

Yes. I found that a crash at an unfortunate moment can produce multiple links to the same file in pg_wal, which seemed bad independent of archival. By fixing that (i.e., switching from durable_rename_excl() to durable_rename()), we not only avoid this problem, but we also avoid trying to archive a file the server is concurrently writing. Then, after a crash, the WAL file to archive should either not exist (which is handled by the archiver) or contain the same contents as any preexisting archives.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
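For reference, the renameat2() route mentioned above might look like this on Linux (a sketch only; assumes glibc 2.28+ and a filesystem that supports the flag, and is not portable):

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>      /* AT_FDCWD */
    #include <stdio.h>      /* renameat2(), RENAME_NOREPLACE */

    /*
     * Rename oldpath to newpath, atomically refusing to clobber an existing
     * target.  Returns 0 on success, -1 with errno set; EEXIST means the
     * target was already present.
     */
    static int
    rename_noreplace(const char *oldpath, const char *newpath)
    {
        return renameat2(AT_FDCWD, oldpath, AT_FDCWD, newpath, RENAME_NOREPLACE);
    }

Unlike the link()-then-unlink() dance, there is no window in which two names point at the same file, though as noted above a power loss can still defeat any of these schemes.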
Re: convert libpq uri-regress tests to tap test
Hi, On 2022-02-24 17:03:33 +, Jacob Champion wrote: > On Thu, 2022-02-24 at 08:46 -0800, Andres Freund wrote: > > One annoying bit is that our current tap invocation infrastructure for msvc > > won't know how to deal with that. We put the build directory containing t/ > > onto PATH, but that won't work for test/. But we also don't want to install > > test binaries. Not sure what the solution for that is. > > Would it help if the C executable, not Perl, was the thing actually > producing the TAP output? The binaries built from test/ could be placed > into t/. Or does that just open up a new set of problems? I don't think it would help that much. And for the libpq pipeline test it doesn't really make sense anyway, because we intentionally use it with different traces and such. I've thought about a few C tap tests that I'd like, but I think that's a separate discussion. Greetings, Andres Freund
Re: libpq async duplicate error results
Actually ... it now seems to me that there's live bugs in the interaction between pipeline mode and error-buffer clearing. Namely, that places like PQsendQueryStart will unconditionally clear the buffer even though we may be in the middle of a pipeline sequence, and the buffer might hold an error that's yet to be reported to the application. So I think we need something more like the attached. regards, tom lane diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c index 45dddaf556..0c39bc9abf 100644 --- a/src/interfaces/libpq/fe-exec.c +++ b/src/interfaces/libpq/fe-exec.c @@ -1380,10 +1380,7 @@ pqAppendCmdQueueEntry(PGconn *conn, PGcmdQueueEntry *entry) * state, we don't have to do anything. */ if (conn->asyncStatus == PGASYNC_IDLE) - { -pqClearConnErrorState(conn); pqPipelineProcessQueue(conn); - } break; } } @@ -1730,8 +1727,10 @@ PQsendQueryStart(PGconn *conn, bool newQuery) /* * If this is the beginning of a query cycle, reset the error state. + * However, in pipeline mode with something already queued, the error + * buffer belongs to that command and we shouldn't clear it. */ - if (newQuery) + if (newQuery && conn->cmd_queue_head == NULL) pqClearConnErrorState(conn); /* Don't try to send if we know there's no live connection. */ @@ -2149,11 +2148,8 @@ PQgetResult(PGconn *conn) /* * We're about to return the NULL that terminates the round of * results from the current query; prepare to send the results - * of the next query when we're called next. Also, since this - * is the start of the results of the next query, clear any - * prior error message. + * of the next query when we're called next. */ -pqClearConnErrorState(conn); pqPipelineProcessQueue(conn); } break; @@ -2362,6 +2358,14 @@ PQexecStart(PGconn *conn) if (!conn) return false; + /* + * Since this is the beginning of a query cycle, reset the error state. + * However, in pipeline mode with something already queued, the error + * buffer belongs to that command and we shouldn't clear it. + */ + if (conn->cmd_queue_head == NULL) + pqClearConnErrorState(conn); + if (conn->pipelineStatus != PQ_PIPELINE_OFF) { appendPQExpBufferStr(>errorMessage, @@ -2369,11 +2373,6 @@ PQexecStart(PGconn *conn) return false; } - /* - * Since this is the beginning of a query cycle, reset the error state. - */ - pqClearConnErrorState(conn); - /* * Silently discard any prior query result that application didn't eat. * This is probably poor design, but it's here for backward compatibility. @@ -2928,8 +2927,11 @@ PQfn(PGconn *conn, /* * Since this is the beginning of a query cycle, reset the error state. + * However, in pipeline mode with something already queued, the error + * buffer belongs to that command and we shouldn't clear it. */ - pqClearConnErrorState(conn); + if (conn->cmd_queue_head == NULL) + pqClearConnErrorState(conn); if (conn->pipelineStatus != PQ_PIPELINE_OFF) { @@ -3099,6 +3101,12 @@ pqPipelineProcessQueue(PGconn *conn) conn->cmd_queue_head == NULL) return; + /* + * Reset the error state. This and the next couple of steps correspond to + * what PQsendQueryStart didn't do for this query. + */ + pqClearConnErrorState(conn); + /* Initialize async result-accumulation state */ pqClearAsyncResult(conn); @@ -3809,9 +3817,11 @@ PQsetnonblocking(PGconn *conn, int arg) * behavior. this is ok because either they are making a transition _from_ * or _to_ blocking mode, either way we can block them. * - * Clear error state in case pqFlush adds to it. 
+ * Clear error state in case pqFlush adds to it, unless we're actively + * pipelining, in which case it seems best not to. */ - pqClearConnErrorState(conn); + if (conn->cmd_queue_head == NULL) + pqClearConnErrorState(conn); /* if we are going from blocking to non-blocking flush here */ if (pqFlush(conn)) @@ -4003,7 +4013,8 @@ PQescapeStringConn(PGconn *conn, return 0; } - pqClearConnErrorState(conn); + if (conn->cmd_queue_head == NULL) + pqClearConnErrorState(conn); return PQescapeStringInternal(conn, to, from, length, error, conn->client_encoding, @@ -4041,7 +4052,8 @@ PQescapeInternal(PGconn *conn, const char *str, size_t len, bool as_ident) if (!conn) return NULL; - pqClearConnErrorState(conn); + if (conn->cmd_queue_head == NULL) + pqClearConnErrorState(conn); /* Scan the string for characters that must be escaped. */ for (s = str; (s - str) < len && *s != '\0'; ++s) @@ -4306,7 +4318,8 @@ PQescapeByteaConn(PGconn *conn, if (!conn) return NULL; - pqClearConnErrorState(conn); + if (conn->cmd_queue_head == NULL) + pqClearConnErrorState(conn); return PQescapeByteaInternal(conn, from, from_length, to_length, conn->std_strings,
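To illustrate the hazard being fixed, a sketch (assumes conn is an open connection): the error produced by the first queued command must survive in conn->errorMessage until PQgetResult() reports it, so later calls must not wipe the buffer.

    PQenterPipelineMode(conn);
    PQsendQueryParams(conn, "SELECT 1/0", 0, NULL, NULL, NULL, NULL, 0);  /* will fail */
    PQsendQueryParams(conn, "SELECT 2", 0, NULL, NULL, NULL, NULL, 0);    /* must not clear the pending error */
    PQpipelineSync(conn);
    /* ... PQgetResult() eventually returns PGRES_FATAL_ERROR for the first
     * query; with the old code its message could already have been cleared
     * by the second PQsendQueryParams() call. */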
Re: [PATCH] Expose port->authn_id to extensions and triggers
Hi,

On Thu, Feb 24, 2022 at 04:50:59PM +0000, Jacob Champion wrote:
> On Thu, 2022-02-24 at 20:39 +0900, Michael Paquier wrote:
> > I don't quite see the additional value that this API brings as
> > MyProcPort is directly accessible, and contrib modules like
> > postgres_fdw and sslinfo just use that to find the data of the current
> > backend.
>
> Right -- I just didn't know if third-party modules were actually able
> to rely on the internal layout of struct Port. Is that guaranteed to
> remain constant for a major release line? If so, this new API is
> superfluous.

Yes, third-party code can rely on the Port layout. We don't break ABI between minor releases. On some occasions we can add additional fields at the end of a struct, but nothing more.

> > I could still see a use case for that at a more global level with
> > beentrys, but it looked like there was not much interest the last time
> > I dropped this idea.
>
> I agree that this would be useful to have in the stats. From my outside
> perspective, it seems like it's difficult to get strings of arbitrary
> length in there; is that accurate?

Yes, as it's all in shared memory. The only workaround is using something like track_activity_query_size, but it's not great.
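In other words, something along these lines already works from a third-party module (a sketch; MyProcPort is NULL in processes without a client connection):

    #include "postgres.h"
    #include "miscadmin.h"
    #include "libpq/libpq-be.h"

    static const char *
    current_authn_id(void)
    {
        if (MyProcPort == NULL)         /* e.g. background workers */
            return NULL;
        return MyProcPort->authn_id;    /* NULL if not authenticated */
    }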
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
On Wed, 23 Feb 2022 at 14:28, Nitin Jadhav wrote:
>
> Sharing the v2 patch. Kindly have a look and share your comments.

Thanks for updating.

> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml

With the new pg_stat_progress_checkpoint, you should also add a backreference to this progress reporting in the CHECKPOINT sql command documentation located in checkpoint.sgml, and maybe in wal.sgml and/or backup.sgml too. See e.g. cluster.sgml around line 195 for an example.

> diff --git a/src/backend/postmaster/checkpointer.c
> b/src/backend/postmaster/checkpointer.c
> +ImmediateCheckpointRequested(int flags)
> if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE)
> +{
> +updated_flags |= CHECKPOINT_IMMEDIATE;

I don't think that these changes are expected behaviour. Under this condition, the currently running checkpoint is still not 'immediate', but it is going to hurry up for a new, actually immediate checkpoint. Those are different kinds of checkpoint handling, and I don't think you should modify the reported flags to show that we're going to do stuff faster than usual. Maybe maintain a separate 'upcoming checkpoint flags' field instead?

> diff --git a/src/backend/catalog/system_views.sql
> b/src/backend/catalog/system_views.sql
> +( SELECT '0/0'::pg_lsn +
> +    ((CASE
> +        WHEN stat.lsn_int64 < 0 THEN pow(2::numeric, 64::numeric)::numeric
> +        ELSE 0::numeric
> +      END) +
> +      stat.lsn_int64::numeric)
> +  FROM (SELECT s.param3::bigint) AS stat(lsn_int64)
> +) AS start_lsn,

My LSN select statement was an example that could be run directly in psql, so you didn't have to embed the SELECT into the view query. The following should be sufficient (and save the planner a few cycles otherwise spent in inlining):

+('0/0'::pg_lsn +
+    ((CASE
+        WHEN s.param3 < 0 THEN pow(2::numeric, 64::numeric)::numeric
+        ELSE 0::numeric
+      END) +
+      s.param3::numeric)
+) AS start_lsn,

> diff --git a/src/backend/access/transam/xlog.c
> b/src/backend/access/transam/xlog.c
> +checkpoint_progress_start(int flags)
> [...]
> +checkpoint_progress_update_param(int index, int64 val)
> [...]
> +checkpoint_progress_end(void)
> +{
> +    /* In bootstrap mode, we don't actually record anything. */
> +    if (IsBootstrapProcessingMode())
> +        return;

Disabling pgstat progress reporting when in bootstrap processing mode / startup/end-of-recovery makes very little sense (see upthread) and should be removed, regardless of whether separate functions stay.

> diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
> +#define PROGRESS_CHECKPOINT_PHASE_INIT 0

Generally, enum-like values in a stat_progress field are 1-indexed, to differentiate between empty/uninitialized (0) and states that have been set by the progress reporting infrastructure.

Kind regards,

Matthias van de Meent
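For anyone wanting to sanity-check the wraparound arithmetic directly, a sketch (the -1 stands in for a progress value whose unsigned LSN is above 2^63):

    SELECT '0/0'::pg_lsn
           + ((CASE WHEN v < 0 THEN pow(2::numeric, 64::numeric) ELSE 0::numeric END)
              + v::numeric) AS lsn
    FROM (VALUES (-1::bigint)) AS t(v);
    -- expected: FFFFFFFF/FFFFFFFF, i.e. 2^64 - 1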
Re: convert libpq uri-regress tests to tap test
On Thu, 2022-02-24 at 08:46 -0800, Andres Freund wrote: > One annoying bit is that our current tap invocation infrastructure for msvc > won't know how to deal with that. We put the build directory containing t/ > onto PATH, but that won't work for test/. But we also don't want to install > test binaries. Not sure what the solution for that is. Would it help if the C executable, not Perl, was the thing actually producing the TAP output? The binaries built from test/ could be placed into t/. Or does that just open up a new set of problems? --Jacob
Re: [PATCH] Expose port->authn_id to extensions and triggers
On Thu, 2022-02-24 at 20:39 +0900, Michael Paquier wrote: > I don't quite see the additional value that this API brings as > MyProcPort is directly accessible, and contrib modules like > postgres_fdw and sslinfo just use that to find the data of the current > backend. Right -- I just didn't know if third-party modules were actually able to rely on the internal layout of struct Port. Is that guaranteed to remain constant for a major release line? If so, this new API is superfluous. > Cannot a module like pgaudit, through the elog hook, do its > work with what we have already in place? Given the above, I would hope so. Stephen mentioned that pgaudit only had access to the logged-in role, and when I proposed a miscadmin.h API he said that would help. CC'ing him to see what he meant; I don't know if pgaudit has additional constraints. > What's the use case for a given session to be able to report back > only its own authn through SQL? That's for triggers to be able to grab the current ID for logging/auditing. Is there a better way to expose this for that use case? > I could still see a use case for that at a more global level with > beentrys, but it looked like there was not much interest the last time > I dropped this idea. I agree that this would be useful to have in the stats. From my outside perspective, it seems like it's difficult to get strings of arbitrary length in there; is that accurate? Thanks, --Jacob
Re: convert libpq uri-regress tests to tap test
Hi,

On 2022-02-24 10:17:28 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2022-02-24 13:31:40 +0100, Peter Eisentraut wrote:
> >> I think the TAP scripts should be in src/interfaces/libpq/t/, as usual. The
> >> supporting code snippets could live in some other directory under
> >> src/interfaces/libpq/, which might be called "test" or something else, not
> >> that important.
>
> > Why not in t/? We can't easily build the test programs in libpq/ itself, but
> > libpq/t should be fairly doable.
>
> I think that having t/ directories contain only Perl test scripts
> is a good convention that we should stick to. Peter's proposal
> of a separate test/ subdirectory for C test scaffolding is
> probably fine.

That exists today and continues to exist in the patch upthread, so it's easy ;). I just need to move libpq/test/t to libpq/t and adjust the binary path.

One annoying bit is that our current tap invocation infrastructure for msvc won't know how to deal with that. We put the build directory containing t/ onto PATH, but that won't work for test/. But we also don't want to install test binaries. Not sure what the solution for that is.

Even on !windows, we only know how to find "test executables" in tap tests via PATH. We're in the source dir, so we can't just do test/executable. I probably just need another coffee, but right now I don't even see how to add anything to PATH given $(prove_check)'s definition - it ends up with multiple shells. We can fix that by using && in the definition, which might be a good thing anyway?

Attached three patches:
0001: Convert src/interfaces/libpq/test to a tap test
0002: Add tap test infrastructure to src/interfaces/libpq
0003: Move libpq_pipeline test into src/interfaces/libpq

I did 0001 and 0002 in that order because prove errors out with a stacktrace if no tap tests exist... It might make more sense to commit them together, but for review it's easier to keep them separate, I think.

Andrew, do you have an idea about the feasibility of supporting any of this with the msvc build? I'm mildly inclined to only do 0001 and 0002 for now. We'd not lose msvc coverage, because it already doesn't build the test. Once we've ironed that stuff out, we could do 0003?

Greetings,

Andres Freund

>From 71fa1532a1540e8bbf8bdee3ec0b64e863f212f2 Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Wed, 23 Feb 2022 12:22:56 -0800
Subject: [PATCH v2 1/3] Convert src/interfaces/libpq/test to a tap test.

The invocation of the tap test will be added in the next commit.
--- src/interfaces/libpq/t/001_uri.pl | 265 + src/interfaces/libpq/test/.gitignore | 2 - src/interfaces/libpq/test/Makefile | 7 +- src/interfaces/libpq/test/README | 7 - src/interfaces/libpq/test/expected.out | 171 src/interfaces/libpq/test/regress.in | 57 -- src/interfaces/libpq/test/regress.pl | 65 -- 7 files changed, 267 insertions(+), 307 deletions(-) create mode 100644 src/interfaces/libpq/t/001_uri.pl delete mode 100644 src/interfaces/libpq/test/README delete mode 100644 src/interfaces/libpq/test/expected.out delete mode 100644 src/interfaces/libpq/test/regress.in delete mode 100644 src/interfaces/libpq/test/regress.pl diff --git a/src/interfaces/libpq/t/001_uri.pl b/src/interfaces/libpq/t/001_uri.pl new file mode 100644 index 000..0225666d9f6 --- /dev/null +++ b/src/interfaces/libpq/t/001_uri.pl @@ -0,0 +1,265 @@ +# Copyright (c) 2021-2022, PostgreSQL Global Development Group +use strict; +use warnings; + +use PostgreSQL::Test::Utils; +use Test::More; +use IPC::Run; + +my @tests = ( + [q{postgresql://uri-user:secret@host:12345/db}, + q{user='uri-user' password='secret' dbname='db' host='host' port='12345' (inet)}, + q{}, +], + [q{postgresql://uri-user@host:12345/db}, + q{user='uri-user' dbname='db' host='host' port='12345' (inet)}, + q{}, +], + [q{postgresql://uri-user@host/db}, + q{user='uri-user' dbname='db' host='host' (inet)}, + q{}, +], + [q{postgresql://host:12345/db}, + q{dbname='db' host='host' port='12345' (inet)}, + q{}, +], + [q{postgresql://host/db}, + q{dbname='db' host='host' (inet)}, + q{}, +], + [q{postgresql://uri-user@host:12345/}, + q{user='uri-user' host='host' port='12345' (inet)}, + q{}, +], + [q{postgresql://uri-user@host/}, + q{user='uri-user' host='host' (inet)}, + q{}, +], + [q{postgresql://uri-user@}, + q{user='uri-user' (local)}, + q{}, +], + [q{postgresql://host:12345/}, + q{host='host' port='12345' (inet)}, + q{}, +], + [q{postgresql://host:12345}, + q{host='host' port='12345' (inet)}, + q{}, +], + [q{postgresql://host/db}, + q{dbname='db' host='host' (inet)}, + q{}, +], + [q{postgresql://host/}, + q{host='host' (inet)}, + q{}, +], + [q{postgresql://host}, + q{host='host' (inet)}, + q{}, +], + [q{postgresql://}, + q{(local)}, + q{}, +], +
Re: Time to drop plpython2?
On Wed, Feb 23, 2022 at 07:59:01AM -0800, Mark Wong wrote: > On Tue, Feb 22, 2022 at 08:50:07PM -0500, Tom Lane wrote: > > Mark Wong writes: > > > Take 3. :) > > > > > I've upgraded everyone to the v14 buildfarm scripts and made sure the > > > --test passed on HEAD on each one. So I hopefully didn't miss any > > > (other than the one EOL OpenSUSE version that I will plan on upgrading.) > > > > Thanks! > > > > However ... it seems like most of your animals haven't run at all > > in the last couple of days. Did you maybe forget to re-enable > > their cron jobs after messing with them, or something like that? > > Uh oh, more user error. I tested run_build but run_branches wasn't > happy. I'll start working through that... I think I have most of them operational again. I see some animals are still failing on some branches, but still better overall? I discovered that clang for gadwall and pintail got purged when I removed python2, so I disabled those two animals. Regards, Mark
Re: Extract epoch from Interval weird behavior
Joseph Koshakow writes: > I do want to briefly mention, if I'm understanding the history of > EXTRACT correctly, that the previous behavior > actually was to multiply by 365.25, not 365. However The commit that > changed the return type from numeric [1] > changed that behavior. Looking through the discussions [2], I don't > see any mention of it, which makes me think > it was a mistake. Indeed ... Peter? regards, tom lane
Re: Add id's to various elements in protocol.sgml
On 24.02.2022 at 16:46, Alvaro Herrera wrote:
> On 2022-Feb-24, Dagfinn Ilmari Mannsåker wrote:
> > Peter Eisentraut writes:
> > > Is there a way to obtain those URLs other than going into the HTML
> > > sources and checking if there is an anchor near where you want to go?
> >
> > I use the jump-to-anchor extension: https://github.com/brettz9/jump-to-anchor/
> >
> > Some sites have javascript that adds a link next to the element that
> > becomes visible when hovering, e.g. the NAME and other headings on
> > https://metacpan.org/pod/perl.
>
> Would it be possible to create such anchor links as part of the XSL
> stylesheets for HTML?

Initially I thought that most use cases would involve developers who would be perfectly capable of extracting the id they need from the html sources, but I agree that making that a bit more comfortable (especially given the fact that others do that too) seems worthwhile. I'll investigate our options and report back.
Re: Design of pg_stat_subscription_workers vs pgstats
Hi, Thank you for the comments! On Thu, Feb 24, 2022 at 4:20 PM tanghy.f...@fujitsu.com wrote: > > On Thu, Feb 24, 2022 9:33 AM Masahiko Sawada wrote: > > > > Thank you for the comments! I've attached the latest version patch > > that incorporated all comments I got so far. The primary change from > > the previous version is that the subscription statistics live globally > > rather than per-database. > > > > Thanks for updating the patch. > > Few comments: > > 1. > I think we should add some doc for column stats_reset in > pg_stat_subscription_activity view. Added. > > 2. > +CREATE VIEW pg_stat_subscription_activity AS > SELECT > -w.subid, > +a.subid, > s.subname, > ... > +a.apply_error_count, > +a.sync_error_count, > + a.stats_reset > +FROM pg_subscription as s, > +pg_stat_get_subscription_activity(oid) as a; > > The line "a.stats_reset" uses a Tab, and we'd better use spaces here. Fixed. On Thu, Feb 24, 2022 at 5:54 PM Peter Smith wrote: > > Hi. Below are my review comments for the v2 patch. > > == > > 1. Commit message > > This patch changes the pg_stat_subscription_workers view (introduced > by commit 8d74fc9) so that it stores only statistics counters: > apply_error_count and sync_error_count, and has one entry for > subscription. > > SUGGESTION > "and has one entry for subscription." --> "and has one entry for each > subscription." Fixed. > > ~~~ > > 2. Commit message > > After removing these error details, there are no longer relation > information, so the subscription statistics are now a cluster-wide > statistics. > > SUGGESTION > "there are no longer relation information," --> "there is no longer > any relation information," Fixed. > > ~~~ > > 3. doc/src/sgml/monitoring.sgml > > - > - The error message > + Number of times the error occurred during the application of changes > > > > > > - last_error_time timestamp > with time zone > + sync_error_count bigint > > > - Last time at which this error occurred > + Number of times the error occurred during the initial table > + synchronization > > > SUGGESTION (both places) > "Number of times the error occurred ..." --> "Number of times an error > occurred ..." Fixed. > > ~~~ > > 4. doc/src/sgml/monitoring.sgml - missing column > > (duplicate - also reported by [Tang-v2]) > > The PG docs for the new "stats_reset" column are missing. > > ~~~ > > 5. src/backend/catalog/system_views.sql - whitespace > > (duplicate - also reported by [Tang-v2]) > > - JOIN pg_subscription s ON (w.subid = s.oid); > +a.apply_error_count, > +a.sync_error_count, > + a.stats_reset > +FROM pg_subscription as s, > +pg_stat_get_subscription_activity(oid) as a; > > inconsistent tab/space indenting for 'a.stats_reset'. > > ~~~ > > 6. src/backend/postmaster/pgstat.c - function name > > +/* -- > + * pgstat_reset_subscription_counter() - > + * > + * Tell the statistics collector to reset a single subscription > + * counter, or all subscription counters (when subid is InvalidOid). > + * > + * Permission checking for this function is managed through the normal > + * GRANT system. > + * -- > + */ > +void > +pgstat_reset_subscription_counter(Oid subid) > > SUGGESTION (function name) > "pgstat_reset_subscription_counter" --> > "pgstat_reset_subscription_counters" (plural) Fixed. > > ~~ > > 7. src/backend/postmaster/pgstat.c - pgstat_recv_resetsubcounter > > @@ -5645,6 +5598,51 @@ > pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg, > } > } > > +/* -- > + * pgstat_recv_resetsubcounter() - > + * > + * Reset some subscription statistics of the cluster. 
> + * -- > + */ > +static void > +pgstat_recv_resetsubcounter(PgStat_MsgResetsubcounter *msg, int len) > > > "Reset some" seems a bit vague. Why not describe that it is all or > none according to the msg->m_subid? I think it reset none, one, or all statistics, actually. Given other pgstat_recv_reset* functions also have similar comments, I think we can use it rather than elaborating. > > ~~~ > > 8. src/backend/postmaster/pgstat.c - pgstat_recv_resetsubcounter > > + if (!OidIsValid(msg->m_subid)) > + { > + HASH_SEQ_STATUS sstat; > + > + /* Clear all subscription counters */ > + hash_seq_init(, subscriptionStatHash); > + while ((subentry = (PgStat_StatSubEntry *) hash_seq_search()) != NULL) > + pgstat_reset_subscription(subentry, ts); > + } > + else > + { > + /* Get the subscription statistics to reset */ > + subentry = pgstat_get_subscription_entry(msg->m_subid, false); > + > + /* > + * Nothing to do if the given subscription entry is not found. This > + * could happen when the subscription with the subid is removed and > + * the corresponding statistics entry is also removed before receiving > + * the reset message. > + */ > + if (!subentry) > + return; > + > + /* Reset
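As a usage sketch of the one-or-all reset semantics discussed above (the SQL-callable name used here, pg_stat_reset_subscription_stats, is an assumption; the patch may spell it differently):

    -- reset the counters of one subscription
    SELECT pg_stat_reset_subscription_stats(oid)
      FROM pg_subscription WHERE subname = 'mysub';
    -- reset the counters of all subscriptions
    SELECT pg_stat_reset_subscription_stats(NULL);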
Re: create_index test fails when synchronous_commit = off @ master
Hi,

On 2022-02-24 07:33:39 -0800, Andres Freund wrote:
> I added the
>   SELECT relpages, reltuples, relallvisible FROM pg_class WHERE oid = 'tenk1'::regclass;
> just after the
>   VACUUM ANALYZE tenk1;
>
> synchronous_commit=on
> + relpages | reltuples | relallvisible
> +----------+-----------+---------------
> +      345 |     10000 |           345
> +(1 row)
>
> synchronous_commit=off
> + relpages | reltuples | relallvisible
> +----------+-----------+---------------
> +      345 |     10000 |             0
> +(1 row)
>
> So it clearly is the explanation for the issue.
>
> Obviously we can locally work around it by adding a
>   SET LOCAL synchronous_commit = local;
> to the COPY. But I'd like to fully understand what's going on.

It is the hint-bit setting delayed by asynchronous commit. I traced execution, and we do end up not setting all-visible due to reaching the !HeapTupleHeaderXminCommitted() path in lazy_scan_prune():

	case HEAPTUPLE_LIVE:
		...
		/*
		 * Is the tuple definitely visible to all transactions?
		 *
		 * NB: Like with per-tuple hint bits, we can't set the
		 * PD_ALL_VISIBLE flag if the inserter committed
		 * asynchronously. See SetHintBits for more info. Check that
		 * the tuple is hinted xmin-committed because of that.
		 */
		if (prunestate->all_visible)
		{
			TransactionId xmin;

			if (!HeapTupleHeaderXminCommitted(tuple.t_data))

So it might be reasonable to use synchronous_commit=on in test_setup.sql? It's not super satisfying, but I don't immediately see what else could prevent all-visible from being set in this case. There are no dead rows, so concurrent snapshots shouldn't prevent cleanup.

I guess we could alternatively try doing something like flushing pending async commits at the start of vacuum. But that probably isn't warranted.

Greetings,

Andres Freund
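A minimal way to observe this outside the regression suite, as a sketch (timing-dependent, so the 0 is likely rather than guaranteed):

    SET synchronous_commit = off;
    CREATE TABLE av_test AS SELECT g FROM generate_series(1, 10000) g;
    VACUUM av_test;
    SELECT relallvisible FROM pg_class WHERE oid = 'av_test'::regclass;
    -- typically 0: the inserting transaction's commit record may not be
    -- flushed yet, so VACUUM can't hint xmin-committed or set all-visible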
Re: Add id's to various elements in protocol.sgml
On 2022-Feb-24, Dagfinn Ilmari Mannsåker wrote: > Peter Eisentraut writes: > > Is there a way to obtain those URLs other than going into the HTML > > sources and checking if there is an anchor near where you want go? > > I use the jump-to-anchor extension: https://github.com/brettz9/jump-to-anchor/ > > Some sites have javascript that adds a link next to the element that > becomes visible when hovering, e.g. the NAME and other headings on > https://metacpan.org/pod/perl. Would it be possible to create such anchor links as part of the XSL stylesheets for HTML? -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Re: create_index test fails when synchronous_commit = off @ master
Hi,

On 2022-02-24 16:47:25 +0300, Aleksander Alekseev wrote:
> -                      QUERY PLAN
> ------------------------------------------------------
> - Index Only Scan using tenk1_thous_tenthous on tenk1
> -   Index Cond: (thousand < 2)
> -   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
> -(3 rows)
> +                              QUERY PLAN
> +--------------------------------------------------------------------------
> + Sort
> +   Sort Key: thousand
> +   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
> +         Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
> +(4 rows)

Heh. We've been having a lot of fights with exactly this plan change in the AIO branch, before cc50080a82, and without synchronous_commit = off. Interestingly, near-exclusively with the regression run within pg_upgrade's tests.

For AIO we (David did a lot of that, IIRC) finally hunted it down to being due to vacuum skipping pages because of an inability to get a cleanup lock. If that happens enough, pg_class.relallvisible changes enough to lead to the different plan.

I first was going to suggest that we should just use VACUUM FREEZE to prevent the issue. But in this instance the cause isn't cleanup locks; it's probably that we can't yet set hint bits due to synchronous_commit=off? But I don't *fully* understand how it leads to this.

I added the
  SELECT relpages, reltuples, relallvisible FROM pg_class WHERE oid = 'tenk1'::regclass;
just after the
  VACUUM ANALYZE tenk1;

synchronous_commit=on
+ relpages | reltuples | relallvisible
+----------+-----------+---------------
+      345 |     10000 |           345
+(1 row)

synchronous_commit=off
+ relpages | reltuples | relallvisible
+----------+-----------+---------------
+      345 |     10000 |             0
+(1 row)

So it clearly is the explanation for the issue.

Obviously we can locally work around it by adding a
  SET LOCAL synchronous_commit = local;
to the COPY. But I'd like to fully understand what's going on.

> I didn't investigate further. Do we assume that `make installcheck` is
> supposed to pass with different postgresql.conf options?

Depends on the option, I think... There are some where it's interesting to run tests with different options and where the effort required is reasonable, and some cases where it's not... synchronous_commit=off worked until recently, and I think we should keep it working.

Greetings,

Andres Freund
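Spelled out, the local workaround would be something like this sketch (the COPY source path is a placeholder):

    BEGIN;
    SET LOCAL synchronous_commit = local;  -- make the commit wait for local WAL flush
    COPY tenk1 FROM '/path/to/tenk.data';  -- placeholder path
    COMMIT;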
Re: Extract epoch from Interval weird behavior
On Thu, Feb 24, 2022 at 4:47 AM Aleksander Alekseev wrote:
> Extracting an epoch from an interval is quite a strange case since intervals
> are not connected to any specific dates.

I agree, I think it's a weird use case and that it's probably not worth fixing. Though it was fun for me to try.

> All in all, I don't think that the benefit of the proposed change outweighs
> the fact that it will break the previous behavior for the users who may rely
> on it. I suggest keeping it simple, i.e. the way it is now. What I think we
> could do instead is explicitly document this behavior in [1].
>
> [1]: https://www.postgresql.org/docs/current/functions-datetime.html

I do want to briefly mention, if I'm understanding the history of EXTRACT correctly, that the previous behavior actually was to multiply by 365.25, not 365. However, the commit that changed the return type to numeric [1] changed that behavior. Looking through the discussions [2], I don't see any mention of it, which makes me think it was a mistake. However, there is a lot of discussion around numeric performance and being able to optimize numeric division because every divisor was a power of 10. Fixing this issue would break that assumption and cause some performance degradation, which probably isn't worth it.

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2da77cdb4661826482ebf2ddba1f953bc74afe4
[2]: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce1...@phystech.edu

- Joe Koshakow
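To make the suspected regression concrete, a sketch (outputs as I read the two implementations; treat them as illustrative):

    SELECT extract(epoch FROM interval '1 year');
    -- float8 era (<= v13): 31557600, i.e. 365.25 days/year
    -- numeric era (v14+):  31536000, because the (int64) cast of
    --                      DAYS_PER_YEAR truncates 365.25 to 365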
Re: convert libpq uri-regress tests to tap test
Andres Freund writes: > On 2022-02-24 13:31:40 +0100, Peter Eisentraut wrote: >> I think the TAP scripts should be in src/interfaces/libpq/t/, as usual. The >> supporting code snippets could live in some other directory under >> src/interfaces/libpq/, which might be called "test" or something else, not >> that important. > Why not in t/? We can't easily build the test programs in libpq/ itself, but > libpq/t should be fairly doable. I think that having t/ directories contain only Perl test scripts is a good convention that we should stick to. Peter's proposal of a separate test/ subdirectory for C test scaffolding is probably fine. regards, tom lane
Re: Frontend error logging style
Peter Eisentraut writes: > This patch already illustrates a couple of things that are wrong with > this approach: All of these objections are a bit moot with my followup proposal, no? > - It doesn't allow any other way of exiting. For example, in pg_dump, > you have removed a few exit_nicely() calls. It's not clear why that is > valid or whether it would always be valid for all call sites. As the patch stood, I'd hacked it so that pg_log_fatal called exit_nicely() not exit() within pg_dump. I agree that's not a great solution, but I think that the correct fix is to get rid of exit_nicely in favor of doing the requisite cleanup in an atexit hook. pg_dump is already at great hazard of somebody calling exit() not exit_nicely(), either through failure to pay attention to that undocumented convention, or because some code it imports from someplace like src/common calls exit() directly. You need not look any further than pg_malloc() to see that there are live problems there. I figured this could be addressed in a separate patch, though, because to the extent that missing the exit-nicely callbacks is a real bug, it's already a bug. > My suggestion is to just get rid of pg_log_fatal() and replace them all > with pg_log_error(). I'm on board with dropping the separate FATAL log level if there's consensus to do so; I think it adds more confusion than anything else. I still want to have something like pg_fatal(), though, because it's impossible to deny the usefulness of such an abbreviation. It's already been reinvented three or four times independently. Independently of either of those points, I still want to make the changes I proposed w.r.t. making explicit concepts of "detail" and "hint" addendums. What we have right now in the frontend code is an impossible mishmash of three or four ways of filling that lack, all inconsistent. regards, tom lane
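For concreteness, the abbreviation being defended amounts to something like this sketch (not a committed spelling):

    /* frontend convenience: report an error, then terminate the program */
    #define pg_fatal(...) \
        do { \
            pg_log_error(__VA_ARGS__); \
            exit(1); \
        } while (0)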
Re: Readd use of TAP subtests
Hi,

On 2022-02-24 11:16:03 +0100, Peter Eisentraut wrote:
> Now that we have switched everything to done_testing(), the subtests feature
> isn't that relevant anymore, but it might still be useful to get better
> output when running with PROVE_FLAGS=--verbose.

I incidentally played with subtests yesterday, when porting src/interfaces/libpq/test/regress.pl to a tap test. Unfortunately it seems that subtests aren't actually specified in the tap format, and different libraries generate different output formats.

The reason this matters somewhat is that meson's testrunner can parse tap and give nicer progress / error reports. But since subtests aren't in the spec it can't currently parse them... Open issue since 2015: https://github.com/TestAnything/Specification/issues/2

The perl ecosystem is so moribund :(.

> t/001_basic.pl ..
> # Subtest: vacuumlo --help
>     ok 1 - exit code 0
>     ok 2 - goes to stdout
>     ok 3 - nothing to stderr
>     1..3

It's clearly better. We can approximate some of it by using is_deeply() and comparing exit, stdout, and stderr at once, particularly for helpers like program_help() that are used in a lot of places.

Greetings,

Andres Freund
Re: Frontend error logging style
Hi,

On 2022-02-24 14:06:18 +0100, Peter Eisentraut wrote:
> My suggestion is to just get rid of pg_log_fatal() and replace them all with
> pg_log_error().

-1. This ignores the fact that several places have already come up with their own slightly different versions of fatal exit handlers. We don't gain anything by not standardizing on one notion of a fatal error wrapper.

Greetings,

Andres Freund
Re: convert libpq uri-regress tests to tap test
Hi, On 2022-02-24 13:31:40 +0100, Peter Eisentraut wrote: > I think the TAP scripts should be in src/interfaces/libpq/t/, as usual. The > supporting code snippets could live in some other directory under > src/interfaces/libpq/, which might be called "test" or something else, not > that important. Why not in t/? We can't easily build the test programs in libpq/ itself, but libpq/t should be fairly doable. Greetings, Andres Freund
Re: Design of pg_stat_subscription_workers vs pgstats
Hi, On 2022-02-24 13:23:55 +0100, Peter Eisentraut wrote: > On 24.02.22 12:46, Masahiko Sawada wrote: > > > We have a view called pg_stat_activity, which is very well known. From > > > that perspective, "activity" means what is happening right now or what > > > has happened most recently. The reworked view in this patch does not > > > contain that (we already have pg_stat_subscription for that), but it > > > contains accumulated counters. > > Right. > > > > What pg_stat_subscription shows is rather suitable for the name > > pg_stat_subscription_activity than the reworked view. But switching > > these names would also not be a good idea. I think it's better to use > > "subscription" in the view name since it shows actually statistics for > > subscriptions and subscription OID is the key. I personally prefer > > pg_stat_subscription_counters among the ideas that have been proposed > > so far, but I'd like to hear opinions and votes. > > _counters will fail if there is something not a counter (such as > last-timestamp-of-something). > > Earlier, pg_stat_subscription_stats was mentioned, which doesn't have that > problem. We really should try to fix this in a more general way at some point. We have way too many different things mixed up in pg_stat_*. I'd like to get something like the patch in soon though, we can still change the name later. I've been blocked behind this stuff for weeks, and it's getting really painful... Greetings, Andres Freund
Re: GiST index build missing smgrimmedsync()?
On 23/02/2022 23:30, Melanie Plageman wrote: I brought this up in [1] but wanted to start a dedicated thread. Since 16fa9b2b30a357 GiST indexes are not built in shared buffers. However, smgrimmedsync() is not done anywhere and skipFsync=true is always passed to smgrwrite() and smgrextend(). So, if a checkpoint starts and finishes after WAL-logging some of the index build pages and there is a crash sometime before the dirty pages make it to permanent storage, won't that data be lost? Yes, good catch! Seems like the following would address this: Committed essentially that, except that I put the smgrimmedsync in a separate if-block, and copied the comment from the similar piece of code from nbtsort.c to explain the issue. Thanks! - Heikki
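The committed fix is roughly of this shape (a sketch reconstructed from the description above, not the exact code):

    /*
     * The sorted build wrote pages with smgrwrite()/smgrextend() and
     * skipFsync = true, bypassing shared buffers, so a checkpoint during
     * the build does not cover those pages; fsync them ourselves before
     * the build is considered done.
     */
    if (RelationNeedsWAL(index))
        smgrimmedsync(RelationGetSmgr(index), MAIN_FORKNUM);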
Re: Add id's to various elements in protocol.sgml
Peter Eisentraut writes:
> On 18.12.21 00:53, Brar Piening wrote:
> > The purpose is that you can directly link to the id in the public html
> > docs which still gets generated (e. g.
> > https://www.postgresql.org/docs/14/protocol-replication.html#PROTOCOL-REPLICATION-BASE-BACKUP).
> >
> > Essentially it gives people discussing the protocol and pointing to a
> > certain command or message format the chance to link to the very thing
> > they are discussing instead of the top of the lengthy html page.
>
> Is there a way to obtain those URLs other than going into the HTML
> sources and checking if there is an anchor near where you want to go?

I use the jump-to-anchor extension: https://github.com/brettz9/jump-to-anchor/

Some sites have javascript that adds a link next to the element that becomes visible when hovering, e.g. the NAME and other headings on https://metacpan.org/pod/perl.

- ilmari
Re: Patch a potential memory leak in describeOneTableDetails()
> On 24 Feb 2022, at 07:32, Kyotaro Horiguchi wrote: > > At Thu, 24 Feb 2022 14:44:56 +0900 (JST), Kyotaro Horiguchi > wrote in >> At Tue, 22 Feb 2022 09:59:09 +0100, Daniel Gustafsson >> wrote in >>> The section in question was added to our docs 22 years ago, to make it >>> reflect >>> the current operating systems in use maybe just not mentioning more(1) is >>> the >>> easiest?: >>> >>>"The text browsing tool less can be invoked as less -x4 to make it show >>>tabs appropriately." >>> >>> Or perhaps remove that section altogether? >> >> I think the former is better. > > So the attached does that. I think this is reasonable, since it's pretty clear that more(1) supporting "-x" is fairly rare. I'll await others commenting. -- Daniel Gustafsson https://vmware.com/
create_index test fails when synchronous_commit = off @ master
Hi hackers,

I noticed that the create_index test (make installcheck) fails on my laptop because different query plans are chosen:

-                      QUERY PLAN
-------------------------------------------------------
- Index Only Scan using tenk1_unique1 on tenk1
-   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+                           QUERY PLAN
+---------------------------------------------------------------
+ Sort
+   Sort Key: unique1
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+         ->  Bitmap Index Scan on tenk1_unique1
+               Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)

...

-                      QUERY PLAN
-------------------------------------------------------
- Index Only Scan using tenk1_thous_tenthous on tenk1
-   Index Cond: (thousand < 2)
-   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+                              QUERY PLAN
+--------------------------------------------------------------------------
+ Sort
+   Sort Key: thousand
+   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
+(4 rows)

Investigation showed that it happens only with `synchronous_commit = off`. Only the master branch is affected, starting from cc50080a82. This is a debug build. I'm running MacOS Monterey 12.2.1 @ x64.

I didn't investigate further. Do we assume that `make installcheck` is supposed to pass with different postgresql.conf options?

--
Best regards,
Aleksander Alekseev
Re: PATCH: add "--config-file=" option to pg_rewind
> On 24 Feb 2022, at 14:43, Gunnar Nick Bluth wrote:
>
> That looks just as good to me, and it already has tests, so I'll happily step
> down!

Actually, I think this looks like a saner approach. Putting a config setting in two places (postgresql.conf and on the command line for pg_rewind) is a recipe for them diverging.

--
Daniel Gustafsson
https://vmware.com/
Re: Typo in pgbench messages.
> On 24 Feb 2022, at 13:58, Fabien COELHO wrote: >> One argument against a backpatch is that this would be disruptive with >> tools that parse and analyze the output generated by pgbench. Fabien, >> don't you have some tools and/or wrappers doing exactly that? > > Yep, I like to "| cut -d' ' -fx" and other "line.split(' ')" or whatever. > > I think that the break of typographical rules is intentional to allow such > simplistic low-level stream handling through pipes or scripts. I'd prefer > that the format is not changed. Maybe a comment could be added to explain the > reason behind it. That doesn't sound like an overwhelmingly convincing argument to print some messages with 'X %' and others with 'X%'. -- Daniel Gustafsson https://vmware.com/
Re: PATCH: add "--config-file=" option to pg_rewind
Am 24.02.2022 um 14:08 schrieb Daniel Gustafsson: On 24 Feb 2022, at 10:27, Alexander Kukushkin wrote: Hello Gunnar, On Thu, 24 Feb 2022 at 10:03, Aleksander Alekseev wrote: wants to use the "-c" option on a typical Debian/Ubuntu installation (where the config resides below /etc/postgresql/), pg_rewind needs a way to be told where the postgresql.conf actually is. The attached patch implements this as an option. [...] Good catch! Yeah, it is a known problem, and there was already another patch trying to address it [1]. There is yet another related patch which provides a parameter to pass in restore_command in cases where postgresq.conf isn't reachable: https://commitfest.postgresql.org/37/3213/ That looks just as good to me, and it already has tests, so I'll happily step down! (Note to myself: check the CF first next time ;-) -- Gunnar "Nick" Bluth Eimermacherweg 106 D-48159 Münster Mobil +49 172 8853339 Email: gunnar.bl...@pro-open.de __ "Ceterum censeo SystemD esse delendam" - Cato
Re: Readd use of TAP subtests
> On 24 Feb 2022, at 11:16, Peter Eisentraut wrote:
>
> I think for deeply nested tests and test routines defined in other modules,
> this helps structure the test output more like the test source code is laid
> out, so it makes following the tests and locating failing test code easier.

I don't have any strong opinions on this, but if we go ahead with it I think there should be a short note in src/test/perl/README about when subtests could be considered.

--
Daniel Gustafsson
https://vmware.com/
Re: PATCH: add "--config-file=" option to pg_rewind
> On 24 Feb 2022, at 10:27, Alexander Kukushkin wrote: > > Hello Gunnar, > > On Thu, 24 Feb 2022 at 10:03, Aleksander Alekseev > wrote: > > > wants to use the "-c" option on a typical Debian/Ubuntu installation > > (where the config resides below /etc/postgresql/), pg_rewind needs a way > > to be told where the postgresql.conf actually is. > > > > The attached patch implements this as an option. > > > > [...] > > Good catch! > > Yeah, it is a known problem, and there was already another patch trying to > address it [1]. There is yet another related patch which provides a parameter to pass in restore_command in cases where postgresq.conf isn't reachable: https://commitfest.postgresql.org/37/3213/ -- Daniel Gustafsson https://vmware.com/
Re: Frontend error logging style
On 23.02.22 00:23, Tom Lane wrote: This conversation seems to have tailed off without full resolution, but I observe that pretty much everyone except Peter is on board with defining pg_log_fatal as pg_log_error + exit(1). So I think we should just do that, unless Peter wants to produce a finished alternative proposal. I've now gone through and fleshed out the patch I sketched upthread.

This patch already illustrates a couple of things that are wrong with this approach:

- It doesn't allow any other way of exiting. For example, in pg_dump, you have removed a few exit_nicely() calls. It's not clear why that is valid or whether it would always be valid for all call sites.

- It doesn't allow other exit codes. For example, in psql, you have changed a pg_log_fatal() call to pg_log_error() because it needed another exit code.

This slides us right back into that annoying situation where in the backend we have to log error messages using elog(LOG) because the flow control is tangled up with the log level. My suggestion is to just get rid of pg_log_fatal() and replace them all with pg_log_error().
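For reference, a minimal sketch of the definition being proposed (an assumed shape based on this discussion, not the committed code):

    /* Hypothetical sketch: report at error level, then exit(1). */
    #include <stdlib.h>          /* exit() */
    #include "common/logging.h"  /* pg_log_error() */

    #define pg_log_fatal(...) \
        do { \
            pg_log_error(__VA_ARGS__); \
            exit(1); \
        } while (0)

Both objections above fall directly out of this shape: the macro hard-codes the exit path as well as the exit code.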
Re: Optionally automatically disable logical replication subscriptions on error
On Thu, Feb 24, 2022 at 8:08 PM Amit Kapila wrote: > > On Thu, Feb 24, 2022 at 1:20 PM Masahiko Sawada wrote: > > > > Here are some comments: > > > > Why do we need SyncTableStartWrapper() and ApplyLoopWrapper()? > > > > I have given this comment to move the related code to separate > functions to slightly simplify ApplyWorkerMain() code but if you don't > like we can move it back. I am not sure I like the new function names > in the patch though. Okay, I'm fine with moving this code but perhaps we can find a better function name as "Wrapper" seems slightly odd to me. For example, start_table_sync_start() and start_apply_changes() or something (it seems we use the snake case for static functions in worker.c). Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Typo in pgbench messages.
Bonjour Michaël, I think it's better to back-patch this to stable branches if there's no objection. Thoughts? That's only cosmetic, so I would just bother about HEAD if I were to change something like that (I would not bother at all, personally). One argument against a backpatch is that this would be disruptive with tools that parse and analyze the output generated by pgbench. Fabien, don't you have some tools and/or wrappers doing exactly that? Yep, I like to "| cut -d' ' -fx" and other "line.split(' ')" or whatever. I think that the break of typographical rules is intentional to allow such simplistic low-level stream handling through pipes or scripts. I'd prefer that the format is not changed. Maybe a comment could be added to explain the reason behind it. -- Fabien.
Buffer Manager and Contention
Thinking about poor performance in the case where the data fits in RAM, but the working set is too big for shared_buffers, I notice a couple of things that seem bad in BufMgr, but don't understand why they are like that.

1. If we need to allocate a buffer to a new block we do this in one step, while holding both partition locks for the old and the new tag. Surely it would cause less contention to make the old block/tag invalid (after flushing), drop the old partition lock and then switch to the new one? i.e. just hold one mapping partition lock at a time (a sketch of this sequence is below). Is there a specific reason we do it this way?

2. Possibly connected to the above, we issue BufTableInsert() BEFORE we issue BufTableDelete(). That means we need extra entries in the buffer mapping hash table to allow us to hold both the old and the new at the same time, for a short period. The way dynahash.c works, we try to allocate an entry from the freelist and if that doesn't work, we begin searching ALL the freelists for free entries to steal. So if we get enough people trying to do virtual I/O at the same time, then we will hit a "freelist storm" where everybody is chasing the last few entries. It would make more sense if we could do BufTableDelete() first, then hold onto the buffer mapping entry rather than add it to the freelist, so we can use it again when we do BufTableInsert() very shortly afterwards. That way we wouldn't need to search the freelist at all. What is the benefit or reason of doing the Delete after the Insert?

Put that another way, it looks like BufTable functions are used in a way that mismatches against the way dynahash is designed. Thoughts?

-- Simon Riggs http://www.EnterpriseDB.com/
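To make point 1 concrete, here is a simplified sketch of the proposed one-lock-at-a-time sequence. This is not the current BufferAlloc() code; pins, buffer headers, flushing, and error handling are all omitted:

    /* Sketch of the proposal: invalidate the old mapping first,
     * holding only one buffer mapping partition lock at a time. */
    uint32   oldHash = BufTableHashCode(&oldTag);
    LWLock  *oldLock = BufMappingPartitionLock(oldHash);

    LWLockAcquire(oldLock, LW_EXCLUSIVE);
    BufTableDelete(&oldTag, oldHash);   /* old tag becomes invalid */
    LWLockRelease(oldLock);             /* drop before taking the new lock */

    uint32   newHash = BufTableHashCode(&newTag);
    LWLock  *newLock = BufMappingPartitionLock(newHash);

    LWLockAcquire(newLock, LW_EXCLUSIVE);
    (void) BufTableInsert(&newTag, newHash, victim_buf_id);
    LWLockRelease(newLock);

If, as point 2 suggests, BufTableDelete() also handed its hash entry back for reuse by the following BufTableInsert() instead of returning it to a freelist, the freelist search would be avoided entirely.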
Re: Design of pg_stat_subscription_workers vs pgstats
On Thu, Feb 24, 2022 at 9:23 PM Peter Eisentraut wrote: > > > On 24.02.22 12:46, Masahiko Sawada wrote: > >> We have a view called pg_stat_activity, which is very well known. From > >> that perspective, "activity" means what is happening right now or what > >> has happened most recently. The reworked view in this patch does not > >> contain that (we already have pg_stat_subscription for that), but it > >> contains accumulated counters. > > Right. > > > > What pg_stat_subscription shows is more suitable for the name > > pg_stat_subscription_activity than the reworked view. But switching > > these names would also not be a good idea. I think it's better to use > > "subscription" in the view name since it shows actual statistics for > > subscriptions and subscription OID is the key. I personally prefer > > pg_stat_subscription_counters among the ideas that have been proposed > > so far, but I'd like to hear opinions and votes. > > _counters will fail if there is something not a counter (such as > last-timestamp-of-something). > > Earlier, pg_stat_subscription_stats was mentioned, which doesn't have > that problem. Ah, I had misunderstood your comment. Right, _counters could be a blocker for future changes. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Proposal: Support custom authentication methods using hooks
On 2/24/22 04:16, Aleksander Alekseev wrote: > Hi Samay, > >> I wanted to submit a patch to expose 2 new hooks (one for the authentication >> check and another one for error reporting) in auth.c. These will allow users >> to implement their own authentication methods for Postgres or add custom >> logic around authentication. > I like the idea - PostgreSQL is all about extendability. Also, well > done with TAP tests and an example extension. This being said, I > didn't look at the code yet, but cfbot seems to be happy with it: > http://cfbot.cputube.org/ > >> One constraint in the current implementation is that we allow only one >> authentication provider to be loaded at a time. In the future, we can add >> more functionality to maintain an array of hooks and call the appropriate >> one based on the provider name in the pg_hba line. > This sounds like a pretty severe and unnecessary limitation to me. Do > you think it would be difficult to bypass it in the first > implementation? Yeah, I think we would want a set of providers that could be looked up at runtime. I think this is likely to be material for release 16, so there's plenty of time to get it right. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
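To make the idea concrete, the two hooks might look roughly like the following. The names and signatures here are illustrative assumptions, not the actual API from the patch:

    /* Hypothetical hook types for auth.c; illustrative only. */
    typedef int  (*CustomAuthenticationCheck_hook_type) (Port *port);
    typedef void (*CustomAuthenticationError_hook_type) (Port *port, int status);

    extern CustomAuthenticationCheck_hook_type CustomAuthenticationCheck_hook;
    extern CustomAuthenticationError_hook_type CustomAuthenticationError_hook;

    /* A provider extension would install its callbacks in _PG_init(). */
    void
    _PG_init(void)
    {
        CustomAuthenticationCheck_hook = my_auth_check;   /* hypothetical */
        CustomAuthenticationError_hook = my_auth_error;   /* hypothetical */
    }

Supporting multiple providers at once would then amount to replacing these single pointers with a registry keyed by the provider name given in the pg_hba.conf line.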
Re: convert libpq uri-regress tests to tap test
On 24.02.22 02:52, Tom Lane wrote: Peter Eisentraut writes: On 23.02.22 23:58, Tom Lane wrote: Peter Eisentraut writes: libpq TAP tests should be in src/interfaces/libpq/t/. That's failing to account for the fact that a libpq test can't really be a pure-perl TAP test; you need some C code to drive the library. Such things could be put under src/interfaces/libpq/test, or some other subdirectory. We already have src/interfaces/ecpg/test. OK, but then the TAP scripts are under src/interfaces/libpq/test/t, which isn't what you said. I have no great objection to moving src/test/modules/libpq_pipeline/ to src/interfaces/libpq/test/, though, as long as the buildfarm will cope. I think the TAP scripts should be in src/interfaces/libpq/t/, as usual. The supporting code snippets could live in some other directory under src/interfaces/libpq/, which might be called "test" or something else, not that important. I think we should pick a layout that is proper and future-proof and then adjust the buildfarm client as necessary. The issue of writing libpq-specific tests has come up a few times recently; I think it would be worth finding a proper solution to this that would facilitate that work in the future.
Re: Design of pg_stat_subscription_workers vs pgstats
On 24.02.22 12:46, Masahiko Sawada wrote: We have a view called pg_stat_activity, which is very well known. From that perspective, "activity" means what is happening right now or what has happened most recently. The reworked view in this patch does not contain that (we already have pg_stat_subscription for that), but it contains accumulated counters. Right. What pg_stat_subscription shows is more suitable for the name pg_stat_subscription_activity than the reworked view. But switching these names would also not be a good idea. I think it's better to use "subscription" in the view name since it shows actual statistics for subscriptions and subscription OID is the key. I personally prefer pg_stat_subscription_counters among the ideas that have been proposed so far, but I'd like to hear opinions and votes. _counters will fail if there is something not a counter (such as last-timestamp-of-something). Earlier, pg_stat_subscription_stats was mentioned, which doesn't have that problem.
Re: convert libpq uri-regress tests to tap test
On 2/23/22 20:52, Tom Lane wrote: > Peter Eisentraut writes: >> On 23.02.22 23:58, Tom Lane wrote: >>> Peter Eisentraut writes: libpq TAP tests should be in src/interfaces/libpq/t/. >>> That's failing to account for the fact that a libpq test can't >>> really be a pure-perl TAP test; you need some C code to drive the >>> library. >> Such things could be put under src/interfaces/libpq/test, or some other >> subdirectory. We already have src/interfaces/ecpg/test. > OK, but then the TAP scripts are under src/interfaces/libpq/test/t, > which isn't what you said. I have no great objection to moving > src/test/modules/libpq_pipeline/ to src/interfaces/libpq/test/, > though, as long as the buildfarm will cope. > > It won't without some adjustment. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Add id's to various elements in protocol.sgml
On 18.12.21 00:53, Brar Piening wrote: The purpose is that you can directly link to the id in the public HTML docs, which still get generated (e.g. https://www.postgresql.org/docs/14/protocol-replication.html#PROTOCOL-REPLICATION-BASE-BACKUP). Essentially it gives people discussing the protocol and pointing to a certain command or message format the chance to link to the very thing they are discussing instead of to the top of the lengthy HTML page. Is there a way to obtain those URLs other than going into the HTML sources and checking if there is an anchor near where you want to go?
Re: Design of pg_stat_subscription_workers vs pgstats
On Thu, Feb 24, 2022 at 9:05 PM Amit Kapila wrote: > > On Thu, Feb 24, 2022 at 2:24 PM Peter Smith wrote: > > > > 10. src/backend/replication/logical/worker.c > > > > (from my previous [Peter-v1] #9) > > > > >> + /* Report the worker failed during table synchronization */ > > >> + pgstat_report_subscription_error(MyLogicalRepWorker->subid, false); > > >> > > >> and > > >> > > >> + /* Report the worker failed during the application of the change */ > > >> + pgstat_report_subscription_error(MyLogicalRepWorker->subid, true); > > >> > > >> > > >> Why don't these use MySubscription->oid instead of > > >> MyLogicalRepWorker->subid? > > > > > It's just because we used to use MyLogicalRepWorker->subid; is there > > > any particular reason why we should use MySubscription->oid here? > > > > I felt MySubscription->oid is a more natural and more direct way of > > expressing the same thing. > > > > Consider: "the oid of the current subscription" versus "the oid of > > the subscription of the current worker". IMO the first one is simpler. > > YMMV. > > > > I think we can use either, but maybe MySubscription->oid would be > slightly better here as the same is used in nearby code as well. Okay, will change. > > > > 13. src/test/subscription/t/026_worker_stats.pl - missing test? > > > > Shouldn't there also be some test to reset the counters to confirm > > that they really do get reset to zero? > > > > ~~~ > > > > I think we avoid writing tests for stats for each and every case as they > are not reliable in nature (the message can be lost). If we can find a > reliable way then it is okay. Yeah, the messages can even arrive out of order. In particular, since the apply worker and table sync worker in this test keep reporting messages, it's quite possible that the test becomes unstable. I remember we removed unstable tests of resetting statistics before (e.g., see fc6950913). Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/