Re: Logical Replication of sequences

2024-06-05 Thread Amit Kapila
On Thu, Jun 6, 2024 at 9:32 AM Masahiko Sawada  wrote:
>
> On Wed, Jun 5, 2024 at 12:43 PM Amit Kapila  wrote:
> >
> > On Tue, Jun 4, 2024 at 8:56 PM Yogesh Sharma
> >  wrote:
> > >
> > > On 6/4/24 06:57, Amit Kapila wrote:
> > >
> > > > 2. Provide a command say Alter Subscription ...  Replicate Sequences
> > > > (or something like that) which users can perform before shutdown of
> > > > the publisher node during upgrade. This will allow copying all the
> > > > sequences from the publisher node to the subscriber node directly.
> > > > Similar to previous approach, this could also be inconvenient for
> > > > users.
> > >
> > > This is similar to option 1 except that it is a SQL command now.
> > >
> >
> > Right, but I would still prefer a command as it provides clear steps
> > for the upgrade. Users need to perform (a) Replicate Sequences for a
> > particular subscription (b) Disable that subscription (c) Perform (a)
> > and (b) for all the subscriptions corresponding to the publisher we
> > want to shut down for upgrade.
> >
> > I agree there are some manual steps involved here but it is advisable
> > for users to ensure that they have received the required data on the
> > subscriber before the upgrade of the publisher node, otherwise, they
> > may not be able to continue replication after the upgrade. For
> > example, see the "Prepare for publisher upgrades" step in pg_upgrade
> > docs [1].
> >
> > >
> > > > 3. Replicate published sequences via walsender at the time of shutdown
> > > > or incrementally while decoding checkpoint record. The two ways to
> > > > achieve this are: (a) WAL log a special NOOP record just before
> > > > shutting down checkpointer. Then allow the WALsender to read the
> > > > sequence data and send it to the subscriber while decoding the new
> > > > NOOP record. (b) Similar to the previous idea but instead of WAL
> > > > logging a new record directly invokes a decoding callback after
> > > > walsender receives a request to shutdown which will allow pgoutput to
> > > > read and send required sequences. This approach has a drawback that we
> > > > are adding more work at the time of shutdown but note that we already
> > > > waits for all the WAL records to be decoded and sent before shutting
> > > > down the walsender during shutdown of the node.
> > >
> > > At the time of shutdown a) most logical upgrades don't necessarily call
> > > for shutdown
> > >
> >
> > Won't the major version upgrade expect that the node is down? Refer to
> > step "Stop both servers" in [1].
>
> I think the idea is that the publisher is the old version and the
> subscriber is the new version, and changes generated on the publisher
> are replicated to the subscriber via logical replication. And at some
> point, we change the application (or a router) settings so that no
> more transactions come to the publisher, do the last upgrade
> preparation work (e.g. copying the latest sequence values if
> required), and then change the application so that new transactions
> come to the subscriber.
>

Okay, thanks for sharing the exact steps. If one has to follow that
path then sending the sequences incrementally (at checkpoint WAL or at
other times) won't work, because we want to ensure that the sequences
are up-to-date before the application starts using the new database.
To do that in a bullet-proof way, one has to copy/replicate the
sequences while requests to the new database are paused (reference
from the blog you shared: "For the first second after flipping the
flag, our application artificially paused any new database requests
for one second."). Currently, they are using some guesswork to
replicate sequences, which requires manual verification and more
manual work for each sequence. The new command (Alter Subscription ...
Replicate Sequences) should ease their procedure and would require
little or no verification.

> I remember the blog post about Knock doing a similar process to
> upgrade the clusters with minimal downtime[1].
>

Thanks for sharing the blog post.

-- 
With Regards,
Amit Kapila.




Re: 001_rep_changes.pl fails due to publisher stuck on shutdown

2024-06-05 Thread Kyotaro Horiguchi
At Thu, 6 Jun 2024 12:49:45 +1000, Peter Smith  wrote in 
> Hi, I have reproduced this multiple times now.
> 
> I confirmed the initial post/steps from Alexander. i.e. The test
> script provided [1] gets itself into a state where function
> ReadPageInternal (called by XLogDecodeNextRecord and commented "Wait
> for the next page to become available") constantly returns
> XLREAD_FAIL. Ultimately the test times out because WalSndLoop() loops
> forever, since it never calls WalSndDone() to exit the walsender
> process.

Thanks for the repro; I believe I understand what's happening here.

During server shutdown, the latter half of the last continuation
record may fail to be flushed. This is similar to what is described in
the commit message of commit ff9f111bce. While shutting down,
WalSndLoop() waits for XLogSendLogical() to consume WAL up to
flushPtr, but in this case, the last record cannot complete without
the continuation part starting from flushPtr, which is
missing. However, in such cases, xlogreader.missingContrecPtr is set
to the beginning of the missing part, so we can use it to detect this
situation.

So, I believe the attached small patch fixes the behavior. I haven't
come up with a good test script for this issue. Something like
026_overwrite_contrecord.pl might work, but this situation seems a bit
more complex than what it handles.

Versions back to 10 should suffer from the same issue and the same
patch will be applicable without significant changes.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 99cad7bd53a94b4b90937fb1eb2f37f2ebcadf6a Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 6 Jun 2024 14:56:53 +0900
Subject: [PATCH] Fix infinite loop in walsender during publisher shutdown

When a publisher server is shutting down, there can be a case where
the last WAL record at that point is a continuation record with its
latter part not yet flushed. In such cases, the walsender attempts to
read this unflushed part and ends up in an infinite loop. To prevent
this situation, modify the logical WAL sender to consider itself
caught up in this case. The records that are not fully flushed at this
point are generally not significant, so simply ignoring them should
not cause any issues.
---
 src/backend/replication/walsender.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index c623b07cf0..635396c138 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -3426,8 +3426,15 @@ XLogSendLogical(void)
 			flushPtr = GetFlushRecPtr(NULL);
 	}
 
-	/* If EndRecPtr is still past our flushPtr, it means we caught up. */
-	if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
+	/*
+	 * If EndRecPtr is still past our flushPtr, it means we caught up.  When
+	 * the server is shutting down, the latter part of a continuation record
+	 * may be missing.  If got_STOPPING is true, assume we are caught up if the
+	 * last record is missing its continuation part at flushPtr.
+	 */
+	if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr ||
+		(got_STOPPING &&
+		 logical_decoding_ctx->reader->missingContrecPtr == flushPtr))
 		WalSndCaughtUp = true;
 
 	/*
-- 
2.43.0



Re: Avoid orphaned objects dependencies, take 3

2024-06-05 Thread Bertrand Drouvot
Hi,

On Thu, May 23, 2024 at 02:10:54PM -0400, Robert Haas wrote:
> On Thu, May 23, 2024 at 12:19 AM Bertrand Drouvot
>  wrote:
> > The reason why we are using a dirty snapshot here is for the cases where we 
> > are
> > recording a dependency on a referenced object that we are creating at the 
> > same
> > time behind the scene (for example, creating a composite type while creating
> > a relation). Without the dirty snapshot, then the object we are creating 
> > behind
> > the scene (the composite type) would not be visible and we would wrongly 
> > assume
> > that it has been dropped.
> 
> The usual reason for using a dirty snapshot is that you want to see
> uncommitted work by other transactions. It sounds like you're saying
> you just need to see uncommitted work by the same transaction. If
> that's true, I think using HeapTupleSatisfiesSelf would be clearer. Or
> maybe we just need to put CommandCounterIncrement() calls in the right
> places to avoid having the problem in the first place. Or maybe this
> is another sign that we're doing the work at the wrong level.

Thanks for discussing your concern with Tom last week during pgconf.dev
and for sharing the outcome with me. I understand your concern regarding code
maintainability with the current approach.

Please find attached v9, which:

- Acquires the lock and checks for object existence at an upper level, i.e.,
before calling recordDependencyOn() and recordMultipleDependencies().

- Gets rid of the SNAPSHOT_SELF snapshot usage and relies on
CommandCounterIncrement() instead to ensure new entries are visible when
we check for object existence (for the cases where we create an additional
object behind the scenes, like a composite type while creating a relation).

- Adds an assertion in recordMultipleDependencies() to ensure that we locked the
object before recording the dependency (to ensure we don't miss any cases now
that the lock is acquired at an upper level).

A few remarks:

My first attempt was to move eliminate_duplicate_dependencies() out of
record_object_address_dependencies() so that we get the calls in this order:

eliminate_duplicate_dependencies()
depLockAndCheckObjects()
record_object_address_dependencies()

What I'm doing instead in v9 is to rename record_object_address_dependencies()
to lock_record_object_address_dependencies() and add depLockAndCheckObjects()
in it at the right place. That way the caller of
[lock_]record_object_address_dependencies() is not responsible for calling
eliminate_duplicate_dependencies() (which would have been the case with my first
attempt).

We need to set up the LOCKTAG before calling the new Assert in
recordMultipleDependencies(), so "#ifdef USE_ASSERT_CHECKING" is used there to
avoid setting up the LOCKTAG in a non-assert-enabled build.

v9 is more invasive (as it changes code in many more places) than v8, but it
is easier to follow (as it is now clear where the new lock is acquired).
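
To illustrate the idea, here is a rough sketch of the lock-then-check pattern
(this is not the v9 code itself; the existence-check helper name is made up and
stands in for the real syscache/catalog lookup):

#include "postgres.h"

#include "catalog/objectaddress.h"
#include "storage/lmgr.h"

/* Placeholder for the real catalog/syscache existence check. */
extern bool object_still_exists(const ObjectAddress *object);

/*
 * Sketch: take a lock that conflicts with the AccessExclusiveLock acquired
 * by DROP on the referenced object, then verify the object still exists.
 */
static void
depLockAndCheckObject_sketch(const ObjectAddress *object)
{
	/* Block a concurrent DROP of the referenced object. */
	LockDatabaseObject(object->classId, object->objectId,
					   object->objectSubId, AccessShareLock);

	/*
	 * With the lock held, a concurrent DROP has either already committed
	 * (the object is gone) or will wait for us, so this check is reliable.
	 */
	if (!object_still_exists(object))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_OBJECT),
				 errmsg("object %u/%u does not exist",
						object->classId, object->objectId)));
}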

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 6b6ee40fab0b011e9858a0a25624935c59732bfd Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Fri, 29 Mar 2024 15:43:26 +
Subject: [PATCH v9] Avoid orphaned objects dependencies

It's currently possible to create orphaned objects dependencies, for example:

Scenario 1:

session 1: begin; drop schema schem;
session 2: create a function in the schema schem
session 1: commit;

With the above, the function created in session 2 would be linked to a non
existing schema.

Scenario 2:

session 1: begin; create a function in the schema schem
session 2: drop schema schem;
session 1: commit;

With the above, the function created in session 1 would be linked to a non
existing schema.

To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP)
has been put in place when the dependencies are being recorded. With this in
place, the drop schema in scenario 2 would be locked.

Also, after the new lock attempt, the patch checks that the object still exists:
with this in place session 2 in scenario 1 would be locked and would report an
error once session 1 commits (that would not be the case should session 1 abort
the transaction).

If the object is dropped before the new lock attempt is triggered then the patch
would also report an error (but with fewer details).

The patch adds a few tests for some dependency cases (that would currently produce
orphaned objects):

- schema and function (as the above scenarios)
- alter a dependency (function and schema)
- function and arg type
- function and return type
- function and function
- domain and domain
- table and type
- server and foreign data wrapper
---
 src/backend/catalog/aclchk.c  |   1 +
 src/backend/catalog/dependency.c  |  83 +--
 src/backend/catalog/heap.c|   7 +-
 src/backend/catalog/index.c   |  16 ++-
 src/backend/catalog/objectaddress.c   |  57 
 src

Re: Logical Replication of sequences

2024-06-05 Thread Masahiko Sawada
On Wed, Jun 5, 2024 at 9:30 PM Amit Kapila  wrote:
>
> On Wed, Jun 5, 2024 at 12:51 PM Peter Eisentraut  wrote:
> >
> > On 04.06.24 12:57, Amit Kapila wrote:
> > > 2. Provide a command say Alter Subscription ...  Replicate Sequences
> > > (or something like that) which users can perform before shutdown of
> > > the publisher node during upgrade. This will allow copying all the
> > > sequences from the publisher node to the subscriber node directly.
> > > Similar to previous approach, this could also be inconvenient for
> > > users.
> >
> > I would start with this.  In any case, you're going to need to write
> > code to collect all the sequence values, send them over some protocol,
> > apply them on the subscriber.  The easiest way to start is to trigger
> > that manually.  Then later you can add other ways to trigger it, either
> > by timer or around shutdown, or whatever other ideas there might be.
> >
>
> Agreed.

+1

> To achieve this, we can allow sequences to be copied during
> the initial CREATE SUBSCRIPTION command similar to what we do for
> tables. And then later by new/existing command, we re-copy the already
> existing sequences on the subscriber.
>
> The options for the new command could be:
> Alter Subscription ... Refresh Sequences
> Alter Subscription ... Replicate Sequences
>
> In the second option, we need to introduce a new keyword Replicate.
> Can you think of any better option?

Another idea is doing that using options. For example,

For initial sequences synchronization:

CREATE SUBSCRIPTION ... WITH (copy_sequence = true);

For re-copy (or update) sequences:

ALTER SUBSCRIPTION ... REFRESH PUBLICATION WITH (copy_sequence = true);

>
> In addition to the above, the command Alter Subscription .. Refresh
> Publication will fetch any missing sequences similar to what it does
> for tables.

On the subscriber side, do we need to track which sequences are
created via CREATE/ALTER SUBSCRIPTION?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Reuse child_relids in try_partitionwise_join was Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)

2024-06-05 Thread Ashutosh Bapat
On Wed, Jun 5, 2024 at 1:17 PM Ashutosh Bapat 
wrote:

> Hi David,
> Thanks for looking into this.
>
> On Fri, May 31, 2024 at 2:19 AM David Christensen 
> wrote:
>
>> Hello,
>>
>> I am looking through some outstanding CommitFest entries; I wonder if
>> this particular patch is already effectively fixed by 5278d0a2, which
>> is both attributed to the original author as well as an extremely
>> similar approach.  Can this entry
>> (https://commitfest.postgresql.org/48/4553/) be closed?
>>
>
> This is different. But it needs a rebase. PFA rebased patch. The revised
> commit message explains the change better.
>
> Here are numbers revised after 5278d0a2. Since the code changes only
> affect partitionwise join code path, I am reporting only partition wise
> join numbers. The first column reports the number of joining relations,
> each having 1000 partitions. Rest of the column names are self-explanatory.
>
> Planning memory used:
>  num_joins | master  | patched | memory saved | memory saved in percentage
> -----------+---------+---------+--------------+----------------------------
>          2 | 31 MB   | 30 MB   | 525 kB       |                      1.68%
>          3 | 111 MB  | 107 MB  | 4588 kB      |                      4.03%
>          4 | 339 MB  | 321 MB  | 17 MB        |                      5.13%
>          5 | 1062 MB | 967 MB  | 95 MB        |                      8.96%
>
> Here are the planning time measurements.
>  num_joins | master (ms) | patched (ms) | change in planning time (ms) | change in planning time
> -----------+-------------+--------------+------------------------------+-------------------------
>          2 |      187.86 |       177.27 |                        10.59 |                   5.64%
>          3 |      721.81 |       758.80 |                       -36.99 |                  -5.12%
>          4 |     2239.87 |      2236.19 |                         3.68 |                   0.16%
>          5 |     6830.86 |      7027.76 |                      -196.90 |                  -2.88%
>

Here are some numbers with lower number of partitions per relation
For 100 partitions
Planning memory:
 num_joins | master  | patched | memory saved  | memory saved in percentage
---+-+-+---+
 2 | 2820 kB | 2812 kB | 8192.00 bytes |   0.28%
 3 | 9335 kB | 9270 kB | 65 kB |   0.70%
 4 | 27 MB   | 27 MB   | 247 kB|   0.90%
 5 | 78 MB   | 77 MB   | 1101 kB   |   1.37%

Planning time:
 num_joins | master (ms) | patched (ms) | change in planning time (ms) | change in planning time as percentage
-----------+-------------+--------------+------------------------------+---------------------------------------
         2 |        3.33 |         3.21 |                         0.12 |                                 3.60%
         3 |       11.40 |        11.17 |                         0.23 |                                 2.02%
         4 |       37.61 |        37.56 |                         0.05 |                                 0.13%
         5 |      124.39 |       126.08 |                        -1.69 |                                -1.36%

For 10 partitions
Planning memory:
 num_joins | master  | patched | memory saved  | memory saved in percentage
---+-+-+---+
 2 | 310 kB  | 310 kB  | 0.00 bytes|   0.00
 3 | 992 kB  | 989 kB  | 3072.00 bytes |   0.30
 4 | 2891 kB | 2883 kB | 8192.00 bytes |   0.28
 5 | 8317 kB | 8290 kB | 27 kB |   0.32

Planning time:
 num_joins | master (ms) | patched (ms) | change in planning time (ms) | change in planning time as percentage
-----------+-------------+--------------+------------------------------+---------------------------------------
         2 |        0.42 |         0.37 |                         0.05 |                                 11.90
         3 |        1.08 |         1.09 |                        -0.01 |                                 -0.93
         4 |        3.56 |         3.32 |                         0.24 |                                  6.74
         5 |        8.86 |         8.78 |                         0.08 |                                  0.90

Observations:
1. As expected, the memory savings are smaller for a small number of
partitions, but they are not negative. Thus the patch helps in the case of a
large number of partitions and does not adversely affect the case of a small
number of partitions.
2. Planning time changes are within noise. But usually I see that the
planning time varies a lot. Thus the numbers here just indicate that the
planning times are not adversely affected by the change. Theoretically, I
would expect the patch

Re: Fix use of possible uninitialized variable retval (src/pl/plpgsql/src/pl_handler.c)

2024-06-05 Thread Michael Paquier
On Wed, Jun 05, 2024 at 10:27:51PM +0800, Julien Rouhaud wrote:
> Note that I removed the -Werror from lapwing a long time ago, so at
> least this animal shouldn't lead to hackish fixes for false positive
> anymore.

That's good to know.  Thanks for the update.
--
Michael


signature.asc
Description: PGP signature


Re: Synchronizing slots from primary to standby

2024-06-05 Thread Peter Smith
Hi, here are some review comments for the docs patch v5-0001.

Apart from these it LGTM.

==
doc/src/sgml/logical-replication.sgml

1.
+
+ On the subscriber node, use the following SQL to identify which slots
+ should be synced to the standby that we plan to promote. This query will
+ return the relevant replication slots, including the main slots and table
+ synchronization slots associated with the failover enabled subscriptions.

/failover enabled/failover-enabled/

~~~

2.
+  
+   If all the slots are present on the standby server and result
+   (failover_ready) of is true, then existing subscriptions
+   can continue subscribing to publications now on the new primary server
+   without any loss of data.
+  

Hmm. It looks like there is some typo or missing words here: "of is true".

Did you mean something like: "of the above SQL query is true"?

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical Replication of sequences

2024-06-05 Thread Amit Kapila
On Wed, Jun 5, 2024 at 3:17 PM Bharath Rupireddy
 wrote:
>
> On Tue, Jun 4, 2024 at 5:40 PM Amit Kapila  wrote:
> >
> > Even if we decode it periodically (say each time we decode the
> > checkpoint record) then also we need to send the entire set of
> > sequences at shutdown. This is because the sequences may have changed
> > from the last time we sent them.
>
> Agree. How about decoding and sending only the sequences that are
> changed from the last time when they were sent? I know it requires a
> bit of tracking and more work, but all I'm looking for is to reduce
> the amount of work that walsenders need to do during the shutdown.
>

I see your point but going towards tracking the changed sequences
sounds like moving towards what we do for incremental backups unless
we can invent some other smart way.

> Having said that, I like the idea of letting the user sync the
> sequences via ALTER SUBSCRIPTION command and not weave the logic into
> the shutdown checkpoint path. As Peter Eisentraut said here
> https://www.postgresql.org/message-id/42e5cb35-4aeb-4f58-8091-90619c7c3ecc%40eisentraut.org,
> this can be a good starting point to get going.
>

Agreed.

> > > I can imagine a case where there are tens
> > > of thousands of sequences in a production server, and surely decoding
> > > and sending them just during the shutdown can take a lot of time
> > > hampering the overall server uptime.
> >
> > It is possible but we will send only the sequences that belong to
> > publications for which walsender is supposed to send the required
> > data.
>
> Right, but what if all the publication tables can have tens of
> thousands of sequences.
>

In such cases we have no option but to send all the sequences.

> > Now, we can also imagine providing option 2 (Alter Subscription
> > ... Replicate Sequences) so that users can replicate sequences before
> > shutdown and then disable the subscriptions so that there won't be a
> > corresponding walsender.
>
> As stated above, I like this idea to start with.
>

+1.


--
With Regards,
Amit Kapila.




Re: Logical Replication of sequences

2024-06-05 Thread Masahiko Sawada
On Wed, Jun 5, 2024 at 12:43 PM Amit Kapila  wrote:
>
> On Tue, Jun 4, 2024 at 8:56 PM Yogesh Sharma
>  wrote:
> >
> > On 6/4/24 06:57, Amit Kapila wrote:
> >
> > > 2. Provide a command say Alter Subscription ...  Replicate Sequences
> > > (or something like that) which users can perform before shutdown of
> > > the publisher node during upgrade. This will allow copying all the
> > > sequences from the publisher node to the subscriber node directly.
> > > Similar to previous approach, this could also be inconvenient for
> > > users.
> >
> > This is similar to option 1 except that it is a SQL command now.
> >
>
> Right, but I would still prefer a command as it provides clear steps
> for the upgrade. Users need to perform (a) Replicate Sequences for a
> particular subscription (b) Disable that subscription (c) Perform (a)
> and (b) for all the subscriptions corresponding to the publisher we
> want to shut down for upgrade.
>
> I agree there are some manual steps involved here but it is advisable
> for users to ensure that they have received the required data on the
> subscriber before the upgrade of the publisher node, otherwise, they
> may not be able to continue replication after the upgrade. For
> example, see the "Prepare for publisher upgrades" step in pg_upgrade
> docs [1].
>
> >
> > > 3. Replicate published sequences via walsender at the time of shutdown
> > > or incrementally while decoding checkpoint record. The two ways to
> > > achieve this are: (a) WAL log a special NOOP record just before
> > > shutting down checkpointer. Then allow the WALsender to read the
> > > sequence data and send it to the subscriber while decoding the new
> > > NOOP record. (b) Similar to the previous idea but instead of WAL
> > > logging a new record directly invokes a decoding callback after
> > > walsender receives a request to shutdown which will allow pgoutput to
> > > read and send required sequences. This approach has a drawback that we
> > > are adding more work at the time of shutdown but note that we already
> > > waits for all the WAL records to be decoded and sent before shutting
> > > down the walsender during shutdown of the node.
> >
> > At the time of shutdown a) most logical upgrades don't necessarily call
> > for shutdown
> >
>
> Won't the major version upgrade expect that the node is down? Refer to
> step "Stop both servers" in [1].

I think the idea is that the publisher is the old version and the
subscriber is the new version, and changes generated on the publisher
are replicated to the subscriber via logical replication. And at some
point, we change the application (or a router) settings so that no
more transactions come to the publisher, do the last upgrade
preparation work (e.g. copying the latest sequence values if
required), and then change the application so that new transactions
come to the subscriber.

I remember the blog post about Knock doing a similar process to
upgrade the clusters with minimal downtime[1].

Regards,

[1] https://knock.app/blog/zero-downtime-postgres-upgrades


--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




RE: Synchronizing slots from primary to standby

2024-06-05 Thread Zhijie Hou (Fujitsu)
On Wednesday, June 5, 2024 2:32 PM Peter Smith  wrote:
 
> Hi. Here are some minor review comments for the docs patch v4-0001.

Thanks for the comments!

> The SGML file wrapping can be fixed to fill up to 80 cols for some of the
> paragraphs.

Unlike comments in C code, I don't think we enforce the 80-column limit in
doc files unless a line is too long to read. I checked the doc and think it's
OK.

Here is the V5 patch which addressed Peter's comments and Amit's comments[1].

[1] 
https://www.postgresql.org/message-id/CAA4eK1%2Bq1MYGgF3-LZCj6Xd0idujnjbTsfk-RqU%2BC51wYGaD5g%40mail.gmail.com

Best Regards,
Hou zj


v5-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch
Description:  v5-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch


Re: Logical Replication of sequences

2024-06-05 Thread Amit Kapila
On Wed, Jun 5, 2024 at 6:01 PM Ashutosh Bapat
 wrote:
>
> On Wed, Jun 5, 2024 at 8:45 AM Amit Kapila  wrote:
>>
>>  How about periodically sending this information?
>> >
>>
>> Now, if we want to support some sort of failover then probably this
>> will help. Do you have that use case in mind?
>
>
> Regular failover was a goal for supporting logical replication of sequences. 
> That might be more common than major upgrade scenario.
>

We can't support regular failovers to subscribers unless we can
replicate/copy slots because the existing nodes connected to the
current publisher/primary would expect that. It should be primarily
useful for major version upgrades at this stage.

>>
>> If we want to send
>> periodically then we can do it when decoding checkpoint
>> (XLOG_CHECKPOINT_ONLINE) or some other periodic WAL record like
>> running_xacts (XLOG_RUNNING_XACTS).
>>
>
> Yeah. I am thinking along those lines.
>
> It must be noted, however, that none of those options makes sure that the
> replicated sequences' states are consistent with the state of the replicated
> objects that use those sequences.
>

Right, I feel as others are advocating, it seems better to support it
manually via command and then later we can extend it to do at shutdown
or at some regular intervals. If we do that then we should be able to
support major version upgrades and planned switchover.


-- 
With Regards,
Amit Kapila.




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-06-05 Thread Greg Sabino Mullane
On Wed, Jun 5, 2024 at 9:03 PM Robert Haas  wrote:

>  It's a funny use of "max" and "min", because the max is really what we're
> trying to do and the min is what we end
> up with, and those terms don't necessarily bring those ideas to mind.


requested_protocol_version and minimum_protocol_version?


Re: Better error message when --single is not the first arg to postgres executable

2024-06-05 Thread Greg Sabino Mullane
On Wed, Jun 5, 2024 at 3:18 PM Nathan Bossart 
wrote:

> Could we remove the requirement that --single must be first?  I'm not
> thrilled about adding a list of "must be first" options that needs to stay
> updated, but given this list probably doesn't change too frequently, maybe
> that's still better than a more invasive patch to allow specifying these
> options in any order...
>

It would be nice, and I briefly looked into removing the "first"
requirement, but src/backend/tcop/postgres.c for one assumes that --single
is always argv[1], and it seemed not worth the extra effort to make it work
for argv[N] instead of argv[1]. I don't mind it being the first argument,
but that confusing error message needs to go.

Thanks,
Greg


RE: Pgoutput not capturing the generated columns

2024-06-05 Thread Hayato Kuroda (Fujitsu)
Dear Shlok and Shubham,

Thanks for updating the patch!

I briefly checked v5-0002. IIUC, your patch allows copying generated
columns unconditionally. I think that behavior affects many people, so it
will be hard to get agreement.

Can we add a new option like `GENERATED_COLUMNS [boolean]`? If the default is
set to off, we can keep the current specification.

Thoughts?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: [multithreading] extension compatibility

2024-06-05 Thread Robert Haas
On Wed, Jun 5, 2024 at 10:09 PM Andres Freund  wrote:
> Maybe. I think shipping a mode where users can fairly simply toggle between
> threaded and process mode will allow us to get this stable a *lot* quicker
> than if we distribute two builds. Most users don't build from source, distros
> will have to pick the mode. If they don't choose threaded mode, we'll not find
> problems. If they do choose threaded mode, we can't ask users to switch to a
> process based mode to check if the problem is related.

I don't believe that being coercive here is the right approach. I
think distros see the value in compiling with as many things turned on
as possible; when they ship with something turned off, it's because
it's unstable or introduces weird dependencies or has some other
disadvantage.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [multithreading] extension compatibility

2024-06-05 Thread Andres Freund
Hi,

On 2024-06-05 21:59:42 -0400, Robert Haas wrote:
> On Wed, Jun 5, 2024 at 9:50 PM Andres Freund  wrote:
> > Depending on the architecture / ABI / compiler options it's often not
> > meaningfully more expensive to access a thread local variable than a 
> > "normal"
> > variable.
> >
> > I think these days it's e.g. more expensive on x86-64 windows, but not
> > linux. On arm the overhead of TLS is more noticeable, across platforms,
> > afaict.
> 
> I mean, to me, this still sounds like we want multithreading to be a
> build-time option.

Maybe. I think shipping a mode where users can fairly simply toggle between
threaded and process mode will allow us to get this stable a *lot* quicker
than if we distribute two builds. Most users don't build from source, distros
will have to pick the mode. If they don't choose threaded mode, we'll not find
problems. If they do choose threaded mode, we can't ask users to switch to a
process based mode to check if the problem is related.

We have been talking in a bunch of threads about having a mode where the main
postgres binary chooses a build optimized for the current architecture, to be
able to use SIMD instructions without a runtime check/dispatch. I guess we
could add threadedness to such a matrix...

Greetings,

Andres Freund




Re: [multithreading] extension compatibility

2024-06-05 Thread Robert Haas
On Wed, Jun 5, 2024 at 9:50 PM Andres Freund  wrote:
> Depending on the architecture / ABI / compiler options it's often not
> meaningfully more expensive to access a thread local variable than a "normal"
> variable.
>
> I think these days it's e.g. more expensive on x86-64 windows, but not
> linux. On arm the overhead of TLS is more noticeable, across platforms,
> afaict.

I mean, to me, this still sounds like we want multithreading to be a
build-time option.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [multithreading] extension compatibility

2024-06-05 Thread Andres Freund
Hi,

On 2024-06-05 21:10:01 -0400, Robert Haas wrote:
> On Wed, Jun 5, 2024 at 8:01 PM Heikki Linnakangas  wrote:
> > I'm very much in favor of a runtime toggle. To be precise, a
> > PGC_POSTMASTER setting. We'll get a lot more testing if you can easily
> > turn it on/off, and so far I haven't seen anything that would require it
> > to be a compile time option.
> 
> I was thinking about global variable annotations. If someone wants to
> build without multithreading, I think that they won't want to still
> end up with a ton of variables being changed to thread-local.

Depending on the architecture / ABI / compiler options it's often not
meaningfully more expensive to access a thread local variable than a "normal"
variable.

I think these days it's e.g. more expensive on x86-64 windows, but not
linux. On arm the overhead of TLS is more noticeable, across platforms,
afaict.

Example compiler output for x86-64 and armv8:
https://godbolt.org/z/K369eG5MM

Cycle analysis of the Linux x86-64 output:
https://godbolt.org/z/KK57vM1of

This shows that for the linux x86-64 case there's no difference in efficiency
between the tls/non-tls case.

The reason it's so fast on x86-64 linux is that they reused one of the "old"
segment registers to serve as the index register differing between each
thread.  For x86-64 code, most code is compiled position independent, and
*also* uses an indexed mode (but relative to the instruction pointer).


I think we might be able to gain some small performance benefits via the
annotations, which actually might make it viable to just apply the annotations
regardless of using threads or not.


Greetings,

Andres Freund




Re: ssl tests fail due to TCP port conflict

2024-06-05 Thread Andrew Dunstan



On 2024-06-05 We 17:37, Tom Lane wrote:

Andrew Dunstan  writes:

On 2024-06-05 We 16:00, Alexander Lakhin wrote:

That is, psql from the test instance 001_ssltests_34 opened a
connection to
the test server with the client port 50072 and it made using the port by
the server from the test instance 001_ssltests_30 impossible.

Oh. (kicks self)

D'oh.


Should we really be allocating ephemeral server ports in the range
41952..65535? Maybe we should be looking for an unallocated port
somewhere below 41952, and above, say, 32767, so we couldn't have a
client socket collision.

Hmm, are there really any standards about how these port numbers
are used?

I wonder if we don't need to just be prepared to retry the whole
thing a few times.  Even if it's true that "clients" shouldn't
choose ports below 41952, we still have a small chance of failure
against a non-Postgres server starting up at the wrong time.



Yeah, I think you're right. One thing we should do is be careful to use 
the port as soon as possible after we have picked it, to reduce the 
possibility that something else will use it first.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [multithreading] extension compatibility

2024-06-05 Thread Robert Haas
On Wed, Jun 5, 2024 at 8:01 PM Heikki Linnakangas  wrote:
> I'm very much in favor of a runtime toggle. To be precise, a
> PGC_POSTMASTER setting. We'll get a lot more testing if you can easily
> turn it on/off, and so far I haven't seen anything that would require it
> to be a compile time option.

I was thinking about global variable annotations. If someone wants to
build without multithreading, I think that they won't want to still
end up with a ton of variables being changed to thread-local. So I
think there has to be a build-time option controlling whether this
build supports threading. I suspect there will be other people who
want to just shut all of this experimental code off, which is probably
going to be a second driver for a build-time toggle. But even with
that, we can still have a GUC controlling whether threading is
actually used. Does that make sense to you?
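
Just to illustrate what I mean by annotations, here is a made-up sketch (these
are not proposed macros; both names below are invented):

#include <stdio.h>

/*
 * Hypothetical build-time switch: when threading support is compiled in,
 * per-backend globals get thread-local storage; otherwise they stay plain
 * process-wide globals.  Both the macro and the variable are invented for
 * illustration only.
 */
#ifdef ENABLE_THREADS
#define session_local _Thread_local	/* C11 thread-local storage class */
#else
#define session_local				/* plain global, as today */
#endif

session_local int example_work_mem = 4096;

int
main(void)
{
	/*
	 * In a threaded build each thread would see its own copy of
	 * example_work_mem; in a process build it is a single global.
	 */
	printf("example_work_mem = %d\n", example_work_mem);
	return 0;
}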

Supposing it does, then how does the extension-marking system need to
work? I suppose in this world we don't want any build failures: you're
allowed to build a non-thread-aware extension against a
threading-capable PostgreSQL; you're just not allowed to load the
resulting extension when the server is in threading mode.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-06-05 Thread Robert Haas
On Wed, Jun 5, 2024 at 5:16 PM Jelte Fennema-Nio  wrote:
> I agree longcancelkey=true is not what we want. In my patch 0004, you
> can specify max_protocol_version=latest to use the latest protocol
> version as opt-in. This is a future proof version of
> protocolversion=3.1 that you're proposing, because it will
> automatically start using 3.2 when it comes out. So I think that
> solves your concern here. (although maybe it should be called
> latest-3.x or something, in case we ever want to add a 4.0 protocol,
> naming is hard)
>
> I personally quite like the max_protocol_version connection parameter.
> I think even for testing it is pretty useful to tell libpq what
> protocol version to try to connect as. It could even be accompanied
> with a min_protocol_version, e.g. in case you only want the connection
> attempt to fail when the server does not support this more secure
> cancel key.

This makes some sense to me. I don't think that I believe
max_protocol_version=latest is future-proof: just because I want to
opt into this round of new features doesn't mean I feel the same way
about the next round. But it may still be a good idea.

I suppose the semantics are that we try to connect with the version
specified by max_protocol_version and, if we get downgraded by the
server, we abort if we end up below min_protocol_version. I like those
semantics, and I think I also like having both parameters, but I'm not
100% sure of the naming. It's a funny use of "max" and "min", because
the max is really what we're trying to do and the min is what we end
up with, and those terms don't necessarily bring those ideas to mind.
I don't have a better idea off-hand, though.
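
To spell out those semantics in a toy form (names and the integer encoding of
versions are illustrative only, not libpq's actual code):

#include <stdbool.h>
#include <stdio.h>

/* Illustrative only: the user-supplied bounds from the connection string. */
typedef struct
{
	int		requested;	/* max_protocol_version: what we ask the server for */
	int		minimum;	/* min_protocol_version: lowest version we accept */
} ProtoPrefs;

static bool
protocol_acceptable(ProtoPrefs prefs, int server_supported)
{
	/* The server may downgrade us to the highest version it supports. */
	int		negotiated = (server_supported < prefs.requested) ?
		server_supported : prefs.requested;

	/* Abort the connection if the result falls below the user's floor. */
	return negotiated >= prefs.minimum;
}

int
main(void)
{
	/* Encode protocol 3.x as the integer 30 + x, purely for this example. */
	ProtoPrefs	prefs = {.requested = 32, .minimum = 31};

	printf("server offers 3.2: %s\n", protocol_acceptable(prefs, 32) ? "connect" : "abort");
	printf("server offers 3.0: %s\n", protocol_acceptable(prefs, 30) ? "connect" : "abort");
	return 0;
}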

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: First draft of PG 17 release notes

2024-06-05 Thread Bruce Momjian
On Wed, Jun  5, 2024 at 11:46:17PM +0100, Dean Rasheed wrote:
> On Thu, 9 May 2024 at 05:03, Bruce Momjian  wrote:
> >
> > I have committed the first draft of the PG 17 release notes;  you can
> > see the results here:
> >
> > https://momjian.us/pgsql_docs/release-17.html
> 
> I noticed a couple more things. This item:
> 
>   Add functions to convert integers to hex and binary strings
> 
> should read:
> 
>   Add functions to convert integers to binary and octal strings
> 
> 
> The "Improve psql tab completion" item should include this commit:
> 
> Author: Michael Paquier 
> 2024-05-01 [2800fbb2b] Add tab completion for EXPLAIN (MEMORY|SERIALIZE)
> 
> and credit Jian He.

Agreed, attached patch applied.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index 1131f2aab51..93bc7408b5b 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -1685,7 +1685,7 @@ Author: Nathan Bossart 
 
 
 
-Add functions to convert integers to hex and binary strings (Eric Radman, Nathan Bossart)
+Add functions to convert integers to binary and octal strings (Eric Radman, Nathan Bossart)
 
 
 
@@ -2000,11 +2000,13 @@ Author: Masahiko Sawada 
 2024-04-08 [304b6b1a6] Add more tab completion support for ALTER DEFAULT PRIVIL
 Author: Alexander Korotkov 
 2024-04-30 [60ae37a8b] Add tab completion for partition MERGE/SPLIT operations
+Author: Michael Paquier 
+2024-05-01 [2800fbb2b] Add tab completion for EXPLAIN (MEMORY|SERIALIZE)
 -->
 
 
 
-Improve psql tab completion (Dagfinn Ilmari Mannsåker, Gilles Darold, Christoph Heiss, Steve Chavez, Vignesh C, Pavel Borisov)
+Improve psql tab completion (Dagfinn Ilmari Mannsåker, Gilles Darold, Christoph Heiss, Steve Chavez, Vignesh C, Pavel Borisov, Jian He)
 
 
 


Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-06-05 Thread Michael Paquier
On Tue, May 21, 2024 at 08:33:51AM -0500, Justin Pryzby wrote:
> It occurred to me that psql \dP+ should show the AM of partitioned
> tables (and other partitioned rels).
> Arguably, this could've been done when \dP was introduced in v12, but
> at that point would've shown the AM only for partitioned indexes.
> But it makes a lot of sense to do it now that partitioned tables support
> AMs.  I suggest to consider this for v17.

Not sure that this is a must-have.  It is nice to have, but extra
information is a new feature IMO.  Any extra opinions?

I would suggest attaching a patch, as that makes review easier.  And so
here is one.
--
Michael
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index f67bf0b892..7c9a1f234c 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4113,7 +4113,7 @@ listPartitionedTables(const char *reltypes, const char *pattern, bool verbose)
 	PQExpBufferData title;
 	PGresult   *res;
 	printQueryOpt myopt = pset.popt;
-	bool		translate_columns[] = {false, false, false, false, false, false, false, false, false};
+	bool		translate_columns[] = {false, false, false, false, false, false, false, false, false, false};
 	const char *tabletitle;
 	bool		mixed_output = false;
 
@@ -4181,6 +4181,13 @@ listPartitionedTables(const char *reltypes, const char *pattern, bool verbose)
 
 	if (verbose)
 	{
+		/*
+		 * Table access methods were introduced in v12, and can be set on
+		 * partitioned tables since v17.
+		 */
+		appendPQExpBuffer(&buf, ",\n  am.amname as \"%s\"",
+		  gettext_noop("Access method"));
+
 		if (showNested)
 		{
 			appendPQExpBuffer(&buf,
@@ -4216,6 +4223,9 @@ listPartitionedTables(const char *reltypes, const char *pattern, bool verbose)
 
 	if (verbose)
 	{
+		appendPQExpBufferStr(&buf,
+			 "\n LEFT JOIN pg_catalog.pg_am am ON c.relam = am.oid");
+
 		if (pset.sversion < 12)
 		{
 			appendPQExpBufferStr(&buf,


signature.asc
Description: PGP signature


tiny step toward threading: reduce dependence on setlocale()

2024-06-05 Thread Jeff Davis

There was an unconference session at pgconf.dev related to threading
support. One of the problems identified was setlocale().

The attached series of patches make collation not depend on
setlocale(), even if the database collation uses the libc provider.

Since commit 8d9a9f034e, all supported platforms have locale_t, so we
can use strcoll_l(), etc., or uselocale() when no "_l" variant is
available.
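
As a standalone illustration of the "_l" style (plain POSIX 2008 code, not the
attached pg_locale.c changes themselves): comparisons take an explicit
locale_t, so no process-global locale state set by setlocale() is involved.

#define _POSIX_C_SOURCE 200809L

#include <locale.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	/* Build a locale object; "en_US.UTF-8" is just an example locale name. */
	locale_t	loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK,
								"en_US.UTF-8", (locale_t) 0);

	if (loc == (locale_t) 0)
	{
		perror("newlocale");
		return 1;
	}

	/*
	 * strcoll_l() takes the locale as an argument, so the global locale set
	 * with setlocale() is never consulted -- which is what makes this form
	 * usable from multiple threads.
	 */
	printf("strcoll_l: %d\n", strcoll_l("apple", "Banana", loc));

	freelocale(loc);
	return 0;
}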

A brief test shows that there may be a performance regression for libc
default collations. But if so, I'm not sure that's avoidable if the
goal is to take away setlocale. I'll see if removing the extra branches
mitigates it.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From 9f50c24878740acdd3f1bc036442a0fcc0ea1a5e Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Wed, 5 Jun 2024 11:45:55 -0700
Subject: [PATCH v1 1/5] Make database default collation internal to
 pg_locale.c.

---
 src/backend/utils/adt/pg_locale.c | 64 ++-
 src/backend/utils/init/postinit.c | 35 ++---
 src/include/utils/pg_locale.h |  3 +-
 3 files changed, 66 insertions(+), 36 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 7e5bb2b703..29f16c49cb 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -56,6 +56,7 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
+#include "catalog/pg_database.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
@@ -116,6 +117,8 @@ char	   *localized_full_months[12 + 1];
 /* is the databases's LC_CTYPE the C locale? */
 bool		database_ctype_is_c = false;
 
+static struct pg_locale_struct default_locale;
+
 /* indicates whether locale information cache is valid */
 static bool CurrentLocaleConvValid = false;
 static bool CurrentLCTimeValid = false;
@@ -1443,8 +1446,6 @@ lc_ctype_is_c(Oid collation)
 	return (lookup_collation_cache(collation, true))->ctype_is_c;
 }
 
-struct pg_locale_struct default_locale;
-
 void
 make_icu_collator(const char *iculocstr,
   const char *icurules,
@@ -1537,6 +1538,65 @@ pg_locale_deterministic(pg_locale_t locale)
 		return locale->deterministic;
 }
 
+void
+pg_init_database_collation()
+{
+	HeapTuple	tup;
+	Form_pg_database dbform;
+	Datum		datum;
+	bool		isnull;
+
+	/* Fetch our pg_database row normally, via syscache */
+	tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+	dbform = (Form_pg_database) GETSTRUCT(tup);
+
+	if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
+	{
+		char	*datlocale;
+
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
+		datlocale = TextDatumGetCString(datum);
+
+		builtin_validate_locale(dbform->encoding, datlocale);
+
+		default_locale.info.builtin.locale = MemoryContextStrdup(
+ TopMemoryContext, datlocale);
+	}
+	else if (dbform->datlocprovider == COLLPROVIDER_ICU)
+	{
+		char	*datlocale;
+		char	*icurules;
+
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
+		datlocale = TextDatumGetCString(datum);
+
+		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+		else
+			icurules = NULL;
+
+		make_icu_collator(datlocale, icurules, &default_locale);
+	}
+	else
+	{
+		Assert(dbform->datlocprovider == COLLPROVIDER_LIBC);
+	}
+
+	default_locale.provider = dbform->datlocprovider;
+
+	/*
+	 * Default locale is currently always deterministic.  Nondeterministic
+	 * locales currently don't support pattern matching, which would break a
+	 * lot of things if applied globally.
+	 */
+	default_locale.deterministic = true;
+
+	ReleaseSysCache(tup);
+}
+
 /*
  * Create a locale_t from a collation OID.  Results are cached for the
  * lifetime of the backend.  Thus, do not free the result with freelocale().
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 0805398e24..6347efdd5a 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -423,43 +423,14 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 		strcmp(ctype, "POSIX") == 0)
 		database_ctype_is_c = true;
 
-	if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
-	{
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
-		datlocale = TextDatumGetCString(datum);
-
-		builtin_validate_locale(dbform->encoding, datlocale);
-
-		default_locale.info.builtin.locale = MemoryContextStrdup(
- TopMemoryContext, datlocale);
-	}
-	else if (dbform->datlocprovider == COLLPROVIDER_ICU)
-	{
-		char	   *icurules;
+	pg_init_database_collation();
 
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
+	datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_datlocale, &isnull);
+	if

Re: [multithreading] extension compatibility

2024-06-05 Thread Heikki Linnakangas

On 05/06/2024 23:55, Robert Haas wrote:

On Wed, Jun 5, 2024 at 4:32 PM Tristan Partin  wrote:

Not entirely sure how I feel about the approach you've taken, but here
is a patch that Heikki and I put together for extension compatibility.
It's not a build time solution, but a runtime solution. Instead of
PG_MODULE_MAGIC, extensions would use PG_MAGIC_MODULE_REENTRANT. There
is a GUC called `multithreaded` which controls the variable
IsMultithreaded. We operated under the assumption that being able to
toggle multithreading and multi-processing without recompiling has
value.


That's interesting, because I thought Heikki was against having a
runtime toggle.


I'm very much in favor of a runtime toggle. To be precise, a 
PGC_POSTMASTER setting. We'll get a lot more testing if you can easily 
turn it on/off, and so far I haven't seen anything that would require it 
to be a compile time option.



I don't think PG_MODULE_MAGIC_REENTRANT is a good syntax. It all looks
great as long as we only ever need the PG_MODULE_MAGIC line to signal
one bit of information, but as soon as you get to two bits it's pretty
questionable, and anything more than two bits is insane. If we want to
do something with the PG_MODULE_MAGIC line, I think it should involve
options-passing of some form rather than just having an alternate
macro name.


+1, that would be nicer.
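
Something along these lines, purely as a sketch (none of these names exist in
PostgreSQL today; the struct, macro, and fields are invented for illustration):

#include <stdbool.h>
#include <stdio.h>

/* Invented options struct: new capability bits extend the struct instead of
 * spawning new macro names. */
typedef struct PgModuleOptions
{
	int		api_version;	/* ABI the module was compiled against */
	bool	reentrant;		/* safe to load when running in threaded mode */
} PgModuleOptions;

#define PG_MODULE_MAGIC_OPTS(...) \
	static const PgModuleOptions Pg_module_options = { __VA_ARGS__ }

/* An extension source file would then declare something like: */
PG_MODULE_MAGIC_OPTS(.api_version = 1, .reentrant = true);

int
main(void)
{
	printf("api_version=%d reentrant=%d\n",
		   Pg_module_options.api_version, (int) Pg_module_options.reentrant);
	return 0;
}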

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Test to dump and restore objects left behind by regression

2024-06-05 Thread Michael Paquier
On Wed, Jun 05, 2024 at 05:09:58PM +0530, Ashutosh Bapat wrote:
> Thanks for the suggestion. I didn't understand the dependency with the
> buildfarm module. Will the new module be used in buildfarm separately? I
> will work on this soon. Thanks for changing CF entry to WoA.

I had some doubts about PGBuild/Modules/TestUpgradeXversion.pm, but
after double-checking, it dynamically loads AdjustUpgrade from the core
tree based on the base path where all the modules are:
# load helper module from source tree
unshift(@INC, "$srcdir/src/test/perl");
require PostgreSQL::Test::AdjustUpgrade;
PostgreSQL::Test::AdjustUpgrade->import;
shift(@INC);

It would be annoying to tweak the buildfarm code more to have a
different behavior depending on the branch of Postgres tested.
Anyway, from what I can see, you could create a new module with the
dump filtering rules that AdjustUpgrade requires without having to
update the buildfarm code.

> Changing the physical order of column of a child table based on the
> inherited table seems intentional as per MergeAttributes(). That logic
> looks sane by itself. In binary mode pg_dump works very hard to retain the
> column order by issuing UPDATE commands against catalog tables. I don't
> think mimicking that behaviour is the right choice for non-binary dump. I
agree with your conclusion that we fix it by fixing the diffs. The code
> to do that will be part of a separate module.

Thanks.
--
Michael


signature.asc
Description: PGP signature


Re: First draft of PG 17 release notes

2024-06-05 Thread Dean Rasheed
On Thu, 9 May 2024 at 05:03, Bruce Momjian  wrote:
>
> I have committed the first draft of the PG 17 release notes;  you can
> see the results here:
>
> https://momjian.us/pgsql_docs/release-17.html

I noticed a couple more things. This item:

  Add functions to convert integers to hex and binary strings

should read:

  Add functions to convert integers to binary and octal strings


The "Improve psql tab completion" item should include this commit:

Author: Michael Paquier 
2024-05-01 [2800fbb2b] Add tab completion for EXPLAIN (MEMORY|SERIALIZE)

and credit Jian He.

Regards,
Dean




Re: [multithreading] extension compatibility

2024-06-05 Thread Tristan Partin

On Wed Jun 5, 2024 at 3:56 PM CDT, Robert Haas wrote:

On Wed, Jun 5, 2024 at 4:32 PM Tristan Partin  wrote:
> Not entirely sure how I feel about the approach you've taken, but here
> is a patch that Heikki and I put together for extension compatibility.
> It's not a build time solution, but a runtime solution. Instead of
> PG_MODULE_MAGIC, extensions would use PG_MAGIC_MODULE_REENTRANT. There
> is a GUC called `multithreaded` which controls the variable
> IsMultithreaded. We operated under the assumption that being able to
> toggle multithreading and multi-processing without recompiling has
> value.

That's interesting, because I thought Heikki was against having a
runtime toggle.

I don't think PG_MODULE_MAGIC_REENTRANT is a good syntax. It all looks
great as long as we only ever need the PG_MODULE_MAGIC line to signal
one bit of information, but as soon as you get to two bits it's pretty
questionable, and anything more than two bits is insane. If we want to
do something with the PG_MODULE_MAGIC line, I think it should involve
options-passing of some form rather than just having an alternate
macro name.


I agree that this method doesn't lend itself to future extensibility.

--
Tristan Partin
https://tristan.partin.io




Re: ssl tests fail due to TCP port conflict

2024-06-05 Thread Tom Lane
Andrew Dunstan  writes:
> On 2024-06-05 We 16:00, Alexander Lakhin wrote:
>> That is, psql from the test instance 001_ssltests_34 opened a 
>> connection to
>> the test server with the client port 50072 and it made using the port by
>> the server from the test instance 001_ssltests_30 impossible.

> Oh. (kicks self)

D'oh.

> Should we really be allocating ephemeral server ports in the range 
> 41952..65535? Maybe we should be looking for an unallocated port 
> somewhere below 41952, and above, say, 32767, so we couldn't have a 
> client socket collision.

Hmm, are there really any standards about how these port numbers
are used?

I wonder if we don't need to just be prepared to retry the whole
thing a few times.  Even if it's true that "clients" shouldn't
choose ports below 41952, we still have a small chance of failure
against a non-Postgres server starting up at the wrong time.

regards, tom lane




Re: ssl tests fail due to TCP port conflict

2024-06-05 Thread Andrew Dunstan



On 2024-06-05 We 16:49, Andrew Dunstan wrote:


On 2024-06-05 We 16:00, Alexander Lakhin wrote:

Hello Andrew,

05.06.2024 21:10, Andrew Dunstan wrote:


I think I see what's going on here. It looks like it's because we 
start the server in unix socket mode, and then switch to using TCP 
as well.


Can you try your test with this patch applied and see if the problem 
persists? If we start in TCP mode the framework should test for a 
port clash.




It seems that the failure rate decreased (I guess the patch rules out the
case with two servers choosing the same port), but I still got:

16/53 postgresql:ssl / ssl/001_ssltests_36 OK 15.25s 205 subtests passed
17/53 postgresql:ssl / ssl/001_ssltests_30 ERROR 3.17s (exit status 255 or signal 127 SIGinvalid)


2024-06-05 19:40:37.395 UTC [414110] LOG:  starting PostgreSQL 17beta1 on x86_64-linux, compiled by gcc-13.2.1, 64-bit
2024-06-05 19:40:37.395 UTC [414110] LOG:  could not bind IPv4 address "127.0.0.1": Address already in use
2024-06-05 19:40:37.395 UTC [414110] HINT:  Is another postmaster already running on port 50072? If not, wait a few seconds and retry.


`grep '\b50072\b' -r testrun/` yields:
testrun/ssl/001_ssltests_34/log/001_ssltests_34_primary.log:2024-06-05 19:40:37.392 UTC [414111] [unknown] LOG:  connection received: host=localhost port=50072

(a psql case)

That is, psql from the test instance 001_ssltests_34 opened a connection to
the test server with the client port 50072 and it made using the port by
the server from the test instance 001_ssltests_30 impossible.



Oh. (kicks self)

Should we really be allocating ephemeral server ports in the range 
41952..65535? Maybe we should be looking for an unallocated port 
somewhere below 41952, and above, say, 32767, so we couldn't have a 
client socket collision.



Except that I see this on my Ubuntu instance:

$ sudo sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 32768    60999

And indeed I see client sockets in that range.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-06-05 Thread Jelte Fennema-Nio
On Wed, 5 Jun 2024 at 22:48, Robert Haas  wrote:
> I agree that we need such a mechanism, but if the driving feature is
> cancel-key length, I expect that opt-in isn't going to work very well.
> Opt-in seems like it would work well with compression or transparent
> column encryption, but few users will specify a connection string
> option just to get a longer cancel key length, so the feature won't
> get much use if we do it that way.

I know Neon wants to make use of this for their proxy (to encode some
tenant_id into the key). So they might want to require people to
opt-in when using their proxy.

> I won't be crushed if we decide to
> somehow make it opt-in, but I kind of doubt that will happen.

> Would we
> make everyone add longcancelkey=true to all their connection strings
> for one release and then carry that connection parameter until the
> heat death of the universe even though after the 1-release transition
> period there would be no reason to ever use it? Would we rip the
> parameter back out after the transition period and break existing
> connection strings? Would we have people write protocolversion=3.1 to
> opt in and then they could just keep that in the connection string
> without harm, or at least without harm until 3.2 comes out? I don't
> really like any of these options that well.

I agree longcancelkey=true is not what we want. In my patch 0004, you
can specify max_protocol_version=latest to use the latest protocol
version as opt-in. This is a future proof version of
protocolversion=3.1 that you're proposing, because it will
automatically start using 3.2 when it comes out. So I think that
solves your concern here. (although maybe it should be called
latest-3.x or something, in case we ever want to add a 4.0 protocol,
naming is hard)

I personally quite like the max_protocol_version connection parameter.
I think even for testing it is pretty useful to tell libpq what
protocol version to try to connect as. It could even be accompanied
with a min_protocol_version, e.g. in case you only want the connection
attempt to fail when the server does not support this more secure
cancel key.




Re: Addressing SECURITY DEFINER Function Vulnerabilities in PostgreSQL Extensions

2024-06-05 Thread Jeff Davis
On Wed, 2024-06-05 at 14:36 +0530, Ashutosh Sharma wrote:
> Thank you, Ashutosh, for the quick response. I've drafted a patch
> aimed at addressing this issue. The patch attempts to solve this
> issue by configuring the search_path for all security definer
> functions created by the extension.

I like the general direction you propose, but I think it needs more
discussion about the details.

* What exactly is the right search_path for a function defined in an
extension?

* Do we need a new magic search_path value of "$extension_schema" that
resolves to the extension's schema, so that it can handle ALTER
EXTENSION ... SET SCHEMA?

* What do we do for functions that want the current behavior and how do
we handle migration issues?

* What about SECURITY INVOKER functions? Those can still be vulnerable
to manipulation by the caller by setting search_path, which can cause
an incorrect value to be returned. That can matter in some contexts
like a CHECK constraint.
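
As a rough illustration of that last point (all object names here are
invented, not taken from the patch), a caller-controlled search_path can
change what an unqualified call inside a SECURITY INVOKER SQL function
resolves to:

CREATE SCHEMA attacker;
CREATE FUNCTION attacker.upper(text) RETURNS text
    LANGUAGE sql AS $$ SELECT 'SPOOFED' $$;

-- SECURITY INVOKER (the default), and no SET search_path clause:
CREATE FUNCTION is_ok(code text) RETURNS boolean
    LANGUAGE sql AS $$ SELECT upper(code) = 'OK' $$;

SET search_path = attacker, public, pg_catalog;
SELECT is_ok('ok');   -- false: attacker.upper() is chosen over pg_catalog.upper()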

Regards,
Jeff Davis





Re: SQL:2011 application time

2024-06-05 Thread Paul Jungwirth

On Thu, May 9, 2024 at 5:44 PM Matthias van de Meent 
 wrote:
> Additionally, because I can't create my own non-constraint-backing
> unique GIST indexes, I can't pre-create my unique constraints
> CONCURRENTLY as one could do for the non-temporal case

We talked about this a bit at pgconf.dev. I would like to implement it, since I agree it is an 
important workflow to support. Here are some thoughts about what would need to be done.


First we could take a small step: allow non-temporal UNIQUE GiST indexes. This is possible according 
to [1], but in the past we had no way of knowing which strategy number an opclass was using for 
equality. With the stratnum support proc introduced by 6db4598fcb (reverted for v17), we could 
change amcanunique to true for the GiST AM handler. If the index's opclasses had that sproc and it 
gave non-zero for RTEqualStrategyNumber, we would have a reliable "definition of uniqueness". UNIQUE 
GiST indexes would raise an error if they detected a duplicate record.


Incidentally, this would also let us correct the error message about GiST not supporting unique, 
fixing the problem you raised here:


On Sun, May 12, 2024 at 8:51 AM Paul Jungwirth  
wrote:
>
> On 5/12/24 05:55, Matthias van de Meent wrote:
> >>   > pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, 
valid_during);
> >>   > ERROR:  access method "gist" does not support unique indexes
> >>
> >> To me that error message seems correct. The programmer hasn't said 
anything about the special
> >> temporal behavior they are looking for.
> >
> > But I showed that I had a GIST index that does have the indisunique
> > flag set, which shows that GIST does support indexes with unique
> > semantics.
> >
> > That I can't use CREATE UNIQUE INDEX to create such an index doesn't
> > mean the feature doesn't exist, which is what the error message
> > implies.
>
> True, the error message is not really telling the truth anymore.

But that is just regular non-temporal indexes. To avoid a long table lock you'd need a way to build 
the index that is not just unique, but also does exclusion based on &&.  We could borrow syntax from 
SQL:2011 and allow `CREATE INDEX idx ON t (id, valid_at WITHOUT OVERLAPS)`. But since CREATE INDEX 
is a lower-level concept than a constraint, it'd be better to do something more general. You can 
already give opclasses for each indexed column. How about allowing operators as well? For instance 
`CREATE UNIQUE INDEX idx ON t (id WITH =, valid_at WITH &&)`? Then the index would know to enforce 
those rules. This is the same data we store today in pg_constraint.conexclops. So that would get 
moved/copied to pg_index (probably moved).


Then when you add the constraint, what is the syntax? Today when you say PRIMARY KEY/UNIQUE USING 
INDEX, you don't give the column names. So how do we know it's WITHOUT OVERLAPS? I guess if the 
underlying index has (foo WITH = [, bar WITH =], baz WITH &&) we just assume the user wants WITHOUT 
OVERLAPS, and otherwise they want a regular PK/UQ constraint?
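
As a rough before-and-after sketch of the workflow discussed in the last two paragraphs (table and
constraint names are invented, and the CONCURRENTLY/WITH-operator lines are purely hypothetical
syntax, not implemented anywhere):

-- Today: the operators live in the exclusion constraint, and adding it
-- builds the GiST index under a strong table lock.
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE reservation (id int, valid_at daterange);
ALTER TABLE reservation ADD CONSTRAINT reservation_excl
    EXCLUDE USING gist (id WITH =, valid_at WITH &&);

-- Sketch of the proposal: put the operators on the index itself, so it could
-- be built with CREATE INDEX CONCURRENTLY and attached to a constraint later.
-- CREATE UNIQUE INDEX CONCURRENTLY reservation_idx
--     ON reservation (id WITH =, valid_at WITH &&);
-- ALTER TABLE reservation ADD CONSTRAINT reservation_pk
--     PRIMARY KEY USING INDEX reservation_idx;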


In addition this workflow only works if you can CREATE INDEX CONCURRENTLY. I'm not sure yet if we'll 
have problems there. I noticed that for REINDEX at least, there were plans in 2012 to support 
exclusion-constraint indexes,[2] but when the patch was committed in 2019 they had been dropped, 
with plans to add support eventually.[3] Today they are still not supported. Maybe whatever caused 
problems for REINDEX isn't an issue for just INDEX, but it would take more research to find out.


[1] https://dsf.berkeley.edu/papers/sigmod97-gist.pdf
[2] Original patch thread from 2012: 
https://www.postgresql.org/message-id/flat/CAB7nPqS%2BWYN021oQHd9GPe_5dSVcVXMvEBW_E2AV9OOEwggMHw%40mail.gmail.com#e1a372074cfdf37bf9e5b4e29ddf7b2d
[3] Revised patch thread, committed in 2019: 
https://www.postgresql.org/message-id/flat/60052986-956b-4478-45ed-8bd119e9b9cf%402ndquadrant.com#74948a1044c56c5e817a5050f554ddee


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: SQL:2011 application time

2024-06-05 Thread Paul Jungwirth

On 5/21/24 11:27, Isaac Morland wrote:

On Tue, 21 May 2024 at 13:57, Robert Haas  wrote:

What I think is less clear is what that means for temporal primary
keys. As Paul pointed out upthread, in every other case, a temporal
primary key is at least as unique as a regular primary key, but in
this case, it isn't. And someone might reasonably think that a
temporal primary key should exclude empty ranges just as all primary
keys exclude nulls. Or they might think the opposite.


Fascinating. I think you're absolutely right that it's clear that two empty intervals don't 
conflict. If somebody wants to claim two intervals conflict, they need to point to at least one 
instant in time that is common between them.


But a major point of a primary key, it seems to me, is that it uniquely identifies a row. If items 
are identified by a time range, non-overlapping or not, then the empty range can only identify one 
item (per value of whatever other columns are in the primary key). I think for a unique key the 
non-overlapping restriction has to be considered an additional restriction on top of the usual 
uniqueness restriction.


I suspect in many applications there will be a non-empty constraint; for example, it seems quite 
reasonable to me for a meeting booking system to forbid empty meetings. But when they are allowed 
they should behave in the mathematically appropriate way.


Finding a way forward for temporal PKs got a lot of discussion at pgconf.dev (thanks especially to 
Peter Eisentraut and Jeff Davis!), so I wanted to summarize some options and describe what I think 
is the best approach.


First the problem: empty ranges! A temporal PK/UNIQUE constraint is basically an exclusion 
constraint that is `(id WITH =, valid_at WITH &&)`. But the special 'empty' value never overlaps 
anything, *including itself*. (Note it has no "position": [3,3) is the same as [4,4).) Since the 
exclusion constraint forbids overlapping ranges, and empties never overlap, your table can have 
duplicates. (I'm talking about "literal uniqueness" as discussed in [1].) For instance:


CREATE EXTENSION btree_gist;
CREATE TABLE t (id int, valid_at daterange, name text);
ALTER TABLE t ADD CONSTRAINT tpk PRIMARY KEY (id, valid_at WITHOUT 
OVERLAPS);
INSERT INTO t VALUES (1, 'empty', 'foo');
INSERT INTO t VALUES (1, 'empty', 'bar');
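
Both inserts above succeed because 'empty' never overlaps anything. A couple of quick checks make
that concrete:

SELECT 'empty'::daterange && 'empty'::daterange;        -- false
SELECT 'empty'::daterange && daterange(NULL, NULL);     -- false
SELECT '[2024-01-03,2024-01-03)'::daterange;            -- empty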

Multiranges have the same problem. So what do we do about that?

**Option 0**: Allow it but document it. It shouldn't happen in practice: there is no reason for an 
empty range to get into a temporal table, and it arguably doesn't mean anything. The record is true 
at no time? But of course it will happen anyway. It's a footgun and will break expectations for at 
least some.


It causes problems for us too. If you say `SELECT name FROM t GROUP BY id, valid_at`, we recognize 
that `name` is a functional dependency on the PK, so we allow it and give you the first row matching 
each key. You might get "foo" or you might get "bar". Also the planner uses not-nullable uniqueness 
to take many shortcuts. I couldn't create any concrete breakage there, but I bet someone else could. 
PKs that are not literally unique seems like something that would cause headaches for years.
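
For example, using the table above (both rows share the same key value, so the query is accepted
under the functional-dependency rule, but the name returned is arbitrary):

SELECT name FROM t GROUP BY id, valid_at;   -- returns 'foo' or 'bar', unpredictably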


**Option 1**: Temporal PKs should automatically create a CHECK constraint that forbids empty ranges. 
Should UNIQUE constraints too? I'm tempted to say no, since sometimes users surprise us by coming up 
with new ways to use things. For instance one way to use empty ranges is to reference a temporal 
table from a non-temporal table, since `'empty' <@ anything` is always true (though this has 
questionable meaning or practical use). But probably we should forbid empties for UNIQUE constraints 
too. Forbidding them is more aligned with the SQL standard, which says that when you have a PERIOD, 
startcol < endcol (not <=). And it feels more consistent to treat both constraints the same way. 
Finally, if UNIQUEs do allow empties, we still risk confusing our planner.
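
For illustration, the automatic constraint in Option 1 would amount to roughly this hand-written DDL
against the example table above (the constraint name is invented):

DELETE FROM t WHERE isempty(valid_at);   -- clear the duplicates inserted earlier
ALTER TABLE t ADD CONSTRAINT t_valid_at_not_empty
    CHECK (NOT isempty(valid_at));
INSERT INTO t VALUES (1, 'empty', 'baz');   -- now rejected by the CHECK constraint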


My last patch created these CHECK constraints for PKs (but not UNIQUEs) as INTERNAL dependencies. 
It's pretty clunky. There are lots of cases to handle, e.g. `ALTER COLUMN c TYPE` may reuse the PK 
index or may generate a new one. And what if the user already created the same constraint? Seeing 
all the trouble giving PKs automatic (cataloged) NOT NULL constraints makes me wary about this 
approach. It's not as bad, since there is no legacy, but it's still more annoying than I expected.


Finally, hanging the CHECK constraint off the PK sets us up for problems when we add true PERIODs. 
Under 11.27 of SQL/Foundation, General Rules 2b says that defining a PERIOD should automatically add 
a CHECK constraint that startcol < endcol. That is already part of my last patch in this series. But 
that would be redundant with the constraint from the PK. And attaching the constraint to the PERIOD 
is a lot simpler than attach

Re: [multithreading] extension compatibility

2024-06-05 Thread Robert Haas
On Wed, Jun 5, 2024 at 4:32 PM Tristan Partin  wrote:
> Not entirely sure how I feel about the approach you've taken, but here
> is a patch that Heikki and I put together for extension compatibility.
> It's not a build time solution, but a runtime solution. Instead of
> PG_MODULE_MAGIC, extensions would use PG_MODULE_MAGIC_REENTRANT. There
> is a GUC called `multithreaded` which controls the variable
> IsMultithreaded. We operated under the assumption that being able to
> toggle multithreading and multi-processing without recompiling has
> value.

That's interesting, because I thought Heikki was against having a
runtime toggle.

I don't think PG_MODULE_MAGIC_REENTRANT is a good syntax. It all looks
great as long as we only ever need the PG_MODULE_MAGIC line to signal
one bit of information, but as soon as you get to two bits it's pretty
questionable, and anything more than two bits is insane. If we want to
do something with the PG_MODULE_MAGIC line, I think it should involve
options-passing of some form rather than just having an alternate
macro name.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: ssl tests fail due to TCP port conflict

2024-06-05 Thread Andrew Dunstan



On 2024-06-05 We 16:00, Alexander Lakhin wrote:

Hello Andrew,

05.06.2024 21:10, Andrew Dunstan wrote:


I think I see what's going on here. It looks like it's because we 
start the server in unix socket mode, and then switch to using TCP as 
well.


Can you try your test with this patch applied and see if the problem 
persists? If we start in TCP mode the framework should test for a 
port clash.




It seems that the failure rate decreased (I guess the patch rules out the
case with two servers choosing the same port), but I still got:

16/53 postgresql:ssl / ssl/001_ssltests_36   OK      15.25s   205 subtests passed
17/53 postgresql:ssl / ssl/001_ssltests_30   ERROR    3.17s   (exit status 255 or signal 127 SIGinvalid)


2024-06-05 19:40:37.395 UTC [414110] LOG:  starting PostgreSQL 17beta1 
on x86_64-linux, compiled by gcc-13.2.1, 64-bit
2024-06-05 19:40:37.395 UTC [414110] LOG:  could not bind IPv4 address 
"127.0.0.1": Address already in use
2024-06-05 19:40:37.395 UTC [414110] HINT:  Is another postmaster 
already running on port 50072? If not, wait a few seconds and retry.


`grep '\b50072\b' -r testrun/` yields:
testrun/ssl/001_ssltests_34/log/001_ssltests_34_primary.log:2024-06-05 
19:40:37.392 UTC [414111] [unknown] LOG:  connection received: 
host=localhost port=50072

(a psql case)

That is, psql from the test instance 001_ssltests_34 opened a connection to
the test server with client port 50072, which made it impossible for the
server from the test instance 001_ssltests_30 to use that port.



Oh. (kicks self)

Should we really be allocating ephemeral server ports in the range
49152..65535? Maybe we should be looking for an unallocated port
somewhere below 49152, and above, say, 32767, so we couldn't have a
client socket collision.



cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-06-05 Thread Robert Haas
On Wed, Jun 5, 2024 at 1:50 PM Jelte Fennema-Nio  wrote:
> Totally agreed, and that's easily achievable because of the
> NegotiateProtocolVersion message. We're all good on v18 to v17
> connections and protocol changes, as long as we make any new behaviour
> depend on either a protocol parameter or a protocol version bump.

Cool.

> I think at minimum we'll need a mechanism for people to connect using
> a StartupMessage that doesn't break existing poolers. If users have no
> way to connect at all to their semi-recently installed connection
> pooler with the newest libpq, then I expect we'll have many unhappy
> users. I think it's debatable whether the compatible StartupMessage
> should be opt-in or opt-out. I'd personally at minimum want to wait
> one PG release before using the incompatible StartupMessage by
> default, just to give pooler installs some time to catch up.

I agree that we need such a mechanism, but if the driving feature is
cancel-key length, I expect that opt-in isn't going to work very well.
Opt-in seems like it would work well with compression or transparent
column encryption, but few users will specify a connection string
option just to get a longer cancel key length, so the feature won't
get much use if we do it that way. I won't be crushed if we decide to
somehow make it opt-in, but I kind of doubt that will happen. Would we
make everyone add longcancelkey=true to all their connection strings
for one release and then carry that connection parameter until the
heat death of the universe even though after the 1-release transition
period there would be no reason to ever use it? Would we rip the
parameter back out after the transition period and break existing
connection strings? Would we have people write protocolversion=3.1 to
opt in and then they could just keep that in the connection string
without harm, or at least without harm until 3.2 comes out? I don't
really like any of these options that well.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [multithreading] extension compatibility

2024-06-05 Thread Jelte Fennema-Nio
On Wed, 5 Jun 2024 at 22:05, Robert Haas  wrote:
> The attached patch is a sketch of one possible approach: PostgreSQL
> signals whether it is multithreaded by defining or not defining
> PG_MULTITHREADING in pg_config_manual.h, and an extension signals
> thread-readiness by defining PG_THREADSAFE_EXTENSION before including
> any PostgreSQL headers other than postgres.h.

My first gut-reaction: It seems kinda annoying to have to do this for
every C file that you use to build your extension, e.g. citus or postgis
have a ton of those.

PG_MODULE_MAGIC seems like a better fit imho.

If we really want a compile time failure, then I think I'd prefer to
have a new postgres.h file (e.g. postgres-thread.h) that you would
include instead of plain postgres.h. Basically this file could then
contain the below two lines:

#include "postgres.h"
#define PG_THREADSAFE_EXTENSION 1




Re: problems with "Shared Memory and Semaphores" section of docs

2024-06-05 Thread Nathan Bossart
On Mon, Jun 03, 2024 at 02:04:19PM -0500, Nathan Bossart wrote:
> Of course, as soon as I committed this, I noticed another missing reference
> to max_wal_senders in the paragraph about POSIX semaphores.  I plan to
> commit/back-patch the attached patch within the next couple days.

Committed.

-- 
nathan




Re: [multithreading] extension compatibility

2024-06-05 Thread Tristan Partin

On Wed Jun 5, 2024 at 3:05 PM CDT, Robert Haas wrote:

...

== Extension Compatibility Solutions ==

The attached patch is a sketch of one possible approach: PostgreSQL
signals whether it is multithreaded by defining or not defining
PG_MULTITHREADING in pg_config_manual.h, and an extension signals
thread-readiness by defining PG_THREADSAFE_EXTENSION before including
any PostgreSQL headers other than postgres.h. If PostgreSQL is built
multithreaded and the extension does not signal thread-safety, you get
something like this:

../pgsql/src/test/modules/dummy_seclabel/dummy_seclabel.c:20:1: error:
static assertion failed due to requirement '1 == 0': must define
PG_THREADSAFE_EXTENSION or use unthreaded PostgreSQL
PG_MODULE_MAGIC;

I'm not entirely happy with this solution because the results are
confusing if PG_THREADSAFE_EXTENSION is declared after including
fmgr.h. Perhaps this can be adequately handled by documenting and
demonstrating the right pattern, or maybe somebody has a better idea.

Another idea I considered was to replace the PG_MODULE_MAGIC;
declaration with something that allows for arguments, like
PG_MODULE_MAGIC(.process_model = false, .thread_model = true). But on
further reflection, that seems like the wrong thing. AFAICS, that's
going to tell you at runtime about something that you really want to
know at compile time. But this kind of idea might need more thought if
we decide that the *same* build of PostgreSQL can either launch
processes or threads per session, because then we'd need to know which
extensions were available in whichever mode applied to the current
session.


Not entirely sure how I feel about the approach you've taken, but here 
is a patch that Heikki and I put together for extension compatibility. 
It's not a build time solution, but a runtime solution. Instead of 
PG_MODULE_MAGIC, extensions would use PG_MODULE_MAGIC_REENTRANT. There 
is a GUC called `multithreaded` which controls the variable 
IsMultithreaded. We operated under the assumption that being able to 
toggle multithreading and multi-processing without recompiling has 
value.


--
Tristan Partin
https://tristan.partin.io
From 60b0225d8e82d412237297710a7f007b006a7773 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Thu, 31 Aug 2023 11:08:01 -0500
Subject: [PATCH] Add PG_MODULE_MAGIC_REENTRANT

Extensions should use this if and only if they support reentrancy. If
the multithreaded GUC is set to true, then we can only load extensions
which support reentrancy.
---
 src/backend/utils/fmgr/dfmgr.c | 14 +++---
 src/include/fmgr.h | 23 +++
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c
index 6ee9053044917..c2e6c22127067 100644
--- a/src/backend/utils/fmgr/dfmgr.c
+++ b/src/backend/utils/fmgr/dfmgr.c
@@ -85,7 +85,7 @@ static char *substitute_libpath_macro(const char *name);
 static char *find_in_dynamic_libpath(const char *basename);
 
 /* Magic structure that module needs to match to be accepted */
-static static_singleton const Pg_magic_struct magic_data = PG_MODULE_MAGIC_DATA;
+static static_singleton const Pg_magic_struct magic_data = PG_MODULE_MAGIC_REENTRANT_DATA;
 
 
 /*
@@ -255,8 +255,9 @@ internal_load_library(const char *libname)
 		{
 			const Pg_magic_struct *magic_data_ptr = (*magic_func) ();
 
+			/* We will check reentrancy later. */
 			if (magic_data_ptr->len != magic_data.len ||
-				memcmp(magic_data_ptr, &magic_data, magic_data.len) != 0)
+				memcmp(magic_data_ptr, &magic_data, offsetof(Pg_magic_struct, reentrant)) != 0)
 			{
 				/* copy data block before unlinking library */
 				Pg_magic_struct module_magic_data = *magic_data_ptr;
@@ -268,6 +269,13 @@ internal_load_library(const char *libname)
 				/* issue suitable complaint */
 				incompatible_module_error(libname, &module_magic_data);
 			}
+
+			if (IsMultiThreaded && !magic_data_ptr->reentrant)
+				ereport(ERROR,
+						(errmsg("incompatible library \"%s\": not reentrant",
+								libname),
+						 errhint("Extension libraries must use the PG_MODULE_MAGIC_REENTRANT macro to be loaded "
+								 "when multithreading is turned on.")));
 		}
 		else
 		{
@@ -278,7 +286,7 @@ internal_load_library(const char *libname)
 			ereport(ERROR,
 					(errmsg("incompatible library \"%s\": missing magic block",
 							libname),
-					 errhint("Extension libraries are required to use the PG_MODULE_MAGIC macro.")));
+					 errhint("Extension libraries are required to use the PG_MODULE_MAGIC or PG_MODULE_MAGIC_REENTRANT macros.")));
 		}
 
 		/*
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index 3ecb98a907d33..b49901eacae10 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -470,6 +470,7 @@ typedef struct
 	int			namedatalen;	/* NAMEDATALEN */
 	int			float8byval;	/* FLOAT8PASSBYVAL */
 	char		abi_extra[32];	/* see pg_config_manual.h */
+	bool		reentrant;		/* Whether the extension supports reentrancy */
 } Pg_magic

Re: Extension security improvement: Add support for extensions with an owned schema

2024-06-05 Thread Jelte Fennema-Nio
On Wed, 5 Jun 2024 at 19:53, Jeff Davis  wrote:
> Is this orthogonal to relocatability?

It's fairly orthogonal, but it has some impact on relocatability: You
can only relocate to a schema name that does not exist yet (while
currently you can only relocate to a schema that already exists). This
is because, for owned_schema=true, relocation is not actually changing
the schema of the extension objects, it only renames the existing
schema to the new name.
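
Concretely (the extension name is hypothetical, and owned_schema itself is only the
proposed option being discussed here):

ALTER EXTENSION my_owned_ext SET SCHEMA new_home;
-- with owned_schema = true this would rename the extension's schema to
-- new_home, and would fail if a schema named new_home already exists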

> When you say "an easy way to use a safe search_path": the CREATE
> EXTENSION command already sets the search_path, so your patch just
> ensures that it's empty (and therefore safe) first, right?

Correct: **safe** is the key word in that sentence. Without
owned_schema, you get an **unsafe** search_path by default unless you
go out of your way to set "schema=pg_catalog" in the control file.

> Should we go further and try to prevent creating objects in an
> extension-owned schema with normal SQL?

That would be nice for sure, but security wise it doesn't matter
afaict. Only the creator of the extension would be able to add stuff
in the extension-owned schema anyway, so there's no privilege
escalation concern there.

> Approximately how many extensions should be adjusted to use
> owned_schema=true?

Adjusting existing extensions would be hard at the moment, because the
current patch does not introduce a migration path. But basically I
think for most new extension installs (even of existing extensions) it
would be fine if owned_schema=true would be the default. I didn't
propose (yet) to make it the default though, to avoid discussing the
tradeoff of security vs breaking installation for an unknown amount of
existing extensions.

I think having a generic migration path would be hard, due to the many
ways in which extensions can now be installed. But I think we might be
able to add one fairly easily for relocatable extensions: e.g. "ALTER
EXTENSION SET SCHEMA new_schema OWNED_SCHEMA", which would essentially
do CREATE SCHEMA new_schema + move all objects from old_schema to
new_schema. And even for non-relocatable one you could do something
like:

CREATE SCHEMA temp_schema_{random_id};
-- move all objects from ext_schema to temp_schema_{random_id};
DROP SCHEMA ext_schema; -- if this fails, ext_schema was not empty
ALTER SCHEMA temp_schema_{random_id} RENAME TO ext_schema;
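
For what it's worth, the "move all objects" placeholder above could be spelled with
ordinary ALTER ... SET SCHEMA statements; a rough sketch for two hypothetical member
objects, using 12345 to stand in for the random id (a real migration would need to
enumerate the extension's members from the catalogs, and assumes ALTER ... SET SCHEMA
is allowed on them):

ALTER TABLE ext_schema.some_table SET SCHEMA temp_schema_12345;
ALTER FUNCTION ext_schema.some_function(int) SET SCHEMA temp_schema_12345;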

> What are the reasons an extension would not want to
> own the schema in which the objects are created? I assume some would
> still create objects in pg_catalog, but ideally we'd come up with a
> better solution to that as well.

Some extensions depend on putting stuff into the public schema. But
yeah it would be best if they didn't.

> This protects the extension script, but I'm left wondering if we could
> do something here to make it easier to protect extension functions
> called from outside the extension script, also. It would be nice if we
> could implicitly tack on a "SET search_path TO @extschema@, pg_catalog,
> pg_temp" to each function in the extension. I'm not proposing that, but
> perhaps a related idea might work. Probably outside the scope of your
> proposal.

Yeah, this proposal definitely doesn't solve all security problems
with extensions. And indeed what you're proposing would solve another
major issue, another option would be to default to the "safe"
search_path that you proposed a while back. But yeah I agree that it's
outside of the scope of this proposal. I feel like if we try to solve
every security problem at once, probably nothing gets solved instead.
That's why I tried to keep this proposal very targeted, i.e. have this
be step 1 of an N step plan to make extensions more secure by default.




[multithreading] extension compatibility

2024-06-05 Thread Robert Haas
Hi,

At 2024.pgconf.dev, Heikki did a session on multithreading PostgreSQL
which I was unfortunately unable to attend due my involvement with
another session, and then we had an unconference discussion which I
was able to attend and at which I volunteered to have a look at a
couple of tasks, including "Extension Marking System (marking
extensions as thread-safe)". So in this email I'd like to (1) say a
few things about multithreading for PostgreSQL in general, (2) spell
out my understanding of the extension compatibility problem
specifically, and then (3) discuss possible solutions to that problem.
See also https://wiki.postgresql.org/wiki/Multithreading

== Multithreading Generally ==

I believe there is a consensus in the PostgreSQL developer community,
or at least among committers, that a multi-threaded programming model
would be superior to a multi-process programming model as we have now.
I won't be surprised if a few people disagree with that as a general
statement, and others may view it as better in theory but so difficult
in practice as to be not worth doing, but I believe that the consensus
is otherwise. I do understand that switching to threads introduces
some new stability risks, which are not to be taken lightly, but it
also opens the door to various performance improvements, and even
functionality, that are not feasible today. I do not believe that it
would be necessary, as has been alleged previously, to halt all other
development for a lengthy period of time while such a conversion is
undertaken, nor do I believe that the community would or should accept
such a solution were someone to propose it. I do believe that there
are some difficult problems to be solved in order to make it work at
all, and I believe even more strongly that a good deal of follow-up
work will be necessary to reap the potential benefits of such a
change. I also believe that it's absolutely necessary that both models
coexist side by side for a period of time. I think we will eventually
want to abandon the multi-process model, because I think over time the
benefits of using threads will accumulate until they are overwhelming
and the process model will end up appearing to be an obstacle to
progress. However, I don't think we'll be able to do that particularly
soon, because I think it's going to take a while to fully stabilize
the thread model even as far as the core code is concerned, and
extensions will take even longer to catch up. I realize Heikki in
particular is hoping for a quick transition; I don't see that as
feasible, but like everything else about this, opinions are going to
vary.

Obligatory disclaimer: Everything above (and below) is just a
statement of what I believe, and everyone is free to dispute it. As
always, I cannot speak to objective truth, but I can tell you what I
think.

== The Extension Compatibility Problem ==

I don't know yet whether we're going to end up with a system where the
same build of PostgreSQL can produce processes or threads depending on
configuration or whether it's going to be a build option, but I'm
guessing the latter is more likely. Certainly, if an extension is
assuming that its global variables are session-local and they suddenly
become global to the cluster, chaos will ensue. The same is true for
the core code, and will need to be solved by annotating global
variables so that the appropriate ones can be made thread-local and
the others can be given whatever treatment is appropriate considering
how they are used. The details of how this annotation system will work
are TBD, but the point for this email is that extension global
variables, including file-level globals, will need the same kinds of
annotations that we use in the core code in order to work. Other
adjustments may also be needed.

I think there are two severable problems here. One is that, if an
extension is built for use with a non-threaded PostgreSQL, we
shouldn't permit it to be used with a threaded PostgreSQL, even if the
major version and other details are compatible. Hence, threading or
the lack of it must become part of the data set up by PG_MODULE_MAGIC.
Maybe this problem goes away if we decide that threads-vs-processes is
a configuration option rather than a build-time option, but even then,
we might still end up with a build-time option indicating whether
threads are even a possibility, so I think it's pretty likely we need
this in some form. If or when the process model eventually dies, then
we can take this out again.

The other problem is that we probably want a way for extensions to
signal that they are believed to work with threading. It's a little
bit debatable whether this is a good idea, because (1) some people are
going to blindly state that their extension works fine with threading
even if they haven't actually made the necessary changes and (2) one
could simply declare that making an extension thread-ready is part of
supporting whatever PostgreSQL release adds threading as an option and
(3) one c

Re: Test 031_recovery_conflict fails when a conflict counted twice

2024-06-05 Thread Dmitry Dolgov
> On Mon, May 27, 2024 at 06:00:00PM GMT, Alexander Lakhin wrote:
> Hello hackers,
>
> As a recent buildfarm test failure on olingo (which builds postgres with
> -O1 and address sanitizer) [1] shows:
> ...
> [23:12:02.127](0.166s) not ok 6 - snapshot conflict: stats show conflict on 
> standby
> [23:12:02.130](0.003s) #   Failed test 'snapshot conflict: stats show 
> conflict on standby'
> #   at 
> /home/bf/bf-build/olingo/HEAD/pgsql/src/test/recovery/t/031_recovery_conflict.pl
>  line 332.
> [23:12:02.130](0.000s) #  got: '2'
> # expected: '1'
> ...
> [23:12:06.848](1.291s) not ok 17 - 5 recovery conflicts shown in 
> pg_stat_database
> [23:12:06.887](0.040s) #   Failed test '5 recovery conflicts shown in 
> pg_stat_database'
> #   at 
> /home/bf/bf-build/olingo/HEAD/pgsql/src/test/recovery/t/031_recovery_conflict.pl
>  line 286.
> [23:12:06.887](0.000s) #  got: '6'
> # expected: '5'
> Waiting for replication conn standby's replay_lsn to pass 0/3459160 on primary
> done
> ...
>
> pgsql.build/testrun/recovery/031_recovery_conflict/log/031_recovery_conflict_standby.log:
> 2024-05-15 23:12:01.616 UTC [1299981][client backend][2/2:0] LOG: statement:
> FETCH FORWARD FROM test_recovery_conflict_cursor;
> 2024-05-15 23:12:01.617 UTC [1299981][client backend][2/2:0] LOG: statement: ;
> 2024-05-15 23:12:01.910 UTC [1297595][startup][34/0:0] LOG: recovery still
> waiting after 15.289 ms: recovery conflict on snapshot
> 2024-05-15 23:12:01.910 UTC [1297595][startup][34/0:0] DETAIL: Conflicting 
> process: 1299981.
> 2024-05-15 23:12:01.910 UTC [1297595][startup][34/0:0] CONTEXT:  WAL redo at
> 0/344F468 for Heap2/PRUNE_VACUUM_SCAN: snapshotConflictHorizon: 746,
> isCatalogRel: F, nplans: 2, nredirected: 18, ndead: 0, nunused: 0, plans: [{
> xmax: 0, infomask: 2816, infomask2: 2, ntuples: 2, offsets: [21, 22] }, {
> xmax: 0, infomask: 11008, infomask2: 32770, ntuples: 18, offsets: [41, 42,
> 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58] }],
> redirected: [23->41, 24->42, 25->43, 26->44, 27->45, 28->46, 29->47, 30->48,
> 31->49, 32->50, 33->51, 34->52, 35->53, 36->54, 37->55, 38->56, 39->57,
> 40->58]; blkref #0: rel 1663/16385/16386, blk 0
> 2024-05-15 23:12:01.959 UTC [1299981][client backend][2/2:0] FATAL: 
> terminating connection due to conflict with recovery
> 2024-05-15 23:12:01.959 UTC [1299981][client backend][2/2:0] DETAIL:  User
> query might have needed to see row versions that must be removed.
> 2024-05-15 23:12:01.959 UTC [1299981][client backend][2/2:0] HINT: In a
> moment you should be able to reconnect to the database and repeat your
> command.
> vvv an important difference from a successful test run
> 2024-05-15 23:12:01.966 UTC [1299981][client backend][2/2:0] LOG: could not 
> send data to client: Broken pipe
> 2024-05-15 23:12:01.966 UTC [1299981][client backend][2/2:0] FATAL: 
> terminating connection due to conflict with recovery
> 2024-05-15 23:12:01.966 UTC [1299981][client backend][2/2:0] DETAIL:  User
> query might have needed to see row versions that must be removed.
> 2024-05-15 23:12:01.966 UTC [1299981][client backend][2/2:0] HINT: In a
> moment you should be able to reconnect to the database and repeat your
> command.
> ^^^
>
> test  031_recovery_conflict may fail in the following scenario:
>
> 031_recovery_conflict.pl:
>     executes  a query, which produces a conflict:
>     ## RECOVERY CONFLICT 2: Snapshot conflict
>     ...
>     $psql_standby->query_safe(...)
>
> startup process:
>     detects a snapshot conflict and sends
>     PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
>     (ResolveRecoveryConflictWithVirtualXIDs ->
>     CancelVirtualTransaction) to the client backend
>
> client backend:
>     receives and processes the signal:
>     HandleRecoveryConflictInterrupt; ProcessClientReadInterrupt ->
>     ProcessInterrupts -> ProcessRecoveryConflictInterrupts ->
>     ProcessRecoveryConflictInterrupt,
>
>     reports the recovery conflict:
>     pgstat_report_recovery_conflict(reason);
>
>     and reports the error:
>     ereport(FATAL, ... "terminating connection due to conflict with
>     recovery" ...)
>     sends the message to the server log:
>     errfinish -> EmitErrorReport -> send_message_to_server_log
>
> 031_recovery_conflict.pl:
>     # finds the message in the log and resets psql connection:
>     check_conflict_log(
>   "User query might have needed to see row versions that must
>    be removed");
>     $psql_standby->reconnect_and_clear();
>
> startup process:
>     keeps sending PROCSIG_RECOVERY_CONFLICT_SNAPSHOT to the client
>     backend in a loop inside ResolveRecoveryConflictWithVirtualXIDs
>
> client backend:
>     tries to send the message to the client:
>     send_message_to_frontend -> socket_flush ->
>     internal_flush_buffer,
>     gets an error (EPIPE) from secure_wri

Re: ssl tests fail due to TCP port conflict

2024-06-05 Thread Alexander Lakhin

Hello Andrew,

05.06.2024 21:10, Andrew Dunstan wrote:


I think I see what's going on here. It looks like it's because we start the server in unix socket mode, and then 
switch to using TCP as well.


Can you try your test with this patch applied and see if the problem persists? If we start in TCP mode the framework 
should test for a port clash.




It seems that the failure rate decreased (I guess the patch rules out the
case with two servers choosing the same port), but I still got:

16/53 postgresql:ssl / ssl/001_ssltests_36   OK      15.25s   205 subtests passed
17/53 postgresql:ssl / ssl/001_ssltests_30   ERROR    3.17s   (exit status 255 or signal 127 SIGinvalid)

2024-06-05 19:40:37.395 UTC [414110] LOG:  starting PostgreSQL 17beta1 on 
x86_64-linux, compiled by gcc-13.2.1, 64-bit
2024-06-05 19:40:37.395 UTC [414110] LOG:  could not bind IPv4 address 
"127.0.0.1": Address already in use
2024-06-05 19:40:37.395 UTC [414110] HINT:  Is another postmaster already running on port 50072? If not, wait a few 
seconds and retry.


`grep '\b50072\b' -r testrun/` yields:
testrun/ssl/001_ssltests_34/log/001_ssltests_34_primary.log:2024-06-05 19:40:37.392 UTC [414111] [unknown] LOG:  
connection received: host=localhost port=50072

(a psql case)

That is, psql from the test instance 001_ssltests_34 opened a connection to
the test server with client port 50072, which made it impossible for the
server from the test instance 001_ssltests_30 to use that port.

Best regards.
Alexander




Re: Better error message when --single is not the first arg to postgres executable

2024-06-05 Thread Nathan Bossart
On Wed, Jun 05, 2024 at 02:51:05PM -0400, Greg Sabino Mullane wrote:
> Please find attached a quick patch to prevent this particularly bad error
> message for running "postgres", when making the common mistake of
> forgetting to put the "--single" option first because you added an earlier
> arg (esp. datadir)

Could we remove the requirement that --single must be first?  I'm not
thrilled about adding a list of "must be first" options that needs to stay
updated, but given this list probably doesn't change too frequently, maybe
that's still better than a more invasive patch to allow specifying these
options in any order...

-- 
nathan




Re: Optimizing COPY with SIMD

2024-06-05 Thread Nathan Bossart
On Wed, Jun 05, 2024 at 01:46:44PM -0400, Neil Conway wrote:
> We could go further and use the same code to handle both the tail of the
> string in the vectorized case and the entire string in the non-vectorized
> case, but I didn't bother with that -- as written, it would require taking
> an unnecessary strlen() of the input string in the non-vectorized case.

For pg_lfind32(), we ended up using an overlapping approach for the
vectorized case (see commit 7644a73).  That appeared to help more than it
harmed in the many (admittedly branch predictor friendly) tests I ran.  I
wonder if you could do something similar here.

> Looks like there is a slight regression for short attribute values, but I
> think the tradeoff is a net win.

It'd be interesting to see the threshold where your patch starts winning.
IIUC the vector stuff won't take effect until there are 16 bytes to
process.  If we don't expect attributes to ordinarily be >= 16 bytes, it
might be worth trying to mitigate this ~3% regression.  Maybe we can find
some other small gains elsewhere to offset it.

-- 
nathan




Re: ssl tests fail due to TCP port conflict

2024-06-05 Thread Andrew Dunstan



On 2024-06-05 We 14:10, Andrew Dunstan wrote:


On 2024-06-05 We 09:00, Alexander Lakhin wrote:


Another case (with psql using the port):
testrun/ssl/001_ssltests_47/log/regress_log_001_ssltests_47:# 
Checking port 49448
testrun/ssl/001_ssltests_47/log/regress_log_001_ssltests_47:# Found 
port 49448
testrun/ssl/001_ssltests_47/log/001_ssltests_47_primary.log:2024-06-05 
12:20:50.178 UTC [976826] LOG:  listening on Unix socket 
"/tmp/GePu6gmouP/.s.PGSQL.49448"
testrun/ssl/001_ssltests_47/log/001_ssltests_47_primary.log:2024-06-05 
12:20:50.491 UTC [976927] HINT:  Is another postmaster already 
running on port 49448? If not, wait a few seconds and retry.

...
testrun/ssl/001_ssltests_48/log/001_ssltests_48_primary.log:2024-06-05 
12:20:50.491 UTC [976943] [unknown] LOG:  connection received: 
host=localhost port=49448

The broader excerpt:
2024-06-05 12:20:50.415 UTC [976918] [unknown] LOG:  connection 
received: host=localhost port=50326
2024-06-05 12:20:50.418 UTC [976918] [unknown] LOG:  could not accept 
SSL connection: EOF detected
2024-06-05 12:20:50.433 UTC [976920] [unknown] LOG:  connection 
received: host=localhost port=49420
2024-06-05 12:20:50.435 UTC [976920] [unknown] LOG:  could not accept 
SSL connection: EOF detected
2024-06-05 12:20:50.447 UTC [976922] [unknown] LOG:  connection 
received: host=localhost port=49430
2024-06-05 12:20:50.452 UTC [976922] [unknown] LOG:  could not accept 
SSL connection: tlsv1 alert unknown ca
2024-06-05 12:20:50.466 UTC [976933] [unknown] LOG:  connection 
received: host=localhost port=49440
2024-06-05 12:20:50.472 UTC [976933] [unknown] LOG:  could not accept 
SSL connection: tlsv1 alert unknown ca
2024-06-05 12:20:50.491 UTC [976943] [unknown] LOG:  connection 
received: host=localhost port=49448
2024-06-05 12:20:50.497 UTC [976943] [unknown] LOG:  could not accept 
SSL connection: tlsv1 alert unknown ca
2024-06-05 12:20:50.513 UTC [976969] [unknown] LOG:  connection 
received: host=localhost port=49464
2024-06-05 12:20:50.517 UTC [976969] [unknown] LOG:  could not accept 
SSL connection: tlsv1 alert unknown ca
2024-06-05 12:20:50.532 UTC [976971] [unknown] LOG:  connection 
received: host=localhost port=49468



I think I see what's going on here. It looks like it's because we 
start the server in unix socket mode, and then switch to using TCP as 
well.


Can you try your test with this patch applied and see if the problem 
persists? If we start in TCP mode the framework should test for a port 
clash.






Hmm, on closer inspection we should have reserved the port anyway.  But 
why is the port "already used" on restart? We haven't previously opened 
a TCP connection on that port (except when checking if we can bind it), 
and instances should be locked against using that port.



 ... wanders away muttering and scratching head ...


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Better error message when --single is not the first arg to postgres executable

2024-06-05 Thread Greg Sabino Mullane
Please find attached a quick patch to prevent this particularly bad error
message for running "postgres", when making the common mistake of
forgetting to put the "--single" option first because you added an earlier
arg (esp. datadir)

Current behavior:

$ ~/pg/bin/postgres -D ~/pg/data --single
2024-06-05 18:30:40.296 GMT [22934] FATAL:  --single requires a value

Improved behavior:

$ ~/pg/bin/postgres -D ~/pg/data --single
--single must be first argument.

I applied it for all the "first arg only" flags (boot, check,
describe-config, and fork), as they suffer the same fate.

Cheers,
Greg


0001-Give-a-more-accurate-error-message-if-single-not-number-one.patch
Description: Binary data


Re: question regarding policy for patches to out-of-support branches

2024-06-05 Thread Tom Lane
Joe Conway  writes:
> I was having a discussion regarding out-of-support branches and effort 
> to keep them building, but could not for the life of me find any actual 
> documented policy (although I distinctly remember that we do something...).
> Is the policy written down somewhere, or is it only project lore? In 
> either case, what is the actual policy?

I believe our policy was set in this thread:

https://www.postgresql.org/message-id/flat/2923349.1634942313%40sss.pgh.pa.us

and you're right that it hasn't really been memorialized anywhere
else.  I'm not sure where would be appropriate.  Anyway, what
I think the policy is:

* Out-of-support versions back to (currently) 9.2 are still to be
kept buildable on modern toolchains.

* Build failures, regression failures, and easily-fixable compiler
warnings are candidates for fixes.

* We aren't too excited about code that requires external dependencies
(e.g. libpython) though, because those can be moving targets.

* Under no circumstances back-patch anything that changes external
behavior, as the point of the exercise is to be able to test against
the actual behavior of the last releases of these branches.

regards, tom lane




Re: race condition in pg_class

2024-06-05 Thread Noah Misch
On Sun, May 12, 2024 at 04:29:23PM -0700, Noah Misch wrote:
> I'm attaching patches implementing the LockTuple() design.

Starting 2024-06-10, I plan to push the first seven of the ten patches:

inplace005-UNEXPECTEDPASS-tap-meson-v1.patch
inplace010-tests-v1.patch
inplace040-waitfuncs-v1.patch
inplace050-tests-inj-v1.patch
inplace060-nodeModifyTable-comments-v1.patch
  Those five just deal in tests, test infrastructure, and comments.
inplace070-rel-locks-missing-v1.patch
  Main risk is new DDL deadlocks.
inplace080-catcache-detoast-inplace-stale-v1.patch
  If it fails to fix the bug it targets, I expect it's a no-op rather than
  breaking things.

I'll leave the last three of the ten needing review.  Those three are beyond
my skill to self-certify.




Re: ssl tests fail due to TCP port conflict

2024-06-05 Thread Andrew Dunstan


On 2024-06-05 We 09:00, Alexander Lakhin wrote:


Another case (with psql using the port):
testrun/ssl/001_ssltests_47/log/regress_log_001_ssltests_47:# Checking 
port 49448
testrun/ssl/001_ssltests_47/log/regress_log_001_ssltests_47:# Found 
port 49448
testrun/ssl/001_ssltests_47/log/001_ssltests_47_primary.log:2024-06-05 
12:20:50.178 UTC [976826] LOG:  listening on Unix socket 
"/tmp/GePu6gmouP/.s.PGSQL.49448"
testrun/ssl/001_ssltests_47/log/001_ssltests_47_primary.log:2024-06-05 
12:20:50.491 UTC [976927] HINT:  Is another postmaster already running 
on port 49448? If not, wait a few seconds and retry.

...
testrun/ssl/001_ssltests_48/log/001_ssltests_48_primary.log:2024-06-05 
12:20:50.491 UTC [976943] [unknown] LOG:  connection received: 
host=localhost port=49448

The broader excerpt:
2024-06-05 12:20:50.415 UTC [976918] [unknown] LOG:  connection 
received: host=localhost port=50326
2024-06-05 12:20:50.418 UTC [976918] [unknown] LOG:  could not accept 
SSL connection: EOF detected
2024-06-05 12:20:50.433 UTC [976920] [unknown] LOG:  connection 
received: host=localhost port=49420
2024-06-05 12:20:50.435 UTC [976920] [unknown] LOG:  could not accept 
SSL connection: EOF detected
2024-06-05 12:20:50.447 UTC [976922] [unknown] LOG:  connection 
received: host=localhost port=49430
2024-06-05 12:20:50.452 UTC [976922] [unknown] LOG:  could not accept 
SSL connection: tlsv1 alert unknown ca
2024-06-05 12:20:50.466 UTC [976933] [unknown] LOG:  connection 
received: host=localhost port=49440
2024-06-05 12:20:50.472 UTC [976933] [unknown] LOG:  could not accept 
SSL connection: tlsv1 alert unknown ca
2024-06-05 12:20:50.491 UTC [976943] [unknown] LOG:  connection 
received: host=localhost port=49448
2024-06-05 12:20:50.497 UTC [976943] [unknown] LOG:  could not accept 
SSL connection: tlsv1 alert unknown ca
2024-06-05 12:20:50.513 UTC [976969] [unknown] LOG:  connection 
received: host=localhost port=49464
2024-06-05 12:20:50.517 UTC [976969] [unknown] LOG:  could not accept 
SSL connection: tlsv1 alert unknown ca
2024-06-05 12:20:50.532 UTC [976971] [unknown] LOG:  connection 
received: host=localhost port=49468



I think I see what's going on here. It looks like it's because we start 
the server in unix socket mode, and then switch to using TCP as well.


Can you try your test with this patch applied and see if the problem 
persists? If we start in TCP mode the framework should test for a port 
clash.



cheers


andrew

--

Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl
index b877327023..20ec7cb5fb 100644
--- a/src/test/ssl/t/001_ssltests.pl
+++ b/src/test/ssl/t/001_ssltests.pl
@@ -4,8 +4,9 @@
 use strict;
 use warnings FATAL => 'all';
 use Config qw ( %Config );
-use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
+BEGIN { $PostgreSQL::Test::Utils::use_unix_sockets = 0; }
+use PostgreSQL::Test::Cluster;
 use Test::More;
 
 use FindBin;


question regarding policy for patches to out-of-support branches

2024-06-05 Thread Joe Conway
I was having a discussion regarding out-of-support branches and effort 
to keep them building, but could not for the life of me find any actual 
documented policy (although I distinctly remember that we do something...).


I did find fleeting references, for example:

8<---
commit c705646b751e08d584f6eeb098f1ed002aa7b11c
Author: Tom Lane 
Date:   2022-09-21 13:52:38 -0400



Per project policy, this is a candidate for back-patching into
out-of-support branches: it suppresses annoying compiler warnings
but changes no behavior.  Hence, back-patch all the way to 9.2.
8<---

and on its related thread:

8<---
However, I think that that would *not* be fit material for
back-patching into out-of-support branches, since our policy
for them is "no behavioral changes".
8<---

Is the policy written down somewhere, or is it only project lore? In 
either case, what is the actual policy?


Thanks,

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: small pg_dump code cleanup

2024-06-05 Thread Tom Lane
Nathan Bossart  writes:
> On Wed, Jun 05, 2024 at 12:22:03PM -0400, Neil Conway wrote:
>> (1) Names like `getXXX` for these functions suggest to me that they return
>> a value, rather than side-effecting. I realize some variants continue to
>> return a value, but the majority no longer do. Perhaps a name like
>> lookupXXX() or readXXX() would be clearer?

> What about collectXXX() to match similar functions in pg_dump.c (e.g.,
> collectRoleNames(), collectComments(), collectSecLabels())?

Personally I see nothing much wrong with leaving them as getXXX.

>> (2) These functions malloc() a single ntups * sizeof(struct) allocation and
>> then index into it to fill-in each struct before entering it into the hash
>> table. It might be more straightforward to just malloc each individual
>> struct.

> That'd increase the number of allocations quite significantly, but I'd be
> surprised if that was noticeable outside of extreme scenarios.  At the
> moment, I'm inclined to leave these as-is for this reason and because I
> doubt it'd result in much cleanup, but I'll yield to the majority opinion
> here.

I think that would be quite an invasive change; it would require
many hundreds of edits like

-   finfo[i].dobj.objType = DO_FUNC;
+   finfo->dobj.objType = DO_FUNC;

which aside from being tedious would create a back-patching hazard.
So I'm kind of -0.1 or so.

Another angle to this is that Coverity and possibly other tools tend
to report that these functions leak these allocations, apparently
because they don't notice that pointers into the allocations get
stored in hash tables by a subroutine.  I'm not sure if making this
change would make that worse or better.  If we really want to change
it, that might be worth checking somehow before we jump.

regards, tom lane




Re: Extension security improvement: Add support for extensions with an owned schema

2024-06-05 Thread Jeff Davis
On Sat, 2024-06-01 at 17:08 -0700, Jelte Fennema-Nio wrote:
> This patch adds a new "owned_schema" option to the extension control
> file that can be set to true to indicate that this extension wants to
> own the schema in which it is installed. What that means is that the
> schema should not exist before creating the extension, and will be
> created during extension creation. This thus gives the extension
> author
> an easy way to use a safe search_path, while still allowing all
> objects
> to be grouped together in a schema. The implementation also has the
> pleasant side effect that the schema will be automatically dropped
> when
> the extension is dropped.

Is this orthogonal to relocatability?

When you say "an easy way to use a safe search_path": the CREATE
EXTENSION command already sets the search_path, so your patch just
ensures that it's empty (and therefore safe) first, right?

Should we go further and try to prevent creating objects in an
extension-owned schema with normal SQL?

Approximately how many extensions should be adjusted to use
owned_schema=true? What are the reasons an extension would not want to
own the schema in which the objects are created? I assume some would
still create objects in pg_catalog, but ideally we'd come up with a
better solution to that as well.

This protects the extension script, but I'm left wondering if we could
do something here to make it easier to protect extension functions
called from outside the extension script, also. It would be nice if we
could implicitly tack on a "SET search_path TO @extschema@, pg_catalog,
pg_temp" to each function in the extension. I'm not proposing that, but
perhaps a related idea might work. Probably outside the scope of your
proposal.
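
For reference, the per-function version of that idea is something an extension
script can already spell out by hand today (schema and function names below are
placeholders; a non-relocatable extension would normally write @extschema@
instead of a hard-coded schema):

CREATE FUNCTION my_ext.sanitize(val text) RETURNS text
    LANGUAGE sql
    SET search_path = my_ext, pg_catalog, pg_temp
    AS $$ SELECT btrim(lower(val)) $$;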

Regards,
Jeff Davis





Re: small pg_dump code cleanup

2024-06-05 Thread Neil Conway
On Wed, Jun 5, 2024 at 12:37 PM Nathan Bossart 
wrote:

> What about collectXXX() to match similar functions in pg_dump.c (e.g.,
> collectRoleNames(), collectComments(), collectSecLabels())?
>

sgtm.


> > (2) These functions malloc() a single ntups * sizeof(struct) allocation
> and
> > then index into it to fill-in each struct before entering it into the
> hash
> > table. It might be more straightforward to just malloc each individual
> > struct.
>
> That'd increase the number of allocations quite significantly, but I'd be
> surprised if that was noticeable outside of extreme scenarios.  At the
> moment, I'm inclined to leave these as-is for this reason and because I
> doubt it'd result in much cleanup, but I'll yield to the majority opinion
> here.
>

As you say, I'd be surprised if the performance difference is noticeable.
Personally I don't think the marginal performance win justifies the hit to
readability, but I don't feel strongly about it.

Neil


Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-06-05 Thread Jelte Fennema-Nio
On Wed, 5 Jun 2024 at 17:12, Robert Haas  wrote:
> Well, hang on. The discussion on that thread suggests that this is
> only going to break connections to servers that don't have
> NegotiateProtocolVersion.

Yes, correct.

> What
> I really don't want is for v18 to break connections to v17 servers.
> That would be exponentially more disruptive.

Totally agreed, and that's easily achievable because of the
NegotiateProtocolVersion message. We're all good on v18 to v17
connections and protocol changes, as long as we make any new behaviour
depend on either a protocol parameter or a protocol version bump.

> I do take your point that poolers haven't added support for
> NegotiateProtocolVersion yet, but I bet that will change pretty
> quickly once there's a benefit to doing so.

I think at minimum we'll need a mechanism for people to connect using
a StartupMessage that doesn't break existing poolers. If users have no
way to connect at all to their semi-recently installed connection
pooler with the newest libpq, then I expect we'll have many unhappy
users. I think it's debatable whether the compatible StartupMessage
should be opt-in or opt-out. I'd personally at minimum want to wait
one PG release before using the incompatible StartupMessage by
default, just to give pooler installs some time to catch up.

> I think it's a lot easier
> for people to install a new pooler version than a new server version,
> because the server has the data in it. Also, it's not like they
> haven't had time.

I agree that it's a lot easier to upgrade poolers than it is to
upgrade PG, but still people are generally hesitant to upgrade stuff
in their infrastructure. And I totally agree that poolers have had the
time to implement NegotiateProtocolVersion support, but I'm pretty
sure many users will assign blame to libpq anyway.

> But the real question here is whether we think the longer cancel key
> is really important enough to justify the partial compatibility break.
> I'm not deathly opposed to it, but I think it's debatable and I'm sure
> some people are going to be unhappy.

I think if it's an opt-in compatibility break, users won't care how
important it is. It's either important enough to opt-in to this
compatibility break for them, or it's not and nothing changes for
them.

My feeling is that we're unlikely to find a feature that's worth
breaking compatibility (with poolers and pre-9.3 PGs) by default on
its own. But after adding a few new protocol features, at some point
together these features become worth this break. Especially because
by then 9.3 will be even older and poolers will have started to
support NegotiateProtocolVersion (due to people complaining that they
cannot connect with the new opt-in libpq break-compatibility flag).
But if we're going to wait until we have the one super important
protocol feature, then I don't see us moving forward.

To summarize: I'd like to make some relatively small opt-in change(s)
in PG18 that breaks compatibility (with poolers and pre-9.3 PGs). So
that when we have an important enough reason to break compatibility by
default, that break will be much less painful to do.




Re: Optimizing COPY with SIMD

2024-06-05 Thread Neil Conway
Thanks for the review and feedback!

On Mon, Jun 3, 2024 at 10:56 AM Nathan Bossart 
wrote:

> > -/*
> > - * Send text representation of one attribute, with conversion and escaping
> > - */
> >  #define DUMPSOFAR() \
>
> IIUC this comment was meant to describe the CopyAttributeOutText() function
> just below this macro.  When the macro was added in commit 0a5fdb0 from
> 2006, the comment became detached from the function.  Maybe we should just
> move it back down below the macro.
>

Ah, that makes sense -- done.


> > +/*
> > + * Send text representation of one attribute, with conversion and CSV-style
> > + * escaping. This variant uses SIMD instructions to optimize processing, but
> > + * we can only use this approach when encoding_embeds_ascii is false.
> > + */
>
> nitpick: Can we add a few words about why using SIMD instructions when
> encoding_embeds_ascii is true is difficult?  I don't dispute that it is
> complex and/or not worth the effort, but it's not clear to me why that's
> the case just from reading the patch.
>

Sounds good.


> > +static void
> > +CopyAttributeOutCSVFast(CopyToState cstate, const char *ptr,
> > + bool use_quote)
>
> nitpick: Can we add "vector" or "simd" to the name instead of "fast"?  IMHO
> it's better to be more descriptive.
>

Sure, done.

Attached is a revised patch series that incorporates the feedback above
and makes two additional changes:

* Add some regression tests to cover COPY behavior with octal and hex
escape sequences
* Optimize the COPY TO text (non-CSV) code path (CopyAttributeOutText()).

In CopyAttributeOutText(), I refactored some code into a helper function to
reduce code duplication, on the theory that field delimiters and escape
sequences are rare, so we don't mind taking a function call in those cases.

We could go further and use the same code to handle both the tail of the
string in the vectorized case and the entire string in the non-vectorized
case, but I didn't bother with that -- as written, it would require taking
an unnecessary strlen() of the input string in the non-vectorized case.

Performance for COPY TO in text (non-CSV) mode:

===
master

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-text-long-strings.sql
  Time (mean ± σ):      1.240 s ±   0.013 s    [User: 0.001 s, System: 0.000 s]
  Range (min … max):    1.220 s …   1.256 s    10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-text-short.sql
  Time (mean ± σ):     522.3 ms ±  11.3 ms    [User: 1.2 ms, System: 0.0 ms]
  Range (min … max):   512.0 ms … 544.3 ms    10 runs

master + SIMD patches:

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-text-long-strings.sql
  Time (mean ± σ):     867.6 ms ±  12.7 ms    [User: 1.2 ms, System: 0.0 ms]
  Range (min … max):   842.1 ms … 891.6 ms    10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-out-bench-text-short.sql
  Time (mean ± σ):     536.7 ms ±  10.9 ms    [User: 1.2 ms, System: 0.0 ms]
  Range (min … max):   530.1 ms … 566.8 ms    10 runs
===

Looks like there is a slight regression for short attribute values, but I
think the tradeoff is a net win.
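
(The benchmark scripts themselves aren't attached to the thread. For readers
who want to try something comparable, a roughly similar workload, not the
actual script referenced above, might be:

CREATE TEMP TABLE bench_long AS
    SELECT repeat('abcdefgh', 128) || g AS payload
    FROM generate_series(1, 1000000) g;
\copy bench_long TO '/dev/null' WITH (FORMAT text)

run via "psql -f" under a timing harness, with a much shorter repeat() for the
short-attribute variant.)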

I'm going to take a look at applying similar ideas to COPY FROM next.

Neil


v2-0001-Adjust-misleading-comment-placement.patch
Description: Binary data


v2-0002-Improve-COPY-test-coverage-for-handling-of-contro.patch
Description: Binary data


v2-0003-Optimize-COPY-TO-in-CSV-format-using-SIMD.patch
Description: Binary data


v2-0004-Optimize-COPY-TO-in-text-format-using-SIMD.patch
Description: Binary data


Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade

2024-06-05 Thread Nathan Bossart
On Wed, Jun 05, 2024 at 07:28:42PM +0200, Matthias van de Meent wrote:
> As for "on my laptop", that sounds very reasonable, but could you
> check the performance on systems with larger shared buffer
> configurations? I'd imagine (but haven't checked) that binary upgrade
> target systems may already be using the shared_buffers from their
> source system, which would cause a severe regression when the
> to-be-upgraded system has large shared buffers. For initdb the
> database size is known in advance and shared_buffers is approximately
> empty, but the same is not (always) true when we're doing binary
> upgrades.

Will do.  FWIW I haven't had much luck improving pg_upgrade times by
adjusting server settings, but I also haven't explored it all that much.

-- 
nathan




Improving PL/Tcl's error context reports

2024-06-05 Thread Tom Lane
While working on commit b631d0149, I got a bee in my bonnet about
how unfriendly PL/Tcl's error CONTEXT reports are:

* The context reports expose PL/Tcl's internal names for the Tcl
procedures it creates, which'd be fine if those names were readable.
But actually they're something like "__PLTcl_proc_NNNN", where NNNN
is the function OID.  Not only is that unintelligible, but because
the OIDs aren't stable this forces us to disable display of the
CONTEXT lines in all of PL/Tcl's regression tests.

* The first line of the context report (almost?) always duplicates
the primary error message, which is redundant and not per our
normal reporting style.

So attached is a patch that attempts to improve this situation.

The key question is how to avoid including function OIDs in the
strings that will appear in the regression test outputs.  The
answer I propose is to start with an internal name like
"__PLTcl_proc_NAME", where NAME is the function's normal SQL name,
and then append the OID only if that function name is not unique.
As long as we don't create test cases that involve throwing
errors from duplicatively-named functions, we can show the context
reports and still have stable regression outputs.  I think this will
improve the user experience for regular users too.

PL/Tcl wants the internal names to be all-ASCII-alphanumeric,
which saves it from having to think about encoding conversion
or quoting when inserting those names into Tcl command strings.
What I did in the attached is to copy only ASCII alphanumerics
from the SQL name.  Perhaps it's worth working harder but
I failed to get excited about that.

A few notes:

* To avoid unnecessarily appending the OID when a function is
redefined, I modified the logic to explicitly delete the old Tcl
command before checking for duplication.  This is okay even if the
function is currently being evaluated, because Tcl's internal
reference counting prevents it from deleting the underlying code
object until it's done being executed.  Really we were depending on
that reference counting to handle such cases already, but you wouldn't
have known it from our comments.  I added a test case to demonstrate
explicitly that this works correctly.

* Sadly, pltcl_trigger.sql still has to suppress the context
reports.  Although its function names are now stable, the reports
include trigger argument lists, which include numeric table OIDs
so they're unstable.  I don't see a way to change that without
breaking API for user trigger functions.

* A hazard with this plan is that the regression tests' context
reports might turn out to be platform-dependent.  I experimented
with Tcl 8.5 and 8.6 here and found one difference: the "missing
close-brace" error reported by our tcl_error() test case shows the
unmatched open-brace on one version but not the other.  AFAICS the
point of that test is just to exercise some Tcl-detected error, not
necessarily that exact one, so I just modified the test case to cause
a different error.  We might find additional problems once this patch
hits the buildfarm or gets out into the field.

I'll park this in the next CF.


regards, tom lane

diff --git a/doc/src/sgml/pltcl.sgml b/doc/src/sgml/pltcl.sgml
index b31f2c1330..64c4918419 100644
--- a/doc/src/sgml/pltcl.sgml
+++ b/doc/src/sgml/pltcl.sgml
@@ -1120,16 +1120,24 @@ CALL transaction_test1();
 
 
  In PostgreSQL, the same function name can be used for
- different function definitions as long as the number of arguments or their types
+ different function definitions if the functions are placed in different
+ schemas, or if the number of arguments or their types
  differ. Tcl, however, requires all procedure names to be distinct.
- PL/Tcl deals with this by making the internal Tcl procedure names contain
- the object
- ID of the function from the system table pg_proc as part of their name. Thus,
+ PL/Tcl deals with this by appending the function's object ID (OID) to
+ the internal Tcl procedure name if necessary to make it different
+ from the names of all previously-loaded functions in the same Tcl
+ interpreter.  Thus,
  PostgreSQL functions with the same name
  and different argument types will be different Tcl procedures, too.  This
  is not normally a concern for a PL/Tcl programmer, but it might be visible
  when debugging.
 
 
+
+ For this reason among others, a PL/Tcl function cannot call another one
+ directly (that is, within Tcl).  If you need to do that, you must go
+ through SQL, using spi_exec or a related command.
+
+

  
diff --git a/src/pl/tcl/expected/pltcl_queries.out b/src/pl/tcl/expected/pltcl_queries.out
index 2d922c2333..27a3d355c6 100644
--- a/src/pl/tcl/expected/pltcl_queries.out
+++ b/src/pl/tcl/expected/pltcl_queries.out
@@ -1,5 +1,3 @@
--- suppress CONTEXT so that function OIDs aren't in output
-\set VERBOSITY terse
 -- Test composite-type arguments
 select tcl_

Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade

2024-06-05 Thread Matthias van de Meent
On Wed, 5 Jun 2024 at 18:47, Ranier Vilela  wrote:
>
> On Tue, Jun 4, 2024 at 16:39, Nathan Bossart
> wrote:
>>
>> I noticed that the "Restoring database schemas in the new cluster" part of
>> pg_upgrade can take a while if you have many databases, so I experimented
>> with a couple different settings to see if there are any easy ways to speed
>> it up.  The FILE_COPY strategy for CREATE DATABASE helped quite
>> significantly on my laptop.  For ~3k empty databases, this step went from
>> ~100 seconds to ~30 seconds with the attached patch.  I see commit ad43a41
>> made a similar change for initdb, so there might even be an argument for
>> back-patching this to v15 (where STRATEGY was introduced).  One thing I
>> still need to verify is that this doesn't harm anything when there are lots
>> of objects in the databases, i.e., more WAL generated during many
>> concurrent CREATE-DATABASE-induced checkpoints.
>>
>> Thoughts?
>
> Why not use it too, if not binary_upgrade?

Because in the normal case (not during binary_upgrade) you don't want
to have to generate 2 checkpoints for every created database,
especially not when your shared buffers are large. Checkpoints' costs
scale approximately linearly with the size of shared buffers, so being
able to skip those checkpoints (with strategy=WAL_LOG) will save a lot
of performance in the systems where this performance impact matters
most.
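
(For reference, the strategy can be chosen per CREATE DATABASE since v15, so
the two behaviours are easy to compare by hand:

CREATE DATABASE demo_wal  TEMPLATE template0 STRATEGY = WAL_LOG;   -- default; WAL-logs the copied data
CREATE DATABASE demo_copy TEMPLATE template0 STRATEGY = FILE_COPY; -- copies files, checkpoints before and after

The patch under discussion only switches the strategy used for the CREATE
DATABASE commands issued while pg_upgrade restores schemas in the new cluster.)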

>> I noticed that the "Restoring database schemas in the new cluster" part of
>> pg_upgrade can take a while if you have many databases, so I experimented
>> with a couple different settings to see if there are any easy ways to speed
>> it up.  The FILE_COPY strategy for CREATE DATABASE helped quite
>> significantly on my laptop.  For ~3k empty databases, this step went from
>> ~100 seconds to ~30 seconds with the attached patch.

As for "on my laptop", that sounds very reasonable, but could you
check the performance on systems with larger shared buffer
configurations? I'd imagine (but haven't checked) that binary upgrade
target systems may already be using the shared_buffers from their
source system, which would cause a severe regression when the
to-be-upgraded system has large shared buffers. For initdb the
database size is known in advance and shared_buffers is approximately
empty, but the same is not (always) true when we're doing binary
upgrades.

Kind regards,

Matthias van de Meent




Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade

2024-06-05 Thread Nathan Bossart
On Wed, Jun 05, 2024 at 01:47:09PM -0300, Ranier Vilela wrote:
> Why not use it too, if not binary_upgrade?
> 
> else
> {
>     appendPQExpBuffer(creaQry,
>                       "CREATE DATABASE %s WITH TEMPLATE = template0 STRATEGY = FILE_COPY",
>                       qdatname);
> }
> 
> It seems to me that it also improves in any use.

Well, if that is true, and I'm not sure it is, then we should probably
consider changing the default STRATEGY in the server instead.  I haven't
looked too deeply, but my assumption is that when fsync is disabled (as it
is when restoring schemas during pg_upgrade), both checkpointing and
copying the template database are fast enough to beat writing
out all the data to WAL.  Furthermore, in my test, all the databases were
basically empty, so I suspect that some CREATE DATABASE commands could
piggy-back on checkpoints initiated by others.  This might not be the case
when there are many objects in each database, and that is a scenario I have
yet to test.
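
(If someone wants to reproduce the many-empty-databases setup, one quick way,
purely illustrative, is from psql on the old cluster:

SELECT format('CREATE DATABASE db%s TEMPLATE template0', g)
FROM generate_series(1, 3000) AS g
\gexec

and then run pg_upgrade as usual.)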

-- 
nathan




Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade

2024-06-05 Thread Ranier Vilela
On Tue, Jun 4, 2024 at 16:39, Nathan Bossart <
nathandboss...@gmail.com> wrote:

> I noticed that the "Restoring database schemas in the new cluster" part of
> pg_upgrade can take a while if you have many databases, so I experimented
> with a couple different settings to see if there are any easy ways to speed
> it up.  The FILE_COPY strategy for CREATE DATABASE helped quite
> significantly on my laptop.  For ~3k empty databases, this step went from
> ~100 seconds to ~30 seconds with the attached patch.  I see commit ad43a41
> made a similar change for initdb, so there might even be an argument for
> back-patching this to v15 (where STRATEGY was introduced).  One thing I
> still need to verify is that this doesn't harm anything when there are lots
> of objects in the databases, i.e., more WAL generated during many
> concurrent CREATE-DATABASE-induced checkpoints.
>
> Thoughts?
>
Why not use it too, if not binary_upgrade?

else
{
    appendPQExpBuffer(creaQry,
                      "CREATE DATABASE %s WITH TEMPLATE = template0 STRATEGY = FILE_COPY",
                      qdatname);
}

It seems to me that it also improves in any use.

best regards,
Ranier Vilela


Re: small pg_dump code cleanup

2024-06-05 Thread Nathan Bossart
On Wed, Jun 05, 2024 at 12:22:03PM -0400, Neil Conway wrote:
> Nice cleanup! Two minor comments:

Thanks for taking a look.

> (1) Names like `getXXX` for these functions suggest to me that they return
> a value, rather than side-effecting. I realize some variants continue to
> return a value, but the majority no longer do. Perhaps a name like
> lookupXXX() or readXXX() would be clearer?

What about collectXXX() to match similar functions in pg_dump.c (e.g.,
collectRoleNames(), collectComments(), collectSecLabels())?

> (2) These functions malloc() a single ntups * sizeof(struct) allocation and
> then index into it to fill-in each struct before entering it into the hash
> table. It might be more straightforward to just malloc each individual
> struct.

That'd increase the number of allocations quite significantly, but I'd be
surprised if that was noticeable outside of extreme scenarios.  At the
moment, I'm inclined to leave these as-is for this reason and because I
doubt it'd result in much cleanup, but I'll yield to the majority opinion
here.

-- 
nathan




Re: small pg_dump code cleanup

2024-06-05 Thread Neil Conway
On Wed, Jun 5, 2024 at 11:14 AM Nathan Bossart 
wrote:

> In fact, many of the functions in this area don't actually need to
> return anything, so we can trim some code and hopefully reduce confusion a
> bit.  Patch attached.
>

Nice cleanup! Two minor comments:

(1) Names like `getXXX` for these functions suggest to me that they return
a value, rather than side-effecting. I realize some variants continue to
return a value, but the majority no longer do. Perhaps a name like
lookupXXX() or readXXX() would be clearer?

(2) These functions malloc() a single ntups * sizeof(struct) allocation and
then index into it to fill-in each struct before entering it into the hash
table. It might be more straightforward to just malloc each individual
struct.

Neil


Re: Make query cancellation keys longer

2024-06-05 Thread Robert Haas
On Fri, Mar 8, 2024 at 5:20 PM Heikki Linnakangas  wrote:
> The nice thing about relying on the message length was that we could
> just redefine the CancelRequest message to have a variable length
> secret, and old CancelRequest with 4-byte secret was compatible with the
> new definition too. But it doesn't matter much, so added an explicit
> length field.

I think I liked your original idea better than this one.

> One consequence of this patch that I didn't mention earlier is that it
> makes libpq incompatible with server version 9.2 and below, because the
> minor version negotiation was introduced in version 9.3. We could teach
> libpq to disconnect and reconnect with minor protocol version 3.0, if
> connecting with 3.1 fails, but IMHO it's better to not complicate this
> and accept the break in backwards-compatibility. 9.3 was released in
> 2013. We dropped pg_dump support for versions older than 9.2 a few years
> ago, this raises the bar for pg_dump to 9.3 as well.

I think we shouldn't underestimate the impact of bumping the minor
protocol version. Minor version negotiation is probably not supported
by all drivers and Jelte says that it's not supported by any poolers,
so for anybody but direct libpq users, there will be some breakage.
Now, on the one hand, as Jelte has said, there's little value in
having a protocol version if we're too afraid to make use of it. But
on the other hand, is this problem serious enough to justify the
breakage we'll cause? I'm not sure. It seems pretty silly to be using
a 32-bit value for this in 2024, but I'm sure some people aren't going
to like it, and they may not all have noticed this thread. On the
third hand, if we do this, it may help to unblock a bunch of other
pending patches that also want to do protocol-related things.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




small pg_dump code cleanup

2024-06-05 Thread Nathan Bossart
While reviewing Daniel's pg_dump patch [0], I was initially confused
because the return value of getTypes() isn't saved anywhere.  Once I found
commit 92316a4, I realized that data was actually stored in a separate hash
table.  In fact, many of the functions in this area don't actually need to
return anything, so we can trim some code and hopefully reduce confusion a
bit.  Patch attached.

[0] https://postgr.es/m/8F1F1E1D-D17B-4B33-B014-EDBCD15F3F0B%40yesql.se

-- 
nathan
>From a6b8585cd55c0758ffb1cabb4c4deacbb65af9d3 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 4 Jun 2024 22:32:20 -0500
Subject: [PATCH v1 1/1] Trim some unnecessary pg_dump code.

---
 src/bin/pg_dump/common.c  |  69 --
 src/bin/pg_dump/pg_dump.c | 282 +-
 src/bin/pg_dump/pg_dump.h |  49 ---
 3 files changed, 113 insertions(+), 287 deletions(-)

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 64e7dc89f1..c323b5bd3d 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -101,31 +101,8 @@ getSchemaData(Archive *fout, int *numTablesPtr)
ExtensionInfo *extinfo;
InhInfo*inhinfo;
int numTables;
-   int numTypes;
-   int numFuncs;
-   int numOperators;
-   int numCollations;
-   int numNamespaces;
int numExtensions;
-   int numPublications;
-   int numAggregates;
int numInherits;
-   int numRules;
-   int numProcLangs;
-   int numCasts;
-   int numTransforms;
-   int numAccessMethods;
-   int numOpclasses;
-   int numOpfamilies;
-   int numConversions;
-   int numTSParsers;
-   int numTSTemplates;
-   int numTSDicts;
-   int numTSConfigs;
-   int numForeignDataWrappers;
-   int numForeignServers;
-   int numDefaultACLs;
-   int numEventTriggers;
 
/*
 * We must read extensions and extension membership info first, because
@@ -139,7 +116,7 @@ getSchemaData(Archive *fout, int *numTablesPtr)
getExtensionMembership(fout, extinfo, numExtensions);
 
pg_log_info("reading schemas");
-   (void) getNamespaces(fout, &numNamespaces);
+   getNamespaces(fout);
 
/*
 * getTables should be done as soon as possible, so as to minimize the
@@ -153,69 +130,69 @@ getSchemaData(Archive *fout, int *numTablesPtr)
getOwnedSeqs(fout, tblinfo, numTables);
 
pg_log_info("reading user-defined functions");
-   (void) getFuncs(fout, &numFuncs);
+   getFuncs(fout);
 
/* this must be after getTables and getFuncs */
pg_log_info("reading user-defined types");
-   (void) getTypes(fout, &numTypes);
+   getTypes(fout);
 
/* this must be after getFuncs, too */
pg_log_info("reading procedural languages");
-   getProcLangs(fout, &numProcLangs);
+   getProcLangs(fout);
 
pg_log_info("reading user-defined aggregate functions");
-   getAggregates(fout, &numAggregates);
+   getAggregates(fout);
 
pg_log_info("reading user-defined operators");
-   (void) getOperators(fout, &numOperators);
+   getOperators(fout);
 
pg_log_info("reading user-defined access methods");
-   getAccessMethods(fout, &numAccessMethods);
+   getAccessMethods(fout);
 
pg_log_info("reading user-defined operator classes");
-   getOpclasses(fout, &numOpclasses);
+   getOpclasses(fout);
 
pg_log_info("reading user-defined operator families");
-   getOpfamilies(fout, &numOpfamilies);
+   getOpfamilies(fout);
 
pg_log_info("reading user-defined text search parsers");
-   getTSParsers(fout, &numTSParsers);
+   getTSParsers(fout);
 
pg_log_info("reading user-defined text search templates");
-   getTSTemplates(fout, &numTSTemplates);
+   getTSTemplates(fout);
 
pg_log_info("reading user-defined text search dictionaries");
-   getTSDictionaries(fout, &numTSDicts);
+   getTSDictionaries(fout);
 
pg_log_info("reading user-defined text search configurations");
-   getTSConfigurations(fout, &numTSConfigs);
+   getTSConfigurations(fout);
 
pg_log_info("reading user-defined foreign-data wrappers");
-   getForeignDataWrappers(fout, &numForeignDataWrappers);
+   getForeignDataWrappers(fout);
 
pg_log_info("reading user-defined foreign servers");
-   getForeignServers(fout, &numForeignServers);
+  

Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-06-05 Thread Robert Haas
On Wed, Jun 5, 2024 at 10:06 AM Jelte Fennema-Nio  wrote:
> FYI Heikki his patchset is here:
> https://www.postgresql.org/message-id/flat/508d0505-8b7a-4864-a681-e7e5edfe32aa%40iki.fi
>
> Afaict there's no way to implement this with old clients supporting
> the new message. So it would need to be opt-in from the client
> perspective, either using a version bump or a protocol parameter (e.g.
> large_cancel=true). IMHO a version bump would make more sense for this
> one.

Well, hang on. The discussion on that thread suggests that this is
only going to break connections to servers that don't have
NegotiateProtocolVersion. Personally, I think that's fairly OK. It's
true that connections to, I guess, pre-9.3 servers will break, but
there shouldn't be tons of those left around. It's not great to break
connectivity to such servers, of course, but it seems kind of OK. What
I really don't want is for v18 to break connections to v17 servers.
That would be exponentially more disruptive.

I do take your point that poolers haven't added support for
NegotiateProtocolVersion yet, but I bet that will change pretty
quickly once there's a benefit to doing so. I think it's a lot easier
for people to install a new pooler version than a new server version,
because the server has the data in it. Also, it's not like they
haven't had time.

But the real question here is whether we think the longer cancel key
is really important enough to justify the partial compatibility break.
I'm not deathly opposed to it, but I think it's debatable and I'm sure
some people are going to be unhappy.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: XLog size reductions: Reduced XLog record header size for PG17

2024-06-05 Thread Mark Dilger



> On Jun 20, 2023, at 1:01 PM, Matthias van de Meent 
>  wrote:
> 
> 0001 is copied essentially verbatim from [1] and reduces overhead in
> the registered block's length field where possible. It is included to
> improve code commonality between varcoded integer fields. See [1] for
> more details.

Hi Matthias!  I am interested in seeing this patch move forward.  We seem to be 
stuck.

The disagreement on the other thread seems to be about whether we can 
generalize and reuse variable integer encoding.  Could you comment on whether 
perhaps we just need a few versions of that?  Perhaps one version where the 
number of length bytes is encoded in the length itself (such as is used for 
varlena and by Andres' patch) and one where the number of length bytes is 
stored elsewhere?  You are clearly using the "elsewhere" form, but perhaps you 
could pull out the logic of that into src/common?  In struct 
XLogRecordBlockHeader.id , you are reserving 
two bits for the size class.  (The code comments aren't clear about this, by 
the way.)  Perhaps if the generalized length encoding logic could take a couple 
arguments to represent where and how the size class bits are to be stored, and 
where the length itself is stored?  I doubt you need to sacrifice any 
performance gains of this patch to make that happen.  You'd just need to 
restructure the patch.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Fix use of possible uninitialized variable retval (src/pl/plpgsql/src/pl_handler.c)

2024-06-05 Thread Julien Rouhaud
On Wed, Jun 5, 2024 at 1:05 PM Michael Paquier  wrote:
>
> This complain was from lapwing, that uses a version of gcc which
> produces a lot of noise with incorrect issues.  It is one of the only
> 32b buildfarm members, so it still has a lot of value.

Note that I removed the -Werror from lapwing a long time ago, so at
least this animal shouldn't lead to hackish fixes for false positive
anymore.




Re: Patch bug: Fix jsonpath .* on Arrays

2024-06-05 Thread David E. Wheeler
On Jun 4, 2024, at 20:45, David E. Wheeler  wrote:

> Here’s a new patch that demonstrates that behavior, since that code path is 
> not currently represented in tests AFAICT (I would have expected to have 
> broken it with this patch).

Commitfest link:

  https://commitfest.postgresql.org/48/5017/

D





Re: Extension security improvement: Add support for extensions with an owned schema

2024-06-05 Thread Marco Slot
On Sun, Jun 2, 2024, 02:08 Jelte Fennema-Nio  wrote:
> This patch adds a new "owned_schema" option to the extension control
> file that can be set to true to indicate that this extension wants to
> own the schema in which it is installed.

Huge +1

Many managed PostgreSQL services block superuser access, but provide a
way for users to trigger a create/alter extension as superuser. There
have been various extensions whose SQL scripts can be tricked into
calling a function that was pre-created in the extension schema. This
is usually done by finding an unqualified call to a pg_catalog
function/operator, and overloading it with one whose arguments types
are a closer match for the provided values, which then takes
precedence regardless of search_path order. The custom function can
then do something like "alter user foo superuser".
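
To make the shape of that attack concrete, here is a deliberately simplified
sketch (all object names invented): suppose an extension script run as
superuser executes an unqualified "SELECT lower(some_varchar_column) ...".
An attacker who can create objects in the extension's schema plants, ahead of
time:

CREATE FUNCTION ext_schema.lower(varchar) RETURNS varchar
LANGUAGE sql AS $$
  SELECT pg_catalog.lower($1)::pg_catalog.varchar;
  -- ...plus whatever escalation the attacker wants, e.g. ALTER ROLE ... SUPERUSER
$$;

With a varchar argument this is an exact match, while pg_catalog.lower(text)
needs an implicit cast, so the planted function wins regardless of search_path
order and runs with the script's privileges.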

The sequence of steps assumes the user already has some kind of admin
role and is gaining superuser access to their own database server.
However, the superuser implicitly has shell access, so it gives
attackers an additional set of tools to poke around in the managed
service. For instance, they can change the way the machine responds to
control plane requests, which can sometimes trigger further
escalations. In addition, many applications use the relatively
privileged default user, which means SQL injection issues can also
escalate into superuser access and beyond.

There are some static analysis tools like
https://github.com/timescale/pgspot that address this issue, though it
seems like a totally unnecessary hole. Using schema = pg_catalog,
relocatable = false, and doing an explicit create schema (without "if
not exists") plugs the hole by effectively disabling extension
schemas. For extensions I'm involved in, I consider this to be a hard
requirement.

I think Jelte's solution is preferable going forward, because it
preserves the flexibility that extension schemas were meant to
provide, and makes the potential hazards of reusing a schema more
explicit.

cheers,
Marco




Re: Partial aggregates pushdown

2024-06-05 Thread Bruce Momjian
On Wed, Jun  5, 2024 at 08:19:04AM +0300, Alexander Pyhalov wrote:
> > *  Passes unsafe binary data from the foreign server.
> > 
> > Can someone show me where that last item is in the patch, and why can't
> > we just pass back values cast to text?
> 
> In finalize_aggregate() when we see partial aggregate with
> peragg->aggref->aggtranstype = INTERNALOID
> we call aggregate's serialization function and return it as bytea.
> 
> The issue is that this internal representation isn't guaranteed to be
> compatible between servers
> of different versions (or architectures?). So, likely, we instead should
> have called some export function for aggregate
> and later - some import function on postgres_fdw side. It doesn't matter
> much, what this export function
> generates - text, json or some portable binary format,
> 1) export/import functions should just "understand" it,
> 2) it should be a stable representation.

Okay, so looking at the serialization output functions already defined, I
see many zeros, which I assume means just the base data type, and eight
more:

	SELECT DISTINCT aggserialfn from pg_aggregate WHERE aggserialfn::oid != 0;
aggserialfn
---
 numeric_avg_serialize
 string_agg_serialize
 array_agg_array_serialize
 numeric_serialize
 int8_avg_serialize
 array_agg_serialize
 interval_avg_serialize
 numeric_poly_serialize

I realize we need to return the sum and count for average, so that makes
sense.

So, we need import/export text representation for the partial aggregate
mode for these eight, and call the base data type text import/export
functions for the zero ones when in this mode?
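
For a concrete example of one of the eight, avg(bigint) keeps its transition
state in an internal-type struct, which is why it needs a serialization
function at all; the text import/export functions being discussed would
presumably be tracked in a similar way:

	SELECT aggtransfn, aggtranstype::regtype, aggserialfn, aggdeserialfn
	FROM pg_aggregate
	WHERE aggfnoid = 'pg_catalog.avg(bigint)'::regprocedure;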

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Conflict Detection and Resolution

2024-06-05 Thread Dilip Kumar
On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila  wrote:
>
> Can you share the use case of "earliest_timestamp_wins" resolution
> method? It seems after the initial update on the local node, it will
> never allow remote update to succeed which sounds a bit odd. Jan has
> shared this and similar concerns about this resolution method, so I
> have added him to the email as well.
>
I can not think of a use case exactly in this context but it's very
common to have such a use case while designing a distributed
application with multiple clients.  For example, when we are doing git
push concurrently from multiple clients it is expected that the
earliest commit wins.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: ResourceOwner refactoring

2024-06-05 Thread Heikki Linnakangas

On 04/06/2024 01:49, Heikki Linnakangas wrote:

> A straightforward fix is to modify RelationFlushRelation() so that if
> !IsTransactionState(), it just marks the entry as invalid instead of
> calling RelationClearRelation(). That's what RelationClearRelation()
> would do anyway, if it didn't hit the assertion first.


Here's a patch with that straightforward fix. Your test case hit the
"rd_createSubid != InvalidSubTransactionId" case; I expanded it to also
cover the "rd_firstRelfilelocatorSubid != InvalidSubTransactionId" case.
Barring objections, I'll commit this later today or tomorrow. Thanks for
the report!


I started a new thread with the bigger refactorings I had in mind: 
https://www.postgresql.org/message-id/9c9e8908-7b3e-4ce7-85a8-00c0e165a3d6%40iki.fi


--
Heikki Linnakangas
Neon (https://neon.tech)
From 227f173d1066955a2ab6aba12c508aaa1f416c27 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 5 Jun 2024 13:29:51 +0300
Subject: [PATCH 1/3] Make RelationFlushRelation() work without ResourceOwner
 during abort

ReorderBufferImmediateInvalidation() executes invalidation messages in
an aborted transaction. However, RelationFlushRelation sometimes
required a valid resource owner, to temporarily increment the refcount
of the relcache entry. Commit b8bff07daa worked around that in the main
subtransaction abort function, AbortSubTransaction(), but missed this
similar case in ReorderBufferImmediateInvalidation().

To fix, introduce a separate function to invalidate a relcache
entry. It does the same thing as RelationClearRelation(rebuild==true)
does, when outside a transaction, but can be called without
incrementing the refcount.

Add regression test. Before this fix, it failed with:

ERROR: ResourceOwnerEnlarge called after release started

Reported-by: Alexander Lakhin 
Discussion: https://www.postgresql.org/message-id/e56be7d9-14b1-664d-0bfc-00ce97727...@gmail.com
---
 .../expected/decoding_in_xact.out | 48 +
 .../test_decoding/sql/decoding_in_xact.sql| 27 ++
 src/backend/access/transam/xact.c | 13 -
 src/backend/utils/cache/relcache.c| 54 ---
 4 files changed, 122 insertions(+), 20 deletions(-)

diff --git a/contrib/test_decoding/expected/decoding_in_xact.out b/contrib/test_decoding/expected/decoding_in_xact.out
index b65253f4630..8555b3d9436 100644
--- a/contrib/test_decoding/expected/decoding_in_xact.out
+++ b/contrib/test_decoding/expected/decoding_in_xact.out
@@ -79,6 +79,54 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
  COMMIT
 (6 rows)
 
+-- Decoding works in transaction that issues DDL
+--
+-- We had issues handling relcache invalidations with these, see
+-- https://www.postgresql.org/message-id/e56be7d9-14b1-664d-0bfc-00ce97727...@gmail.com
+CREATE TABLE tbl_created_outside_xact(id SERIAL PRIMARY KEY);
+BEGIN;
+  -- TRUNCATE changes the relfilenode and sends relcache invalidation
+  TRUNCATE tbl_created_outside_xact;
+  INSERT INTO tbl_created_outside_xact(id) VALUES('1');
+  -- don't show yet, haven't committed
+  SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data 
+--
+(0 rows)
+
+COMMIT;
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+ data 
+--
+ BEGIN
+ table public.tbl_created_outside_xact: TRUNCATE: (no-flags)
+ table public.tbl_created_outside_xact: INSERT: id[integer]:1
+ COMMIT
+(4 rows)
+
+set debug_logical_replication_streaming=immediate;
+BEGIN;
+  CREATE TABLE tbl_created_in_xact(id SERIAL PRIMARY KEY);
+  INSERT INTO tbl_created_in_xact VALUES (1);
+  CHECKPOINT; -- Force WAL flush, so that the above changes will be streamed
+  SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+   data   
+--
+ opening a streamed block for transaction
+ streaming change for transaction
+ closing a streamed block for transaction
+(3 rows)
+
+COMMIT;
+reset debug_logical_replication_streaming;
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+  data   
+-
+ BEGIN
+ table public.tbl_created_in_xact: INSERT: id[integer]:1
+ COMMIT
+(3 rows)
+
 SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
  ?column? 
 --
diff --git a/contrib/test_decoding/sql/decoding_in_xact.sql b/contrib/test_decoding/sql/decoding_in_xact.sql
index 108782dc2e9..841a92fa780 100644
--- a/contrib/test_decoding/sql/decoding_in_xact.sql
+++ b/contrib/test_decoding/sql/d

Relcache refactoring

2024-06-05 Thread Heikki Linnakangas
While looking at the recent bug report from Alexander Lakhin [1], I got 
annoyed by the relcache code, and RelationClearRelation in particular. I 
propose to refactor it for clarity.


[1] 
https://www.postgresql.org/message-id/e56be7d9-14b1-664d-0bfc-00ce9772721c%40gmail.com


## Patch 1

This is just a narrow fix for the reported bug [1], same as I posted on 
that thread. Included here because I wrote the refactorings on top of 
this patch and didn't commit it yet.



## Patch 2: Simplify call to rebuild relcache entry for indexes

To rebuild a relcache entry that's been marked as invalid, 
RelationIdGetRelation() calls RelationReloadIndexInfo() for indexes and 
RelationClearRelation(rebuild == true) for other relations. However, 
RelationClearRelation calls RelationReloadIndexInfo() for indexes 
anyway, so RelationIdGetRelation() can just always call 
RelationClearRelation() and let RelationClearRelation() do the right 
thing to rebuild the relation, whether it's an index or something else. 
That seems more straightforward.


Also add comments explaining how the rebuild works at index creation. 
It's a bit special, see the comments.



## Patch 3: Split RelationClearRelation into three different functions

RelationClearRelation() is complicated. Depending on the 'rebuild' 
argument and the circumstances, like if it's called in a transaction and 
whether the relation is an index, a nailed relation, a regular table, or 
a relation dropped in the same xact, it does different things:


- Remove the relation completely from the cache (rebuild == false),
- Mark the entry as invalid (rebuild == true, but not in xact), or
- Rebuild the entry (rebuild == true).

The callers have expectations on what they want it to do. Mostly the 
callers with 'rebuild == false' expect the entry to be removed, and 
callers with 'rebuild == true' expect it to be rebuilt or invalidated, 
but there are exceptions. RelationForgetRelation() for example sets 
rd_droppedSubid and expects RelationClearRelation() to then merely 
invalidate it, and the call from RelationIdGetRelation() expects it to 
rebuild, not merely invalidate it.


I propose to split RelationClearRelation() into three functions:

RelationInvalidateRelation: mark the relcache entry as invalid, so that
it is rebuilt on next access.

RelationRebuildRelation: rebuild the relcache entry in-place.
RelationClearRelation: Remove the entry from the relcache.

This moves the responsibility of deciding the right action to the 
callers. Which they were mostly already doing. Each of those actions 
have different preconditions, e.g. RelationRebuildRelation() can only be 
called in a valid transaction, and RelationClearRelation() can only be 
called if the reference count is zero. Splitting them makes those 
preconditions more clear, we can have assertions to document them in each.



## RelationInitPhysicalAddr() call in RelationReloadNailed()

One question or little doubt I have: Before these patches, 
RelationReloadNailed() calls RelationInitPhysicalAddr() even when it 
leaves the relation as invalidated because we're not in a transaction or 
if the relation isn't currently in use. That's a bit surprising, I'd 
expect it to be done when the entry is reloaded, not when it's 
invalidated. That's how it's done for non-nailed relations. And in fact, 
for a nailed relation, RelationInitPhysicalAddr() is called *again* when 
it's reloaded.


Is it important? Before commit a54e1f1587, nailed non-index relations 
were not reloaded at all, except for the call to 
RelationInitPhysicalAddr(), which seemed consistent. I think this was 
unintentional in commit a54e1f1587, or perhaps just overly defensive, as 
there's no harm in some extra RelationInitPhysicalAddr() calls.


This patch removes that extra call from the invalidation path, but if it 
turns out to be wrong, we can easily add it to RelationInvalidateRelation.


--
Heikki Linnakangas
Neon (https://neon.tech)From 227f173d1066955a2ab6aba12c508aaa1f416c27 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 5 Jun 2024 13:29:51 +0300
Subject: [PATCH 1/3] Make RelationFlushRelation() work without ResourceOwner
 during abort

ReorderBufferImmediateInvalidation() executes invalidation messages in
an aborted transaction. However, RelationFlushRelation sometimes
required a valid resource owner, to temporarily increment the refcount
of the relcache entry. Commit b8bff07daa worked around that in the main
subtransaction abort function, AbortSubTransaction(), but missed this
similar case in ReorderBufferImmediateInvalidation().

To fix, introduce a separate function to invalidate a relcache
entry. It does the same thing as RelationClearRelation(rebuild==true)
does, when outside a transaction, but can be called without
incrementing the refcount.

Add regression test. Before this fix, it failed with:

ERROR: ResourceOwnerEnlarge called after release started

Reported-by: Alexander Lakhin 
Discussion: https://www.postgresql.org/message-

Re: Conflict Detection and Resolution

2024-06-05 Thread Dilip Kumar
On Wed, Jun 5, 2024 at 9:12 AM shveta malik  wrote:
>
> On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila  wrote:
> >
> > > >
> > > > >
> > > > > Conflict Resolution
> > > > > 
> > > > > a) latest_timestamp_wins:The change with later commit timestamp 
> > > > > wins.
> > > > > b) earliest_timestamp_wins:   The change with earlier commit 
> > > > > timestamp wins.
> >
> > Can you share the use case of "earliest_timestamp_wins" resolution
> > method? It seems after the initial update on the local node, it will
> > never allow remote update to succeed which sounds a bit odd. Jan has
> > shared this and similar concerns about this resolution method, so I
> > have added him to the email as well.
>
> I do not have the exact scenario for this.  But I feel, if 2 nodes are
> concurrently inserting different data against a primary key, then some
> users may have preferences that retain the row which was inserted
> earlier. It is no different from latest_timestamp_wins. It totally
> depends upon what kind of application and requirement the user may
> have, based on which, he may discard the later coming rows (specially
> for INSERT case).

I haven't read the complete design yet, but have we discussed how we
plan to deal with clock drift if we use timestamp-based conflict
resolution? For example, a user might insert something conflicting on
node1 first and then on node2. However, due to clock drift, the
timestamp from node2 might appear earlier. In this case, if we choose
"earliest timestamp wins," we would keep the changes from node2.

I haven't fully considered if this would cause any problems, but users
might detect this issue. For instance, a client machine might send a
change to node1 first and then, upon confirmation, send it to node2.
If the clocks on node1 and node2 are not synchronized, the changes
might appear in a different order. Does this seem like a potential
problem?
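
(For what it's worth, the commit timestamps such a resolution method would
compare are presumably the ones exposed once track_commit_timestamp is
enabled, which also makes any drift easy to observe by hand; with an
illustrative table name:

SELECT pg_xact_commit_timestamp(xmin) AS committed_at, t.*
FROM some_replicated_table AS t;

run on both nodes for the same conflicting row.)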

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




ssl tests fail due to TCP port conflict

2024-06-05 Thread Alexander Lakhin

Hello hackers,

As buildfarm shows, ssl tests are not stable enough when running via meson.
I can see the following failures for the last 90 days:
REL_16_STABLE:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-03-12%2023%3A15%3A50
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-03-21%2000%3A35%3A23
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-03-27%2011%3A15%3A31
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2024-04-16%2016%3A10%3A45

master:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-03-08%2011%3A19%3A42
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-03-11%2022%3A23%3A28
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-03-17%2023%3A03%3A50
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-03-20%2009%3A21%3A30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-03-20%2016%3A53%3A27
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2024-04-07%2012%3A25%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2024-04-08%2019%3A50%3A13
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-04-19%2021%3A24%3A30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-04-22%2006%3A17%3A13
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-04-29%2023%3A27%3A15
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-04-30%2000%3A24%3A28
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-06-04%2011%3A20%3A07

All the failures are caused by the server's inability to restart with a
previously chosen random port on TCP/IP. For example:
2024-06-04 11:30:40.227 UTC [3373644][postmaster][:0] LOG:  starting PostgreSQL 17beta1 on x86_64-linux, compiled by 
clang-13.0.1-11, 64-bit

2024-06-04 11:30:40.231 UTC [3373644][postmaster][:0] LOG: listening on Unix socket 
"/tmp/tUmT8ItNQ2/.s.PGSQL.60362"
2024-06-04 11:30:40.337 UTC [3373798][startup][:0] LOG:  database system was 
shut down at 2024-06-04 11:21:25 UTC
...
2024-06-04 11:30:45.273 UTC [3376046][postmaster][:0] LOG:  starting PostgreSQL 17beta1 on x86_64-linux, compiled by 
clang-13.0.1-11, 64-bit

2024-06-04 11:30:45.273 UTC [3376046][postmaster][:0] LOG:  could not bind IPv4 address 
"127.0.0.1": Address already in use
2024-06-04 11:30:45.273 UTC [3376046][postmaster][:0] HINT:  Is another postmaster already running on port 60362? If 
not, wait a few seconds and retry.

2024-06-04 11:30:45.273 UTC [3376046][postmaster][:0] WARNING: could not create listen 
socket for "127.0.0.1"

I've managed to reproduce the failure locally with the following change:
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -149,7 +149,7 @@ INIT
    $ENV{PGDATABASE} = 'postgres';

    # Tracking of last port value assigned to accelerate free port lookup.
-   $last_port_assigned = int(rand() * 16384) + 49152;
+   $last_port_assigned = int(rand() * 1024) + 49152;

and multiplying one of the tests.
for i in `seq 50`; do cp .../src/test/ssl/t/001_ssltests.pl \
  .../src/test/ssl/t/001_ssltests_$i.pl; \
  sed -E "s|('t/001_ssltests.pl',)|\1\n't/001_ssltests_$i.pl',|" -i \
    .../src/test/ssl/meson.build; done

Then `meson test --suite ssl` fails for me as below:
...
26/53 postgresql:ssl / ssl/001_ssltests_26 OK 9.03s   205 subtests 
passed
27/53 postgresql:ssl / ssl/001_ssltests_18 ERROR 3.55s   (exit status 
255 or signal 127 SIGinvalid)
>>> OPENSSL=/usr/bin/openssl ...
28/53 postgresql:ssl / ssl/001_ssltests_25 OK 10.98s   205 subtests 
passed
29/53 postgresql:ssl / ssl/001_ssltests_24 OK 10.84s   205 subtests 
passed

testrun/ssl/001_ssltests_18/log/001_ssltests_18_primary.log contains:
2024-06-05 11:51:22.005 UTC [710541] LOG:  could not bind IPv4 address 
"127.0.0.1": Address already in use
2024-06-05 11:51:22.005 UTC [710541] HINT:  Is another postmaster already running on port 49632? If not, wait a few 
seconds and retry.


`grep '\b49632\b' -r testrun/` finds:
testrun/ssl/001_ssltests_18/log/regress_log_001_ssltests_18:# Checking port 
49632
testrun/ssl/001_ssltests_18/log/regress_log_001_ssltests_18:# Found port 49632
testrun/ssl/001_ssltests_18/log/regress_log_001_ssltests_18:Connection string: 
port=49632 host=/tmp/sp_VLbpjJF
testrun/ssl/001_ssltests_18/log/001_ssltests_18_primary.log:2024-06-05 11:51:18.896 UTC [710082] LOG:  listening on Unix 
socket "/tmp/sp_VLbpjJF/.s.PGSQL.49632"
testrun/ssl/001_ssltests_18/log/001_ssltests_18_primary.log:2024-06-05 11:51:22.005 UTC [710541] HINT:  Is another 
postmaster already running on port 49632? If not, wait a few seconds and retry.

...
testrun/ssl/001_ssltests_23/log/regress_log_001_ssltests_23:# Checking port 
49632
testrun/ssl/001_ssltests_23/log/regress_log_001_ssltests_23:# Found port 49632
testrun/ssl

Re: Fix use of possible uninitialized variable retval (src/pl/plpgsql/src/pl_handler.c)

2024-06-05 Thread Ranier Vilela
On Wed, Jun 5, 2024 at 02:04, Michael Paquier
wrote:

> On Wed, Jun 05, 2024 at 01:12:41PM +0900, Kyotaro Horiguchi wrote:
> > At Mon, 27 May 2024 11:31:24 -0300, Ranier Vilela 
> wrote in
> >> The function *plpgsql_inline_handler* can use uninitialized
> >> variable retval, if PG_TRY fails.
> >> Fix like function*plpgsql_call_handler* wich declare retval as
> >> volatile and initialize to (Datum 0).
>
> You forgot to read elog.h, explaining under which circumstances
> variables related to TRY/CATCH block should be marked as volatile.
> There is a big "Note:" paragraph.
>
> It is not the first time that this is mentioned on this list: but
> sending a report without looking at the reason why a change is
> justified makes everybody waste time.  That's not productive.
>
Of course, this is very bad when it happens.


>
> > If PG_TRY fails, retval is not actually accessed, so no real issue
> > exists. Commit 7292fd8f1c changed plpgsql_call_handler() to the
> > current form, but as stated in its commit message, it did not fix a
> > real issue and was solely to silence compiler.
>
> This complain was from lapwing, that uses a version of gcc which
> produces a lot of noise with incorrect issues.  It is one of the only
> 32b buildfarm members, so it still has a lot of value.
>
I posted the report because of an uninitialized-variable warning,
which is one of the most problematic situations when it *actually exists*.


> > I believe we do not need to modify plpgsql_inline_handler() unless
> > compiler actually issues a false warning for it.
>
> If we were to do something, that would be to remove the volatile from
> plpgsql_call_handler() at the end once we don't have in the buildfarm
> compilers that complain about it, because there is no reason to use a
> volatile in this case.  :)
>
I don't see any motivation, since there are no reports.

best regards,
Ranier Vilela


RE: Psql meta-command conninfo+

2024-06-05 Thread Maiquel Grassi
From a quick skim of the latest messages in this thread, it sounds like
there are a couple of folks who think \conninfo (and consequently
\conninfo+) should only report values from the libpq API.  I think they
would prefer server-side information to be provided via a system view or
maybe an entirely new psql meta-command.

IIUC a new system view would more-or-less just gather information from
other system views and functions.  A view would be nice because it could be
used outside psql, but I'm not sure to what extent we are comfortable
adding system views built on other system views.  Something like
\sessioninfo (as proposed by Sami) would look more similar to what you have
in your latest patch, i.e., we'd teach psql about a complicated query for
gathering all this disparate information.

IMHO a good way to help move this forward is to try implementing it as a
system view so that we can compare the two approaches.  I've had luck in
the past with implementing something a couple different ways to help drive
consensus.

--//--

Great Nathan, we can follow that path of the view. I leave it open for anyone 
in the thread who has been following from the beginning to start the 
development of the view. Even you, if you have the interest and time to do it. 
At this exact moment, I am involved in a "my baby born" project, so I am unable 
to stop to look into this. If someone wants to start, I would appreciate it.
Regards,
Maiquel Grassi.


Re: Logical Replication of sequences

2024-06-05 Thread Ashutosh Bapat
On Wed, Jun 5, 2024 at 8:45 AM Amit Kapila  wrote:

> On Tue, Jun 4, 2024 at 7:40 PM Ashutosh Bapat
>  wrote:
> >
> > On Tue, Jun 4, 2024 at 4:27 PM Amit Kapila 
> wrote:
> >>
> >>
> >> 3. Replicate published sequences via walsender at the time of shutdown
> >> or incrementally while decoding checkpoint record. The two ways to
> >> achieve this are: (a) WAL log a special NOOP record just before
> >> shutting down checkpointer. Then allow the WALsender to read the
> >> sequence data and send it to the subscriber while decoding the new
> >> NOOP record. (b) Similar to the previous idea but instead of WAL
> >> logging a new record directly invokes a decoding callback after
> >> walsender receives a request to shutdown which will allow pgoutput to
> >> read and send required sequences. This approach has a drawback that we
> >> are adding more work at the time of shutdown but note that we already
> >> waits for all the WAL records to be decoded and sent before shutting
> >> down the walsender during shutdown of the node.
> >>
> >> Any other ideas?
> >>
> >
> > In case of primary crash the sequence won't get replicated. That is true
> even with the previous approach in case walsender is shut down because of a
> crash, but it is more serious with this approach.
> >
>
> Right, but if we just want to support a major version upgrade scenario
> then this should be fine because upgrades require a clean shutdown.
>
> >
>  How about periodically sending this information?
> >
>
> Now, if we want to support some sort of failover then probably this
> will help. Do you have that use case in mind?


Regular failover was a goal for supporting logical replication of
sequences. That might be more common than the major upgrade scenario.


> If we want to send
> periodically then we can do it when decoding checkpoint
> (XLOG_CHECKPOINT_ONLINE) or some other periodic WAL record like
> running_xacts (XLOG_RUNNING_XACTS).
>
>
Yeah. I am thinking along those lines.

It must be noted, however, that none of those options ensures that the
replicated sequence's state is consistent with the state of the replicated
objects that use those sequences. E.g. table t1 uses sequence s1. Say that by
the last sequence replication, as of time T1, t1 had consumed values up to
vl1 from s1. Later, by time T2, it consumed values up to vl2, which were not
replicated, but the changes to t1 by T2 were replicated. If failover happens
at that point, INSERTs on t1 would fail because of duplicate keys (values
between vl1 and vl2). The previous attempt to support logical sequence
replication solved this problem by replicating a future state of the sequence
(current value +/- log count). Similarly, if the sequence was ALTERed between
T1 and T2, the state of the sequence on the replica would be inconsistent
with the state of t1; failing over at that stage might leave t1 in an
inconsistent state.
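
As a concrete illustration of the t1/s1 case (schema invented for the example):

CREATE SEQUENCE s1;
CREATE TABLE t1 (id bigint PRIMARY KEY DEFAULT nextval('s1'), payload text);

If the replica's copy of s1 is only current as of vl1 while t1's replicated
rows already contain values up to vl2, the first INSERTs after failover
re-draw values from the (vl1, vl2] range and fail with duplicate-key errors
until the sequence is advanced manually, e.g. with
SELECT setval('s1', (SELECT max(id) FROM t1));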

-- 
Best Wishes,
Ashutosh Bapat


Re: Logical Replication of sequences

2024-06-05 Thread Amit Kapila
On Wed, Jun 5, 2024 at 12:51 PM Peter Eisentraut  wrote:
>
> On 04.06.24 12:57, Amit Kapila wrote:
> > 2. Provide a command say Alter Subscription ...  Replicate Sequences
> > (or something like that) which users can perform before shutdown of
> > the publisher node during upgrade. This will allow copying all the
> > sequences from the publisher node to the subscriber node directly.
> > Similar to previous approach, this could also be inconvenient for
> > users.
>
> I would start with this.  In any case, you're going to need to write
> code to collect all the sequence values, send them over some protocol,
> apply them on the subscriber.  The easiest way to start is to trigger
> that manually.  Then later you can add other ways to trigger it, either
> by timer or around shutdown, or whatever other ideas there might be.
>

Agreed. To achieve this, we can allow sequences to be copied during
the initial CREATE SUBSCRIPTION command, similar to what we do for
tables. Then, later, a new or existing command can re-copy the already
existing sequences on the subscriber.

The options for the new command could be:
Alter Subscription ... Refresh Sequences
Alter Subscription ... Replicate Sequences

In the second option, we need to introduce a new keyword Replicate.
Can you think of any better option?

In addition to the above, the command Alter Subscription .. Refresh
Publication will fetch any missing sequences similar to what it does
for tables.
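
Spelled out, the workflow would look roughly like this (mixing existing
commands with the proposed ones; the sequence-specific parts are not
implemented today):

CREATE SUBSCRIPTION sub1 CONNECTION '...' PUBLICATION pub1;  -- initial copy would include sequences
ALTER SUBSCRIPTION sub1 REFRESH SEQUENCES;                   -- or REPLICATE SEQUENCES, per the options above
ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION;                 -- would also fetch newly published sequences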

Thoughts?

-- 
With Regards,
Amit Kapila.




Re: Fix use of possible uninitialized variable retval (src/pl/plpgsql/src/pl_handler.c)

2024-06-05 Thread Ranier Vilela
On Wed, Jun 5, 2024 at 01:12, Kyotaro Horiguchi <
horikyota@gmail.com> wrote:

> At Mon, 27 May 2024 11:31:24 -0300, Ranier Vilela 
> wrote in
> > Hi.
> >
> > The function *plpgsql_inline_handler* can use uninitialized
> > variable retval, if PG_TRY fails.
> > Fix like function*plpgsql_call_handler* wich declare retval as
> > volatile and initialize to (Datum 0).
>
> If PG_TRY fails, retval is not actually accessed, so no real issue
> exists.

You are referring to this call:
PG_RE_THROW();


> Commit 7292fd8f1c changed plpgsql_call_handler() to the
> current form, but as stated in its commit message, it did not fix a
> real issue and was solely to silence compiler.
>

> I believe we do not need to modify plpgsql_inline_handler() unless
> compiler actually issues a false warning for it.
>
Yeah, there is a warning, but not from the compiler.

best regards,
Ranier Vilela


libpq: Trace StartupMessage/SSLRequest/GSSENCRequest correctly

2024-06-05 Thread Jelte Fennema-Nio
While working on my patchset for protocol changes, I realized that the
StartupMessage/SSLRequest/GSSENCRequest messages were not shown
correctly in the tracing output of libpq. With this change they are now
shown correctly.

To test, you can add a PQreset(conn) call to the start of the
test_cancel function in
src/test/modules/libpq_pipeline/libpq_pipeline.c.

And then run:
ninja -C build all install-quiet &&
build/src/test/modules/libpq_pipeline/libpq_pipeline cancel
'port=5432' -t test.trace

And then look at the top of test.trace
From 6fe11a1656e3bc85608e3d883848e42cd765e553 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Wed, 5 Jun 2024 12:07:09 +0200
Subject: [PATCH v1] libpq: Trace StartupMessage/SSLRequest/GSSENCRequest
 correctly

These messages would previously be logged in the tracing output as:
Unknown message: length %d

With this commit their type and contents are now correctly listed.
---
 src/interfaces/libpq/fe-trace.c | 47 -
 1 file changed, 35 insertions(+), 12 deletions(-)

diff --git a/src/interfaces/libpq/fe-trace.c b/src/interfaces/libpq/fe-trace.c
index d7a61ec9cc1..cb9648e5ceb 100644
--- a/src/interfaces/libpq/fe-trace.c
+++ b/src/interfaces/libpq/fe-trace.c
@@ -697,6 +697,7 @@ pqTraceOutputNoTypeByteMessage(PGconn *conn, const char *message)
 {
 	int			length;
 	int			logCursor = 0;
+	int			version = 0;
 
 	if ((conn->traceFlags & PQTRACE_SUPPRESS_TIMESTAMPS) == 0)
 	{
@@ -712,19 +713,41 @@ pqTraceOutputNoTypeByteMessage(PGconn *conn, const char *message)
 
 	fprintf(conn->Pfdebug, "F\t%d\t", length);
 
-	switch (length)
+	if (length < 8)
 	{
-		case 16:/* CancelRequest */
-			fprintf(conn->Pfdebug, "CancelRequest\t");
-			pqTraceOutputInt32(conn->Pfdebug, message, &logCursor, false);
-			pqTraceOutputInt32(conn->Pfdebug, message, &logCursor, false);
-			pqTraceOutputInt32(conn->Pfdebug, message, &logCursor, false);
-			break;
-		case 8:	/* GSSENCRequest or SSLRequest */
-			/* These messages do not reach here. */
-		default:
-			fprintf(conn->Pfdebug, "Unknown message: length is %d", length);
-			break;
+		fprintf(conn->Pfdebug, "Unknown message: length is %d\n", length);
+		return;
+	}
+
+	memcpy(&version, message + logCursor, 4);
+	version = (int) pg_ntoh32(version);
+
+	if (version == CANCEL_REQUEST_CODE)
+	{
+		fprintf(conn->Pfdebug, "CancelRequest\t");
+		pqTraceOutputInt32(conn->Pfdebug, message, &logCursor, false);
+		pqTraceOutputInt32(conn->Pfdebug, message, &logCursor, false);
+		pqTraceOutputInt32(conn->Pfdebug, message, &logCursor, false);
+	}
+	else if (version == NEGOTIATE_SSL_CODE)
+	{
+		fprintf(conn->Pfdebug, "SSLRequest\t");
+		pqTraceOutputInt32(conn->Pfdebug, message, &logCursor, false);
+	}
+	else if (version == NEGOTIATE_GSS_CODE)
+	{
+		fprintf(conn->Pfdebug, "GSSENCRequest\t");
+		pqTraceOutputInt32(conn->Pfdebug, message, &logCursor, false);
+	}
+	else
+	{
+		fprintf(conn->Pfdebug, "StartupMessage\t");
+		pqTraceOutputInt32(conn->Pfdebug, message, &logCursor, false);
+		while (message[logCursor] != '\0')
+		{
+			pqTraceOutputString(conn->Pfdebug, message, &logCursor, false);
+			pqTraceOutputString(conn->Pfdebug, message, &logCursor, false);
+		}
 	}
 
 	fputc('\n', conn->Pfdebug);

base-commit: 7b71e5bbccd6c86bc12ba0124e7282cfb3aa3226
-- 
2.34.1



Re: Test to dump and restore objects left behind by regression

2024-06-05 Thread Ashutosh Bapat
On Tue, Jun 4, 2024 at 4:28 AM Michael Paquier  wrote:

> On Fri, Apr 26, 2024 at 06:38:22PM +0530, Ashutosh Bapat wrote:
> > Some points for discussion:
> > 1. The test still hardcodes the diffs between two dumps. Haven't found a
> > better way to do it. I did consider removing the problematic objects from
> > the regression database but thought against it since we would lose some
> > coverage.
> >
> > 2. The new code tests dump and restore of just the regression database
> and
> > does not use pg_dumpall like pg_upgrade. Should it instead perform
> > pg_dumpall? I decided against it since a. we are interested in dumping
> and
> > restoring objects left behind by regression, b. I didn't find a way to
> > provide the format option to pg_dumpall. The test could be enhanced to
> use
> > different dump formats.
> >
> > I have added it to the next commitfest.
> > https://commitfest.postgresql.org/48/4956/
>
> Ashutosh and I have discussed this patch a bit last week.  Here is a
> short summary of my input, after I understood what is going on.
>
> +   # We could avoid this by dumping the database loaded from original
> dump.
> +   # But that would change the state of the objects as left behind by
> the
> +   # regression.
> +   my $expected_diff = " --
> + CREATE TABLE public.gtestxx_4 (
> +-b integer,
> +-a integer NOT NULL
> ++a integer NOT NULL,
> ++b integer
> + )
> [...]
> +   my ($stdout, $stderr) =
> +   run_command([ 'diff', '-u', $dump4_file, $dump5_file]);
> +   # Clear file names, line numbers from the diffs; those are not
> going to
> +   # remain the same always. Also clear empty lines and normalize new
> line
> +   # characters across platforms.
> +   $stdout =~ s/^\@\@.*$//mg;
> +   $stdout =~ s/^.*$dump4_file.*$//mg;
> +   $stdout =~ s/^.*$dump5_file.*$//mg;
> +   $stdout =~ s/^\s*\n//mg;
> +   $stdout =~ s/\r\n/\n/g;
> +   $expected_diff =~ s/\r\n/\n/g;
> +   is($stdout, $expected_diff, 'old and new dumps match after dump
> and restore');
> +}
>
> I am not a fan of what this patch does, adding the knowledge related
> to the dump filtering within 002_pg_upgrade.pl.  Please do not take
> me wrong, I am not against the idea of adding that within this
> pg_upgrade test to save from one full cycle of `make check` to check
> the consistency of the dump.  My issue is that this logic should be
> externalized, and it should be in fewer lines of code.


> For the externalization part, Ashutosh and I considered a few ideas,
> but one that we found tempting is to create a small .pm, say named
> AdjustDump.pm.  This would share some rules with the existing
> AdjustUpgrade.pm, which would be fine IMO even if there is a small
> overlap, documenting the dependency between each module.  That makes
> the integration with the buildfarm much simpler by not creating more
> dependencies with the modules shared between core and the buildfarm
> code.  For the "shorter" part, one idea that I had is to apply to the
> dump a regexp that wipes out the column definitions within the
> parenthesis, keeping around the CREATE TABLE and any other attributes
> not impacted by the reordering.  All that should be documented in the
> module, of course.
>

Thanks for the suggestion. I didn't understand the dependency on the
buildfarm module, though. Will the new module be used by the buildfarm
separately? I will work on this soon. Thanks for changing the CF entry
to WoA.


>
> Another thing would be to improve the backend so as we are able to
> a better support for physical column ordering, which would, I assume
> (and correct me if I'm wrong!), prevent the reordering of the
> attributes like in this inheritance case.  But that would not address
> the case of dumps taken from older versions with a new version of
> pg_dump, which is something that may be interesting to have more tests
> for in the long-term.  Overall a module sounds like a better solution.
>

Changing the physical order of columns of a child table based on the
inherited table seems intentional, as per MergeAttributes(). That logic
looks sane by itself. In binary-upgrade mode, pg_dump works very hard to
retain the column order by issuing UPDATE commands against catalog
tables. I don't think mimicking that behaviour is the right choice for a
non-binary-upgrade dump. I agree with your conclusion that we fix it by
fixing the diffs. The code to do that will be part of a separate module.

--
Best Wishes,
Ashutosh Bapat


Re: meson and check-tests

2024-06-05 Thread Ashutosh Bapat
On Mon, Jun 3, 2024 at 10:04 PM Tristan Partin  wrote:

> On Sun Jun 2, 2024 at 12:25 AM CDT, Tom Lane wrote:
> > "Tristan Partin"  writes:
> > > On Fri May 31, 2024 at 12:02 PM CDT, Ashutosh Bapat wrote:
> > >> We talked this off-list at the conference. It seems we have to somehow
> > >> avoid passing pg_regress --schedule argument and instead pass the
> list of
> > >> tests. Any idea how to do that?
> >
> > > I think there are 2 solutions to this.
> > > 1. Avoid passing --schedule by default, which doesn't sound like a
> great
> > >solution.
> > > 2. Teach pg_regress to ignore the --schedule option if specific tests
> > >are passed instead.
> > > 3. Add a --no-schedule option to pg_regress which would override the
> > >previously added --schedule option.
> > > I personally prefer 2 or 3.
>




> >
> > Just to refresh peoples' memory of what the Makefiles do:
> > src/test/regress/GNUmakefile has
> >
> > check: all
> >   $(pg_regress_check) $(REGRESS_OPTS)
> --schedule=$(srcdir)/parallel_schedule $(MAXCONNOPT) $(EXTRA_TESTS)
> >
> > check-tests: all | temp-install
> >   $(pg_regress_check) $(REGRESS_OPTS) $(MAXCONNOPT) $(TESTS)
> $(EXTRA_TESTS)

>
> > (and parallel cases for installcheck etc).  AFAICS, meson.build has
> > no equivalent to the EXTRA_TESTS add-on, nor does it have behavior
> > equivalent to check-tests' substitution of $(TESTS) for --schedule.
> > But I suggest that those behaviors have stood for a long time and
> > so the appropriate thing to do is duplicate them as best we can,
> > not invent something different.
>
> In theory, this makes sense. In practice, this is hard to emulate. We
> could make the check-tests a Meson run_target() instead of another
> test(), which would end up running the same tests more than once.
>

meson has changed the way we run individual perl tests, and that looks
better. So I am fine if meson provides a better way to do what `make
check-tests` does. But changing pg_regress seems the wrong choice, as
does making changes to the current make system. Instead we should make
meson pass the right arguments to pg_regress. In this case, it should
not pass --schedule when we need `make check-tests`-like functionality.

Just adding check-tests as a new target won't help; we need some way to
specify which tests to run. Thus by default this target should not run
any tests? I don't understand meson well, so I might be completely
wrong.

How about the following options?
1. TESTS="..." meson test --suite regress - would run the specified tests
from regress

2. Make `meson test --suite regress / regress/partition_join` run the
partition_join.sql test. I am not sure how to specify multiple tests in
this command. Maybe `meson test --suite regress /
regress/test_setup,partition_join` will do that. make check-tests
allows one to run multiple tests, like TESTS="test_setup
partition_join" make check-tests.

-- 
Best Wishes,
Ashutosh Bapat


Re: apply_scanjoin_target_to_paths and partitionwise join

2024-06-05 Thread Ashutosh Bapat
On Tue, May 28, 2024 at 7:13 AM  wrote:

> 1. I think the order by pk frac limit plans had just to similar
> performance behaviour for me to bother.
> But afaics the main point of your proposal is not related to frac plans
> at all.
>

Right.


> 2. We can't expect the optimizers to simply yield better results by
> being given more options to be wrong. (Let me give a simple example:

This patch makes our lack of cross table cross column statistics worse.
> We give it more opportunity to pick something horrible.
>

I don't see the connection between cross-column statistics and the bug I
am fixing. Can you please elaborate?


> 3. I dislike, that this patch makes much harder to debug, why no
> partitionwise join is chosen.
>
Can you please elaborate more? How does my change make debugging harder?

-- 
Best Wishes,
Ashutosh Bapat


Re: Revive num_dead_tuples column of pg_stat_progress_vacuum

2024-06-05 Thread Andrey M. Borodin



> On 4 Jun 2024, at 00:26, Masahiko Sawada  wrote:

Thank you! Vacuum enhancement is a really good step forward, and this small 
change would help a lot of observability tools.


> On 4 Jun 2024, at 00:49, Peter Geoghegan  wrote:
> 
> Can we rename this to num_dead_item_ids (or something similar) in
> passing? 

I do not insist, but many tools will have to adapt to this change [0,1]. 
However, most tools will have to deal with the removed max_dead_tuples 
anyway [2], so this is not that big a problem.


Best regards, Andrey Borodin.

[0] 
https://github.com/jalexandre0/pgmetrics/blob/61cb150f6d1e535513a6a07fc0c6d751fbed517e/collector/collect.go#L1289
[1] 
https://github.com/DataDog/integrations-core/blob/250553f3b91d25de28add93afeb929e2abf67e53/postgres/datadog_checks/postgres/util.py#L517
[2] https://github.com/search?q=num_dead_tuples+language%3AGo&type=code



Re: Logical Replication of sequences

2024-06-05 Thread Bharath Rupireddy
Hi,

On Tue, Jun 4, 2024 at 5:40 PM Amit Kapila  wrote:
>
> Even if we decode it periodically (say each time we decode the
> checkpoint record) then also we need to send the entire set of
> sequences at shutdown. This is because the sequences may have changed
> from the last time we sent them.

Agreed. How about decoding and sending only the sequences that have
changed since the last time they were sent? I know it requires a bit of
tracking and more work, but all I'm looking for is to reduce the amount
of work that walsenders need to do during shutdown.

Having said that, I like the idea of letting the user sync the
sequences via ALTER SUBSCRIPTION command and not weave the logic into
the shutdown checkpoint path. As Peter Eisentraut said here
https://www.postgresql.org/message-id/42e5cb35-4aeb-4f58-8091-90619c7c3ecc%40eisentraut.org,
this can be a good starting point to get going.

> > I can imagine a case where there are tens
> > of thousands of sequences in a production server, and surely decoding
> > and sending them just during the shutdown can take a lot of time
> > hampering the overall server uptime.
>
> It is possible but we will send only the sequences that belong to
> publications for which walsender is supposed to send the required
> data.

Right, but what if the publications cover tens of thousands of
sequences?

> Now, we can also imagine providing option 2 (Alter Subscription
> ... Replicate Sequences) so that users can replicate sequences before
> shutdown and then disable the subscriptions so that there won't be a
> corresponding walsender.

As stated above, I like this idea to start with.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Wrong results with grouping sets

2024-06-05 Thread Richard Guo
On Fri, May 24, 2024 at 9:08 PM Richard Guo  wrote:
> On the basis of the parser infrastructure fixup, 0002 patch adds the
> nullingrel bit that references the grouping RTE to the grouping
> expressions.

I found a bug in the v6 patch.  The following query would trigger the
Assert in make_restrictinfo that the given subexpression should not be
an AND clause.

select max(a) from t group by a > b and a = b having a > b and a = b;

This is because the expression 'a > b and a = b' in the HAVING clause is
replaced by a Var that references the GROUP RTE.  When we preprocess the
columns of the GROUP RTE, we do not know whether the grouped expression
is a havingQual or not, so we do not perform make_ands_implicit for it.
As a result, after we replace the group Var in the HAVING clause with
the underlying grouping expression, we will have a havingQual that is an
AND clause.
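
For reference, a minimal assumed schema matching the query above is
simply:

create table t (a int, b int);

Since the Assert is in the planner, the table should not even need any
data to show the problem.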

As we know, in the planner we need to first preprocess all the columns
of the GROUP RTE.  We also need to replace any Vars in the targetlist
and HAVING clause that reference the GROUP RTE with the underlying
grouping expressions.  To fix the mentioned issue, I chose to perform
this replacement before we preprocess the targetlist and havingQual, so
that make_ands_implicit is performed when we preprocess the
havingQual.

One problem with this is that, when we preprocess the targetlist and
havingQual, we may see an already-planned tree, which is generated by
the preprocessing work for the grouping expressions and then substituted
for the GROUP Vars in the targetlist and havingQual.  This would break
the Assert 'Assert(!IsA(node, SubPlan))' in flatten_join_alias_vars_mutator
and process_sublinks_mutator.  I think we can just return the
already-planned tree unchanged when we see it during preprocessing.

Hence here is the v7 patchset.  I've also added detailed commit messages
for the two patches.

Thanks
Richard


v7-0001-Introduce-a-RTE-for-the-grouping-step.patch
Description: Binary data


v7-0002-Mark-expressions-nullable-by-grouping-sets.patch
Description: Binary data


Re: Fix grammar oddities in comments

2024-06-05 Thread David Rowley
On Sun, 2 Jun 2024 at 10:08, James Coleman  wrote:
> See attached for a small patch fixing some typos and grammatical
> errors in a couple of comments.

Thanks. I pushed this after messing with the comments a bit more.

> Side note: It's not clear to me what "Vars of higher levels don't
> matter here" means in this context (or how that claim is justified),
> but I haven't changed that part of the comment opting to simply
> resolve the clear mistakes in the wording here.

It just means Vars with varlevelsup >= 2 don't matter. It only cares
about Vars with varlevelsup==1, i.e. Vars of the sub-query's direct
parent.
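
As a hypothetical illustration (table and column names invented):

-- Inside the EXISTS subquery, o.a is a Var with varlevelsup = 1,
-- i.e. it belongs to the subquery's direct parent; a reference to a
-- query two levels up would have varlevelsup = 2 and is ignored here.
SELECT *
FROM outer_tab o
WHERE EXISTS (SELECT 1 FROM inner_tab i WHERE i.a = o.a);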

David




Re: meson "experimental"?

2024-06-05 Thread Dave Page
Hi

On Tue, 4 Jun 2024 at 17:36, Heikki Linnakangas  wrote:

> On 04/06/2024 18:28, Dave Page wrote:
> > I clearly missed the discussion in which it was decided to remove the
> > old MSVC++ support from the tree (and am disappointed about that as I
> > would have objected loudly). Having recently started testing Meson on
> > Windows when I realised that change had been made, and having to update
> > builds for pgAdmin and the Windows installers, I think it's clear it's
> > far more experimental on that platform than it is on Linux at least.
>
> What kind of issues did you run into? Have they been fixed since?
>

zlib detection (
https://www.postgresql.org/message-id/CA+OCxozrPZx57ue8rmhq6CD1Jic5uqKh80=vtpzurskesn-...@mail.gmail.com)
for which Nazir worked up an as-yet-uncommitted patch.

uuid detection, which Andrew fixed.

I also ran into an issue in which the build fails if both gssapi and
openssl are enabled, but it turns out that's a code issue that affects at
least v16 with the old build system as well.

-- 
Dave Page
pgAdmin: https://www.pgadmin.org
PostgreSQL: https://www.postgresql.org
EDB: https://www.enterprisedb.com


Re: Addressing SECURITY DEFINER Function Vulnerabilities in PostgreSQL Extensions

2024-06-05 Thread Ashutosh Sharma
On Fri, May 24, 2024 at 2:25 PM Ashutosh Bapat 
wrote:

>
>
> On Fri, May 24, 2024 at 12:52 PM Ashutosh Sharma 
> wrote:
>
>> Hi All,
>>
>> We all know that installing an extension typically requires superuser
>> privileges, which means the database objects it creates are owned by the
>> superuser.
>>
>> If the extension creates any SECURITY DEFINER functions, it can introduce
>> security vulnerabilities. For example, consider an extension that creates
>> the following functions, outer_func and inner_func, in the schema s1 when
>> installed:
>>
>> CREATE OR REPLACE FUNCTION s1.inner_func(data text)
>> RETURNS void AS $$
>> BEGIN
>> INSERT INTO tab1(data_column) VALUES (data);
>> END;
>> $$ LANGUAGE plpgsql;
>>
>> CREATE OR REPLACE FUNCTION s1.outer_func(data text)
>> RETURNS void AS $$
>> BEGIN
>> PERFORM inner_func(data);
>> END;
>> $$ LANGUAGE plpgsql SECURITY DEFINER;
>>
>> If a regular user creates another function with name "inner_func" with
>> the same signature in the public schema and sets the search path to public,
>> s1, the function created by the regular user in the public schema takes
>> precedence when outer_func is called. Since outer_func is marked as
>> SECURITY DEFINER, the inner_func created by the user in the public schema
>> is executed with superuser privileges. This allows the execution of any
>> statements within the function block, leading to potential security issues.
>>
>> To address this problem, one potential solution is to adjust the function
>> resolution logic. For instance, if the caller function belongs to a
>> specific schema, functions in the same schema should be given preference.
>> Although I haven’t reviewed the feasibility in the code, this is one
>> possible approach.
>>
>> Another solution could be to categorize extension-created functions to
>> avoid duplication. This might not be an ideal solution, but it's another
>> consideration worth sharing.
>>
>>
> Function call should schema qualify it. That's painful, but it can be
> avoided by setting a search path from inside the function. There was some
> discussion about setting a search path for a function at [1]. But the last
> message there is non-conclusive. We may want to extend it to extensions
> such that all the object references in a given extension are resolved using
> extension specific search_path.
>
> [1]
> https://www.postgresql.org/message-id/2710f56add351a1ed553efb677408e51b060e67c.camel%40j-davis.com
>

Thank you, Ashutosh, for the quick response. I've drafted a patch aimed
at addressing this. It attempts to solve the issue by configuring the
search_path for all security definer functions created by the extension,
ensuring it is set to trusted schemas, which include the schema where
the extension and the function are created. PFA the patch.
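
For the example above, the effect is roughly equivalent to pinning the
search_path on the function manually (a sketch only; the patch is meant
to do the equivalent automatically for extension-created SECURITY
DEFINER functions):

ALTER FUNCTION s1.outer_func(text) SET search_path = s1, pg_temp;

-- or, equivalently, at creation time:
CREATE OR REPLACE FUNCTION s1.outer_func(data text)
RETURNS void AS $$
BEGIN
    PERFORM inner_func(data);
END;
$$ LANGUAGE plpgsql SECURITY DEFINER
   SET search_path = s1, pg_temp;

With that, the unqualified inner_func() reference resolves in s1
regardless of the caller's search_path.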

--
With Regards,
Ashutosh Sharma.


v1-0001-Ensure-security-definer-functions-created-by-extensi.patch
Description: Binary data


Re: Logical Replication of sequences

2024-06-05 Thread Amit Kapila
On Wed, Jun 5, 2024 at 9:13 AM Amit Kapila  wrote:
>
> On Tue, Jun 4, 2024 at 8:56 PM Yogesh Sharma
>  wrote:
> >
> > On 6/4/24 06:57, Amit Kapila wrote:
> >
> > > 2. Provide a command say Alter Subscription ...  Replicate Sequences
> > > (or something like that) which users can perform before shutdown of
> > > the publisher node during upgrade. This will allow copying all the
> > > sequences from the publisher node to the subscriber node directly.
> > > Similar to previous approach, this could also be inconvenient for
> > > users.
> >
> > This is similar to option 1 except that it is a SQL command now.
> >
>
> Right, but I would still prefer a command as it provides clear steps
> for the upgrade. Users need to perform (a) Replicate Sequences for a
> particular subscription (b) Disable that subscription (c) Perform (a)
> and (b) for all the subscriptions corresponding to the publisher we
> want to shut down for upgrade.
>

Another advantage of this approach over a plain tool that copies all
sequences before the upgrade is that here we can have the facility to
copy just the required sequences, i.e. the set of sequences that the
user has specified as part of the publication.
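
For example (hypothetical syntax along the lines discussed in this
thread, not something that exists today):

-- publish only the sequences of interest
CREATE PUBLICATION seq_pub FOR SEQUENCE s1, s2;
-- the proposed command would then copy only s1 and s2
ALTER SUBSCRIPTION seq_sub REPLICATE SEQUENCES;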

-- 
With Regards,
Amit Kapila.




Re:Re: [PATCH]A minor improvement to the error-report in SimpleLruWriteAll()

2024-06-05 Thread Long Song
Hi,
I modified the patch, keeping the old call of SlruReportIOError()
outside the for-loop, and added a call of ereport() with LOG level
when a file-close failure occurs inside the for-loop.

Please check whether this modification is appropriate
and let me know if there is any problem. Thank you.



At 2024-06-04 14:44:09, "Kyotaro Horiguchi"  wrote:
>At Tue, 28 May 2024 20:15:59 +0800 (CST), "Long Song"  
>wrote in 
>> 
>> Hi,
>> Actually, I still wonder why only the error message
>> of the last failure to close the file was recorded.
>> For this unusual situation, it is acceptable to
>> record all failure information without causing
>> too much logging.
>> Was it designed that way on purpose?
>
>Note that SlruReportIOError() causes a non-local exit. To me, the
>point of the loop seems to be that we want to close every single file,
>apart from the failed ones. From that perspective, the patch disrupts
>that intended behavior by exiting in the middle of the loop. It seems
>we didn't want to bother collecting errors for every failed file in
>that part.
> 
>regards.
>
>-- 
>Kyotaro Horiguchi
>NTT Open Source Software Center

--
Best Regards,

Long


v2-0001-Fix-error-report-missing-in-SimpleLruWriteAll.patch
Description: Binary data


Should we move the resowner field from JitContext to LLVMJitContext?

2024-06-05 Thread Andreas Karlsson

Hi,

I am implementing my own JIT plugin (based on Cranelift) for PostgreSQL 
to learn more about the JIT and noticed an API change in PostgreSQL 17.


When Heikki made the resource owners extensible in commit 
b8bff07daa85c837a2747b4d35cd5a27e73fb7b2, the API for JIT plugins 
changed: ResourceOwnerForgetJIT() was moved from the generic JIT code to 
the LLVM-specific JIT code, so now the resowner field of the context is 
only used by the LLVM plugin's code.


Maybe it is a bit late in the release cycle, but should we make the 
resowner field specific to the LLVM code too, now that we are already 
breaking the API? I personally do not like having an LLVM-JIT-specific 
field in the common struct. Code is easier to understand if things are 
local. Granted, most JIT engines will likely need similar infrastructure, 
but just providing the struct field and nothing else does not seem very 
helpful.


See the attached patch.

Andreas

From e3b55e00aee578a46298447463e0984aa3a230f7 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson 
Date: Wed, 5 Jun 2024 10:13:23 +0200
Subject: [PATCH] Move resowner from common JitContext to LLVM specific

Only the LLVM specific code uses it since resource owners were made
extensible in commit b8bff07daa85c837a2747b4d35cd5a27e73fb7b2.
---
 src/backend/jit/llvm/llvmjit.c | 10 +-
 src/include/jit/jit.h  |  2 --
 src/include/jit/llvmjit.h  |  3 +++
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 1d439f2455..abfe444c7e 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -249,7 +249,7 @@ llvm_create_context(int jitFlags)
 	context->base.flags = jitFlags;
 
 	/* ensure cleanup */
-	context->base.resowner = CurrentResourceOwner;
+	context->resowner = CurrentResourceOwner;
 	ResourceOwnerRememberJIT(CurrentResourceOwner, context);
 
 	llvm_jit_context_in_use_count++;
@@ -323,8 +323,8 @@ llvm_release_context(JitContext *context)
 
 	llvm_leave_fatal_on_oom();
 
-	if (context->resowner)
-		ResourceOwnerForgetJIT(context->resowner, llvm_jit_context);
+	if (llvm_jit_context->resowner)
+		ResourceOwnerForgetJIT(llvm_jit_context->resowner, llvm_jit_context);
 }
 
 /*
@@ -1372,8 +1372,8 @@ llvm_error_message(LLVMErrorRef error)
 static void
 ResOwnerReleaseJitContext(Datum res)
 {
-	JitContext *context = (JitContext *) DatumGetPointer(res);
+	LLVMJitContext *context = (LLVMJitContext *) DatumGetPointer(res);
 
 	context->resowner = NULL;
-	jit_release_context(context);
+	jit_release_context(&context->base);
 }
diff --git a/src/include/jit/jit.h b/src/include/jit/jit.h
index b7a1eed281..d9a080ce98 100644
--- a/src/include/jit/jit.h
+++ b/src/include/jit/jit.h
@@ -59,8 +59,6 @@ typedef struct JitContext
 	/* see PGJIT_* above */
 	int			flags;
 
-	ResourceOwner resowner;
-
 	JitInstrumentation instr;
 } JitContext;
 
diff --git a/src/include/jit/llvmjit.h b/src/include/jit/llvmjit.h
index 9d9db80662..420775b189 100644
--- a/src/include/jit/llvmjit.h
+++ b/src/include/jit/llvmjit.h
@@ -39,6 +39,9 @@ typedef struct LLVMJitContext
 {
 	JitContext	base;
 
+	/* used to ensure cleanup of context */
+	ResourceOwner resowner;
+
 	/* number of modules created */
 	size_t		module_generation;
 
-- 
2.43.0



Re: Reuse child_relids in try_partitionwise_join was Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)

2024-06-05 Thread Ashutosh Bapat
Hi David,
Thanks for looking into this.

On Fri, May 31, 2024 at 2:19 AM David Christensen 
wrote:

> Hello,
>
> I am looking through some outstanding CommitFest entries; I wonder if
> this particular patch is already effectively fixed by 5278d0a2, which
> is both attributed to the original author as well as an extremely
> similar approach.  Can this entry
> (https://commitfest.postgresql.org/48/4553/) be closed?
>

This is different. But it needs a rebase. PFA rebased patch. The revised
commit message explains the change better.

Here are the numbers revised after 5278d0a2. Since the code changes only
affect the partitionwise join code path, I am reporting only
partitionwise join numbers. The first column reports the number of
joining relations, each having 1000 partitions. The rest of the column
names are self-explanatory.

Planning memory used:
 num_joins | master  | patched | memory saved | memory saved (%)
-----------+---------+---------+--------------+------------------
         2 | 31 MB   | 30 MB   | 525 kB       |  1.68%
         3 | 111 MB  | 107 MB  | 4588 kB      |  4.03%
         4 | 339 MB  | 321 MB  | 17 MB        |  5.13%
         5 | 1062 MB | 967 MB  | 95 MB        |  8.96%

Here are the planning time measurements.
 num_joins | master (ms) | patched (ms) | change in planning time (ms) | change in planning time (%)
-----------+-------------+--------------+------------------------------+-----------------------------
         2 |      187.86 |       177.27 |                        10.59 |   5.64%
         3 |      721.81 |       758.80 |                       -36.99 |  -5.12%
         4 |     2239.87 |      2236.19 |                         3.68 |   0.16%
         5 |     6830.86 |      7027.76 |                      -196.90 |  -2.88%
The patch calls find_appinfos_by_relids() only once (instead of twice),
calls bms_union() on child relids only once, and allocates and
deallocates the appinfos array only once, saving some CPU cycles, but it
spends some CPU cycles in bms_free(). A small net saving in planning
time is expected. However, the above numbers show planning time
increasing in some cases and decreasing in others. Repeating the
experiment multiple times does not show a stable increase or decrease,
the variations all being within noise levels. We can conclude that the
patch does not affect planning time adversely.

The scripts I used originally [1] need a revision too, since the EXPLAIN
MEMORY output has changed. PFA the scripts. setup.sql creates the
required functions and tables. queries.sql EXPLAINs queries with
different numbers of joining relations, collecting planning time and
memory numbers in a table (msmts). We need to set the psql variable
"code" to 'patched' or 'master' when using master + patches or the
master branch respectively. The following queries are used to report the
above measurements from msmts.

planning memory:
select p.num_joins,
       pg_size_pretty(m.average * 1024) "master",
       pg_size_pretty(p.average * 1024) as "patched",
       pg_size_pretty((m.average - p.average) * 1024) "memory saved",
       ((m.average - p.average) * 100/m.average)::numeric(42, 2) "memory saved in percentage"
from msmts p, msmts m
where p.num_joins = m.num_joins
  and p.variant = m.variant
  and p.code = 'patched'
  and m.code = 'master'
  and p.measurement = m.measurement
  and p.measurement = 'planning memory'
  and p.variant = 'pwj'
order by num_joins;

planning time:
select p.num_joins,
       m.average "master (ms)",
       p.average "patched (ms)",
       m.average - p.average "change in planning time (ms)",
       ((m.average - p.average) * 100/m.average)::numeric(42, 2) "change in planning time as percentage"
from msmts p, msmts m
where p.num_joins = m.num_joins
  and p.variant = m.variant
  and p.code = 'patched'
  and m.code = 'master'
  and p.measurement = m.measurement
  and p.measurement = 'planning time'
  and p.variant = 'pwj'
order by p.variant, num_joins;

[1]
https://www.postgresql.org/message-id/CAExHW5snUW7pD2RdtaBa1T_TqJYaY6W_YPVjWDrgSf33i-0uqA%40mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat


setup.sql
Description: application/sql


queries.sql
Description: application/sql
From 4a37f580806a388e5d78e452426b99f738a583e5 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Wed, 26 Jul 2023 12:08:55 +0530
Subject: [PATCH] Avoid repeated computations in try_partitionwise_join() and
 minions

try_partitionwise_join() computes bms_union() of Bitmapsets representing
joining child relations and fetches AppendRelInfos corresponding to child
base relations participating in the join. The same computation is
repeated in build_child_join_rel(). Avoid this repeated computation by
performing it only once in try_partitionwise_join() and passing the
AppendRelInfos gathered there to build_child_join_rel().

Bitmapsets representing child relids consume large memory when thousands
of partitions are involved.

Re: Logical Replication of sequences

2024-06-05 Thread Peter Eisentraut

On 04.06.24 12:57, Amit Kapila wrote:

2. Provide a command say Alter Subscription ...  Replicate Sequences
(or something like that) which users can perform before shutdown of
the publisher node during upgrade. This will allow copying all the
sequences from the publisher node to the subscriber node directly.
Similar to previous approach, this could also be inconvenient for
users.


I would start with this.  In any case, you're going to need to write 
code to collect all the sequence values, send them over some protocol, 
apply them on the subscriber.  The easiest way to start is to trigger 
that manually.  Then later you can add other ways to trigger it, either 
by timer or around shutdown, or whatever other ideas there might be.





Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

2024-06-05 Thread Bharath Rupireddy
Hi,

On Thu, May 16, 2024 at 5:01 AM Jeff Davis  wrote:
>
> The flushing behavior is entirely controlled by the table AM. The heap
> can use the same flushing logic that it did before, which is to hold
> 1000 tuples.
>
> I like that it's accounting for memory, too, but it doesn't need to be
> overly restrictive. Why not just use work_mem? That should hold 1000
> reasonably-sized tuples, plus overhead.
>
> Even better would be if we could take into account partitioning. That
> might be out of scope for your current work, but it would be very
> useful. We could have a couple new GUCs like modify_table_buffer and
> modify_table_buffer_per_partition or something like that.

I disagree with inventing more GUCs. Instead, I'd vote for just
holding 1000 tuples in buffers for heap AM. This not only keeps the
code and the new table AM simple, but also does not cause a regression
for COPY. In my testing, 1000 tuples with 1 int and 1 float column took
40000 bytes of memory (40 bytes per tuple), whereas with 1 int, 1 float
and 1 text column they took 172000 bytes of memory (172 bytes per
tuple), which IMO shouldn't be a big problem. Thoughts?
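
Such per-tuple sizes can be roughly sanity-checked with
pg_column_size() on a row expression (the result also includes the
composite-tuple header), e.g.:

SELECT pg_column_size(ROW(1::int, 1.0::float8));
SELECT pg_column_size(ROW(1::int, 1.0::float8, repeat('x', 120)));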

> > 1. Try to get the actual tuple sizes excluding header sizes for each
> > column in the new TAM.
>
> I don't see the point in arbitrarily excluding the header.
>
> > v21 also adds code to maintain tuple size for virtual tuple slots.
> > This helps make better memory-based flushing decisions in the new
> > TAM.
>
> That seems wrong. We shouldn't need to change the TupleTableSlot
> structure for this patch.

I dropped these ideas as I went ahead with the above idea of just
holding 1000 tuples in buffers for heap AM.

> Comments on v21:
>
> * All callers specify TM_FLAG_MULTI_INSERTS. What's the purpose?

Previously, the multi insert state was initialized in modify_begin, so
it was then required to differentiate the code path. But, it's not
needed anymore with the lazy initialization of the multi insert state
moved to modify_buffer_insert. I removed it.

> * The only caller that doesn't use TM_FLAG_BAS_BULKWRITE is
> ExecInsert(). What's the disadvantage to using a bulk insert state
> there?

Subsequent read queries will not find the just-inserted tuples in
shared buffers, as a separate ring buffer is used with the bulk-insert
access strategy. Multi inserts are nothing but buffering multiple
tuples and then inserting them in bulk. So using the bulk-insert
strategy might be worth it for INSERT INTO SELECTs too. Thoughts?

> * I'm a bit confused by TableModifyState->modify_end_callback. The AM
> both sets the callback and calls the callback -- why can't the code
> just go into the table_modify_end method?

I came up with modify_end_callback as per the discussion upthread to
use modify_begin, modify_end in future for UPDATE, DELETE and MERGE,
and not use any operation specific flags to clean the state
appropriately. The operation specific state cleaning logic can go to
the modify_end_callback implementation defined by the AM.

> * The code structure in table_modify_begin() (and related) is strange.
> Can it be simplified or am I missing something?

I previously defined these new table AMs as optional; check
GetTableAmRoutine(). And there was a point upthread about providing a
default/fallback implementation so that insert operations do not fail
on tables whose AMs do not implement the new callbacks. FWIW, the
default implementation was just doing single inserts.
table_modify_begin and friends need the fallback logic, which makes the
code there look different from the other AMs. However, I now feel like
dropping the idea of a fallback implementation and letting the AMs deal
with it. Although it might create some friction with various non-core
AM implementations, it keeps this patch simple, which I would vote for.
Thoughts?

> * Why are table_modify_state and insert_modify_buffer_flush_context
> globals? What if there are multiple modify nodes in a plan?

Can you please provide a case that can generate multiple "modify
nodes" in a single plan? AFAICS, multiple "modify nodes" in a plan can
exist for both partitioned tables and tables that get created as part
of CTEs. I disabled multi inserts for both of these cases. The way I
disabled it for CTEs looks pretty naive - I just did the following. Any
better suggestions on how to deal with all such cases?

+if (operation == CMD_INSERT &&
+nodeTag(subplanstate) == T_SeqScanState)
+canMultiInsert = true;

> * Can you explain the design in logical rep?

Multi inserts for logical replication work at the table level. In
other words, all tuple inserts related to a single table within a
transaction are buffered and written to the corresponding table when
necessary. Whenever inserts pertaining to another table arrive, the
buffered tuples related to the previous table are written to the table
before starting the buffering for the new table. Also, the tuples are
written to the table from the buffer when a non-INSERT operation
arrives.

Re: Build with LTO / -flto on macOS

2024-06-05 Thread Wolfgang Walther

Peter Eisentraut:

On 04.06.24 18:41, Tom Lane wrote:

Relevant to this: I wonder what we think the supported macOS versions
are, anyway.  AFAICS, the buildfarm only covers current (Sonoma)
and current-1 (Ventura) major versions, and only the latest minor
versions in those OS branches.


For other OS lines I think we are settling on supporting what the OS 
vendor supports.  So for macOS at the moment this would be current, 
current-1, and current-2, per 
.


So I tested both HEAD and v12 on current and current-5, both 
successfully. That should cover current-1 and current-2, too. If you 
want me to test any other macOS versions in between, or any other PG 
versions, I can do that.


I would really like to upstream those kinds of patches and see them 
backpatched - otherwise we need to carry those patches around for up to 
5 years in the distros. And in light of the discussion in [1], my goal 
is to reduce the number of carried patches to a minimum. Yes - those 
patches are simple enough - but the more patches you have, the less 
likely you are to spot a malicious one among them.


Best,

Wolfgang

[1]: https://postgr.es/m/flat/ZgdCpFThi9ODcCsJ%40momjian.us