Re: Add Pipelining support in psql

2025-04-23 Thread Michael Paquier
On Tue, Apr 22, 2025 at 02:37:19PM +0200, Anthonin Bonnefoy wrote:
> Also, we still have a triggered assertion failure with the following:
> CREATE TABLE psql_pipeline(a text);
> \startpipeline
> COPY psql_pipeline FROM STDIN;
> SELECT 'val1';
> \syncpipeline
> \endpipeline
> ...
> Assertion failed: (pset.piped_syncs == 0), function
> ExecQueryAndProcessResults, file common.c, line 2153.

Right.  I didn't think about the case of a \endpipeline that fetches
all the results by itself.

> A possible alternative could be to abort discardAbortedPipelineResults
> when we encounter a res != NULL + FATAL error and let the outer loop
> handle it. As you said, the pipeline flow is borked so there's not
> much to salvage. The outer loop would read and print all error
> messages until the closed connection is detected. Then,
> CheckConnection will reset the connection which will reset the
> pipeline state.

Sounds like a better idea seen from here, yes.

> While testing this change, I was initially looking for the "FATAL:
> terminating connection because protocol synchronization was lost"
> message in the tests. However, this was failing on Windows[1] as the
> FATAL message wasn't reported on stderr. I'm not sure why yet.

Hmm.  I vaguely recall that there could be some race condition here
with the attempt to catch the FATAL message once the server tries
to shut down the connection.

Anyway, I agree that it would be nice to track that this specific
error message is generated in the server.  How about checking the
server logs instead, using a slurp_file() with an offset of the log
file before running each pipeline sequence?  We should use a few
wait_for_log() calls, I think, to be extra careful with the timings
around the point where psql_fails_like() gives up, as I'm worried that this
could be unstable on slow machines.  Something like the attached seems stable
enough here.  What do you think?

The tweak for psql_fails_like() was kind of independent of the rest of
the fix, so I have applied that as a commit of its own.  I am not
convinced about the addition of a 4th test where we use the queries
with semicolons without a \getresults between the sync and the end.
--
Michael
From 2641356ffacc0c69779a6252a8072f41521c0f3f Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 23 Apr 2025 15:52:24 +0900
Subject: [PATCH v3] psql: Fix assertion failure with pipeline mode

---
 src/bin/psql/common.c   | 17 +
 src/bin/psql/t/001_basic.pl | 49 +
 2 files changed, 66 insertions(+)

diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 21d660a8961a..0aab02ee32e6 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1478,6 +1478,23 @@ discardAbortedPipelineResults(void)
 			 */
 			return res;
 		}
+		else if (res != NULL && result_status == PGRES_FATAL_ERROR)
+		{
+			/*
+			 * We have a fatal error sent by the backend and we can't recover
+			 * from this state. Instead, return the last fatal error and let
+			 * the outer loop handle it.
+			 */
+			PGresult   *fatal_res PG_USED_FOR_ASSERTS_ONLY;
+
+			/*
+			 * Fetch result to consume the end of the current query being
+			 * processed.
+			 */
+			fatal_res = PQgetResult(pset.db);
+			Assert(fatal_res == NULL);
+			return res;
+		}
 		else if (res == NULL)
 		{
 			/* A query was processed, decrement the counters */
diff --git a/src/bin/psql/t/001_basic.pl b/src/bin/psql/t/001_basic.pl
index b0e4919d4d71..3cada3ba959b 100644
--- a/src/bin/psql/t/001_basic.pl
+++ b/src/bin/psql/t/001_basic.pl
@@ -483,4 +483,53 @@ psql_like($node, "copy (values ('foo'),('bar')) to stdout \\g | $pipe_cmd",
 my $c4 = slurp_file($g_file);
 like($c4, qr/foo.*bar/s);
 
+# Tests with pipelines.  These trigger FATAL failures in the backend,
+# so they cannot be tested through the SQL regression tests.
+$node->safe_psql('postgres', 'CREATE TABLE psql_pipeline()');
+my $log_location = -s $node->logfile;
+psql_fails_like(
+	$node,
+	qq{\\startpipeline
+COPY psql_pipeline FROM STDIN;
+SELECT 'val1';
+\\syncpipeline
+\\getresults
+\\endpipeline},
+	qr/server closed the connection unexpectedly/,
+	'protocol sync loss in pipeline: direct COPY, SELECT, sync and getresult'
+);
+$node->wait_for_log(
+	qr/FATAL: .*terminating connection because protocol synchronization was lost/,
+	$log_location);
+
+$log_location = -s $node->logfile;
+psql_fails_like(
+	$node,
+	qq{\\startpipeline
+COPY psql_pipeline FROM STDIN \\bind \\sendpipeline
+SELECT 'val1' \\bind \\sendpipeline
+\\syncpipeline
+\\getresults
+\\endpipeline},
+	qr/server closed the connection unexpectedly/,
+	'protocol sync loss in pipeline: bind COPY, SELECT, sync and getresult');
+$node->wait_for_log(
+	qr/FATAL: .*terminating connection because protocol synchronization was lost/,
+	$log_location);
+
+# This time, test without the \getresults.
+$log_location = -s $node->logfile;
+psql_fails_like(
+	$node,
+	qq{\\startpipeline
+COPY psql_pipeline FROM STDIN;
+SELECT 'val1';
+\\syncpipeli

Re: [BUG] temporary file usage report with extended protocol and unnamed portals

2025-04-23 Thread Frédéric Yhuel




On 4/22/25 19:37, Sami Imseih wrote:

the patch relies on looking up queryDesc->sourceText inside DropPortal,
which Tom raised concerns about earlier in the thread [0]


Yes, I think I had misunderstood what Tom said. Thank you for pointing 
that out.


However, is it really unsafe?

In exec_bind_message, the portal's query string comes from a duplicate 
of the original string (see CreateCachedPlan). So we are safe in this case.


In exec_simple_query, the portal is dropped towards the end of this 
function, so we are safe here too.


Am I missing something?




Re: Conflicting updates of command progress

2025-04-23 Thread Antonin Houska
Fujii Masao  wrote:

> On 2025/04/15 2:13, Sami Imseih wrote:
> >> While working on [1] I realized that some fields of pg_stat_progress_cluster
> >> have weird values.
> 
> I ran into the same issue while working on [2], and eventually had to
> withdraw that patch because of it.

Have you considered reporting the progress of each command separately?

I think that can be implemented by moving the progress-related fields from
PgBackendStatus into a new structure and by teaching the backend to insert a
new instance of that structure into a shared hash table (dshash.c) whenever a
command that needs progress reporting starts.

pgstat.c uses such a hash table for various statistics objects; however, it's
probably not a good idea to adjust this infrastructure by simply adding a new
object kind (e.g. PGSTAT_KIND_COMMAND). I think that pgstat.c is designed for
frequent updates of backend-local statistics and less frequent flushes
(e.g. at command completion) to the shared memory. That's not suitable for
progress reporting.
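As a rough sketch of that idea (hypothetical type, field, and counter names;
dshash_find_or_insert() and dshash_release_lock() are the existing dshash.c
calls, and ProgressCommandType / PGSTAT_NUM_PROGRESS_PARAM come from
backend_progress.h):

/* Hypothetical key: one entry per backend and nesting level. */
typedef struct CommandProgressKey
{
	int			pid;			/* backend running the command */
	int			nest_level;		/* lets nested commands report separately */
} CommandProgressKey;

typedef struct CommandProgressEntry
{
	CommandProgressKey key;		/* hash key, must be first */
	ProgressCommandType command;
	Oid			relid;
	int64		param[PGSTAT_NUM_PROGRESS_PARAM];
} CommandProgressEntry;

/*
 * Sketch of the start-of-command path: insert a fresh entry keyed by
 * (pid, nesting level) instead of overwriting the single per-backend slot.
 * command_nesting_level is an assumed backend-local counter.
 */
static void
command_progress_start(dshash_table *htab, ProgressCommandType cmd, Oid relid)
{
	CommandProgressKey key;
	CommandProgressEntry *entry;
	bool		found;

	key.pid = MyProcPid;
	key.nest_level = command_nesting_level++;

	entry = dshash_find_or_insert(htab, &key, &found);
	entry->command = cmd;
	entry->relid = relid;
	memset(entry->param, 0, sizeof(entry->param));
	dshash_release_lock(htab, entry);
}

The pg_stat_progress_* views would then scan this hash table rather than the
fixed per-backend slots.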

> [2] https://commitfest.postgresql.org/patch/5282/

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Avoid core dump in pgstat_read_statsfile()

2025-04-23 Thread Michael Paquier
On Wed, Apr 23, 2025 at 08:01:40AM +, Bertrand Drouvot wrote:
> I think that a check on pgstat_get_kind_info() is missing for this scenario, 
> the
> patch adds it. Such a check already exists for PGSTAT_FILE_ENTRY_FIXED and
> for stats entry identified by name on disk, but not for 
> PGSTAT_FILE_ENTRY_HASH.

It is, indeed.  This can be reproduced with what's in core: just
disable pgstat_register_inj_fixed() in injection_points.c to bypass the
check for the fixed entries that would be read first when the module is
removed from shared_preload_libraries, and we are good to go.

I can see that you have copy-pasted the elog from the two other entry
types.  This is incomplete here because we are dealing with an entry
in the dshash and we know its full key already: kind, database OID,
object ID.  And this can provide more information to help with the
debugging.  I have added this information, and applied the result.

> The v18 existing checks mentioned above as well as the new check were missing
> in pre-18, but I don't think it's worth a back-patch as the issue is unlikely
> to occur without custom stats.  Adding a v18 open item then.

The check based on the kind ID should be enough, because this ensures
that we'll have the PgStat_KindInfo to rely on.  So stable branches
are OK as they are.
--
Michael




Re: [PATCH] dynahash: add memory allocation failure check

2025-04-23 Thread Tom Lane
m.korot...@postgrespro.ru writes:
> I found a case of potential NULL pointer dereference.
> In src/backend/utils/hash/dynahash.c in function HTAB *hash_create() the 
> result of the DynaHashAlloc() is used unsafely.
> The function DynaHashAlloc() calls MemoryContextAllocExtended() with 
> MCXT_ALLOC_NO_OOM and can return a NULL pointer.

Ugh, that's a stupid bug.  Evidently my fault, too (9c911ec06).

> Added the pointer check for avoiding a potential problem.

This doesn't seem like a nice way to fix it.  The code right there is
assuming palloc semantics, which was okay before 9c911ec06, but is so
no longer.  I think the right thing is to put it back to palloc
semantics, which means not using DynaHashAlloc there, as attached.

regards, tom lane

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 3f25929f2d8..1ad155d446e 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -390,7 +390,8 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
 	}
 
 	/* Initialize the hash header, plus a copy of the table name */
-	hashp = (HTAB *) DynaHashAlloc(sizeof(HTAB) + strlen(tabname) + 1);
+	hashp = (HTAB *) MemoryContextAlloc(CurrentDynaHashCxt,
+		sizeof(HTAB) + strlen(tabname) + 1);
 	MemSet(hashp, 0, sizeof(HTAB));
 
 	hashp->tabname = (char *) (hashp + 1);


Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-23 Thread Jacob Champion
On Wed, Apr 23, 2025 at 9:38 AM Christoph Berg  wrote:
> > We could all agree to bump the second number in the filename whenever
> > there's an internal ABI change. That works from a technical
> > perspective, but it's hard to test and enforce and... just not forget.
>
> It's hopefully not harder than checking ABI compatibility of any other
> libpq change, just a different number. If that number is in the
> meson.build in the same directory, people should be able to connect
> the dots.

I think it is harder, simply because no one has to do it today, and
that change would sign them up to do it, forever, adding to the
backport checklist. It's one thing if there's a bunch of committers
who pile into the thread right now saying "yes, that's okay", but I
don't really feel comfortable making that decision for them right this
instant.

If we had robust ABI compatibility checks as part of the farm [1], I
think we could do that. Doesn't feel like an 18 thing, though.

> Btw, if we have that number, we might as well drop the MAJOR part as
> well... apt.pg.o is always shipping the latest libpq5, so major libpq
> upgrades while apps are running are going to happen. (But this is just
> once a year and much less problematic than minor upgrades and I'm not
> going to complain if MAJOR is kept.)

I don't want to introduce another testing matrix dimension if I can
avoid it. ("I have this bug where libpq.so.5.18 is using
libpq-oauth.so from PG20 and I had no idea it was doing that and the
problem went away when I restarted and...")

And the intent is for this to be temporary until we have a user-facing
API. If this is the solution we go with, I think it'd be wise to prepare
for a -19 version of libpq-oauth, but I'm going to try my best to get
custom modules in ASAP. People are going to be annoyed that v1 of the
feature doesn't let them swap the flow for our utilities. Ideally they
only have to deal with that for a single major release.

Also: since the libpq-oauth-18 and libpq-oauth-19 packages can be
installed side-by-side safely, isn't the upgrade hazard significantly
diminished? (If a user uninstalls the previous libpq-oauth version
while they're still running that version of libpq in memory, _and_
they've somehow never used OAuth until right that instant... it's easy
enough for them to undo their mistake while the application is still
running.)

> > Or, I may still be able to thread the needle with a fuller lookup
> > table, and remove the dependency on libpq-int.h; it's just not going
> > to be incredibly pretty. Thinking...
>
> Don't overdesign it...

Oh, I agree... but my "minimal" ABI designs have all had corner cases
so far. I may need to just bite the bullet.

Are there any readers who feel like an internal ABI version for
`struct pg_conn`, bumped during breaking backports, would be
acceptable? (More definitively: are there any readers who would veto
that?) You're still signing up for delayed errors in the long-lived
client case, so it's not a magic bullet, but the breakage is easy to
see and it's not a crash. The client application "just" has to restart
after a libpq upgrade.
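As a purely illustrative sketch (all names hypothetical; requires <dlfcn.h>),
the kind of handshake being asked about might look like this — libpq refuses
to use the module on a mismatch, so the failure is an error rather than a
crash:

/* Hypothetical constant in libpq-int.h, bumped on any backported change
 * to the struct pg_conn layout that the module depends on. */
#define PG_CONN_INTERNAL_ABI	1801

/* Hypothetical symbol exported by libpq-oauth, compiled against the same
 * header, so it reports the ABI it was built for. */
int
libpq_oauth_internal_abi(void)
{
	return PG_CONN_INTERNAL_ABI;
}

/* Hypothetical check in libpq just after dlopen()ing the module. */
static bool
oauth_module_abi_matches(void *handle)
{
	int			(*mod_abi) (void);

	mod_abi = (int (*) (void)) dlsym(handle, "libpq_oauth_internal_abi");
	if (mod_abi == NULL || mod_abi() != PG_CONN_INTERNAL_ABI)
		return false;			/* report an error and skip OAuth */
	return true;
}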

Thanks,
--Jacob

[1] 
https://www.postgresql.org/message-id/B142EE8A-5D38-48B9-A4BB-82D69A854B55%40justatheory.com




Re: Add Pipelining support in psql

2025-04-23 Thread Michael Paquier
On Wed, Apr 23, 2025 at 04:13:14PM +0900, Michael Paquier wrote:
> The tweak for psql_fails_like() was kind of independent of the rest of
> the fix, so I have applied that as a commit of its own.  I am not
> convinced about the addition of a 4th test where we use the queries
> with semicolons without a \getresults between the sync and the end.

I've had some room to look again at all that this morning as this is
an open item, and I have applied the fix.  One tweak was in the tests,
where I have added only one wait_for_log() which should offer enough
coverage.

If you notice anything else, please let me know.
--
Michael




Non-reproducible AIO failure

2025-04-23 Thread Tom Lane
My buildfarm animal sifaka just failed like this [1]:

TRAP: failed Assert("aio_ret->result.status != PGAIO_RS_UNKNOWN"), File: "bufmgr.c", Line: 1605, PID: 79322
0   postgres                      0x000100e3df2c ExceptionalCondition + 108
1   postgres                      0x000100cbb154 WaitReadBuffers + 616
2   postgres                      0x000100f072bc read_stream_next_buffer.cold.1 + 184
3   postgres                      0x000100cb6dd8 read_stream_next_buffer + 300
4   postgres                      0x0001009b75b4 heap_fetch_next_buffer + 136
5   postgres                      0x0001009ad8c4 heapgettup + 368
6   postgres                      0x0001009ad364 heap_getnext + 104
7   postgres                      0x0001009b9770 heapam_index_build_range_scan + 700
8   postgres                      0x0001009ef4b8 spgbuild + 480
9   postgres                      0x000100a41acc index_build + 428
10  postgres                      0x000100a43d14 reindex_index + 752
11  postgres                      0x000100ad09e0 ExecReindex + 1908
12  postgres                      0x000100d04628 ProcessUtilitySlow + 1300
13  postgres                      0x000100d03754 standard_ProcessUtility + 1528
14  pg_stat_statements.dylib      0x00010158d7a0 pgss_ProcessUtility + 632
15  postgres                      0x000100d02cec PortalRunUtility + 184
16  postgres                      0x000100d02354 PortalRunMulti + 236
17  postgres                      0x000100d01d94 PortalRun + 456
18  postgres                      0x000100d00e10 exec_simple_query + 1308
...
2025-04-23 16:08:28.834 EDT [78788:4] LOG:  client backend (PID 79322) was terminated by signal 6: Abort trap: 6
2025-04-23 16:08:28.834 EDT [78788:5] DETAIL:  Failed process was running: reindex index spgist_point_idx;


Scraping the buildfarm database confirmed that there have been no
other similar failures since AIO went in, and unsurprisingly some
attempts to reproduce it manually didn't bear fruit.  So I'm
suspecting a race condition with a very narrow window, but I don't
have an idea where to look.  Any thoughts on how to gather more
data?

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=sifaka&dt=2025-04-23%2020%3A03%3A24&stg=recovery-check




Re: [PATCH] contrib/xml2: xslt_process() should report XSLT-related error details

2025-04-23 Thread Robin Haberkorn
On Wed Apr 23, 2025 at 00:42:06 GMT +03, Robin Haberkorn wrote:
> /*
>  * Legacy error handling mode.  err_occurred is never set, we just add the
>  * message to err_buf.  This mode exists because the xml2 contrib module
>  * uses our error-handling infrastructure, but we don't want to change its
>  * behaviour since it's deprecated anyway.  This is also why we don't
>  * distinguish between notices, warnings and errors here --- the old-style
>  * generic error handler wouldn't have done that either.
>  */

Btw, you wrote that comment in cacd42d62cb2ddf32135b151f627780a5509780f
back in 2011.  The commit message also contains some background information.

-- 
Robin Haberkorn
Senior Software Engineer

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537




Re: Large expressions in indexes can't be stored (non-TOASTable)

2025-04-23 Thread Nathan Bossart
On Tue, Apr 22, 2025 at 12:21:01PM -0500, Nathan Bossart wrote:
> On Tue, Apr 22, 2025 at 05:17:17PM +0530, Nisha Moond wrote:
>> 5) As Euler pointed out, logical replication already has a built-in
>> restriction for internally assigned origin names, limiting them to
>> NAMEDATALEN. Given that, only the SQL function
>> pg_replication_origin_create() is capable of creating longer origin
>> names. Would it make sense to consider moving the check into
>> pg_replication_origin_create() itself, since it seems redundant for
>> all other callers?
>> That said, if there's a possibility of future callers needing this
>> restriction at a lower level, it may be reasonable to leave it as is.
> 
> pg_replication_origin_create() might be a better place.  After all, that's
> where we're enforcing the name doesn't start with "pg_" and, for testing,
> _does_ start with "regress_".  Plus, it seems highly unlikely that any
> other callers of replorigin_create() will ever provide names longer than
> 512 bytes.

Here is a new version of the patch with this change.

-- 
nathan
From 1ca6f9c2a4a3c9917bb4846a77d0b7eaf936b8b8 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 23 Apr 2025 16:21:22 -0500
Subject: [PATCH v4 1/1] Remove pg_replication_origin's TOAST table.

Reviewed-by: Michael Paquier 
Reviewed-by: Amit Kapila 
Reviewed-by: Euler Taveira 
Reviewed-by: Nisha Moond 
Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan
---
 doc/src/sgml/func.sgml   |  1 +
 src/backend/catalog/catalog.c|  2 --
 src/backend/replication/logical/origin.c | 12 
 src/include/catalog/pg_replication_origin.h  |  2 --
 src/include/replication/origin.h |  7 +++
 src/test/regress/expected/misc_functions.out |  4 
 src/test/regress/expected/misc_sanity.out|  6 +-
 src/test/regress/sql/misc_functions.sql  |  3 +++
 src/test/regress/sql/misc_sanity.sql |  3 +++
 9 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 574a544d9fa..593e45495be 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29941,6 +29941,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset

 Creates a replication origin with the given external
 name, and returns the internal ID assigned to it.
+The name must be no longer than 512 bytes.

   
 
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 6bd0bc7..59caae8f1bc 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -346,8 +346,6 @@ IsSharedRelation(Oid relationId)
relationId == PgDbRoleSettingToastIndex ||
relationId == PgParameterAclToastTable ||
relationId == PgParameterAclToastIndex ||
-   relationId == PgReplicationOriginToastTable ||
-   relationId == PgReplicationOriginToastIndex ||
relationId == PgShdescriptionToastTable ||
relationId == PgShdescriptionToastIndex ||
relationId == PgShseclabelToastTable ||
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 6583dd497da..e9629350cb0 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -1296,6 +1296,18 @@ pg_replication_origin_create(PG_FUNCTION_ARGS)
elog(WARNING, "replication origins created by regression test cases should have names starting with \"regress_\"");
 #endif
 
+   /*
+* To avoid needing a TOAST table for pg_replication_origin, we restrict
+* replication origin names to 512 bytes.  This should be more than enough
+* for all practical use.
+*/
+   if (strlen(name) > MAX_RONAME_LEN)
+   ereport(ERROR,
+   (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+errmsg("replication origin name is too long"),
+errdetail("Replication origin names must be no 
longer than %d bytes.",
+  MAX_RONAME_LEN)));
+
roident = replorigin_create(name);
 
pfree(name);
diff --git a/src/include/catalog/pg_replication_origin.h b/src/include/catalog/pg_replication_origin.h
index deb43065fe9..7ade8bbda39 100644
--- a/src/include/catalog/pg_replication_origin.h
+++ b/src/include/catalog/pg_replication_origin.h
@@ -54,8 +54,6 @@ CATALOG(pg_replication_origin,6000,ReplicationOriginRelationId) BKI_SHARED_RELAT
 
 typedef FormData_pg_replication_origin *Form_pg_replication_origin;
 
-DECLARE_TOAST_WITH_MACRO(pg_replication_origin, 4181, 4182, PgReplicationOriginToastTable, PgReplicationOriginToastIndex);
-
DECLARE_UNIQUE_INDEX_PKEY(pg_replication_origin_roiident_index, 6001, ReplicationOriginIdentIndex, pg_replication_origin, btree(roident oi

Re: Large expressions in indexes can't be stored (non-TOASTable)

2025-04-23 Thread Tom Lane
Nathan Bossart  writes:
> Here is a new version of the patch with this change.

I don't see any comments in this patch that capture the real
reason for removing pg_replication_origin's TOAST table,
namely (IIUC) that we'd like to be able to access that catalog
without a snapshot.  I think it's important to record that
knowledge, because otherwise somebody is likely to think they
can undo this change for $whatever-reason.

regards, tom lane




Re: Large expressions in indexes can't be stored (non-TOASTable)

2025-04-23 Thread Nathan Bossart
On Wed, Apr 23, 2025 at 05:33:41PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> Here is a new version of the patch with this change.
> 
> I don't see any comments in this patch that capture the real
> reason for removing pg_replication_origin's TOAST table,
> namely (IIUC) that we'd like to be able to access that catalog
> without a snapshot.  I think it's important to record that
> knowledge, because otherwise somebody is likely to think they
> can undo this change for $whatever-reason.

D'oh, yes, I'd better add that.

-- 
nathan




Re: [PATCH] dynahash: add memory allocation failure check

2025-04-23 Thread m . korotkov

Andrey Borodin wrote 2025-04-23 12:46:
It seems there are a lot of cases of MCXT_ALLOC_NO_OOM, perhaps should 
we check them all?

Yep, I think we should.
I found this issue with the Svace static analyzer, and it might have 
missed other issues.

Perhaps a more comprehensive investigation is needed here.
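For reference, a minimal sketch of the check each such call site needs
(MemoryContextAllocExtended() with MCXT_ALLOC_NO_OOM returns NULL on failure
instead of erroring out); "size" here is just a placeholder for whatever the
caller allocates:

	void	   *ptr;

	/* no-OOM allocation: the caller must check for NULL explicitly */
	ptr = MemoryContextAllocExtended(CurrentMemoryContext, size,
									 MCXT_ALLOC_NO_OOM);
	if (ptr == NULL)
		ereport(ERROR,
				(errcode(ERRCODE_OUT_OF_MEMORY),
				 errmsg("out of memory")));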
---
Best regards, Maksim Korotkov




Re: Cygwin support

2025-04-23 Thread Andrew Dunstan



On 2025-04-21 Mo 12:29 PM, Andrew Dunstan wrote:


Last year the old Windows machine where I was running the buildfarm 
member lorikeet died, and since then we've had no buildfarm coverage 
for Cygwin. I now have a new (but slow) W11pro machine and I have been 
testing out Cygwin builds on it. I wanted to have it running the TAP 
tests, unlike lorikeet. Attached is a set of very small patches aimed 
at enabling this.


The first patch makes us use our getopt implementation, just like we 
do in Mingw. meson.build already has this, so this would just be 
bringing configure into line with that.


The second patch makes Cygwin use the WIN32 pattern for psql's \watch 
command. Without that, the Unix style implementation hangs.


The third patch makes Cygwin skip a permissions test in the SSL tests, 
just like we do elsewhere in Windows.


The fourth patch ensures that we honor MAX_CONNECTIONS in a couple of 
places where we rerun the regression suite. MAX_CONNECTIONS was 
originally designed mainly for Cygwin, where too many concurrent 
connections cause issues.


The fifth patch disables one of the pgbench tests which is unstable on 
Cygwin.


There are still some issues, with the pg_dump, pg_upgrade, recovery 
and subscription test sets. But apart from that, with these patches I 
can consistently get a successful run.


My intention is to apply these soon, and backpatch them as 
appropriate. These are all pretty low risk. I'm going to be away for a 
while starting in a day or so, but I'd like to get a buildfarm animal 
going with these before I disappear.




Time has got the better of me; I won't be able to get back to this for a 
couple of months.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] Support older Pythons in oauth_server.py

2025-04-23 Thread Daniel Gustafsson
> On 23 Apr 2025, at 22:26, Jacob Champion  
> wrote:
> 
> On Tue, Apr 22, 2025 at 12:45 PM Jacob Champion
>  wrote:
>> Great, thanks for the quick review!
> 
> And pushed. Thanks everyone!

Congratulations on your first commit, looking forward to many more to come!

--
Daniel Gustafsson





Re: extension_control_path and "directory"

2025-04-23 Thread Matheus Alcantara
On Wed, Apr 23, 2025 at 10:57 AM David E. Wheeler  wrote:
>
> On Apr 23, 2025, at 09:50, Christoph Berg  wrote:
>
> > Remembering which path the .control file was found in and from there
> > open the extension "directory" doesn't sound too hard. Why does it
> > have to be more complicated?
>
> This was my question, as well. Do you have a WIP patch to share, Matheus?
>
I spent some time trying to implement this and somehow got lost in the
changes I thought I would need to make to the "find_in_path" function
and the others it calls. But reading these messages and looking at the code
again, I think the change is much simpler than I thought.

Attached is a draft patch that uses the path that the .control file was
found to search for the script files when the "directory" is set on the
.control file.

I've tested with the semver extension and it seems to work fine with
this patch. Can you please check on your side to see if it's also
working?

I still want to polish this patch a bit more and include some additional
test cases using "directory" in the .control file, but I think it is stable
enough for some testing; make check and check-world are happy.
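A rough sketch of the idea being discussed, with hypothetical parameter and
field names (the actual patch is attached below as binary data): remember the
path element where the .control file was found, and resolve a relative
"directory" against that element's share directory rather than only against
the compiled-in one.

/*
 * Sketch only: "control_dir" (the directory where the .control file was
 * found, i.e. <sharedir>/extension of one path element) is an assumed
 * value, not the real ExtensionControlFile layout.
 */
static char *
extension_script_directory(const char *control_dir, const char *directory)
{
	if (directory == NULL)
		return pstrdup(control_dir);	/* default: same place as .control */

	if (is_absolute_path(directory))
		return pstrdup(directory);

	/*
	 * Relative "directory" is interpreted against the share dir of the
	 * matching path element, which is the parent of the extension dir.
	 */
	return psprintf("%s/../%s", control_dir, directory);
}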

-- 
Matheus Alcantara


v1-0001-Make-directory-work-with-extension-control-path.patch
Description: Binary data


Re: pgsql: Add function to get memory context stats for processes

2025-04-23 Thread Tom Lane
Robert Haas  writes:
> My primary concern about the patch is that
> ProcessGetMemoryContextInterrupt() can be called from any
> CHECK_FOR_INTERRUPTS() and calls lots of DSA functions, including
> dsa_create() and, via PublishMemoryContext(), dsa_allocate0(). I'm
> shocked to hear that you and Andres think that's safe to do at any
> current or future CHECK_FOR_INTERRUPTS() anywhere in the code; but
> Andres seems very confident that it's fine, so perhaps I should just
> stop worrying and be happy that we have the feature.

Just for the record, it sounds quite unsafe to me too.  I could
credit it being all right to examine the process' MemoryContext data
structures, but calling dsa_create() from CFI seems really insane.
Way too many moving parts there.

regards, tom lane




Re: Using pg_bitutils.h in tidbitmap.c.

2025-04-23 Thread John Naylor
On Mon, Apr 14, 2025 at 7:42 PM Thomas Munro  wrote:
>
> tidbitmap.c's operations loop over all the bits, but could leap over
> zeros with bitmap instructions like bitmapset.c.  Hard to imagine that
> matters, but I think it also comes out easier to read?

Interesting idea -- I toyed with something like this when TID store
was being developed. Most of it looks more readable at a glance,
although tbm_advance_schunkbit() may be a wash.

+bmw_pop_rightmost_one(bitmapword *word)
+{
+ int position = bmw_rightmost_one_pos(*word);
+
+ *word &= ~((bitmapword) 1 << position);
+
+ return position;
+}

There's a more succinct way to write the second statement, since here
we only care about the rightmost bit:

*word &= *word - 1;

That has a bonus in that we don't need to know "position" beforehand,
so the CPU can schedule that independently of the bitscan, which is 3
or 4 cycles on modern hardware, but like you said, I'm not sure if
that matters.
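Put together, the suggested variant might look like this (a sketch based on
the snippet quoted above, not committed code):

static inline int
bmw_pop_rightmost_one(bitmapword *word)
{
	int			position = bmw_rightmost_one_pos(*word);

	*word &= *word - 1;			/* clears exactly the rightmost set bit */

	return position;
}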

-- 
John Naylor
Amazon Web Services




Avoid core dump in pgstat_read_statsfile()

2025-04-23 Thread Bertrand Drouvot
Hi hackers,

please find attached a patch to $SUBJECT.

Indeed one could generate a core dump similar to:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00a7a7a7 in pgstat_init_entry (kind=129, shhashent=0x7a55845cfd98) at pgstat_shmem.c:314
314         chunk = dsa_allocate0(pgStatLocal.dsa, pgstat_get_kind_info(kind)->shared_size);
(gdb) bt
#0  0x00a7a7a7 in pgstat_init_entry (kind=129, shhashent=0x7a55845cfd98) at pgstat_shmem.c:314
#1  0x00a72935 in pgstat_read_statsfile () at pgstat.c:1982
#2  0x00a700a8 in pgstat_restore_stats () at pgstat.c:507

by:

1. creating a custom stat kind (non fixed_amount and write_to_file enabled)
2. generate stats
3. stop the engine
4. change the custom PGSTAT_KIND id linked to 1., compile and install
5. re-start the engine

I think that a check on pgstat_get_kind_info() is missing for this scenario; the
patch adds it. Such a check already exists for PGSTAT_FILE_ENTRY_FIXED and
for stats entries identified by name on disk, but not for PGSTAT_FILE_ENTRY_HASH.

The v18 existing checks mentioned above as well as the new check were missing
in pre-18 but I don't think it's worth a back-patch as the issue is unlikely to
occur without custom stats. Adding a v18 open item then.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From b80bcbcf05e44d31cc90c1de2e3200899db1294f Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Wed, 23 Apr 2025 07:19:44 +
Subject: [PATCH v1] Avoid core dump in pgstat_read_statsfile()

Adding a check that pgstat_get_kind_info() is not NULL when loading the
stats file. Without this extra check a non fixed_amount custom stats kind that
has write_to_file enabled would produce a core dump should its PGSTAT_KIND ID
change before a re-start.
---
 src/backend/utils/activity/pgstat.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 21bdff106a9..d735a2c1a58 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -1908,6 +1908,13 @@ pgstat_read_statsfile(void)
  key.objid, t);
 			goto error;
 		}
+
+		if (!pgstat_get_kind_info(key.kind))
+		{
+			elog(WARNING, "could not find information of kind %u for entry of type %c",
+ key.kind, t);
+			goto error;
+		}
 	}
 	else
 	{
-- 
2.34.1



RE: Fix premature xmin advancement during fast forward decoding

2025-04-23 Thread Zhijie Hou (Fujitsu)
On Wed, Apr 23, 2025 at 2:31 PM shveta malik wrote:
> 
> On Tue, Apr 22, 2025 at 12:36 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > Hi,
> >
> > To fix this, I think we can allow the base snapshot to be built during fast
> > forward decoding, as implemented in the patch 0001 (We already built base
> > snapshot in fast-forward mode for logical message in logicalmsg_decode()).
> 
> The idea and code looks okay to me and the performance impact is also
> not that huge.

Thanks for reviewing!

> 
> IIUC, fast_forward decoding mode is used only in two cases. 1)
> pg_replication_slot_advance and 2) in upgrade flow to check if there
> are any pending WAL changes which are not yet replicated. See
> 'binary_upgrade_logical_slot_has_caught_up'-->'LogicalReplicationSlotHasP
> endingWal'.
> It seems like this change will not have any negative impact in the
> upgrade flow as well (in terms of performance and anything else).
> Thoughts?

I also think so. The upgrade uses fast forward decoding to check if there are
any pending WALs not sent yet and reports an ERROR if so. This indicates that
we do not expect to decode real changes for the upgrade slots, so the
performance impact would be minimal in normal cases.

Best Regards,
Hou zj


Re: What's our minimum supported Python version?

2025-04-23 Thread Devrim Gündüz
Hi,

On Wed, 2025-04-23 at 11:03 +0200, Jelte Fennema-Nio wrote:
> 
> I meant more in what ways do these things break? Since you're actually
> the one that's packaging them, I'd expect that you could make them
> depend on the python39 package and be mostly done.

You are right, our side is fixable. However, many packages upstream
also depend on Psycopg. I don't want to create a Linux
distribution based on RHEL 8 built against Python 3.9 (or 3.1x) :-)

Regards,
-- 
Devrim Gündüz
Open Source Solution Architect, PostgreSQL Major Contributor
BlueSky: @devrim.gunduz.org , @gunduz.org




Re: [BUG] temporary file usage report with extended protocol and unnamed portals

2025-04-23 Thread Frédéric Yhuel




On 4/23/25 09:41, Frédéric Yhuel wrote:



On 4/22/25 19:37, Sami Imseih wrote:

the patch relies on looking up queryDesc->sourceText inside DropPortal,
which Tom raised concerns about earlier in the thread [0]


Yes, I think I had misunderstood what Tom said. Thank you for pointing 
that out.


However, is it really unsafe?

In exec_bind_message, the portal's query string comes from a duplicate 
of the original string (see CreateCachedPlan). So we are safe in this case.


In exec_simple_query, the portal is dropped towards the end of this 
function, so we are safe here too.


Am I missing something?


Note: the patch doesn't work well with server-side prepared statements: 
the PREPARE query is blamed instead of the EXECUTE one. But this is 
maybe something that can be fixed easily.





[PATCH] dynahash: add memory allocation failure check

2025-04-23 Thread m . korotkov

Hi all,
I found a case of potential NULL pointer dereference.
In src/backend/utils/hash/dynahash.c in function HTAB *hash_create() the 
result of the DynaHashAlloc() is used unsafely.
The function DynaHashAlloc() calls MemoryContextAllocExtended() with 
MCXT_ALLOC_NO_OOM and can return a NULL pointer.

Added the pointer check for avoiding a potential problem.
---
Best regards, Korotkov Maksim
PostgresPro
m.korot...@postgrespro.ru

From 75916a6855bb9e96c7b34b76c0380edc157c150c Mon Sep 17 00:00:00 2001
From: Maksim Korotkov 
Date: Tue, 22 Apr 2025 12:20:58 +0300
Subject: [PATCH] dynahash: add memory allocation failure check
 The function DynaHashAlloc() calls MemoryContextAllocExtended() with MCXT_ALLOC_NO_OOM
 and can return a NULL pointer.
 Fixes: e3860ffa4dd ("Initial pgindent run with pg_bsd_indent version 2.0.")
 Signed-off-by: Maksim Korotkov 

---
 src/backend/utils/hash/dynahash.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 3f25929f2d8..0cfee50dc21 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -391,6 +391,12 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
 
 	/* Initialize the hash header, plus a copy of the table name */
 	hashp = (HTAB *) DynaHashAlloc(sizeof(HTAB) + strlen(tabname) + 1);
+	if (unlikely(hashp == NULL))
+	{
+		ereport(ERROR,
+(errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory")));
+	}
 	MemSet(hashp, 0, sizeof(HTAB));
 
 	hashp->tabname = (char *) (hashp + 1);
-- 
2.34.1



extension_control_path and "directory"

2025-04-23 Thread Christoph Berg
Re: Peter Eisentraut
> The new GUC extension_control_path specifies a path to look for
> extension control files.  The default value is $system, which looks in
> the compiled-in location, as before.

Is the expectation that this also works for the extension "directory"?

semver is still failing because it's shipping its .sql files in a
separate directory:

2025-04-23 09:06:24.864 UTC [422345] myon@contrib_regression ERROR:  could not open directory "/usr/share/postgresql/18/semver": No such file or directory
2025-04-23 09:06:24.864 UTC [422345] myon@contrib_regression STATEMENT:  CREATE EXTENSION semver;

$ cat debian/postgresql-18-semver/usr/share/postgresql/18/extension/semver.control
# semver extension
comment = 'Semantic version data type'
default_version = '0.40.0'

directory = 'semver'
module_pathname = '$libdir/semver'
relocatable = true

$ ls debian/postgresql-18-semver/usr/share/postgresql/18/semver/
semver--0.10.0--0.11.0.sql  semver--0.17.0--0.20.0.sql  
semver--0.30.0--0.31.0.sql  semver--0.32.1--0.40.0.sql
semver--0.11.0--0.12.0.sql  semver--0.20.0--0.21.0.sql  
semver--0.3.0--0.4.0.sqlsemver--0.40.0.sql
semver--0.12.0--0.13.0.sql  semver--0.21.0--0.22.0.sql  
semver--0.31.0--0.31.1.sql  semver--0.5.0--0.10.0.sql
semver--0.13.0--0.15.0.sql  semver--0.2.1--0.2.4.sql
semver--0.31.1--0.31.2.sql  semver.sql
semver--0.15.0--0.16.0.sql  semver--0.22.0--0.30.0.sql  
semver--0.31.2--0.32.0.sql  semver--unpackaged--0.2.1.sql
semver--0.16.0--0.17.0.sql  semver--0.2.4--0.3.0.sql
semver--0.32.0--0.32.1.sql

Christoph




Re: jsonapi: scary new warnings with LTO enabled

2025-04-23 Thread Daniel Gustafsson
> On 23 Apr 2025, at 02:01, Jacob Champion  
> wrote:
> On Tue, Apr 22, 2025 at 3:10 AM Daniel Gustafsson  wrote:

>> The attached
>> v3 allocates via the JSON api, no specific error handling should be required 
>> as
>> it's already handled today.
> 
> pgindent shows one whitespace change on my machine (comment
> indentation), but other than that, LGTM! Fuzzers are happy too.

Thanks for confirming, I've pushed this now.

--
Daniel Gustafsson





Re: What's our minimum supported Python version?

2025-04-23 Thread Jelte Fennema-Nio
On Wed, 23 Apr 2025 at 11:13, Devrim Gündüz  wrote:
> You are right, our side is fixable. However many packages in the
> upstream also depend on Psycopg. I don't want to create a Linux
> distribution based on RHEL 8 built against Python 3.9 (or 3.1x) :-)

I'm confused. The upstream RHEL8 repo depends on packages in pgdg???

Or are you saying that psycopg (and patroni) are not part of the
pgdg-repo, and instead part of the upstream-repo. And so that other
packages in pgdg that depend on upstream psycopg, would always need to
support python 3.6 (because that's what upstream psycopg uses).




Re: [PATCH] dynahash: add memory allocation failure check

2025-04-23 Thread Andrey Borodin



> On 23 Apr 2025, at 13:32, m.korot...@postgrespro.ru wrote:
> 
> I found a case of potential NULL pointer dereference.
> In src/backend/utils/hash/dynahash.c in function HTAB *hash_create() the 
> result of the DynaHashAlloc() is used unsafely.
> The function DynaHashAlloc() calls MemoryContextAllocExtended() with 
> MCXT_ALLOC_NO_OOM and can return a NULL pointer.
> Added the pointer check for avoiding a potential problem.

Yeah, good catch.
And all HTAB->alloc() (which relies on DynaHashAlloc) callers seem to check for 
NULL result.
It seems there are a lot of cases of MCXT_ALLOC_NO_OOM, perhaps should we check 
them all?


Best regards, Andrey Borodin.



virtual generated column as partition key

2025-04-23 Thread jian he
hi.

The attached patch is to implement $subject.

demo:
CREATE TABLE t(f1 bigint, f2 bigint GENERATED ALWAYS AS (f1 * 2)
VIRTUAL) PARTITION BY RANGE (f2);

It will work just fine as
CREATE TABLE t(f1 bigint, f2 bigint GENERATED ALWAYS AS (f1 * 2)
VIRTUAL) PARTITION BY RANGE ((f1 * 2));
under the hood.

But the partition key cannot be an expression on top of a virtual
generated column, so the following is not allowed:
CREATE TABLE t(f1 bigint, f2 bigint GENERATED ALWAYS AS (f1 * 2)
VIRTUAL) PARTITION BY RANGE ((f2+1));

The virtual generated column expression for each partition must match that of
the partitioned table, since it is used as a partition key. Otherwise, the
partition bound would be evaluated dynamically.
So the following table gtest_part_key1_0 cannot be attached to the partition tree.

CREATE TABLE gtest_part_key1 (f1 date, f2 bigint, f3 bigint GENERATED
ALWAYS AS (f2 * 2) VIRTUAL) PARTITION BY RANGE ((f3)); --ok
CREATE TABLE gtest_part_key1_0(f3 bigint GENERATED ALWAYS AS (f2 * 3)
VIRTUAL, f2 bigint, f1 date);
ALTER TABLE gtest_part_key1 ATTACH PARTITION gtest_part_key1_0 FOR
VALUES FROM (20) TO (30); --error

cross partition update tests added.

A virtual generated column entry in the pg_partitioned_table catalog is marked
as non-zero partattrs and a non-null partexprs, which is abnormal. Normally,
either partattrs is non-zero or partexprs is null.
We should mention this in doc/src/sgml/catalogs.sgml.
From 25986c0aefbbabe022282b7941023fe4fbb9e1dc Mon Sep 17 00:00:00 2001
From: jian he 
Date: Wed, 23 Apr 2025 19:54:03 +0800
Subject: [PATCH v1 1/1] virtual generated column as partition key

demo:
CREATE TABLE t(f1 bigint, f2 bigint GENERATED ALWAYS AS (f1 * 2) VIRTUAL) PARTITION BY RANGE (f2);
but partition key can not be expression on top of virtual generated column.
so the following is not allowed:
CREATE TABLE t(f1 bigint, f2 bigint GENERATED ALWAYS AS (f1 * 2) VIRTUAL) PARTITION BY RANGE ((f2+1));

The virtual generated column expression for each partition must match with
the partitioned table, since it is used as a partition key. Otherwise, the
partition bound would be dynamically evaluated.

cross partition update tests added.

A virtual generated column entry in the pg_partitioned_table catalog is marked
as non-zero partattrs and a non-null partexprs, which is abnormal. Normally,
either partattrs is non-zero or partexprs is null.
we should mention this in the doc/src/sgml/catalogs.sgml

discussion: https://postgr.es/m/
---
 src/backend/commands/tablecmds.c  | 371 --
 src/backend/executor/execPartition.c  |   3 +-
 src/backend/partitioning/partbounds.c |  32 +-
 src/backend/utils/cache/partcache.c   |   5 +-
 src/backend/utils/cache/relcache.c|  13 +
 src/include/utils/relcache.h  |   1 +
 .../regress/expected/generated_stored.out |  11 +-
 .../regress/expected/generated_virtual.out| 113 +-
 src/test/regress/sql/generated_stored.sql |   1 +
 src/test/regress/sql/generated_virtual.sql|  64 ++-
 10 files changed, 471 insertions(+), 143 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 265b1c397fb..9d05d83b5e3 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -707,6 +707,13 @@ static void RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid,
 static void RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid,
 			 Oid oldrelid, void *arg);
 static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec);
+static void ComputePartitionExprs(ParseState *pstate,
+  Relation rel,
+  int	attn,
+  AttrNumber *partattrs,
+  List **partexprs,
+  PartitionElem *pelem,
+  Node	   *expr);
 static void ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs,
   List **partexprs, Oid *partopclass, Oid *partcollation,
   PartitionStrategy strategy);
@@ -8601,6 +8608,45 @@ ATExecSetExpression(AlteredTableInfo *tab, Relation rel, const char *colName,
  errmsg("column \"%s\" of relation \"%s\" is not a generated column",
 		colName, RelationGetRelationName(rel;
 
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		if (has_partition_attrs(rel,
+bms_make_singleton(attnum - FirstLowInvalidHeapAttributeNumber),
+NULL))
+			ereport(ERROR,
+	errcode(ERRCODE_INVALID_COLUMN_DEFINITION),
+	errmsg("cannot alter column \"%s\" because it is part of the partition key of relation \"%s\"",
+			colName, RelationGetRelationName(rel)));
+	}
+
+	if (rel->rd_rel->relispartition)
+	{
+		AttrNumber	parent_attnum;
+		Oid			parentId;
+		Relation	parent;
+		AttrMap		*map	= NULL;
+
+		parentId = get_partition_parent(RelationGetRelid(rel), false);
+
+		parent = table_open(parentId, AccessShareLock);
+		map = build_attrmap_by_name_if_req(RelationGetDes

Re: bug: virtual generated column can be partition key

2025-04-23 Thread jian he
On Tue, Apr 22, 2025 at 4:55 PM Ashutosh Bapat
 wrote:
>
> Sorry I missed this email while sending the patches - our emails crossed in 
> the air.
>
> On Tue, Apr 22, 2025 at 2:15 PM jian he  wrote:
>>
>> On Tue, Apr 22, 2025 at 3:02 PM jian he  wrote:
>> > Other than that, it looks good to me for fixing this bug.
>>
>> The error message seems not that intuitive.
>>
>> +CREATE TABLE gtest_part_key (f1 date NOT NULL, f2 bigint, f3 bigint
>> GENERATED ALWAYS AS (f2 * 2) VIRTUAL) PARTITION BY RANGE
>> ((gtest_part_key));
>> +ERROR:  cannot use generated column in partition key
>> +LINE 1: ...D ALWAYS AS (f2 * 2) VIRTUAL) PARTITION BY RANGE ((gtest_par...
>> + ^
>> +DETAIL:  Column "f3" is a generated column.
>
>
> Yes. It's not apparent to lay users where f3 appears in the partition key.
> Users who explicitly use a whole-row expression in a partition key
> would know that a whole-row expression contains all columns. And the error
> location points to the whole-row reference. So the current state isn't that
> bad.
>
>>
>>
>> with the attached patch. now,
>> +CREATE TABLE gtest_part_key (f1 date NOT NULL, f2 bigint, f3 bigint
>> GENERATED ALWAYS AS (f2 * 2) VIRTUAL) PARTITION BY RANGE
>> ((gtest_part_key));
>> ERROR:  cannot use generated column in partition key
>> LINE 1: ...D ALWAYS AS (f2 * 2) VIRTUAL) PARTITION BY RANGE ((gtest_par...
>>  ^
>> DETAIL:  Generated column "f3" is part of the partition key of
>> relation "gtest_part_key"
>>
>
> The DETAIL here is just mentioning what's already known from the command, so 
> the change in the DETAIL may not be useful.
>
> If I understand your intention correctly, we have to mention something to the 
> effect "partition key expression contains a whole-row reference which in turn 
> contains a generated column." But that might be too verbose.
>

You understood my intention correctly.
I agree the current error message in that location is not that bad.

Overall, your latest patch looks good to me.




Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-23 Thread Robert Haas
On Mon, Apr 21, 2025 at 8:52 PM Nikhil Kumar Veldanda
 wrote:
> After reviewing the email thread you attached on previous response, I
> identified a natural choke point for both inserts and updates: the
> call to "heap_toast_insert_or_update" inside
> heap_prepare_insert/heap_update. In the current master branch, that
> function only runs when HeapTupleHasExternal is true; my patch extends
> it to HeapTupleHasVarWidth tuples as well.

Isn't that basically all tuples, though? I think that's where this gets painful.

> On the performance side, my basic benchmarks show almost no regression
> for simple INSERT … VALUES workloads. CTAS, however, does regress
> noticeably: a CTAS completes in about 4 seconds before this patch, but
> with this patch it takes roughly 24 seconds. (For reference, a normal
> insert into the source table took about 58 seconds when using zstd
> dictionary compression), I suspect the extra cost comes from the added
> zstd decompression and PGLZ compression on the destination table.

That's nice to know, but I think the key question is not so much what
the feature costs when it is used but what it costs when it isn't
used. If we implement a system where we don't let
dictionary-compressed zstd datums leak out of tables, that's bound to
slow down a CTAS from a table where this feature is used, but that's
kind of OK: the feature has pros and cons, and if you don't like those
tradeoffs, you don't have to use it. However, it sounds like this
could also slow down inserts and updates in some cases even for users
who are not making use of the feature, and that's going to be a major
problem unless it can be shown that there is no case where the impact
is at all significant. Users hate paying for features that they aren't
using.

I wonder if there's a possible design where we only allow
dictionary-compressed datums to exist as top-level attributes in
designated tables to which those dictionaries are attached; and any
time you try to bury that Datum inside a container object (row, range,
array, whatever) detoasting is forced. If there's a clean and
inexpensive way to implement that, then you could avoid having
heap_toast_insert_or_update care about HeapTupleHasExternal(), which
seems like it might be a key point.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgsql: Add function to get memory context stats for processes

2025-04-23 Thread Robert Haas
On Tue, Apr 22, 2025 at 4:12 AM Daniel Gustafsson
 wrote:
> Do you mean that an interrupt handler will make DSA allocations?  I don't 
> think
> that would be something we'd want to allow regardless of this patch.  Or did I
> misunderstand the above?

My primary concern about the patch is that
ProcessGetMemoryContextInterrupt() can be called from any
CHECK_FOR_INTERRUPTS() and calls lots of DSA functions, including
dsa_create() and, via PublishMemoryContext(), dsa_allocate0(). I'm
shocked to hear that you and Andres think that's safe to do at any
current or future CHECK_FOR_INTERRUPTS() anywhere in the code; but
Andres seems very confident that it's fine, so perhaps I should just
stop worrying and be happy that we have the feature.

I thought that the issues here would be similar to
https://commitfest.postgresql.org/patch/5473/ and
https://commitfest.postgresql.org/patch/5330/ where it seems we need
to go to fairly extraordinary lengths to try to make the operation
safe -- the proposal there is essentially to have a CFI handler run
around the executor state tree and replace every ExecProcNode pointer
with a pointer to some new wrapper function which in turn does the
actual query-plan printing. Even that requires some tricky footwork to
get right, but the core idea is that you can't imagine that
CHECK_FOR_INTERRUPTS() is a safe place to do arbitrary stuff, so you
have to use it to trigger the actual work at some later point where it
will be safe to do the stuff that you want to do. I suppose the
difference in this case is that memory-allocation is a lower-level
subsystem than query execution and therefore there are presumably
fewer places where it's not safe to assume that you can do
memory-allocation type operations. But I at least feel like DSA is a
pretty high-level system that I wouldn't want to count on being able
to access from a fairly-arbitrary point in the code, which is why I'm
quite astonished to hear Andres basically saying "don't worry, it's
all fine!". But my astonishment does not mean that he is wrong.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [BUG] temporary file usage report with extended protocol and unnamed portals

2025-04-23 Thread Sami Imseih
> Yes, I think I had misunderstood what Tom said. Thank you for pointing
> that out.
>
> However, is it really unsafe?

I have not been able to find a case where this is unsafe, but
the documentation in mmgr/README does indicate that the
query string may live for a shorter time than the portal in some cases.

"
This
is kept separate from per-transaction and per-portal contexts because a
query string might need to live either a longer or shorter time than any
single transaction or portal.
"""

Also, another strange behavior of the way portal cleanup occurs is that,
in the extended query protocol and within a transaction, ExecutorEnd for the
last query is not actually called until the next command. This just seems
odd to me, especially for extensions that rely on ExecutorEnd.

So, can we do something like this? It drops the portal as soon as
execution completes (i.e., the portal is fetched to completion). This will
ensure that there is no delay in ExecutorEnd getting called and in the
case of log_temp_files, the message will be logged while debug_query_string
is still pointing to the correct query.


diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index dc4c600922d..efe0151ca8f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2327,6 +2327,9 @@ exec_execute_message(const char *portal_name,
long max_rows)

/* Send appropriate CommandComplete to client */
EndCommand(&qc, dest, false);
+
+   if (!portal->portalPinned)
+   PortalDrop(portal, false);
}
else
{



```
postgres=# begin;
BEGIN
postgres=*# select from foo order by a \bind
postgres-*# ;
--
(100 rows)

postgres=*# select 1 \bind
postgres-*# ;
 ?column?
--
1
(1 row)

postgres=*# commit;
COMMIT
```

```
2025-04-23 11:11:47.777 CDT [67362] LOG:  temporary file: path
"base/pgsql_tmp/pgsql_tmp67362.0", size 1009983488
2025-04-23 11:11:47.777 CDT [67362] STATEMENT:  select from foo order by a
;
```


thoughts?

--
Sami Imseih
Amazon Web Services (AWS)




Re: Conflicting updates of command progress

2025-04-23 Thread Antonin Houska
Sami Imseih  wrote:

> > While working on [1] I realized that some fields of pg_stat_progress_cluster
> > have weird values.
> 
> Is there a repro that you can share that shows the weird values? It sounds 
> like
> the repro is on top of [1]. Is that right?

Yes.

> > AFAICS the current design does not consider that one progress-reporting
> > command can be called by another one.
> 
> pgstat_progress_start_command should only be called once by the entry point
> for the command. In theory, we could end up in a situation where start_command
> is called multiple times during the same top-level command;

Not only in theory - it actually happens when CLUSTER is rebuilding indexes.

> > Not sure what the correct fix is. We can
> > either ignore update requests from the "nested" commands, or display the
> 
> There is a pattern where we do
> 
> ```
> if (progress)
>   pgstat_progress_update_param
> ```
> 
> cluster_rel can pass down a flag to index_build or others that update progress
> to not report progress. Therefore, only the top level command is
> updating progress.
> what do you think?

That's a possible approach. However, if the index build takes a long time in the
CLUSTER case, the user will probably be interested in details about the index
build.

The other approach is that we report only on the index build, but in that case
the interesting row of pg_stat_progress_cluster would disappear
temporarily. That might also be confusing.

Ideally we should report on all the commands in progress, but that is not
trivial to implement.

> [1] https://commitfest.postgresql.org/patch/5117/

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: What's our minimum supported Python version?

2025-04-23 Thread Jelte Fennema-Nio
On Wed, 23 Apr 2025 at 10:59, Devrim Gündüz  wrote:
> The most notable one would be Psycopg (2 and 3). Plus Patroni
> dependencies. There may be a couple of more.

I meant more in what ways do these things break? Since you're actually
the one that's packaging them, I'd expect that you could make them
depend on the python39 package and be mostly done.




Re: Built-in Raft replication

2025-04-23 Thread Devrim Gündüz
Hi,

On Wed, 2025-04-23 at 11:48 -0500, Jim Nasby wrote:
> unless we added multiple WAL streams. That would allow for splitting
> WAL traffic across multiple devices as well as providing better
> support for configurations that don’t replicate the entire cluster.
> The current situation where delayed replication of a single table
> mandates retention of all the WAL for the entire cluster is less than
> ideal.

I think the problem is handling the stream of global objects. Having a
separate stream for each database would be awesome as long as it can
deal with the "global stream".

Regards,
-- 
Devrim Gündüz
Open Source Solution Architect, PostgreSQL Major Contributor
BlueSky: @devrim.gunduz.org , @gunduz.org




Re: Fix slot synchronization with two_phase decoding enabled

2025-04-23 Thread Nisha Moond
On Tue, Apr 22, 2025 at 3:23 PM shveta malik  wrote:
>
> On Fri, Apr 11, 2025 at 1:03 PM Nisha Moond  wrote:
> >
> >
> > Patch "v5_aprch_3-0001" implements the above Approach 3.
> >
> > Thanks Hou-san for implementing approach-2 and providing the patch.
> >
>
> I find Approach 2 better than Approach1. Yet to review Approach3.
> Please find my initial comments:
>

Thanks for the review! I’ve addressed the comments for Approach 2, as
it seems the most suitable. We can revisit the others if needed.

>
>
> Approach #2:
>
> 1)
> + ReorderBufferSkipPrepare(reorder, parsed.twophase_xid);
>
> In xact_decode, why do we have a new call to ReorderBufferSkipPrepare,
> can you please add some comments here?
>

Done.

> 2)
> + ereport(ERROR,
> + errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("\"%s\" and \"%s\" are mutually exclusive options when \"%s\"
> is specified",
>
> So like approach 1, here as well we disallow creating subscriptions
> with both failover and two_phase enabled when create_slot and
> copy_data is true? But users can alter the sub later to allow failover
> for two-phase enabled slot provided there are no pending PREPARE txns
> which are not yet consumed by the subscriber. Is my understanding
> correct? Can you please tell all the ways using which the user can
> enable both failover and two_phase together? The patch msg is not that
> clear. It will be good to add such details in patch message.
>

We allow creating subscriptions in this scenario as long as no
prepared transactions exist before the two_phase_at. Similarly,
altering a subscription to set failover = true is permitted, provided
the slot's restart_lsn is greater than two_phase_at.
Amit has suggested the ways in which users can enable both failover and
two_phase together at [1].
I've updated the commit message to include more details about the
conditions enforced by the fix.

> 3)
>
> + /*
> + * Do not allow users to enable the failover and two_phase options together
> + * when create_slot is specified.
> + *
> + * See comments atop the similar check in DecodingContextFindStartpoint()
> + * for a detailed reason.
> + */
> + if (!opts.create_slot && opts.twophase && opts.failover)
> + ereport(ERROR,
> + errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("\"%s\" and \"%s\" are mutually exclusive options when \"%s\"
> is specified",
> +
>
> Is the check '!opts.create_slot' correct? The error msg and comment
> says otherwise.
>

Corrected the check as it should be checking if copy_data is
specified. Thanks for the input!

Please find the attached v6 patch for approach-2 fix on PG17.

[1] 
https://www.postgresql.org/message-id/CAA4eK1LvMwXxvAzHpK%2BEgjc7vu1NmGxxKcaK_06pE7GKk7JtJQ%40mail.gmail.com

--
Thanks,
Nisha


v6-0001-PG17-Approach-2-Fix-slot-synchronization-for-two_.patch
Description: Binary data


Re: What's our minimum supported Python version?

2025-04-23 Thread Devrim Gündüz
Hi,

On Wed, 2025-04-23 at 11:46 +0200, Jelte Fennema-Nio wrote:
> I'm confused. The upstream RHEL8 repo depends on packages in pgdg???
> 
> Or are you saying that psycopg (and patroni) are not part of the
> pgdg-repo, and instead part of the upstream-repo. And so that other
> packages in pgdg that depend on upstream psycopg, would always need to
> support python 3.6 (because that's what upstream psycopg uses).

psycopg is included in RHEL 8, but the PGDG packages are more up to date
(2.7.5 vs 2.9.5), so they override the OS packages. That is why things
will break.

A solution would be creating our own psycopg2 package (likely for Python
3.12) and updating all PGDG packages that depend on psycopg2 to use that
package. That is not impossible (I already have psycopg2 built against
Python 3.11 on SLES 15), but I don't know how much work it would be or
what its impact would be, as that set of updates would have to go to
RHEL 8, 9 and 10 (RHEL 10 already includes Python 3.12 by default).

I can go for this solution if it is *absolutely* required. We already
have custom packages to support PostGIS, so that is not a new topic for
me.

Regards,
-- 
Devrim Gündüz
Open Source Solution Architect, PostgreSQL Major Contributor
BlueSky: @devrim.gunduz.org , @gunduz.org




Re: extension_control_path and "directory"

2025-04-23 Thread Matheus Alcantara
On Wed, Apr 23, 2025 at 6:39 AM Christoph Berg  wrote:
>
> Re: Peter Eisentraut
> > The new GUC extension_control_path specifies a path to look for
> > extension control files.  The default value is $system, which looks in
> > the compiled-in location, as before.
>
> Is the expectation that this also works for the extension "directory"?
>
> semver is still failing because it's shipping its .sql files in a
> separate directory:
>
> 2025-04-23 09:06:24.864 UTC [422345] myon@contrib_regression ERROR:  could 
> not open directory "/usr/share/postgresql/18/semver": No such file or 
> directory
> 2025-04-23 09:06:24.864 UTC [422345] myon@contrib_regression STATEMENT:  
> CREATE EXTENSION semver;
>
> $ cat 
> debian/postgresql-18-semver/usr/share/postgresql/18/extension/semver.control
> # semver extension
> comment = 'Semantic version data type'
> default_version = '0.40.0'
>
> directory = 'semver'
> module_pathname = '$libdir/semver'
> relocatable = true
>
> $ ls debian/postgresql-18-semver/usr/share/postgresql/18/semver/
> semver--0.10.0--0.11.0.sql  semver--0.17.0--0.20.0.sql  
> semver--0.30.0--0.31.0.sql  semver--0.32.1--0.40.0.sql
> semver--0.11.0--0.12.0.sql  semver--0.20.0--0.21.0.sql  
> semver--0.3.0--0.4.0.sqlsemver--0.40.0.sql
> semver--0.12.0--0.13.0.sql  semver--0.21.0--0.22.0.sql  
> semver--0.31.0--0.31.1.sql  semver--0.5.0--0.10.0.sql
> semver--0.13.0--0.15.0.sql  semver--0.2.1--0.2.4.sql
> semver--0.31.1--0.31.2.sql  semver.sql
> semver--0.15.0--0.16.0.sql  semver--0.22.0--0.30.0.sql  
> semver--0.31.2--0.32.0.sql  semver--unpackaged--0.2.1.sql
> semver--0.16.0--0.17.0.sql  semver--0.2.4--0.3.0.sql
> semver--0.32.0--0.32.1.sql
>

I've tried to implement some kind of "SHAREDIR search path" as we've
discussed on another thread [1], but it turns out that we need some
considerable refactoring to make it work.

Talking with Peter privately, IIUC we concluded that we could document
this limitation: using extension_control_path with an extension that
uses a custom "directory" field in its .control file will not work. I
think we could add a note about this to the "extension_control_path"
documentation at [2]; what do you think?

[1] 
https://www.postgresql.org/message-id/0F50D989-B82D-4F59-9F13-C08A4673322C%40justatheory.com
[2] 
https://www.postgresql.org/docs/devel/runtime-config-client.html#GUC-EXTENSION-CONTROL-PATH

-- 
Matheus Alcantara




Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

2025-04-23 Thread Amit Kapila
On Mon, Apr 21, 2025 at 11:01 PM Masahiko Sawada  wrote:
>
> I would like to discuss behavioral and user interface considerations.
>
> Upon further analysis of this patch regarding the conversion of
> wal_level to a SIGHUP parameter, I find that supporting all
> combinations of wal_level value changes might make less sense.
> Specifically, changing to or from 'minimal' would necessitate a
> checkpoint, and reducing wal_level to 'minimal' would require
> terminating physical replication, WAL archiving, and online backups.
> While these operations demand careful consideration, there seems to be
> no compelling use case for decreasing to 'minimal'. Furthermore,
> increasing wal_level from 'minimal' is typically a one-time operation
> during a database's lifetime. Therefore, we should weigh the benefits
> against the implementation complexity.
>
> One solution is to manage the effective WAL level using two distinct
> GUC parameters: max_wal_level and wal_level. max_wal_level would be a
> POSTMASTER parameter controlling the system's maximum allowable WAL
> level, with values 'minimal', 'replica', and 'logical'. wal_level
> would function as a SIGHUP parameter managing the runtime WAL level,
> accepting values 'replica', 'logical', and 'auto'. The selected value
> must be either 'auto' or not exceed max_wal_level. When set to 'auto',
> wal_level automatically synchronizes with max_wal_level's value. This
> approach would enable online WAL level transitions between 'replica'
> and 'logical'.
>
>
> Regarding logical decoding on standbys, currently both primary and
> standby servers must have wal_level set to 'logical'. We need to
> determine the appropriate behavior when users decrease the WAL level
> from 'logical' to 'replica' through configuration file reload.
>
> One approach would be to invalidate all logical replication slots on
> the standby when transitioning to 'replica' WAL level. Although
> incoming WAL records from the primary would still be written at
> 'logical' level, making logical decoding technically feasible, this
> behavior seems logical as it reflects the user's intent to discontinue
> logical decoding on the standby. For consistency, we might need to
> invalidate logical slots during server startup if the WAL level is
> insufficient.
>
> Alternatively, we could permit logical decoding on the standby even
> with wal_level set to 'replica'. However, this would necessitate
> invalidating all logical replication slots during promotion,
> potentially extending downtime during failover.
>

BTW, did we consider the idea of automatically transitioning to 'logical'
when the first logical slot is created and back to 'replica' when the
last logical slot is dropped? I saw some ideas around this the last time
we discussed this topic.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1J0we5qsZ-ZOwXPbZyvwdWbnT43knO2Cxidia2aHxZSJw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.

2025-04-23 Thread Christoph Berg
Re: Nathan Bossart
>   pg_log(PG_REPORT,
> +"Some statistics are not transferred by pg_upgrade.\n"
>  "Once you start the new server, consider running:\n"
> +"%s/vacuumdb %s--all --analyze-in-stages 
> --missing-stats-only\n"
> +"%s/vacuumdb %s--all --analyze-only",

I would make it "Once you start the new server, run these two commands:"
to make it explicit that both should be run, not alternatively either.

This patch addresses my concern, thanks.

Christoph




Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

2025-04-23 Thread Masahiko Sawada
On Wed, Apr 23, 2025 at 5:46 AM Amit Kapila  wrote:
>
> On Mon, Apr 21, 2025 at 11:01 PM Masahiko Sawada  
> wrote:
> >
> > I would like to discuss behavioral and user interface considerations.
> >
> > Upon further analysis of this patch regarding the conversion of
> > wal_level to a SIGHUP parameter, I find that supporting all
> > combinations of wal_level value changes might make less sense.
> > Specifically, changing to or from 'minimal' would necessitate a
> > checkpoint, and reducing wal_level to 'minimal' would require
> > terminating physical replication, WAL archiving, and online backups.
> > While these operations demand careful consideration, there seems to be
> > no compelling use case for decreasing to 'minimal'. Furthermore,
> > increasing wal_level from 'minimal' is typically a one-time operation
> > during a database's lifetime. Therefore, we should weigh the benefits
> > against the implementation complexity.
> >
> > One solution is to manage the effective WAL level using two distinct
> > GUC parameters: max_wal_level and wal_level. max_wal_level would be a
> > POSTMASTER parameter controlling the system's maximum allowable WAL
> > level, with values 'minimal', 'replica', and 'logical'. wal_level
> > would function as a SIGHUP parameter managing the runtime WAL level,
> > accepting values 'replica', 'logical', and 'auto'. The selected value
> > must be either 'auto' or not exceed max_wal_level. When set to 'auto',
> > wal_level automatically synchronizes with max_wal_level's value. This
> > approach would enable online WAL level transitions between 'replica'
> > and 'logical'.
> >
> >
> > Regarding logical decoding on standbys, currently both primary and
> > standby servers must have wal_level set to 'logical'. We need to
> > determine the appropriate behavior when users decrease the WAL level
> > from 'logical' to 'replica' through configuration file reload.
> >
> > One approach would be to invalidate all logical replication slots on
> > the standby when transitioning to 'replica' WAL level. Although
> > incoming WAL records from the primary would still be written at
> > 'logical' level, making logical decoding technically feasible, this
> > behavior seems logical as it reflects the user's intent to discontinue
> > logical decoding on the standby. For consistency, we might need to
> > invalidate logical slots during server startup if the WAL level is
> > insufficient.
> >
> > Alternatively, we could permit logical decoding on the standby even
> > with wal_level set to 'replica'. However, this would necessitate
> > invalidating all logical replication slots during promotion,
> > potentially extending downtime during failover.
> >
>
> BTW, did we consider the idea to automatically transition to 'logical'
> when the first logical slot is created and transition back to
> 'replica' when last logical slot gets dropped? I see some ideas around
> this last time we discussed this topic.

Yes. Bertrand pointed out that a drawback is that the primary server
needs to create a logical slot in order to execute logical decoding on
the standbys[1].

Regards,

[1] 
https://www.postgresql.org/message-id/Z5DCm6xiBfbUdvX7%40ip-10-97-1-34.eu-west-3.compute.internal

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Built-in Raft replication

2025-04-23 Thread Jim Nasby
On Apr 16, 2025, at 2:29 PM, Greg Sabino Mullane  wrote:
> 
> On Wed, Apr 16, 2025 at 2:18 AM Ashutosh Bapat  > wrote:
>> Users find it a waste of resources to deploy 3 big PostgreSQL instances just 
>> for HA where 2 suffice even if they deploy 3 lightweight DCS instances. 
>> Having only some of the nodes act as DCS and others purely PostgreSQL nodes 
>> will reduce waste of resources.
> 
> A big problem is that putting your DCS into Postgres means your whole system 
> is now super-sensitive to IO/WAL-streaming issues, and a busy database doing 
> database stuff is going to start affecting the DCS stuff.  With three 
> lightweight DCS servers, you don't really need to worry about how stressed 
> the database servers are. In that way, I feel etcd et al. are adhering to the 
> unix philosophy of "do one thing, and do it well.”


… unless we added multiple WAL streams. That would allow for splitting WAL 
traffic across multiple devices as well as providing better support for 
configurations that don’t replicate the entire cluster. The current situation 
where delayed replication of a single table mandates retention of all the WAL 
for the entire cluster is less than ideal.



Re: SQL:2023 JSON simplified accessor support

2025-04-23 Thread Nikita Malakhov
Hi Alex!

I'm glad you put so much effort into developing this patch set!
I think this is an important part of the JSON functionality.

I've looked into your patch and noticed a change in behavior
in the new test results:

postgres@postgres=# create table t(x int, y jsonb);
insert into t select 1, '{"a": 1, "b": 42}'::jsonb;
insert into t select 1, '{"a": 2, "b": {"c": 42}}'::jsonb;
insert into t select 1, '{"a": 3, "b": {"c": "42"}, "d":[11, 12]}'::jsonb;
CREATE TABLE
Time: 6.373 ms
INSERT 0 1
Time: 3.299 ms
INSERT 0 1
Time: 2.532 ms
INSERT 0 1
Time: 2.453 ms

Original master:
postgres@postgres=# select (t.y).b.c.d.e from t;
ERROR:  column notation .b applied to type jsonb, which is not a composite
type
LINE 1: select (t.y).b.c.d.e from t;
^
Time: 0.553 ms

Patched (with v11):
postgres@postgres=# select (t.y).b.c.d.e from t;
 e
---



(3 rows)

Is this correct?

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: long-standing data loss bug in initial sync of logical replication

2025-04-23 Thread Masahiko Sawada
On Tue, Apr 22, 2025 at 11:31 PM Amit Kapila  wrote:
>
> On Tue, Apr 22, 2025 at 10:57 PM Masahiko Sawada  
> wrote:
> >
> > On Tue, Mar 18, 2025 at 2:55 AM Amit Kapila  wrote:
> > >
> > > On Mon, Mar 17, 2025 at 4:56 PM Hayato Kuroda (Fujitsu)
> > >  wrote:
> > > >
> > > > > Regarding the PG13, it cannot be
> > > > > applied
> > > > > as-is thus some adjustments are needed. I will share it in upcoming 
> > > > > posts.
> > > >
> > > > Here is a patch set for PG13. Apart from PG14-17, the patch could be 
> > > > created as-is,
> > > > because...
> > > >
> > > > 1. WAL record for invalidation messages (XLOG_XACT_INVALIDATIONS) does 
> > > > not exist.
> > > > 2. Thus the ReorderBufferChange for the invalidation does not exist.
> > > >Our patch tries to distribute it but cannot be done as-is.
> > > > 3. Codes assumed that invalidation messages can be added only once.
> > > > 4. The timing when invalidation messages are consumed is limited:
> > > >   a. COMMAND_ID change is poped,
> > > >   b. start of decoding a transaction, or
> > > >   c. end of decoding a transaction.
> > > >
> > > > Above means that invalidations cannot be executed while being decoded.
> > > > I created two patch sets to resolve the data loss issue. 0001 has less 
> > > > code
> > > > changes but could resolve a part of issue, 0002 has huge changes but 
> > > > provides a
> > > > complete solution.
> > > >
> > > > 0001 - mostly same as patches for other versions. 
> > > > ReorderBufferAddInvalidations()
> > > >was adjusted to allow being called several times. As I said 
> > > > above,
> > > >0001 cannot execute inval messages while decoding the 
> > > > transacitons.
> > > > 0002 - introduces new ReorderBufferChange type to indicate inval 
> > > > messages.
> > > >It would be handled like PG14+.
> > > >
> > > > Here is an example. Assuming that the table foo exists on both nodes, a
> > > > publication "pub" which publishes all tables, and a subscription "sub" 
> > > > which
> > > > subscribes "pub". What if the workload is executed?
> > > >
> > > > ```
> > > > S1  S2
> > > > BEGIN;
> > > > INSERT INTO foo VALUES (1)
> > > > ALTER PUBLICATION pub RENAME TO 
> > > > pub_renamed;
> > > > INSERT INTO foo VALUES (2)
> > > > COMMIT;
> > > > LR -> ?
> > > > ```
> > > >
> > > > With 0001, tuples (1) and (2) would be replicated to the subscriber.
> > > > An error "publication "pub" does not exist" would raise when new 
> > > > changes are done
> > > > later.
> > > >
> > > > 0001+0002 works more aggressively; the error would raise when S1 
> > > > transaction is decoded.
> > > > The behavior is same as for patched PG14-PG17.
> > > >
> > >
> > > I understand that with 0001 the fix is partial in the sense that
> > > because invalidations are processed at the transaction end, the
> > > changes of concurrent DDL will only be reflected for the next
> > > transaction. Now, on one hand, it is prudent to not add a new type of
> > > ReorderBufferChange in the backbranch (v13) but the change is not that
> > > invasive, so we can go with it as well. My preference would be to go
> > > with just 0001 for v13 to minimize the risk of adding new bugs or
> > > breaking something unintentionally.
> > >
> > > Sawada-San, and others involved here, do you have any suggestions on
> > > this matter?
> >
> > Sorry for the late response.
> >
> > I agree with just 0001 for v13 as 0002 seems invasive. Given that v13
> > would have only a few releases until EOL and 0001 can deal with some
> > cases in question, I'd like to avoid such invasive  changes in v13.
> >
>
> Fair enough. OTOH, we can leave the 13 branch considering following:
> (a) it is near EOL, (b) bug happens in rare cases (when the DDLs like
> ALTER PUBLICATION ... ADD TABLE ... or ALTER TYPE ...  that don't take
> a strong lock on table happens concurrently to DMLs on the tables
> involved in the DDL.), and (c) the complete fix is invasive, even
> partial fix is not simple. I have a slight fear that if we make any
> mistake in fixing it partially (of course, we can't see any today), we
> may not even get a chance to fix it.
>
> Now, if the above convinces you or someone else not to push the
> partial fix in 13, then fine; otherwise, I'll push the 0001 to 13 day
> after tomorrow.

I've considered the above points. I guess (b), particularly executing
ALTER PUBLICATION .. ADD TABLE while the target table is being
updated, might not be rare depending on the system. Given that this bug
causes silent data loss on the subscriber that is hard for users to
notice, it ultimately depends on to what extent we can mitigate the
problem with only 0001 and on whether there is a workaround when the
problem happens.

Kuroda-san already shared [1] an analysis of what happens with and
without the 0002 patch, but let me try with an example close to the
original data-loss problem [2]:

Consider the following scenario:

S1: CREATE TABLE d(data text not nul

Startup subpaths in MergeAppend path

2025-04-23 Thread Andrei Lepikhov

Hi,

Upon reviewing [1], I noticed an inconsistency: within the
add_paths_to_append_rel function, Postgres only constructs
startup_subpaths when the rel->consider_startup flag is set to true.
However, the generate_orderedappend_paths function generates these
paths regardless of this flag.


After examining the code, I found no scenario where a startup-optimal
path would be necessary if consider_startup is false. The impact on
planning time might be noticeable in partitioned cases. Therefore, does
it make sense to build startup paths only when they are necessary?


I created a simple patch to address this issue, and it has successfully 
passed all regression tests. If I overlooked something, it would be 
beneficial to add a regression test demonstrating the necessity of 
startup paths regardless of declared limits.


Anyway, it will be beneficial to discuss this logic in the mailing list.

[1] 
https://www.postgresql.org/message-id/flat/25d6a2cd161673d51373b7e07e6d9dd6%40postgrespro.ru


--
regards, Andrei Lepikhov
From 90c001d767d092784289c4853ae9ef7ecd46f391 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" 
Date: Wed, 23 Apr 2025 14:17:22 +0200
Subject: [PATCH] Consider startup paths in MergeAppend only if fractional
 result may take place

---
 src/backend/optimizer/path/allpaths.c | 72 ---
 1 file changed, 43 insertions(+), 29 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 905250b3325..7e8a8d789b3 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1849,17 +1849,18 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 		for (int i = first_index; i != end_index; i += direction)
 		{
 			RelOptInfo *childrel = list_nth_node(RelOptInfo, live_childrels, i);
-			Path	   *cheapest_startup,
+			Path	   *cheapest_startup = NULL,
 	   *cheapest_total,
 	   *cheapest_fractional = NULL;
 
 			/* Locate the right paths, if they are available. */
-			cheapest_startup =
-get_cheapest_path_for_pathkeys(childrel->pathlist,
-			   pathkeys,
-			   NULL,
-			   STARTUP_COST,
-			   false);
+			if (rel->consider_startup)
+cheapest_startup =
+	get_cheapest_path_for_pathkeys(childrel->pathlist,
+   pathkeys,
+   NULL,
+   STARTUP_COST,
+   false);
 			cheapest_total =
 get_cheapest_path_for_pathkeys(childrel->pathlist,
 			   pathkeys,
@@ -1871,10 +1872,15 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 			 * If we can't find any paths with the right order just use the
 			 * cheapest-total path; we'll have to sort it later.
 			 */
-			if (cheapest_startup == NULL || cheapest_total == NULL)
+			if (cheapest_total == NULL)
 			{
-cheapest_startup = cheapest_total =
-	childrel->cheapest_total_path;
+Assert(cheapest_startup == NULL);
+
+cheapest_total = childrel->cheapest_total_path;
+
+if (rel->consider_startup)
+	cheapest_startup = cheapest_total;
+
 /* Assert we do have an unparameterized path for this child */
 Assert(cheapest_total->param_info == NULL);
 			}
@@ -1932,10 +1938,12 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
  * just a single subpath (and hence aren't doing anything
  * useful).
  */
-cheapest_startup = get_singleton_append_subpath(cheapest_startup);
+if (cheapest_startup)
+{
+	cheapest_startup = get_singleton_append_subpath(cheapest_startup);
+	startup_subpaths = lappend(startup_subpaths, cheapest_startup);
+}
 cheapest_total = get_singleton_append_subpath(cheapest_total);
-
-startup_subpaths = lappend(startup_subpaths, cheapest_startup);
 total_subpaths = lappend(total_subpaths, cheapest_total);
 
 if (cheapest_fractional)
@@ -1950,8 +1958,10 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
  * Otherwise, rely on accumulate_append_subpath to collect the
  * child paths for the MergeAppend.
  */
-accumulate_append_subpath(cheapest_startup,
-		  &startup_subpaths, NULL);
+if (cheapest_startup)
+	accumulate_append_subpath(cheapest_startup,
+			  &startup_subpaths, NULL);
+
 accumulate_append_subpath(cheapest_total,
 		  &total_subpaths, NULL);
 
@@ -1965,15 +1975,16 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 		if (match_partition_order)
 		{
 			/* We only need Append */
-			add_path(rel, (Path *) create_append_path(root,
-	  rel,
-	  startup_subpaths,
-	  NIL,
-	  pathkeys,
-	  NULL,
-	  0,
-	  false,
-	  -1));
+			if (startup_subpaths)
+add_path(rel, (Path *) create_append_path(root,
+		  rel,
+		  startup_subpaths,
+		  NIL,
+		  pathkeys,
+		  NULL,
+		

Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-23 Thread Jacob Champion
On Wed, Apr 23, 2025 at 8:39 AM Christoph Berg  wrote:
> This will cause problems when programs are running while packages are
> updated on disk. That program then tries to dlopen 18-0.so when there
> is already 18-1.so installed. Relevant when the first oauth connection
> is made way after startup.

Ugh, good point. This hazard applies to the previous suggestion of
pkglibdir, too, but in that case it would have been silent...

> This is trading one problem for another, but within-a-major ABI
> changes should be much rarer than normal minor updates with
> applications restarting only later.

But the consequences are much worse for a silent ABI mismatch. Imagine
if libpq-oauth examines the wrong pointer inside PGconn for a
security-critical check.

> Alternatively, there could be a dedicated SONAME for the plugin that
> only changes when necessary, but perhaps the simple "18.so" solution
> is good enough.

I don't think SONAME helps us, does it? We're not using it in dlopen().

We could all agree to bump the second number in the filename whenever
there's an internal ABI change. That works from a technical
perspective, but it's hard to test and enforce and... just not forget.
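
For illustration, that could look roughly like this (macro and function
names are invented here; this is not the actual libpq-oauth code):

```
/*
 * Hypothetical sketch only: bake the internal ABI version into the file
 * name that is dlopen()ed, so a mismatched module simply fails to load
 * instead of being loaded with a silent ABI mismatch.
 */
#include <dlfcn.h>
#include <stdio.h>

#define OAUTH_ABI_MAJOR 18
#define OAUTH_ABI_MINOR 0
#define STR_(x) #x
#define STR(x) STR_(x)

static void *
load_oauth_module(void)
{
    const char *fname = "libpq-oauth-" STR(OAUTH_ABI_MAJOR) "-"
                        STR(OAUTH_ABI_MINOR) ".so";
    void       *handle = dlopen(fname, RTLD_NOW);

    if (handle == NULL)
        fprintf(stderr, "could not load %s: %s\n", fname, dlerror());
    return handle;
}
```
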
Or, I may still be able to thread the needle with a fuller lookup
table, and remove the dependency on libpq-int.h; it's just not going
to be incredibly pretty. Thinking...

Thanks so much for your continued review!

--Jacob




sslmode=secure by default (Re: Making sslrootcert=system work on Windows psql)

2025-04-23 Thread Christoph Berg
Re: George MacKerron
> SMALLER IDEA
> 
> I’d suggest two new special sslrootcert values:
> 
> (1) sslrootcert=openssl
> 
> This does exactly what sslrootcert=system does now, but is less confusingly 
> named for Windows users. sslrootcert=system becomes a deprecated synonym for 
> this option.
> 
> (2) sslrootcert=os
> 
> This does what I was proposing in my patch: it uses winstore on Windows and 
> behaves the same as sslrootcert=openssl elsewhere, where openssl *is* the 
> operating system SSL provider.
> 
> These changes would be fully backwards-compatible.

On Linux/*ix, there would be 3 things that are all the same.

If the Windows OpenSSL store is that bad, wouldn't the smarter thing
to do for PG19 be to use winstore by default? The OpenSSL one would
still be available when requested explicitly. This would avoid the
proliferation of default values.

> BIGGER IDEA
> 
> * Entirely remove the current default, sslmode=prefer, and make explicitly 
> asking for sslmode=prefer an error. After all, as the docs themselves point 
> out for sslmode=prefer: “this makes no sense from a security point of view”.

(It's not really secure, but opportunistic "use SSL when available" is
still better than nothing.)

> * Create a new option, sslmode=secure, which means sslmode=verify-full + 
> sslrootcert=os. Make this the default!

I like the name.

> In summary, you end up with these as sslmode values:
> 
> * disabled
> * insecure (formerly known as require)
> * verify-ca
> * verify-full
> * secure (the new default, meaning sslmode=verify-full + sslrootcert=os)
> 
> Obviously this would need to be well-trailed ahead of time, as some people 
> would need to make changes to how they use psql/libpq. But it would peg the 
> default security of a Postgres connection at the same level as the security 
> of any random blog page (which I think is a bare minimum one might aspire to).

I agree that this would be a good change for SSL users, and also one
that people would likely be willing to buy.

The big problem here is that a lot of installations are not using SSL
at all (default on RPM), and another big chunk is using SSL, but
relying on the default snakeoil certificates to just work (default on
Debian), so this would not be "some people" but more like "everyone
except the few who have already configured certificates properly".

These people would have to change every single connection string to
include "sslmode=disabled" or the like. This will likely not be
received well.
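
To make that concrete, a hypothetical before/after (using the value
names from the proposal above, not the ones that exist today):

```
# today: this works against a non-SSL server, because the default is
# sslmode=prefer
psql "host=db.example.com dbname=app"

# under the proposal: the same non-SSL setup needs an explicit opt-out
psql "host=db.example.com dbname=app sslmode=disabled"
```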

Before we can make this change, I think we would have to improve the
UX. psql does not even have any --switch for it. PostgreSQL serving
non-SSL and SSL on the same port doesn't make the UX better... :-/

Christoph




Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints

2025-04-23 Thread Nathan Bossart
This one was briefly discussed in an RMT meeting.

On Wed, Apr 09, 2025 at 01:16:20PM +0800, jian he wrote:
> attached patch is for address pg_dump inconsistency
> when parent is "not null not valid" while child is "not null".

I see an open item [0] that points to this patch, but there's no owner
listed, and there doesn't appear to have been any follow-up discussion.
Álvaro, should I list you as the owner, and if so, are you planning to look
at it soon?

[0] https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items

-- 
nathan




Re: extension_control_path and "directory"

2025-04-23 Thread David E. Wheeler
On Apr 23, 2025, at 09:50, Christoph Berg  wrote:

> Remembering which path the .control file was found in and from there
> open the extension "directory" doesn't sound too hard. Why does it
> have to be more complicated?

This was my question, as well. Do you have a WIP patch to share, Matheus?

> Also, re-running a search path discovery for the directory is probably
> just wrong, if there are different extension versions in the "control"
> search path and the "extensions" search path, it might lead to weird
> version skew problems.

I assumed we would just have one or the other GUC, not both.

> The number of extensions using that feature is limited, though, so it
> wouldn't be a huge problem:

FWIW it’s a simple patch to make semver work, and probably also for the
others. It’s just the reverse of this change [1]:

```patch
--- a/Makefile
+++ b/Makefile
@@ -5,7 +5,6 @@ EXTVERSION   = $(shell grep -m 1 '[[:space:]]\{8\}"version":' META.json | \
 DISTVERSION  = $(shell grep -m 1 '[[:space:]]\{3\}"version":' META.json | \
        sed -e 's/[[:space:]]*"version":[[:space:]]*"\([^"]*\)",\{0,1\}/\1/')
-MODULEDIR    = semver
 DATA = $(wildcard sql/*.sql)
 DOCS = $(wildcard doc/*.mmd)
 TESTS= $(wildcard test/sql/*.sql)
--- a/semver.control
+++ b/semver.control
@@ -1,7 +1,5 @@
 # semver extension
 comment = 'Semantic version data type'
 default_version = '0.32.1'
-
-directory = 'semver'
 module_pathname = '$libdir/semver'
 relocatable = true

```

I think I’ll write a blog post this week recommending people not use
these directives, and also to remove `$libdir/` from `module_pathname`.

Best,

David

[1]: https://github.com/theory/pg-semver/commit/88b3abd





Re: Doc: fix the rewrite condition when executing ALTER TABLE ADD COLUMN

2025-04-23 Thread Nathan Bossart
This one was briefly discussed in an RMT meeting.

On Mon, Mar 24, 2025 at 11:37:20AM +0100, Álvaro Herrera wrote:
> On 2025-Mar-24, jian he wrote:
> 
>> In the current development branch,
>> virtual generated columns still do not support the domain.
>> you can not change the generation expression if it contains a check
>> constraint on it.
>> so virtual generated columns don't need rewriting.
>> 
>> IMHO, the committed doc description didn't mention this exception.
>> we need some text to cover this exception?
> 
> I'd add a note about these two things to the open items page, and wait
> to see if we get some of these limitations fixed, so that if we don't,
> we remember to note this limitation in the documentation.

Are we still waiting on something for this, or should we proceed with the
documentation changes?  It doesn't seem tremendously urgent, but I noticed
it's been about a month since the last message on this thread.

-- 
nathan




Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-23 Thread Christoph Berg
Re: Jacob Champion
> But the consequences are much worse for a silent ABI mismatch. Imagine
> if libpq-oauth examines the wrong pointer inside PGconn for a
> security-critical check.

True.

> > Alternatively, there could be a dedicated SONAME for the plugin that
> > only changes when necessary, but perhaps the simple "18.so" solution
> > is good enough.
> 
> I don't think SONAME helps us, does it? We're not using it in dlopen().

That was paraphrasing, with SONAME I meant "library file name that
changes when the ABI changes".

> We could all agree to bump the second number in the filename whenever
> there's an internal ABI change. That works from a technical
> perspective, but it's hard to test and enforce and... just not forget.

It's hopefully not harder than checking ABI compatibility of any other
libpq change, just a different number. If that number is in the
meson.build in the same directory, people should be able to connect
the dots.

Btw, if we have that number, we might as well drop the MAJOR part as
well... apt.pg.o is always shipping the latest libpq5, so major libpq
upgrades while apps are running are going to happen. (But this is just
once a year and much less problematic than minor upgrades and I'm not
going to complain if MAJOR is kept.)

> Or, I may still be able to thread the needle with a fuller lookup
> table, and remove the dependency on libpq-int.h; it's just not going
> to be incredibly pretty. Thinking...

Don't overdesign it...

Christoph




Re: Enable data checksums by default

2025-04-23 Thread Christoph Berg
Re: Tomas Vondra
> We went through the open items on the RMT team meeting today, and my
> impression was the questions are mostly about performance of having
> checksums by default, but now I realize the thread talks about "upgrade
> experience" which seems fairly wide.

Fwiw, Debian's pg_upgradecluster already knows about that change. With
the default "dump" upgrade method, clusters will be transitioned to
checksums enabled, while "upgrade" (= pg_upgrade) upgrades will stick
with whatever was there before.

Christoph




Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-23 Thread Christoph Berg
Re: Jacob Champion
> - Per ABI comment upthread, we are back to major-minor versioning for
> the shared library (e.g. libpq-oauth-18-0.so). 0001 adds the macros
> and makefile variables to make this easy, and 0002 is the bulk of the
> change now.

This will cause problems when programs are running while packages are
updated on disk. That program then tries to dlopen 18-0.so when there
is already 18-1.so installed. Relevant when the first oauth connection
is made way after startup.

This is trading one problem for another, but within-a-major ABI
changes should be much rarer than normal minor updates with
applications restarting only later.

Alternatively, there could be a dedicated SONAME for the plugin that
only changes when necessary, but perhaps the simple "18.so" solution
is good enough.

Christoph




Re: vacuumdb --missing-stats-only and pg_upgrade from PG13

2025-04-23 Thread Nathan Bossart
Here is what I have staged for commit.  I'll aim to commit these patches
sometime next week to give time for additional feedback.

-- 
nathan
>From 6d1e608edb17d9bbaaf7d57ace35fee68ff869c0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 23 Apr 2025 10:11:46 -0500
Subject: [PATCH v2 1/2] Further adjust guidance for running vacuumdb after
 pg_upgrade.

Since pg_upgrade does not transfer the cumulative statistics used
for triggering autovacuum and autoanalyze, the server may take much
longer than expected to process them post-upgrade.  Currently, the
pg_upgrade documentation recommends analyzing only relations for
which optimizer statistics were not carried over during upgrade.
This commit appends another recommendation to also analyze all
relations to update the relevant cumulative statistics, similar to
the recommendation for pg_stat_reset().

Reported-by: Christoph Berg 
Reviewed-by: Christoph Berg 
Discussion: https://postgr.es/m/aAfxfKC82B9NvJDj%40msg.df7cb.de
---
 doc/src/sgml/ref/pgupgrade.sgml | 12 +++-
 src/bin/pg_upgrade/check.c  |  9 ++---
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index df13365b287..648c6e2967c 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -833,17 +833,19 @@ psql --username=postgres --file=script.sql postgres
 
 
  Because not all statistics are not transferred by
- pg_upgrade, you will be instructed to run a command to
+ pg_upgrade, you will be instructed to run commands to
  regenerate that information at the end of the upgrade.  You might need to
  set connection parameters to match your new cluster.
 
 
 
- Using vacuumdb --all --analyze-only 
--missing-stats-only
- can efficiently generate such statistics.  Alternatively,
+ First, use
  vacuumdb --all --analyze-in-stages --missing-stats-only
- can be used to generate minimal statistics quickly.  For either command,
- the use of --jobs can speed it up.
+ to quickly generate minimal optimizer statistics for relations without
+ any.  Then, use vacuumdb --all --analyze-only to ensure
+ all relations have updated cumulative statistics for triggering vacuum and
+ analyze.  For both commands, the use of --jobs can speed
+ it up.
  If vacuum_cost_delay is set to a non-zero
  value, this can be overridden to speed up statistics generation
  using PGOPTIONS, e.g., PGOPTIONS='-c
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 18c2d652bb6..940fc77fc2e 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -814,9 +814,12 @@ output_completion_banner(char *deletion_script_file_name)
}
 
pg_log(PG_REPORT,
-  "Some optimizer statistics may not have been transferred by 
pg_upgrade.\n"
-  "Once you start the new server, consider running:\n"
-  "%s/vacuumdb %s--all --analyze-in-stages 
--missing-stats-only", new_cluster.bindir, user_specification.data);
+  "Some statistics are not transferred by pg_upgrade.\n"
+  "Once you start the new server, consider running these two 
commands:\n"
+  "%s/vacuumdb %s--all --analyze-in-stages 
--missing-stats-only\n"
+  "%s/vacuumdb %s--all --analyze-only",
+  new_cluster.bindir, user_specification.data,
+  new_cluster.bindir, user_specification.data);
 
if (deletion_script_file_name)
pg_log(PG_REPORT,
-- 
2.39.5 (Apple Git-154)

>From 299c8bea787fb1637b58390d351a07aaa2521ac9 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 23 Apr 2025 10:37:42 -0500
Subject: [PATCH v2 2/2] vacuumdb: Don't skip empty relations in
 --missing-stats-only.

Presently, --missing-stats-only skips relations with reltuples set
to 0 because empty relations don't get optimizer statistics.
However, before v14, a reltuples value of 0 was ambiguous: it could
either mean the relation is empty, or it could mean that it hadn't
yet been vacuumed or analyzed.  (Commit 3d351d916b taught Postgres
14 and newer to use -1 for the latter case.)  This ambiguity can
cause --missing-stats-only to inadvertently skip relations that
need optimizer statistics.

To fix, simply remove the check for reltuples != 0.  This will
cause --missing-stats-only to analyze some empty tables, but that
doesn't seem too terrible a trade-off.

Reported-by: Christoph Berg 
Reviewed-by: Christoph Berg 
Discussion: https://postgr.es/m/aAjyvW5_fRGNr7yF%40msg.df7cb.de
---
 src/bin/scripts/vacuumdb.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 22067faaf7d..79b1096eb08 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -954,7 +954,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
 

Re: [PATCH] dynahash: add memory allocation failure check

2025-04-23 Thread Aleksander Alekseev
Hi Maksim,

> I found a case of potential NULL pointer dereference.
> In src/backend/utils/hash/dynahash.c in function HTAB *hash_create() the
> result of the DynaHashAlloc() is used unsafely.
> The function DynaHashAlloc() calls MemoryContextAllocExtended() with
> MCXT_ALLOC_NO_OOM and can return a NULL pointer.
> Added the pointer check for avoiding a potential problem.

Thanks for the patch. It looks correct to me.

I didn't check whether it needs to be back-ported, and if it does, to
how many branches.
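
For readers of the thread, the pattern being fixed is roughly this (an
illustrative sketch, not the actual patch):

```
/*
 * Illustrative sketch: MemoryContextAllocExtended() called with
 * MCXT_ALLOC_NO_OOM returns NULL on allocation failure instead of
 * throwing an error, so the result must be checked before use.
 */
static void *
alloc_checked(Size size)
{
    void       *ptr = MemoryContextAllocExtended(CurrentMemoryContext, size,
                                                 MCXT_ALLOC_NO_OOM);

    if (ptr == NULL)
        ereport(ERROR,
                (errcode(ERRCODE_OUT_OF_MEMORY),
                 errmsg("out of memory")));
    return ptr;
}
```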

-- 
Best regards,
Aleksander Alekseev




Does RENAME TABLE rename associated identity sequence?

2025-04-23 Thread Jason Song
Hi hackers,

I was wondering if there's any built-in functionality in PostgreSQL where
renaming a table with an identity column would also rename the
auto-generated sequence associated with that identity column.

In my case, I renamed a table that used `GENERATED BY DEFAULT AS IDENTITY`,
and later when I ran `pg_dump`, I noticed that the sequence name was
unchanged (e.g., still `old_table_id_seq`). As a result, any `setval()` or
sequence-related operations referenced the old sequence name, even though
the table name had changed.
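
A minimal example of what I mean (table and column names are just for
illustration):

```
CREATE TABLE old_table (id int GENERATED BY DEFAULT AS IDENTITY, val text);

ALTER TABLE old_table RENAME TO new_table;

-- the implicitly created sequence keeps its old name
SELECT pg_get_serial_sequence('new_table', 'id');  -- public.old_table_id_seq

-- manual workaround: rename the sequence yourself
ALTER SEQUENCE old_table_id_seq RENAME TO new_table_id_seq;
```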

I realize this can be worked around — for example, by using
`--exclude-table-data` to skip the `setval()` or manually renaming the
sequence after the table rename. But I'm curious if there are any plans (or
technical reasons against) supporting something like `ALTER TABLE ...
RENAME ... WITH SEQUENCE`, or having the sequence name automatically follow
the table rename when it was originally auto-generated by an identity
column.

Thanks for your time!


Re: extension_control_path and "directory"

2025-04-23 Thread Christoph Berg
Re: Matheus Alcantara
> I've tried to implement some kind of "SHAREDIR search path" as we've
> discussed on another thread [1] but it shows out that we need some
> considerable refactoring to make it work.

Remembering which path the .control file was found in and from there
opening the extension "directory" doesn't sound too hard. Why does it
have to be more complicated?

Also, re-running a search path discovery for the directory is probably
just wrong: if there are different extension versions in the "control"
search path and the "extensions" search path, it might lead to weird
version skew problems.

> Talking with Peter privately IIUC we concluded that we may document
> this limitation that using extension control path with an extension that
> uses a custom "directory" field on .control file will not work. I think
> that we may add a note section on "extension_control_path" doc on [2],
> what do you think?

Seen from Debian, this would be a regression since it worked with my
custom patch.

The number of extensions using that feature is limited, though, so it
wouldn't be a huge problem:

$ grep directory */*/*.control
pgfincore/pgfincore/pgfincore.control:directory = pgfincore
postgresql-pgmp/postgresql-pgmp/pgmp.control:directory = 'pgmp'
postgresql-semver/postgresql-semver/semver.control:directory = 'semver'

(Not including extensions generating the .control file at build time.)

Christoph




Re: vacuumdb --missing-stats-only and pg_upgrade from PG13

2025-04-23 Thread Nathan Bossart
On Wed, Apr 23, 2025 at 04:01:33PM +0200, Christoph Berg wrote:
> If I create a table in a PG13-or-earlier cluster, never ANALYZE it,
> and then pg_upgrade to 18 and run vacuumdb --analyze-only
> --missing-stats-only, the table will not get analyzed. The only table
> visited there is pg_largeobject.

I suspect this is due to commit 3d351d9, which started using -1 for
reltuples before the first vacuum/analyze.  Before that, we set it to 0,
which could also mean the table is empty.  --missing-stats-only checks for
reltuples != 0.

My first reaction is that we should just remove the reltuples != 0 check.
That means vacuumdb might analyze some empty tables, but that doesn't seem
too terrible.

-- 
nathan
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 22067faaf7d..79b1096eb08 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -954,7 +954,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
appendPQExpBufferStr(&catalog_query,
 " EXISTS (SELECT NULL 
FROM pg_catalog.pg_attribute a\n"
 " WHERE a.attrelid 
OPERATOR(pg_catalog.=) c.oid\n"
-" AND c.reltuples 
OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
 " AND a.attnum 
OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
 " AND NOT 
a.attisdropped\n"
 " AND a.attstattarget 
IS DISTINCT FROM 0::pg_catalog.int2\n"
@@ -967,7 +966,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
appendPQExpBufferStr(&catalog_query,
 " OR EXISTS (SELECT 
NULL FROM pg_catalog.pg_statistic_ext e\n"
 " WHERE e.stxrelid 
OPERATOR(pg_catalog.=) c.oid\n"
-" AND c.reltuples 
OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
 " AND e.stxstattarget 
IS DISTINCT FROM 0::pg_catalog.int2\n"
 " AND NOT EXISTS 
(SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
 " WHERE d.stxoid 
OPERATOR(pg_catalog.=) e.oid\n"
@@ -979,7 +977,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
 " JOIN 
pg_catalog.pg_index i"
 " ON i.indexrelid 
OPERATOR(pg_catalog.=) a.attrelid\n"
 " WHERE i.indrelid 
OPERATOR(pg_catalog.=) c.oid\n"
-" AND c.reltuples 
OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
 " AND 
i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::pg_catalog.int2]"
 " 
OPERATOR(pg_catalog.=) 0::pg_catalog.int2\n"
 " AND a.attnum 
OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
@@ -994,7 +991,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
appendPQExpBufferStr(&catalog_query,
 " OR EXISTS (SELECT 
NULL FROM pg_catalog.pg_attribute a\n"
 " WHERE a.attrelid 
OPERATOR(pg_catalog.=) c.oid\n"
-" AND c.reltuples 
OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
 " AND a.attnum 
OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
 " AND NOT 
a.attisdropped\n"
 " AND a.attstattarget 
IS DISTINCT FROM 0::pg_catalog.int2\n"
@@ -1011,7 +1007,6 @@ retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
appendPQExpBufferStr(&catalog_query,
 " OR EXISTS (SELECT 
NULL FROM pg_catalog.pg_statistic_ext e\n"
 " WHERE e.stxrelid 
OPERATOR(pg_catalog.=) c.oid\n"
-" AND c.reltuples 
OPERATOR(pg_catalog.!=) 0::pg_catalog.float4\n"
 " AND e.stxstattarget 
IS DISTINCT FROM 0::pg_catalog.int2\n"
 " AND 
c.relhassubclass\n"
 " AND NOT 
p.inherited\n"


Re: vacuumdb --missing-stats-only and pg_upgrade from PG13

2025-04-23 Thread Christoph Berg
Re: Nathan Bossart
> My first reaction is that we should just remove the reltuples != 0 check.
> That means vacuumdb might analyze some empty tables, but that doesn't seem
> too terrible.

Or some tables that aren't empty but were never analyzed when they
should have been. Sounds like a good thing.

Thanks!

Christoph




Re: What's our minimum supported Python version?

2025-04-23 Thread Jelte Fennema-Nio
On Wed, 23 Apr 2025 at 13:24, Devrim Gündüz  wrote:
> psycopg is included in RHEL 8, but PGDG packages are up2date (2.7.5 vs
> 2.9.5) so they override OS packages. That is why things will break.
>
> A solution would be creating our own psycopg2 (likely for Python 3.12)
> package, update all PGDG packages that depend on psycopg2 to use that
> package. That is not impossible (I have already psycopg2 built against
> Python 3.11 on SLES 15), but I don't know how much work it will be and
> its impact as that set of update should go to both RHEL 8, 9 and 10
> (RHEL 10 already includes Python 3.12 by default)

Thanks for that explanation, now I understand the problem.

> I can go for this solution if it is *absolutely* required. We already
> have custom packages to support PostGIS, so that is not a new topic for
> me.

I don't really think this would be required in order to bump Postgres's
minimum supported Python version. psycopg is a client and PostgreSQL is
the server; it seems fine for them to use different Python versions. The
postgresql-server package could depend on python3.9, while psycopg
packages would happily use python3.6.




vacuumdb --missing-stats-only and pg_upgrade from PG13

2025-04-23 Thread Christoph Berg
Re: To Nathan Bossart
> > Update guidance for running vacuumdb after pg_upgrade.
> > 
> > Now that pg_upgrade can carry over most optimizer statistics, we
> > should recommend using vacuumdb's new --missing-stats-only option
> > to only analyze relations that are missing statistics.
> 
> I've been looking at vacuumdb --missing-stats-only because Debian's
> pg_upgradecluster is using that now.

The reason I was looking closely yesterday is because Debian's
regression tests were tripping over it, but I only figured out the
problem today:

If I create a table in a PG13-or-earlier cluster, never ANALYZE it,
and then pg_upgrade to 18 and run vacuumdb --analyze-only
--missing-stats-only, the table will not get analyzed. The only table
visited there is pg_largeobject.

Upgrades from 14..17 are fine.

Christoph




Re: Fix slot synchronization with two_phase decoding enabled

2025-04-23 Thread Masahiko Sawada
On Tue, Apr 22, 2025 at 3:00 AM Amit Kapila  wrote:
>
> On Mon, Apr 21, 2025 at 8:44 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Sat, Apr 19, 2025 at 2:19 AM Masahiko Sawada wrote:
> > >
> > > On Tue, Apr 8, 2025 at 10:14 PM Zhijie Hou (Fujitsu)
> > >  wrote:
> > > >
> > > >
> > > > --
> > > > Approach 2
> > > > --
> > > >
> > > > Instead of disallowing the use of two-phase and failover together, a 
> > > > more
> > > > flexible strategy could be only restrict failover for slots with 
> > > > two-phase
> > > > enabled when there's a possibility of existing prepared transactions 
> > > > before
> > > the
> > > > two_phase_at that are not yet replicated. During slot creation with
> > > two-phase
> > > > and failover, we could check for any decoded prepared transactions when
> > > > determining the decoding start point (DecodingContextFindStartpoint). 
> > > > For
> > > > subsequent attempts to alter failover to true, we ensure that 
> > > > two_phase_at is
> > > > less than restart_lsn, indicating that all prepared transactions have 
> > > > been
> > > > committed and replicated, thus the bug would not happen.
> > > >
> > > > pros:
> > > >
> > > > This method minimizes restrictions for users. Especially during slot 
> > > > creation
> > > > with (two_phase=on, failover=on), as it’s uncommon for transactions to
> > > prepare
> > > > during consistent snapshot creation, the restriction becomes almost
> > > > unnoticeable.
> > >
> > > I think this approach can work for the transactions that are prepared
> > > while the slot is created. But if I understand the problem correctly,
> > > while the initial table sync is performing, the slot's two_phase is
> > > still false, so we need to deal with the transactions that are
> > > prepared during the initial table sync too. What do you think?
> > >
> >
> > Yes, I agree that we need to restrict this case too. Given that we haven't
> > started decoding when setting two_phase=true during CreateDecodingContext()
> > after tablesync, we could check prepared transactions afterwards during
> > decoding. This could involve reporting an ERROR when skipping a prepared
> > transaction during decoding if its prepare LSN is less than two_phase_at.
> >
>
> It will make it difficult for users to detect it as this happens at a
> later point of time.
>
> > Alternatively, a simpler method would be to prevent this situation entirely
> > during the CREATE SUBSCRIPTION command. For example, we could restrict slots
> > created with failover set to true and twophase is later modified to true 
> > after
> > tablesync. Although the simpler check is more user-visible, it may offer 
> > less
> > flexibility.
> >
>
> I agree with your point, but OTOH, I am also afraid of adding too many
> smart checks in the back-branch. If we follow what you say here, then
> users have the following ways in PG17 to enable both failover and
> two_phase. (a) During Create Subscription, users can set both
> 'failover' and 'two_phase', if 'copy_data' is false, or (b), if
> 'copy_data' is true, during Create Subscription, then users can enable
> 'two_phase' and wait for it to be enabled. Then use Alter Subscription
> to set 'failover'.

Yet another idea would be to disallow enabling both two_phase and
failover at CREATE SUBSCRIPTION regardless of the copy_data value, and
to add a check when enabling failover for two_phase-enabled slots. For
example, users who want to enable both would need to do two steps:

1. CREATE SUBSCRIPTION with two_phase = true and failover = false.
2. ALTER SUBSCRIPTION with failover = true.

At ALTER SUBSCRIPTION with failover = true, the subscriber checks
whether the two_phase state is ready (and possibly whether the slot's
two_phase has been enabled too), otherwise it raises an ERROR. Then,
when the publisher enables failover for the two_phase-enabled slot upon
a walrcv_alter_slot() request, it checks that the slot's restart_lsn has
passed the slot's two_phase_at, otherwise it raises an error with a
message like "the slot needs to consume changes up to %X/%X to enable
failover".

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Logical Replication of sequences

2025-04-23 Thread Peter Smith
Hi Vignesh,

Some comments for v20250422-0004.

==
src/backend/commands/sequence.c

pg_sequence_state:

1.
+ * The page_lsn will be utilized in logical replication sequence
+ * synchronization to record the page_lsn of sequence in the
pg_subscription_rel
+ * system catalog. It will reflect the page_lsn of the remote sequence at the
+ * moment it was synchronized.
+ *

SUGGESTION (minor rewording)
The page LSN will be used in logical replication of sequences to
record the LSN of the sequence page in the pg_subscription_rel system
catalog.  It reflects the LSN of the remote sequence at the time it
was synchronized.

==
src/backend/commands/subscriptioncmds.c

AlterSubscription:

2.
- case ALTER_SUBSCRIPTION_REFRESH:
+ case ALTER_SUBSCRIPTION_REFRESH_PUBLICATION_SEQUENCES:
+ {
+ if (!sub->enabled)
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("ALTER SUBSCRIPTION ... REFRESH PUBLICATION SEQUENCES is not
allowed for disabled subscriptions"));
+
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ...
REFRESH PUBLICATION SEQUENCES");
+
+ AlterSubscription_refresh(sub, true, NULL, false, true, true);
+
+ break;
+ }
+
+ case ALTER_SUBSCRIPTION_REFRESH_PUBLICATION:

I felt these should be reordered so REFRESH PUBLICATION comes before
REFRESH PUBLICATION SEQUENCES. No particular reason, but AFAICT that
is how you've ordered them in all other places -- eg, gram.y, the
documentation, etc. -- so let's be consistent.

==
src/backend/replication/logical/launcher.c

3.
+/*
+ * Update the failure time of the sequencesync worker in the subscription's
+ * apply worker.
+ *
+ * This function is invoked when the sequencesync worker exits due to a
+ * failure.
+ */
+void
+logicalrep_seqsyncworker_failuretime(int code, Datum arg)

It might be better to name this function
'logicalrep_seqsyncworker_failure' (not _failuretime) because it is
more generic, and in the future you might want to do more things in
this function apart from just setting the failure time.

==
src/backend/replication/logical/syncutils.c

SyncFinishWorker:

4.
+ /* This is a clean exit, so no need to set a sequence failure time. */
+ if (wtype == WORKERTYPE_SEQUENCESYNC)
+ cancel_before_shmem_exit(logicalrep_seqsyncworker_failuretime, 0);
+

I didn't think the comment should mention setting 'failure time'.
Those details belong at a lower level -- here, it is better to be more
generic.

SUGGESTION:
/* This is a clean exit, so no need for any sequence failure logic. */

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: What's our minimum supported Python version?

2025-04-23 Thread Devrim Gündüz
Hi,

On Tue, 2025-04-22 at 15:33 -0700, Jacob Champion wrote:
>  I'm still hopeful that Devrim has some evidence in favor of bumping
> to 3.8 or 3.9. :)

I would love to have such evidence -- but I don't have any :) In the
last couple of weeks I've also been thinking of bumping every single
Python piece in the PGDG RPM repository to 3.9 (at least) on RHEL 8, but
that would break many things immediately. It is still a very major
platform for users, and such breakage is not welcome.

Regards,

-- 
Devrim Gündüz
Open Source Solution Architect, PostgreSQL Major Contributor
BlueSky: @devrim.gunduz.org , @gunduz.org


signature.asc
Description: This is a digitally signed message part


Re: What's our minimum supported Python version?

2025-04-23 Thread Jelte Fennema-Nio
On Wed, 23 Apr 2025 at 10:15, Devrim Gündüz  wrote:
> I would love to have such evidence -- but I don't have any :) In the last
> couple of weeks I've also been thinking of bumping every single Python
> piece in the PGDG RPM repository to 3.9 (at least) on RHEL 8, but that
> will break many things immediately. It is still a very major platform
> for users and such a breakage is not welcome.

Sad... Sounds like a bump to 3.9 is probably out of the question for PG18 then.

Can you share some examples of the things that break? Because if we
get more testing infrastructure in Python for PG19 (and I'm hopeful
that Jacob will be adding some), I wouldn't want all that extra
testing infra to be stuck on Python 3.6 for the next 5 years.

I think the UX that we should make work for compiling/testing infra on
RHEL8 is the following:
yum install python39
./configure PYTHON=python39  # or equivalent meson
make check-world

That seems like a reasonable hurdle for people using RHEL8 to jump
over if they want to compile themselves, especially since, when
installing from pgdg-rpm, the package could simply depend on the
python39 package.

I haven't tried, but I'm pretty sure we're not there yet. At a minimum
I believe we'd have to change our shebangs from:
#!/usr/bin/env python3

to something that takes into account the passed-in PYTHON option.




Re: What's our minimum supported Python version?

2025-04-23 Thread Devrim Gündüz
Hi,

On Wed, 2025-04-23 at 10:47 +0200, Jelte Fennema-Nio wrote:
> Can you share some examples of the things that break?

The most notable one would be Psycopg (2 and 3), plus Patroni
dependencies. There may be a couple more.

Regards,
-- 
Devrim Gündüz
Open Source Solution Architect, PostgreSQL Major Contributor
BlueSky: @devrim.gunduz.org , @gunduz.org


signature.asc
Description: This is a digitally signed message part


Re: What's our minimum supported Python version?

2025-04-23 Thread Jelte Fennema-Nio
On Wed, 23 Apr 2025 at 00:33, Jacob Champion
 wrote:
> As long as the need to backport to PG18 doesn't freeze that
> conversation in place, I suppose.

I'm confused. Are you intending to backport new test infra to PG18?

Looking at the amount of Python code that we currently have, I agree
with Tom: making the few scripts that we have compatible with
Python 3.6 seems the best solution for PG18 (especially since you
already have a patch that fixes the known issues).

Given the purpose and pretty small size of the scripts, I expect that
it'd be extremely rare for us to backport changes to them. e.g. I
doubt your oauth server would need many changes to keep working
correctly for the lifetime of PG18. Maybe it gets some minor fixes in
18.1 and/or 18.2 but other than that it seems really unlikely.




Re: Fix slot synchronization with two_phase decoding enabled

2025-04-23 Thread Amit Kapila
On Wed, Apr 23, 2025 at 11:04 PM Masahiko Sawada  wrote:
>
> On Tue, Apr 22, 2025 at 3:00 AM Amit Kapila  wrote:
> >
> > On Mon, Apr 21, 2025 at 8:44 AM Zhijie Hou (Fujitsu)
> >  wrote:
> > >
> > > On Sat, Apr 19, 2025 at 2:19 AM Masahiko Sawada wrote:
> > > >
> > > > On Tue, Apr 8, 2025 at 10:14 PM Zhijie Hou (Fujitsu)
> > > >  wrote:
> > > > >
> > > > >
> > > > > --
> > > > > Approach 2
> > > > > --
> > > > >
> > > > > Instead of disallowing the use of two-phase and failover together, a 
> > > > > more
> > > > > flexible strategy could be only restrict failover for slots with 
> > > > > two-phase
> > > > > enabled when there's a possibility of existing prepared transactions 
> > > > > before
> > > > the
> > > > > two_phase_at that are not yet replicated. During slot creation with
> > > > two-phase
> > > > > and failover, we could check for any decoded prepared transactions 
> > > > > when
> > > > > determining the decoding start point (DecodingContextFindStartpoint). 
> > > > > For
> > > > > subsequent attempts to alter failover to true, we ensure that 
> > > > > two_phase_at is
> > > > > less than restart_lsn, indicating that all prepared transactions have 
> > > > > been
> > > > > committed and replicated, thus the bug would not happen.
> > > > >
> > > > > pros:
> > > > >
> > > > > This method minimizes restrictions for users. Especially during slot 
> > > > > creation
> > > > > with (two_phase=on, failover=on), as it’s uncommon for transactions to
> > > > prepare
> > > > > during consistent snapshot creation, the restriction becomes almost
> > > > > unnoticeable.
> > > >
> > > > I think this approach can work for the transactions that are prepared
> > > > while the slot is created. But if I understand the problem correctly,
> > > > while the initial table sync is performing, the slot's two_phase is
> > > > still false, so we need to deal with the transactions that are
> > > > prepared during the initial table sync too. What do you think?
> > > >
> > >
> > > Yes, I agree that we need to restrict this case too. Given that we haven't
> > > started decoding when setting two_phase=true during 
> > > CreateDecodingContext()
> > > after tablesync, we could check prepared transactions afterwards during
> > > decoding. This could involve reporting an ERROR when skipping a prepared
> > > transaction during decoding if its prepare LSN is less than two_phase_at.
> > >
> >
> > It will make it difficult for users to detect it as this happens at a
> > later point of time.
> >
> > > Alternatively, a simpler method would be to prevent this situation 
> > > entirely
> > > during the CREATE SUBSCRIPTION command. For example, we could restrict 
> > > slots
> > > created with failover set to true and twophase is later modified to true 
> > > after
> > > tablesync. Although the simpler check is more user-visible, it may offer 
> > > less
> > > flexibility.
> > >
> >
> > I agree with your point, but OTOH, I am also afraid of adding too many
> > smart checks in the back-branch. If we follow what you say here, then
> > users have the following ways in PG17 to enable both failover and
> > two_phase. (a) During Create Subscription, users can set both
> > 'failover' and 'two_phase', if 'copy_data' is false, or (b), if
> > 'copy_data' is true, during Create Subscription, then users can enable
> > 'two_phase' and wait for it to be enabled. Then use Alter Subscription
> > to set 'failover'.
>
> Yet another idea would be to disallow enabling both two_phase and
> failover at CREATE SUBSCRIPTION regardless of copy_data value and to
> add check when enabling failover for the two_phase-enabled-slots. For
> example, users who want to enable both need to do two steps:
>
> 1. CREATE SUBSCRIPTION with two_phase = true and failover = false.
> 2. ALTER SUBSCRIPTION with failover = true.
>
> At ALTER SUBSCRIPTION with failover = true, the subscriber checks if
> the two_phase states is ready (and possibly check if the slot's
> two_phase has been enabled too), otherwise raises an ERROR. Then, when
> the publisher enables the failover for the two_phase-enabled slot upon
> a walrcv_alter_slot() request, it checks that the slot's restart_lsn has
> passed the slot's two_phase_at; otherwise it raises an error with a message
> like "the slot needs to consume changes up to %X/%X to enable failover".
>

This should further simplify the checks, at the cost of an additional
restriction at CREATE SUBSCRIPTION time. I am in favor of it because I
want the code in this area to be as similar as possible in HEAD and the
back-branches.
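
For reference, the user-visible flow being proposed would then look roughly
like the following sketch (the subscription, connection, and publication
names are hypothetical, and the exact preconditions for the ALTER step --
e.g. whether the subscription must be disabled first -- depend on the final
patch):

CREATE SUBSCRIPTION mysub
    CONNECTION 'host=primary dbname=postgres'
    PUBLICATION mypub
    WITH (two_phase = true, failover = false);

-- Later, once the subscription's two_phase state is ready and the slot's
-- restart_lsn has passed its two_phase_at on the publisher:
ALTER SUBSCRIPTION mysub SET (failover = true);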

Please note that PG17 also suffers from data loss after promotion
due to a bug in fast-forward mode, as described in email [1]. So, we
should try to get that fixed as well.

Thanks for your feedback.

[1] - 
https://www.postgresql.org/message-id/OS0PR01MB57163087F86621D44D9A72BF94BB2%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: Make COPY format extendable: Extract COPY TO format implementations

2025-04-23 Thread Masahiko Sawada
On Fri, Apr 4, 2025 at 1:38 AM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Mon, 31 Mar 2025 12:35:23 -0700,
>   Masahiko Sawada  wrote:
>
> > Most of the queries under test_copy_format/sql verifies the input
> > patterns of the FORMAT option. I find that the regression tests
> > included in that directory probably should focus on testing new
> > callback APIs and some regression tests for FORMAT option handling can
> > be moved into the normal regression test suite (e.g., in copy.sql or a
> > new file like copy_format.sql). IIUC testing for invalid input
> > patterns can be done even without creating artificial wrong handler
> > functions.
>
> Can we clarify what we should do for the next patch set
> before we create the next patch set? Are the following
> correct?
>
> 1. Move invalid input patterns in
>src/test/modules/test_copy_format/sql/invalid.sql to
>src/test/regress/sql/copy.sql as much as possible.
>* We can do only 4 of the 16 patterns.
>* Other tests in
>  src/test/modules/test_copy_format/sql/*.sql depend on
>  custom COPY handler for test. So we can't move to
>  src/test/regress/sql/copy.sql.
> 2. Create
>src/test/modules/test_copy_format/sql/test_copy_format.sql
>and move all contents in existing *.sql to the file

Agreed.

>
> >> We can do it but I suggest that we do it as a refactoring
> >> (or cleanup) in a separate patch to make it easier to review.
> >
> > I think that csv_mode and binary are used mostly in
> > ProcessCopyOptions() so probably we can use local variables for that.
> > I find there are two other places where csv_mode is used:
> > NextCopyFromRawFields() and CopyToTextLikeStart(). I think we can
> > simply check the handler function's OID there, or we can define macros
> > like COPY_FORMAT_IS_TEXT/CSV/BINARY checking the OID and use them
> > there.
>
> We need this change for "ready for merge", right?

I think so.

> Can we clarify which items should be resolved for "ready for
> merge"?
>
> Are the followings correct?
>
> 1. Move invalid input patterns in
>src/test/modules/test_copy_format/sql/invalid.sql to
>src/test/regress/sql/copy.sql as much as possible.
> 2. Create
>src/test/modules/test_copy_format/sql/test_copy_format.sql
>and move all contents in existing *.sql to the file.
> 3. Add comments what the tests expect to
>src/test/modules/test_copy_format/sql/test_copy_format.sql.
> 4. Remove CopyFormatOptions::{binary,csv_mode}.

Agreed with the above items.

> 5. Squash the "Support custom format" patch and the "Define
>handler functions for built-in formats" patch.
>* Could you do it when you push it? Or is it required for
>  "ready for merge"?

Let's keep them for now.

> 6. Use handler OID for detecting the default built-in format
>instead of comparing the given format as string.
> 7. Update documentation.

Agreed.

>
> There are 3 unconfirmed suggested changes for tests in:
> https://www.postgresql.org/message-id/20250330.113126.433742864258096312.kou%40clear-code.com
>
> Here are my opinions for them:
>
> > 1.: There is no difference between single-quoting and
> > double-quoting here, because the information about which quote
> > style was used for the given FORMAT value isn't retained
> > here. Should we update gram.y?
> >
> > 2.: I don't have a strong opinion for it. If nobody objects
> > it, I'll remove them.
> >
> > 3.: I don't have a strong opinion for it. If nobody objects
> > it, I'll remove them.
>
> Is 1. required for "ready for merge"? If so, is there
> any suggestion? I don't have a strong opinion on it.
>
> If there are no more opinions for 2. and 3., I'll remove
> them.

Agreed.

I think we will still need a few more rounds of review, but the patch
is getting into good shape.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: AIO v2.5

2025-04-23 Thread Andres Freund
Hi,

On 2025-04-20 18:00:00 +0300, Alexander Lakhin wrote:
> 31.03.2025 02:46, Andres Freund wrote:
> > I've pushed most of these after some very light further editing.  Yay.  
> > Thanks
> > a lot for all the reviews!
> > 
> > So far the buildfarm hasn't been complaining, but it's early days.
> 
> I found one complaint against commit 12ce89fd0, expressed as:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2025-04-08%2001%3A36%3A41

Thanks for pointing that out and repro'ing it!


> 2025-04-08 01:41:54.670 UTC [4013251][client backend][0/2:0] DEBUG: waiting 
> for self with 0 pending

I'd like to extend this debug message with the number of in-flight IOs.  I
assume nobody will mind me doing that.



> I could reproduce this error with the following TEMP_CONFIG:
> log_min_messages = DEBUG4
> 
> log_connections = on
> log_disconnections = on
> log_statement = 'all'
> log_line_prefix = '%m [%d][%p:%l][%b] %q[%a] '
> 
> fsync = on
> io_method=io_uring
> backtrace_functions = 'pgaio_io_wait_for_free'

Hm. Several hundred iterations in without triggering the issue even once, with
that added config. Given your reproducer used fsync, I assume this was on a
real filesystem not tmpfs? What FS?  I tried on XFS.

It very well could just be a timing-dependent issue; the time for an fsync will
differ heavily between devices...


> When running 032_relfilenode_reuse.pl in a loop, I got failures on
> iterations 10, 18, 6:
> 2025-04-20 15:56:01.310 CEST [][1517068:122][startup] DEBUG: updated min 
> recovery point to 0/4296E90 on timeline 1
> 2025-04-20 15:56:01.310 CEST [][1517068:123][startup] CONTEXT:  WAL redo at
> 0/4296E48 for Transaction/COMMIT: 2025-04-20 15:56:01.09363+02; inval msgs:
> catcache 21; sync
> 2025-04-20 15:56:01.310 CEST [][1517068:124][startup] DEBUG: waiting for all
> backends to process ProcSignalBarrier generation 5
> 2025-04-20 15:56:01.310 CEST [][1517068:125][startup] CONTEXT:  WAL redo at 
> 0/4296E90 for Database/DROP: dir 16409/50001
> 2025-04-20 15:56:01.312 CEST [postgres][1517075:144][client backend]
> [032_relfilenode_reuse.pl] DEBUG:  waiting for self with 0 pending
> 2025-04-20 15:56:01.312 CEST [postgres][1517075:145][client backend] 
> [032_relfilenode_reuse.pl] BACKTRACE:
> pgaio_io_wait_for_free at aio.c:703:2
>  (inlined by) pgaio_io_acquire at aio.c:186:3
> AsyncReadBuffers at bufmgr.c:1854:9
> StartReadBuffersImpl at bufmgr.c:1425:18
>  (inlined by) StartReadBuffers at bufmgr.c:1497:9
> read_stream_start_pending_read at read_stream.c:333:25
> read_stream_look_ahead at read_stream.c:475:3
> read_stream_next_buffer at read_stream.c:840:6
> pg_prewarm at pg_prewarm.c:214:10
> ExecInterpExpr at execExprInterp.c:927:7
> ExecEvalExprNoReturn at executor.h:447:2
> ...
> 2025-04-20 15:56:01.312 CEST [][1517068:126][startup] DEBUG: finished
> waiting for all backends to process ProcSignalBarrier generation 5
> 2025-04-20 15:56:01.312 CEST [][1517068:127][startup] CONTEXT:  WAL redo at 
> 0/4296E90 for Database/DROP: dir 16409/50001
> 2025-04-20 15:56:01.314 CEST [postgres][1517075:146][client backend]
> [032_relfilenode_reuse.pl] ERROR:  no free IOs despite no in-flight IOs

The earlier report also had a ProcSignalBarrier wait just before. That
doesn't have to be related; the test stresses those pretty intentionally.

But it does seem like it could be related, perhaps we somehow end up processing
interrupts while issuing IOs, which then again submits IOs or something? I
don't immediately see such code, e.g. pgaio_closing_fd() is careful to iterate
over the list while waiting.

I think pgaio_closing_fd() should get a DEBUG2 (or so) message documenting
that we are waiting for an IO due to closing an FD. pgaio_shutdown() as well,
while at it.


I somewhat suspect that this is the same issue that you reported earlier, that
I also couldn't repro.  That involved a lot of CREATE DATABASEs, which will
trigger a lot of procsignal barriers...  I'll try to come up with a more
dedicated stress test and see whether I can trigger a problem that way.


> I also encountered another failure when running this test:
> t/032_relfilenode_reuse.pl .. 1/? Bailout called.  Further testing stopped:  
> aborting wait: program died
> 
> Core was generated by `postgres: standby: law postgres [local] SELECT 
>   '.
> Program terminated with signal SIGABRT, Aborted.
> #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
> 
> (gdb) bt
> #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
> #1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at 
> ./nptl/pthread_kill.c:78
> #2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at 
> ./nptl/pthread_kill.c:89
> #3  0x700a5a24527e in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/posix/raise.c:26
> #4  0x700a5a2288ff in __GI_abort () at ./stdlib/abort.c:79
> #5  0x630ff2c5dcea in errfinish (filename=filename@entry=0x630ff2cdd139 
> "aio

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-23 Thread Robert Haas
On Wed, Apr 23, 2025 at 11:59 AM Robert Haas  wrote:
> heap_toast_insert_or_update care about HeapTupleHasExternal(), which
> seems like it might be a key point.

Care about HeapTupleHasVarWidth, rather.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: support ALTER COLUMN SET EXPRESSION over virtual generated column with check constraint

2025-04-23 Thread jian he
On Tue, Mar 11, 2025 at 12:17 PM jian he  wrote:
> hi.
> in summary: ATExecSetExpression, RememberAllDependentForRebuilding
> will do all the work to change the generation expression,
> whether it's virtual or stored.
>

While working on another patch, I found that this can be further simplified.
Thus a new patch is attached.
From 1a5f3da55b625911f6e381f11fe569bbb5cde888 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Thu, 24 Apr 2025 10:41:23 +0800
Subject: [PATCH v1 1/1] generated column set expression with check constraint

Currently, if we have a check constraint over a virtual generated column,
then we cannot change the generation expression.

for example:
CREATE TABLE gtest20 (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL CHECK (b < 50));
INSERT INTO gtest20 (a) VALUES (10);
ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3); --error

This patch adds support for it.
The main gotcha is in ATExecSetExpression:
RememberAllDependentForRebuilding will do all the work.

Also add a test verifying that ALTER TABLE ... SET EXPRESSION on a virtual
generated column does not do a table rewrite.

discussion: https://postgr.es/m/cacjufxh3vetr7orf5rw29gndk3n1wwboe3wdkhyd3ipgrq9...@mail.gmail.com
---
 src/backend/commands/tablecmds.c  | 31 ++-
 .../regress/expected/generated_virtual.out| 31 +++
 src/test/regress/sql/generated_virtual.sql| 19 ++--
 3 files changed, 51 insertions(+), 30 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2705cf11330..dc4eb160f69 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8601,18 +8601,6 @@ ATExecSetExpression(AlteredTableInfo *tab, Relation rel, const char *colName,
  errmsg("column \"%s\" of relation \"%s\" is not a generated column",
 		colName, RelationGetRelationName(rel))));
 
-	/*
-	 * TODO: This could be done, just need to recheck any constraints
-	 * afterwards.
-	 */
-	if (attgenerated == ATTRIBUTE_GENERATED_VIRTUAL &&
-		rel->rd_att->constr && rel->rd_att->constr->num_check > 0)
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("ALTER TABLE / SET EXPRESSION is not supported for virtual generated columns on tables with check constraints"),
- errdetail("Column \"%s\" of relation \"%s\" is a virtual generated column.",
-		   colName, RelationGetRelationName(rel))));
-
 	if (attgenerated == ATTRIBUTE_GENERATED_VIRTUAL && attTup->attnotnull)
 		tab->verify_new_notnull = true;
 
@@ -8642,18 +8630,17 @@ ATExecSetExpression(AlteredTableInfo *tab, Relation rel, const char *colName,
 		 * this renders them pointless.
 		 */
 		RelationClearMissing(rel);
-
-		/* make sure we don't conflict with later attribute modifications */
-		CommandCounterIncrement();
-
-		/*
-		 * Find everything that depends on the column (constraints, indexes,
-		 * etc), and record enough information to let us recreate the objects
-		 * after rewrite.
-		 */
-		RememberAllDependentForRebuilding(tab, AT_SetExpression, rel, attnum, colName);
 	}
 
+	/* make sure we don't conflict with later attribute modifications */
+	CommandCounterIncrement();
+
+	/*
+	 * Find everything that depends on the column (constraints, indexes,
+	 * etc), and record enough information to let us recreate the objects.
+	*/
+	RememberAllDependentForRebuilding(tab, AT_SetExpression, rel, attnum, colName);
+
 	/*
 	 * Drop the dependency records of the GENERATED expression, in particular
 	 * its INTERNAL dependency on the column, which would otherwise cause
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index 6300e7c1d96..ae4cf5abf76 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -636,12 +636,22 @@ INSERT INTO gtest20 (a) VALUES (10);  -- ok
 INSERT INTO gtest20 (a) VALUES (30);  -- violates constraint
 ERROR:  new row for relation "gtest20" violates check constraint "gtest20_b_check"
 DETAIL:  Failing row contains (30, virtual).
-ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 100);  -- violates constraint (currently not supported)
-ERROR:  ALTER TABLE / SET EXPRESSION is not supported for virtual generated columns on tables with check constraints
-DETAIL:  Column "b" of relation "gtest20" is a virtual generated column.
-ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3);  -- ok (currently not supported)
-ERROR:  ALTER TABLE / SET EXPRESSION is not supported for virtual generated columns on tables with check constraints
-DETAIL:  Column "b" of relation "gtest20" is a virtual generated column.
+ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 100);  -- violates constraint
+ERROR:  check constraint "gtest20_b_check" of relation "gtest20" is violated by some row
+ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3);  -- ok
+--test no table rewrite happen
+ALTER TABLE gtest20 ALTER COLUMN b SET E

Re: Making sslrootcert=system work on Windows psql

2025-04-23 Thread George MacKerron
> On 3 Apr 2025, at 15:26, Daniel Gustafsson  wrote:
> 
>> I quite like sslrootcert=os: it’s snappy, and it implies that the Operating 
>> System root certs are going to be used (which is what I would have liked 
>> sslrootcert=system to mean). Some possible alternatives might be 
>> sslrootcert=public-cas or sslrootcert=os-default.
> 
> The thing is, we don't know that sslrootcert=os will mean the operating system
> root certs.  We are limited to what the OpenSSL API provides, and what can ask
> OpenSSL to do is use its defaults.
> 
>> The key thing is that it should work out-of-the-box basically everywhere, so 
>> my preferred behaviour for it would be:
>> 
>> * On Windows, use the Windows built-in cert store (per my original patch).
>> 
>> * On Mac and Linux, just do the exact same thing sslrootcert=system 
>> currently does.
>> 
>> Is there any realistic prospect of this?
> 
> IMV there isn't.  I can't see it being an improvement to switch the meaning of
> a value based on the underlying OpenSSL version, especially since the current
> meaning might be useful for some installations who would then lose that
> ability.  I am convinced we need to do be able to use the defaults (as we do
> now) *and* use winstore and whatever new stores come, not that one replaces 
> the
> other.
> 
>> I appreciate that it’s not the result of a lengthy, thorough and principled 
>> UX evaluation. On the other hand, it’s a few lines of code that could enable 
>> a pretty big improvement in security for many users’ Postgres connections 
>> much sooner.
> 
> To be clear, wanting to make postgres more secure is a Good Thing, and your
> efforts are much appreciated!  Don't take no's in this thread as an objection
> to your idea and mission.  Most likely we will support winstore in some way in
> v19, we just need to make sure we develop features in a way which is
> sustainable wrt our available resources and our development process.


Thanks for your appreciation! It might be good to start thinking about how 
things might look in v19, then?

Perhaps I can start things off with one smaller idea and one bigger one.


SMALLER IDEA

I’d suggest two new special sslrootcert values:

(1) sslrootcert=openssl

This does exactly what sslrootcert=system does now, but is less confusingly 
named for Windows users. sslrootcert=system becomes a deprecated synonym for 
this option.

(2) sslrootcert=os

This does what I was proposing in my patch: it uses winstore on Windows and 
behaves the same as sslrootcert=openssl elsewhere, where openssl *is* the 
operating system SSL provider.

These changes would be fully backwards-compatible.


BIGGER IDEA

A much bigger, backwards-incompatible shake-up of libpq security parameters 
might incorporate the above changes, and then proceed something like this:

* Entirely remove the current default, sslmode=prefer, and make explicitly 
asking for sslmode=prefer an error. After all, as the docs themselves point out 
for sslmode=prefer: “this makes no sense from a security point of view”.

* Rename sslmode=require to sslmode=insecure, because it’s vulnerable to MITM 
attacks, and ideally people would get a sense of that without reading the 
relevant page of the docs. Make asking for sslmode=require an error (with a 
helpful explanation pointing out the rename to sslmode=insecure).

* Retain sslmode=verify-ca and sslmode=verify-full.

* Create a new option, sslmode=secure, which means sslmode=verify-full + 
sslrootcert=os. Make this the default!

In summary, you end up with these as sslmode values:

* disabled
* insecure (formerly known as require)
* verify-ca
* verify-full
* secure (the new default, meaning sslmode=verify-full + sslrootcert=os)

Obviously this would need to be well-trailed ahead of time, as some people 
would need to make changes to how they use psql/libpq. But it would peg the 
default security of a Postgres connection at the same level as the security of 
any random blog page (which I think is a bare minimum one might aspire to).


Please do all suggest better ideas!

All the best,
George





Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-23 Thread Christoph Berg
Re: Jacob Champion
> Also: since the libpq-oauth-18 and libpq-oauth-19 packages can be
> installed side-by-side safely, isn't the upgrade hazard significantly
> diminished? (If a user uninstalls the previous libpq-oauth version
> while they're still running that version of libpq in memory, _and_
> they've somehow never used OAuth until right that instant... it's easy
> enough for them to undo their mistake while the application is still
> running.)

Uhm, so far the plan was to have one "libpq-oauth" package, not several.
Since shipping a single libpq5.deb package for all PG majors has worked well
for the past decades, I wouldn't want to complicate that now.

Christoph




Re: Non-reproducible AIO failure

2025-04-23 Thread Tom Lane
Andres Freund  writes:
> On 2025-04-23 17:17:01 -0400, Tom Lane wrote:
>> My buildfarm animal sifaka just failed like this [1]:

> There's nothing really special about sifaka, is there? I see -std=gnu99 and a
> few debug -D cppflags, but they don't look like they could really be relevant here.

No, it's intended to be a bog-standard macOS environment.

>> TRAP: failed Assert("aio_ret->result.status != PGAIO_RS_UNKNOWN"), File: 
>> "bufmgr.c", Line: 1605, PID: 79322

> Huh.

> I assume there's no core file left over?

No such luck, I did not have that enabled.  I will turn it on though,
and also on its sibling indri.

regards, tom lane




Re: [PATCH] contrib/xml2: xslt_process() should report XSLT-related error details

2025-04-23 Thread Tom Lane
"Robin Haberkorn"  writes:
> btw. you wrote that comment in cacd42d62cb2ddf32135b151f627780a5509780f
> back in 2011.
> The commit message also contains some background information.

Hm.  I wonder if we couldn't get rid of the
HAVE_XMLSTRUCTUREDERRORCONTEXT conditionalization.  If the
libxml2 versions that lacked that were "old" in 2011, surely
they are extinct in the wild by now.  I'm thinking that we
really don't want to be using the generic handler at all, given
the commit log's comment that it can't distinguish between
error, warning, and notice cases.

regards, tom lane




Re: Conflicting updates of command progress

2025-04-23 Thread Sami Imseih
>> pgstat_progress_start_command should only be called once by the entry
>> point for the
>> command. In theory, we could end up in a situation where start_command
>> is called multiple times during the same top-level command;

> Not only in theory - it actually happens when CLUSTER is rebuilding indexes.

In the case of CLUSTER, pgstat_progress_start_command is only called once,
but pgstat_progress_update_param is called in the context of both CLUSTER
and CREATE INDEX.

> That's a possible approach. However, if the index build takes long time in the
> CLUSTER case, the user will probably be interested in details about the index
> build.

I agree.

>> Is there a repro that you can share that shows the weird values? It sounds 
>> like
>> the repro is on top of [1]. Is that right?

>> You can reproduce the similar problem by creating a trigger function that
>> runs a progress-reporting command like COPY, and then COPY data into
>> a table that uses that trigger.

>> [2] https://commitfest.postgresql.org/patch/5282/

In this case, pgstat_progress_start_command is actually called
twice in the life of a single COPY command; the upper-level COPY
command calls pgstat_progress_start_command and then the nested COPY
command also calls pgstat_progress_start_command.
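
For illustration, a minimal sketch of that nested-COPY scenario (the table,
trigger, and file names here are hypothetical):

CREATE TABLE items (val text);
CREATE TABLE audit_log (line text);

CREATE FUNCTION log_copy() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    -- This nested COPY starts its own progress reporting while the outer
    -- COPY's progress entry is still active.
    COPY audit_log FROM '/tmp/audit_seed.csv';
    RETURN NEW;
END;
$$;

CREATE TRIGGER items_copy_trig
    BEFORE INSERT ON items
    FOR EACH ROW EXECUTE FUNCTION log_copy();

-- The outer COPY fires the trigger, which runs the inner COPY.
COPY items FROM '/tmp/items.csv';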

> I think that can be implemented by moving the progress related fields from
> PgBackendStatus into a new structure and by teaching the backend to insert a
> new instance of that structure into a shared hash table (dshash.c)

I think it is a good idea in general to move the backend progress to
shared memory, along with a new API that will deal with scenarios such
as those described above:
1/ an (explicit) nested
command was started by a top-level command, such as the COPY case above.
2/ a top-level command triggered some other progress code implicitly, such as
CLUSTER triggering CREATE INDEX code.

I also like the shared memory approach because we would then not have to use
a message like the one introduced in f1889729dd3ab0 to support parallel index
vacuum progress (46ebdfe164c61).


-- 
Sami Imseih
Amazon Web Services (AWS)




Re: [PATCH] Support older Pythons in oauth_server.py

2025-04-23 Thread Jacob Champion
On Tue, Apr 22, 2025 at 12:45 PM Jacob Champion
 wrote:
> Great, thanks for the quick review!

And pushed. Thanks everyone!

--Jacob




Re: Logical Replication of sequences

2025-04-23 Thread Peter Smith
Hi Vignesh,

Some review comments for patch v20250422-0003.

==
src/backend/replication/logical/syncutils.c

1.
+/*
+ * Exit routine for synchronization worker.
+ */
+pg_noreturn void
+SyncFinishWorker(void)

Why does this have the pg_noreturn annotation? None of the other void
functions do.

~~~

2.
+bool
+FetchRelationStates(bool *started_tx)

All the functions in syncutils.c are named like Syncxxx, so for
consistency, why not name this one the same way?
e.g. /FetchRelationStates/SyncFetchRelationStates/

==
src/backend/replication/logical/tablesync.c

3.
-static bool
-wait_for_relation_state_change(Oid relid, char expected_state)
+bool
+WaitForRelationStateChange(Oid relid, char expected_state)
 {
  char state;
~

3a.
Why isn't this static, like before?

~

3b.
If it is *only* for tables and nothing else, shouldn't it be static
and have a function name like 'wait_for_table_state_change' (not
_relation_)?
OTOH, if there is potential for this to be used for sequences in the
future, then it should be in the syncutils.c module with a name like
'SyncWaitForRelationStateChange'.

==
src/include/replication/worker_internal.h

4.
@@ -250,6 +252,7 @@ extern void logicalrep_worker_stop(Oid subid, Oid relid);
 extern void logicalrep_pa_worker_stop(ParallelApplyWorkerInfo *winfo);
 extern void logicalrep_worker_wakeup(Oid subid, Oid relid);
 extern void logicalrep_worker_wakeup_ptr(LogicalRepWorker *worker);
+pg_noreturn extern void SyncFinishWorker(void);

 extern int logicalrep_sync_worker_count(Oid subid);

@@ -259,9 +262,13 @@ extern void
ReplicationOriginNameForLogicalRep(Oid suboid, Oid relid,
 extern bool AllTablesyncsReady(void);
 extern void UpdateTwoPhaseState(Oid suboid, char new_state);

-extern void process_syncing_tables(XLogRecPtr current_lsn);
-extern void invalidate_syncing_table_states(Datum arg, int cacheid,
- uint32 hashvalue);
+extern bool FetchRelationStates(bool *started_tx);
+extern bool WaitForRelationStateChange(Oid relid, char expected_state);
+extern void ProcessSyncingTablesForSync(XLogRecPtr current_lsn);
+extern void ProcessSyncingTablesForApply(XLogRecPtr current_lsn);
+extern void SyncProcessRelations(XLogRecPtr current_lsn);
+extern void SyncInvalidateRelationStates(Datum arg, int cacheid,
+ uint32 hashvalue);

~

4a.
Why does SyncFinishWorker have the pg_noreturn annotation? None of the
other void functions do.

~

4b.
I felt that all the SyncXXX functions exposed from syncutils.c should
be grouped together, and maybe even with a comment like /* from
syncutils.c */

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Non-reproducible AIO failure

2025-04-23 Thread Andres Freund
Hi,

On 2025-04-23 17:17:01 -0400, Tom Lane wrote:
> My buildfarm animal sifaka just failed like this [1]:

There's nothing really special about sifaka, is there? I see -std=gnu99 and a
few debug -D cppflags, but they don't look like they could really be relevant here.


> TRAP: failed Assert("aio_ret->result.status != PGAIO_RS_UNKNOWN"), File: 
> "bufmgr.c", Line: 1605, PID: 79322

Huh.

I assume there's no core file left over? Would be helpful to query some more
state (e.g. whether operation->io_wref is valid (think it has to be) and
whether this was a partial read or such).


> 0   postgres0x000100e3df2c 
> ExceptionalCondition + 108
> 1   postgres0x000100cbb154 WaitReadBuffers + 
> 616
> 2   postgres0x000100f072bc 
> read_stream_next_buffer.cold.1 + 184
> 3   postgres0x000100cb6dd8 
> read_stream_next_buffer + 300
> 4   postgres0x0001009b75b4 
> heap_fetch_next_buffer + 136
> 5   postgres0x0001009ad8c4 heapgettup + 368
> 6   postgres0x0001009ad364 heap_getnext + 104
> 7   postgres0x0001009b9770 
> heapam_index_build_range_scan + 700
> 8   postgres0x0001009ef4b8 spgbuild + 480
> 9   postgres0x000100a41acc index_build + 428
> 10  postgres0x000100a43d14 reindex_index + 752
> 11  postgres0x000100ad09e0 ExecReindex + 1908
> 12  postgres0x000100d04628 ProcessUtilitySlow 
> + 1300
> 13  postgres0x000100d03754 
> standard_ProcessUtility + 1528
> 14  pg_stat_statements.dylib0x00010158d7a0 
> pgss_ProcessUtility + 632
> 15  postgres0x000100d02cec PortalRunUtility + 
> 184
> 16  postgres0x000100d02354 PortalRunMulti + 
> 236
> 17  postgres0x000100d01d94 PortalRun + 456
> 18  postgres0x000100d00e10 exec_simple_query 
> + 1308
> ...
> 2025-04-23 16:08:28.834 EDT [78788:4] LOG:  client backend (PID 79322) was 
> terminated by signal 6: Abort trap: 6
> 2025-04-23 16:08:28.834 EDT [78788:5] DETAIL:  Failed process was running: 
> reindex index spgist_point_idx;


> Scraping the buildfarm database confirmed that there have been no
> other similar failures since AIO went in, and unsurprisingly some
> attempts to reproduce it manually didn't bear fruit.  So I'm
> suspecting a race condition with a very narrow window, but I don't
> have an idea where to look.  Any thoughts on how to gather more
> data?

The weird thing is that the concrete symptom here doesn't actually involve
anything that could really race - the relevant state is only updated by the
process itself.  But of course there has to be something narrow here,
otherwise we'd have seen this before :(.

Greetings,

Andres Freund




Re: What's our minimum supported Python version?

2025-04-23 Thread Jacob Champion
On Wed, Apr 23, 2025 at 2:11 AM Jelte Fennema-Nio  wrote:
> I'm confused. Are you intending to backport new test infra to PG18?

Not particularly (though, if the barriers were low enough, wouldn't
that be nice?). I'm talking about hazards in oauth_server.py here.

> Given the purpose and pretty small size of the scripts, I expect that
> it'd be extremely rare for us to backport changes to them. e.g. I
> doubt your oauth server would need many changes to keep working
> correctly for the lifetime of PG18. Maybe it gets some minor fixes in
> 18.1 and/or 18.2 but other than that it seems really unlikely.

That assumes there are no bugs in my code to backport tests for, and I
don't like assuming that. The Perl and Python sides of the
oauth_validator tests are pretty tightly coupled out of necessity.

On Wed, Apr 23, 2025 at 7:54 AM Jelte Fennema-Nio  wrote:
> I don't really think this would be required to bump Postgres its
> minimum supported postgres version. psycopg is a client and postgresql
> is the server, it seems fine for them to use different Python
> versions. The postgresql-server package could depend on python3.9,
> while psycopg packages would happily use python3.6.

I'm not sure that's the mental model we would want, since clients can
be used server-side. Maybe a safer way to split, if and when we want
to decouple this on RHEL, would be between clients of only the Python
language (tests), and clients of the Python ABI (PL/Python, psycopg,
etc.). Like, imagine a world where backported tests could use new
goodies, but extensions remained solidly on platform defaults. That
seems kind of nice to me, at a surface level.

The Python stable ABI may change the calculus, too, but I'm not going
to look into that right now...

--Jacob




Re: What's our minimum supported Python version?

2025-04-23 Thread Jacob Champion
On Wed, Apr 23, 2025 at 4:24 AM Devrim Gündüz  wrote:
> psycopg is included in RHEL 8, but PGDG packages are up2date (2.7.5 vs
> 2.9.5) so they override OS packages. That is why things will break.

Thanks, this is super helpful!

--Jacob




Re: AIO v2.5

2025-04-23 Thread Alexander Lakhin

Hello Andres,

24.04.2025 03:40, Andres Freund wrote:

Hi,

On 2025-04-20 18:00:00 +0300, Alexander Lakhin wrote:


2025-04-08 01:41:54.670 UTC [4013251][client backend][0/2:0] DEBUG: waiting for 
self with 0 pending

I'd like to extend this debug message with the number of in-flight IOs.  I
assume nobody will mind me doing that.


I printed that number for debugging and always got 1, because
io_max_concurrency == 1 during that test (because of shared_buffers=1MB).
With shared_buffers increased:
--- a/src/test/recovery/t/032_relfilenode_reuse.pl
+++ b/src/test/recovery/t/032_relfilenode_reuse.pl
@@ -18,7 +18,7 @@ log_connections=on
 # to avoid "repairing" corruption
 full_page_writes=off
 log_min_messages=debug2
-shared_buffers=1MB
+shared_buffers=16MB

I got 100 iterations passed.



I could reproduce this error with the following TEMP_CONFIG:
log_min_messages = DEBUG4

log_connections = on
log_disconnections = on
log_statement = 'all'
log_line_prefix = '%m [%d][%p:%l][%b] %q[%a] '

fsync = on
io_method=io_uring
backtrace_functions = 'pgaio_io_wait_for_free'

Hm. Several hundred iterations in without triggering the issue even once, with
that added config. Given your reproducer used fsync, I assume this was on a
real filesystem not tmpfs? What FS?  I tried on XFS.

It very well could just be a timing-dependent issue; the time for an fsync will
differ heavily between devices...


Yeah, it definitely depends on timing.
Using that TEMP_CONFIG changes the duration of the test significantly for me:
Files=1, Tests=14,  2 wallclock secs ( 0.01 usr  0.01 sys +  0.13 cusr  0.32 
csys =  0.47 CPU)
->
Files=1, Tests=14, 14 wallclock secs ( 0.00 usr  0.00 sys +  0.14 cusr  0.33 
csys =  0.47 CPU)

I reproduced the failure on ext4 (desktop-grade NVME) and on xfs (cheap
SSD), on two different machines.

It's also reproduced on HDD (with ext4), seemingly worse, where the
duration is increased even more:
I 25
# +++ tap check in src/test/recovery +++
t/032_relfilenode_reuse.pl .. ok
All tests successful.
Files=1, Tests=14, 28 wallclock secs ( 0.00 usr  0.00 sys +  0.15 cusr  0.39 
csys =  0.54 CPU)
Result: PASS
I 26
# +++ tap check in src/test/recovery +++
t/032_relfilenode_reuse.pl .. 7/? Bailout called.  Further testing stopped:  
aborting wait: program timed out



When running 032_relfilenode_reuse.pl in a loop, I got failures on
iterations 10, 18, 6:
2025-04-20 15:56:01.310 CEST [][1517068:122][startup] DEBUG: updated min 
recovery point to 0/4296E90 on timeline 1
2025-04-20 15:56:01.310 CEST [][1517068:123][startup] CONTEXT:  WAL redo at
0/4296E48 for Transaction/COMMIT: 2025-04-20 15:56:01.09363+02; inval msgs:
catcache 21; sync
2025-04-20 15:56:01.310 CEST [][1517068:124][startup] DEBUG: waiting for all
backends to process ProcSignalBarrier generation 5
2025-04-20 15:56:01.310 CEST [][1517068:125][startup] CONTEXT:  WAL redo at 
0/4296E90 for Database/DROP: dir 16409/50001
2025-04-20 15:56:01.312 CEST [postgres][1517075:144][client backend]
[032_relfilenode_reuse.pl] DEBUG:  waiting for self with 0 pending
2025-04-20 15:56:01.312 CEST [postgres][1517075:145][client backend] 
[032_relfilenode_reuse.pl] BACKTRACE:
pgaio_io_wait_for_free at aio.c:703:2
  (inlined by) pgaio_io_acquire at aio.c:186:3
AsyncReadBuffers at bufmgr.c:1854:9
StartReadBuffersImpl at bufmgr.c:1425:18
  (inlined by) StartReadBuffers at bufmgr.c:1497:9
read_stream_start_pending_read at read_stream.c:333:25
read_stream_look_ahead at read_stream.c:475:3
read_stream_next_buffer at read_stream.c:840:6
pg_prewarm at pg_prewarm.c:214:10
ExecInterpExpr at execExprInterp.c:927:7
ExecEvalExprNoReturn at executor.h:447:2
...
2025-04-20 15:56:01.312 CEST [][1517068:126][startup] DEBUG: finished
waiting for all backends to process ProcSignalBarrier generation 5
2025-04-20 15:56:01.312 CEST [][1517068:127][startup] CONTEXT:  WAL redo at 
0/4296E90 for Database/DROP: dir 16409/50001
2025-04-20 15:56:01.314 CEST [postgres][1517075:146][client backend]
[032_relfilenode_reuse.pl] ERROR:  no free IOs despite no in-flight IOs

The earlier report also had a ProcSignalBarrier wait just before. That
doesn't have to be related; the test stresses those pretty intentionally.

But it does seem like it could be related, perhaps we somehow end up processing
interrupts while issuing IOs, which then again submits IOs or something? I
don't immediately see such code, e.g. pgaio_closing_fd() is careful to iterate
over the list while waiting.

I think pgaio_closing_fd() should get a DEBUG2 (or so) message documenting
that we are waiting for an IO due to closing an FD. pgaio_shutdown() as well,
while at it.


I somewhat suspect that this is the same issue that you reported earlier, that
I also couldn't repro.  That involved a lot of CREATE DATABASEs, which will
trigger a lot of procsignal barriers...  I'll try to come up with a more
dedicated stress test and see whether I can trigger a problem that way.


Maybe it does matter that "no free IOs despite no in-flight IOs" follows
"WAL 

Re: [PATCH] Re: Proposal to Enable/Disable Index using ALTER INDEX

2025-04-23 Thread jian he
hi.
The following is a review of version 17.

ATExecSetIndexVisibility
if (indexForm->indisvisible != visible)
{
indexForm->indisvisible = visible;
CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
CacheInvalidateRelcache(heapRel);
InvokeObjectPostAlterHook(IndexRelationId, indexOid, 0);
CommandCounterIncrement();
}
I am slightly confused: if we already use
CommandCounterIncrement(),
then do we even need CacheInvalidateRelcache?


doc/src/sgml/ref/alter_index.sgml
  
   ALTER INDEX changes the definition of an existing index.
   There are several subforms described below. Note that the lock level required
   may differ for each subform. An ACCESS EXCLUSIVE
lock is held
   unless explicitly noted. When multiple subcommands are listed, the lock
   held will be the strictest one required from any subcommand.

Per the above paragraph, we need to mention that ALTER INDEX ... SET VISIBLE|INVISIBLE
only uses ShareUpdateExclusiveLock?


index_create is called in several places, and most of the time we use
INDEX_CREATE_VISIBLE. If we defined the flag as INDEX_CREATE_INVISIBLE
rather than INDEX_CREATE_VISIBLE, fewer changes to the "flags" argument
would be needed (I didn't try it myself).


Similar to get_index_isclustered,
we can place GetIndexVisibility in
src/backend/utils/cache/lsyscache.c and
make it an extern function so others can use it;
to align with the other function names there,
maybe rename it to get_index_visibility.


create index v2_idx on v1(data) visible;
is allowed, so the corresponding section in
doc/src/sgml/ref/create_index.sgml
needs to change to
[ VISIBLE | INVISIBLE ]
?