Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-18 Thread Michael Paquier
On Thu, Apr 18, 2024 at 12:53:43PM +0200, Peter Eisentraut wrote:
> If everything is addressed, I agree that 0001, 0003, and 0004 can go into
> PG17, the rest later.

About the PG17 bits, would you agree about a backpatch?  Or perhaps
you disagree?
--
Michael




Re: WIP Incremental JSON Parser

2024-04-18 Thread Michael Paquier
On Thu, Apr 18, 2024 at 06:03:45AM -0400, Andrew Dunstan wrote:
> On 2024-04-18 Th 02:04, Michael Paquier wrote:
>>   Why using File::Temp::tempfile here?  Couldn't you just use a
>> file in a PostgreSQL::Test::Utils::tempdir() that would be cleaned up
>> once the test finishes?
> 
> That's another possibility, but I think the above is the simplest.

Are you sure that relying on File::Temp is a good thing overall?  The
current temporary file knowledge is encapsulated within Utils.pm, with
files removed or kept depending on PG_TEST_NOCLEAN.  Wouldn't it be
more consistent to rely on the existing facilities instead?
test_json_parser is the only code path in the whole tree that directly
uses File::Temp.  The rest of the TAP tests rely on Utils.pm for
temp file paths.
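
For illustration, a minimal sketch of what that could look like (the
file name and the $json_chunk variable are made up for the example):

    use PostgreSQL::Test::Utils;

    # tempdir() is removed when the test ends, unless PG_TEST_NOCLEAN is set
    my $dir = PostgreSQL::Test::Utils::tempdir();
    my $fname = "$dir/fragment.json";
    open(my $fh, '>', $fname) or die "could not open $fname: $!";
    print $fh $json_chunk;    # hypothetical variable holding the test input
    close($fh);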
--
Michael




Re: Cannot find a working 64-bit integer type on Illumos

2024-04-18 Thread Japin Li


On Fri, 19 Apr 2024 at 05:22, Thomas Munro  wrote:
> On Thu, Apr 18, 2024 at 6:09 PM Japin Li  wrote:
>> /home/japin/postgres/build/../src/include/lib/simplehash.h:1138:9: error: 
>> format '%llu' expects argument of type 'long long unsigned int', but 
>> argument 4 has type 'uint64' {aka 'long unsigned int'} [-Werror=format=]
>
> It seems my v1 patch's configure probe for INT64_FORMAT was broken.
> In the v2 patch I tried not doing that probe at all, and instead
> inviting <inttypes.h> into our world (that's the standardised way to
> produce format strings, which has the slight complication that we are
> intercepting printf calls...).  I suspect that'll work better for you.

Yeah, the v2 patch fixed this problem.
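
For anyone following along, the <inttypes.h> approach looks roughly
like this standalone sketch (a generic example, not the patch itself):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint64_t    v = 42;

        /* PRIu64 expands to the correct conversion for this platform */
        printf("value = %" PRIu64 "\n", v);
        return 0;
    }

so the format string is always right for the platform's 64-bit type,
instead of relying on a configure-time INT64_FORMAT probe.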

--
Regards,
Japin Li




Re: promotion related handling in pg_sync_replication_slots()

2024-04-18 Thread Bertrand Drouvot
Hi,

On Thu, Apr 18, 2024 at 05:36:05PM +0530, shveta malik wrote:
> Please find v8 attached. Changes are:

Thanks!

A few comments:

1 ===

@@ -1440,7 +1461,7 @@ ReplSlotSyncWorkerMain(char *startup_data, size_t startup_data_len)
 	 * slotsync_worker_onexit() but that will need the connection to be made
 	 * global and we want to avoid introducing global for this purpose.
 	 */
-	before_shmem_exit(slotsync_failure_callback, PointerGetDatum(wrconn));
+	before_shmem_exit(slotsync_worker_disconnect, PointerGetDatum(wrconn));

The comment above this change still states "Register the failure callback
once we have the connection"; I think it needs to be reworded a bit now
that v8 makes use of slotsync_worker_disconnect().

2 ===

+	 * Register slotsync_worker_onexit() before we register
+	 * ReplicationSlotShmemExit() in BaseInit(), to ensure that during exit of
+	 * slot sync worker, ReplicationSlotShmemExit() is called first, followed
+	 * by slotsync_worker_onexit(). Startup process during promotion waits for

Is it worth mentioning in shmem_exit() (where it does "while
(--before_shmem_exit_index >= 0)"), or before the shmem_exit()
definition, that ReplSlotSyncWorkerMain() relies on this LIFO behavior?
(Not sure if there is another "strong" LIFO requirement in other parts
of the code.)
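
For illustration, the LIFO ordering in question (the callback names
here are hypothetical, only the before_shmem_exit() API is real):

    before_shmem_exit(callback_a, (Datum) 0);  /* registered first, runs last */
    before_shmem_exit(callback_b, (Datum) 0);  /* registered second, runs first */

    /*
     * shmem_exit() walks the registration array backwards, i.e.
     * "while (--before_shmem_exit_index >= 0)", so callback_b is
     * invoked before callback_a.
     */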

3 ===

+* Startup process during promotion waits for slot sync to finish and it
+* does that by checking the 'syncing' flag.

Worth mentioning ShutDownSlotSync() here?

4 ===

I did a few tests manually (launching ShutDownSlotSync() through gdb / with and
without sync worker and with / without pg_sync_replication_slots() running
concurrently) and it looks like it works as designed.

Having said that, the logic that is in place to take care of the corner cases
described up-thread seems reasonable to me.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Direct SSL connection with ALPN and HBA rules

2024-04-18 Thread Michael Paquier
Hi all,
(Heikki in CC.)

Since 91044ae4baea (require ALPN for direct SSL connections) and
d39a49c1e459 (direct handshake), direct SSL connections are supported
(yeah!), but the thread where this was discussed does not cover the
potential impact on HBA rules:
https://www.postgresql.org/message-id/CAM-w4HOEAzxyY01ZKOj-iq%3DM4-VDk%3DvzQgUsuqiTFjFDZaebdg%40mail.gmail.com

My point is: would there be value in the server being able to enforce
that ALPN is used, rather than just relying on the client-side
sslnegotiation to decide whether direct SSL connections are forced?

Hence, I'd imagine that we could have an HBA option for hostssl rules,
like negotiation={direct,postgres,all}, that cross-checks
Port->alpn_used with the option value in a hostssl entry, rejecting
connections that use direct negotiation or the default protocol when
the entry does not allow them.  As this is a new thing, there may also
be an argument for this option on security grounds, since it would let
operators turn direct negotiation off in the server.
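
To make that concrete, a hypothetical pg_hba.conf entry under such a
scheme could look like the line below (the negotiation option is only
illustrative; nothing like it exists today):

    # hypothetical option, not implemented
    hostssl  all  all  0.0.0.0/0  scram-sha-256  negotiation=direct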

Thoughts or comments?
--
Michael




Re: cfbot is failing all tests on FreeBSD/Meson builds

2024-04-18 Thread Thomas Munro
On Fri, Apr 19, 2024 at 10:36 AM Thomas Munro  wrote:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276535
>
> So perhaps it's time for me to undo what I did before...  looking now.

It turned out that I still needed the previous work-around, but I was
too clever for my own boots last time.  For the record, here is the
new change to the image building script:

https://github.com/anarazel/pg-vm-images/commit/faff91cd40d6af0cbc658f5c11da47e2aa88d332

I should have listened to Bilal's prediction[1] about this the first
time.  But this time, I know that the real fix is coming in the next
package very soon, per bugzilla link above.

One thing I noticed is that 010_tab_completion is actually being
skipped, with that fix.  It used to run, I'm sure.  Not sure why, but
I'll look more seriously when the 1.21 or whatever shows up.  At least
we should soon have green CI again in the meantime.

[1] https://github.com/anarazel/pg-vm-images/pull/92




Re: Allow tests to pass in OpenSSL FIPS mode

2024-04-18 Thread Thomas Munro
On Fri, Apr 19, 2024 at 4:00 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > Probably not this thread's fault, but following the breadcrumbs to the
> > last thread to touch the relevant test lines in
> > authentication/001_password, is it expected that we have these
> > warnings?
>
> > psql::1: WARNING:  roles created by regression test cases
> > should have names starting with "regress_"
>
> I think the policy is that we enforce that for cases reachable
> via "make installcheck" (to avoid possibly clobbering global
> objects in a live installation), but not for cases only reachable
> via "make check", such as TAP tests.  So I'm not that concerned
> about this, although if someone is feeling anal enough to rename
> the test role I won't stand in the way.

Got it, thanks.  Not me, just asking.




Re: Allow tests to pass in OpenSSL FIPS mode

2024-04-18 Thread Tom Lane
Thomas Munro  writes:
> Probably not this thread's fault, but following the breadcrumbs to the
> last thread to touch the relevant test lines in
> authentication/001_password, is it expected that we have these
> warnings?

> psql::1: WARNING:  roles created by regression test cases
> should have names starting with "regress_"

I think the policy is that we enforce that for cases reachable
via "make installcheck" (to avoid possibly clobbering global
objects in a live installation), but not for cases only reachable
via "make check", such as TAP tests.  So I'm not that concerned
about this, although if someone is feeling anal enough to rename
the test role I won't stand in the way.

regards, tom lane




Re: ECPG cleanup and fix for clang compile-time problem

2024-04-18 Thread Tom Lane
Andres Freund  writes:
> On 2024-04-18 23:11:52 -0400, Tom Lane wrote:
>> I was just looking locally at what I got by removing that, and sadly
>> I don't think I believe it: there are a lot of places where it claims
>> we hit lines we don't, and vice versa.  That might be partially
>> blamable on old tools on my RHEL8 workstation, but it sure seems
>> that flex output confuses lcov to some extent.

> Hm. Here it mostly looks reasonable, except that at least things seem off by
> 1.

Yeah, now that you mention it what I'm seeing looks like the line
numbering might be off-by-one.  Time for a bug report?

regards, tom lane




Re: Allow tests to pass in OpenSSL FIPS mode

2024-04-18 Thread Thomas Munro
On Sat, Nov 18, 2023 at 7:46 AM Peter Eisentraut  wrote:
> All done, thanks.

Probably not this thread's fault, but following the breadcrumbs to the
last thread to touch the relevant test lines in
authentication/001_password, is it expected that we have these
warnings?

psql::1: WARNING:  roles created by regression test cases
should have names starting with "regress_"




Use XLOG_CONTROL_FILE macro everywhere?

2024-04-18 Thread Anton A. Melnikov

Hello!

There is a macro XLOG_CONTROL_FILE for control file name
defined in access/xlog_internal.h
And there are some places in code where this macro is used
like here
https://github.com/postgres/postgres/blob/84db9a0eb10dd1dbee6db509c0e427fa237177dc/src/bin/pg_resetwal/pg_resetwal.c#L588
or here
https://github.com/postgres/postgres/blob/84db9a0eb10dd1dbee6db509c0e427fa237177dc/src/common/controldata_utils.c#L214

But there are some other places where the control file
name is used as text string directly.

May be better use this macro everywhere in C code?
The patch attached tries to do this.

Would be glad if take a look on it.

With the best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From 2692683142c98572114c32f696b55c6a642dd3e9 Mon Sep 17 00:00:00 2001
From: "Anton A. Melnikov" 
Date: Fri, 19 Apr 2024 06:12:44 +0300
Subject: [PATCH] Use XLOG_CONTROL_FILE macro  everywhere in C code

---
 src/backend/backup/basebackup.c |  2 +-
 src/backend/postmaster/postmaster.c |  3 ++-
 src/bin/pg_combinebackup/pg_combinebackup.c |  5 +++--
 src/bin/pg_controldata/pg_controldata.c |  2 +-
 src/bin/pg_rewind/filemap.c |  3 ++-
 src/bin/pg_rewind/pg_rewind.c   |  8 
 src/bin/pg_upgrade/controldata.c| 11 ++-
 src/bin/pg_verifybackup/pg_verifybackup.c   |  3 ++-
 src/common/controldata_utils.c  |  2 +-
 9 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 9a2bf59e84..c5a1e18104 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -1347,7 +1347,7 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly,
 		snprintf(pathbuf, sizeof(pathbuf), "%s/%s", path, de->d_name);
 
 		/* Skip pg_control here to back up it last */
-		if (strcmp(pathbuf, "./global/pg_control") == 0)
+		if (strcmp(pathbuf, "./" XLOG_CONTROL_FILE) == 0)
 			continue;
 
 		if (lstat(pathbuf, &statbuf) != 0)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 7f3170a8f0..02b36caea7 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -90,6 +90,7 @@
 #endif
 
 #include "access/xlog.h"
+#include "access/xlog_internal.h"
 #include "access/xlogrecovery.h"
 #include "common/file_perm.h"
 #include "common/file_utils.h"
@@ -1489,7 +1490,7 @@ checkControlFile(void)
 	char		path[MAXPGPATH];
 	FILE	   *fp;
 
-	snprintf(path, sizeof(path), "%s/global/pg_control", DataDir);
+	snprintf(path, sizeof(path), "%s/%s", DataDir, XLOG_CONTROL_FILE);
 
 	fp = AllocateFile(path, PG_BINARY_R);
 	if (fp == NULL)
diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c
index b26c532445..16448e78b8 100644
--- a/src/bin/pg_combinebackup/pg_combinebackup.c
+++ b/src/bin/pg_combinebackup/pg_combinebackup.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 
+#include "access/xlog_internal.h"
 #include "backup_label.h"
 #include "common/blkreftable.h"
 #include "common/checksum_helper.h"
@@ -284,7 +285,7 @@ main(int argc, char *argv[])
 		{
 			char	   *controlpath;
 
-			controlpath = psprintf("%s/%s", prior_backup_dirs[i], "global/pg_control");
+			controlpath = psprintf("%s/%s", prior_backup_dirs[i], XLOG_CONTROL_FILE);
 
 			pg_fatal("%s: manifest system identifier is %llu, but control file has %llu",
 	 controlpath,
@@ -591,7 +592,7 @@ check_control_files(int n_backups, char **backup_dirs)
 		bool		crc_ok;
 		char	   *controlpath;
 
-		controlpath = psprintf("%s/%s", backup_dirs[i], "global/pg_control");
+		controlpath = psprintf("%s/%s", backup_dirs[i], XLOG_CONTROL_FILE);
 		pg_log_debug("reading \"%s\"", controlpath);
 		control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
 
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 93e0837947..1fe6618e7f 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -1,7 +1,7 @@
 /*
  * pg_controldata
  *
- * reads the data from $PGDATA/global/pg_control
+ * reads the data from $PGDATA/
  *
  * copyright (c) Oliver Elphick , 2001;
  * license: BSD
diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index 4458324c9d..6224e94d09 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 
+#include "access/xlog_internal.h"
 #include "catalog/pg_tablespace_d.h"
 #include "common/file_utils.h"
 #include "common/hashfn_unstable.h"
@@ -643,7 +644,7 @@ decide_file_action(file_entry_t *entry)
 	 * Don't touch the control file. It is handled specially, after copying
 	 * all the other files.
 	 */
-	if (strcmp(path, "global/pg_control") == 0)
+	if (strcmp(path, XLOG_CONTROL_FILE) == 0)
 		return FILE_ACTION_NONE;
 
 	/* Skip macOS system files */
diff --git 

Re: ECPG cleanup and fix for clang compile-time problem

2024-04-18 Thread Andres Freund
On 2024-04-18 23:11:52 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2024-04-18 22:18:34 -0400, Tom Lane wrote:
> >> (If our code coverage tools worked on bison/flex stuff,
> >> maybe this'd be less scary ... but they don't.)
>
> > For bison coverage seems to work, see e.g.:
>
> Yeah, I'd just noticed that --- I had it in my head that we'd put
> LCOV_EXCL_START/STOP into bison files too, but nope they are only
> in flex files.  That's good for this specific problem, because the
> code I'm worried about is all in the bison file.

At least locally the coverage seems to make sense too, both for the main
grammar and for ecpg's.


> > around the scanner "body".  Without that I get reasonable-looking, albeit 
> > not
> > very comforting, coverage for pgc.l as well.
>
> I was just looking locally at what I got by removing that, and sadly
> I don't think I believe it: there are a lot of places where it claims
> we hit lines we don't, and vice versa.  That might be partially
> blamable on old tools on my RHEL8 workstation, but it sure seems
> that flex output confuses lcov to some extent.

Hm. Here it mostly looks reasonable, except that at least things seem off by
1. And sure enough, if I look at pgc.l it has code like

case 2:
YY_RULE_SETUP
#line 465 "/home/andres/src/postgresql/src/interfaces/ecpg/preproc/pgc.l"
{
token_start = yytext;
state_before_str_start = YYSTATE;

However line 465 is actually the "token_start" line.

Further down this seems to get worse; by "<<EOF>>" it's off by 4 lines.


$ apt policy flex
flex:
  Installed: 2.6.4-8.2+b2
  Candidate: 2.6.4-8.2+b2
  Version table:
 *** 2.6.4-8.2+b2 500
500 http://mirrors.ocf.berkeley.edu/debian unstable/main amd64 Packages
100 /var/lib/dpkg/status

Greetings,

Andres Freund




Re: ECPG cleanup and fix for clang compile-time problem

2024-04-18 Thread Tom Lane
Andres Freund  writes:
> On 2024-04-18 22:18:34 -0400, Tom Lane wrote:
>> (If our code coverage tools worked on bison/flex stuff,
>> maybe this'd be less scary ... but they don't.)

> For bison coverage seems to work, see e.g.:

Yeah, I'd just noticed that --- I had it in my head that we'd put
LCOV_EXCL_START/STOP into bison files too, but nope they are only
in flex files.  That's good for this specific problem, because the
code I'm worried about is all in the bison file.

> around the scanner "body".  Without that I get reasonable-looking, albeit not
> very comforting, coverage for pgc.l as well.

I was just looking locally at what I got by removing that, and sadly
I don't think I believe it: there are a lot of places where it claims
we hit lines we don't, and vice versa.  That might be partially
blamable on old tools on my RHEL8 workstation, but it sure seems
that flex output confuses lcov to some extent.

regards, tom lane




Re: ECPG cleanup and fix for clang compile-time problem

2024-04-18 Thread Andres Freund
Hi,

On 2024-04-18 22:18:34 -0400, Tom Lane wrote:
> Here is a patch series that aims to clean up ecpg's preprocessor
> code a little and fix the compile time problems we're seeing with
> late-model clang [1].  I guess whether it's a cleanup is in the eye of
> the beholder, but it definitely succeeds at fixing compile time: for
> me, the time needed to compile preproc.o with clang 16 drops from
> 104 seconds to less than 1 second.  It might be a little faster at
> processing input too, though that wasn't the primary goal.

Nice! I'll look at this more later.


For now I just wanted to point one minor detail:

> (If our code coverage tools worked on bison/flex stuff,
> maybe this'd be less scary ... but they don't.)

For bison coverage seems to work, see e.g.:

https://coverage.postgresql.org/src/interfaces/ecpg/preproc/preproc.y.gcov.html#10638

I think the only reason it doesn't work for flex is that we have
/* LCOV_EXCL_START */
/* LCOV_EXCL_STOP */

around the scanner "body".  Without that I get reasonable-looking, albeit not
very comforting, coverage for pgc.l as well.

   |Lines  |Functions|Branches
Filename   |RateNum|Rate  Num|Rate   Num
src/interfaces/ecpg/preproc/pgc.l  |65.9%   748|87.5%   8|-0
src/interfaces/ecpg/preproc/preproc.y  |29.9%  4964|66.7%  15|-0


This has been introduced by

commit 421167362242ce1fb46d6d720798787e7cd65aad
Author: Peter Eisentraut 
Date:   2017-08-10 23:33:47 -0400

Exclude flex-generated code from coverage testing

Flex generates a lot of functions that are not actually used.  In order
to avoid coverage figures being ruined by that, mark up the part of the
.l files where the generated code appears by lcov exclusion markers.
That way, lcov will typically only report on coverage for the .l file,
which is under our control, but not for the .c file.

Reviewed-by: Michael Paquier 


but I don't think it's working as intended, as it's also preventing coverage
for the actual scanner definition.

Greetings,

Andres Freund




RE: Disallow changing slot's failover option in transaction block

2024-04-18 Thread Hayato Kuroda (Fujitsu)
Dear Hou,

Thanks for updating the patch! Let me confirm the content.

In your patch, the pg_dump.c was updated. IIUC the main reason was that
pg_restore executes some queries as single transactions so that ALTER
SUBSCRIPTION cannot be done, right?
Also, failover was synchronized only when we were in the upgrade mode, but
your patch seems to remove the condition. Can you clarify the reason?

Other than that, the patch LGTM.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: Speed up clean meson builds by ~25%

2024-04-18 Thread Tom Lane
Andres Freund  writes:
> On 2024-04-17 23:10:53 -0400, Tom Lane wrote:
>> I think we should hold off on this.  I found a simpler way to address
>> ecpg's problem than what I sketched upthread.  I have a not-ready-to-
>> show-yet patch that allows the vast majority of ecpg's grammar
>> productions to use the default semantic action.  Testing on my M1
>> Macbook with clang 16.0.6 from MacPorts, I see the compile time for
>> preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec.

> That's pretty amazing.

Patch posted at [1].

regards, tom lane

[1] https://www.postgresql.org/message-id/2011420.1713493114%40sss.pgh.pa.us




RE: Disallow changing slot's failover option in transaction block

2024-04-18 Thread Hayato Kuroda (Fujitsu)
Dear Shveta,

> When I said that option 2 aligns with ALTER-SUB documented behaviour,
> I meant the doc described in [1] stating "When altering the slot_name,
> the failover and two_phase property values of the named slot may
> differ from the counterpart failover and two_phase parameters
> specified in the subscription"
> 
> [1]:
> https://www.postgresql.org/docs/devel/sql-altersubscription.html#SQL-ALTER
> SUBSCRIPTION-PARAMS-SET

I see, thanks for the clarification.  Agreed that the description does
not conflict with option 2.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-04-18 Thread Michael Paquier
On Thu, Apr 18, 2024 at 06:17:56PM +0200, Alvaro Herrera wrote:
> On 2024-Apr-18, Michael Paquier wrote:
>> I was also worrying about a need to bump the protocol version to be
>> able to track the relkind in the toc entries, but a45c78e3284b has
>> already done one.  The difference in AM handling between relations
>> without storage and relations with storage pushes the relkind logic
>> more within the internals of pg_backup_archiver.c.
> 
> Hmm, does this mean that every dump taken since a45c78e3284b (April
> 1st) and before this commit will be unrestorable?  This doesn't worry me
> too much, because we aren't even in beta yet ... and I think we don't
> have a strict policy about it.

I've been scanning the history of K_VERS_1_* in the recent years, and
it does not seem that we have a case where we would have needed to
bump the version twice in the same release cycle.  Anyway, yes, any
dump taken since 1_16 has been bumped would fail to restore with this
patch in place.  For an unreleased not-yet-in-beta branch, why should
we care?  Things are not set in stone, like extensions.  If others
have comments about this point, feel free of course.

>> --- a/src/bin/pg_dump/t/002_pg_dump.pl
>> +++ b/src/bin/pg_dump/t/002_pg_dump.pl
>> @@ -4591,11 +4591,9 @@ my %tests = (
>>  CREATE TABLE dump_test.regress_pg_dump_table_am_child_2
>>PARTITION OF 
>> dump_test.regress_pg_dump_table_am_parent FOR VALUES IN (2);',
>>  regexp => qr/^
>> -\QSET default_table_access_method = regress_table_am;\E
>> -(\n(?!SET[^;]+;)[^\n]*)*
>> -\n\QCREATE TABLE 
>> dump_test.regress_pg_dump_table_am_parent (\E
>> -(.*\n)*
>>  \QSET default_table_access_method = heap;\E
>> +(.*\n)*
>> +\QALTER TABLE dump_test.regress_pg_dump_table_am_parent 
>> SET ACCESS METHOD regress_table_am;\E
>>  (\n(?!SET[^;]+;)[^\n]*)*
>>  \n\QCREATE TABLE 
>> dump_test.regress_pg_dump_table_am_child_1 (\E
>>  (.*\n)*
> 
> This looks strange -- why did you remove matching for the CREATE TABLE
> of the parent table?  That line should appear shortly before the ALTER
> TABLE SET ACCESS METHOD for the same table, shouldn't it?

Yeah, with the ALTER in place that did not seem that mandatory but I
don't mind keeping it, as well.

> Maybe your
> intention was to remove only the SET default_table_access_method
> = regress_table_am line ... but it's not clear to me why we have the
> "SET default_table_access_method = heap" line before the ALTER TABLE SET
> ACCESS METHOD.

This comes from the contents of the dump for
regress_pg_dump_table_am_2, that uses heap as table AM.  A SET is
issued for it before dumping regress_pg_dump_table_am_parent and its
partitions.  One trick that I can think of to make the output parsing
of the test more palatable is to switch the AMs used by the two
partitions, so as we finish with two SET queries before each partition
rather than one before the partitioned table.  See the attached for
the idea.
--
Michael
From 009ba9bf376d87ab0aaf0a9e9df380cd801a9c2e Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 19 Apr 2024 10:30:53 +0900
Subject: [PATCH v2] Set properly table AMs of partitioned tables in pg_dump

---
 src/bin/pg_dump/pg_backup_archiver.c | 70 +++-
 src/bin/pg_dump/pg_backup_archiver.h |  5 +-
 src/bin/pg_dump/pg_dump.c|  1 +
 src/bin/pg_dump/t/002_pg_dump.pl | 12 ++---
 4 files changed, 79 insertions(+), 9 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index c7a6c918a6..f69f72d731 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -30,6 +30,7 @@
 #include 
 #endif
 
+#include "catalog/pg_class_d.h"
 #include "common/string.h"
 #include "compress_io.h"
 #include "dumputils.h"
@@ -62,6 +63,8 @@ static void _becomeOwner(ArchiveHandle *AH, TocEntry *te);
 static void _selectOutputSchema(ArchiveHandle *AH, const char *schemaName);
 static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
 static void _selectTableAccessMethod(ArchiveHandle *AH, const char *tableam);
+static void _printTableAccessMethodNoStorage(ArchiveHandle *AH,
+			  TocEntry *te);
 static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
 static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
 static void processSearchPathEntry(ArchiveHandle *AH, TocEntry *te);
@@ -1222,6 +1225,7 @@ ArchiveEntry(Archive *AHX, CatalogId catalogId, DumpId dumpId,
 	newToc->namespace = opts->namespace ? pg_strdup(opts->namespace) : NULL;
 	newToc->tablespace = opts->tablespace ? pg_strdup(opts->tablespace) : NULL;
 	newToc->tableam = opts->tableam ? pg_strdup(opts->tableam) : NULL;
+	newToc->relkind = opts->relkind;
 	newToc->owner = 

Re: documentation structure

2024-04-18 Thread Corey Huinker
>
> Yeah, we can't expect everyone wanting to call a built-in function to
> know how they would define an equivalent one themselves. In that case I
> propose marking it up like this:
>
> <function>format</function> (
>     <parameter>formatstr</parameter> <type>text</type>
>     <optional>, <parameter>formatarg</parameter> <type>"any"</type>
>     <optional>, ... </optional></optional> )
>     <returnvalue>text</returnvalue>
>

Looks good, but I guess I have to ask: is there a parameter-list tag out
there instead of a bare "(", and should we be using that?



> The requisite nesting when there are multiple optional parameters makes
> it annoying to wrap and indent it "properly" per XML convention, but how
> about something like this, with each parameter on a line of its own, and
> all the closing </optional> tags on one line?
>
> <function>regexp_substr</function> (
>     <parameter>string</parameter> <type>text</type>,
>     <parameter>pattern</parameter> <type>text</type>
>     <optional>, <parameter>start</parameter> <type>integer</type>
>     <optional>, <parameter>N</parameter> <type>integer</type>
>     <optional>, <parameter>flags</parameter> <type>text</type>
>     <optional>, <parameter>subexpr</parameter> <type>integer</type>
>     </optional></optional></optional></optional> )
>     <returnvalue>text</returnvalue>
>

Yes, that has an easy count-the-vertical, count-the-horizontal,
do-they-match flow to it.


> A lot of functions mostly follow this style, except they tend to put the
> first parameter on the same line of the function namee, even when that
> makes the line overly long. I propose going the other way, with each
> parameter on a line of its own, even if the first one would fit after
> the function name, except the whole parameter list fits after the
> function name.
>

+1


>
> Also, when there's only one optional argument, or they're independently
> optional, not nested, the <optional> tag should go on the same line as
> the parameter.
>
> <function>substring</function> (
>     <parameter>bits</parameter> <type>bit</type>
>     <optional> FROM <parameter>start</parameter> <type>integer</type> </optional>
>     <optional> FOR <parameter>count</parameter> <type>integer</type> </optional> )
>     <returnvalue>bit</returnvalue>
>

+1


Re: AIX support

2024-04-18 Thread Thomas Munro
On Fri, Apr 19, 2024 at 6:01 AM Andres Freund  wrote:
> On 2024-04-18 11:15:43 +, Sriram RK wrote:
> > We (IBM-AIX team) looked into this issue
> >
> > https://www.postgresql.org/message-id/20240225194322...@rfd.leadboat.com
> >
> > This is related to the compiler issue. The compilers xlc(13.1) and gcc(8.0)
> > have issues. But we verified that this issue is resolved with the newer
> > compiler versions openXL(xlc17.1) and gcc(12.0) onwards.
>
> The reason we used these compilers was that those were the only ones we had
> kinda somewhat reasonable access to, via the gcc projects'
> "compile farm" https://portal.cfarm.net/
> We have to rely on whatever the aix machines there provide. They're not
> particularly plentiful resource-wise either.

To be fair, those OSUOSL machines are donated by IBM:

https://osuosl.org/services/powerdev/

It's just that they seem to be mostly focused on supporting Linux on
POWER, with only a couple of AIX hosts (partitions/virtual machines?)
made available via portal.cfarm.net, and they only very recently added
a modern AIX 7.3 host. That's cfarm119, upgraded in September-ish,
long after many threads on this list that between-the-lines threatened
to drop support.

> This is generally one of the big issues with AIX support. There are other
> niche-y OSs that don't have a lot of users, e.g. openbsd, but in contrast to
> AIX I can just start an openbsd VM within a few minutes and iron out whatever
> portability issue I'm dealing with.

Yeah.  It is a known secret that you can run AIX inside Qemu/kvm (it
appears that IBM has made changes to it to make that possible, because
earlier AIX versions didn't like Qemu's POWER emulation or
virtualisation, there are blog posts about it), but IBM doesn't
actually make the images available to non-POWER-hardware owners (you
need a serial number).  If I were an OS vendor and wanted developers
to target my OS for free, at the very top of my TODO list I would
have: provide an easy to use image for developers to be able to spin
something up in minutes and possibly even use in CI systems.  That's
the reason I can fix any minor portability issue on Linux, illumos,
*BSD quickly and Windows with only moderate extra pain.  Even Oracle
knows this, see Solaris CBE.

> > We want to make a note that postgres is used extensively in our IBM product
> > and is being exploited by multiple customers.
>
> To be blunt: Then it'd have been nice to see some investment in that before
> now. Both on the code level and the infrastructure level (i.e. access to
> machines etc).

In the AIX space generally, there were even clues that funding had
been completely cut even for packaging PostgreSQL.  I was aware of two
packaging projects (not sure how they were related):

1.  The ATOS packaging group, who used to show up on our mailing lists
and discuss code changes, which announced it was shutting down:

https://github.com/power-devops/bullfreeware

2.  And last time I looked a few months back, the IBM AIX Toolbox
packaging project only had PostgreSQL 10 or 11 packages, already out
of support by us, meaning that their maintainer had given up, too:

https://www.ibm.com/support/pages/aix-toolbox-open-source-software-downloads-alpha

However I see that recently (last month?) someone has added PostgreSQL
15, so something has only just reawoken there?

There are quite a lot of threads about AIX problems, but they are
almost all just us non-AIX-users trying to unbreak stupid stuff on the
build farm, which at some points began to seem distinctly quixotic:
chivalric hackers valiantly trying to keep the entire Unix family tree
working even though we don't remember why and the versions involved are
out of support even by the vendor.  Of the three old giant commercial
Unixes, HP-UX was dropped without another mention (it really was a
windmill after all), Solaris is somehow easier to deal with (I could
guess it's because it influenced Linux and BSD so much, ELF and linker
details spring to mind), while AIX fails on every dimension:
unrepresented by users, lacking in build farm, unavailable to
non-customers, and unusual among Unixen.




RE: Disallow changing slot's failover option in transaction block

2024-04-18 Thread Zhijie Hou (Fujitsu)
On Thursday, April 18, 2024 1:52 PM Amit Kapila  wrote:
> 
> On Tue, Apr 16, 2024 at 5:06 PM shveta malik 
> wrote:
> >
> > On Tue, Apr 16, 2024 at 1:45 PM Hayato Kuroda (Fujitsu)
> >  wrote:
> > >
> > > Dear Hou,
> > >
> > > > Kuroda-San reported an issue off-list that:
> > > >
> > > > If user execute ALTER SUBSCRIPTION SET (failover) command inside a
> > > > txn block and rollback, only the subscription option change can be
> > > > rolled back, while the replication slot's failover change is preserved.
> > > >
> > > > This is because ALTER SUBSCRIPTION SET (failover) command
> > > > internally executes the replication command ALTER_REPLICATION_SLOT
> > > > to change the replication slot's failover property, but this
> > > > replication command execution cannot be rollback.
> > > >
> > > > To fix it, I think we can prevent user from executing ALTER
> > > > SUBSCRIPTION set
> > > > (failover) inside a txn block, which is also consistent to the
> > > > ALTER SUBSCRIPTION REFRESH/DROP SUBSCRIPTION command. Attach
> a
> > > > small patch to address this.
> > >
> > > Thanks for posting the patch, the fix is same as my expectation.
> > > Also, should we add the restriction to the doc? I feel [1] can be updated.
> >
> > +1.
> >
> > Similar to ALTER SUB, CREATE SUB also needs to be fixed. This is
> > because we call alter_replication_slot in CREATE SUB as well, for the
> > case when slot_name is provided and create_slot=false. But the tricky
> > part is we call alter_replication_slot() when creating subscription
> > for both failover=true and false. That means if we want to fix it on
> > the similar line of ALTER SUB, we have to disallow user from executing
> > the CREATE SUBSCRIPTION (slot_name = xx) in a txn block, which seems
> > to break some existing use cases. (previously, user can execute such a
> > command inside a txn block).
> >
> > So, we need to think if there are better ways to fix it.  After
> > discussion with Hou-San offlist, here are some ideas:
> >
> > 1. do not alter replication slot's failover option when CREATE
> > SUBSCRIPTION   WITH failover=false. This means we alter replication
> > slot only when failover is set to true. And thus we can disallow
> > CREATE SUB WITH (slot_name =xx, failover=true, create_slot=false)
> > inside a txn block.
> >
> > This option allows user to run CREATE-SUB(create_slot=false) with
> > failover=false in txn block like earlier. But on the downside, it
> > makes the behavior inconsistent for otherwise simpler option like
> > failover,  i.e. with failover=true, CREATE SUB is not allowed in txn
> > block while with failover=false, it is allowed. It makes it slightly
> > complex to be understood by user.
> >
> > 2. let's not disallow CREATE SUB in txn block as earlier, just don't
> > perform internal alter-failover during CREATE SUB for existing
> > slots(create_slot=false, slot_name=xx)  i.e. when create_slot is
> > false, we will ignore failover parameter of CREATE SUB and it is
> > user's responsibility to set it appropriately using ALTER SUB cmd. For
> > create_slot=true, the behaviour of CREATE-SUB remains same as earlier.
> >
> > This option does not add new restriction for CREATE SUB wrt txn block.
> > In context of failover with create_slot=false, we already have a
> > similar restriction (documented one) for ALTER SUB, i.e. with 'ALTER
> > SUBSCRIPTION SET(slot_name = new)', user needs to alter the new slot's
> > failover by himself. CREATE SUB can also be documented in a similar way.
> > This seems simpler to be understood considering existing ALTER SUB's
> > behavior as well. Plus, this will make CREATE-SUB code slightly
> > simpler and thus easily manageable.
> >
> 
> +1 for option 2 as it sounds logical to me and consistent with ALTER
> SUBSCRIPTION.

+1.

Here is V2 patch which includes the changes for CREATE SUBSCRIPTION as
suggested. Since we don't connect pub to alter slot when (create_slot=false)
anymore, the restriction that disallows failover=true when connect=false is
also removed.
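
To illustrate option 2's behavior with a hedged sketch (object names
below are made up; exact syntax per the CREATE/ALTER SUBSCRIPTION docs):

    -- still allowed in a transaction block: with create_slot = false,
    -- no ALTER_REPLICATION_SLOT is sent to the publisher, and failover
    -- only affects the subscription's own catalog state
    BEGIN;
    CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres'
        PUBLICATION pub1
        WITH (slot_name = 'slot1', create_slot = false);
    COMMIT;

    -- aligning the slot's failover property is then the user's
    -- responsibility, outside a transaction block
    ALTER SUBSCRIPTION sub1 SET (failover = true);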

Best Regards,
Hou zj


v2-0001-Fix-the-handling-of-failover-option-in-subscripti.patch


Re: pg_combinebackup does not detect missing files

2024-04-18 Thread David Steele

On 4/19/24 00:50, Robert Haas wrote:
> On Wed, Apr 17, 2024 at 7:09 PM David Steele  wrote:
>> Fair enough. I accept that your reasoning is not random, but I'm still
>> not very satisfied that the user needs to run a separate and rather
>> expensive process to do the verification when pg_combinebackup already
>> has the necessary information at hand. My guess is that most users will
>> elect to skip verification.
>
> I think you're probably right that a lot of people will skip it; I'm
> just less convinced than you are that it's a bad thing. It's not a
> *great* thing if people skip it, but restore time is actually just
> about the worst time to find out that you have a problem with your
> backups. I think users would be better served by verifying stored
> backups periodically when they *don't* need to restore them.


Agreed, running verify regularly is a good idea, but in my experience 
most users are only willing to run verify once they suspect (or know) 
there is an issue. It's a pretty expensive process depending on how many 
backups you have and where they are stored.


> Also, saying that we have all of the information that we need to do
> the verification is only partially true:
>
> - we do have to parse the manifest anyway, but we don't have to
> compute checksums anyway, and I think that cost can be significant
> even for CRC-32C and much more significant for any of the SHA variants
>
> - we don't need to read all of the files in all of the backups. if
> there's a newer full, the corresponding file in older backups, whether
> full or incremental, need not be read
>
> - incremental files other than the most recent only need to be read to
> the extent that we need their data; if some of the same blocks have
> been changed again, we can economize
>
> How much you save because of these effects is pretty variable. Best
> case, you have a 2-backup chain with no manifest checksums, and all
> verification will have to do that you wouldn't otherwise need to do is
> walk each older directory tree in toto and cross-check which files
> exist against the manifest. That's probably cheap enough that nobody
> would be too fussed. Worst case, you have a 10-backup (or whatever)
> chain with SHA512 checksums and, say, a 50% turnover rate. In that
> case, I think having verification happen automatically could be a
> pretty major hit, both in terms of I/O and CPU. If your database is
> 1TB, it's ~5.5TB of read I/O (because one 1TB full backup and 9 0.5TB
> incrementals) instead of ~1TB of read I/O, plus the checksumming.
>
> Now, obviously you can still feel that it's totally worth it, or that
> someone in that situation shouldn't even be using incremental backups,
> and it's a value judgement, so fair enough. But my guess is that the
> efforts that this implementation makes to minimize the amount of I/O
> required for a restore are going to be important for a lot of people.


Sure -- pg_combinebackup would only need to verify the data that it 
uses. I'm not suggesting that it should do an exhaustive verify of every 
single backup in the chain. Though I can see how it sounded that way 
since with pg_verifybackup that would pretty much be your only choice.


The beauty of doing verification in pg_combinebackup is that it can do a 
lot less than running pg_verifybackup against every backup but still get 
a valid result. All we care about is that the output is correct -- if 
there is corruption in an unused part of an earlier backup 
pg_combinebackup doesn't need to care about that.


As far as I can see, pg_combinebackup already checks most of the boxes. 
The only thing I know that it can't do is detect missing files and that 
doesn't seem like too big a thing to handle.


Regards,
-David




Re: Disallow changing slot's failover option in transaction block

2024-04-18 Thread Michael Paquier
On Thu, Apr 18, 2024 at 11:22:24AM +0530, Amit Kapila wrote:
> +1 for option 2 as it sounds logical to me and consistent with ALTER
> SUBSCRIPTION. BTW, IIUC, you are referring to: "When altering the
> slot_name, the failover and two_phase property values of the named
> slot may differ from the counterpart failover and two_phase parameters
> specified in the subscription. When creating the slot, ensure the slot
> properties failover and two_phase match their counterpart parameters
> of the subscription." in docs [1], right?

FWIW, I'd also favor option 2, mostly on consistency grounds as it
would offer a better user experience.  On top of that, you're saying
that it may lead to some simplifications in the CREATE path.  Without a
patch, it's hard to tell, though.

As far as I can see, this is not tracked as an open item and it should
be.  So I have added one.
--
Michael




Re: Support a wildcard in backtrace_functions

2024-04-18 Thread Michael Paquier
On Thu, Apr 18, 2024 at 12:21:56PM +0200, Jelte Fennema-Nio wrote:
> On Thu, 18 Apr 2024 at 09:02, Michael Paquier  wrote:
>> On Fri, Apr 12, 2024 at 09:36:36AM +0900, Michael Paquier wrote:
>> log_backtrace speaks a bit more to me as a name for this stuff because
>> it logs a backtrace.  Now, there is consistency on HEAD as well
>> because these GUCs are all prefixed with "backtrace_".  Would
>> something like a backtrace_mode where we have an enum rather than a
>> boolean be better?
> 
> I guess it depends what we want consistency with. If we want naming
> consistency with all other LOGGING_WHAT GUCs or if we want naming
> consistency with the current backtrace_functions GUC. I personally
> like log_backtrace slightly better, but I don't have a super strong
> opinion on this either. Another option could be plain "backtrace".

"backtrace" is too generic IMO.  I'd append a "mode" as an effect of
backtrace_functions, which is also a developer option, and has been
around for a couple of releases now.

>> One thing would be to redesign the existing GUC as
>> having two values on HEAD as of:
>> - "none", to log nothing.
>> - "internal", to log backtraces for internal errors.
>
> If we go for backtrace_mode or backtrace, then I think I'd prefer
> "disabled"/"off" and "internal_error" for these values.

"internal_error" as a value sounds fine to me, that speaks more than
just "internal".  "off" rather that "none" or "disabled", less so,
because it requires more enum entries to map with the boolean values
that could be expected from it.  "disabled" would be mostly a first in
the GUCs, icu_validation_level being the first one using it, so I'd
rather choose "none" over "disabled".  No strong preference on this
one, TBH, but as we're bike-shedding that..
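
For the archives, a hypothetical sketch of how such an enum GUC might
end up being set (the name and values are exactly what is being
bike-shed here; nothing is committed):

    # hypothetical developer option, names not settled
    backtrace_mode = 'internal_error'   # or 'none'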
--
Michael




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-18 Thread Dmitry Koval

Hi!

18.04.2024 19:00, Alexander Lakhin wrote:
> leaves a strange constraint:
> \d+ t*
>    Table "public.tp_0"
> ...
> Not-null constraints:
>      "merge-16385-26BCB0-tmp_i_not_null" NOT NULL "i"


Thanks!
Attached fix (with test) for this case.
The patch should be applied after patches
v6-0001- ... .patch ... v6-0004- ... .patch

--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com

From 58e4b7fb1d3b15cdf1c742c28690392dda34915d Mon Sep 17 00:00:00 2001
From: Koval Dmitry 
Date: Fri, 19 Apr 2024 01:57:49 +0300
Subject: [PATCH v6] Fix

---
 src/backend/commands/tablecmds.c  | 49 ++-
 src/test/regress/expected/partition_merge.out | 13 -
 src/test/regress/sql/partition_merge.sql  |  8 ++-
 3 files changed, 33 insertions(+), 37 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 72874295cb..8985747180 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -21508,9 +21508,6 @@ ATExecMergePartitions(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	ListCell   *listptr;
 	List	   *mergingPartitionsList = NIL;
 	Oid			defaultPartOid;
-	char		tmpRelName[NAMEDATALEN];
-	RangeVar   *mergePartName = cmd->name;
-	bool		isSameName = false;
 
 	/*
 	 * Lock all merged partitions, check them and create list with partitions
@@ -21532,8 +21529,22 @@ ATExecMergePartitions(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	 * function transformPartitionCmdForMerge().
 	 */
 	if (equal(name, cmd->name))
+	{
 		/* One new partition can have the same name as merged partition. */
-		isSameName = true;
+		char		tmpRelName[NAMEDATALEN];
+
+		/* Generate temporary name. */
+		sprintf(tmpRelName, "merge-%u-%X-tmp", RelationGetRelid(rel), MyProcPid);
+
+		/* Rename partition. */
+		RenameRelationInternal(RelationGetRelid(mergingPartition),
+							   tmpRelName, false, false);
+		/*
+		 * We must bump the command counter to make the new partition tuple
+		 * visible for rename.
+		 */
+		CommandCounterIncrement();
+	}
 
 	/* Store a next merging partition into the list. */
 	mergingPartitionsList = lappend(mergingPartitionsList,
@@ -21553,16 +21564,8 @@ ATExecMergePartitions(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		DetachPartitionFinalize(rel, mergingPartition, false, defaultPartOid);
 	}
 
-	/* Create table for new partition, use partitioned table as model. */
-	if (isSameName)
-	{
-		/* Create partition table with generated temporary name. */
-		sprintf(tmpRelName, "merge-%u-%X-tmp", RelationGetRelid(rel), MyProcPid);
-		mergePartName = makeRangeVar(cmd->name->schemaname, tmpRelName, -1);
-	}
-
 	newPartRel = createPartitionTable(rel,
-									  mergePartName,
+									  cmd->name,
 									  makeRangeVar(get_namespace_name(RelationGetNamespace(rel)),
 												   RelationGetRelationName(rel), -1),
 									  context);
@@ -21588,26 +21591,6 @@ ATExecMergePartitions(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	}
 	list_free(mergingPartitionsList);
 
-	/* Rename new partition if it is needed. */
-	if (isSameName)
-	{
-		/*
-		 * We must bump the command counter to make the new partition tuple
-		 * visible for rename.
-		 */
-		CommandCounterIncrement();
-
-		/* Rename partition. */
-		RenameRelationInternal(RelationGetRelid(newPartRel),
-							   cmd->name->relname, false, false);
-
-		/*
-		 * Bump the command counter to make the tuple of renamed partition
-		 * visible for attach partition operation.
-		 */
-		CommandCounterIncrement();
-	}
-
 	/*
 	 * Attach a new partition to the partitioned table. wqueue = NULL:
 	 * verification for each cloned constraint is not needed.
diff --git a/src/test/regress/expected/partition_merge.out 

Re: pg_combinebackup fails on file named INCREMENTAL.*

2024-04-18 Thread David Steele

On 4/19/24 01:10, Robert Haas wrote:
> On Wed, Apr 17, 2024 at 7:56 PM David Steele  wrote:
>> Thanks! I've tested this and it works as advertised.
>>
>> Ideally I'd want an error on backup if there is a similar file in any
>> data directories that would cause an error on combine, but I admit that
>> it is vanishingly rare for users to put files anywhere but the root of
>> PGDATA, so this approach seems OK to me.
>
> OK, committed. Thanks for the review.


Excellent, thank you!

Regards,
-David




Re: plenty code is confused about function level static

2024-04-18 Thread Michael Paquier
On Thu, Apr 18, 2024 at 07:56:35PM +0200, Peter Eisentraut wrote:
> On 18.04.24 19:11, Andres Freund wrote:
>> Thoughts about when to apply these? Arguably they're fixing mildly broken
>> code, making it appropriate to fix in 17, but it's also something that we
>> could end up fixing for a while...
> 
> Yeah, let's keep these for later.  They are not regressions, and there is no
> end in sight yet.  I have some other related stuff queued up, so if we're
> going to start adjusting these kinds of things now, it would open a can of
> worms.

This is a set of optimizations for stuff that has accumulated across
the years in various code paths, so I'd vote on the side of caution
and wait until v18 opens for business.
--
Michael




Re: cfbot is failing all tests on FreeBSD/Meson builds

2024-04-18 Thread Thomas Munro
On Thu, Feb 8, 2024 at 3:53 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > On Tue, Jan 30, 2024 at 5:06 PM Tom Lane  wrote:
> >> Thomas Munro  writes:
> >>> Ahh, there is one: https://github.com/cpan-authors/IO-Tty/issues/38
>
> >> Just for the archives' sake: I hit this today on a fresh install
> >> of FreeBSD 14.0, which has pulled in p5-IO-Tty-1.18.  Annoying...
>
> > FWIW here's what I did to downgrade:
>
> Thanks for the recipe --- this worked for me, although I noticed
> it insisted on installing perl5.34-5.34.3_3 alongside 5.36.
> Doesn't seem to be a problem though --- the main perl installation
> is still 5.36.

Looks like CI is broken in this way again, as of ~13 hours ago.
Looking into that...

> Also, I'm not entirely convinced that the above-cited issue report is
> complaining about the same thing that's biting us.  The reported error
> messages are completely different.

You could be right about that.  It seems there was a clash between an
upstream commit and a patch in FBSD's port tree, which has just been
removed:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276535

So perhaps it's time for me to undo what I did before...  looking now.




Re: Solaris tar issues, or other reason why margay fails 010_pg_basebackup?

2024-04-18 Thread Thomas Munro
On Fri, Apr 19, 2024 at 12:57 AM Marcel Hofstetter
 wrote:
> SKIP_READLINE_TESTS works. margay is now green again.

Great!  FTR there was a third thing revealed by margay since you
enabled the TAP tests: commit e2a23576.

I would guess that the best chance of getting the readline stuff to
actually work would be to interest someone who hacks on
IPC::Run-and-related-stuff (*cough* Noah *cough*) and who has Solaris
access to look at that... I would guess it needs a one-line fix
relating to raw/cooked behaviour, but as the proverbial mechanic said,
most of the fee is for knowing where to hit it...




Re: Popcount optimization using AVX512

2024-04-18 Thread Nathan Bossart
On Thu, Apr 18, 2024 at 10:11:08PM +, Devulapalli, Raghuveer wrote:
>> On that note, is it necessary to also check for avx512f?  At the moment,
>> we are assuming that's supported if the other AVX-512 instructions are
>> available.
> 
> No, it's not needed. There are no CPUs with avx512bw/avx512popcnt
> without avx512f.  Unfortunately though, avx512popcnt does not mean
> avx512bw (I think the deprecated Xeon Phi processors fall in this
> category) which is why we need both.

Makes sense, thanks.  I'm planning to commit this fix sometime early next
week.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: brininsert optimization opportunity

2024-04-18 Thread Tomas Vondra
Hi,

Here are two patches to deal with this open item. 0001 is a trivial fix of
typos and wording; I moved it into a separate commit for clarity. 0002
does the actual fix - adding the index_insert_cleanup(). It's 99% the
patch Alvaro shared some time ago, with only some minor formatting
tweaks by me.

I've also returned to this comment of Alvaro's:

> Lastly, I kinda disagree with the notion that only some of the callers
> of aminsert should call aminsertcleanup, even though btree doesn't
> have an aminsertcleanup and thus it can't affect TOAST or catalogs.

which was a reaction to my earlier statement about places calling
index_insert():

> There's a call in toast_internals.c, but that seems OK because that
> only deals with btree indexes (and those don't need any cleanup). The
> same logic applies to unique_key_recheck(). The rest goes through
> execIndexing.c, which should do the cleanup in ExecCloseIndices().

I think Alvaro is right, so I went through all index_insert() callers
and checked which need the cleanup. Luckily there's not that many of
them, only 5 in total call index_insert() directly:

1) toast_save_datum (src/backend/access/common/toast_internals.c)

  This is safe, because the index_insert() passes indexInfo=NULL, so
  there can't possibly be any cache. If we ever decide to pass a valid
  indexInfo, we can add the cleanup, now it seems pointless.

  Note: If someone created a BRIN index on a TOAST table, that'd already
  crash, because BRIN blindly dereferences the indexInfo. Maybe that
  should be fixed, but we don't support CREATE INDEX on TOAST tables.

2) heapam_index_validate_scan (src/backend/access/heap/heapam_handler.c)

  Covered by the committed fix, adding cleanup to validate_index.

3) CatalogIndexInsert (src/backend/catalog/indexing.c)

  Covered by all callers also calling CatalogCloseIndexes, which in turn
  calls ExecCloseIndices and cleanup.

4) unique_key_recheck (src/backend/commands/constraint.c)

  This seems like the only place missing the cleanup call.

5) ExecInsertIndexTuples (src/backend/executor/execIndexing.c)

  Should be covered by ExecCloseIndices, called after the insertions.


So it seems only (4) unique_key_recheck needs the extra call (it can't
really happen higher because the indexInfo is a local variable). So the
0002 patch adds the call.
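
To make that concrete, the shape of the missing call is roughly the
following (a sketch based on my reading, not the exact committed hunk;
the variable names are approximate):

    /* in unique_key_recheck(), after the index_insert() call */
    index_insert(indexRel, values, isnull, &checktid,
                 trigdata->tg_relation, UNIQUE_CHECK_EXISTING,
                 false, indexInfo);

    /* let the AM free anything it cached in indexInfo->ii_AmCache */
    index_insert_cleanup(indexRel, indexInfo);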

The patch also adds a test for this (or rather tweaks an existing one).


It's a bit too late for me to push this now, I'll do so early tomorrow.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From 0f89677b99b4b70ddfcc6c2cd08f433584bf65aa Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Thu, 18 Apr 2024 22:22:37 +0200
Subject: [PATCH 1/2] Fix a couple typos in BRIN code

Typos introduced by commits c1ec02be1d79, b43757171470 and dae761a87eda.

Author: Alvaro Herrera
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/202401091043.e3nrqiad6gb7@alvherre.pgsql
---
 doc/src/sgml/indexam.sgml  |  2 +-
 src/backend/access/brin/brin.c | 10 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 76ac0fcddd7..18cf23296f2 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -370,7 +370,7 @@ aminsert (Relation indexRelation,
initially). After the index insertions complete, the memory will be freed
automatically. If additional cleanup is required (e.g. if the cache contains
buffers and tuple descriptors), the AM may define an optional function
-   <function>indexinsertcleanup</function>, called before the memory is released.
+   <function>aminsertcleanup</function>, called before the memory is released.
   
 
   
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 32722f0961b..4f708bba658 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -222,7 +222,7 @@ static bool add_values_to_range(Relation idxRel, BrinDesc *bdesc,
 BrinMemTuple *dtup, const Datum *values, const bool *nulls);
 static bool check_null_keys(BrinValues *bval, ScanKey *nullkeys, int nnullkeys);
 static void brin_fill_empty_ranges(BrinBuildState *state,
-   BlockNumber prevRange, BlockNumber maxRange);
+   BlockNumber prevRange, BlockNumber nextRange);
 
 /* parallel index builds */
 static void _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
@@ -1151,8 +1151,8 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	 *
 	 * XXX plan_create_index_workers makes the number of workers dependent on
 	 * maintenance_work_mem, requiring 32MB for each worker. That makes sense
-	 * for btree, but not for BRIN, which can do away with much less memory.
-	 * So maybe make that somehow less strict, optionally?
+	 * for btree, but not for BRIN, which can do with much less memory. So
+	 * maybe make that somehow less strict, optionally?
 	 */
 	if (indexInfo->ii_ParallelWorkers > 0)
 		

RE: Popcount optimization using AVX512

2024-04-18 Thread Devulapalli, Raghuveer
> On that note, is it necessary to also check for avx512f?  At the moment, we 
> are assuming that's supported if the other AVX-512 instructions are available.

No, it's not needed. There are no CPUs with avx512bw/avx512popcnt without 
avx512f.  Unfortunately though, avx512popcnt does not mean avx512bw (I think 
the deprecated Xeon Phi processors fall in this category) which is why we need 
both. 




Re: Popcount optimization using AVX512

2024-04-18 Thread Nathan Bossart
On Thu, Apr 18, 2024 at 09:29:55PM +, Devulapalli, Raghuveer wrote:
> (1) Shouldn't it be: return (_xgetbv(0) & 0xe6) == 0xe6; ? Otherwise
> zmm_regs_available() will return false..

Yes, that's a mistake.  I fixed that in v3.

> (2) Nitpick: avx512_popcnt_available and avx512_bw_available() run the
> same cpuid leaf. You could combine them into one to avoid running cpuid
> twice. My apologies, I should have mentioned this before..

Good call.  The byte-and-word instructions were a late addition to the
patch, so I missed this originally.

On that note, is it necessary to also check for avx512f?  At the moment, we
are assuming that's supported if the other AVX-512 instructions are
available.
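
For the archives, my reading of the 0xe6 mask being compared against
XGETBV (treat this as a reviewer's note, not part of the patch): it
requires XCR0 to have the SSE/AVX state bits plus all three AVX-512
state bits enabled:

    /* XCR0 state-enable bits, per the Intel SDM (sketch for reference) */
    #define XCR0_SSE        (1 << 1)    /* XMM state */
    #define XCR0_AVX        (1 << 2)    /* YMM state */
    #define XCR0_OPMASK     (1 << 5)    /* AVX-512 k-registers */
    #define XCR0_ZMM_HI256  (1 << 6)    /* upper halves of ZMM0-15 */
    #define XCR0_HI16_ZMM   (1 << 7)    /* ZMM16-31 */
    /* 0xe6 == XCR0_SSE | XCR0_AVX | XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM */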

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From e04c348eb389c6aa1597ac35d57b5e7ae7075381 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 18 Apr 2024 15:57:56 -0500
Subject: [PATCH v3 1/1] osxsave

---
 src/port/pg_popcount_avx512_choose.c | 80 
 1 file changed, 47 insertions(+), 33 deletions(-)

diff --git a/src/port/pg_popcount_avx512_choose.c b/src/port/pg_popcount_avx512_choose.c
index ae3fa3d306..b37107803a 100644
--- a/src/port/pg_popcount_avx512_choose.c
+++ b/src/port/pg_popcount_avx512_choose.c
@@ -34,39 +34,13 @@
 #ifdef TRY_POPCNT_FAST
 
 /*
- * Returns true if the CPU supports the instructions required for the AVX-512
- * pg_popcount() implementation.
+ * Does CPUID say there's support for XSAVE instructions?
  */
-bool
-pg_popcount_avx512_available(void)
+static inline bool
+xsave_available(void)
 {
 	unsigned int exx[4] = {0, 0, 0, 0};
 
-	/* Does CPUID say there's support for AVX-512 popcount instructions? */
-#if defined(HAVE__GET_CPUID_COUNT)
-	__get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]);
-#elif defined(HAVE__CPUIDEX)
-	__cpuidex(exx, 7, 0);
-#else
-#error cpuid instruction not available
-#endif
-	if ((exx[2] & (1 << 14)) == 0)	/* avx512-vpopcntdq */
-		return false;
-
-	/* Does CPUID say there's support for AVX-512 byte and word instructions? */
-	memset(exx, 0, sizeof(exx));
-#if defined(HAVE__GET_CPUID_COUNT)
-	__get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]);
-#elif defined(HAVE__CPUIDEX)
-	__cpuidex(exx, 7, 0);
-#else
-#error cpuid instruction not available
-#endif
-	if ((exx[1] & (1 << 30)) == 0)	/* avx512-bw */
-		return false;
-
-	/* Does CPUID say there's support for XSAVE instructions? */
-	memset(exx, 0, sizeof(exx));
 #if defined(HAVE__GET_CPUID)
 	__get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
 #elif defined(HAVE__CPUID)
@@ -74,15 +48,55 @@ pg_popcount_avx512_available(void)
 #else
 #error cpuid instruction not available
 #endif
-	if ((exx[2] & (1 << 26)) == 0)	/* xsave */
-		return false;
+	return (exx[2] & (1 << 27)) != 0;	/* osxsave */
+}
 
-	/* Does XGETBV say the ZMM registers are enabled? */
+/*
+ * Does XGETBV say the ZMM registers are enabled?
+ *
+ * NB: Caller is responsible for verifying that xsave_available() returns true
+ * before calling this.
+ */
+static inline bool
+zmm_regs_available(void)
+{
 #ifdef HAVE_XSAVE_INTRINSICS
-	return (_xgetbv(0) & 0xe0) != 0;
+	return (_xgetbv(0) & 0xe6) == 0xe6;
 #else
 	return false;
 #endif
 }
 
+/*
+ * Does CPUID say there's support for AVX-512 popcount and byte-and-word
+ * instructions?
+ */
+static inline bool
+avx512_popcnt_available(void)
+{
+	unsigned int exx[4] = {0, 0, 0, 0};
+
+#if defined(HAVE__GET_CPUID_COUNT)
+	__get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]);
+#elif defined(HAVE__CPUIDEX)
+	__cpuidex(exx, 7, 0);
+#else
+#error cpuid instruction not available
+#endif
+	return (exx[2] & (1 << 14)) != 0 && /* avx512-vpopcntdq */
+		(exx[1] & (1 << 30)) != 0;	/* avx512-bw */
+}
+
+/*
+ * Returns true if the CPU supports the instructions required for the AVX-512
+ * pg_popcount() implementation.
+ */
+bool
+pg_popcount_avx512_available(void)
+{
+	return xsave_available() &&
+		zmm_regs_available() &&
+		avx512_popcnt_available();
+}
+
 #endif			/* TRY_POPCNT_FAST */
-- 
2.25.1



Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-04-18 Thread Jelte Fennema-Nio
On Thu, 18 Apr 2024 at 23:36, Jelte Fennema-Nio  wrote:
> To clarify: My proposed approach is to use a protocol extension
> parameter to enable the new messages that the server can send
> (like Peter is doing now). And **in addition to that** gate the new
> Bind format type behind a feature switch.

ugh, correction: gate the new Bind format type behind a **protocol bump**




Re: Idea Feedback: psql \h misses -> Offers Links?

2024-04-18 Thread Dagfinn Ilmari Mannsåker
Kirk Wolak  writes:

> On Thu, Apr 18, 2024 at 2:37 PM Peter Eisentraut 
> wrote:
>
>> On 17.04.24 19:47, Kirk Wolak wrote:
>> > *Example:*
>> > \h current_setting
>> > No help available for "current_setting".
>> > Try \h with no arguments to see available help.
>> >
>> > https://www.postgresql.org/search/?u=%2Fdocs%2F16%2F&q=current_setting
>> > 
>>
>> One problem is that this search URL does not actually produce any useful
>> information about current_setting.
>
> I see what you mean, but doesn't that imply our web search feature is
> weak?  That's the full name of an existing function, and it's in the index.
> But it cannot be found if searched from the website?

While I do think we could do a better job of providing links directly to
the documentation of functions and config parameters, I wouldn't say
that the search result is _completely_ useless in this case.  The first
hit is https://www.postgresql.org/docs/16/functions-admin.html, which is
where current_setting() is documented (it's even the first function on
that page, but that's just luck in this case).

- ilmari




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-04-18 Thread Jelte Fennema-Nio
On Thu, 18 Apr 2024 at 22:17, Robert Haas  wrote:
> If we go with Peter's approach, every driver that supports his feature
> will work perfectly, and every driver that doesn't will work exactly
> as it does today. The risk of breaking anything is as near to zero as
> human developers can reasonably hope to achieve. Nobody who doesn't
> care about the feature will have to lift a single finger, today,
> tomorrow, or ever. That's absolutely brilliant.
>
> If we instead go with your approach, then anyone who wants to support
> 3.2 when it materializes will have to also support 3.1, which means
> they have to support this feature.

To clarify: My proposed approach is to use a protocol extension
parameter to enable the new messages that the server can send
(like Peter is doing now). And **in addition to that** gate the new
Bind format type behind a feature switch. There is literally nothing
clients will have to do to "support" that feature (except for
requesting a higher version protocol). Your suggestion of not bumping
the version but still allowing the new format type on version 3.0
doesn't have any advantage afaict, except secretly hiding from any
pooler in the middle that such a format type might be sent.

> Also, even just 3.1 is going to break
> something for somebody. There's just no way that we've left the
> protocol version unchanged for this long and the first change we make
> doesn't cause some collateral damage.

Sure, but the exact same argument holds for protocol extension
parameters. We've never set them, so they are bound to break something
the first time. My whole point is that once we bite that bullet, the
next protocol parameters and protocol version bumps won't cause such
breakage.

> Sure, those are minor downsides in the grand scheme of things. But
> AFAICS the only downside of Peter's approach that you've alleged is
> that doesn't involve bumping the version number. Of course, if bumping
> the version number is an intrinsic good, then no further justification
> is required, but I don't buy that. I do not believe that users or
> maintainers will throw us a pizza party when they find out that we've
> changed the version number. There's no reason for anyone to be happy
> about that for its own sake.

As a connection pooler maintainer I would definitely love it if every
protocol change required either a protocol version parameter or a
protocol version bump. That way I can easily check every release if
the protocol changed by looking at two things, instead of diffing the
protocol docs for some tiny "supposedly irrelevant" change was made.




Re: documentation structure

2024-04-18 Thread Dagfinn Ilmari Mannsåker
Corey Huinker  writes:

>>
>> I haven't dealt with variadic yet, since the two styles are visually
>> different, not just markup (<optional>...</optional> renders as [...]).
>>
>> The two styles for variadic are the what I call caller-style:
>>
>>concat ( val1 "any" [, val2 "any" [, ...] ] )
>>format(formatstr text [, formatarg "any" [, ...] ])
>>
>
> While this style is obviously clumsier for us to compose, it does avoid
> relying on the user understanding what the word variadic means. Searching
> through online documentation of the python *args parameter, the word
> variadic never comes up, the closest they get is "variable length
> argument". I realize that python is not SQL, but I think it's a good point
> of reference for what concepts the average reader is likely to know.

Yeah, we can't expect everyone wanting to call a built-in function to
know how they would define an equivalent one themselves. In that case I
propose marking it up like this:

<function>format</function> (
<parameter>formatstr</parameter> <type>text</type>
<optional>, <parameter>formatarg</parameter> <type>"any"</type>
<optional>, ... </optional></optional> )
<returnvalue>text</returnvalue>


> Looking at the patch, I think it is good, though I'd consider doing some
> indentation for the nested <optional>s to allow the author to do more
> visual tag-matching. The ']'s were sufficiently visually distinct that we
> didn't really need or want nesting, but </optional> is just another tag to
> my eyes in a sea of tags.

The requisite nesting when there are multiple optional parameters makes
it annoying to wrap and indent it "properly" per XML convention, but how
about something like this, with each parameter on a line of its own, and
all the closing </optional> tags on one line?

<function>regexp_substr</function> (
<parameter>string</parameter> <type>text</type>,
<parameter>pattern</parameter> <type>text</type>
<optional>, <parameter>start</parameter> <type>integer</type>
<optional>, <parameter>N</parameter> <type>integer</type>
<optional>, <parameter>flags</parameter> <type>text</type>
<optional>, <parameter>subexpr</parameter> <type>integer</type>
</optional></optional></optional></optional> )
<returnvalue>text</returnvalue>

A lot of functions mostly follow this style, except they tend to put the
first parameter on the same line as the function name, even when that
makes the line overly long. I propose going the other way, with each
parameter on a line of its own, even if the first one would fit after
the function name, unless the whole parameter list fits after the
function name.

Also, when there's only one optional argument, or they're independently
optional, not nested, the </optional> tag should go on the same line as
the parameter.

<function>substring</function> (
<parameter>bits</parameter> <type>bit</type>
<optional> FROM <parameter>start</parameter> <type>integer</type> </optional>
<optional> FOR <parameter>count</parameter> <type>integer</type> </optional> )
<returnvalue>bit</returnvalue>


I'm not quite sure what to do with things like json_object which have even
more complex nesting of optional parameters, but I do think the current
200+ character lines are too long.

- ilmari




RE: Popcount optimization using AVX512

2024-04-18 Thread Devulapalli, Raghuveer
> Thanks for the feedback.  I've attached an updated patch.

(1) Shouldn't it be: return (_xgetbv(0) & 0xe6) == 0xe6; ? Otherwise 
zmm_regs_available() will return false. 
(2) Nitpick: avx512_popcnt_available and avx512_bw_available() run the same 
cpuid leaf. You could combine them into one to avoid running cpuid twice. My 
apologies, I should have mentioned this before. 




Re: Idea Feedback: psql \h misses -> Offers Links?

2024-04-18 Thread Kirk Wolak
On Thu, Apr 18, 2024 at 2:37 PM Peter Eisentraut 
wrote:

> On 17.04.24 19:47, Kirk Wolak wrote:
> > *Example:*
> > \h current_setting
> > No help available for "current_setting".
> > Try \h with no arguments to see available help.
> >
> > https://www.postgresql.org/search/?u=%2Fdocs%2F16%2F&q=current_setting
> > 
>
> One problem is that this search URL does not actually produce any useful
> information about current_setting.
>

I see what you mean, but doesn't that imply our web search feature is
weak?  That's the full name of an existing function, and it's in the index.
But it cannot be found if searched from the website?


Re: Cannot find a working 64-bit integer type on Illumos

2024-04-18 Thread Thomas Munro
On Thu, Apr 18, 2024 at 6:09 PM Japin Li  wrote:
> /home/japin/postgres/build/../src/common/config_info.c:198:11: error: 
> comparison of integer expressions of different signedness: 'int' and 'size_t' 
> {aka 'long unsigned int'} [-Werror=sign-compare]
>   198 |  Assert(i == *configdata_len);

Right, PostgreSQL doesn't compile cleanly with the "sign-compare"
warning.  There have been a few threads about that, and someone might
want to think harder about it, but it's a different topic unrelated to
.

> /home/japin/postgres/build/../src/include/lib/simplehash.h:1138:9: error: 
> format '%llu' expects argument of type 'long long unsigned int', but argument 
> 4 has type 'uint64' {aka 'long unsigned int'} [-Werror=format=]

It seems my v1 patch's configure probe for INT64_FORMAT was broken.
In the v2 patch I tried not doing that probe at all, and instead
inviting <inttypes.h> into our world (that's the standardised way to
produce format strings, which has the slight complication that we are
intercepting printf calls...).  I suspect that'll work better for you.
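
For anyone who wants to see what the <inttypes.h> route looks like in
practice, here is a standalone sketch (mine, not from the patch):

    #include <inttypes.h>
    #include <stdio.h>

    int
    main(void)
    {
        int64_t     v = INT64_C(42);

        /* PRId64 expands to the right conversion spec on each platform */
        printf("v = %" PRId64 "\n", v);
        return 0;
    }

That is, PRId64 takes over the job of our configure-probed INT64_FORMAT,
so call sites never have to guess between "%ld" and "%lld".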




Re: improve performance of pg_dump --binary-upgrade

2024-04-18 Thread Nathan Bossart
On Thu, Apr 18, 2024 at 10:33:08PM +0200, Daniel Gustafsson wrote:
> From a read-through they look good, a very nice performance improvement in an
> important path.  I think it would be nice with some comments on the
> BinaryUpgradeClassOids struct (since the code using it is thousands of lines
> away), and a comment on the if (oids == NULL) block explaining the caching.

Added.  Thanks for reviewing!  Unfortunately, this one will have to sit for
a couple months...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 6797e4e1fc1a63f710ed0a409b1337880eb91ad4 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 18 Apr 2024 15:15:19 -0500
Subject: [PATCH v3 1/2] Remove is_index parameter from
 binary_upgrade_set_pg_class_oids().

---
 src/bin/pg_dump/pg_dump.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 6d2f3fdef3..58c77a5e2b 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -324,7 +324,7 @@ static void binary_upgrade_set_type_oids_by_rel(Archive *fout,
 const TableInfo *tbinfo);
 static void binary_upgrade_set_pg_class_oids(Archive *fout,
 			 PQExpBuffer upgrade_buffer,
-			 Oid pg_class_oid, bool is_index);
+			 Oid pg_class_oid);
 static void binary_upgrade_extension_member(PQExpBuffer upgrade_buffer,
 			const DumpableObject *dobj,
 			const char *objtype,
@@ -5394,8 +5394,7 @@ binary_upgrade_set_type_oids_by_rel(Archive *fout,
 
 static void
 binary_upgrade_set_pg_class_oids(Archive *fout,
- PQExpBuffer upgrade_buffer, Oid pg_class_oid,
- bool is_index)
+ PQExpBuffer upgrade_buffer, Oid pg_class_oid)
 {
 	PQExpBuffer upgrade_query = createPQExpBuffer();
 	PGresult   *upgrade_res;
@@ -5444,7 +5443,8 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
 	appendPQExpBufferStr(upgrade_buffer,
 		 "\n-- For binary upgrade, must preserve pg_class oids and relfilenodes\n");
 
-	if (!is_index)
+	if (relkind != RELKIND_INDEX &&
+		relkind != RELKIND_PARTITIONED_INDEX)
 	{
 		appendPQExpBuffer(upgrade_buffer,
 		  "SELECT pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('%u'::pg_catalog.oid);\n",
@@ -11864,7 +11864,7 @@ dumpCompositeType(Archive *fout, const TypeInfo *tyinfo)
 		binary_upgrade_set_type_oids_by_type_oid(fout, q,
  tyinfo->dobj.catId.oid,
  false, false);
-		binary_upgrade_set_pg_class_oids(fout, q, tyinfo->typrelid, false);
+		binary_upgrade_set_pg_class_oids(fout, q, tyinfo->typrelid);
 	}
 
 	qtypname = pg_strdup(fmtId(tyinfo->dobj.name));
@@ -15998,7 +15998,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
 
 		if (dopt->binary_upgrade)
 			binary_upgrade_set_pg_class_oids(fout, q,
-			 tbinfo->dobj.catId.oid, false);
+			 tbinfo->dobj.catId.oid);
 
 		appendPQExpBuffer(q, "CREATE VIEW %s", qualrelname);
 
@@ -16100,7 +16100,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
 
 		if (dopt->binary_upgrade)
 			binary_upgrade_set_pg_class_oids(fout, q,
-			 tbinfo->dobj.catId.oid, false);
+			 tbinfo->dobj.catId.oid);
 
 		appendPQExpBuffer(q, "CREATE %s%s %s",
 		  tbinfo->relpersistence == RELPERSISTENCE_UNLOGGED ?
@@ -16990,7 +16990,7 @@ dumpIndex(Archive *fout, const IndxInfo *indxinfo)
 
 		if (dopt->binary_upgrade)
 			binary_upgrade_set_pg_class_oids(fout, q,
-			 indxinfo->dobj.catId.oid, true);
+			 indxinfo->dobj.catId.oid);
 
 		/* Plain secondary index */
 		appendPQExpBuffer(q, "%s;\n", indxinfo->indexdef);
@@ -17244,7 +17244,7 @@ dumpConstraint(Archive *fout, const ConstraintInfo *coninfo)
 
 		if (dopt->binary_upgrade)
 			binary_upgrade_set_pg_class_oids(fout, q,
-			 indxinfo->dobj.catId.oid, true);
+			 indxinfo->dobj.catId.oid);
 
 		appendPQExpBuffer(q, "ALTER %sTABLE ONLY %s\n", foreign,
 		  fmtQualifiedDumpable(tbinfo));
@@ -17653,7 +17653,7 @@ dumpSequence(Archive *fout, const TableInfo *tbinfo)
 	if (dopt->binary_upgrade)
 	{
 		binary_upgrade_set_pg_class_oids(fout, query,
-		 tbinfo->dobj.catId.oid, false);
+		 tbinfo->dobj.catId.oid);
 
 		/*
 		 * In older PG versions a sequence will have a pg_type entry, but v14
-- 
2.25.1

>From b3ae9df69fdbf386d09250f1c225ecd8665141a7 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 17 Apr 2024 22:55:27 -0500
Subject: [PATCH v3 2/2] Improve performance of pg_dump --binary-upgrade.

---
 src/bin/pg_dump/pg_dump.c| 126 ---
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 81 insertions(+), 46 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 58c77a5e2b..b1f06a199b 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -55,6 +55,7 @@
 #include "catalog/pg_trigger_d.h"
 #include "catalog/pg_type_d.h"
 #include "common/connect.h"
+#include "common/int.h"
 

Re: Popcount optimization using AVX512

2024-04-18 Thread Nathan Bossart
On Thu, Apr 18, 2024 at 08:24:03PM +, Devulapalli, Raghuveer wrote:
>> This seems to contradict the note about doing step 3 at any point, and
>> given step 1 is the OSXSAVE check, I'm not following what this means,
>> anyway.
> 
> It is recommended that you run the xgetbv code before you check for cpu
> features avx512-popcnt and avx512-bw. The way it is written now is the
> opposite order. I would also recommend splitting the cpuid feature check
> for avx512popcnt/avx512bw and xgetbv section into separate functions to
> make them modular. Something like:
> 
> static inline
> int check_os_avx512_support(void)
> {
> // (1) run cpuid leaf 1 to check for xgetbv instruction support:
> unsigned int exx[4] = {0, 0, 0, 0};
> __get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
> if ((exx[2] & (1 << 27)) == 0)  /* xsave */
> return false;
> 
> /* Does XGETBV say the ZMM/YMM/XMM registers are enabled? */
> return (_xgetbv(0) & 0xe0) == 0xe0;
> }
> 
>> I'm also wondering if we need to check that (_xgetbv(0) & 0xe6) == 0xe6
>> instead of just (_xgetbv(0) & 0xe0) != 0, as the status of the lower
>> half of some of the ZMM registers is stored in the SSE and AVX state
>> [0].  I don't know how likely it is that 0xe0 would succeed but 0xe6
>> wouldn't, but we might as well make it correct.
> 
> This is correct. It needs to check all the 3 bits (XMM/YMM and ZMM). The
> way it is written now is incorrect.

Thanks for the feedback.  I've attached an updated patch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From d20b19804a17d9f6eab1d40de7e9fb10488ac6b0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 18 Apr 2024 15:57:56 -0500
Subject: [PATCH v2 1/1] osxsave

---
 src/port/pg_popcount_avx512_choose.c | 89 +++-
 1 file changed, 60 insertions(+), 29 deletions(-)

diff --git a/src/port/pg_popcount_avx512_choose.c b/src/port/pg_popcount_avx512_choose.c
index ae3fa3d306..009f94909a 100644
--- a/src/port/pg_popcount_avx512_choose.c
+++ b/src/port/pg_popcount_avx512_choose.c
@@ -34,27 +34,47 @@
 #ifdef TRY_POPCNT_FAST
 
 /*
- * Returns true if the CPU supports the instructions required for the AVX-512
- * pg_popcount() implementation.
+ * Does CPUID say there's support for XSAVE instructions?
  */
-bool
-pg_popcount_avx512_available(void)
+static inline bool
+xsave_available(void)
 {
 	unsigned int exx[4] = {0, 0, 0, 0};
 
-	/* Does CPUID say there's support for AVX-512 popcount instructions? */
-#if defined(HAVE__GET_CPUID_COUNT)
-	__get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]);
-#elif defined(HAVE__CPUIDEX)
-	__cpuidex(exx, 7, 0);
+#if defined(HAVE__GET_CPUID)
+	__get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
+#elif defined(HAVE__CPUID)
+	__cpuid(exx, 1);
 #else
 #error cpuid instruction not available
 #endif
-	if ((exx[2] & (1 << 14)) == 0)	/* avx512-vpopcntdq */
-		return false;
+	return (exx[2] & (1 << 27)) != 0;	/* osxsave */
+}
+
+/*
+ * Does XGETBV say the ZMM registers are enabled?
+ *
+ * NB: Caller is responsible for verifying that xsave_available() returns true
+ * before calling this.
+ */
+static inline bool
+zmm_regs_available(void)
+{
+#ifdef HAVE_XSAVE_INTRINSICS
+	return (_xgetbv(0) & 0xe6) != 0xe6;
+#else
+	return false;
+#endif
+}
+
+/*
+ * Does CPUID say there's support for AVX-512 popcount instructions?
+ */
+static inline bool
+avx512_popcnt_available(void)
+{
+	unsigned int exx[4] = {0, 0, 0, 0};
 
-	/* Does CPUID say there's support for AVX-512 byte and word instructions? */
-	memset(exx, 0, sizeof(exx));
 #if defined(HAVE__GET_CPUID_COUNT)
 	__get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]);
 #elif defined(HAVE__CPUIDEX)
@@ -62,27 +82,38 @@ pg_popcount_avx512_available(void)
 #else
 #error cpuid instruction not available
 #endif
-	if ((exx[1] & (1 << 30)) == 0)	/* avx512-bw */
-		return false;
+	return (exx[2] & (1 << 14)) != 0;	/* avx512-vpopcntdq */
+}
 
-	/* Does CPUID say there's support for XSAVE instructions? */
-	memset(exx, 0, sizeof(exx));
-#if defined(HAVE__GET_CPUID)
-	__get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
-#elif defined(HAVE__CPUID)
-	__cpuid(exx, 1);
+/*
+ * Does CPUID say there's support for AVX-512 byte and word instructions?
+ */
+static inline bool
+avx512_bw_available(void)
+{
+	unsigned int exx[4] = {0, 0, 0, 0};
+
+#if defined(HAVE__GET_CPUID_COUNT)
+	__get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]);
+#elif defined(HAVE__CPUIDEX)
+	__cpuidex(exx, 7, 0);
 #else
 #error cpuid instruction not available
 #endif
-	if ((exx[2] & (1 << 26)) == 0)	/* xsave */
-		return false;
+	return (exx[1] & (1 << 30)) != 0;	/* avx512-bw */
+}
 
-	/* Does XGETBV say the ZMM registers are enabled? */
-#ifdef HAVE_XSAVE_INTRINSICS
-	return (_xgetbv(0) & 0xe0) != 0;
-#else
-	return false;
-#endif
+/*
+ * Returns true if the CPU supports the instructions required for the AVX-512
+ * pg_popcount() implementation.
+ */
+bool
+pg_popcount_avx512_available(void)
+{
+	return xsave_available() &&
+		zmm_regs_available() &&
+		

Re: Cannot find a working 64-bit integer type on Illumos

2024-04-18 Thread Thomas Munro
On Thu, Apr 18, 2024 at 8:47 PM Peter Eisentraut  wrote:
> Maybe this means something like our int64 is long long int but the
> system's int64_t is long int underneath, but I don't see how that would
> matter for the limit macros.

Agreed, so I don't think it's long vs long long (when they have the same width).

I wonder if this comment is a clue:

static char *
inet_net_ntop_ipv6(const u_char *src, int bits, char *dst, size_t size)
{
/*
 * Note that int32_t and int16_t need only be "at least" large enough to
 * contain a value of the specified size.  On some systems, like Crays,
 * there is no such thing as an integer variable with 16 bits. Keep this
 * in mind if you think this function should have been coded to use
 * pointer overlays.  All the world's not a VAX.
 */

I'd seen that claim before somewhere else but I can't recall where.
So there were systems using those names in an ad hoc unspecified way
before C99 nailed this stuff down?  In modern C, int32_t is definitely
an exact width type (but there are other standardised variants like
int_fast32_t to allow for Cray-like systems that would prefer to use a
wider type, ie "at least", 32 bits wide, so I guess that's what
happened to that idea?).

Or perhaps it's referring to worries about the width of char, short,
int or the assumption of two's-complement.  I think if any of that
stuff weren't as assumed we'd have many problems in many places, so
I'm not seeing a problem.  (FTR C23 finally nailed down
two's-complement as a requirement, and although C might not say so,
POSIX says that char is a byte, and our assumption that int = int32_t
is pretty deeply baked into PostgreSQL, so it's almost impossible to
imagine that short has a size other than 16 bits; but these are all
assumptions made by the OLD coding, not by the patch I posted).  In
short, I guess that isn't what was meant.
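
To make the "exact" versus "at least" distinction concrete, a standalone
sketch (not from the patch):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        /*
         * int32_t is exactly 32 bits wherever it exists; int_fast32_t
         * only promises at least 32 bits and is allowed to be wider,
         * which is the latitude a Cray-like implementation would use.
         */
        printf("int32_t: %zu bytes, int_fast32_t: %zu bytes\n",
               sizeof(int32_t), sizeof(int_fast32_t));
        return 0;
    }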




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-18 Thread Justin Pryzby
Here are some additional fixes to docs.
>From 6da8beaa5a2b78e785e5b6519894f8357002d916 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 18 Apr 2024 15:40:44 -0500
Subject: [PATCH] doc review for ALTER TABLE ... SPLIT/MERGE PARTITION

---
 doc/src/sgml/ddl.sgml |  4 ++--
 doc/src/sgml/ref/alter_table.sgml | 22 +++---
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 026bfff70f3..01277b1d327 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4384,7 +4384,7 @@ ALTER INDEX measurement_city_id_logdate_key
 
 
  There is also an option for merging multiple table partitions into
- a single partition using the
+ a single partition using
  ALTER TABLE ... MERGE PARTITIONS.
  This feature simplifies the management of partitioned tables by allowing
  users to combine partitions that are no longer needed as
@@ -4403,7 +4403,7 @@ ALTER TABLE measurement
 
 
  Similarly to merging multiple table partitions, there is an option for
- splitting a single partition into multiple using the
+ splitting a single partition into multiple partitions using
  ALTER TABLE ... SPLIT PARTITION.
  This feature could come in handy when one partition grows too big
  and needs to be split into multiple.  It's important to note that
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index fe36ff82e52..e52cfee840c 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1136,16 +1136,16 @@ WITH ( MODULUS numeric_literal, REM
   If the split partition is a DEFAULT partition, one of the new partitions must be DEFAULT.
   In case one of the new partitions or one of existing partitions is DEFAULT,
   new partitions partition_name1,
-  partition_name2, ... can have spaces
+  partition_name2, ... can have gaps
   between partitions bounds.  If the partitioned table does not have a DEFAULT
   partition, the DEFAULT partition can be defined as one of the new partitions.
  
  
   In case new partitions do not contain a DEFAULT partition and the partitioned table
-  does not have a DEFAULT partition, the following must be true: sum bounds of
+  does not have a DEFAULT partition, the following must be true: the sum bounds of
   new partitions partition_name1,
   partition_name2, ... should be
-  equal to bound of split partition partition_name.
+  equal to the bounds of split partition partition_name.
   One of the new partitions partition_name1,
   partition_name2, ... can have
   the same name as split partition partition_name
@@ -1168,24 +1168,24 @@ WITH ( MODULUS numeric_literal, REM
 
 
  
-  This form merges several partitions into the one partition of the target table.
-  Hash-partitioning is not supported.  If DEFAULT partition is not in the
+  This form merges several partitions of the target table into a single partition.
+  Hash-partitioning is not supported.  If a DEFAULT partition is not in the
   list of partitions partition_name1,
   partition_name2 [, ...]:
   

 
- For range-partitioned tables it is necessary that the ranges
+ For range-partitioned tables, it is necessary that the ranges
  of the partitions partition_name1,
  partition_name2 [, ...] can
- be merged into one range without spaces and overlaps (otherwise an error
+ be merged into one range with neither gaps nor overlaps (otherwise an error
  will be generated).  The combined range will be the range for the partition
  partition_name.
 


 
- For list-partitioned tables the value lists of all partitions
+ For list-partitioned tables, the value lists of all partitions
  partition_name1,
  partition_name2 [, ...] are
  combined and form the list of values of partition
@@ -1193,7 +1193,7 @@ WITH ( MODULUS numeric_literal, REM
 

   
-  If DEFAULT partition is in the list of partitions partition_name1,
+  If a DEFAULT partition is in the list of partitions partition_name1,
   partition_name2 [, ...]:
   

@@ -1204,8 +1204,8 @@ WITH ( MODULUS numeric_literal, REM


 
- For range- and list-partitioned tables the ranges and lists of values
- of the merged partitions can be any.
+ For range- and list-partitioned tables, the ranges and lists of values
+ of the merged partitions can be anything.
 

   
-- 
2.42.0



Re: improve performance of pg_dump --binary-upgrade

2024-04-18 Thread Daniel Gustafsson
> On 18 Apr 2024, at 22:28, Nathan Bossart  wrote:
> 
> On Thu, Apr 18, 2024 at 10:23:01AM -0500, Nathan Bossart wrote:
>> On Thu, Apr 18, 2024 at 09:24:53AM +0200, Daniel Gustafsson wrote:
>>> That does indeed seem like a saner approach.  Since we look up the relkind we
>>> can also remove the is_index parameter to binary_upgrade_set_pg_class_oids
>>> since we already know that without the caller telling us?
>> 
>> Yeah.  It looks like that's been possible since commit 9a974cb, so I can
>> write a prerequisite patch for this.
> 
> Here's a new patch set with this change.

From a read-through they look good, a very nice performance improvement in an
important path.  I think it would be nice with some comments on the
BinaryUpgradeClassOids struct (since the code using it is thousands of lines
away), and a comment on the if (oids == NULL) block explaining the caching.
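
For anyone reading along without the patch, the caching has roughly this
shape -- a simplified sketch assuming pg_dump's usual headers, with
collect_class_oids() and class_oids_cmp() standing in for the real query
and bsearch() comparator:

    typedef struct
    {
        Oid         oid;        /* pg_class OID, sort key */
        char        relkind;
        /* relfilenumber, TOAST details, etc. omitted in this sketch */
    } BinaryUpgradeClassOids;

    static BinaryUpgradeClassOids *oids = NULL; /* sorted by oid */
    static int  noids = 0;

    static const BinaryUpgradeClassOids *
    lookup_class_oids(Archive *fout, Oid pg_class_oid)
    {
        /* First call fetches pg_class data for all relations at once. */
        if (oids == NULL)
            oids = collect_class_oids(fout, &noids);

        /* Later lookups are a local bsearch, not a per-table round trip. */
        return bsearch(&pg_class_oid, oids, noids,
                       sizeof(BinaryUpgradeClassOids), class_oids_cmp);
    }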

--
Daniel Gustafsson





Re: Cannot find a working 64-bit integer type on Illumos

2024-04-18 Thread Thomas Munro
On Thu, Apr 18, 2024 at 8:47 PM Peter Eisentraut  wrote:
> I'm not sure I understand the problem here.  Do you mean that in theory
> a platform's PRId64 could be something other than "l" or "ll"?

Yes.  I don't know why anyone would do that, and the systems I checked
all have the obvious definitions, eg "ld", "lld" etc.  Perhaps it's an
acceptable risk?  It certainly gives us a tidier result.

For discussion, here is a variant that fully embraces <inttypes.h> and
the PRI*64 macros.


v2-0001-Use-int64_t-support-from-stdint.h-and-inttypes.h.patch
Description: Binary data


v2-0002-Remove-traces-of-BeOS.patch
Description: Binary data


Re: improve performance of pg_dump --binary-upgrade

2024-04-18 Thread Nathan Bossart
On Thu, Apr 18, 2024 at 10:23:01AM -0500, Nathan Bossart wrote:
> On Thu, Apr 18, 2024 at 09:24:53AM +0200, Daniel Gustafsson wrote:
>> That does indeed seem like a saner approach.  Since we look up the relkind we
>> can also remove the is_index parameter to binary_upgrade_set_pg_class_oids
>> since we already know that without the caller telling us?
> 
> Yeah.  It looks like that's been possible since commit 9a974cb, so I can
> write a prerequisite patch for this.

Here's a new patch set with this change.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From d74692fe52d3c376d4b60323dd367a36593cd31b Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 18 Apr 2024 15:15:19 -0500
Subject: [PATCH v2 1/2] Remove is_index parameter from
 binary_upgrade_set_pg_class_oids().

---
 src/bin/pg_dump/pg_dump.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 6d2f3fdef3..58c77a5e2b 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -324,7 +324,7 @@ static void binary_upgrade_set_type_oids_by_rel(Archive *fout,
 const TableInfo *tbinfo);
 static void binary_upgrade_set_pg_class_oids(Archive *fout,
 			 PQExpBuffer upgrade_buffer,
-			 Oid pg_class_oid, bool is_index);
+			 Oid pg_class_oid);
 static void binary_upgrade_extension_member(PQExpBuffer upgrade_buffer,
 			const DumpableObject *dobj,
 			const char *objtype,
@@ -5394,8 +5394,7 @@ binary_upgrade_set_type_oids_by_rel(Archive *fout,
 
 static void
 binary_upgrade_set_pg_class_oids(Archive *fout,
- PQExpBuffer upgrade_buffer, Oid pg_class_oid,
- bool is_index)
+ PQExpBuffer upgrade_buffer, Oid pg_class_oid)
 {
 	PQExpBuffer upgrade_query = createPQExpBuffer();
 	PGresult   *upgrade_res;
@@ -5444,7 +5443,8 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
 	appendPQExpBufferStr(upgrade_buffer,
 		 "\n-- For binary upgrade, must preserve pg_class oids and relfilenodes\n");
 
-	if (!is_index)
+	if (relkind != RELKIND_INDEX &&
+		relkind != RELKIND_PARTITIONED_INDEX)
 	{
 		appendPQExpBuffer(upgrade_buffer,
 		  "SELECT pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('%u'::pg_catalog.oid);\n",
@@ -11864,7 +11864,7 @@ dumpCompositeType(Archive *fout, const TypeInfo *tyinfo)
 		binary_upgrade_set_type_oids_by_type_oid(fout, q,
  tyinfo->dobj.catId.oid,
  false, false);
-		binary_upgrade_set_pg_class_oids(fout, q, tyinfo->typrelid, false);
+		binary_upgrade_set_pg_class_oids(fout, q, tyinfo->typrelid);
 	}
 
 	qtypname = pg_strdup(fmtId(tyinfo->dobj.name));
@@ -15998,7 +15998,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
 
 		if (dopt->binary_upgrade)
 			binary_upgrade_set_pg_class_oids(fout, q,
-			 tbinfo->dobj.catId.oid, false);
+			 tbinfo->dobj.catId.oid);
 
 		appendPQExpBuffer(q, "CREATE VIEW %s", qualrelname);
 
@@ -16100,7 +16100,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
 
 		if (dopt->binary_upgrade)
 			binary_upgrade_set_pg_class_oids(fout, q,
-			 tbinfo->dobj.catId.oid, false);
+			 tbinfo->dobj.catId.oid);
 
 		appendPQExpBuffer(q, "CREATE %s%s %s",
 		  tbinfo->relpersistence == RELPERSISTENCE_UNLOGGED ?
@@ -16990,7 +16990,7 @@ dumpIndex(Archive *fout, const IndxInfo *indxinfo)
 
 		if (dopt->binary_upgrade)
 			binary_upgrade_set_pg_class_oids(fout, q,
-			 indxinfo->dobj.catId.oid, true);
+			 indxinfo->dobj.catId.oid);
 
 		/* Plain secondary index */
 		appendPQExpBuffer(q, "%s;\n", indxinfo->indexdef);
@@ -17244,7 +17244,7 @@ dumpConstraint(Archive *fout, const ConstraintInfo *coninfo)
 
 		if (dopt->binary_upgrade)
 			binary_upgrade_set_pg_class_oids(fout, q,
-			 indxinfo->dobj.catId.oid, true);
+			 indxinfo->dobj.catId.oid);
 
 		appendPQExpBuffer(q, "ALTER %sTABLE ONLY %s\n", foreign,
 		  fmtQualifiedDumpable(tbinfo));
@@ -17653,7 +17653,7 @@ dumpSequence(Archive *fout, const TableInfo *tbinfo)
 	if (dopt->binary_upgrade)
 	{
 		binary_upgrade_set_pg_class_oids(fout, query,
-		 tbinfo->dobj.catId.oid, false);
+		 tbinfo->dobj.catId.oid);
 
 		/*
 		 * In older PG versions a sequence will have a pg_type entry, but v14
-- 
2.25.1

>From 1580a7b9896b727925cab364ae9fdc7107d791d4 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 17 Apr 2024 22:55:27 -0500
Subject: [PATCH v2 2/2] Improve performance of pg_dump --binary-upgrade.

---
 src/bin/pg_dump/pg_dump.c| 117 +++
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 72 insertions(+), 46 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 58c77a5e2b..062638ad34 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -55,6 +55,7 @@
 #include "catalog/pg_trigger_d.h"
 #include "catalog/pg_type_d.h"
+#include "common/int.h"

Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-18 Thread Daniel Gustafsson
> On 18 Apr 2024, at 12:53, Peter Eisentraut  wrote:

> Review of the latest batch:

Thanks for reviewing!

> * v9-0002-Remove-support-for-OpenSSL-1.0.2.patch
> 
> Ok, but maybe make the punctuation consistent here:

Fixed.

> * v9-0004-Support-SSL_R_VERSION_TOO_LOW-on-LibreSSL.patch
> 
> Seems ok, but the reason isn't clear to me.  Are there LibreSSL versions that 
> have SSL_R_VERSION_TOO_LOW but not SSL_R_VERSION_TOO_HIGH?  Maybe this could 
> be explained better.

LibreSSL doesn't support SSL_R_VERSION_TOO_HIGH at all, they only support
_TOO_LOW starting with the OpenBSD 7.2 release.  I've expanded the commit
message to document this.

> Also, "OpenSSL 7.2" in the commit message probably meant "OpenBSD"?

Ah yes, fixed.

> * v9-0005-Remove-pg_strong_random-initialization.patch
> 
> I don't understand the reason for this phrase in the commit message: "1.1.1 
> is being increasingly phased out from production use".  Did you mean 1.1.0 
> there?

Correct, I got lost among the version numbers it seems. Fixed.

> Conditionally sticking the RAND_poll() into pg_strong_random(), does that 
> have the effect we want?  It wouldn't reinitialize after a fork, AFAICT.

No, I think you're right: my previous version would have worked (but was ugly),
but this doesn't guarantee that.  Thinking more about it, maybe it's best to
just keep the init function and have a version check for 1.1.0 there, making it
an empty no-op for all other cases.  When we move past 1.1.0 due to a new API
requirement we can blow it all away.
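
Concretely, something like this minimal sketch (exact version boundary to
be double-checked before committing anything):

    void
    pg_strong_random_init(void)
    {
    #if OPENSSL_VERSION_NUMBER < 0x10101000L
        /*
         * Releases before OpenSSL 1.1.1 do not reliably reseed the PRNG
         * after fork(), so mix in fresh entropy here to keep forked
         * backends from producing the same random sequence.
         */
        RAND_poll();
    #endif
    }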

> If everything is addressed, I agree that 0001, 0003, and 0004 can go into 
> PG17, the rest later.

Agreed, 0002 and 0005 are clearly for the v18 cycle.

--
Daniel Gustafsson



v10-0001-Doc-Use-past-tense-for-things-which-happened-in-.patch
Description: Binary data


v10-0002-Remove-support-for-OpenSSL-1.0.2.patch
Description: Binary data


v10-0003-Support-disallowing-SSL-renegotiation-in-LibreSS.patch
Description: Binary data


v10-0004-Support-SSL_R_VERSION_TOO_LOW-on-LibreSSL.patch
Description: Binary data


v10-0005-Remove-pg_strong_random-initialization.patch
Description: Binary data


Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-04-18 Thread Robert Haas
On Thu, Apr 18, 2024 at 3:34 PM Jelte Fennema-Nio  wrote:
> I really don't understand what exactly you're worried about. What do
> you expect will break when bumping the protocol version? As Dave said,
> clients should never bail out due to protocol version differences.

Sure, and I should never forget to take out the trash or mow the lawn.

> So, I don't understand why you seem to view bumping the protocol
> version with so much negativity. We're also bumping PG versions every
> year. Afaik people only like that, partially because it's immediately
> clear that certain features (e.g. MERGE) are not supported when
> connecting to older servers. To me the same is true for bumping the
> protocol version. There are no downsides to bumping it, only upsides.

I see it the exact opposite way around.

If we go with Peter's approach, every driver that supports his feature
will work perfectly, and every driver that doesn't will work exactly
as it does today. The risk of breaking anything is as near to zero as
human developers can reasonably hope to achieve. Nobody who doesn't
care about the feature will have to lift a single finger, today,
tomorrow, or ever. That's absolutely brilliant.

If we instead go with your approach, then anyone who wants to support
3.2 when it materializes will have to also support 3.1, which means
they have to support this feature. That's not a terrible burden, but
it's not a necessary one either. Also, even just 3.1 is going to break
something for somebody. There's just no way that we've left the
protocol version unchanged for this long and the first change we make
doesn't cause some collateral damage.

Sure, those are minor downsides in the grand scheme of things. But
AFAICS the only downside of Peter's approach that you've alleged is
that doesn't involve bumping the version number. Of course, if bumping
the version number is an intrinsic good, then no further justification
is required, but I don't buy that. I do not believe that users or
maintainers will throw us a pizza party when they find out that we've
changed the version number. There's no reason for anyone to be happy
about that for its own sake.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] pg_upgrade failed with error - ERROR: column "a" in child table must be marked NOT NULL

2024-04-18 Thread Justin Pryzby
On Thu, Dec 14, 2017 at 08:51:06AM +0700, Ali Akbar wrote:
> Patch for adding check in pg_upgrade.

[...]

On Fri, Sep 29, 2023 at 11:36:42AM -0500, Justin Pryzby wrote:
> On Fri, Sep 29, 2023 at 09:16:35AM +0900, Michael Paquier wrote:
> > You mean when upgrading from an instance of 9.6 or older as c30f177 is
> > not there, right?
> 
> No - while upgrading from v15 to v16.  I'm not clear on how we upgraded
> *to* v15 without hitting the issue, nor how the "not null" got
> dropped...
> 
> > Anyway, it seems like the patch from [1] has no need to run this check
> > when the old cluster's version is 10 or newer.  And perhaps it should
> > mention that this check could be removed from pg_upgrade once v10
> > support is out of scope, in the shape of a comment.
> 
> You're still thinking of PRIMARY KEY as the only way to hit this, right?
> But Ali Akbar already pointed out how to reproduce the problem with DROP
> NOT NULL - which still applies to both v16 and v17.

Rebased and amended the forgotten patch.
See also ZiE3NoY6DdvlvFl9@pryzbyj2023 et seq.
>From f804bec39acac0b53e29563be9ef7a10237ba717 Mon Sep 17 00:00:00 2001
From: Ali Akbar 
Date: Thu, 14 Dec 2017 19:18:59 +0700
Subject: [PATCH] pg_upgrade: check for inconsistent inherited not null columns

Otherwise, the upgrade can fail halfway through with error:
CREATE TABLE ip(id int PRIMARY KEY);
CREATE TABLE ic(id int) INHERITS (ip);
ALTER TABLE ic ALTER id DROP NOT NULL;
ERROR: column "a" in child table must be marked NOT NULL

JTP: updated to handle contype='n' - otherwise, the cnn_grandchild2
added at b0e96f311 triggers the issue this patch tries to avoid ...

Should be backpatched to all versions.
---
 src/bin/pg_upgrade/check.c | 101 +
 1 file changed, 101 insertions(+)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index ad1f8e3bcb1..79671a9cdb4 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -31,6 +31,7 @@ static void check_new_cluster_logical_replication_slots(void);
 static void check_new_cluster_subscription_configuration(void);
 static void check_old_cluster_for_valid_slots(bool live_check);
 static void check_old_cluster_subscription_state(void);
+static void check_for_not_null_inheritance(ClusterInfo *cluster);
 
 /*
  * DataTypesUsageChecks - definitions of data type checks for the old cluster
@@ -646,6 +647,8 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100)
 		check_for_tables_with_oids(&old_cluster);
 
+	check_for_not_null_inheritance(&old_cluster);
+
 	/*
 	 * Pre-PG 10 allowed tables with 'unknown' type columns and non WAL logged
 	 * hash indexes
@@ -1832,6 +1835,104 @@ check_new_cluster_subscription_configuration(void)
 	check_ok();
 }
 
+/*
+ * check_for_not_null_inheritance()
+ *
+ *  Currently pg_upgrade will fail if there are any inconsistencies in NOT NULL
+ *  constraint inheritance: In PostgreSQL, we can have a column that is NOT NULL
+ *  in parent, but nullable in its children. So check for inconsistent NOT NULL
+ *  constraints in inheritance tree.
+ */
+static void
+check_for_not_null_inheritance(ClusterInfo *cluster)
+{
+	int			dbnum;
+	FILE	   *script = NULL;
+	char		output_path[MAXPGPATH];
+
+	prep_status("Checking for NOT NULL inconsistencies in inheritance");
+
+	snprintf(output_path, sizeof(output_path), "not_null_inconsistent_columns.txt");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		PGresult   *res;
+		bool		db_used = false;
+		int			ntups;
+		int			rowno;
+		int			i_nspname,
+	i_relname,
+	i_attname;
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+
+		res = executeQueryOrDie(conn,
+"WITH RECURSIVE parents AS ( "
+"	SELECT	i.inhrelid, i.inhparent "
+"	FROM	pg_catalog.pg_inherits i "
+"	UNION ALL "
+"	SELECT	p.inhrelid, i.inhparent "
+"   FROM	parents p "
+"	JOIN	pg_catalog.pg_inherits i "
+"		ON	i.inhrelid = p.inhparent "
+") "
+"SELECT	n.nspname, c.relname, ac.attname  "
+"FROM	parents p, "
+"		pg_catalog.pg_attribute ac, "
+"		pg_catalog.pg_attribute ap, "
+"		pg_catalog.pg_constraint cc, "
+"		pg_catalog.pg_class c, "
+"		pg_catalog.pg_namespace n "
+"WHERE	NOT ac.attnotnull AND "
+"		ac.attrelid = p.inhrelid AND "
+"		ap.attrelid = p.inhparent AND "
+"		ac.attname = ap.attname AND "
+"		ac.attrelid = cc.conrelid AND ac.attnum = ANY(cc.conkey) AND contype='n' AND NOT connoinherit AND"
+"		ap.attnotnull AND "
+"		c.oid = ac.attrelid AND "
+"		c.relnamespace = n.oid");
+
+		ntups = PQntuples(res);
+		i_nspname = PQfnumber(res, "nspname");
+		i_relname = PQfnumber(res, "relname");
+		i_attname = PQfnumber(res, "attname");
+		for (rowno = 0; rowno < ntups; rowno++)
+		{
+			if (script == NULL && (script = 

Re: cataloguing NOT NULL constraints

2024-04-18 Thread Alexander Lakhin

Hello Alvaro,

18.04.2024 16:39, Alvaro Herrera wrote:

I have pushed a fix which should hopefully fix this problem
(d9f686a72e).  Please give this a look.  Thanks for reporting the issue.


Please look at an assertion failure, introduced with d9f686a72:
CREATE TABLE t(a int, NOT NULL a NO INHERIT);
CREATE TABLE t2() INHERITS (t);

ALTER TABLE t ADD CONSTRAINT nna NOT NULL a;
TRAP: failed Assert("lockmode != NoLock || IsBootstrapProcessingMode() || CheckRelationLockedByMe(r, AccessShareLock, 
true)"), File: "relation.c", Line: 67, PID: 2980258


On d9f686a72~1 this script results in:
ERROR:  cannot change NO INHERIT status of inherited NOT NULL constraint "t_a_not_null" 
on relation "t"

Best regards,
Alexander




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-18 Thread Robert Haas
On Thu, Apr 18, 2024 at 6:35 AM Alexander Korotkov  wrote:
> The revised patchset is attached.
> 1) I've split the fix for the CommandCounterIncrement() issue and the
> fix for relation persistence issue into a separate patch.
> 2) I've validated that the lock on the new partition is held in
> createPartitionTable() after ProcessUtility() as pointed out by
> Robert.  So, no need to place the lock again.
> 3) Added fix for problematic error message as a separate patch [1].
> 4) Added rename "salemans" => "salesmen" for tests as a separate patch.
>
> I think these fixes are reaching committable shape, but I'd like
> someone to check it before I push.

Reviewing 0001:

- Seems mostly fine. I think the comment /* Unlock and drop merged
partitions. */ is wrong. I think it should say something like /* Drop
the current partitions before adding the new one. */ because (a) it
doesn't unlock anything, and there's another comment saying that and
(b) we now know that the drop vs. add order matters.

Reviewing 0002:

- Commit message typos: behavious, corresponsing

- Given the change to the header comment of createPartitionTable, it's
rather surprising to me that this patch doesn't touch the
documentation. Isn't that a big change in semantics?

- My previous review comment was really about the code comment, I
believe, rather than the use of AccessExclusiveLock. NoLock is
probably fine, but if it were me I'd be tempted to write
AccessExclusiveLock and just make the comment say something like /* We
should already have the lock, but do it this way just to be certain
*/. But what you have is probably fine, too. Mostly, I want to clarify
the intent of my previous comment.

- Do we, or can we, have a test that if you split a partition that's
not in the search path, the resulting partitions end up in your
creation namespace? And similarly for merge? And maybe also that
schema-qualification works properly?

I haven't exhaustively verified the patch, but these are some things I
noticed when scrolling through it.

Reviewing 0003:

- Are you sure this can't dereference datum when datum is NULL, in
either the upper or lower half? It sure looks strange to have code
that looks like it can make datum a null pointer, and then an
unconditional dereference just after.

- In general I think the wording changes are improvements. I'm
slightly suspicious that there might be an even better way to word it,
but I can't think of it right at this very moment.

- I'm kind of unhappy (but not totally unhappy) with the semantics.
Suppose I have a partition that allows values from 0 to 1000, but
actually only contains values that are either between 0 and 99 or
between 901 and 1000. If I try to split the partition into one that
allows 0..100 and a second that allows 900..1000, it will fail. Maybe
that's good, because that means that if a failure is going to happen,
it will happen right at the beginning, rather than maybe after doing a
lot of work. But on the other hand, it also kind of stinks, because it
feels like I'm being told I can't do something that I know is
perfectly fine.

Reviewing 0004:

- Obviously this is quite trivial and there's no real problem with it,
but if we're changing it anyway, how about a gender-neutral term
(salesperson/salespeople)?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Popcount optimization using AVX512

2024-04-18 Thread Nathan Bossart
On Thu, Apr 18, 2024 at 06:12:22PM +, Shankaran, Akash wrote:
> Good find. I confirmed after speaking with an intel expert, and from the 
> intel AVX-512 manual [0] section 14.3, which recommends to check bit27. From 
> the manual:
> 
> "Prior to using Intel AVX, the application must identify that the operating 
> system supports the XGETBV instruction,
> the YMM register state, in addition to processor's support for YMM state 
> management using XSAVE/XRSTOR and
> AVX instructions. The following simplified sequence accomplishes both and is 
> strongly recommended.
> 1) Detect CPUID.1:ECX.OSXSAVE[bit 27] = 1 (XGETBV enabled for application 
> use1).
> 2) Issue XGETBV and verify that XCR0[2:1] = '11b' (XMM state and YMM state 
> are enabled by OS).
> 3) detect CPUID.1:ECX.AVX[bit 28] = 1 (AVX instructions supported).
> (Step 3 can be done in any order relative to 1 and 2.)"

Thanks for confirming.  IIUC my patch should be sufficient, then.

> It also seems that step 1 and step 2 need to be done prior to the CPUID 
> OSXSAVE check in the popcount code.

This seems to contradict the note about doing step 3 at any point, and
given step 1 is the OSXSAVE check, I'm not following what this means,
anyway.

I'm also wondering if we need to check that (_xgetbv(0) & 0xe6) == 0xe6
instead of just (_xgetbv(0) & 0xe0) != 0, as the status of the lower half
of some of the ZMM registers is stored in the SSE and AVX state [0].  I
don't know how likely it is that 0xe0 would succeed but 0xe6 wouldn't, but
we might as well make it correct.

[0] https://en.wikipedia.org/wiki/Control_register#cite_ref-23
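
Spelling out the masks, the stricter check would be something like this
sketch (_xgetbv() from <immintrin.h>, behind HAVE_XSAVE_INTRINSICS):

    static bool
    zmm_regs_fully_enabled(void)
    {
        /*
         * XCR0 state components: bit 1 = SSE (XMM), bit 2 = AVX (upper
         * YMM), bits 5-7 = AVX-512 opmask, ZMM_Hi256, and Hi16_ZMM.
         * 0xe0 tests only bits 5-7, while 0xe6 additionally requires
         * the SSE and AVX state that backs the low 256 bits of each
         * ZMM register.
         */
        return (_xgetbv(0) & 0xe6) == 0xe6;
    }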

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-04-18 Thread Jelte Fennema-Nio
On Mon, 15 Apr 2024 at 21:47, Dave Cramer  wrote:
>> On Mon, 15 Apr 2024 at 19:52, Robert Haas  wrote:
>> > surely it can't be right to use protocol
>> > version 3.0 to refer to a bunch of different things. But at the same
>> > time, surely we don't want clients to start panicking and bailing out
>> > when everything would have been fine.
>>
>> I feel like the ProtocolVersionNegotiation should make sure people
>> don't panic and bail out. And if not, then feature flags won't help
>> with this either. Because clients would then still bail out if some
>> feature is not supported.
>
> I don't think a client should ever bail out. They may not support something 
> but IMO bailing out is not an option.


On Thu, 18 Apr 2024 at 21:01, Robert Haas  wrote:
> On Thu, Apr 18, 2024 at 1:49 PM Jelte Fennema-Nio  wrote:
> > IMHO that means that we should also bump the protocol version for this
> > change, because it's changing the wire protocol by adding a new
> > parameter format code. And it does so in a way that does not depend on
> > the new protocol extension.
>
> I think we're more or less covering the same ground we did on the
> other thread here -- in theory I don't love the fact that we never
> bump the protocol version when we change stuff, but in practice if we
> start bumping it every time we do anything I think it's going to just
> break a bunch of stuff without any real benefit.

(the second quoted message comes from Peter his column encryption
thread, but responding here to keep this discussion in one place)

I really don't understand what exactly you're worried about. What do
you expect will break when bumping the protocol version? As Dave said,
clients should never bail out due to protocol version differences.

When the server supports a lower version than the client, the client
should disable certain features because it gets the
ProtocolVersionNegotiation message. This is also true if we don't bump
the version. Negotiating a lower version actually makes it clearer for
the client what features to disable. Using the reported postgres
version for this might not, because a connection pooler in the middle
might not support the features that the client wants and thus throw an
error (e.g. due to the client sending unknown messages) even if the
backing Postgres server would support these features. Not to mention
non-postgresql servers that implement the PostgreSQL protocol (of
which there are more and more).

When the server supports a higher version, the client never even
notices this because the server will silently accept and only enable
the features of the lower version. So this could never cause breakage,
as from the client's perspective the server didn't bump their protocol
version.

So, I don't understand why you seem to view bumping the protocol
version with so much negativity. We're also bumping PG versions every
year. Afaik people only like that, partially because it's immediately
clear that certain features (e.g. MERGE) are not supported when
connecting to older servers. To me the same is true for bumping the
protocol version. There are no downsides to bumping it, only upsides.
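
To illustrate how small the client-side work is, here is a rough sketch
of handling the NegotiateProtocolVersion ('v') message; the helper names
and types are invented for the example, this is not real libpq code:

    static void
    handle_negotiate_protocol_version(ClientConn *conn, MsgBuf *msg)
    {
        int32       server_minor = msgbuf_get_int32(msg);   /* newest supported */
        int32       n_unrecognized = msgbuf_get_int32(msg);

        /* continue with the highest minor version both sides understand */
        conn->protocol_minor = Min(conn->protocol_minor, server_minor);

        /* switch off each protocol option the server did not recognize */
        for (int32 i = 0; i < n_unrecognized; i++)
            disable_protocol_option(conn, msgbuf_get_string(msg));

        /* note: no error path here -- the client never bails out */
    }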




Re: Add notes to pg_combinebackup docs

2024-04-18 Thread Robert Haas
On Thu, Apr 18, 2024 at 3:25 PM Daniel Gustafsson  wrote:
> I don't think this is the way to go, such an option will be plastered on to
> helpful tutorials which users will copy/paste from even when they don't need
> the option at all, only to disable checksums when there was no reason for 
> doing
> so.

That's actually a really good point. I mean, it's such an obscure
scenario, maybe it wouldn't make it into the tutorials, but if it
does, it'll never get taken out.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add notes to pg_combinebackup docs

2024-04-18 Thread Daniel Gustafsson
> On 18 Apr 2024, at 20:11, Robert Haas  wrote:

> 2. As (1), but make check_control_files() emit a warning message when
> the problem case is detected.

Being in the post-freeze part of the cycle, this seems like the best option.

> 3. As (2), but also add a command-line option to pg_combinebackup to
> flip the checksum flag to false in the control file.

I don't think this is the way to go, such an option will be plastered on to
helpful tutorials which users will copy/paste from even when they don't need
the option at all, only to disable checksums when there was no reason for doing
so.

> We could perhaps consider doing (2) for now and (5) for a future
> release, or something like that.

+1

--
Daniel Gustafsson





Re: [18] clarify the difference between pg_wchar, wchar_t, and Unicode code points

2024-04-18 Thread Peter Eisentraut

On 16.04.24 01:40, Jeff Davis wrote:

I'm not sure I understand all of the history behind pg_wchar, but it
seems to be some blend of:

   (a) Postgres's own internal representation of a decoded character
   (b) libc's wchar_t
   (c) Unicode code point

For example, Postgres has its own encoding/decoding routines, so (a) is
the most obvious definition.


(a) is the correct definition, I think.  The other ones are just 
occasional conveniences, and occasionally wrong.



When using ICU, we also pass a pg_wchar directly to ICU routines, which
depends on definition (c), and can lead to problems like:

https://www.postgresql.org/message-id/e7b67d24288f811aebada7c33f9ae629dde0def5.ca...@j-davis.com


That's just a plain bug, I think.  It's missing the encoding check that 
for example pg_strncoll_icu() does.



The comment at the top of pg_regc_locale.c explains some of the above,
but not all. I'd like to organize this a bit better:

   * a new typedef for a Unicode code point ("codepoint"? "uchar"?)
   * a no-op conversion routine from pg_wchar to a codepoint that would
assert that the server encoding is UTF-8 (#ifndef FRONTEND, of course)
   * a no-op conversion routine from pg_wchar to wchar_t that would be a
good place for a comment describing that it's a "best effort" and may
not be correct in all cases


I guess sometimes you really want to just store an array of Unicode code 
points.  But I'm not sure how this would actually address coding 
mistakes like the one above.  You still need to check the server 
encoding and do encoding conversion when necessary.
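
For illustration, the proposed no-op conversion might look like this
(names invented here):

    static inline uint32
    pg_wchar_to_codepoint(pg_wchar c)
    {
        /* only meaningful when pg_wchar values are Unicode code points */
        Assert(GetDatabaseEncoding() == PG_UTF8);
        return (uint32) c;
    }

At least then every caller that wants a code point goes through the same
encoding assertion instead of assuming it, which is what the ICU call
site above failed to do.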






Re: pgsql: Fix restore of not-null constraints with inheritance

2024-04-18 Thread Andrew Dunstan


On 2024-04-18 Th 11:39, Alvaro Herrera wrote:

> On 2024-Apr-18, Alvaro Herrera wrote:
>
>> On 2024-Apr-18, Alvaro Herrera wrote:
>>
>>> Lastly, make two changes to pg_dump: 1) do not try to drop a not-null
>>> constraint that's marked as inherited; this allows a dump to restore
>>> with no errors if a table with a PK inherits from another which also has
>>> a PK; 2) avoid giving inherited constraints throwaway names, for the
>>> rare cases where such a constraint survives after the restore.
>>
>> Hmm, this last bit broke pg_upgrade on crake.  I'll revert this part,
>> meanwhile I'll be installing 9.2 to see if it can be fixed in a better way.
>
> Eh, so:
>
> 1. running "make check" on pg_upgrade using an oldinstall pointing to
> 9.2 fails, because PostgreSQL::Test::Cluster doesn't support that
> version -- it only goes back to 12.  How difficult was it to port it
> back to all these old versions?



It's not that hard to make it go back to 9.2. Here's a version that's a 
couple of years old, but it supports versions all the way back to 7.2 :-)


If there's interest I'll work on supporting our official "old" versions 
(i.e. 9.2 and up)





2. running "make check" with an oldinstall pointing to 10 fails, because
the "invalid database" checks fail:

not ok 7 - invalid database causes failure status (got 0 vs expected 1)

#   Failed test 'invalid database causes failure status (got 0 vs expected 1)'
#   at t/002_pg_upgrade.pl line 407.
not ok 8 - invalid database causes failure stdout /(?^:invalid)/







3. Lastly, even if I put back the code that causes the failures on crake
and restore from 10 (and ignore those two problems), I cannot reproduce
the issues it reported.  Is crake running some funky code that's not
what "make check" on pg_upgrade does, perchance?



It's running the buildfarm cross version upgrade module.



It's choking on the change in constraint names between the dump of the 
pre-upgrade database and the dump of the post-upgrade database, e.g.



 CREATE TABLE public.rule_and_refint_t2 (
-id2a integer CONSTRAINT pgdump_throwaway_notnull_0 NOT NULL NO INHERIT,
-id2c integer CONSTRAINT pgdump_throwaway_notnull_1 NOT NULL NO INHERIT
+id2a integer CONSTRAINT rule_and_refint_t2_id2a_not_null NOT NULL NO INHERIT,
+id2c integer CONSTRAINT rule_and_refint_t2_id2c_not_null NOT NULL NO INHERIT
 );


look at the dumpdiff-REL9_2_STABLE file for the full list.

I assume that means pg_dump is generating names that pg_upgrade throws 
away? That seems ... unfortunate.


There is a perl module at 
src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm. This is used to adjust 
the dump files before we diff them. Maybe you can remedy the problem by 
adding some code in there.




cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Cluster.pm
Description: Perl program


Re: Transparent column encryption

2024-04-18 Thread Robert Haas
On Thu, Apr 18, 2024 at 1:49 PM Jelte Fennema-Nio  wrote:
> I think this is an interesting idea. I can indeed see use cases for
> e.g. inserting a new row based on another row (where the secret is the
> same).
>
> IMHO that means that we should also bump the protocol version for this
> change, because it's changing the wire protocol by adding a new
> parameter format code. And it does so in a way that does not depend on
> the new protocol extension.

I think we're more or less covering the same ground we did on the
other thread here -- in theory I don't love the fact that we never
bump the protocol version when we change stuff, but in practice if we
start bumping it every time we do anything I think it's going to just
break a bunch of stuff without any real benefit.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Can't find not null constraint, but \d+ shows that

2024-04-18 Thread Alvaro Herrera
On 2024-Apr-13, jian he wrote:

> I wonder is there any incompatibility issue, or do we need to say something
> about the new behavior when dropping a key column?

Umm, yeah, maybe we should document it in ALTER TABLE DROP PRIMARY KEY
and in the release notes to note the different behavior.

> only minor cosmetic issue:
> + if (unconstrained_cols)
> i would like change it to
> + if (unconstrained_cols != NIL)
> 
> + foreach(lc, unconstrained_cols)
> we can change to
> + foreach_int(attnum, unconstrained_cols)
> per commit
> https://git.postgresql.org/cgit/postgresql.git/commit/?id=14dd0f27d7cd56ffae9ecdbe324965073d01a9ff

Ah, yeah.  I did that, rewrote some comments and refined the tests a
little bit to ensure the pg_upgrade behavior is sane.  I intend to get
this pushed tomorrow, if nothing ugly comes up.

CI run: https://cirrus-ci.com/build/5471117953990656

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La gente vulgar sólo piensa en pasar el tiempo;
el que tiene talento, en aprovecharlo"
>From 28393502b3d6fcf430264c10f931e948bedc Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 11 Apr 2024 12:29:34 +0200
Subject: [PATCH v2] Better handle indirect constraint drops

It is possible for certain cases to remove not-null constraints without
maintaining the attnotnull in its correct state; for example if you drop
a column that's part of the primary key, and the other columns of the PK don't
have not-null constraints, then we should reset the attnotnull flags for
those other columns; up to this commit, we didn't.  Handle those cases
better by doing the attnotnull reset in RemoveConstraintById() instead
of in dropconstraint_internal().

However, there are some cases where we must not do so.  For example if
those other columns are in replica identity indexes or are generated
identity columns, we must keep attnotnull set, even though it results in
the catalog inconsistency that no not-null constraint supports that.

Because the attnotnull reset now happens in more places than before, for
instance when a column of the primary key changes type, we need an
additional trick to reinstate it as necessary.  Introduce a new
alter-table pass that does this, which needs simply reschedule some
AT_SetAttNotNull subcommands that were already being generated and
ignored.

Because of the exceptions in which attnotnull is not reset noted above,
we also include a pg_dump hack to include a not-null constraint when the
attnotnull flag is set even if no pg_constraint row exists.  This part
is undesirable but necessary, because failing to handle the case can
result in unrestorable dumps.

Reported-by: Tender Wang 
Co-authored-by: Tender Wang 
Reviewed-by: jian he 
Discussion: https://postgr.es/m/CAHewXN=hMbNa3d43NOR=ocgdgptt18s-1fmuecoegesyek4...@mail.gmail.com
---
 src/backend/catalog/pg_constraint.c   | 116 ++-
 src/backend/commands/tablecmds.c  | 135 +++---
 src/bin/pg_dump/pg_dump.c |  30 +++--
 src/test/regress/expected/constraints.out |  78 +
 src/test/regress/sql/constraints.sql  |  39 +++
 5 files changed, 322 insertions(+), 76 deletions(-)

diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 778b7c381d..56c083fe21 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -19,6 +19,7 @@
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
+#include "access/xact.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
@@ -933,6 +934,8 @@ RemoveConstraintById(Oid conId)
 	Relation	conDesc;
 	HeapTuple	tup;
 	Form_pg_constraint con;
+	bool		dropping_pk = false;
+	List	   *unconstrained_cols = NIL;
 
 	conDesc = table_open(ConstraintRelationId, RowExclusiveLock);
 
@@ -957,7 +960,9 @@ RemoveConstraintById(Oid conId)
 		/*
 		 * We need to update the relchecks count if it is a check constraint
 		 * being dropped.  This update will force backends to rebuild relcache
-		 * entries when we commit.
+		 * entries when we commit.  For not-null and primary key constraints,
+		 * obtain the list of columns affected, so that we can reset their
+		 * attnotnull flags below.
 		 */
 		if (con->contype == CONSTRAINT_CHECK)
 		{
@@ -984,6 +989,36 @@ RemoveConstraintById(Oid conId)
 
 			table_close(pgrel, RowExclusiveLock);
 		}
+		else if (con->contype == CONSTRAINT_NOTNULL)
+		{
+			unconstrained_cols = list_make1_int(extractNotNullColumn(tup));
+		}
+		else if (con->contype == CONSTRAINT_PRIMARY)
+		{
+			Datum		adatum;
+			ArrayType  *arr;
+			int			numkeys;
+			bool		isNull;
+			int16	   *attnums;
+
+			dropping_pk = true;
+
+			adatum = heap_getattr(tup, Anum_pg_constraint_conkey,
+  RelationGetDescr(conDesc), );
+			if (isNull)
+elog(ERROR, "null conkey for constraint %u", con->oid);
+			arr = DatumGetArrayTypeP(adatum);	/* ensure not toasted */
+			

Re: fix tablespace handling in pg_combinebackup

2024-04-18 Thread Robert Haas
On Thu, Apr 18, 2024 at 1:45 PM Andres Freund  wrote:
> I was really just remarking on this from the angle of a test writer. I know
> that our interfaces around this suck...
>
> For tests, do we really need to set anything on a per-tablespace basis? Can't
> we instead just reparent all of them to a new directory?

I don't know what you mean by this.

> > I wonder if we (as a project) would consider a patch that redesigned
> > this whole mechanism. Produce ${TABLESPACE_NAME}.tar in tar-format,
> > instead of ${OID}.tar. In directory-format, relocate via
> > -T${TABLESPACE_NAME}=${DIR} instead of -T${SERVERDIR}=${DIR}. That
> > would be a significant compatibility break, and you'd somehow need to
> > solve the problem of what to put in the tablespace_map file, which
> > requires OIDs. But it seems like if you could finesse that issue in
> > some elegant way, the result would just be a heck of a lot more usable
> > than what we have today.
>
> For some things that'd definitely be nicer - but not sure it work well for
> everything. Consider cases where you actually have external directories on
> different disks, and you want to restore a backup after some data loss. Now
> you need to list all the tablespaces separately, to put them back into their
> own location.

I mean, don't you need to do that anyway, just in a more awkward way?
I don't understand who ever wants to keep track of their tablespaces
by either (a) source pathname or (b) OID rather than (c) user-visible
tablespace name.

> One thing I've been wondering about is an option to put the "new" tablespaces
> into a location relative to each of the old ones.
>   --tablespace-relative-location=../restore-2024-04-18
> which would rewrite all the old tablespaces to that new location.

I think this would probably get limited use outside of testing scenarios.

> > I said (at least off-list) when that feature was introduced that there was
> > no way it was going to remain an isolated development hack, and I think
> > that's proved to be 100% correct.
>
> Hm, I guess I kinda agree. But not because I think it wasn't good for
> development, but because it'd often be much saner to use relative tablespaces
> than absolute ones even for prod.
>
> My only point here was that the test would be simpler if you
> a) didn't need to create a temp directory for the tablespace, both for
>primary and standby
> b) didn't need to "gin up" a tablespace map, because all the paths are
>relative
>
> Just to be clear: I don't want the above to block merging your test. If you
> think you want to do it the way you did, please do.

I think I will go ahead and do that soon, then. I'm totally fine with
it if somebody wants to do the testing in some other way, but this was
what made sense to me as the easiest way to adapt what's already
there. I think it's lame that init_from_backup() disclaims support for
tablespaces, as if tablespaces weren't a case that needs to be tested
heavily with respect to backups. And I think it's better to get
something merged that fixes the bug ASAP, and adds some test coverage,
and then if there's a better way to do that test coverage down the
road, well and good.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Idea Feedback: psql \h misses -> Offers Links?

2024-04-18 Thread Peter Eisentraut

On 17.04.24 19:47, Kirk Wolak wrote:

> *Example:*
> \h current_setting
> No help available for "current_setting".
> Try \h with no arguments to see available help.
>
> https://www.postgresql.org/search/?u=%2Fdocs%2F16%2F&q=current_setting



One problem is that this search URL does not actually produce any useful 
information about current_setting.






Re: Oom on temp (un-analyzed table caused by JIT) V16.1 [ NOT Fixed ]

2024-04-18 Thread Kirk Wolak
On Thu, Feb 22, 2024 at 4:49 PM Michael Banck  wrote:

> Hi,
>
> On Wed, Jan 24, 2024 at 02:50:52PM -0500, Kirk Wolak wrote:
> > On Mon, Jan 22, 2024 at 1:30 AM Kirk Wolak  wrote:
> > > On Fri, Jan 19, 2024 at 7:03 PM Daniel Gustafsson 
> wrote:
> > >> > On 19 Jan 2024, at 23:09, Kirk Wolak  wrote:
> > > Thank you, that made it possible to build and run...
> > > UNFORTUNATELY this has a CLEAR memory leak (visible in htop)
> > > I am watching it already consuming 6% of my system memory.
> ...
> I had a look at this (and blogged about it here[1]) and was also
> wondering what was going on with 17devel and the recent back-branch
> releases, cause I could also reproduce those continuing memory leaks.
>
> Adding some debug logging when llvm_inline_reset_caches() is called
> solves the mystery: as you are calling a function, the fix that is in
> 17devel and the back-branch releases is not applicable and only after
> the function returns llvm_inline_reset_caches() is being called (as
> llvm_jit_context_in_use_count is greater than zero, presumably, so it
> never reaches the call-site of llvm_inline_reset_caches()).
>
> If you instead run your SQL in a DO-loop (as in the blog post) and not
> as a PL/PgSQL function, you should see that it no longer leaks. This
> might be obvious to some (and Andres mentioned it in
>
> https://www.postgresql.org/message-id/20210421002056.gjd6rpe6toumiqd6%40alap3.anarazel.de
> )
> but it took me a while to figure out/find.

Thanks for confirming.  Inside a routine is required.  But disabling JIT
was good enough for us.

Curious if there was another way to end up calling the cleanup?  But it had
that "feel" that the context of the cleanup was being lost between
iterations of the loop?


RE: Popcount optimization using AVX512

2024-04-18 Thread Shankaran, Akash
> It was brought to my attention [0] that we probably should be checking for 
> the OSXSAVE bit instead of the XSAVE bit when determining whether there's 
> support for the XGETBV instruction.  IIUC that should indicate that both the 
> OS and the processor have XGETBV support (not just the processor).
> I've attached a one-line patch to fix this.

> [0] https://github.com/pgvector/pgvector/pull/519#issuecomment-2062804463

Good find. I confirmed after speaking with an Intel expert, and from the Intel
AVX-512 manual [0] section 14.3, which recommends checking bit 27. From the
manual:

"Prior to using Intel AVX, the application must identify that the operating 
system supports the XGETBV instruction,
the YMM register state, in addition to processor's support for YMM state 
management using XSAVE/XRSTOR and
AVX instructions. The following simplified sequence accomplishes both and is 
strongly recommended.
1) Detect CPUID.1:ECX.OSXSAVE[bit 27] = 1 (XGETBV enabled for application use).
2) Issue XGETBV and verify that XCR0[2:1] = '11b' (XMM state and YMM state are 
enabled by OS).
3) detect CPUID.1:ECX.AVX[bit 28] = 1 (AVX instructions supported).
(Step 3 can be done in any order relative to 1 and 2.)"

It also seems that step 1 and step 2 need to be done prior to the CPUID OSXSAVE 
check in the popcount code.

[0]: https://cdrdv2.intel.com/v1/dl/getContent/671200
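
For illustration, the quoted sequence translated into C using GCC/Clang
builtins (__get_cpuid from <cpuid.h>, _xgetbv from <immintrin.h>, which may
need -mxsave); this is a sketch of the recommended ordering, not the actual
popcount patch:

#include <cpuid.h>
#include <immintrin.h>
#include <stdbool.h>

static bool
os_and_cpu_support_avx(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;
	if ((ecx & (1u << 27)) == 0)	/* step 1: OSXSAVE, XGETBV usable */
		return false;
	if ((_xgetbv(0) & 0x6) != 0x6)	/* step 2: XCR0[2:1] = 11b */
		return false;
	return (ecx & (1u << 28)) != 0; /* step 3: AVX supported */
}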

- Akash Shankaran





Re: Add notes to pg_combinebackup docs

2024-04-18 Thread Robert Haas
On Tue, Apr 9, 2024 at 5:46 AM Tomas Vondra  wrote:
> What we could do is detect this in pg_combinebackup, and either just
> disable checksums with a warning and hint to maybe enable them again. Or
> maybe just print that the user needs to disable them.
>
> I was thinking maybe we could detect this while taking the backups, and
> force taking a full backup if checksums got enabled since the last
> backup. But we can't do that because we only have the manifest from the
> last backup, and the manifest does not include info about checksums.

So, as I see it, we have at least five options here:

1. Document that this doesn't work. By "this doesn't work" I think
what we mean is: (A) If any of the backups you'd need to combine were
taken with checksums off but the last one was taken with checksums on,
then you might get checksum failures on the cluster written by
pg_combinebackup. (B) Therefore, after enabling checksums, you should
take a new full backup and make future incrementals dependent
thereupon. (C) But if you don't, then you can (I presume) run recovery
with ignore_checksum_failure=true to bring the cluster to a consistent
state, stop the database, shut checksums off, optionally also turn
them on again, and restart the database. Then presumably everything
should be OK, except that you'll have wiped out any real checksum
failures that weren't artifacts of the reconstruction process along
with the fake ones.

2. As (1), but make check_control_files() emit a warning message when
the problem case is detected.

3. As (2), but also add a command-line option to pg_combinebackup to
flip the checksum flag to false in the control file. Then, if you have
the problem case, instead of following the procedure described above,
you can just use this option, and enable checksums afterward if you
want. It still has the same disadvantage as the procedure described
above: any "real" checksum failures will be suppressed, too. From a
design perspective, this feels like kind of an ugly wart to me: hey,
in this one scenario, you have to add --do-something-random or it
doesn't work! But I see it's got some votes already, so maybe it's the
right answer.

4. Add the checksum state to the backup manifest. Then, if someone
tries to take an incremental backup with checksums on and the
precursor backup had checksums off, we could fail. A strength of this
proposal is that it actually stops the problem from happening at
backup time, which in general is a whole lot nicer than not noticing a
problem until restore time. A possible weakness is that it stops you
from doing something that is ... actually sort of OK. I mean, in the
strict sense, the incremental backup isn't valid, because it's going
to cause checksum failures after reconstruction, but it's valid apart
from the checksums, and those are fixable. I wonder whether users who
encounter this error message will say "oh, I'm glad PostgreSQL
prevented me from doing that" or "oh, I'm annoyed that PostgreSQL
prevented me from doing that."

5. At reconstruction time, notice which backups have checksums
enabled. If the final backup in the chain has them enabled, then
whenever we take a block from an earlier backup with checksums
disabled, re-checksum the block. As opposed to any of the previous
options, this creates a fully correct result, so there's no real need
to document any restrictions on what you're allowed to do. We might
need to document the performance consequences, though: fast file copy
methods will have to be disabled, and we'll have to go block by block
while paying the cost of checksum calculation for each one. You might
be sad to find out that your reconstruction is a lot slower than you
were expecting.
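
To make (5) concrete, here is a rough sketch of the per-block step, with
invented function and parameter names; pg_checksum_page() and the page
header's pd_checksum field are the real backend interfaces, the rest is
illustrative:

#include "storage/bufpage.h"
#include "storage/checksum.h"

/*
 * Sketch only: when a block taken from a checksums-off backup is written
 * into output for a checksums-on cluster, recompute the page checksum.
 */
static void
write_reconstructed_block(char *page, BlockNumber blkno,
						  bool source_had_checksums,
						  bool target_has_checksums)
{
	if (target_has_checksums && !source_had_checksums)
		((PageHeader) page)->pd_checksum = pg_checksum_page(page, blkno);

	/* ... write the page out block by block (no fast-copy path) ... */
}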

While I'm probably willing to implement any of these, I have some
reservations about attempting (4) or especially (5) after feature
freeze. I think there's a pretty decent chance that those fixes will
turn out to have issues of their own which we'll then need to fix in
turn. We could perhaps consider doing (2) for now and (5) for a future
release, or something like that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: AIX support

2024-04-18 Thread Andres Freund
Hi,

On 2024-04-18 11:15:43 +, Sriram RK wrote:
> We (IBM-AIX team) looked into this issue
>
> https://www.postgresql.org/message-id/20240225194322...@rfd.leadboat.com
>
> This is related to the compiler issue. The compilers xlc(13.1) and gcc(8.0)
> have issues. But we verified that this issue is resolved with the newer
> compiler versions openXL(xlc17.1) and gcc(12.0) onwards.

The reason we used these compilers was that those were the only ones we had
kinda somewhat reasonable access to, via the gcc project's
"compile farm" https://portal.cfarm.net/
We have to rely on whatever the aix machines there provide. They're not
particularly plentiful resource-wise either.


This is generally one of the big issues with AIX support. There are other
niche-y OSs that don't have a lot of users, e.g. openbsd, but in contrast to
AIX I can just start an openbsd VM within a few minutes and iron out whatever
portability issue I'm dealing with.

Not being AIX customers we also can't raise bugs about compiler bugs, so we're
stuck doing bad workarounds.


> Also as part of the support, we will help in fixing all the issues related
> to AIX and continue to support AIX for Postgres. If we need another CI
> environment we can work to make one available. But for time being can we
> start reverting the patch that has removed AIX support.

The state the code was in when it was removed was not one I am OK with adding back.


> We want to make a note that postgres is used extensively in our IBM product
> and is being exploited by multiple customers.

To be blunt: Then it'd have been nice to see some investment in that before
now. Both on the code level and the infrastructure level (i.e. access to
machines etc).

Greetings,

Andres Freund




Re: plenty code is confused about function level static

2024-04-18 Thread Peter Eisentraut

On 18.04.24 19:11, Andres Freund wrote:

> Thoughts about when to apply these? Arguably they're fixing mildly broken
> code, making it appropriate to fix in 17, but it's also something that we
> could end up fixing for a while...


Yeah, let's keep these for later.  They are not regressions, and there 
is no end in sight yet.  I have some other related stuff queued up, so 
if we're going to start adjusting these kinds of things now, it would 
open a can of worms.





Re: documentation structure

2024-04-18 Thread Corey Huinker
>
> I haven't dealt with variadic yet, since the two styles are visually
> different, not just markup (<optional>...</optional> renders as [...]).
>
> The two styles for variadic are what I call caller-style:
>
>concat ( val1 "any" [, val2 "any" [, ...] ] )
>format(formatstr text [, formatarg "any" [, ...] ])
>

While this style is obviously clumsier for us to compose, it does avoid
relying on the user understanding what the word variadic means. Searching
through online documentation of the python *args parameter, the word
variadic never comes up, the closest they get is "variable length
argument". I realize that python is not SQL, but I think it's a good point
of reference for what concepts the average reader is likely to know.

Looking at the patch, I think it is good, though I'd consider doing some
indentation for the nested <optional>s to allow the author to do more
visual tag-matching. The ']'s were sufficiently visually distinct that we
didn't really need or want nesting, but <optional> is just another tag to
my eyes in a sea of tags.


Re: Transparent column encryption

2024-04-18 Thread Jelte Fennema-Nio
On Thu, 18 Apr 2024 at 18:46, Robert Haas  wrote:
> With regard to the Bind message, I suggest that we regard the protocol
> change as reserving a currently-unused bit in the message to indicate
> whether the value is pre-encrypted, without reference to the protocol
> extension. It could be legal for a client that can't understand
> encryption message from the server to supply an encrypted value to be
> inserted into a column. And I don't think we would ever want the bit
> that's being reserved here to be used by some other extension for some
> other purpose, even when this extension isn't used. So I don't see a
> need for this to be tied into the protocol extension.

I think this is an interesting idea. I can indeed see use cases for
e.g. inserting a new row based on another row (where the secret is the
same).

IMHO that means that we should also bump the protocol version for this
change, because it's changing the wire protocol by adding a new
parameter format code. And it does so in a way that does not depend on
the new protocol extension.




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-18 Thread Alvaro Herrera
On 2024-Apr-18, Alexander Lakhin wrote:

> I think the feature implementation should also provide tab completion
> for SPLIT/MERGE.

I don't think that we should be imposing on feature authors or
committers the task of filling in tab-completion for whatever features
they contribute.  I mean, if they want to add that, cool; but if not,
somebody else can do that, too.  It's not a critical piece.

Now, if we're talking about whether a patch to add tab-completion to a
feature post feature-freeze is acceptable, I think it absolutely is
(even though you could claim that it's a new psql feature).  But for
sure we shouldn't mandate that a feature be reverted just because it
lacks tab-completion -- such lack is not an open-item against the
feature in that sense.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"That sort of implies that there are Emacs keystrokes which aren't obscure.
I've been using it daily for 2 years now and have yet to discover any key
sequence which makes any sense."(Paul Thomas)




Re: fix tablespace handling in pg_combinebackup

2024-04-18 Thread Andres Freund
Hi,

On 2024-04-18 09:03:21 -0400, Robert Haas wrote:
> On Wed, Apr 17, 2024 at 5:50 PM Andres Freund  wrote:
> > > +If there are tablespaces present in the backup, include tablespace_map as
> > > +a keyword parameter whose value is a hash. When tar_program is used, the
> > > +hash keys are tablespace OIDs; otherwise, they are the tablespace pathnames
> > > +used in the backup. In either case, the values are the tablespace pathnames
> > > +that should be used for the target cluster.
> >
> > Where would one get these oids?
> 
> You pretty much have to pick them out of the tar file names. It sucks,
> but it's not this patch's fault.

I was really just remarking on this from the angle of a test writer. I know
that our interfaces around this suck...

For tests, do we really need to set anything on a per-tablespace basis? Can't
we instead just reparent all of them to a new directory?


> I wonder if we (as a project) would consider a patch that redesigned
> this whole mechanism. Produce ${TABLESPACE_NAME}.tar in tar-format,
> instead of ${OID}.tar. In directory-format, relocate via
> -T${TABLESPACE_NAME}=${DIR} instead of -T${SERVERDIR}=${DIR}. That
> would be a significant compatibility break, and you'd somehow need to
> solve the problem of what to put in the tablespace_map file, which
> requires OIDs. But it seems like if you could finesse that issue in
> some elegant way, the result would just be a heck of a lot more usable
> than what we have today.

For some things that'd definitely be nicer - but not sure it work well for
everything. Consider cases where you actually have external directories on
different disks, and you want to restore a backup after some data loss. Now
you need to list all the tablespaces separately, to put them back into their
own location.

One thing I've been wondering about is an option to put the "new" tablespaces
into a location relative to each of the old ones.
  --tablespace-relative-location=../restore-2024-04-18
which would rewrite all the old tablespaces to that new location.


> > Could some of this be simplified by using allow_in_place_tablespaces 
> > instead?
> > Looks like it'd simplify at least the extended test somewhat?
> 
> I don't think we can afford to assume that allow_in_place_tablespaces
> doesn't change the behavior.

I think we can't assume that absolutely everywhere, but we don't need to test
it in a lot of places.


> I said (at least off-list) when that feature was introduced that there was
> no way it was going to remain an isolated development hack, and I think
> that's proved to be 100% correct.

Hm, I guess I kinda agree. But not because I think it wasn't good for
development, but because it'd often be much saner to use relative tablespaces
than absolute ones even for prod.

My only point here was that the test would be simpler if you
a) didn't need to create a temp directory for the tablespace, both for
   primary and standby
b) didn't need to "gin up" a tablespace map, because all the paths are
   relative


Just to be clear: I don't want the above to block merging your test. If you
think you want to do it the way you did, please do.

Greetings,

Andres Freund




Re: plenty code is confused about function level static

2024-04-18 Thread Ranier Vilela
Em qui., 18 de abr. de 2024 às 14:16, Andres Freund 
escreveu:

> Hi,
>
> On 2024-04-18 09:07:43 -0300, Ranier Vilela wrote:
> >  On 18/04/2024 00:39, Andres Freund wrote:
> > >There are lots of places that could benefit from adding 'static
> > >const'.
> >
> > I found a few more places.
>
> Good catches.
>
>
> > Patch 004
> >
> > The opposite would also help, adding static.
> > In these places, I believe it is safe to add static,
> > allowing the compiler to transform into read-only, definitively.
>
> I don't think this would even compile?

It compiles, at least with MSVC 2022.
It passes "ninja test".


> E.g. LockTagTypeNames, pg_wchar_table
> are declared in a header and used across translation units.
>
Sad.
There should be a way to export a read-only (static const) variable.
Better remove these.
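
For reference, the usual C idiom for a read-only table shared across
translation units is an extern const declaration in the header plus a single
const definition in one .c file; sketched here against LockTagTypeNames purely
for illustration (whether the arrays in question already follow it is a
separate matter):

/* in the header: declaration only */
extern const char *const LockTagTypeNames[];

/* in exactly one .c file: the definition; the array lands in .rodata */
const char *const LockTagTypeNames[] = {
	"relation",
	"extend",
	/* ... */
};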

v1-0005 attached.

best regards,
Ranier Vilela


v1-0005-const-convert-static-const.patch
Description: Binary data


Re: plenty code is confused about function level static

2024-04-18 Thread Andres Freund
Hi,

On 2024-04-18 09:07:43 -0300, Ranier Vilela wrote:
>  On 18/04/2024 00:39, Andres Freund wrote:
> >There are lots of places that could benefit from adding 'static
> >const'.
> 
> I found a few more places.

Good catches.


> Patch 004
> 
> The opposite would also help, adding static.
> In these places, I believe it is safe to add static,
> allowing the compiler to transform into read-only, definitively.

I don't think this would even compile? E.g. LockTagTypeNames, pg_wchar_table
are declared in a header and used across translation units.

Greetings,

Andres Freund




Re: plenty code is confused about function level static

2024-04-18 Thread Andres Freund
Hi,

On 2024-04-18 10:33:30 +0200, Peter Eisentraut wrote:
> > Attached are fixes for struct option and a few more occurrences I've found
> > with a bit of grepping.
>
> These look good to me.

Thoughts about when to apply these? Arguably they're fixing mildly broken
code, making it appropriate to fix in 17, but it's also something that we
could end up fixing for a while...


There are some variations of this that are a bit harder to fix, btw. We have

objdump -j .data -t src/backend/postgres|sort -k5
...
01474d00 g O .data  15f0  ConfigureNamesReal
01479a80 g O .data  1fb0  ConfigureNamesEnum
01476300 g O .data  3778  ConfigureNamesString
...
014682e0 g O .data  5848  ConfigureNamesBool
0146db40 g O .data  71c0  ConfigureNamesInt

Not that that's all *that* much these days, but it's still pretty silly to use
~80kB of memory in every postgres instance just because we didn't set
  conf->gen.vartype = PGC_BOOL;
etc at compile time.

Large modifiable arrays with callbacks are also quite useful for exploitation,
as one doesn't need to figure out precise addresses.
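
A cut-down sketch of what compile-time initialization would look like; the
struct here is an invented stand-in for the real config_bool in guc_tables.h,
and only the idea (a const array with vartype set in the initializer) carries
over:

#include <stdbool.h>

enum config_type { PGC_BOOL, PGC_INT, PGC_REAL, PGC_STRING, PGC_ENUM };

static bool enable_seqscan = true;

typedef struct config_bool_sketch
{
	const char *name;
	enum config_type vartype;	/* today assigned at runtime in
								 * build_guc_variables() */
	bool	   *variable;
	bool		boot_val;
} config_bool_sketch;

static const config_bool_sketch ConfigureNamesBoolSketch[] =
{
	{"enable_seqscan", PGC_BOOL, &enable_seqscan, true},
	/* ... */
};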

Greetings,

Andres Freund




Re: Add notes to pg_combinebackup docs

2024-04-18 Thread Robert Haas
On Tue, Apr 9, 2024 at 4:00 AM Martín Marqués  wrote:
> I've attached two patches, the first one is just nit-picking things I
> found when I first read the docs.

I pushed a commit to remove the spurious "the" that you found, but I
don't really agree with the other changes. They all seem grammatically
correct, but I think they're grammatically correct now, too, and I
find it more readable the way it is.

I'll write a separate email about the checksum issue.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-18 Thread Dagfinn Ilmari Mannsåker
Alexander Lakhin  writes:

> Hi Alexander,
>
> 18.04.2024 13:35, Alexander Korotkov wrote:
>>
>> The revised patchset is attached.
>> 1) I've split the fix for the CommandCounterIncrement() issue and the
>> fix for relation persistence issue into a separate patch.
>> 2) I've validated that the lock on the new partition is held in
>> createPartitionTable() after ProcessUtility() as pointed out by
>> Robert.  So, no need to place the lock again.
>> 3) Added fix for problematic error message as a separate patch [1].
>> 4) Added rename "salemans" => "salesmen" for tests as a separate patch.
>>
>> I think these fixes are reaching committable shape, but I'd like
>> someone to check it before I push.
>
> I think the feature implementation should also provide tab completion for
> SPLIT/MERGE.
> (ALTER TABLE t S
> fills in only SET now.)

Here's a patch for that.  One thing I noticed while testing it was that
the tab completion for partitions (Query_for_partition_of_table) shows
all the schemas in the DB, even ones that don't contain any partitions
of the table being altered.

- ilmari

>From 26db03b7a7675aa7dbff1f18ee084296caa1e181 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Thu, 18 Apr 2024 17:47:22 +0100
Subject: [PATCH] =?UTF-8?q?Add=20tab=20completion=20for=20ALTER=20TABLE=20?=
 =?UTF-8?q?=E2=80=A6=20SPLIT|MERGE=20PARTITION(S)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 src/bin/psql/tab-complete.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 6fee3160f0..97cd5d9f62 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2353,6 +2353,7 @@ psql_completion(const char *text, int start, int end)
 	  "OWNER TO", "SET", "VALIDATE CONSTRAINT",
 	  "REPLICA IDENTITY", "ATTACH PARTITION",
 	  "DETACH PARTITION", "FORCE ROW LEVEL SECURITY",
+	  "SPLIT PARTITION", "MERGE PARTITIONS (",
 	  "OF", "NOT OF");
 	/* ALTER TABLE xxx ADD */
 	else if (Matches("ALTER", "TABLE", MatchAny, "ADD"))
@@ -2609,10 +2610,10 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH("FROM (", "IN (", "WITH (");
 
 	/*
-	 * If we have ALTER TABLE <foo> DETACH PARTITION, provide a list of
+	 * If we have ALTER TABLE <foo> DETACH|SPLIT PARTITION, provide a list of
 	 * partitions of <foo>.
 	 */
-	else if (Matches("ALTER", "TABLE", MatchAny, "DETACH", "PARTITION"))
+	else if (Matches("ALTER", "TABLE", MatchAny, "DETACH|SPLIT", "PARTITION"))
 	{
 		set_completion_reference(prev3_wd);
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_partition_of_table);
@@ -2620,6 +2621,19 @@ psql_completion(const char *text, int start, int end)
 	else if (Matches("ALTER", "TABLE", MatchAny, "DETACH", "PARTITION", MatchAny))
 		COMPLETE_WITH("CONCURRENTLY", "FINALIZE");
 
+	/* ALTER TABLE <name> SPLIT PARTITION <name> */
+	else if (Matches("ALTER", "TABLE", MatchAny, "SPLIT", "PARTITION", MatchAny))
+		COMPLETE_WITH("INTO ( PARTITION");
+
+	/* ALTER TABLE <name> MERGE PARTITIONS ( */
+	else if (Matches("ALTER", "TABLE", MatchAny, "MERGE", "PARTITIONS", "("))
+	{
+		set_completion_reference(prev4_wd);
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_partition_of_table);
+	}
+	else if (Matches("ALTER", "TABLE", MatchAny, "MERGE", "PARTITIONS", "(*)"))
+		COMPLETE_WITH("INTO");
+
	/* ALTER TABLE <foo> OF */
 	else if (Matches("ALTER", "TABLE", MatchAny, "OF"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_composite_datatypes);
-- 
2.39.2



Re: pg17 issues with not-null contraints

2024-04-18 Thread Justin Pryzby
On Thu, Apr 18, 2024 at 06:23:30PM +0200, Alvaro Herrera wrote:
> On 2024-Apr-18, Justin Pryzby wrote:
> 
> > BTW, this works up to v16 (although maybe it should not):
> > 
> > | CREATE TABLE ip(id int PRIMARY KEY); CREATE TABLE ic(id int) INHERITS 
> > (ip); ALTER TABLE ic ALTER id DROP NOT NULL;
> 
> > It'd still be nice to detect the issue in advance rather than failing
> > halfway through the upgrade.
> 
> Maybe we can have pg_upgrade --check look for cases we might have
> trouble upgrading.  (I mean: such cases would fail if you have rows with
> nulls in the affected columns, but the schema upgrade should be
> successful.  Is that what you have in mind?)

Before v16, pg_upgrade failed in the middle of restoring the schema,
without being caught during --check.  The patch to implement that was
forgotten and never progressed.

I'm not totally clear on what's intended in v17 - maybe it'd be dead
code, and maybe it shouldn't even be applied to master branch.  But I do
think it's worth patching earlier versions (even though it'll be less
useful than having done so 5 years ago).

-- 
Justin




Re: Transparent column encryption

2024-04-18 Thread Robert Haas
On Wed, Apr 10, 2024 at 6:13 AM Peter Eisentraut  wrote:
> Obviously, it's early days, so there will be plenty of time to have
> discussions on various other aspects of this patch.  I'm keeping a keen
> eye on the discussion of protocol extensions, for example.

I think the way that you handled that is clever, and along the lines
of what I had in mind when I invented the _pq_ stuff.

More specifically, the way that the ColumnEncryptionKey and
ColumnMasterKey messages are handled is exactly the way that I was
imagining things would work. The client uses _pq_.column_encryption to
signal that it can understand those messages, and the server responds
by including them. I assume that if the client doesn't signal
understanding, then the server simply omits sending those messages. (I
have not checked the code.)

I'm less certain about the changes to the ParameterDescription and
RowDescription messages. I see a couple of potential problems. One is
that, if you say you can understand column encryption messages, the
extra fields are included even for unencrypted columns. The client
must choose at connection startup whether it ever wishes to read any
encrypted data; if so, it pays a portion of that overhead all the
time. Another potential problem is with the scalability of this
design. Suppose that we could not only encrypt columns, but also
compress, fold, mutilate, and spindle them. Then there might end up
being a dizzying array of variation in the format of what is supposed
to be the same message. Perhaps it's not so bad: as long as the
documentation is clear about in which order the additional fields will
appear in the relevant messages when more than one relevant feature is
used, it's probably not too difficult for clients to cope. And it is
probably also true that the precise size of, say, a RowDescription
message will rarely be performance-critical. But another thought is
that we might try to redesign this so that we simply add more message
types rather than mutating message types, i.e., after sending the
RowDescription message, if any columns are encrypted, we additionally
send a RowEncryptionDescription message. Then this treatment becomes
symmetric with the handling of ColumnEncryptionKey and ColumnMasterKey
messages, and there's no overhead when the feature is unused.

With regard to the Bind message, I suggest that we regard the protocol
change as reserving a currently-unused bit in the message to indicate
whether the value is pre-encrypted, without reference to the protocol
extension. It could be legal for a client that can't understand
encryption message from the server to supply an encrypted value to be
inserted into a column. And I don't think we would ever want the bit
that's being reserved here to be used by some other extension for some
other purpose, even when this extension isn't used. So I don't see a
need for this to be tied into the protocol extension.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Speed up clean meson builds by ~25%

2024-04-18 Thread Andres Freund
On 2024-04-17 23:10:53 -0400, Tom Lane wrote:
> Jelte Fennema-Nio  writes:
> > As I expected this problem was indeed fairly easy to address by still
> > building "backend/parser" before "interfaces". See attached patch.
> 
> I think we should hold off on this.  I found a simpler way to address
> ecpg's problem than what I sketched upthread.  I have a not-ready-to-
> show-yet patch that allows the vast majority of ecpg's grammar
> productions to use the default semantic action.  Testing on my M1
> Macbook with clang 16.0.6 from MacPorts, I see the compile time for
> preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec.

That's pretty amazing.




Re: pg17 issues with not-null contraints

2024-04-18 Thread Alvaro Herrera
On 2024-Apr-18, Justin Pryzby wrote:

> That seems like it could be important.  I considered but never actually
> tested your patch by pg_upgrading across major versions.

It would be a welcome contribution for sure.  I've been doing it rather
haphazardly, which is not great.

> BTW, this works up to v16 (although maybe it should not):
> 
> | CREATE TABLE ip(id int PRIMARY KEY); CREATE TABLE ic(id int) INHERITS (ip); 
> ALTER TABLE ic ALTER id DROP NOT NULL;
> 
> Under v17, this fails. Maybe that's okay, but it should probably be
> called out in the release notes.

Sure, we should mention that.

> | ERROR:  cannot drop inherited constraint "ic_id_not_null" of relation "ic"
> 
> That's the issue that I mentioned in the 6 year old thread.  In the
> future (upgrading *from* v17) it won't be possible anymore, right?

Yeah, trying to drop the constraint in 17 fails as it should; it was one
of the goals of this whole thing in fact.

> It'd still be nice to detect the issue in advance rather than failing
> halfway through the upgrade.

Maybe we can have pg_upgrade --check look for cases we might have
trouble upgrading.  (I mean: such cases would fail if you have rows with
nulls in the affected columns, but the schema upgrade should be
successful.  Is that what you have in mind?)
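
Something along these lines, perhaps -- the function name and catalog query
are hypothetical, but the shape follows the existing check_* functions in
src/bin/pg_upgrade/check.c (types and helpers come from pg_upgrade.h):

static void
check_for_not_null_inheritance_issues(ClusterInfo *cluster)
{
	prep_status("Checking for not-null constraint inconsistencies");

	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
	{
		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
		PGresult   *res;

		/* hypothetical query: attnotnull columns for which the new
		 * version could not recreate a supporting constraint */
		res = executeQueryOrDie(conn, "SELECT c.relname, a.attname ...");

		if (PQntuples(res) != 0)
			pg_fatal("not-null constraint inconsistencies found in database \"%s\"",
					 active_db->db_name);

		PQclear(res);
		PQfinish(conn);
	}

	check_ok();
}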

> I have a rebased patch while I'll send on that thread.  I guess it's
> mostly unrelated to your patch but it'd be nice if you could take a
> look.

Okay.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Industry suffers from the managerial dogma that for the sake of stability
and continuity, the company should be independent of the competence of
individual employees."  (E. Dijkstra)




Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-04-18 Thread Alvaro Herrera
On 2024-Apr-18, Michael Paquier wrote:

> On Wed, Apr 17, 2024 at 10:31:52AM +0200, Alvaro Herrera wrote:

> > Hmm, maybe we should do a RESET of default_table_access_method before
> > printing the CREATE TABLE to avoid the confusion.
> 
> A hard reset would make the business around currTableAM that decides
> when to generate the SET default_table_access_method queries slightly
> more complicated, while increasing the number of queries run on the
> server.

Hmm, okay.  (I don't think we really care too much about the number of
queries, do we?)

> Fine by me to use two routines to generate the two different commands.
> I am finishing with the attached for now, making dumps, restores and
> upgrades work happily as far as I've tested.

Great.

> I was also worrying about a need to dump the protocol version to be
> able to track the relkind in the toc entries, but a45c78e3284b has
> already done one.  The difference in AM handling between relations
> without storage and relations with storage pushes the relkind logic
> more within the internals of pg_backup_archiver.c.

Hmm, does this mean that every dump taken since a45c78e3284b (April
1st) and before this commit will be unrestorable?  This doesn't worry me
too much, because we aren't even in beta yet ... and I think we don't
have a strict policy about it.

> --- a/src/bin/pg_dump/t/002_pg_dump.pl
> +++ b/src/bin/pg_dump/t/002_pg_dump.pl
> @@ -4591,11 +4591,9 @@ my %tests = (
>   CREATE TABLE dump_test.regress_pg_dump_table_am_child_2
>     PARTITION OF dump_test.regress_pg_dump_table_am_parent FOR VALUES IN (2);',
>   regexp => qr/^
> - \QSET default_table_access_method = regress_table_am;\E
> - (\n(?!SET[^;]+;)[^\n]*)*
> - \n\QCREATE TABLE dump_test.regress_pg_dump_table_am_parent (\E
> - (.*\n)*
>   \QSET default_table_access_method = heap;\E
> + (.*\n)*
> + \QALTER TABLE dump_test.regress_pg_dump_table_am_parent SET ACCESS METHOD regress_table_am;\E
>   (\n(?!SET[^;]+;)[^\n]*)*
>   \n\QCREATE TABLE dump_test.regress_pg_dump_table_am_child_1 (\E
>   (.*\n)*

This looks strange -- why did you remove matching for the CREATE TABLE
of the parent table?  That line should appear shortly before the ALTER
TABLE SET ACCESS METHOD for the same table, shouldn't it?  Maybe your
intention was to remove only the SET default_table_access_method
= regress_table_am line ... but it's not clear to me why we have the
"SET default_table_access_method = heap" line before the ALTER TABLE SET
ACCESS METHOD.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"If it is not right, do not do it.
If it is not true, do not say it." (Marcus Aurelius, Meditations)




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-18 Thread Alexander Lakhin

Hi Alexander,

18.04.2024 13:35, Alexander Korotkov wrote:


> The revised patchset is attached.
> 1) I've split the fix for the CommandCounterIncrement() issue and the
> fix for relation persistence issue into a separate patch.
> 2) I've validated that the lock on the new partition is held in
> createPartitionTable() after ProcessUtility() as pointed out by
> Robert.  So, no need to place the lock again.
> 3) Added fix for problematic error message as a separate patch [1].
> 4) Added rename "salemans" => "salesmen" for tests as a separate patch.
>
> I think these fixes are reaching committable shape, but I'd like
> someone to check it before I push.


I think the feature implementation should also provide tab completion for
SPLIT/MERGE.
(ALTER TABLE t S
fills in only SET now.)

Also, the following MERGE operation:
CREATE TABLE t (i int, PRIMARY KEY(i)) PARTITION BY RANGE (i);
CREATE TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (1);
CREATE TABLE tp_1 PARTITION OF t FOR VALUES FROM (1) TO (2);
ALTER TABLE t MERGE PARTITIONS (tp_0, tp_1) INTO tp_0;

leaves a strange constraint:
\d+ t*
  Table "public.tp_0"
...
Not-null constraints:
    "merge-16385-26BCB0-tmp_i_not_null" NOT NULL "i"

Best regards,
Alexander




PostgreSQL 17 Beta 1 release date

2024-04-18 Thread Jonathan S. Katz

Hi,

PostgreSQL 17 Beta 1 is planned to be released on May 23, 2024. Please 
continue your hard work on closing out open items[1] ahead of the 
release and have the fixes targeted for the release committed by May 18, 
2024.


Thanks - it's very exciting that we're at this point in the release cycle!

Jonathan

[1] https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

2024-04-18 Thread Peter Geoghegan
On Thu, Apr 18, 2024 at 2:13 AM Donghang Lin  wrote:
> It's triggered when a scankey's strategy is set to invalid. While for a
> descending ordered column, the strategy needs to get fixed to its commute
> strategy. That doesn't work if the strategy is invalid.

The problem is that _bt_fix_scankey_strategy shouldn't have been doing
anything with already-eliminated array scan keys in the first place
(whether or not they're on a DESC index column). I just pushed a fix
along those lines.

Thanks for the report!

-- 
Peter Geoghegan




Re: pgsql: Fix restore of not-null constraints with inheritance

2024-04-18 Thread Alvaro Herrera
On 2024-Apr-18, Alvaro Herrera wrote:

> On 2024-Apr-18, Alvaro Herrera wrote:
> 
> > Lastly, make two changes to pg_dump: 1) do not try to drop a not-null
> > constraint that's marked as inherited; this allows a dump to restore
> > with no errors if a table with a PK inherits from another which also has
> > a PK; 2) avoid giving inherited constraints throwaway names, for the
> > rare cases where such a constraint survives after the restore.
> 
> Hmm, this last bit broke pg_upgrade on crake.  I'll revert this part,
> meanwhile I'll be installing 9.2 to see if it can be fixed in a better way.

Eh, so:

1. running "make check" on pg_upgrade using an oldinstall pointing to
9.2 fails, because PostgreSQL::Test::Cluster doesn't support that
version -- it only goes back to 12.  How difficult was it to port it
back to all these old versions?

2. running "make check" with an oldinstall pointing to 10 fails, because
the "invalid database" checks fail:

not ok 7 - invalid database causes failure status (got 0 vs expected 1)

#   Failed test 'invalid database causes failure status (got 0 vs expected 1)'
#   at t/002_pg_upgrade.pl line 407.
not ok 8 - invalid database causes failure stdout /(?^:invalid)/


3. Lastly, even if I put back the code that causes the failures on crake
and restore from 10 (and ignore those two problems), I cannot reproduce
the issues it reported.  Is crake running some funky code that's not
what "make check" on pg_upgrade does, perchance?


I think we should SKIP the tests with invalid databases when running
with an oldinstall 10 and older, because that commit only patches back
to 11:

Author: Andres Freund 
Branch: master [c66a7d75e] 2023-07-13 13:03:28 -0700
Branch: REL_16_STABLE Release: REL_16_0 [a4b4cc1d6] 2023-07-13 13:03:30 -0700
Branch: REL_15_STABLE Release: REL_15_4 [f66403749] 2023-07-13 13:04:45 -0700
Branch: REL_14_STABLE Release: REL_14_9 [d11efe830] 2023-07-13 13:03:33 -0700
Branch: REL_13_STABLE Release: REL_13_12 [81ce6] 2023-07-13 13:03:34 -0700
Branch: REL_12_STABLE Release: REL_12_16 [034a9fcd2] 2023-07-13 13:03:36 -0700
Branch: REL_11_STABLE Release: REL_11_21 [1c38e7ae1] 2023-07-13 13:03:37 -0700

Handle DROP DATABASE getting interrupted

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)




Re: Trigger violates foreign key constraint

2024-04-18 Thread Tom Lane
Aleksander Alekseev  writes:
>> I agree with documenting this hazard, but I think it'd be better
>> to do so in the "Triggers" chapter.  There is no hazard unless
>> you are writing user-defined triggers, which is surely far fewer
>> people than use foreign keys.  So I suggest something like the
>> attached.

> I don't mind changing the chapter, but I prefer the wording chosen in
> v3. The explanation in v4 is somewhat hard to follow IMO.

Well, I didn't like v3 because I thought it was wrong, or at least
fairly misleading.  The fact that we use triggers rather than some
other mechanism to invoke referential updates isn't really relevant.
The issue is that the update actions fire user-defined triggers,
and it's those that are the (potential) problem.

Perhaps we should leave the system triggers out of the discussion
entirely?  More or less like:

If a foreign key constraint specifies referential actions (that
is, cascading updates or deletes), those actions are performed via
ordinary SQL update or delete commands on the referencing table.
In particular, any triggers that exist on the referencing table
will be fired for those changes.  If such a trigger modifies or
blocks the effect of one of these commands, the end result could
be to break referential integrity.  It is the trigger programmer's
responsibility to avoid that.

regards, tom lane




Re: improve performance of pg_dump --binary-upgrade

2024-04-18 Thread Nathan Bossart
On Thu, Apr 18, 2024 at 09:24:53AM +0200, Daniel Gustafsson wrote:
>> On 18 Apr 2024, at 06:17, Nathan Bossart  wrote:
> 
>> The attached work-in-progress patch speeds up 'pg_dump --binary-upgrade'
>> for this case.  Instead of executing the query in every call to the
>> function, we can execute it once during the first call and store all the
>> required information in a sorted array that we can bsearch() in future
>> calls.
> 
> That does indeed seem like a saner approach.  Since we look up the relkind we
> can also remove the is_index parameter to binary_upgrade_set_pg_class_oids
> since we already know that without the caller telling us?

Yeah.  It looks like that's been possible since commit 9a974cb, so I can
write a prerequisite patch for this.

>> One downside of this approach is the memory usage.
> 
> I'm not too worried about the worst-case performance of this.

Cool.  That seems to be the general sentiment.

>> This was more-or-less
>> the first approach that crossed my mind, so I wouldn't be surprised if
>> there's a better way.  I tried to keep the pg_dump output the same, but if
>> that isn't important, maybe we could dump all the pg_class OIDs at once
>> instead of calling binary_upgrade_set_pg_class_oids() for each one.
> 
> Without changing the backend handling of the Oid's we can't really do that
> AFAICT, the backend stores the Oid for the next call so it needs to be per
> relation like now?

Right, this would require additional changes.
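
For illustration, the cache-and-bsearch() shape being described, with invented
names (the actual patch's structures may differ):

#include <stdlib.h>

typedef unsigned int Oid;		/* as in postgres_ext.h */

typedef struct ClassOidInfo
{
	Oid			oid;
	Oid			relfilenode;
	char		relkind;
} ClassOidInfo;

static ClassOidInfo *class_oid_cache;	/* filled once, on the first call */
static size_t class_oid_count;

static int
class_oid_cmp(const void *a, const void *b)
{
	Oid			oa = ((const ClassOidInfo *) a)->oid;
	Oid			ob = ((const ClassOidInfo *) b)->oid;

	return (oa > ob) - (oa < ob);
}

static const ClassOidInfo *
find_class_oids(Oid reloid)
{
	ClassOidInfo key = {.oid = reloid};

	/* assumes the cache was populated and qsort()ed with class_oid_cmp */
	return bsearch(&key, class_oid_cache, class_oid_count,
				   sizeof(ClassOidInfo), class_oid_cmp);
}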

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Transparent column encryption

2024-04-18 Thread Jelte Fennema-Nio
On Thu, 18 Apr 2024 at 13:25, Peter Eisentraut  wrote:
> Hopefully, the reason for key rotation is mainly that policies require
> key rotation, not that keys get compromised all the time.

These key rotation policies are generally in place to reduce the
impact of a key compromise by limiting the time a compromised key is
valid.

> This
> seems pretty standard to me.  For example, I can change the password on
> my laptop's file system encryption, which somehow wraps a lower-level
> key, but I can't reencrypt the actual file system in place.

I think the threat model for this proposal and a laptop's file system
encryption are different enough that the same choices/tradeoffs don't
automatically translate. Specifically in this proposal the unencrypted
CEK is present on all servers that need to read/write those encrypted
values. And a successful attacker would then be able to read the
encrypted values forever with this key, because it effectively cannot
be rotated. That is a much bigger attack surface and risk than a
laptop's disk encryption. So, I feel quite strongly that shipping the
proposed feature without being able to re-encrypt columns in an online
fashion would be a mistake.

> That's the
> reason for having this two-tier key system in the first place.

If we allow for online-rotation of the actual encryption key, then
maybe we don't even need this two-tier system ;)

Not having this two tier system would have a few benefits in my opinion:
1. We wouldn't need to be sending encrypted key material from the
server to every client. Which seems nice from a security, bandwidth
and client implementation perspective.
2. Asymmetric encryption of columns is suddenly an option. Allowing
certain clients to enter encrypted data into the database but not read
it.


> Two problems here: One, for deterministic encryption, everyone needs to
> agree on the representation, otherwise equality comparisons won't work.
>   Two, if you give clients the option of storing text or binary, then
> clients also get back a mix of text or binary, and it will be a mess.
> Just giving the option of storing the payload in binary wouldn't be that
> hard, but it's not clear what you can sensibly do with that in the end.

How about defining at column creation time if the underlying value
should be binary or not? Something like:

CREATE TABLE t(
mytime timestamp ENCRYPTED WITH (column_encryption_key = cek1, binary=true)
);

> Even if the identifiers
> somehow were global (but OIDs can also change and are not guaranteed
> unique forever),

OIDs of existing rows can't just change while a connection is active,
right? (all I know is upgrades can change them but that seems fine)
Also they are unique within a catalog table, right?

> the state of which keys have already been sent is
> session state.

I agree that this is the case. But it's state that can be tracked
fairly easily by a transaction pooler. Similar to how prepared
statements can be tracked. And this is easier to do when at the IDs of
the same keys are the same across each session to the server, because
if they differ then you need to do re-mapping of IDs.

> This is kind of like SASL or TLS can add new methods dynamically without
> requiring a new version.  I mean, as we are learning, making new
> protocol versions is kind of hard, so the point was to avoid it.

Fair enough

> I guess you could do that, but wouldn't that making the decoding of
> these messages much more complicated?  You would first have to read the
> "short" variant, decode the format, and then decide to read the rest.
> Seems weird.

I see your point. But with the current approach, even for queries that
don't return any encrypted columns, these useless fields would be part
of the RowDescription. It seems quite annoying to add extra network
and parsing overhead to all of your queries even if only a small
percentage of them use the encryption feature. Maybe we should add a new
message type instead, like EncryptedRowDescription, or add some flag
field at the start of RowDescription that can be used to indicate that
there is encryption info for some of the columns.

> Yes, that's what would happen, and that's the intention, so that for
> example you can use pg_dump to back up encrypted columns without having
> to decrypt them.

Okay, makes sense. But I think it would be good to document that.

> > A related question to this is that currently libpq throws an error if
> > e.g. a master key realm is not defined but another one is. Is that
> > really what we want? Is not having one of the realms really that
> > different from not providing any realms at all?
>
> Can you provide a more concrete example of what scenario you have a
> concern about?

A server has tables A and B. A is encrypted with master key realm X
and B is encrypted with master key realm Y. If libpq is only given a
key for realm X, and it then tries to read table B, an error is
thrown. While if you don't provide any realm at all, you can read from
table B just fine.

Re: pg_combinebackup fails on file named INCREMENTAL.*

2024-04-18 Thread Robert Haas
On Wed, Apr 17, 2024 at 7:56 PM David Steele  wrote:
> Thanks! I've tested this and it works as advertised.
>
> Ideally I'd want an error on backup if there is a similar file in any
> data directories that would cause an error on combine, but I admit that
> it is vanishingly rare for users to put files anywhere but the root of
> PGDATA, so this approach seems OK to me.

OK, committed. Thanks for the review.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg17 issues with not-null contraints

2024-04-18 Thread Justin Pryzby
On Tue, Apr 16, 2024 at 08:11:49PM +0200, Alvaro Herrera wrote:
> This is still missing some cleanup and additional tests, of course.
> Speaking of which, I wonder if I should modify pg16's tests so that they
> leave behind tables set up in this way, to immortalize pg_upgrade
> testing.

That seems like it could be important.  I considered it, but never
actually tested your patch by pg_upgrading across major versions.

BTW, this works up to v16 (although maybe it should not):

| CREATE TABLE ip(id int PRIMARY KEY); CREATE TABLE ic(id int) INHERITS (ip);
| ALTER TABLE ic ALTER id DROP NOT NULL;

Under v17, this fails. Maybe that's okay, but it should probably be
called out in the release notes.
| ERROR:  cannot drop inherited constraint "ic_id_not_null" of relation "ic"

That's the issue that I mentioned in the 6 year old thread.  In the
future (upgrading *from* v17) it won't be possible anymore, right?  It'd
still be nice to detect the issue in advance rather than failing halfway
through the upgrade.  I have a rebased patch while I'll send on that
thread.  I guess it's mostly unrelated to your patch but it'd be nice if
you could take a look.

-- 
Justin




clang's sanitizer makes stringToNode() extremely slow

2024-04-18 Thread Alexander Lakhin

Hello hackers,

When using a server built with clang (15, 18) with sanitizers enabled,
the last query in the following script:
SET parallel_setup_cost = 0;
SET min_parallel_table_scan_size = 0;

SELECT a::text INTO t FROM generate_series(1, 1000) a;
\timing on
SELECT string_agg(a, ',') FROM t WHERE a = REPEAT('0', 40);

runs for more than a minute on my workstation (a larger repeat count gives
much longer duration):
Time: 66017.594 ms (01:06.018)

The typical stack trace for a running parallel worker is:
#0  0x559a23671885 in __sanitizer::internal_strlen(char const*) ()
#1  0x559a236568ed in StrtolFixAndCheck(void*, char const*, char**, char*, int) ()
#2  0x559a236423dc in __interceptor_strtol ()
#3  0x559a240027d9 in atoi (...) at readfuncs.c:629
#5  0x559a23fb03f0 in _readConst () at readfuncs.c:275
#6  parseNodeString () at ./readfuncs.switch.c:29
#7  0x559a23faa421 in nodeRead (
    token=0x7fee75cf3bd2 "{CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false 
:constisnull false :location -1 :constvalue 44 [ 16 106 24 0 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48"...,

    tok_len=1) at read.c:338
#8  0x559a23faa916 in nodeRead (...) at read.c:452
#9  0x559a23fb3f34 in _readOpExpr () at ./readfuncs.funcs.c:279
#10 0x559a23fb0842 in parseNodeString () at ./readfuncs.switch.c:47
#11 0x559a23faa421 in nodeRead (...) at read.c:338
#12 0x559a23faa916 in nodeRead (...) at read.c:452
#13 0x559a23fefb74 in _readSeqScan () at ./readfuncs.funcs.c:3954
#14 0x559a23fb2b97 in parseNodeString () at ./readfuncs.switch.c:559
#15 0x559a23faa421 in nodeRead (...) at read.c:338
#16 0x559a23ffc033 in _readAgg () at ./readfuncs.funcs.c:4685
#17 0x559a23fb2dd3 in parseNodeString () at ./readfuncs.switch.c:611
#18 0x559a23faa421 in nodeRead (...) at read.c:338
#19 0x559a23feb340 in _readPlannedStmt () at ./readfuncs.funcs.c:3685
#20 0x559a23fb2ad1 in parseNodeString () at ./readfuncs.switch.c:541
#21 0x559a23faa421 in nodeRead (...) at read.c:338
#22 0x559a23fa99d8 in stringToNodeInternal (...) at read.c:92
#24 0x559a23d66609 in ExecParallelGetQueryDesc (...) at execParallel.c:1250
#25 ParallelQueryMain (...) at execParallel.c:1424
#26 0x559a238cfe13 in ParallelWorkerMain (...) at parallel.c:1516
#27 0x559a241e5b6a in BackgroundWorkerMain (...) at bgworker.c:848
#28 0x559a241ec254 in postmaster_child_launch (...) at launch_backend.c:265
#29 0x559a241f1c15 in do_start_bgworker (...) at postmaster.c:4270
#30 maybe_start_bgworkers () at postmaster.c:4501
#31 0x559a241f486e in process_pm_pmsignal () at postmaster.c:3774
#32 ServerLoop () at postmaster.c:1667
#33 0x559a241f0ed6 in PostmasterMain (...) at postmaster.c:1372
#34 0x559a23ebe16d in main (...) at main.c:197

The flamegraph (which shows that readDatum() -> __interceptor_strtol() ->
StrtolFixAndCheck() -> __sanitizer::internal_strlen () takes >99% of time)
is attached.
(I could not reproduce this behaviour with the gcc's sanitizer.)

Moreover, this query cannot be canceled; you can only kill -SIGKILL the
parallel workers to interrupt it.
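
For what it's worth, the flamegraph suggests the mechanism: the sanitizer's
strtol interceptor appears to run strlen() on its input on every call, and
readfuncs.c passes atoi() pointers into the middle of the single long
NUL-terminated output string, so every conversion scans to the far end. A
standalone sketch of that quadratic pattern (plain C, not Postgres code):

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	enum {N = 200000};
	char	   *buf = malloc(N * 4 + 1);
	char	   *p = buf;
	long		sum = 0;

	for (int i = 0; i < N; i++)
		p += sprintf(p, "%d ", i % 100);

	for (char *q = buf; *q != '\0';)
	{
		char	   *end;

		/* Under the sanitizer, strtol first strlen()s from q to the end
		 * of the whole buffer, making this loop O(N^2) overall. */
		sum += strtol(q, &end, 10);
		q = end + 1;
	}
	printf("%ld\n", sum);
	free(buf);
	return 0;
}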

Best regards,
Alexander

Re: improve performance of pg_dump --binary-upgrade

2024-04-18 Thread Nathan Bossart
On Thu, Apr 18, 2024 at 02:08:28AM -0400, Corey Huinker wrote:
> Bar-napkin math tells me in a worst-case architecture and braindead byte
> alignment, we'd burn 64 bytes per struct, so the 100K tables cited would be
> about 6.25MB of memory.

That doesn't seem too terrible.

> The obvious low-memory alternative would be to make a prepared statement,
> though that does nothing to cut down on the roundtrips.
> 
> I think this is a good trade off.

Cool.
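
For the archives, a rough sketch of the shape being discussed (the struct
name and field set are assumptions for illustration, not the actual patch):
collect everything in one query up front, keep it sorted by OID, and do a
client-side bsearch per relation instead of a server round-trip per table.

#include <stdlib.h>

typedef unsigned int Oid;

/*
 * Hypothetical per-relation cache entry; even at the worst-case 64 bytes
 * per entry once padding is counted, 100K tables stay around 6MB.
 */
typedef struct BinaryUpgradeClassInfo
{
	Oid			oid;			/* pg_class.oid, the sort key */
	Oid			relfilenode;
	Oid			toast_oid;
	Oid			toast_relfilenode;
} BinaryUpgradeClassInfo;

static BinaryUpgradeClassInfo *binfo;	/* filled once, sorted by oid */
static size_t nbinfo;

static int
binfo_cmp(const void *a, const void *b)
{
	Oid			ao = ((const BinaryUpgradeClassInfo *) a)->oid;
	Oid			bo = ((const BinaryUpgradeClassInfo *) b)->oid;

	return (ao > bo) - (ao < bo);
}

/* One bsearch per table instead of one query per table. */
static BinaryUpgradeClassInfo *
lookup_class_info(Oid reloid)
{
	BinaryUpgradeClassInfo key = {.oid = reloid};

	return bsearch(&key, binfo, nbinfo, sizeof(BinaryUpgradeClassInfo),
				   binfo_cmp);
}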

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pg_combinebackup does not detect missing files

2024-04-18 Thread Robert Haas
On Wed, Apr 17, 2024 at 7:09 PM David Steele  wrote:
> I think here:
>
> +   pg_basebackup only attempts to verify
>
> you mean:
>
> +   pg_combinebackup only attempts to verify
>
> Otherwise this looks good to me.

Good catch, thanks. Committed with that change.

> Fair enough. I accept that your reasoning is not random, but I'm still
> not very satisfied that the user needs to run a separate and rather
> expensive process to do the verification when pg_combinebackup already
> has the necessary information at hand. My guess is that most users will
> elect to skip verification.

I think you're probably right that a lot of people will skip it; I'm
just less convinced than you are that it's a bad thing. It's not a
*great* thing if people skip it, but restore time is actually just
about the worst time to find out that you have a problem with your
backups. I think users would be better served by verifying stored
backups periodically when they *don't* need to restore them. Also,
saying that we have all of the information that we need to do the
verification is only partially true:

- we do have to parse the manifest anyway, but we don't otherwise have
to compute checksums, and I think that cost can be significant
even for CRC-32C and much more significant for any of the SHA variants

- we don't need to read all of the files in all of the backups: if
there's a newer full backup, the corresponding file in older backups,
whether full or incremental, need not be read

- incremental files other than the most recent only need to be read to
the extent that we need their data; if some of the same blocks have
been changed again, we can economize

How much you save because of these effects is pretty variable. Best
case, you have a 2-backup chain with no manifest checksums, and all
that verification would have to do that you wouldn't otherwise need to
do is walk each older directory tree in toto and cross-check which
files exist against the manifest. That's probably cheap enough that nobody
would be too fussed. Worst case, you have a 10-backup (or whatever)
chain with SHA512 checksums and, say, a 50% turnover rate. In that
case, I think having verification happen automatically could be a
pretty major hit, both in terms of I/O and CPU. If your database is
1TB, it's ~5.5TB of read I/O (because one 1TB full backup and 9 0.5TB
incrementals) instead of ~1TB of read I/O, plus the checksumming.

Now, obviously you can still feel that it's totally worth it, or that
someone in that situation shouldn't even be using incremental backups,
and it's a value judgement, so fair enough. But my guess is that the
efforts that this implementation makes to minimize the amount of I/O
required for a restore are going to be important for a lot of people.

> At least now they'll have the information they need to make an informed
> choice.

Right.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg17 issues with not-null contraints

2024-04-18 Thread Alvaro Herrera
On 2024-Apr-15, Justin Pryzby wrote:

> Here's a couple more issues affecting upgrades from v16 to v17.
> 
> postgres=# CREATE TABLE a(i int NOT NULL); CREATE TABLE b(i int PRIMARY KEY) 
> INHERITS (a);
> pg_restore: error: could not execute query: ERROR:  constraint 
> "pgdump_throwaway_notnull_0" of relation "b" does not exist
> 
> postgres=# CREATE TABLE a(i int CONSTRAINT a NOT NULL PRIMARY KEY); CREATE 
> TABLE b()INHERITS(a); ALTER TABLE b ADD CONSTRAINT pkb PRIMARY KEY (i);
> pg_restore: error: could not execute query: ERROR:  cannot drop inherited 
> constraint "pgdump_throwaway_notnull_0" of relation "b"

I pushed a fix now, and it should also cover these two issues, which
required only minor changes over what I posted yesterday.  Also, thank
you for pointing out that the patch also fixed Andrew's problem.  It
did, except there was a locking problem which required an additional
tweak.

Thanks for reporting these.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"At least to kernel hackers, who really are human, despite occasional
rumors to the contrary" (LWN.net)




Re: cataloguing NOT NULL constraints

2024-04-18 Thread Alvaro Herrera
On 2024-Jan-25, Andrew Bille wrote:

> Starting from b0e96f31, pg_upgrade fails with inherited NOT NULL constraint:
> For example upgrade from 9c13b6814a (or REL_12_STABLE .. REL_16_STABLE) to
> b0e96f31 (or master) with following two tables (excerpt from
> src/test/regress/sql/rules.sql)
> 
> create table test_0 (id serial primary key);
> create table test_1 (id integer primary key) inherits (test_0);

I have pushed a fix which should hopefully fix this problem
(d9f686a72e).  Please give this a look.  Thanks for reporting the issue.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"I apologize for the confusion in my previous responses.
 There appears to be an error."   (ChatGPT)




Re: documentation structure

2024-04-18 Thread jian he
On Thu, Apr 18, 2024 at 2:37 AM Dagfinn Ilmari Mannsåker
 wrote:
>
> Andres Freund  writes:
>
> > Hi,
> >
> > On 2024-04-17 12:07:24 +0100, Dagfinn Ilmari Mannsåker wrote:
> >> Andres Freund  writes:
> >> > I think the manual work for writing signatures in sgml is not 
> >> > insignificant,
> >> > nor is the volume of sgml for them. Manually maintaining the signatures 
> >> > makes
> >> > it impractical to significantly improve the presentation - which I don't 
> >> > think
> >> > is all that great today.
> >>
> >> And it's very inconsistent.  For example, some functions use <optional>
> >> tags for optional parameters, others use square brackets, and some use
> >> VARIADIC to indicate variadic parameters, others use
> >> ellipses (sometimes in <optional> tags or brackets).
> >
> > That seems almost inevitably the outcome of many people having to manually
> > infer the recommended semantics, for writing something boring but 
> > nontrivial,
> > from a 30k line file.
>
> As Corey mentioned elsethread, having a markup style guide (maybe a
> comment at the top of the file?) would be nice.
>
> >> > And the lack of argument names in the pg_proc entries is occasionally 
> >> > fairly
> >> > annoying, because a \df+ doesn't provide enough information to use 
> >> > functions.
> >>
> >> I was also annoyed by this the other day (specifically wrt. the boolean
> >> arguments to pg_ls_dir),
> >
> > My bane is regexp_match et al, I have given up on remembering the argument
> > order.
>
> There's a thread elsewhere about those specifically, but I can't be
> bothered to find the link right now.
>
> >> and started whipping up a Perl script to parse func.sgml and generate
> >> missing proargnames values for pg_proc.dat, which is how I discovered the
> >> above.
> >
> > Nice.
> >
> >> The script currently has a pile of hacky regexes to cope with that,
> >> so I'd be happy to submit a doc patch to turn it into actual markup to get
> >> rid of that, if people think that's a worthwhile use of time and won't
> >> clash with any other plans for the documentation.
> >
> > I guess it's a bit hard to say without knowing how voluminous the changes
> > would be. If we end up rewriting the whole file the tradeoff is less clear
> > than if it's a dozen inconsistent entries.
>
It turned out to not be that many that used [] for optional parameters;
see the attached patch.
>

hi.
I manually checked the html output. It looks good to me.




Re: fix tablespace handling in pg_combinebackup

2024-04-18 Thread Robert Haas
On Wed, Apr 17, 2024 at 5:50 PM Andres Freund  wrote:
> > +If there are tablespaces present in the backup, include tablespace_map as
> > +a keyword parameter whose value is a hash. When tar_program is used, the
> > +hash keys are tablespace OIDs; otherwise, they are the tablespace pathnames
> > +used in the backup. In either case, the values are the tablespace pathnames
> > +that should be used for the target cluster.
>
> Where would one get these oids?

You pretty much have to pick them out of the tar file names. It sucks,
but it's not this patch's fault. That's just how pg_basebackup works.
If you do a directory format backup, you can use -T to relocate
tablespaces on the fly, using the pathnames from the origin server.
That's a weird convention, and we probably should have based it on the
tablespace names and not exposed the server pathnames to the client at
all, but we didn't. But it's still better than what happens when you
do a tar-format backup. In that case you just get a bunch of $OID.tar
files. No trace of the server pathnames remains, and the only way you
could learn the tablespace names is if you rooted through whatever
file contains the contents of the pg_tablespace system catalog. So
you've just got a bunch of OID-named things and it's all up to you to
figure out which one is which and what to put in the tablespace_map
file. I'd call this terrible UI design, but I think it's closer to
absence of UI design.

I wonder if we (as a project) would consider a patch that redesigned
this whole mechanism. Produce ${TABLESPACE_NAME}.tar in tar-format,
instead of ${OID}.tar. In directory-format, relocate via
-T${TABLESPACE_NAME}=${DIR} instead of -T${SERVERDIR}=${DIR}. That
would be a significant compatibility break, and you'd somehow need to
solve the problem of what to put in the tablespace_map file, which
requires OIDs. But it seems like if you could finesse that issue in
some elegant way, the result would just be a heck of a lot more usable
than what we have today.

> Could some of this be simplified by using allow_in_place_tablespaces instead?
> Looks like it'd simplify at least the extended test somewhat?

I don't think we can afford to assume that allow_in_place_tablespaces
doesn't change the behavior. I said (at least off-list) when that
feature was introduced that there was no way it was going to remain an
isolated development hack, and I think that's proved to be 100%
correct. We keep needing code to support it in more places, and I
expect that to continue. Probably we're going to need to start testing
everything both ways, which I think was a pretty predictable result of
introducing it in the first place.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Solaris tar issues, or other reason why margay fails 010_pg_basebackup?

2024-04-18 Thread Marcel Hofstetter



Thank you, Tom.

SKIP_READLINE_TESTS works. margay is now green again.

Best regards,
Marcel


On 17.04.2024 at 21:12, Tom Lane wrote:

Thomas Munro  writes:

This test suite is passing on pollock because it doesn't have IO::Pty
installed.  Could you try uninstalling that perl package for now, so
we can see what breaks next?


If that's inconvenient for some reason, you could also skip the
tab-completion test by setting SKIP_READLINE_TESTS in the
animal's build_env options.

regards, tom lane






Re: Trigger violates foreign key constraint

2024-04-18 Thread Aleksander Alekseev
Hi,

> Laurenz Albe  writes:
> > Patch v3 is attached.
>
> I agree with documenting this hazard, but I think it'd be better
> to do so in the "Triggers" chapter.  There is no hazard unless
> you are writing user-defined triggers, which is surely far fewer
> people than use foreign keys.  So I suggest something like the
> attached.

I don't mind changing the chapter, but I prefer the wording chosen in
v3. The explanation in v4 is somewhat hard to follow IMO.

-- 
Best regards,
Aleksander Alekseev




Re: plenty code is confused about function level static

2024-04-18 Thread Ranier Vilela
 On 18/04/2024 00:39, Andres Freund wrote:

>We have a fair amount of code that uses non-constant function level static
>variables for read-only data. Which makes little sense - it prevents the
>compiler from understanding

>a) that the data is read only and can thus be put into a segment that's
>shared between all invocations of the program
>b) the data will be the same on every invocation, and thus from optimizing
>based on that.

>The most common example of this is that all our binaries use
>static struct option long_options[] = { ... };
>which prevents long_options from being put into read-only memory.

+1 static const allows the compiler to make additional optimizations.
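
For illustration, the long_options case in a minimal standalone form (a
sketch, not an actual Postgres source change) — the qualifier is free at
the call site, since getopt_long() already takes a const pointer:

#include <getopt.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
	/* "static const" lets the table live in read-only .rodata, shared
	 * between processes; plain "static" leaves it in writable .data. */
	static const struct option long_options[] = {
		{"verbose", no_argument, NULL, 'v'},
		{"output", required_argument, NULL, 'o'},
		{NULL, 0, NULL, 0}
	};
	int			c;

	while ((c = getopt_long(argc, argv, "vo:", long_options, NULL)) != -1)
		printf("got option: %c\n", c);
	return 0;
}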

>There are lots of places that could benefit from adding 'static
>const'.

I found a few more places.

Patch 004

The opposite would also help: adding static.
In these places, I believe it is safe to add static,
allowing the compiler to definitively place the data in read-only memory.

Patch 005

best regards,

Ranier Vilela


0004-static-const-convert-plpython.patch
Description: Binary data


0005-const-convert-static-const.patch
Description: Binary data

