Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-17 Thread David Rowley
On Tue, 18 Apr 2023 at 09:21, Melanie Plageman
 wrote:
> Are we still thinking that reordering the VACUUM (and ANALYZE) options
> makes sense? And, if so, should it be alphabetical within parameter
> category? That is, all actual parameters (e.g. FULL and FREEZE) are
> alphabetically organized first, followed by all parameter types (e.g.
> boolean and size) alphabetically listed?

I've opened a thread for that [1].

David

[1] https://postgr.es/m/CAApHDvo1eWbt5PVpk0G=ycbbnglu7karp6dcbhpnbfabjyg...@mail.gmail.com




Should we put command options in alphabetical order in the doc?

2023-04-17 Thread David Rowley
Over on [1], Peter mentions that we might want to consider putting the
VACUUM options into some order that's better than the apparent random
order that they're currently in.

VACUUM is certainly one command that has grown a fairly good number of
options over the years, and it appears we've not given much
consideration to what order to list those in the documentation.

It's not just VACUUM that has this issue.  I see 6 commands using the
following text:

$ git grep "option can be one of"
src/sgml/ref/analyze.sgml: ...
src/sgml/ref/cluster.sgml: ...
src/sgml/ref/copy.sgml: ...
src/sgml/ref/explain.sgml: ...
src/sgml/ref/reindex.sgml: ...
src/sgml/ref/vacuum.sgml: ...

(maybe there's more we should consider adjusting?)

Likely if we do opt to put these options in a more well-defined order,
we should apply that to at least the 6 commands listed above.

For the case of reindex.sgml, I do see that the existing parameter
order lists INDEX | TABLE | SCHEMA | DATABASE | SYSTEM first, which is
the target of the reindex. I wondered if that was worth keeping. I'm
just thinking that since all of these are under the "Parameters"
heading, we should class them all as equals and just make the
order alphabetical. I feel that if we don't do that, the order in which
to add any new parameters is just not going to be obvious, and we'll
end up with things getting out of order again quite quickly.

I've attached a patch which makes the changes as I propose them.

David

[1] https://postgr.es/m/16845cb1-b228-e157-f293-5892bced9...@enterprisedb.com


alphabetical_order_for_parameter_names.patch
Description: Binary data


Re: Fix documentation for max_wal_size and min_wal_size

2023-04-17 Thread Michael Paquier
On Mon, Apr 17, 2023 at 07:57:58PM -0700, sirisha chamarthi wrote:
> On Fri, Apr 14, 2023 at 1:01 AM Kyotaro Horiguchi 
> wrote:
>> So, I personally think it should be written like this: "The default
>> size is 80MB. However, if you have changed the WAL segment size from
>> the default of 16MB, it will be five times the segment size.", but I'm
>> not sure what the others think about this..

Yes, I was under the impression that this should mention 16MB, but
I'd also add a note about initdb when a non-default value is specified
for the segment size.
--
Michael


signature.asc
Description: PGP signature


Re: A Question about InvokeObjectPostAlterHook

2023-04-17 Thread Michael Paquier
On Tue, Apr 18, 2023 at 09:51:30AM +0800,  Legs Mansion wrote:
> Recently, I ran into a problem. InvokeObjectPostAlterHook was
> implemented for sepgsql; sepgsql uses it to determine whether to
> check permissions during certain operations.  But
> InvokeObjectPostAlterHook doesn't handle all of ALTER's
> behavior; at least ALTER TABLE is not covered, e.g. ALTER
> TABLE ... ENABLE/DISABLE ROW LEVEL SECURITY, ALTER TABLE ... DISABLE
> TRIGGER, GRANT and REVOKE, and so on.
> Is InvokeObjectPostAlterHook not fully covered? Is it a bug?

Yes, tablecmds.c has some holes, and these get filled when there is an
ask for it, as far as I recall.  In some cases these locations can be
tricky to add, so they usually require an independent analysis.  For
example, EnableDisableTrigger() has one OAT call for the trigger itself,
but not for the relation changed in tablecmds.c, as you say; in any
case, we should be careful with cross-dependencies.

Note that 90efa2f has made the tests for OATs much easier, and there
is no need to rely only on sepgsql for that.  (Even if test_oat_hooks
has been having some stability issues with namespace lookups because
of the position of the namespace search hook.)

Also, the additions of InvokeObjectPostAlterHook() are historically
conservative because they would create behavior changes in stable
branches, meaning no backpatch.  See a995b37 or 7b56584 as past
examples.

Note that the development of PostgreSQL 16 has just finished, so now
may not be the best moment to add these extra OAT calls, but they
could certainly be added in 17~ at the beginning of July, once the
next development cycle begins.

Attached is what I think would be required to add OATs for RLS,
triggers and rules, for example.  There are many more of these at a
quick glance; still, that's one step toward providing more checks.
Perhaps you'd like to expand this patch with more ALTER TABLE
subcommands covered?
--
Michael
From bb693411864399f41d93819adf8caa0f073a9a2f Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 18 Apr 2023 13:32:42 +0900
Subject: [PATCH] Add OAT calls for more flavors of ALTER TABLE

---
 src/backend/commands/tablecmds.c  |  12 ++
 src/test/modules/test_oat_hooks/Makefile  |   2 +-
 .../test_oat_hooks/expected/alter_table.out   | 165 ++
 src/test/modules/test_oat_hooks/meson.build   |   1 +
 .../test_oat_hooks/sql/alter_table.sql|  48 +
 5 files changed, 227 insertions(+), 1 deletion(-)
 create mode 100644 src/test/modules/test_oat_hooks/expected/alter_table.out
 create mode 100644 src/test/modules/test_oat_hooks/sql/alter_table.sql

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 343fe61115..5b6f8f0376 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14843,6 +14843,9 @@ ATExecEnableDisableTrigger(Relation rel, const char *trigname,
 	EnableDisableTrigger(rel, trigname, InvalidOid,
 		 fires_when, skip_system, recurse,
 		 lockmode);
+
+	InvokeObjectPostAlterHook(RelationRelationId,
+			  RelationGetRelid(rel), 0);
 }
 
 /*
@@ -14855,6 +14858,9 @@ ATExecEnableDisableRule(Relation rel, const char *rulename,
 		char fires_when, LOCKMODE lockmode)
 {
 	EnableDisableRule(rel, rulename, fires_when);
+
+	InvokeObjectPostAlterHook(RelationRelationId,
+			  RelationGetRelid(rel), 0);
 }
 
 /*
@@ -16134,6 +16140,9 @@ ATExecSetRowSecurity(Relation rel, bool rls)
 	((Form_pg_class) GETSTRUCT(tuple))->relrowsecurity = rls;
 	CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
 
+	InvokeObjectPostAlterHook(RelationRelationId,
+			  RelationGetRelid(rel), 0);
+
 	table_close(pg_class, RowExclusiveLock);
 	heap_freetuple(tuple);
 }
@@ -16160,6 +16169,9 @@ ATExecForceNoForceRowSecurity(Relation rel, bool force_rls)
 	((Form_pg_class) GETSTRUCT(tuple))->relforcerowsecurity = force_rls;
 	CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
 
+	InvokeObjectPostAlterHook(RelationRelationId,
+			  RelationGetRelid(rel), 0);
+
 	table_close(pg_class, RowExclusiveLock);
 	heap_freetuple(tuple);
 }
diff --git a/src/test/modules/test_oat_hooks/Makefile b/src/test/modules/test_oat_hooks/Makefile
index 2b5d2687f8..dcaf3a7727 100644
--- a/src/test/modules/test_oat_hooks/Makefile
+++ b/src/test/modules/test_oat_hooks/Makefile
@@ -6,7 +6,7 @@ OBJS = \
 	test_oat_hooks.o
 PGFILEDESC = "test_oat_hooks - example use of object access hooks"
 
-REGRESS = test_oat_hooks
+REGRESS = test_oat_hooks alter_table
 
 # disable installcheck for now
 NO_INSTALLCHECK = 1
diff --git a/src/test/modules/test_oat_hooks/expected/alter_table.out b/src/test/modules/test_oat_hooks/expected/alter_table.out
new file mode 100644
index 0000000000..eac5071977
--- /dev/null
+++ b/src/test/modules/test_oat_hooks/expected/alter_table.out
@@ -0,0 +1,165 @@
+--
+-- OAT checks for ALTER TABLE
+--
+-- This test script fails if debug_discard_caches is enabled, because cache
+-- flushes cause extra calls of the OAT 

Re: Note new NULLS NOT DISTINCT on unique index tutorial page

2023-04-17 Thread David Rowley
On Tue, 18 Apr 2023 at 05:01, Corey Huinker  wrote:
> I'm ok with the wording as-is, but perhaps we can phrase it as "distinct" vs 
> "not equal", thus leaning into the syntax a bit:
>
> By default, null values in a unique column are considered distinct, allowing 
> multiple nulls in the column.
>
>
> or maybe
>
> By default, null values in a unique column are considered
> DISTINCT, allowing multiple nulls in the column.

I acknowledge your input, but I didn't think either of these was an
improvement over what David suggested.  I understand that many people
will know that "SELECT DISTINCT" and "WHERE x IS NOT DISTINCT FROM y"
treat NULLs as equal, but I don't think we should expect the reader
here to know that's what we're talking about.  In any case, we're
talking about existing wording here, not something David is adding.

David




Do we really need two xl_heap_lock records?

2023-04-17 Thread Peter Geoghegan
Attached patch removes xl_heap_lock_updated/XLOG_HEAP2_LOCK_UPDATED,
which allows us to throw out about a hundred lines of duplicative
(though hairy) code, net total. The patch also reclaims some Heap2
info bit space, which seems quite useful.

The patch teaches heap_lock_updated_tuple_rec() to make use of the
generic xl_heap_lock/XLOG_HEAP_LOCK record type, rather than using the
now-removed custom record type. My guess is that commit 0ac5ad5134
(where xl_heap_lock_updated originated) simply missed the opportunity
to consolidate the two records into one, perhaps because the patch
evolved a lot during development. Note that xl_heap_lock is already
used by several heapam routines besides heap_lock_tuple (e.g.
heap_update uses it).

Testing with wal_consistency_checking did not demonstrate any problems
with the patch. It's not completely trivial, though. It seems that
there are steps in the REDO routines that shouldn't be performed in
the locked-updated-tuple case -- I had to invent XLH_LOCK_UPDATED to
deal with the issue. There may be a better approach there, but I
haven't thought about it in enough detail to feel confident either
way.

-- 
Peter Geoghegan


v1-0001-Remove-xl_heap_lock_updated-WAL-record.patch
Description: Binary data


Re: Note new NULLS NOT DISTINCT on unique index tutorial page

2023-04-17 Thread David Rowley
On Thu, 13 Apr 2023 at 02:40, David Gilman  wrote:
> The SQL Language part of the docs has a brief page on unique indexes.
> It doesn't mention the new NULLS NOT DISTINCT functionality on unique
> indexes and this is a good place to mention it, as it already warns
> the user about the old/default behavior.

I think we should do this and apply it to v15 too.

It seems like a good idea to include the [NULLS [NOT] DISTINCT] in the
syntax synopsis too. Otherwise, the reader of that page is just left
guessing where they'll put NULLS NOT DISTINCT to get the behaviour
you've added the text for.

I've attached an updated patch with that plus 2 very small wording
tweaks to your proposed text.

David


doc_nulls_not_distinct_v2.patch
Description: Binary data


Re: Fix documentation for max_wal_size and min_wal_size

2023-04-17 Thread sirisha chamarthi
Hi,

On Fri, Apr 14, 2023 at 1:01 AM Kyotaro Horiguchi 
wrote:

> At Thu, 13 Apr 2023 12:01:04 -0700, sirisha chamarthi <
> sirichamarth...@gmail.com> wrote in
> > The documentation [1] says max_wal_size and min_wal_size defaults are 1GB
> > and 80 MB respectively. However, these are configured based on the
> > wal_segment_size and the documentation is not clear about it. Attached a
> > patch to fix the documentation.
> >
> > [1] https://www.postgresql.org/docs/devel/runtime-config-wal.html
>
> Good catch! Now wal_segment_size is easily changed.
>
> -The default is 1 GB.
> +The default value is configured to maximum of 64 times the wal_segment_size or 1 GB.
> -The default is 80 MB.
> +The default value is configured to maximum of 5 times the wal_segment_size or 80 MB.
>
> However, I believe that most users don't change the WAL segment size,
> so the primary information is that the default sizes are 1GB and 80MB.
>
> So, I personally think it should be written like this: "The default
> size is 80MB. However, if you have changed the WAL segment size from
> the default of 16MB, it will be five times the segment size.", but I'm
> not sure what the others think about this..


This looks good to me.

Thanks,
Sirisha


Re: ecpg -C ORACLE breaks data

2023-04-17 Thread Michael Paquier
On Mon, Apr 17, 2023 at 05:00:59PM +0900, Michael Paquier wrote:
> This thinko has been introduced by 3b7ab43, so this needs to go down
> to v11.  I'll see to that.

So done.  mylodon is feeling unhappy about that, because this has a
C99 declaration:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2023-04-18%2002%3A22%3A04
char_array.pgc:73:8: error: variable declaration in for loop is a C99-specific feature [-Werror,-Wc99-extensions]
  for (int i = 0 ; i < sqlda->sqld ; i++)

I'll go fix that in a minute, across all the branches for
consistency.
--
Michael


signature.asc
Description: PGP signature


Re: Scans are offloaded to SeqScan instead of CustomScan when there are multiple relations in the same query

2023-04-17 Thread Tom Lane
Amin  writes:
> I made sure EXPLAIN returns CustomScan for all scans in the query. But
> still, ExecCustomScan is only called once while the rest of the functions
> are called for each scan separately. Is this expected behavior? How to work
> around this?

[shrug...] There's some bug in your code, which you've not shown us
(not that I'm volunteering to review it in any detail).

regards, tom lane




Re: Fix typos and inconsistencies for v16

2023-04-17 Thread Tom Lane
David Rowley  writes:
> On Tue, 18 Apr 2023 at 06:00, Alexander Lakhin  wrote:
>> Also, maybe OID_MAX should be removed from src/include/postgres_ext.h as 
>> it's unused since eb8312a22.

> I didn't touch this. It seems like it could be useful for extensions
> and client apps even if it's not used in core.

Agreed, bad idea.  For better or worse, that's part of our client API now.

>> Beside that, this simple script:
>> for w in $(cat src/tools/pgindent/typedefs.list); do grep -q -P "\b$w\b" -r * --exclude typedefs.list || echo "$w"; done
>> detects 58 identifiers that don't exist in the source tree anymore (see typedefs.lost attached).
>> Maybe they should be removed from typedefs.list too.

> I didn't touch this either.  typedefs.list normally gets some work
> done during the pgindent run, which is likely going to happen around
> May or June.

Yeah, it will get refreshed from the buildfarm output [1] pretty soon.
A quick check says that as of today, that refresh would add 81 names
and remove 94.  (Seems like a remarkably high number of removals,
but I didn't dig further than counting the diff output.)

regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/typedefs.pl?show_list




Re: Fix typos and inconsistencies for v16

2023-04-17 Thread David Rowley
On Tue, 18 Apr 2023 at 10:10, Justin Pryzby  wrote:
> > -  * USER SET values are appliciable only for PGC_USERSET parameters. We
> > +  * USER SET values are applicable only for PGC_USERSET parameters. We
> >    * use InvalidOid as role in order to evade possible privileges of the
>
> and s/evade/avoid/

I didn't touch this. You'll need to provide more justification for why
you think it's more correct than what's there.  It might not be worth
too much discussion, however.

> Attached some others that I found.

Pushed the rest.  Thanks

David




A Question about InvokeObjectPostAlterHook

2023-04-17 Thread Legs Mansion
Recently, I ran into a problem. InvokeObjectPostAlterHook was implemented for
sepgsql; sepgsql uses it to determine whether to check permissions during
certain operations.
But InvokeObjectPostAlterHook doesn't handle all of ALTER's behavior; at
least ALTER TABLE is not covered, e.g. ALTER TABLE ... ENABLE/DISABLE ROW
LEVEL SECURITY, ALTER TABLE ... DISABLE TRIGGER, GRANT and REVOKE, and so on.
Is InvokeObjectPostAlterHook not fully covered? Is it a bug?




Sent from my iPhone

RE: pg_upgrade and logical replication

2023-04-17 Thread Hayato Kuroda (Fujitsu)
Dear Julien,

I found a cfbot failure on macOS [1]. According to the log,
"SELECT count(*) FROM t2" was executed before synchronization was done.

```
[09:24:21.018](0.132s) not ok 18 - Table t2 should now have 3 rows on the new subscriber
```

With the patch present, wait_for_catchup() is executed after REFRESH, but
it may not be sufficient because it does not check pg_subscription_rel.
wait_for_subscription_sync() seems better for the purpose.


[1]: https://cirrus-ci.com/task/6563827802701824

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: Fix typos and inconsistencies for v16

2023-04-17 Thread David Rowley
On Tue, 18 Apr 2023 at 06:00, Alexander Lakhin  wrote:
> Please consider fixing the following unique words/identifiers introduced in 
> v16:

Thanks, I've pushed all of these apart from the following 2.

> 45. tar_set_error -- remove (obsolete since ebfb814f7)
> 46. test_tranche_name -- remove (not used, see 006b69fd9)

These didn't quite fit in with the "typo fixes" category of the
commit, so I left them off the commit I just pushed.

> Also, maybe OID_MAX should be removed from src/include/postgres_ext.h as it's 
> unused since eb8312a22.

I didn't touch this. It seems like it could be useful for extensions
and client apps even if it's not used in core.

> Beside that, this simple script:
> for w in $(cat src/tools/pgindent/typedefs.list); do grep -q -P "\b$w\b" -r * --exclude typedefs.list || echo "$w"; done
> detects 58 identifiers that don't exist in the source tree anymore (see typedefs.lost attached).
> Maybe they should be removed from typedefs.list too.

I didn't touch this either.  typedefs.list normally gets some work
done during the pgindent run, which is likely going to happen around
May or June.  Maybe you can check back after that's done and make sure
all these unused ones were removed. I'm not sure if the process that's
done for that only finds new ones that are now required or if it
completely generates a new list.

David




Re: Scans are offloaded to SeqScan instead of CustomScan when there are multiple relations in the same query

2023-04-17 Thread Amin
Hi Tom,

I made sure EXPLAIN returns CustomScan for all scans in the query. But
still, ExecCustomScan is only called once while the rest of the functions
are called for each scan separately. Is this expected behavior? How to work
around this?

Thank you!

On Mon, Apr 17, 2023 at 3:45 PM Tom Lane  wrote:

> Amin  writes:
> > To simplify: Can CustomScan scan multiple relations in the same query, or
> > will it always be assigned to one or zero relations?
>
> There's barely any code in the core planner that is specific to custom
> scans.  Almost certainly this misbehavior is the fault of your
> custom-path-creation code.  Maybe you're labeling the paths with the
> wrong parent relation, or forgetting to submit them to add_path,
> or assigning them costs that are high enough to get them rejected?
>
> regards, tom lane
>


Re: check_strxfrm_bug()

2023-04-17 Thread Michael Paquier
On Mon, Apr 17, 2023 at 03:40:14PM -0700, Peter Geoghegan wrote:
> On Mon, Apr 17, 2023 at 2:48 PM Tom Lane  wrote:
>> +1.  I wonder if we should go further and get rid of TRUST_STRXFRM
>> and the not-so-trivial amount of code around it (pg_strxfrm_enabled
>> etc).  Carrying that indefinitely in the probably-vain hope that
>> the libraries will become trustworthy seems rather pointless.
>> Besides, if such a miracle does occur, we can dig the code out
>> of our git history.
> 
> +1 for getting rid of TRUST_STRXFRM.
> 
> ICU-based collations (which aren't affected by TRUST_STRXFRM) are
> becoming the de facto standard (possibly even the de jure standard).
> So even if we thought that the situation with strxfrm() had improved,
> we'd still have little motivation to do anything about it.

Makes sense to do some cleanup now as this is new in the tree.
Perhaps somebody from the RMT would like to comment?

FYI, Jeff has also posted patches to replace this compile-time flag with a GUC:
https://www.postgresql.org/message-id/6ec4ad7f93f255dbb885da0a664d9c77ed4872c4.ca...@j-davis.com
--
Michael


signature.asc
Description: PGP signature


Re: Scans are offloaded to SeqScan instead of CustomScan when there are multiple relations in the same query

2023-04-17 Thread Tom Lane
Amin  writes:
> To simplify: Can CustomScan scan multiple relations in the same query, or
> will it always be assigned to one or zero relations?

There's barely any code in the core planner that is specific to custom
scans.  Almost certainly this misbehavior is the fault of your
custom-path-creation code.  Maybe you're labeling the paths with the
wrong parent relation, or forgetting to submit them to add_path,
or assigning them costs that are high enough to get them rejected?

regards, tom lane




Re: check_strxfrm_bug()

2023-04-17 Thread Peter Geoghegan
On Mon, Apr 17, 2023 at 2:48 PM Tom Lane  wrote:
> +1.  I wonder if we should go further and get rid of TRUST_STRXFRM
> and the not-so-trivial amount of code around it (pg_strxfrm_enabled
> etc).  Carrying that indefinitely in the probably-vain hope that
> the libraries will become trustworthy seems rather pointless.
> Besides, if such a miracle does occur, we can dig the code out
> of our git history.

+1 for getting rid of TRUST_STRXFRM.

ICU-based collations (which aren't affected by TRUST_STRXFRM) are
becoming the de facto standard (possibly even the de jure standard).
So even if we thought that the situation with strxfrm() had improved,
we'd still have little motivation to do anything about it.

-- 
Peter Geoghegan




Re: Scans are offloaded to SeqScan instead of CustomScan when there are multiple relations in the same query

2023-04-17 Thread Amin
To simplify: Can CustomScan scan multiple relations in the same query, or
will it always be assigned to one or zero relations?

On Fri, Apr 14, 2023 at 4:33 PM Amin  wrote:

> Hi there,
>
> In my implementation of CustomScan, when I have a query that scans
> multiple tables (select c from t1,t2,t3), the planner always picks one
> table to be scanned by CustomScan and offloads the rest to SeqScan. I tried
> assigning a cost of 0 to the CustomScan path, but still not working.
> BeginCustomScan gets executed, ExecCustomScan is skipped, and then
> EndCustomScan is executed for all the tables that are offloaded to Seq
> Scan. EXPLAIN shows that always only one table is picked to be executed by
> CustomScan. Any idea what I might be doing wrong? Like a value in a struct
> I might be setting incorrectly?
>
> Thanks!
>


Re: Fix typos and inconsistencies for v16

2023-04-17 Thread Justin Pryzby
On Mon, Apr 17, 2023 at 09:00:00PM +0300, Alexander Lakhin wrote:
> Hello hackers,
> 
> Please consider fixing the following unique words/identifiers introduced in 
> v16:

Well done.

Note that your patches are overlapping:

  3 --- a/src/backend/utils/misc/guc.c
  2 --- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
  2 --- a/src/test/ldap/LdapServer.pm
  2 --- a/src/interfaces/libpq/t/004_load_balance_dns.pl
  2 --- a/src/backend/utils/adt/acl.c

It'd make sense if the changes to each file were isolated to a single
patch (especially 004_load and acl.c).

> -  * USER SET values are appliciable only for PGC_USERSET parameters. We
> +  * USER SET values are applicable only for PGC_USERSET parameters. We
>    * use InvalidOid as role in order to evade possible privileges of the

and s/evade/avoid/

> +++ b/src/bin/pg_dump/pg_dumpall.c

You missed "boostrap" :)

I independently found 11 of the same typos you did:

> 1. addresess -> addresses
> 3. appeneded -> appended
> 4. appliciable -> applicable
> 8. containsthe ->  contains the
> 15. execpt -> except
> 19. happend -> happened
> 27. optionn -> option
> 30. permissons -> permissions
> 37. remaing -> remaining
> 42. sentinal -> sentinel
> 47. varilables -> variables

But hadn't yet convinced myself to start the process of defending each
one of the fixes.  Attached some others that I found.

-- 
Justin
diff --git a/contrib/test_decoding/expected/ddl.out b/contrib/test_decoding/expected/ddl.out
index 9a28b5ddc5a..d55fb3a667f 100644
--- a/contrib/test_decoding/expected/ddl.out
+++ b/contrib/test_decoding/expected/ddl.out
@@ -428,7 +428,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
 -- test whether a known, but not yet logged toplevel xact, followed by a
 -- subxact commit is handled correctly
 BEGIN;
-SELECT pg_current_xact_id() != '0'; -- so no fixed xid apears in the outfile
+SELECT pg_current_xact_id() != '0'; -- so no fixed xid appears in the outfile
  ?column? 
 --
  t
diff --git a/contrib/test_decoding/sql/ddl.sql b/contrib/test_decoding/sql/ddl.sql
index 4f76bed72c1..57285a828c7 100644
--- a/contrib/test_decoding/sql/ddl.sql
+++ b/contrib/test_decoding/sql/ddl.sql
@@ -236,7 +236,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
 -- test whether a known, but not yet logged toplevel xact, followed by a
 -- subxact commit is handled correctly
 BEGIN;
-SELECT pg_current_xact_id() != '0'; -- so no fixed xid apears in the outfile
+SELECT pg_current_xact_id() != '0'; -- so no fixed xid appears in the outfile
 SAVEPOINT a;
 INSERT INTO tr_sub(path) VALUES ('4-top-1-#1');
 RELEASE SAVEPOINT a;
diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
index b5e0392ad27..b6c37ccef26 100644
--- a/doc/src/sgml/user-manag.sgml
+++ b/doc/src/sgml/user-manag.sgml
@@ -346,7 +346,7 @@ ALTER ROLE myname SET enable_indexscan TO off;
role using SET ROLE. However, since any user who has
ADMIN OPTION on a role can grant membership in that
role to any other user, the CREATEROLE user can gain
-   access to the created role by simplying granting that role back to
+   access to the created role by simply granting that role back to
themselves with the INHERIT and/or SET
options. Thus, the fact that privileges are not inherited by default nor
is SET ROLE granted by default is a safeguard against
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 244957a2483..9bdc70c702e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -4051,7 +4051,7 @@ recurse_push_qual(Node *setOp, Query *topquery,
  *
  * extra_used_attrs can be passed as non-NULL to mark any columns (offset by
  * FirstLowInvalidHeapAttributeNumber) that we should not remove.  This
- * parameter is modifed by the function, so callers must make a copy if they
+ * parameter is modified by the function, so callers must make a copy if they
  * need to use the passed in Bitmapset after calling this function.
  *
  * To avoid affecting column numbering in the targetlist, we don't physically
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index e3824efe9b5..65adf04c4eb 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -436,7 +436,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			 * the number-of-tuples estimate to equal the parent table; if it
 			 * is partial then we have to use the same methods as we would for
 			 * a table, except we can be sure that the index is not larger
-			 * than the table.  We must ignore partitioned indexes here as as
+			 * than the table.  We must ignore partitioned indexes here as

Re: v16dev: TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, PID: 13564

2023-04-17 Thread Tom Lane
Justin Pryzby  writes:
> On Mon, Apr 17, 2023 at 01:50:30PM -0400, Tom Lane wrote:
>> Bingo: bisecting shows the failure started at

> Just curious: what "test" did you use to bisect with ?

The test case I used looked like

start postmaster with -c wal_level=minimal -c max_wal_senders=0
make installcheck-parallel
psql -d regression -c "do 'begin for i in 1..1000 loop execute ''create table lots''||i||'' as select * from onek''; end loop; end';"
pg_dump -Fc -Z0 regression >~/regression.dump
createdb r2
pg_restore -d r2 --single-transaction --no-tablespace ~/regression.dump

Dumping the regression database as-is didn't reproduce it for me,
but after I added a bunch more tables it did reproduce.

(I added the -Z0 bit after some of the bisection test points hit the
interval where somebody had broken pg_dump's compression features.
It didn't seem relevant to the problem so I just disabled that.)

regards, tom lane




Re: Clean up hba.c of code freeing regexps

2023-04-17 Thread Michael Paquier
On Mon, Apr 17, 2023 at 02:21:41PM -0700, Nathan Bossart wrote:
> AFAICT this is committed, but the commitfest entry [0] is still set to
> "needs review."  Can it be closed now?

Yes, done.  Thanks.
--
Michael


signature.asc
Description: PGP signature


Re: v16dev: TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, PID: 13564

2023-04-17 Thread Justin Pryzby
On Mon, Apr 17, 2023 at 01:50:30PM -0400, Tom Lane wrote:
> I wrote:
> > Yeah, I just came to the same conclusion.  One thing I don't understand
> > yet: log_newpage_range is old (it looks like this back to v12), and
> > that Assert is older, so why doesn't this reproduce further back?
> > Maybe the state where all the pages are new didn't happen before?
> 
> Bingo: bisecting shows the failure started at

Just curious: what "test" did you use to bisect with ?




Re: check_strxfrm_bug()

2023-04-17 Thread Tom Lane
Nathan Bossart  writes:
> On Thu, Dec 15, 2022 at 03:22:59PM +1300, Thomas Munro wrote:
>> While studying Jeff's new crop of collation patches I noticed in
>> passing that check_strxfrm_bug() must surely by now be unnecessary.
>> The buffer overrun bugs were fixed a decade ago, and the relevant
>> systems are way out of support.  If you're worried that the bugs might
>> come back, then the test is insufficient: modern versions of both OSes
>> have strxfrm_l(), which we aren't checking.  In any case, we also
>> completely disable this stuff because of bugs and quality problems in
>> every other known implementation, via TRUST_STRXFRM (or rather the
>> lack of it).  So I think it's time to remove that function; please see
>> attached.

> Seems reasonable to me.

+1.  I wonder if we should go further and get rid of TRUST_STRXFRM
and the not-so-trivial amount of code around it (pg_strxfrm_enabled
etc).  Carrying that indefinitely in the probably-vain hope that
the libraries will become trustworthy seems rather pointless.
Besides, if such a miracle does occur, we can dig the code out
of our git history.

regards, tom lane




Re: Direct I/O

2023-04-17 Thread Thomas Munro
On Tue, Apr 18, 2023 at 4:06 AM Tom Lane  wrote:
> Robert Haas  writes:
> > On Sat, Apr 15, 2023 at 2:19 PM Tom Lane  wrote:
> >> I get the impression that we are going to need an actual runtime
> >> test if we want to defend against this.  Not entirely convinced
> >> it's worth the trouble.  Who, other than our deliberately rear-guard
> >> buildfarm animals, is going to be building modern PG with such old
> >> compilers?  (And more especially to the point, on platforms new
> >> enough to have working O_DIRECT?)
>
> > I don't think that I fully understand everything under discussion
> > here, but I would just like to throw in a vote for trying to make
> > failures as comprehensible as we reasonably can.
>
> I'm not hugely concerned about this yet.  I think the reason for
> slipping this into v16 as developer-only code is exactly that we need
> to get a feeling for where the portability dragons live.  When (and
> if) we try to make O_DIRECT mainstream, yes we'd better be sure that
> any known failure cases are reported well.  But we need the data
> about which those are, first.

+1

A couple more things I wanted to note:

* We have no plans to turn this on by default even when the later
asynchronous machinery is proposed, and direct I/O starts to make more
economic sense (think: your stream of small reads and writes will be
converted to larger preadv/pwritev or moral equivalent and performed
ahead of time in the background).  Reasons: (1) There will always be a
few file systems that refuse O_DIRECT (Linux tmpfs is one such, as we
learned in this thread; it fails with EINVAL at open() time), and (2)
without a page cache, you really need to size your shared_buffers
adequately and we can't do that automatically.  It's something you'd
opt into for a dedicated database server along with other carefully
considered settings.  It seems acceptable to me that if you set
io_direct to a non-default setting on an unusual-for-a-database-server
filesystem you might get errors screaming about inability to open
files -- you'll just have to turn it back off again if it doesn't work
for you.

* For the alignment part, C11 has "alignas(x)" in <stdalign.h>, so I
very much doubt that a hypothetical new Deathstation C compiler would
not know how to align stack objects arbitrarily, even though for now
as a C99 program we have to use the non-standard incantations defined
in our c.h.  I assume we'll eventually switch to that.  In the
meantime, if someone manages to build PostgreSQL on a hypothetical C
compiler that our c.h doesn't recognise, we just won't let you turn
the io_direct GUC on (because we set PG_O_DIRECT to 0 if we don't have
an alignment macro, see commit faeedbce's message for rationale).  If
the alignment trick from c.h appears to be available but is actually
broken (GCC 4.2.1), then those assertions I added into smgrread() et
al will fail as Tom showed (yay! they did their job), or in a
non-assert build you'll probably get EINVAL when you try to read or
write from your badly aligned buffers depending on how picky your OS
is, but that's just an old bug in a defunct compiler that we have by
now written more about than they ever did in their bug tracker.

* I guess it's unlikely at this point that POSIX will ever standardise
O_DIRECT if they didn't already in the 90s (I didn't find any
discussion of it in their issue tracker).  There is really only one OS
on our target list that truly can't do direct I/O at all: OpenBSD.  It
seems a reasonable bet that if they or a hypothetical totally new
Unixoid system ever implemented it they'd spell it the same IRIX way
for practical reasons, but if not we just won't use it until someone
writes a patch *shrug*.  There is also one system that's been rocking
direct I/O since the 90s for Oracle etc, but PostgreSQL still doesn't
know how to turn it on: Solaris has a directio() system call.  I
posted a (trivial) patch for that once in the thread where I added
Apple F_NOCACHE, but there is probably nobody on this list who can
test it successfully (as Tom discovered, wrasse's host is not
configured right for it, you'd need an admin/root to help set up a UFS
file system, or perhaps modern (closed) ZFS can do it but that system
is old and unpatched), and I have no desire to commit a "blind" patch
for an untested niche setup; I really only considered it because I
realised I was so close to covering the complete set of OSes.  That's
cool, we just won't let you turn the GUC on if we don't know how and
the error message is clear about that if you try.




Re: Temporary tables versus wraparound... again

2023-04-17 Thread Greg Stark
On Wed, 5 Apr 2023 at 13:42, Andres Freund  wrote:

>
> Somehow it doesn't feel right to use vac_update_relstats() in
> heapam_handler.c.
>
> I also don't like that your patch references
> heapam_relation_nontransactional_truncate in AddNewRelationTuple() - we
> shouldn't add more comments piercing tableam than necessary.

I'm really puzzled because this does look like it was in the last
patch on the mailing list archive. But it's definitely not the code I
have here. I guess I did some cleanup that I never posted, so sorry.

I've attached patches using GetOldestNonRemovableTransactionId() and it
seems to have fixed the race condition here. At least I can't
reproduce it any more.



> Not if you determine a relation specific xmin, and the relation is not a
> shared relation.
>
> ISTM that the problem here really is that you're relying on RecentXmin, rather
> than computing something more accurate. Why not use
> GetOldestNonRemovableTransactionId(rel) - It's a bit more expensive, but I
> don't think it'll matter compared to the cost of truncating the relation.

I am a bit nervous about the overhead here because if your transaction
touched *any* temporary tables then this gets called for *every*
temporary table with ON COMMIT DELETE. That could be a lot and it's
not obvious to users that having temporary tables will impose an
overhead even if they're not actually using them.

So I went ahead and used GetOldestNonRemovableTransactionId and tried
to do some profiling. But this is on a cassert enabled build with -O0
so it's not serious profiling. I can repeat it on a real build if it
matters. But it's been a long time since I've read gprof output. This
is for -F PreCommit_on_commit_actions so the percentages are as a
percent of just the precommit cleanup:

index  % time    self  children    called       name
                 0.00    0.00    10102/10102       CommitTransaction (1051)
[1]     100.0    0.01   31.47    10102          PreCommit_on_commit_actions [1]
                 0.01   31.43    10100/10100       heap_truncate [2]
                 0.00    0.03  1005050/1005260     lappend_oid [325]
-----------------------------------------------
                 0.01   31.43    10100/10100       PreCommit_on_commit_actions [1]
[2]      99.9    0.01   31.43    10100          heap_truncate [2]
                 0.09   27.30  1005050/1005050     heap_truncate_one_rel [3]
                 0.20    3.57  1005050/6087120     table_open [465]
                 0.01    0.22  1005050/6045137     table_close [48]
                 0.00    0.03  1005050/1017744     lappend [322]
                 0.01    0.00    10100/10100       heap_truncate_check_FKs [425]
-----------------------------------------------
                 0.09   27.30  1005050/1005050     heap_truncate [2]
[3]      87.0    0.09   27.30  1005050          heap_truncate_one_rel [3]
                 0.02   12.23  1005050/1005050     RelationTruncateIndexes [5]
                 0.06   10.08  1005050/1005050     ResetVacStats [7]
                 0.03    4.89  1005050/1005050     table_relation_nontransactional_truncate [12]

I think this is saying that more than half the time is being spent
just checking for indexes. There were no indexes on these temporary
tables. Does not having any indexes cause the relcache treat it as a
cache miss every time?

                 0.06   10.08  1005050/1005050     heap_truncate_one_rel [3]
[7]      32.2    0.06   10.08  1005050          ResetVacStats [7]
                 0.02    3.83  1005050/1005250     SearchSysCacheCopy [16]
                 0.20    3.57  1005050/6087120     table_open [465]
                 0.01    2.02  1005050/1005050     heap_inplace_update [35]
                 0.01    0.22  1005050/6045137     table_close [48]
                 0.00    0.20  1005050/1005150     GetOldestNonRemovableTransactionId [143]
                 0.00    0.01  1005050/1005150     GetOurOldestMultiXactId [421]
                 0.00    0.00  1005050/1008750     ObjectIdGetDatum [816]

I guess this means GetOldestNonRemovableTransactionId is not the main
cost in ResetVacStats though I don't understand why the syscache would
be so slow.

I think there's a facility for calculating the Horizons and then
reusing them for a while but I don't see how to use that here. It
would be appropriate I think.


>
> > Honestly I'm glad I wrote the test because it was hard to know whether
> > my code was doing anything at all without it (and it wasn't in the
> > first cut...) But I don't think there's much value in having it be in
> > the regression suite. We don't generally write tests to ensure that a
> > specific internal implementation behaves in the specific way it was
> > written to.
>
> To me it seems important to test that your change actually does what it
> intends to. Possibly the test needs to be relaxed some, but I do think we want
> tests for the change.
>
> Greetings,
>
> Andres Freund



--
greg
From 8dc1dc9d50e08c63d25a9c4c3e860453fbd9e531 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Thu, 31 Mar 

Re: Clean up hba.c of code freeing regexps

2023-04-17 Thread Nathan Bossart
On Fri, Apr 14, 2023 at 07:32:13AM +0900, Michael Paquier wrote:
> On Thu, Apr 13, 2023 at 11:58:51AM +0200, Alvaro Herrera wrote:
>> I agree with the downthread votes to clean this up now rather than
>> waiting.  Also, you're adding exactly zero lines of new code, so I don't
>> think feature freeze affects the decision.
> 
> Thanks, done that.
> 
> The commit log mentions Lab.c instead of hba.c.  I had that fixed
> locally, but it seems like I've messed up things a bit..  Sorry about
> that.

AFAICT this is committed, but the commitfest entry [0] is still set to
"needs review."  Can it be closed now?

[0] https://commitfest.postgresql.org/43/4277/

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-17 Thread Melanie Plageman
On Sat, Apr 15, 2023 at 12:59:52PM +1200, David Rowley wrote:
> On Fri, 14 Apr 2023 at 19:20, Peter Eisentraut
>  wrote:
> >
> > I came across these new options and had a little bit of trouble figuring
> > them out from the documentation.  Maybe this could be polished a bit.
> >
> > vacuumdb --help says
> >
> >  --buffer-usage-limit=BUFSIZE
> >
> > I can guess what a "SIZE" might be, but is "BUFSIZE" different from a
> > "SIZE"?  Maybe simplify here.
> >
> > On the vacuumdb man page, the placeholder is
> >
> >  buffer_usage_limit
> >
> > which is yet another way of phrasing it.  Maybe also use "size" here?
> >
> > The VACUUM man page says
> >
> >  BUFFER_USAGE_LIMIT [ string ]
> >
> > which had me really confused.  The detailed description later doesn't
> > give any further explanation of possible values, except that
> > 0 is apparently a possible value, which in my mind is
> > not a string.  Then there is a link to guc-vacuum-buffer-usage-limit,
> > which lifts the mystery that this is really just an integer setting with
> > possible memory-size units, but it was really hard to figure that out
> > from the start!
> >
> > Moreover, on the VACUUM man page, right below BUFFER_USAGE_LIMIT, it
> > explains the different kinds of accepted values, and "string" wasn't
> > added there.  Maybe also change this to "size" here and add an
> > explanation there what kinds of sizes are possible.
> >
> > Finally, the locations of the new options in the various documentation
> > places seems a bit random.  The vacuumdb --help output and the man page
> > appear to be mostly alphabetical, so --buffer-usage-limit should be
> > after -a/--all.  (Also note that right now the option isn't even in the
> > same place in the --help output versus the man page.)
> 
> These are all valid points. I've attached a patch aiming to address
> each of them.

I like that we are now using "size" consistently instead of bufsize etc.

> 
> > The order of the options on the VACUUM man page doesn't make any sense
> > anymore.  This isn't really the fault of this patch, but maybe it's time
> > to do a fresh reordering there.
> 
> Agreed, that likely wasn't a big problem say about 5 years ago when we
> had far fewer options, but the number has grown quite a bit since
> then.
> 
> Right after I fix the points you've mentioned seems a good time to address 
> that.

Are we still thinking that reordering the VACUUM (and ANALYZE) options
makes sense? And, if so, should it be alphabetical within parameter
category? That is, all actual parameters (e.g. FULL and FREEZE) are
alphabetically organized first followed by all parameter types (e.g.
boolean and size) alphabetically listed?

- Melanie




Re: check_strxfrm_bug()

2023-04-17 Thread Nathan Bossart
On Thu, Dec 15, 2022 at 03:22:59PM +1300, Thomas Munro wrote:
> While studying Jeff's new crop of collation patches I noticed in
> passing that check_strxfrm_bug() must surely by now be unnecessary.
> The buffer overrun bugs were fixed a decade ago, and the relevant
> systems are way out of support.  If you're worried that the bugs might
> come back, then the test is insufficient: modern versions of both OSes
> have strxfrm_l(), which we aren't checking.  In any case, we also
> completely disable this stuff because of bugs and quality problems in
> every other known implementation, via TRUST_STRXFRM (or rather the
> lack of it).  So I think it's time to remove that function; please see
> attached.

Seems reasonable to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: optimize several list functions with SIMD intrinsics

2023-04-17 Thread Nathan Bossart
Here is a new patch set.  I've split it into two patches: one for the
64-bit functions, and one for the 32-bit functions.  I've also added tests
for pg_lfind64/pg_lfind64_idx and deduplicated the code a bit.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 7dbbf61e8a2546a73413c3768a49da318f7de7f5 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 10 Apr 2023 09:10:47 -0700
Subject: [PATCH v3 1/2] speed up list_member_ptr and list_delete_ptr with SIMD

---
 src/backend/nodes/list.c  |  35 ++
 src/include/port/pg_lfind.h   | 112 ++
 src/include/port/simd.h   |  96 ++-
 .../test_lfind/expected/test_lfind.out|  12 ++
 .../modules/test_lfind/sql/test_lfind.sql |   2 +
 .../modules/test_lfind/test_lfind--1.0.sql|   8 ++
 src/test/modules/test_lfind/test_lfind.c  |  62 ++
 7 files changed, 324 insertions(+), 3 deletions(-)

diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index a709d23ef1..92bc48de17 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -19,6 +19,7 @@
 
 #include "nodes/pg_list.h"
 #include "port/pg_bitutils.h"
+#include "port/pg_lfind.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
 
@@ -680,11 +681,15 @@ list_member(const List *list, const void *datum)
 bool
 list_member_ptr(const List *list, const void *datum)
 {
+#ifdef USE_NO_SIMD
 	const ListCell *cell;
+#endif
 
 	Assert(IsPointerList(list));
 	check_list_invariants(list);
 
+#ifdef USE_NO_SIMD
+
 	foreach(cell, list)
 	{
 		if (lfirst(cell) == datum)
@@ -692,6 +697,18 @@ list_member_ptr(const List *list, const void *datum)
 	}
 
 	return false;
+
+#else
+
+	Assert(sizeof(ListCell) == 8);
+	Assert(sizeof(void *) == 8);
+
+	if (list == NIL)
+		return false;
+
+	return pg_lfind64((uint64) datum, (uint64 *) list->elements, list->length);
+
+#endif
 }
 
 /*
@@ -875,12 +892,30 @@ list_delete_ptr(List *list, void *datum)
 	Assert(IsPointerList(list));
 	check_list_invariants(list);
 
+#ifdef USE_NO_SIMD
+
 	foreach(cell, list)
 	{
 		if (lfirst(cell) == datum)
 			return list_delete_cell(list, cell);
 	}
 
+#else
+
+	Assert(sizeof(ListCell) == 8);
+	Assert(sizeof(void *) == 8);
+
+	if (list == NIL)
+		return NIL;
+
+	cell = (ListCell *) pg_lfind64_idx((uint64) datum,
+	   (uint64 *) list->elements,
+	   list->length);
+	if (cell != NULL)
+		return list_delete_cell(list, cell);
+
+#endif
+
 	/* Didn't find a match: return the list unmodified */
 	return list;
 }
diff --git a/src/include/port/pg_lfind.h b/src/include/port/pg_lfind.h
index 59aa8245ed..d08ad91ae8 100644
--- a/src/include/port/pg_lfind.h
+++ b/src/include/port/pg_lfind.h
@@ -177,4 +177,116 @@ pg_lfind32(uint32 key, uint32 *base, uint32 nelem)
 	return false;
 }
 
+/*
+ * pg_lfind64_internal
+ *
+ * Workhorse for pg_lfind64 and pg_lfind64_idx.
+ */
+static inline bool
+pg_lfind64_internal(uint64 key, uint64 *base, uint32 nelem, uint32 *i)
+{
+#ifdef USE_NO_SIMD
+
+	*i = 0;
+
+#else
+
+	/*
+	 * For better instruction-level parallelism, each loop iteration operates
+	 * on a block of four registers.
+	 */
+	const Vector64 keys = vector64_broadcast(key);	/* load copies of key */
+	const uint32 nelem_per_vector = sizeof(Vector64) / sizeof(uint64);
+	const uint32 nelem_per_iteration = 4 * nelem_per_vector;
+
+	/* round down to multiple of elements per iteration */
+	const uint32 tail_idx = nelem & ~(nelem_per_iteration - 1);
+
+	for (*i = 0; *i < tail_idx; *i += nelem_per_iteration)
+	{
+		Vector64	vals1,
+	vals2,
+	vals3,
+	vals4,
+	result1,
+	result2,
+	result3,
+	result4,
+	tmp1,
+	tmp2,
+	result;
+
+		/* load the next block into 4 registers */
+		vector64_load(&vals1, &base[*i]);
+		vector64_load(&vals2, &base[*i + nelem_per_vector]);
+		vector64_load(&vals3, &base[*i + nelem_per_vector * 2]);
+		vector64_load(&vals4, &base[*i + nelem_per_vector * 3]);
+
+		/* compare each value to the key */
+		result1 = vector64_eq(keys, vals1);
+		result2 = vector64_eq(keys, vals2);
+		result3 = vector64_eq(keys, vals3);
+		result4 = vector64_eq(keys, vals4);
+
+		/* combine the results into a single variable */
+		tmp1 = vector64_or(result1, result2);
+		tmp2 = vector64_or(result3, result4);
+		result = vector64_or(tmp1, tmp2);
+
+		/* see if there was a match */
+		if (vector64_is_highbit_set(result))
+			return true;
+	}
+#endif			/* ! USE_NO_SIMD */
+
+	return false;
+}
+
+/*
+ * pg_lfind64
+ *
+ * Return true if there is an element in 'base' that equals 'key', otherwise
+ * return false.
+ */
+static inline bool
+pg_lfind64(uint64 key, uint64 *base, uint32 nelem)
+{
+	uint32		i = 0;
+
+	if (pg_lfind64_internal(key, base, nelem, &i))
+		return true;
+
+	/* Process the remaining elements one at a time. */
+	for (; i < nelem; i++)
+	{
+		if (key == base[i])
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * pg_lfind64_idx
+ *
+ * Return the address of the first element in 'base' that equals 

Re: Request for comment on setting binary format output per session

2023-04-17 Thread Tom Lane
Robert Haas  writes:
> On Mon, Apr 17, 2023 at 1:55 PM Jeff Davis  wrote:
>> Maybe we should have a single new message type 'x' to indicate a
>> message for a protocol extension, and then have a sub-message-type? It
>> might make error handling better for unexpected messages.

> ...
> The point of this thought experiment is to help us estimate how
> careful we need to be.

I tend to agree with the proposition that we aren't going to add new
message types very often, as long as we're careful to make them general
purpose.  Don't forget that adding a new message type isn't just a matter
of writing some spec text --- there has to be code backing it up.  We
will never introduce thousands of new message types, or if we do,
somebody factored it wrong and put data into the type code.

The fact that we've gotten away without adding *any* new message types
for about twenty years suggests to me that the growth rate isn't such
that we need sub-message-types yet.  I'd keep the structure the same
until such time as we can't choose a plausible code value for a new
message, and then maybe add the "x-and-subtype" convention Jeff suggests.

regards, tom lane




Re: Request for comment on setting binary format output per session

2023-04-17 Thread Robert Haas
On Mon, Apr 17, 2023 at 1:55 PM Jeff Davis  wrote:
> It involves introducing new message types which I didn't really
> consider. We might want to be careful about how many kinds of messages
> we introduce so that the one-letter codes are still manageable. I've
> been frustrated in the past that we don't have separate symbols in the
> source code to refer to the message types (we just use literal 'S',
> etc.).

Right. That was part of the thinking behind the protocol session
parameter thing I was throwing out there.

> Maybe we should have a single new message type 'x' to indicate a
> message for a protocol extension, and then have a sub-message-type? It
> might make error handling better for unexpected messages.

I'm somewhat skeptical that we want every protocol extension in the
universe to use a single message type. I think that could lead to
munging together all sorts of messages that are actually really
different from each other. On the other hand, in a certain sense, we
don't really have a choice. The type byte for a protocol message can
only take on one of 256 possible values, and some of those are already
used, so if we add a bunch of stuff to the protocol, we're eventually
going to run short of byte values. In fact, even if we said, well, 'x'
means that it's an extended message and then there's a type byte as
the first byte of the payload, that only doubles the number of
possible message types before we run out of room, and maybe there's a
world where we eventually have thousands upon thousands of message
types. We'd need a longer type code than 1 byte to really get out from
under the problem, so if we add a message like what you're talking
about, we should probably do that.

But I don't know if we need to be too paranoid about this. For
example, suppose we were to agree on adding protocol session
parameters and make this the first one. To do that, suppose we add two
new messages to the protocol, ProtocolSessionParameterSet and
ProtocolSessionParameterReponse. And suppose we just pick single
letter codes for those, like we have right now. How much use would
such a mechanism get? It seems possible that we'd add as many as 5 or
10 such parameters in the next half-decade, but they'd all only need
those two new message types. We'd only need a different message type
if somebody wanted to customize something about the protocol that
didn't fit into that model, and that might happen, but I bet it
wouldn't happen that often. I feel like if we're careful to make sure
that the new protocol messages that we add are carefully designed to
be reasonably general, we'd add them very slowly. It seems very
possible that we could go a century or more without running out of
possible values. We could then decide to leave it to future hackers to
decide what to do about it when the remaining bit space starts to get
tight.

The point of this thought experiment is to help us estimate how
careful we need to be. I think that if we added messages with 1-byte
type codes for things as specific as SetTypesWithBinaryOutputAlways,
there would be a significant chance that we would run out of 1-byte
type codes while some of us are still around to be sad about it. Maybe
it wouldn't happen, but it seems risky. Furthermore, such messages are
FAR more specific than existing protocol messages like Query or
Execute or ErrorResponse which cover HUGE amounts of territory. I
think we need to be a level of abstraction removed. Something like
ProtocolSessionParameterSet seems good enough to me - I don't think
we'll run out of codes like that soon enough to matter. I don't think it
would be wrong to take that as far as you propose here, and just add
one new message type to cover all future developments, but it feels
like it might not really help anyone. A lot of code would probably
have to drill down and look at what type of extended message it was
before deciding what to do with it, which seems a bit annoying.

One thing to keep in mind is that it's possible that in the future we
might want protocol extensions for things that are very
performance-sensitive. For instance, I think it might be advantageous
to have something that is intermediate between the simple and extended
query protocol. The simple query protocol doesn't let you set
parameters, but the extended query protocol requires you to send a
whole series of messages (Parse-Bind-Describe-Execute-Sync) which
doesn't seem to be particularly efficient for either the client or the
server. I think it would be nice to have a way to send a single
message that says "run this query with these parameters." But, if we
had that, some clients might use it Really A Lot. They would therefore
want the message to be as short as possible, which means that using up
a single byte code for it would probably be desirable. On the other
hand, the kinds of things we're talking about here really shouldn't be
subjected to that level of use, and so if for this purpose we pick a
message format that is longer and wordier 

Re: Move defaults toward ICU in 16?

2023-04-17 Thread Tom Lane
Jeff Davis  writes:
> Is now a reasonable time to check it in and see what breaks? It looks
> like there are quite a few buildfarm members that specify neither --
> with-icu nor --without-icu.

I see you just pinged buildfarm-members again, so I'd think it's
polite to give people 24 hours or so to deal with that before
you break things.

(My animals are all set, I believe.)

regards, tom lane




Re: v16dev: TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, PID: 13564

2023-04-17 Thread Tom Lane
Andres Freund  writes:
> On 2023-04-17 13:50:30 -0400, Tom Lane wrote:
>> So previously, log_newpage_range could only have failed in very
>> unlikely circumstances, whereas now it's not hard to hit when
>> committing a table creation.  I wonder what other bugs may be
>> lurking.

> Oh, interesting. We haven't initialized the extra pages added by
> RelationAddExtraBlocks() (in <= 15) for quite a while now, so I'm a bit
> surprised it causes more issues for the VM / FSM. I guess it's that it's quite
> common in real workloads to contend on the extension lock and add extra
> blocks, but not in simple single-threaded tests?

I haven't tried hard to run it to ground, but maybe log_newpage_range
isn't used in that code path?  Seems like we'd have detected this
before now if the case were reachable without any crash involved.

regards, tom lane




Re: Move defaults toward ICU in 16?

2023-04-17 Thread Jeff Davis
On Mon, 2023-04-17 at 08:23 -0500, Justin Pryzby wrote:
> > I don't object to this patch.  I suggest waiting until next week to
> > commit
> > it and then see what happens.  It's easy to revert if it goes
> > terribly.
> 
> Is this still being considered for v16 ?

Yes, unless someone raises a procedural objection.

Is now a reasonable time to check it in and see what breaks? It looks
like there are quite a few buildfarm members that specify neither --
with-icu nor --without-icu.

Regards,
Jeff Davis





Re: v16dev: TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, PID: 13564

2023-04-17 Thread Andres Freund
Hi,

On 2023-04-17 13:50:30 -0400, Tom Lane wrote:
> I wrote:
> > Yeah, I just came to the same conclusion.  One thing I don't understand
> > yet: log_newpage_range is old (it looks like this back to v12), and
> > that Assert is older, so why doesn't this reproduce further back?
> > Maybe the state where all the pages are new didn't happen before?
> 
> Bingo: bisecting shows the failure started at
> 
> commit 3d6a98457d8e21d85bed86cfd3e1d1df1b260721
> Author: Andres Freund 
> Date:   Wed Apr 5 08:19:39 2023 -0700
> 
> Don't initialize page in {vm,fsm}_extend(), not needed
> 
> So previously, log_newpage_range could only have failed in very
> unlikely circumstances, whereas now it's not hard to hit when
> committing a table creation.  I wonder what other bugs may be
> lurking.

Oh, interesting. We haven't initialized the extra pages added by
RelationAddExtraBlocks() (in <= 15) for quite a while now, so I'm a bit
surprised it causes more issues for the VM / FSM. I guess it's that it's quite
common in real workloads to contend on the extension lock and add extra
blocks, but not in simple single-threaded tests?

Greetings,

Andres Freund




Fix typos and inconsistencies for v16

2023-04-17 Thread Alexander Lakhin

Hello hackers,

Please consider fixing the following unique words/identifiers introduced in v16:
1. addresess -> addresses
2. adminstrator -> administrator // the same typo found in src/backend/po/id.po, but perhaps it should be fixed via 
pgsql-translators

3. appeneded -> appended
4. appliciable -> applicable
5. BackroundPsql -> BackgroundPsql
6. binaies -> binaries
7. compresion -> compression
8. containsthe ->  contains the
9. contextes -> contexts
10. deparseAnalyzeTuplesSql -> deparseAnalyzeInfoSql // that function was 
renamed in 57d11ef0

11. DO_LARGE_OJECT_DATA -> DO_LARGE_OBJECT_DATA
12. doesnt't -> doesn't
13. dst_perminfo -> dst_perminfos
14. eror -> error
15. execpt -> except
16. forech -> foreach
17. GetResultRelCheckAsUser -> ExecGetResultRelCheckAsUser
18. GUCS -> GUCs
19. happend -> happened
20. immitated -> imitated

21. insert_xid -> tuple_xid // see bfcf1b348
22. ldap_add -> ldapadd_file
23. ldapbindpassw -> ldapbindpasswd
24. MemoryChunkSetExternal -> MemoryChunkSetHdrMaskExternal
25. non-encyrpted -> non-encrypted
26. --no-process_main -> --no-process-main
27. optionn -> option
28. Othewise -> Otherwise
29. parellel -> parallel
30. permissons -> permissions

31. pg_pwrite_zeroes -> pg_pwrite_zeros
32. pg_writev -> pg_pwritev
33. possbile -> possible
34. pqsymlink -> pgsymlink
35. PG_GET_WAL_FPI_BLOCK_COLS -> PG_GET_WAL_BLOCK_INFO_COLS
36. RangeVarCallbackOwnsTable -> RangeVarCallbackMaintainsTable // see 60684dd83
37. remaing -> remaining
38. ResourceOwnerForgetBufferIOs -> ResourceOwnerForgetBufferIO
39. RMGRDESC_UTILS_H -> RMGRDESC_UTILS_H_ // or may be the other way
40. rolenamehash -> rolename_hash

41. ROLERECURSE_SETROLe -> ROLERECURSE_SETROLE
42. sentinal -> sentinel
43. smgzerorextend -> smgrzeroextend
44. stacktoodeep -> rstacktoodeep // an excessive character was deleted with 
db4f21e4a?
45. tar_set_error -- remove (obsolete since ebfb814f7)
46. test_tranche_name -- remove (not used, see 006b69fd9)
47. varilables -> variables
48. xid_commit_status -> xmin_commit_status

Also, maybe OID_MAX should be removed from src/include/postgres_ext.h as it's 
unused since eb8312a22.

Beside that, this simple script:
for w in $(cat src/tools/pgindent/typedefs.list); do grep -q -P "\b$w\b" -r * --exclude 
typedefs.list || echo "$w"; done
detects 58 identifiers that don't exist in the source tree anymore (see 
typedefs.lost attached).
Maybe they should be removed from typedefs.list too.

Best regards,
Alexander

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d728bd70b3..95dbe8b06c 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -5016,7 +5016,7 @@ postgresGetAnalyzeInfoForForeignTable(Relation relation, bool *can_tablesample)
 			pgfdw_report_error(ERROR, res, conn, false, sql.data);
 
 		if (PQntuples(res) != 1 || PQnfields(res) != 2)
-			elog(ERROR, "unexpected result from deparseAnalyzeTuplesSql query");
+			elog(ERROR, "unexpected result from deparseAnalyzeInfoSql query");
 		reltuples = strtod(PQgetvalue(res, 0, 0), NULL);
 		relkind = *(PQgetvalue(res, 0, 1));
 	}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 7d3b20168a..f6c0c5ca1a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -6272,7 +6272,7 @@ ProcessGUCArray(ArrayType *array, ArrayType *usersetArray,
 			userSetDatum = BoolGetDatum(false);
 
 		/*
-		 * USER SET values are appliciable only for PGC_USERSET parameters. We
+		 * USER SET values are applicable only for PGC_USERSET parameters. We
 		 * use InvalidOid as role in order to evade possible privileges of the
 		 * current user.
 		 */
diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
index 3894a017f3..104c01d966 100644
--- a/src/backend/utils/misc/ps_status.c
+++ b/src/backend/utils/misc/ps_status.c
@@ -87,7 +87,7 @@ static size_t ps_buffer_cur_len;	/* nominal strlen(ps_buffer) */
 static size_t ps_buffer_fixed_size; /* size of the constant prefix */
 
 /*
- * Length of ps_buffer before the suffix was appeneded to the end, or 0 if we
+ * Length of ps_buffer before the suffix was appended to the end, or 0 if we
  * didn't set a suffix.
  */
 static size_t ps_buffer_nosuffix_len;
diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 5aca9c1f06..423e1b7976 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -342,7 +342,7 @@ LZ4Stream_get_error(CompressFileHandle *CFH)
 /*
  * Initialize an already alloc'ed LZ4State struct for subsequent calls.
  *
- * Creates the necessary contexts for either compresion or decompression. When
+ * Creates the necessary contexts for either compression or decompression. When
  * compressing data (indicated by compressing=true), it additionally writes the
  * LZ4 header in the output stream.
  *
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index 395abfe596..656a2e9897 

Re: Request for comment on setting binary format output per session

2023-04-17 Thread Jeff Davis
On Mon, 2023-04-17 at 12:17 -0400, Robert Haas wrote:
> In the case at hand, it seems like the problem could easily be solved
> by allowing the property to be changed after connection startup.
> Instead of using the protocol extension mechanism to negotiate a
> specific value for the property, we can use it to negotiate about
> whether or not some new protocol message that can be used to change
> that property is supported.

...

> Backing up a level, the purpose of the protocol extension mechanism
> is
> to help us agree on the communication protocol

Thank you, that seems like a better approach to me.

It involves introducing new message types which I didn't really
consider. We might want to be careful about how many kinds of messages
we introduce so that the one-letter codes are still manageable. I've
been frustrated in the past that we don't have separate symbols in the
source code to refer to the message types (we just use literal 'S',
etc.).

Maybe we should have a single new message type 'x' to indicate a
message for a protocol extension, and then have a sub-message-type? It
might make error handling better for unexpected messages.
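The single-wrapper-message idea is easy to picture on the wire. The sketch below is purely illustrative — the 'x' type code, the sub-type values, and the framing helper are hypothetical assumptions, not anything that exists in the protocol today. It just shows the usual v3 framing (1-byte type, self-inclusive int32 length in network byte order) with one extra sub-type byte:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sub-message codes for a protocol-extension message 'x'.
 * These names are illustrative placeholders only. */
enum
{
	PQEXT_SET_BINARY_FORMATS = 1,
	PQEXT_SET_SESSION_USER = 2
};

/*
 * Frame a v3-style message: 1-byte type 'x', int32 length (which, per the
 * protocol convention, counts itself but not the type byte), then a 1-byte
 * sub-type followed by the payload.  Returns the total bytes written.
 */
static size_t
frame_ext_message(uint8_t *buf, uint8_t subtype,
				  const void *payload, uint32_t paylen)
{
	uint32_t	len = 4 + 1 + paylen;	/* length field + sub-type + payload */

	buf[0] = 'x';
	buf[1] = (len >> 24) & 0xFF;	/* int32 length, network byte order */
	buf[2] = (len >> 16) & 0xFF;
	buf[3] = (len >> 8) & 0xFF;
	buf[4] = len & 0xFF;
	buf[5] = subtype;
	memcpy(buf + 6, payload, paylen);
	return 1 + len;
}
```

An unexpected sub-type could then be rejected with a single error path in the 'x' dispatcher, rather than one per new top-level message code.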

Also, is there any reason we'd want this concept to integrate with
connection strings/URIs? Probably not a good idea to turn on features
that way, but perhaps we'd want to support disabling protocol
extensions from a URI? This could be used to restrict authentication
methods or sources of authentication information.

> The reason why I suggest this is that I feel like there could be a
> bunch of things like this.

What's the trade-off between having one protocol extension (e.g.
_pq_protocol_session_parameters) that tries to work for multiple cases
(e.g. binary_formats and session_user) vs just having two protocol
extensions (_pq_set_session_user and _pq_set_binary_formats)?


> For example, one
> that would be really useful for connection poolers is the session
> user. The pooler would like to change the session user whenever the
> connection is changed to talk to a different client, and it would
> like
> that to happen in a way that can't be reversed by issuing any SQL
> command.

That sounds valuable to me whether we generalize with "protocol session
parameters" or not.

Regards,
Jeff Davis





Re: v16dev: TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, PID: 13564

2023-04-17 Thread Andres Freund
Hi,

On 2023-04-17 12:13:41 -0400, Tom Lane wrote:
> Matthias van de Meent  writes:
> > Looking at log_newpage_range, it seems like we're always trying to log
> > a record if startblk < endblk; but don't register the PageIsNew()
> > buffers in the range. That means that if the last buffers in the range
> > are new, this can result in no buffers being registered in the last
> > iteration of the main loop (if the number of non-new buffers in the
> > range is 0 (mod 32)).
> 
> Yeah, I just came to the same conclusion.  One thing I don't understand
> yet: log_newpage_range is old (it looks like this back to v12), and
> that Assert is older, so why doesn't this reproduce further back?
> Maybe the state where all the pages are new didn't happen before?
> Is that telling us there's a bug somewhere else?  Seems like a job
> for git bisect.

One plausible explanation is that bulk relation extension has made it more
likely to encounter this scenario. We had some bulk extension code before, but
it was triggered purely based on contention - quite unlikely in simple test
scenarios - but now we also bulk extend if we know that we'll insert multiple
pages (when coming from heap_multi_insert(), with sufficient data).

Greetings,

Andres Freund




Re: v16dev: TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, PID: 13564

2023-04-17 Thread Tom Lane
I wrote:
> Yeah, I just came to the same conclusion.  One thing I don't understand
> yet: log_newpage_range is old (it looks like this back to v12), and
> that Assert is older, so why doesn't this reproduce further back?
> Maybe the state where all the pages are new didn't happen before?

Bingo: bisecting shows the failure started at

commit 3d6a98457d8e21d85bed86cfd3e1d1df1b260721
Author: Andres Freund 
Date:   Wed Apr 5 08:19:39 2023 -0700

Don't initialize page in {vm,fsm}_extend(), not needed

So previously, log_newpage_range could only have failed in very
unlikely circumstances, whereas now it's not hard to hit when
committing a table creation.  I wonder what other bugs may be
lurking.

I'll patch it back to v12 anyway, since that function is
clearly wrong in isolation.

regards, tom lane




Re: Note new NULLS NOT DISTINCT on unique index tutorial page

2023-04-17 Thread Corey Huinker
On Wed, Apr 12, 2023 at 10:40 AM David Gilman 
wrote:

> The SQL Language part of the docs has a brief page on unique indexes.
> It doesn't mention the new NULLS NOT DISTINCT functionality on unique
> indexes and this is a good place to mention it, as it already warns
> the user about the old/default behavior.
>
>
I'm ok with the wording as-is, but perhaps we can phrase it as "distinct"
vs "not equal", thus leaning into the syntax a bit:

By default, null values in a unique column are considered distinct,
allowing multiple nulls in the column.


or maybe

By default, null values in a unique column are considered
DISTINCT, allowing multiple nulls in the column.


Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-04-17 Thread Jacob Champion
On Fri, Apr 14, 2023 at 3:36 PM Daniel Gustafsson  wrote:
> This "error: Success" error has been reported to the list numerous times as
> misleading, and I'd love to make progress on improving error reporting during
> the v17 cycle.

Agreed!

> The attached checks for the specific known error, and leaves all the other
> cases to the same logging that we have today.  It relies on the knowledge
> that system sslrootcert configs have deferred loading, and will run with
> verify-full.  So if we see an X509 failure in loading the local issuer cert
> here then we know the user wanted to use the system CA pool for certificate
> verification but the root CA cannot be loaded for some reason.

This LGTM; I agree with your reasoning. Note that it won't fix the
(completely different) misleading error message for OpenSSL 3.0, but
since that's an *actively* unhelpful error message coming back from
OpenSSL, I don't think we want to override it. For 3.1, we have no
information and we're trying to fill in the gaps.

--Jacob




Re: recovery modules

2023-04-17 Thread Nathan Bossart
rebased

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From b8856e61d775bc248a292163facc0b227abdde97 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 23 Feb 2023 14:27:48 -0800
Subject: [PATCH v15 1/7] Move extra code out of the Pre/PostRestoreCommand()
 block.

If SIGTERM is received within this block, the startup process will
immediately proc_exit() in the signal handler, so it is inadvisable
to include any more code than is required in this section.  This
change moves the code recently added to this block (see 1b06d7b and
7fed801) to outside of the block.  This ensures that only system()
is called while proc_exit() might be called in the SIGTERM handler,
which is how this code worked from v8.4 to v14.
---
 src/backend/access/transam/xlogarchive.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index f3fb92c8f9..b998cd651c 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -159,20 +159,27 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 			(errmsg_internal("executing restore command \"%s\"",
 			 xlogRestoreCmd)));
 
+	fflush(NULL);
+	pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND);
+
 	/*
-	 * Check signals before restore command and reset afterwards.
+	 * PreRestoreCommand() informs the SIGTERM handler for the startup process
+	 * that it should proc_exit() right away.  This is done for the duration of
+	 * the system() call because there isn't a good way to break out while it
+	 * is executing.  Since we might call proc_exit() in a signal handler, it
+	 * is best to put any additional logic before or after the
+	 * PreRestoreCommand()/PostRestoreCommand() section.
 	 */
 	PreRestoreCommand();
 
 	/*
 	 * Copy xlog from archival storage to XLOGDIR
 	 */
-	fflush(NULL);
-	pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND);
 	rc = system(xlogRestoreCmd);
-	pgstat_report_wait_end();
 
 	PostRestoreCommand();
+
+	pgstat_report_wait_end();
 	pfree(xlogRestoreCmd);
 
 	if (rc == 0)
-- 
2.25.1

>From e8105f300a2e141c8b5db478a16bc29e029b30df Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 14 Feb 2023 09:44:53 -0800
Subject: [PATCH v15 2/7] Don't proc_exit() in startup's SIGTERM handler if
 forked by system().

Instead, emit a message to STDERR and _exit() in this case.  This
change also adds assertions to proc_exit(), ProcKill(), and
AuxiliaryProcKill() to verify that these functions are not called
by a process forked by system(), etc.
---
 src/backend/postmaster/startup.c | 17 -
 src/backend/storage/ipc/ipc.c|  3 +++
 src/backend/storage/lmgr/proc.c  |  2 ++
 src/backend/utils/error/elog.c   | 28 
 src/include/utils/elog.h |  6 +-
 5 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index efc2580536..0e7de26bc2 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -19,6 +19,8 @@
  */
 #include "postgres.h"
 
+#include <unistd.h>
+
 #include "access/xlog.h"
 #include "access/xlogrecovery.h"
 #include "access/xlogutils.h"
@@ -121,7 +123,20 @@ StartupProcShutdownHandler(SIGNAL_ARGS)
 	int			save_errno = errno;
 
 	if (in_restore_command)
-		proc_exit(1);
+	{
+		/*
+		 * If we are in a child process (e.g., forked by system() in
+		 * RestoreArchivedFile()), we don't want to call any exit callbacks.
+		 * The parent will take care of that.
+		 */
+		if (MyProcPid == (int) getpid())
+			proc_exit(1);
+		else
+		{
+			write_stderr_signal_safe("StartupProcShutdownHandler() called in child process\n");
+			_exit(1);
+		}
+	}
 	else
 		shutdown_requested = true;
 	WakeupRecovery();
diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index 1904d21795..d5097dc008 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -103,6 +103,9 @@ static int	on_proc_exit_index,
 void
 proc_exit(int code)
 {
+	/* proc_exit() is not safe in forked processes from system(), etc. */
+	Assert(MyProcPid == (int) getpid());
+
 	/* Clean up everything that must be cleaned up */
 	proc_exit_prepare(code);
 
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 22b4278610..b9e2c3aafe 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -805,6 +805,7 @@ ProcKill(int code, Datum arg)
 	dlist_head *procgloballist;
 
 	Assert(MyProc != NULL);
+	Assert(MyProc->pid == (int) getpid());  /* not safe if forked by system(), etc. */
 
 	/* Make sure we're out of the sync rep lists */
 	SyncRepCleanupAtProcExit();
@@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg)
 	PGPROC	   *proc;
 
 	Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS);
+	Assert(MyProc->pid == (int) getpid());  /* not safe if forked by system(), etc. */
 
 	auxproc = 

Re: Request for comment on setting binary format output per session

2023-04-17 Thread Robert Haas
On Wed, Mar 29, 2023 at 12:04 PM Jeff Davis  wrote:
> I'm starting to agree that the awkwardness around connection poolers is
> a problem. If that's the case, I'm wondering if the protocol extensions
> will ever be useful.

In the case at hand, it seems like the problem could easily be solved
by allowing the property to be changed after connection startup.
Instead of using the protocol extension mechanism to negotiate a
specific value for the property, we can use it to negotiate about
whether or not some new protocol message that can be used to change
that property is supported. If it is, then a new value can be set
whenever, and a connection pooler can switch the active value when it
associates the server's session with a different client session.

Alternatively, the protocol extension mechanism can be used to
negotiate an initial value for the property, with the understanding
that if any initial value is negotiated, that also implies that the
server will accept some new protocol message later in the session to
change the value. If no initial value is negotiated, the client can't
assume that the server even knows about that property and can't try to
change it.

Backing up a level, the purpose of the protocol extension mechanism is
to help us agree on the communication protocol -- that is, the set of
messages that we can send and receive on a certain connection. The
question for the protocol extension mechanism isn't "which types
should always be sent in binary format?" but "would it be ok if I
wanted you to always send certain types in binary format?", with the
idea that if the answer is yes it will still be necessary for the
client to let the server know which ones, but that's easy to do if
we've agreed on the concept that it's OK for me to ask the server for
that. And if it's OK for me to ask that once, it should also be OK for
me to later ask for something different.

This could, perhaps, be made even more general yet. We could define a
concept of "protocol session parameters" and make "which types are
always sent in binary format?" one of those parameters. So then the
conversation could go like this:

C: Hello! Do you know about protocol session parameters?
S: Why yes, actually I do.
C: Cool. I would like to set the protocol session parameter
types_always_in_binary_format=timestamptz. Does that work for you?
S: Sure thing! (or alternatively: Sadly, I've not heard of that
particular protocol session parameter, sorry to disappoint.)
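The server's side of that exchange — "have I heard of this parameter?" — is simple to sketch. Everything below is a hypothetical placeholder (the parameter names and the function are mine, not a proposed interface); it only illustrates that negotiation reduces to a membership check against what this server version knows about:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical table of protocol session parameters this server version
 * understands.  An older server would simply have a shorter table, and
 * the client finds out at startup which requests will be honored.
 */
static const char *const known_session_params[] = {
	"types_always_in_binary_format",
	"session_user",
};

/* Accept a protocol session parameter only if the server knows it. */
static bool
server_accepts_param(const char *name)
{
	for (size_t i = 0;
		 i < sizeof(known_session_params) / sizeof(known_session_params[0]);
		 i++)
	{
		if (strcmp(known_session_params[i], name) == 0)
			return true;
	}
	return false;				/* "sadly, I've not heard of that one" */
}
```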

The reason why I suggest this is that I feel like there could be a
bunch of things like this. The set of things to be sent in binary
format feels like a property of the wire protocol, not something
SQL-level that should be configured via SET.  Clients, drivers, and
connection poolers aren't going to want to have to worry about some
user screwing up the session by changing that property inside of a
function or procedure or whatever. But there could also be a bunch of
different things like this that we want to support. For example, one
that would be really useful for connection poolers is the session
user. The pooler would like to change the session user whenever the
connection is changed to talk to a different client, and it would like
that to happen in a way that can't be reversed by issuing any SQL
command. I expect in time we may find a bunch of others.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: v16dev: TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, PID: 13564

2023-04-17 Thread Tom Lane
Matthias van de Meent  writes:
> Looking at log_newpage_range, it seems like we're always trying to log
> a record if startblk < endblk; but don't register the PageIsNew()
> buffers in the range. That means that if the last buffers in the range
> are new, this can result in no buffers being registered in the last
> iteration of the main loop (if the number of non-new buffers in the
> range is 0 (mod 32)).

Yeah, I just came to the same conclusion.  One thing I don't understand
yet: log_newpage_range is old (it looks like this back to v12), and
that Assert is older, so why doesn't this reproduce further back?
Maybe the state where all the pages are new didn't happen before?
Is that telling us there's a bug somewhere else?  Seems like a job
for git bisect.

To be clear: log_newpage_range is certainly broken, and your fix looks
appropriate.  I'm just wondering what else we need to learn here.

regards, tom lane




Re: Direct I/O

2023-04-17 Thread Tom Lane
Robert Haas  writes:
> On Sat, Apr 15, 2023 at 2:19 PM Tom Lane  wrote:
>> I get the impression that we are going to need an actual runtime
>> test if we want to defend against this.  Not entirely convinced
>> it's worth the trouble.  Who, other than our deliberately rear-guard
>> buildfarm animals, is going to be building modern PG with such old
>> compilers?  (And more especially to the point, on platforms new
>> enough to have working O_DIRECT?)

> I don't think that I fully understand everything under discussion
> here, but I would just like to throw in a vote for trying to make
> failures as comprehensible as we reasonably can.

I'm not hugely concerned about this yet.  I think the reason for
slipping this into v16 as developer-only code is exactly that we need
to get a feeling for where the portability dragons live.  When (and
if) we try to make O_DIRECT mainstream, yes we'd better be sure that
any known failure cases are reported well.  But we need the data
about which those are, first.

regards, tom lane




Re: v16dev: TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, PID: 13564

2023-04-17 Thread Tom Lane
I wrote:
> Hmm.  I wonder if log_newpages() is confused here:
>   XLogEnsureRecordSpace(XLR_MAX_BLOCK_ID - 1, 0);

Oh, no, it's simpler than that: log_newpage_range is trying to
log zero page images, and ReserveXLogInsertLocation doesn't
like that because every WAL record should contain some data.
Will fix, thanks for report.

regards, tom lane




Re: v16dev: TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, PID: 13564

2023-04-17 Thread Matthias van de Meent
On Mon, 17 Apr 2023 at 17:53, Justin Pryzby  wrote:
>
> I hit this assertion while pg_restoring data into a v16 instance.
> postgresql16-server-16-alpha_20230417_PGDG.rhel7.x86_64
>
> wal_level=minimal and pg_dump --single-transaction both seem to be
> required to hit the issue.
>
> $ /usr/pgsql-16/bin/postgres -D ./pg16test -c maintenance_work_mem=1GB -c 
> max_wal_size=16GB -c wal_level=minimal -c max_wal_senders=0 -c port=5678 -c 
> logging_collector=no &
>
> $ time sudo -u postgres /usr/pgsql-16/bin/pg_restore -d postgres -p 5678 
> --single-transaction --no-tablespace ./curtables
>
> TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, 
> PID: 13564
>
> Core was generated by `postgres: postgres postgres [local] COMMIT 
>'.
> Program terminated with signal 6, Aborted.
> #0  0x7f28b8bd5387 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install 
> postgresql16-server-16-alpha_20230417_PGDG.rhel7.x86_64
> (gdb) bt
> #0  0x7f28b8bd5387 in raise () from /lib64/libc.so.6
> #1  0x7f28b8bd6a78 in abort () from /lib64/libc.so.6
> #2  0x009bc8c9 in ExceptionalCondition 
> (conditionName=conditionName@entry=0xa373e1 "size > SizeOfXLogRecord", 
> fileName=fileName@entry=0xa31b13 "xlog.c", lineNumber=lineNumber@entry=1055) 
> at assert.c:66
> #3  0x0057b049 in ReserveXLogInsertLocation (PrevPtr=0x2e3d750, 
> EndPos=, StartPos=, size=24) at 
> xlog.c:1055
> #4  XLogInsertRecord (rdata=rdata@entry=0xf187a0 , 
> fpw_lsn=fpw_lsn@entry=0, flags=, num_fpi=num_fpi@entry=0, 
> topxid_included=topxid_included@entry=false) at xlog.c:844
> #5  0x0058210c in XLogInsert (rmid=rmid@entry=0 '\000', 
> info=info@entry=176 '\260') at xloginsert.c:510
> #6  0x00582b09 in log_newpage_range (rel=rel@entry=0x2e1f628, 
> forknum=forknum@entry=FSM_FORKNUM, startblk=startblk@entry=0, 
> endblk=endblk@entry=3, page_std=page_std@entry=false) at xloginsert.c:1317


Looking at log_newpage_range, it seems like we're always trying to log
a record if startblk < endblk; but don't register the PageIsNew()
buffers in the range. That means that if the last buffers in the range
are new, this can result in no buffers being registered in the last
iteration of the main loop (if the number of non-new buffers in the
range is 0 (mod 32)).

A change like attached should fix the issue; or alternatively we could
force log the last (new) buffer when we detect this edge case.
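The edge case described above can be modeled without any PostgreSQL internals. This standalone sketch (the function and its structure are mine, not the actual xloginsert.c code) mimics log_newpage_range's batching: it walks the block range, registering up to 32 non-new buffers per would-be WAL record. A trailing run of all-new pages produces an iteration with zero registered buffers — a record with no block data — unless the insert is skipped when nbufs == 0:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true if the batching loop would emit a record with zero
 * registered buffers -- the condition that trips
 * Assert(size > SizeOfXLogRecord).  With apply_fix, the empty-batch
 * insert is skipped instead.
 */
static bool
would_emit_empty_record(const bool *page_is_new, int startblk, int endblk,
						bool apply_fix)
{
	int			blkno = startblk;

	while (blkno < endblk)
	{
		int			nbufs = 0;

		/* register up to 32 non-new buffers for this record */
		while (nbufs < 32 && blkno < endblk)
		{
			if (!page_is_new[blkno])
				nbufs++;
			blkno++;
		}

		/* XLogInsert() would run here; with no buffers it has no data */
		if (nbufs == 0 && !apply_fix)
			return true;
	}
	return false;
}
```

This also matches the reported trigger: an FSM fork whose pages are all new makes even the first (and only) iteration empty.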

Kind regards,

Matthias van de Meent


7882549d3b6ca83e83eda88d46f6f29afda0ae03.diff
Description: Binary data


Re: Direct I/O

2023-04-17 Thread Robert Haas
On Sat, Apr 15, 2023 at 2:19 PM Tom Lane  wrote:
> I get the impression that we are going to need an actual runtime
> test if we want to defend against this.  Not entirely convinced
> it's worth the trouble.  Who, other than our deliberately rear-guard
> buildfarm animals, is going to be building modern PG with such old
> compilers?  (And more especially to the point, on platforms new
> enough to have working O_DIRECT?)

I don't think that I fully understand everything under discussion
here, but I would just like to throw in a vote for trying to make
failures as comprehensible as we reasonably can. It makes me a bit
nervous to rely on things like "anybody who has O_DIRECT will also
have working alignment pragmas," because there's no relationship
between those things other than when we think they got implemented on
the platforms that are popular today. If somebody ships me a brand new
Deathstation 9000 that has O_DIRECT but NOT alignment pragmas, how
badly are things going to break and how hard is it going to be for me
to understand why it's not working?

I understand that nobody (including me) wants the code cluttered with
a bunch of useless cruft that caters only to hypothetical systems, and
I don't want us to spend a lot of effort building untestable
infrastructure that caters only to such machines. I just don't want us
to do things that are more magical than they need to be. If and when
something fails, it's real nice if you can easily understand why it
failed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: v16dev: TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, PID: 13564

2023-04-17 Thread Tom Lane
Justin Pryzby  writes:
> I hit this assertion while pg_restoring data into a v16 instance.
> postgresql16-server-16-alpha_20230417_PGDG.rhel7.x86_64

> wal_level=minimal and pg_dump --single-transaction both seem to be
> required to hit the issue.

Hmm.  I wonder if log_newpages() is confused here:

XLogEnsureRecordSpace(XLR_MAX_BLOCK_ID - 1, 0);

Why is XLR_MAX_BLOCK_ID - 1 enough, rather than XLR_MAX_BLOCK_ID?

regards, tom lane




Re: longfin missing gssapi_ext.h

2023-04-17 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > I'm guessing it's not really an issue but it does make changing
> > configure a bit annoying on my Ubuntu 22.04, when I run autoconf2.69, I
> > end up with this additional hunk as changed from what our configure
> > currently has.
> 
> Not surprising.  Thanks to autoconf's long release cycles, individual
> distros often are carrying local patches that affect its output.
> To ensure consistent results across committers, our policy is that you
> should use built-from-upstream-source autoconf not a vendor's version.
> (In principle that could bite us sometime, but it hasn't yet.)

... making me more excited about the idea of getting over to meson.

> Also, you should generally run autoheader after autoconf.
> Checking things here, I notice that pg_config.h.in hasn't been
> updated for the last few gssapi-related commits:
> 
> $ git diff
> diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
> index 3665e79..6d572c3 100644
> *** a/src/include/pg_config.h.in
> --- b/src/include/pg_config.h.in
> ***
> *** 196,201 
> --- 196,207 
>   /* Define to 1 if you have the `getpeerucred' function. */
>   #undef HAVE_GETPEERUCRED
>   
> + /* Define to 1 if you have the <gssapi_ext.h> header file. */
> + #undef HAVE_GSSAPI_EXT_H
> + 
> + /* Define to 1 if you have the <gssapi/gssapi_ext.h> header file. */
> + #undef HAVE_GSSAPI_GSSAPI_EXT_H
> + 
>   /* Define to 1 if you have the <gssapi/gssapi.h> header file. */
>   #undef HAVE_GSSAPI_GSSAPI_H
>   
> Shall I push that, or do you want to?

Hrmpf.  Sorry about that.  Please feel free and thanks for pointing it
out.

Thanks again,

Stephen




Re: Allowing parallel-safe initplans

2023-04-17 Thread Tom Lane
Richard Guo  writes:
> So now it seems that the breakage of regression tests is more severe
> than being cosmetic.  I wonder if we need to update the comments to
> indicate the potential wrong results issue if we move the initPlans to
> the Gather node.

I wondered about that too, but how come neither of us saw non-cosmetic
failures (ie, actual query output changes not just EXPLAIN changes)
when we tried this?  Maybe the case is somehow not exercised, but if
so I'm more worried about adding regression tests than comments.

I think actually that it does work beyond the EXPLAIN weirdness,
because since e89a71fb4 the Gather machinery knows how to transmit
the values of Params listed in Gather.initParam to workers, and that
is filled in setrefs.c in a way that looks like it'd work regardless
of whether the Gather appeared organically or was stuck on by the
debug_parallel_query hackery.  I've not tried to verify that
directly though.

regards, tom lane




v16dev: TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, PID: 13564

2023-04-17 Thread Justin Pryzby
I hit this assertion while pg_restoring data into a v16 instance.
postgresql16-server-16-alpha_20230417_PGDG.rhel7.x86_64

wal_level=minimal and pg_dump --single-transaction both seem to be
required to hit the issue.

$ /usr/pgsql-16/bin/postgres -D ./pg16test -c maintenance_work_mem=1GB -c 
max_wal_size=16GB -c wal_level=minimal -c max_wal_senders=0 -c port=5678 -c 
logging_collector=no &

$ time sudo -u postgres /usr/pgsql-16/bin/pg_restore -d postgres -p 5678 
--single-transaction --no-tablespace ./curtables

TRAP: failed Assert("size > SizeOfXLogRecord"), File: "xlog.c", Line: 1055, 
PID: 13564

Core was generated by `postgres: postgres postgres [local] COMMIT   
 '.
Program terminated with signal 6, Aborted.
#0  0x7f28b8bd5387 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install 
postgresql16-server-16-alpha_20230417_PGDG.rhel7.x86_64
(gdb) bt
#0  0x7f28b8bd5387 in raise () from /lib64/libc.so.6
#1  0x7f28b8bd6a78 in abort () from /lib64/libc.so.6
#2  0x009bc8c9 in ExceptionalCondition 
(conditionName=conditionName@entry=0xa373e1 "size > SizeOfXLogRecord", 
fileName=fileName@entry=0xa31b13 "xlog.c", lineNumber=lineNumber@entry=1055) at 
assert.c:66
#3  0x0057b049 in ReserveXLogInsertLocation (PrevPtr=0x2e3d750, 
EndPos=, StartPos=, size=24) at 
xlog.c:1055
#4  XLogInsertRecord (rdata=rdata@entry=0xf187a0 , 
fpw_lsn=fpw_lsn@entry=0, flags=, num_fpi=num_fpi@entry=0, 
topxid_included=topxid_included@entry=false) at xlog.c:844
#5  0x0058210c in XLogInsert (rmid=rmid@entry=0 '\000', 
info=info@entry=176 '\260') at xloginsert.c:510
#6  0x00582b09 in log_newpage_range (rel=rel@entry=0x2e1f628, 
forknum=forknum@entry=FSM_FORKNUM, startblk=startblk@entry=0, 
endblk=endblk@entry=3, page_std=page_std@entry=false) at xloginsert.c:1317
#7  0x005d02f9 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, 
isParallelWorker=isParallelWorker@entry=false) at storage.c:837
#8  0x00571637 in CommitTransaction () at xact.c:2225
#9  0x00572b25 in CommitTransactionCommand () at xact.c:3201
#10 0x0086afc7 in finish_xact_command () at postgres.c:2782
#11 0x0086d7e1 in exec_simple_query (query_string=0x2dec4f8 "COMMIT") 
at postgres.c:1307




Re: longfin missing gssapi_ext.h

2023-04-17 Thread Tom Lane
Stephen Frost  writes:
> I'm guessing it's not really an issue but it does make changing
> configure a bit annoying on my Ubuntu 22.04, when I run autoconf2.69, I
> end up with this additional hunk as changed from what our configure
> currently has.

Not surprising.  Thanks to autoconf's long release cycles, individual
distros often are carrying local patches that affect its output.
To ensure consistent results across committers, our policy is that you
should use built-from-upstream-source autoconf not a vendor's version.
(In principle that could bite us sometime, but it hasn't yet.)

Also, you should generally run autoheader after autoconf.
Checking things here, I notice that pg_config.h.in hasn't been
updated for the last few gssapi-related commits:

$ git diff
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 3665e79..6d572c3 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***
*** 196,201 
--- 196,207 
  /* Define to 1 if you have the `getpeerucred' function. */
  #undef HAVE_GETPEERUCRED
  
+ /* Define to 1 if you have the <gssapi_ext.h> header file. */
+ #undef HAVE_GSSAPI_EXT_H
+ 
+ /* Define to 1 if you have the <gssapi/gssapi_ext.h> header file. */
+ #undef HAVE_GSSAPI_GSSAPI_EXT_H
+ 
  /* Define to 1 if you have the <gssapi/gssapi.h> header file. */
  #undef HAVE_GSSAPI_GSSAPI_H
  
Shall I push that, or do you want to?

regards, tom lane




Re: longfin missing gssapi_ext.h

2023-04-17 Thread Stephen Frost
Greetings,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Stephen Frost  writes:
> > > Done that way.
> > 
> > Looks like you neglected to update the configure script proper?
> 
> Pah, indeed.  Will fix.  Sorry about that.

Fixed.

I'm guessing it's not really an issue but it does make changing
configure a bit annoying on my Ubuntu 22.04, when I run autoconf2.69, I
end up with this additional hunk as changed from what our configure
currently has.

Thoughts?

Thanks,

Stephen
diff --git a/configure b/configure
index 82efa0d3f1..5489cddce2 100755
--- a/configure
+++ b/configure
@@ -15348,7 +15348,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
 		   && LARGE_OFF_T % 2147483647 == 1)
 		  ? 1 : -1];
@@ -15394,7 +15394,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
 		   && LARGE_OFF_T % 2147483647 == 1)
 		  ? 1 : -1];
@@ -15418,7 +15418,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
 		   && LARGE_OFF_T % 2147483647 == 1)
 		  ? 1 : -1];
@@ -15463,7 +15463,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
 		   && LARGE_OFF_T % 2147483647 == 1)
 		  ? 1 : -1];
@@ -15487,7 +15487,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
 		   && LARGE_OFF_T % 2147483647 == 1)
 		  ? 1 : -1];




Re: Synchronizing slots from primary to standby

2023-04-17 Thread Drouvot, Bertrand

Hi,

On 4/14/23 3:22 PM, Drouvot, Bertrand wrote:

Now that the "Minimal logical decoding on standby" patch series (mentioned 
up-thread) has been
committed, I think we can resume working on this one ("Synchronizing slots from 
primary to standby").

I'll work on a rebase and share it once done (unless someone already started 
working on a rebase).



Please find attached V5 (a rebase of V4 posted up-thread).

In addition to the "rebasing" work, the TAP test adds a test about conflict 
handling (logical slot invalidation)
relying on the work done in the "Minimal logical decoding on standby" patch 
series.

I did not look more at the patch (than what's was needed for the rebase) but 
plan to do so.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From 655359eedf37d8f2e522aeb1ec8c48adfc1759b1 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Thu, 13 Apr 2023 11:32:28 +
Subject: [PATCH v5] Synchronize logical replication slots from primary to
 standby

---
 doc/src/sgml/config.sgml  |  34 ++
 src/backend/commands/subscriptioncmds.c   |   4 +-
 src/backend/postmaster/bgworker.c |   3 +
 .../libpqwalreceiver/libpqwalreceiver.c   |  95 
 src/backend/replication/logical/Makefile  |   1 +
 src/backend/replication/logical/launcher.c| 263 +++
 src/backend/replication/logical/meson.build   |   1 +
 .../replication/logical/reorderbuffer.c   |  86 
 src/backend/replication/logical/slotsync.c| 413 ++
 src/backend/replication/logical/tablesync.c   |  13 +-
 src/backend/replication/logical/worker.c  |   3 +-
 src/backend/replication/repl_gram.y   |  32 +-
 src/backend/replication/repl_scanner.l|   2 +
 src/backend/replication/slotfuncs.c   |   2 +-
 src/backend/replication/walsender.c   | 195 +
 src/backend/utils/activity/wait_event.c   |   3 +
 src/backend/utils/misc/guc_tables.c   |  26 ++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/commands/subscriptioncmds.h   |   3 +
 src/include/nodes/replnodes.h |   9 +
 src/include/replication/logicallauncher.h |   2 +
 src/include/replication/logicalworker.h   |   9 +
 src/include/replication/slot.h|   5 +-
 src/include/replication/walreceiver.h |  20 +
 src/include/replication/worker_internal.h |   8 +-
 src/include/utils/wait_event.h|   1 +
 src/test/recovery/meson.build |   1 +
 src/test/recovery/t/037_slot_sync.pl  | 130 ++
 28 files changed, 1272 insertions(+), 94 deletions(-)
   3.8% doc/src/sgml/
   7.1% src/backend/replication/libpqwalreceiver/
  54.7% src/backend/replication/logical/
  14.9% src/backend/replication/
   3.3% src/backend/
   4.0% src/include/replication/
  10.9% src/test/recovery/t/

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 091a79d4f3..1360885208 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4466,6 +4466,23 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" 
"%p"'  # Windows
   
  
 
+ 
+  standby_slot_names (string)
+  
+   standby_slot_names configuration 
parameter
+  
+  
+  
+   
+List of physical replication slots that logical replication waits for.
+If a logical replication connection is meant to switch to a physical
+standby after the standby is promoted, the physical replication slot
+for the standby should be listed here.  This ensures that logical
+replication is not ahead of the physical standby.
+   
+  
+ 
+
  
 
 
@@ -4649,6 +4666,23 @@ ANY num_sync ( 
+  synchronize_slot_names (string)
+  
+   synchronize_slot_names configuration 
parameter
+  
+  
+  
+   
+Specifies a list of logical replication slots that a physical standby
+should synchronize from the primary server.  This is necessary to be
+able to retarget those logical replication connections to this standby
+if it gets promoted.  Specify * to synchronize all
+logical replication slots.  The default is empty.
+   
+  
+ 
+
  
 
 
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index 3251d89ba8..8721706b79 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -991,7 +991,7 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data,
 
RemoveSubscriptionRel(sub->oid, relid);
 
-   logicalrep_worker_stop(sub->oid, relid);
+   logicalrep_worker_stop(MyDatabaseId, sub->oid, 
relid);
 
/*
 * For READY state, we would have already 
dropped the
@@ 

Re: longfin missing gssapi_ext.h

2023-04-17 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Done that way.
> 
> Looks like you neglected to update the configure script proper?

Pah, indeed.  Will fix.  Sorry about that.

Thanks,

Stephen




Re: longfin missing gssapi_ext.h

2023-04-17 Thread Tom Lane
Stephen Frost  writes:
> Done that way.

Looks like you neglected to update the configure script proper?

regards, tom lane




Re: longfin missing gssapi_ext.h

2023-04-17 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > How about the attached which just switches from testing for
> > gss_init_sec_context to testing for gss_store_cred_into?
> 
> WFM.

Done that way.

Thanks!

Stephen




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-04-17 Thread Masahiko Sawada
On Fri, Apr 7, 2023 at 6:55 PM John Naylor  wrote:
>
> On Thu, Feb 16, 2023 at 11:44 PM Andres Freund  wrote:
> >
> > We really ought to replace the tid bitmap used for bitmap heap scans. The
> > hashtable we use is a pretty awful data structure for it. And that's not
> > filled in-order, for example.
>
> I spent some time studying tidbitmap.c, and not only does it make sense to 
> use a radix tree there, but since it has more complex behavior and stricter 
> runtime requirements, it should really be the thing driving the design and 
> tradeoffs, not vacuum:
>
> - With lazy expansion and single-value leaves, the root of a radix tree can 
> point to a single leaf. That might get rid of the need to track TBMStatus, 
> since setting a single-leaf tree should be cheap.
>

Instead of introducing single-value leaves to the radix tree as
another structure, can we store pointers to PagetableEntry as values?

> - Fixed-size PagetableEntry's are pretty large, but the tid compression 
> scheme used in this thread (in addition to being complex) is not a great fit 
> for tidbitmap because it makes it more difficult to track per-block metadata 
> (see also next point). With the "combined pointer-value slots" technique, if 
> a page's max tid offset is 63 or less, the offsets can be stored directly in 
> the pointer for the exact case. The lowest bit can tag to indicate a pointer 
> to a single-value leaf. That would complicate operations like 
> union/intersection and tracking "needs recheck", but it would reduce memory 
> use and node-traversal in common cases.
>
> - Managing lossy storage. With pure blocknumber keys, replacing exact storage 
> for a range of 256 pages amounts to replacing a last-level node with a single 
> leaf containing one lossy PagetableEntry. The leader could iterate over the 
> nodes, and rank the last-level nodes by how much storage they (possibly with 
> leaf children) are using, and come up with an optimal lossy-conversion plan.
>
> The above would address the points (not including better iteration and 
> parallel bitmap index scans) raised in
>
> https://www.postgresql.org/message-id/capsanrn5ywsows8ghqwbwajx1selxlntv54biq0z-j_e86f...@mail.gmail.com
>
> Ironically, by targeting a more difficult use case, it's easier since there 
> is less freedom. There are many ways to beat a binary search, but fewer good 
> ways to improve bitmap heap scan. I'd like to put aside vacuum for some time 
> and try killing two birds with one stone, building upon our work thus far.
>
> Note: I've moved the CF entry to the next CF, and set to waiting on author 
> for now. Since no action is currently required from Masahiko, I've added 
> myself as author as well. If tackling bitmap heap scan shows promise, we 
> could RWF and resurrect at a later time.

Thanks. I'm going to continue researching the memory limitation and
try lazy path expansion until PG17 development begins.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: longfin missing gssapi_ext.h

2023-04-17 Thread Tom Lane
Stephen Frost  writes:
> How about the attached which just switches from testing for
> gss_init_sec_context to testing for gss_store_cred_into?

WFM.

regards, tom lane




Re: Incremental sort for access method with ordered scan support (amcanorderbyop)

2023-04-17 Thread Richard Guo
On Sun, Apr 16, 2023 at 1:20 AM Miroslav Bendik 
wrote:

> Postgres allows incremental sort only for ordered indexes. Function
> build_index_paths dont build partial order paths for access methods
> with order support. My patch adds support for incremental ordering
> with access method.


I think this is a meaningful optimization.  I reviewed the patch and
here are the comments from me.

* I understand the new param 'match_pathkeys_length_p' is used to tell
how many presorted keys are useful.  I think list_length(orderbyclauses)
will do the same.  So there is no need to add the new param, thus we can
reduce the code diffs.

* Now that match_pathkeys_to_index() returns a prefix of the pathkeys
rather than returns NIL immediately when there is a failure to match, it
seems the two local variables 'orderby_clauses' and 'clause_columns' are
not necessary any more.  I think we can instead lappend the matched
'expr' and 'indexcol' to '*orderby_clauses_p' and '*clause_columns_p'
directly.  In this way we can still call 'return' when we come to a
failure to match.

* In build_index_paths(), I think the diff can be reduced to

-if (orderbyclauses)
-useful_pathkeys = root->query_pathkeys;
-else
-useful_pathkeys = NIL;
+useful_pathkeys = list_truncate(list_copy(root->query_pathkeys),
+list_length(orderbyclauses));

* Several comments in match_pathkeys_to_index() are out of date.  We
need to revise them to cope with the change.

* I think it's better to provide a test case.

Thanks
Richard


Re: Move defaults toward ICU in 16?

2023-04-17 Thread Justin Pryzby
On Wed, Apr 05, 2023 at 09:33:25AM +0200, Peter Eisentraut wrote:
> On 16.03.23 14:52, Peter Eisentraut wrote:
> > On 09.03.23 20:14, Jeff Davis wrote:
> > > > Let's come back to that after dealing with the other two.
> > > 
> > > Leaving 0001 open for now.
> > 
> > I suspect making a change like this now would result in a bloodbath on
> > the build farm that we could do without.  I suggest revisiting this
> > after the commit fest ends.
> 
> I don't object to this patch.  I suggest waiting until next week to commit
> it and then see what happens.  It's easy to revert if it goes terribly.

Is this still being considered for v16 ?

-- 
Justin




Re: longfin missing gssapi_ext.h

2023-04-17 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Pushed, thanks again to everyone.
> > I'll monitor the buildfarm and assuming there isn't anything unexpected
> > then I'll mark the open item as resolved now.
> 
> The Debian 7 (Wheezy) members of the buildfarm (lapwing, skate, snapper)
> are all getting past the gssapi_ext.h check you added and then failing
> like this:
> 
> ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith 
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels 
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv 
> -fexcess-precision=standard -g -O2 -Werror -I../../../src/include  
> -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS -D_GNU_SOURCE 
> -I/usr/include/libxml2  -I/usr/include/et  -c -o be-gssapi-common.o 
> be-gssapi-common.c
> be-gssapi-common.c: In function 'pg_store_delegated_credential':
> be-gssapi-common.c:110:2: error: unknown type name 
> 'gss_key_value_element_desc'
> be-gssapi-common.c:111:2: error: unknown type name 'gss_key_value_set_desc'
> be-gssapi-common.c:113:4: error: request for member 'key' in something not a 
> structure or union
> be-gssapi-common.c:114:4: error: request for member 'value' in something not 
> a structure or union
> be-gssapi-common.c:115:7: error: request for member 'count' in something not 
> a structure or union
> be-gssapi-common.c:116:7: error: request for member 'elements' in something 
> not a structure or union
> be-gssapi-common.c:119:2: error: implicit declaration of function 
> 'gss_store_cred_into' [-Werror=implicit-function-declaration]
> 
> Debian 7 has been EOL five years or so, so I don't mind saying "get a
> newer OS or disable gssapi".  However, is it worth adding another
> configure check to fail a little faster with whatever Kerberos
> version this is?  Checking that gss_store_cred_into() exists
> seems like the most obvious one of these things to test for.

Sure, I can certainly do that and agreed that it makes sense to check
for gss_store_cred_into().

How about the attached which just switches from testing for
gss_init_sec_context to testing for gss_store_cred_into?

Thanks!

Stephen
diff --git a/configure.ac b/configure.ac
index c53a9c788e..1362f57a27 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1340,8 +1340,8 @@ fi
 
 if test "$with_gssapi" = yes ; then
   if test "$PORTNAME" != "win32"; then
-AC_SEARCH_LIBS(gss_init_sec_context, [gssapi_krb5 gss 'gssapi -lkrb5 -lcrypto'], [],
-   [AC_MSG_ERROR([could not find function 'gss_init_sec_context' required for GSSAPI])])
+AC_SEARCH_LIBS(gss_store_cred_into, [gssapi_krb5 gss 'gssapi -lkrb5 -lcrypto'], [],
+   [AC_MSG_ERROR([could not find function 'gss_store_cred_into' required for GSSAPI])])
   else
 LIBS="$LIBS -lgssapi32"
   fi
diff --git a/meson.build b/meson.build
index 3405cc07ee..f1db5455b0 100644
--- a/meson.build
+++ b/meson.build
@@ -634,14 +634,14 @@ if not gssapiopt.disabled()
   endif
 
   if not have_gssapi
-  elif cc.has_function('gss_init_sec_context', dependencies: gssapi,
+  elif cc.has_function('gss_store_cred_into', dependencies: gssapi,
   args: test_c_args, include_directories: postgres_inc)
 cdata.set('ENABLE_GSS', 1)
 
 krb_srvtab = 'FILE:/@0@/krb5.keytab)'.format(get_option('sysconfdir'))
 cdata.set_quoted('PG_KRB_SRVTAB', krb_srvtab)
   elif gssapiopt.enabled()
-error('''could not find function 'gss_init_sec_context' required for GSSAPI''')
+error('''could not find function 'gss_store_cred_into' required for GSSAPI''')
   else
 have_gssapi = false
   endif




Re: Support logical replication of DDLs

2023-04-17 Thread Peter Smith
Here are some more review comments for the patch 0002-2023_04_07-2

Note: This is a WIP review (part 2); the comments in this post are
only for the commit message and the PG docs

==
Commit message

1.
It's not obvious that those "-" (like "- For DROP object:") represent
major sections of this commit message. For example, at first, I could
not tell that the "We do this way because..." referred to the previous
section; Also it was not clear "TO IMPROVE:" is just an improvement
for the last section, not the entire patch.

In short, I think those main headings should be more prominent so the
commit message is clearly split into these sections.

e.g.
OVERVIEW


For non-rewrite ALTER object command and CREATE object command:
---

For DROP object:


For table_rewrite ALTER TABLE command:
-=

~~~

2.
To support DDL replication, it use event trigger and DDL deparsing
facilities. During CREATE PUBLICATION we register a command end trigger that
deparses the DDL (if the DDL is annotated as ddlreplok for DDL replication in
cmdtaglist.h) as a JSON blob, and WAL logs it. The event trigger is
automatically
removed at the time of DROP PUBLICATION. The WALSender decodes the WAL and sends
it downstream similar to other DML commands. The subscriber then converts JSON
back to the DDL command string and executes it. In the subscriber, we also add
the newly added rel to pg_subscription_rel so that the DML changes on the new
table can be replicated without having to manually run
"ALTER SUBSCRIPTION ... REFRESH PUBLICATION".

~

2a.
"it use event trigger" --> "we use the event trigger"

~

2b.
Maybe put 'command start' in quotes as you did later for 'command end'
in this commit message

~~~

3.
- For non-rewrite ALTER object command and
- CREATE object command:
we deparse the command at ddl_command_end event trigger and WAL log the
deparsed json string. The WALSender decodes the WAL and sends it to
subscriber if the created/altered table is published. It supports most of
ALTER TABLE command except some commands(DDL related to PARTITIONED TABLE
...) that introduced recently which haven't been supported by the current
ddl_deparser, we will support that later.

~

3a.
"we deparse" --> "We deparse"

~

3b.
"that introduced recently which haven't been" --> "that are recently
introduced but are not yet"

~

3c.
Is this information about unsupported ddl_parser stuff still accurate
or is patch 0001 already taking care of this?

~~~

4.
The 'command start' event handler logs a ddl message with the relids of
the tables that are dropped which the output plugin (pgoutput) stores in
its internal data structure after verifying that it is for a table that is
part of the publication. Later the 'command end' event handler sends the
actual drop message. Pgoutput on receiving the command end, only sends out
the drop command only if it is for one of the relids marked for deleting.

~

BEFORE
Pgoutput on receiving the command end, only sends out the drop command
only if it is for one of the relids marked for deleting.

SUGGESTION
On receiving the 'command end', pgoutput sends the DROP command only
if it is for one of the relids marked for deletion.

~~~

5.
The reason we have to do this is because, once the logical decoder
receives the 'command end' message,  the relid of the table is no longer
valid as it has been deleted as part of invalidations received for the
drop table command. It is no longer possible to verify if the table is
part of the publication list or not. To make this possible, I have added
two more elements to the ddl xlog and ddl message, (relid and cmdtype).


~

"I have added two more elements to..." ==> "two more elements are added to..."

~~~

6.
We could have also handled all this on the subscriber side as well, but
that would mean sending spurious ddl messages for tables that are not part
of the publication.

~

"as well" <-- not needed.

~~~

7.
- For table_rewrite ALTER TABLE command:
(ALTER COLUMN TYPE, ADD COLUMN DEFAULT, SET LOGGED, SET ACCESS METHOD)

we deparse the command and WAL log the deparsed json string at
table_rewrite event trigger. The WALSender decodes the WAL and sends it to
subscriber if the altered table is published. Then, the WALSender will
convert the upcoming rewrite INSERTs to UPDATEs and send them to
subscriber so that the data between publisher and subscriber can always be
consistent. Note that the tables that publish rewrite ddl must have a
replica identity configured in order to be able to replicate the upcoming
rewrite UPDATEs.

~

7.
"we deparse" --> "We deparse"

~~~

8.
(1) The data before the rewrite ddl could already be different among
publisher and subscriber. To make sure the extra data in subscriber which
doesn't exist in publisher also get rewritten, we need to let the
subscriber execute the original rewrite ddl to rewrite all the data at
first.

(2) the data 

RE: Support logical replication of DDLs

2023-04-17 Thread Zhijie Hou (Fujitsu)
On Wednesday, April 12, 2023 7:24 PM Amit Kapila  
wrote:
> 
> On Fri, Apr 7, 2023 at 8:52 AM houzj.f...@fujitsu.com
>  wrote:
> >
> 
> Few comments on 0001

Thanks for the comments.

> ===
> 1.
> + ConstrObjDomain,
> + ConstrObjForeignTable
> +} ConstraintObjType;
>
> These both object types don't seem to be supported by the first patch.
> So, I don't see why these should be part of it.
>

done, removed.

> 2.
> +append_string_object(ObjTree *tree, char *sub_fmt, char * 
> +object_name,
>
> Extra space before object_name.

done

>
> 3. Is there a reason to keep format_type_detailed() in ddl_deparse.c 
> instead of defining it in format_type.c where other format functions 
> reside? Earlier, we were doing this deparsing as an extension, so it 
> makes sense to define it locally but not sure if that is required now.
>

done, moved to format_type.c.

> 4.
> format_type_detailed()
> {
> ...
> + /*
> + * Check if it's a regular (variable length) array type.  As above,
> + * fixed-length array types such as "name" shouldn't get deconstructed.
> + */
> + array_base_type = typeform->typelem;
>
> This comment gives incomplete information. I think it is better to
> say: "We switch our attention to the array element type for certain 
> cases. See format_type_extended(). Then we can remove a similar 
> comment later in the function.
>

Improved the comment here.

> 5.
> +
> + switch (type_oid)
> + {
> + case INTERVALOID:
> + *typename = pstrdup("INTERVAL");
> + break;
> + case TIMESTAMPTZOID:
> + if (typemod < 0)
> + *typename = pstrdup("TIMESTAMP WITH TIME ZONE"); else
> + /* otherwise, WITH TZ is added by typmod. */ *typename = 
> + pstrdup("TIMESTAMP"); break; case TIMESTAMPOID:
> + *typename = pstrdup("TIMESTAMP");
> + break;
>
> In this switch case, use the type oid cases in the order of their value.
>

done

> 6.
> +static inline char *
> +get_type_storage(char storagetype)
>
> We already have a function with the name storage_name() which does 
> exactly what this function is doing. Shall we expose that and use it?
>

done

> 7.
> +static ObjTree *
> +new_objtree(char *fmt)
> +{
> + ObjTree*params;
> +
> + params = palloc0(sizeof(ObjTree));
>
> Here, the variable name params appear a bit odd. Shall we change it to 
> objtree or obj?
>

done

===

> Some more comments on 0001
> ==
>
> 1.
> +/*
> + * Subroutine for CREATE TABLE/CREATE DOMAIN deparsing.
> + *
> + * Given a table OID or domain OID, obtain its constraints and append 
> +them to
> + * the given elements list.  The updated list is returned.
> + *
> + * This works for typed tables, regular tables, and domains.
> + *
> + * Note that CONSTRAINT_FOREIGN constraints are always ignored.
> + */
> +static List *
> +obtainConstraints(List *elements, Oid relationId, Oid domainId,
> +   ConstraintObjType objType)
>
> Why do we need to support DOMAIN in this patch? Isn't this only for tables?

Moved to later patch.

>
> 2.
> obtainConstraints()
> {
> ..
> + switch (constrForm->contype)
> + {
> + case CONSTRAINT_CHECK:
> + contype = "check";
> + break;
> + case CONSTRAINT_FOREIGN:
> + continue; /* not here */
> + case CONSTRAINT_PRIMARY:
> + contype = "primary key";
> + break;
> + case CONSTRAINT_UNIQUE:
> + contype = "unique";
> + break;
> + case CONSTRAINT_TRIGGER:
> + contype = "trigger";
> + break;
> + case CONSTRAINT_EXCLUSION:
> + contype = "exclusion";
> + break;
> + default:
> + elog(ERROR, "unrecognized constraint type");
>
> It looks a bit odd that except CONSTRAINT_NOTNULL all other 
> constraints are handled here. I think the reason is callers themselves 
> deal with not null constraints, if so, we can probably add a comment.
>

Since the CONSTRAINT_NOTNULL(9ce04b5) has been removed, I didn't add comments 
here.

> 3.
> obtainConstraints()
> {
> ...
> + if (constrForm->conindid &&
> + (constrForm->contype == CONSTRAINT_PRIMARY ||
> + constrForm->contype == CONSTRAINT_UNIQUE ||
> + constrForm->contype == CONSTRAINT_EXCLUSION))
> + {
> + Oid   tblspc = get_rel_tablespace(constrForm->conindid);
> +
> + if (OidIsValid(tblspc))
> + append_string_object(constr,
> + "USING INDEX TABLESPACE %{tblspc}s", "tblspc", 
> + get_tablespace_name(tblspc));
> ...
> }
>
> How is it guaranteed that we can get tablespace_name after getting id?
> If there is something that prevents tablespace from being removed 
> between these two calls then it could be better to write a comment for 
> the same.
>

Done, changed code to check if valid tablespace_name is received as 
it may be concurrently dropped.


> 4. It seems RelationGetColumnDefault() assumed that the passed 
> attribute always had a default because it didn't verify the return 
> value of build_column_default(). Now, all but one of its callers in
> deparse_ColumnDef() check that attribute has a default value before 
> calling this function. I think either we change that caller or have an 
> error handling in RelationGetColumnDefault(). It might 

Re: Support logical replication of DDLs

2023-04-17 Thread vignesh C
On Sat, 15 Apr 2023 at 06:38, vignesh C  wrote:
>
> On Fri, 14 Apr 2023 at 13:06, vignesh C  wrote:
> >
> > Few comments:

Few more comments:
1) since missing_ok is passed as false, if there is an error the error
will be handled in find_string_in_jsonbcontainer, "missing operator
name" handling can be removed from here:
+/*
+ * Expand a JSON value as an operator name. The value may contain element
+ * "schemaname" (optional).
+ */
+static void
+expand_jsonval_operator(StringInfo buf, JsonbValue *jsonval)
+{
+   char   *str;
+   JsonbContainer *data = jsonval->val.binary.data;
+
+   str = find_string_in_jsonbcontainer(data, "schemaname", true, NULL);
+   /* Schema might be NULL or empty */
+   if (str != NULL && str[0] != '\0')
+   {
+   appendStringInfo(buf, "%s.", quote_identifier(str));
+   pfree(str);
+   }
+
+   str = find_string_in_jsonbcontainer(data, "objname", false, NULL);
+   if (!str)
+   ereport(ERROR,
+   errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+   errmsg("missing operator name"));
+

2) This should be present at the beginning of the file before the functions:
+#define ADVANCE_PARSE_POINTER(ptr,end_ptr) \
+   do { \
+   if (++(ptr) >= (end_ptr)) \
+   ereport(ERROR, \
+
errcode(ERRCODE_INVALID_PARAMETER_VALUE), \
+   errmsg("unterminated format
specifier")); \
+   } while (0)
+

3) Should we add this to the documentation, we have documented other
event triggers like ddl_command_start, ddl_command_end, table_rewrite
and sql_drop at [1]:
 +   runlist = EventTriggerCommonSetup(command->parsetree,
+
   EVT_TableInitWrite,
+
   "table_init_write",
+
   );

4) The inclusion of stringinfo.h is not required, I was able to
compile the code without it:
+ *   src/backend/commands/ddl_json.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "lib/stringinfo.h"
+#include "tcop/ddl_deparse.h"
+#include "utils/builtins.h"
+#include "utils/jsonb.h"

5) schema and typmodstr might not be allocated, should we still call pfree?
+   schema = find_string_in_jsonbcontainer(data, "schemaname", true, NULL);
+   typename = find_string_in_jsonbcontainer(data, "typename", false, NULL);
+   typmodstr = find_string_in_jsonbcontainer(data, "typmod", true, NULL);
+   is_array = find_bool_in_jsonbcontainer(data, "typarray");
+   switch (is_array)
+   {
+   case tv_true:
+   array_decor = "[]";
+   break;
+
+   case tv_false:
+   array_decor = "";
+   break;
+
+   case tv_absent:
+   default:
+   ereport(ERROR,
+
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+   errmsg("missing typarray element"));
+   }
+
+   if (schema == NULL)
+   appendStringInfo(buf, "%s", quote_identifier(typename));
+   else if (schema[0] == '\0')
+   appendStringInfo(buf, "%s", typename);  /* Special
typmod needs */
+   else
+   appendStringInfo(buf, "%s.%s", quote_identifier(schema),
+quote_identifier(typename));
+
+   appendStringInfo(buf, "%s%s", typmodstr ? typmodstr : "", array_decor);
+   pfree(schema);
+   pfree(typename);
+   pfree(typmodstr);

6) SHould the following from ddl_deparse_expand_command function
header be moved to expand_one_jsonb_element function header, as the
specified are being handled in expand_one_jsonb_element.
* % expand to a literal %
 * I expand as a single, non-qualified identifier
 * D expand as a possibly-qualified identifier
 * T expand as a type name
 * O expand as an operator name
 * L expand as a string literal (quote using single quotes)
 * s expand as a simple string (no quoting)
 * n expand as a simple number (no quoting)
 * R expand as a role name (possibly quoted name, or PUBLIC)

 7) In ddl_deparse.c we have used elog(ERROR, ...) for error handling
and in ddl_json.c we have used ereport(ERROR, ...) for error handling,
Is this required for any special handling?

8) Is this required as part of create table implementation:
8.a)
+/*
+ * EventTriggerAlterTypeStart
+ * Save data about a single part of an ALTER TYPE.
+ *
+ * ALTER TABLE can have multiple subcommands which might include DROP COLUMN
+ * command and ALTER TYPE referring the drop column in USING expression.
+ * As the dropped column cannot be accessed after the execution of DROP COLUMN,
+ * a special trigger is required to handle this case before the drop column is
+ * executed.
+ */
+void
+EventTriggerAlterTypeStart(AlterTableCmd *subcmd, Relation rel)
+{

8.b)
+/*
+ * EventTriggerAlterTypeEnd
+ * Finish up saving an ALTER TYPE 

Re: Add two missing tests in 035_standby_logical_decoding.pl

2023-04-17 Thread Drouvot, Bertrand

Hi,

On 4/17/23 11:55 AM, Alvaro Herrera wrote:

On 2023-Apr-12, Drouvot, Bertrand wrote:


I'm not sure if adding those 2 tests should be considered as an open
item. I can add this open item if we think that makes sense. I'd be
happy to do so but it looks like I don't have the privileges to edit
https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items


I think adding extra tests for new code can definitely be considered an
open item, since those tests might help to discover issues in said new
code.



Thanks for the feedback! Added as an open item.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Add two missing tests in 035_standby_logical_decoding.pl

2023-04-17 Thread Alvaro Herrera
On 2023-Apr-12, Drouvot, Bertrand wrote:

> I'm not sure if adding those 2 tests should be considered as an open
> item. I can add this open item if we think that makes sense. I'd be
> happy to do so but it looks like I don't have the privileges to edit
> https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items

I think adding extra tests for new code can definitely be considered an
open item, since those tests might help to discover issues in said new
code.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: [PATCH] Add support for postgres:// URIs to PGDATABASE environment variable

2023-04-17 Thread Rémi Lapeyre


> Le 17 avr. 2023 à 03:25, Tom Lane  a écrit :
> 
> You can do this:
> 
> $ psql -d "postgres://localhost/test"
> 
> but that's not the same thing as reinterpreting the dbname field
> of what we have already determined to be a connection string.
> 

Yes, I now see the difference. I got confused by the way the code reuses the 
dbname keyword to pass the connection string and the fact that 

$ psql --dbname "postgres://localhost/test"

works, but the dbname parameter of psql is not the same as the dbname used by 
libpq.


> Perhaps there is a case for inventing a new environment variable
> that can do what you're suggesting.  But you would have to make
> a case that it's worth doing, and also define how it interacts
> with all the other PGxxx environment variables.

I think it could be convenient to have such an environment variable but given 
your feedback I will just make it specific to my application rather than a part 
of libpq.

Best,
Rémi

Re: eclg -C ORACLE breaks data

2023-04-17 Thread Kyotaro Horiguchi
(sorry for the wrong subject..)

At Mon, 17 Apr 2023 17:00:59 +0900, Michael Paquier  wrote 
in 
> On Mon, Apr 10, 2023 at 05:35:00PM +0900, Kyotaro Horiguchi wrote:
> > This results in overwriting str[-1], the last byte of the preceding
> > numeric in this case, with 0x00, representing the digit '0'. When
> > callers of ecpg_get_data() explicitly pass zero as varcharsize, they
> > provide storage that precisely fits the data.
> 
> Good find, that's clearly wrong.  The test case is interesting.  On
> HEAD, the processing of the second field eats up the data of the first
> field.
>
> > However, it remains
> > uncertain if this assumption is valid when ecpg_store_result() passes
> > var->varcharsize which is also zero. Consequently, the current fix
> > presumes exact-fit storage when varcharsize is zero.
> 
> Based on what I can read in sqlda.c (ecpg_set_compat_sqlda() and
> ecpg_set_native_sqlda()), the data length calculated adds an extra
> byte to the data length when storing the data references in sqldata.
> execute.c and ecpg_store_result() is actually much trickier than that
> (see particularly the part where the code does an "allocate memory for
> NULL pointers", where varcharsize could also be 0), still I agree that
> this assumption should be OK.  The code is as it is for many years,
> with its logic to do an estimation of allocation first, and then read
> the data at once in the whole area allocated..
> 
> This thinko has been introduced by 3b7ab43, so this needs to go down
> to v11.  I'll see to that.

Thanks for picking this up.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Issue in postgres_fdw causing unnecessary wait for cancel request reply

2023-04-17 Thread Kyotaro Horiguchi
At Mon, 17 Apr 2023 15:21:02 +0900, Etsuro Fujita  
wrote in 
> > >> Although the primary message is the same, the supplemental message
> > >> provides additional context that can help distinguish which function
> > >> is reporting the message.
> > >
> > > If the user is familiar with the PQgetCancel/PQcancel internals, this
> > > is true, but if not, I do not think this is always true.  Consider
> > > this error message, for example:
> > >
> > > 2023-04-14 17:48:55.862 JST [24344] WARNING:  could not send cancel
> > > request: invalid integer value "999" for connection option
> > > "keepalives"
> > >
> > > It would be hard for users without the knowledge about those internals
> > > to distinguish that from this message.  For average users, I think it
> > > would be good to use a more distinguishable error message.
> >
> > In this case, I believe that they should be able to understand that an
> > invalid integer value "999" was specified in the "keepalives" connection
> > option, which caused the warning message. Then, they would need to check
> > the setting of the "keepalives" option and correct it if necessary.
> 
> Maybe my explanation was not clear.  Let me explain.  Assume that a
> user wants to identify the place where the above error was thrown.
> Using grep with "could not send cancel request", the user can find the
> two places emitting the message in pgfdw_cancel_query_begin: one for
> PQgetCancel and one for PQcancel.  If the user is familiar with the
> PQgetCancel/PQcancel internals, the user can determine, from the
> supplemental message, that the error was thrown by the former.  But if
> not, the user cannot do so.  To support the unfamiliar user as well, I
> think it would be a good idea to use a more appropriate message for
> PQgetCancel that is different from "could not send cancel request".
> 
> (I agree that most users would not care about the places where errors
> were thrown, but I think some users would, and actually, I do when
> investigating unfamiliar errors.)

If PQgetCancel() fails due to invalid keepalive-related values, it
seems like a bug that needs fixing, regardless of whether we display
an error message when PQgetCancel() fails.  The only error case of
PQgetCancel() that could occur in pgfdw_cancel_query_begin() is a
malloc() failure, which currently does not set an error message (I'm
not sure we can do that in that case, though..).

In my opinion, PQconnectPoll and PQgetCancel should use the same
parsing function, or PQconnectPoll should set parsed values, making
it unnecessary for PQgetCancel to parse the same parameters
again. Additionally, PQgetCancel should set appropriate error messages
for all failure modes.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Support logical replication of DDLs

2023-04-17 Thread Peter Smith
Hi, here are some review comments for the patch 0002-2023_04_07-2

Note: This is a WIP review. The patch is quite large and I have
managed to only look at ~50% of it. I will continue reviewing this
same 0002-2023_04_07-2 and send more comments at a later time.
Meanwhile, here are the review comments I have so far...

==
General

1. Field/Code order

I guess it makes little difference, but it was a bit disconcerting that
the new DDL field member pops up in a different order everywhere.

e.g. In pg_publication.h, FormData_pg_publication comes last
e.g. In describe.c: it comes immediately after the "All Tables" column
e.g. In pg_publication.c, GetPublication: it comes after truncated and
before viaroot.

IMO it is better to try to keep the same consistent order everywhere
unless there is some reason not to.

~~~

2. Inconsistent acronym case

Use consistent uppercase for JSON and DDL instead of sometimes json
and ddl. There are quite a few random examples in the commit message
but might be worth searching the entire patch to make all comments use
consistent case.

==
src/backend/replication/logical/proto.c

3. logicalrep_read_ddl

+/*
+ * Read DDL MESSAGE from stream
+ */
+char *
+logicalrep_read_ddl(StringInfo in, XLogRecPtr *lsn,
+const char **prefix,
+Size *sz)

Should this just say "Read DDL from stream"?

(It matches the function name better, and none of the other Read XXX
say Read XXX MESSAGE)

Alternatively, maybe that comment is correct, but in that case,
perhaps change the function name --> logicalrep_read_ddl_message().



4. logicalrep_write_ddl

+/*
+ * Write DDL MESSAGE to stream
+ */
+void
+logicalrep_write_ddl(StringInfo out, XLogRecPtr lsn,
+ const char *prefix, Size sz, const char *message)

Ditto previous review comment #3

==
src/backend/tcop/cmdtag.c

5. GetCommandTagsForDDLRepl

+CommandTag *
+GetCommandTagsForDDLRepl(int *ncommands)
+{
+ CommandTag *ddlrepl_commands = palloc0(COMMAND_TAG_NEXTTAG *
sizeof(CommandTag));
+ *ncommands = 0;
+
+ for(int i = 0; i < COMMAND_TAG_NEXTTAG; i++)
+ {
+ if (tag_behavior[i].ddl_replication_ok)
+ ddlrepl_commands[(*ncommands)++] = (CommandTag) i;
+ }
+
+ return ddlrepl_commands;
+}

5a.
I felt that maybe it would be better to iterate using CommandTag enums
instead of int indexes.

~

5b.
I saw there is another code fragment in GetCommandTagEnum() that uses
lengthof(tag_behavior) instead of checking COMMAND_TAG_NEXTTAG.

It might be more consistent to use similar code here too. Something like:

const int ntags = lengthof(tag_behavior) - 1;
CommandTag *ddlrepl_commands = palloc0(ntags * sizeof(CommandTag));
*ncommands = 0;

for (CommandTag tag = 0; tag < ntags; tag++)
    if (tag_behavior[tag].ddl_replication_ok)
        ddlrepl_commands[(*ncommands)++] = tag;

==
src/bin/pg_dump/pg_dump.c

6.
@@ -4213,7 +4224,10 @@ dumpPublication(Archive *fout, const
PublicationInfo *pubinfo)
  first = false;
  }

- appendPQExpBufferChar(query, '\'');
+ appendPQExpBufferStr(query, "'");
+
+ if (pubinfo->pubddl_table)
+ appendPQExpBufferStr(query, ", ddl = 'table'");

The change from appendPQExpBufferChar to appendPQExpBufferStr did not
seem a necessary part of this patch.

~~~

7. getPublicationEventTriggers

+/*
+ * getPublicationEventTriggers
+ *   get the publication event triggers that should be skipped
+ */
+static void
+getPublicationEventTriggers(Archive *fout, SimpleStringList *skipTriggers)

Given the way this function is invoked, it might be more appropriate
to name it like getEventTriggersToBeSkipped(), with the comment saying
that for now we just we skip the PUBLICATION DDL event triggers.

~~~

8. getEventTriggers

  /* Decide whether we want to dump it */
- selectDumpableObject(&(evtinfo[i].dobj), fout);
+ if (simple_string_list_member(&skipTriggers, evtinfo[i].evtname))
+ evtinfo[i].dobj.dump= DUMP_COMPONENT_NONE;
+ else
+ selectDumpableObject(&(evtinfo[i].dobj), fout);
  }

+ simple_string_list_destroy(&skipTriggers);
+

8a.
Missing whitespace before '='

~

8b.
Scanning a list within a loop may not be efficient. I suppose this
code must be assuming that there are not 1000s of publications, and
therefore the skipTriggers string list will be short enough to ignore
such inefficiency concerns.

IMO a simpler alternative would be to throw away
getPublicationEventTriggers() and the list-scanning logic, and instead
simply skip any event triggers whose name starts with
PUB_EVENT_TRIG_PREFIX (e.g. the correct prefix, not the current code
one -- see the other review comment for pg_publication.h). Are there any
problems doing it that way?

==
src/bin/pg_dump/t/002_pg_dump.pl

9.
  create_sql   => 'CREATE PUBLICATION pub2
  FOR ALL TABLES
- WITH (publish = \'\');',
+ WITH (publish = \'\', ddl = \'\');',
  regexp => qr/^
  \QCREATE PUBLICATION pub2 FOR ALL TABLES WITH (publish = '');\E

9a.
I was not sure of the purpose of this test. Is it for showing that
ddl='' is equivalent to not having any ddl option?

~

9b.
Missing test cases for dumping other 

Re: eclg -C ORACLE breaks data

2023-04-17 Thread Michael Paquier
On Mon, Apr 10, 2023 at 05:35:00PM +0900, Kyotaro Horiguchi wrote:
> This results in overwriting str[-1], the last byte of the preceding
> numeric in this case, with 0x00, representing the digit '0'. When
> callers of ecpg_get_data() explicitly pass zero as varcharsize, they
> provide storage that precisely fits the data.

Good find, that's clearly wrong.  The test case is interesting.  On
HEAD, the processing of the second field eats up the data of the first
field.

> However, it remains
> uncertain if this assumption is valid when ecpg_store_result() passes
> var->varcharsize which is also zero. Consequently, the current fix
> presumes exact-fit storage when varcharsize is zero.

Based on what I can read in sqlda.c (ecpg_set_compat_sqlda() and
ecpg_set_native_sqlda()), the data length calculated adds an extra
byte to the data length when storing the data references in sqldata.
execute.c and ecpg_store_result() is actually much trickier than that
(see particularly the part where the code does an "allocate memory for
NULL pointers", where varcharsize could also be 0), still I agree that
this assumption should be OK.  The code is as it is for many years,
with its logic to do an estimation of allocation first, and then read
the data at once in the whole area allocated..

This thinko has been introduced by 3b7ab43, so this needs to go down
to v11.  I'll see to that.
--
Michael


signature.asc
Description: PGP signature


Re: regression coverage gaps for gist and hash indexes

2023-04-17 Thread Alexander Lakhin

Hi,

31.03.2023 17:00, Alexander Lakhin wrote:

31.03.2023 15:55, Tom Lane wrote:

See also the thread about bug #16329 [1]. Alexander promised to look
into improving the test coverage in this area, maybe he can keep an
eye on the WAL logic coverage too.


Yes, I'm going to analyze that area too. Maybe it'll take more time
(a week or two) if I encounter some bugs there (for now I observe anomalies
with gist__int_ops), but I will definitely try to improve the gist testing.


After 2+ weeks of researching I'd like to summarize my findings.
1) The checking query proposed in [1] could be improved by adding
the restriction "tgk.v = brute.v" to the condition:
WHERE tgk.k >> point(brute.min - 1, 0) AND tgk.k << point(brute.max + 1, 0)
Otherwise that query gives a false positive after
insert into test_gist_killtuples values(point(505, 0));

The similar improved condition could be placed in hash_index_killtuples.sql.

Yet another improvement for the checking query could be done with the
replacement:
min(k <-> point(0, 0)), max(k <-> point(0, 0)) ->
min(k <-> point(0, k[1])), max(k <-> point(0, k[1])) ...

It doesn't change the query plan dramatically, but the query becomes more
universal (it would work for points with any non-negative integer x).

2) I've checked clang`s scan-build notices related to gist as I planned [2],
namely:
Logic error  Branch condition evaluates to a garbage value  src/backend/access/gist/gistutil.c  gistCompressValues  606
Logic error  Dereference of null pointer  src/backend/access/gist/gist.c  gistFindCorrectParent  1099
Logic error  Dereference of null pointer  src/backend/access/gist/gist.c  gistdoinsert  671
Logic error  Dereference of null pointer  src/backend/access/gist/gist.c  gistfinishsplit  1339
Logic error  Dereference of null pointer  src/backend/access/gist/gist.c  gistplacetopage  340
Logic error  Dereference of null pointer  src/backend/access/gist/gistbuildbuffers.c  gistPushItupToNodeBuffer  366
Logic error  Result of operation is garbage or undefined  src/backend/access/gist/gistbuildbuffers.c  gistRelocateBuildBuffersOnSplit  677
Logic error  Result of operation is garbage or undefined  src/backend/access/gist/gistutil.c  gistchoose  463
Unused code  Dead assignment  src/backend/access/gist/gist.c  gistdoinsert  843

And found that all of them (except for the last one, which isn't worth
fixing, IMO) are false positives (I can present detailed explanations if
that could be of interest). So I see no grounds here to build new tests on.

3) To date I found other anomalies more or less related to gist:
fillfactor is ignored for sorted index build mode, which is effectively the 
default now [3]
amcheck verification for gist is not yet ready to use [4] (and the latest patch 
doesn't apply to the current HEAD)
bug #17888: Incorrect memory access in gist__int_ops for an input array with 
many elements [5]

4) I've constructed some tests, that provide full coverage for
gistFindCorrectParent(), reach for "very rare situation", and for
gistfixsplit(), but all these tests execute concurrent queries, so they
can't be implemented as simple regression tests. Moreover, I could not find
any explicit problems when reaching those places (I used the checking query
from [1] in absence of other means to check gist indexes), so I see no value
in developing (not to speak of committing) these tests for now. I'm going to
further explore the gist behavior in those dark corners, but it looks like
a long-term task, so I think it shouldn't delay the gist coverage improvement
already proposed.

5)
02.04.2023 20:50, Andres Freund wrote:

Looks like the test in [1] could be made a lot cheaper by changing 
effective_cache_size
for just that test:

The effective_cache_size is accounted only when buffering = auto, but in
that test we have buffering = on, so changing it wouldn't help there.

While looking at gist-related tests, I've noticed an incorrect comment
in index_including_gist.sql:
 * 1.1. test CREATE INDEX with buffered build

It's incorrect exactly because with the default effective_cache_size the
buffered build mode is not enabled for that index size (I've confirmed
this with the elog(LOG,..) placed inside gistInitBuffering()).

So I'd like to propose the patch attached, that:
a) demonstrates the bug #16329:
With 8e5eef50c reverted, I get:
**00:00:00:11.179 1587838** Valgrind detected 1 error(s) during execution of "CREATE INDEX tbl_gist_idx ON tbl_gist 
using gist (c4) INCLUDE (c1,c2,c3) WITH (buffering = on);"

b) makes the comment in index_including_gist.sql correct
c) increases a visible test coverage a little, in particular:
 Function 'gistBuffersReleaseBlock'
-Lines executed:66.67% of 9
+Lines executed:100.00% of 9
d) doesn't increase the test duration significantly:
without valgrind I see the difference 84 ms -> 93 ms, under valgrind: 13513 ms -> 
14511 ms

Thus, I think it's worth splitting the activity related to gist testing
improvement to 

Re: Allowing parallel-safe initplans

2023-04-17 Thread Richard Guo
On Mon, Apr 17, 2023 at 10:57 AM Richard Guo  wrote:

> The initPlan has been moved from the Result node to the Gather node.  As
> a result, when doing tuple projection for the Result node, we'd get a
> ParamExecData entry with NULL execPlan.  So the initPlan does not get
> chance to be executed.  And we'd get the output as the default value
> from the ParamExecData entry, which is zero as shown.
>
> So now I begin to wonder if this wrong result issue is possible to exist
> in other places where we move initPlans.  But I haven't tried hard to
> verify that.
>

I looked further into this issue and I believe other places are good.
The problem with this query is that the es/ecxt_param_exec_vals used to
store info about the initplan is not the same one as in the Result
node's expression context for projection, because we've forked a new
process for the parallel worker and then created and initialized a new
EState node, and allocated a new es_param_exec_vals array for the new
EState.  When doing projection for the Result node, the current code
just goes ahead and accesses the new es_param_exec_vals, thus fails to
retrieve the info about the initplan.  Hmm, I doubt this is sensible.

So now it seems that the breakage of regression tests is more severe
than being cosmetic.  I wonder if we need to update the comments to
indicate the potential wrong results issue if we move the initPlans to
the Gather node.

Thanks
Richard


Re: Issue in postgres_fdw causing unnecessary wait for cancel request reply

2023-04-17 Thread Etsuro Fujita
On Fri, Apr 14, 2023 at 11:28 PM Fujii Masao
 wrote:
> On 2023/04/14 18:59, Etsuro Fujita wrote:
> >> The primary message basically should avoid reference to implementation 
> >> details such as specific structure names like PGcancel, shouldn't it, as 
> >> per the error message style guide?
> >
> > I do not think that PGcancel is that specific, as it is described in
> > the user-facing documentation [1].  (In addition, the error message I
> > proposed was created by copying the existing error message "could not
> > create OpenSSL BIO structure" in contrib/sslinfo.c.)
>
> I think that mentioning PGcancel in the error message could be confusing for 
> average users who are just running a query on a foreign table and encounter 
> the error message after pressing Ctrl-C. They may not understand why the 
> PGcancel struct is referenced in the error message while accessing foreign 
> tables. It could be viewed as an internal detail that is not necessary for 
> the user to know.

OK, understood.  I do not think it is wrong to use "could not send
cancel request" for PQgetCancel as well, but I feel that it is not
perfect for PQgetCancel, because that function never sends a cancel
request; it just initializes the request.  So how about
"could not initialize cancel request" instead?

> >> Although the primary message is the same, the supplemental message 
> >> provides additional context that can help distinguish which function is 
> >> reporting the message.
> >
> > If the user is familiar with the PQgetCancel/PQcancel internals, this
> > is true, but if not, I do not think this is always true.  Consider
> > this error message, for example:
> >
> > 2023-04-14 17:48:55.862 JST [24344] WARNING:  could not send cancel
> > request: invalid integer value "999" for connection option
> > "keepalives"
> >
> > It would be hard for users without the knowledge about those internals
> > to distinguish that from this message.  For average users, I think it
> > would be good to use a more distinguishable error message.
>
> In this case, I believe that they should be able to understand that an 
> invalid integer value "999" was specified in the "keepalives" 
> connection option, which caused the warning message. Then, they would need to 
> check the setting of the "keepalives" option and correct it if necessary.

Maybe my explanation was not clear.  Let me explain.  Assume that a
user wants to identify the place where the above error was thrown.
Using grep with "could not send cancel request", the user can find the
two places emitting the message in pgfdw_cancel_query_begin: one for
PQgetCancel and one for PQcancel.  If the user is familiar with the
PQgetCancel/PQcancel internals, the user can determine, from the
supplemental message, that the error was thrown by the former.  But if
not, the user cannot do so.  To support the unfamiliar user as well, I
think it would be a good idea to use a more appropriate message for
PQgetCancel that is different from "could not send cancel request".

(I agree that most users would not care about the places where errors
were thrown, but I think some users would, and actually, I do when
investigating unfamiliar errors.)

Best regards,
Etsuro Fujita