date:20230515

On Mon, May 15, 2023 at 05:54:53PM -0700, Andres Freund wrote:
> Yes. numactl --physcpubind ... in my case.  Linux has an optimization where it
> does not need to send an IPI when the client and server are scheduled on the
> same core. For single threaded ping-pong tasks like pgbench -c1, that can make
> a huge difference, particularly on larger CPUs.  So you get a lot better
> performance when forcing things to be colocated.

Yes, that's not bringing the numbers higher with the simple cases I
reported previously, either.

Anyway, even if I cannot see such a high difference, I don't see how
to bring back the original numbers you are reporting without doing
more inlining and tying COERCE_SQL_SYNTAX more tightly within the
executor's portions for the FuncExprs, and there are the collation
assumptions as well.  Perhaps that's not the correct thing to do with
SQLValueFunction remaining around, but nothing can be done for v16, so
I am planning to just revert the change before beta1, and look at it
again later, from scratch.
--
Michael


signature.asc
Description: PGP signature

Re: Introduce WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN

On Tue, May 16, 2023 at 1:14 AM Michael Paquier  wrote:

> On Mon, May 15, 2023 at 06:01:02PM -0700, Andres Freund wrote:
> > Why those tweaks are necessary is precisely what I am asking for.
>
> Then the perl script generates the same structures for all the wait
> event classes, with all the events associated to that.  There may be a
> point in keeping extension and buffer pin out of this refactoring, but
> reworking these like that moves in a single place all the wait event
> definitions, making future additions easier.  Buffer pin actually
> needed a small rename to stick with the naming rules of the other
> classes.
>
+1 (Nice explanation, Improving things, Consistency. Always good)

>
> These are the two things refactored in the patch, explaining the what.
> The reason behind the why is to make the script in charge of
> generating all these structures and functions consistent for all the
> wait event classes, simply.  Treating all the wait event classes
> together eases greatly the generation of the documentation, so that it
> is possible to enforce an ordering of the tables of each class used to
> list each wait event type attached to them.  Does this explanation
> make sense?
>
+1 (To me, but I am not important. But having this saved in the thread!!!)

>
> Lock and LWLock are funkier because of the way they grab wait events
> for the inner function, still these had better have their
> documentation generated so as all the SGML tables created for all the
> wait event tables are ordered alphabetically, in the same way as
> HEAD.
> --
> Michael
>
+1 (Whatever helps automate/generate the docs)

Kirk...

Re: Conflict between regression tests namespace & transactions due to recent changes

Noah Misch  writes:
> For the record, I'm fairly sure s/public, test_ns_schema_1/test_ns_schema_1/
> on the new namespace tests would also solve things.  Those tests don't need
> "public" in the picture.  Nonetheless, +1 for your proposal.

Hmm, I'd not read the test case all that closely, but I did think
that including "public" in the search path was an important part
of it.  If it is not, maybe the comments could use adjustment.

regards, tom lane

Re: Conflict between regression tests namespace & transactions due to recent changes

2023-05-15 Thread Noah Misch

On Mon, May 15, 2023 at 12:16:08PM -0400, Tom Lane wrote:
> Marina Polyakova  writes:
> > IIUC the conflict was caused by
> 
> > +SET search_path to public, test_ns_schema_1;
> > +CREATE SCHEMA test_ns_schema_2
> > +   CREATE VIEW abc_view AS SELECT a FROM abc;
> 
> > because the parallel regression test transactions had already created 
> > the table abc and was trying to drop it.
> 
> Hmm.  I'd actually fix the blame on transactions.sql here.  Creating
> a table named as generically as "abc" is horribly bad practice in
> a set of concurrent tests.  namespace.sql is arguably okay, since
> it's creating that table name in a private schema.
> 
> I'd be inclined to fix this by doing s/abc/something-else/g in
> transactions.sql.

For the record, I'm fairly sure s/public, test_ns_schema_1/test_ns_schema_1/
on the new namespace tests would also solve things.  Those tests don't need
"public" in the picture.  Nonetheless, +1 for your proposal.

Re: Introduce WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN

On Mon, May 15, 2023 at 06:01:02PM -0700, Andres Freund wrote:
> On 2023-05-16 09:38:54 +0900, Michael Paquier wrote:
>> On Mon, May 15, 2023 at 05:17:16PM -0700, Andres Freund wrote:
>>> IMO the submission should include why automating requires these changes 
>>> (yours
>>> doesn't really either). I can probably figure it out if I stare a bit at the
>>> code and read the other thread - but I shouldn't need to.
>>
>> Hm?  My previous message includes two reasons..
>
> It explained the motivation, but not why that requires the specific
> changes. At least not in a way that I could quickly undestand.
>
>> The extensions and buffer pin parts need a few internal tweaks to make
>> the other changes much easier to do, which is what the patch of this
>> thread is doing.
>
> Why those tweaks are necessary is precisely what I am asking for.

Not sure how to answer to that without copy-pasting some code, and
that's what is written in the patch.  Anyway, I am used to the context
of what's wanted, so here you go with more details.  As a whole, the
patch reshapes the code to be more consistent for all the wait event 
classes, so as it is possible to generate the code refactored by the
patch of this thread in a single way for all the classes.

The first change is related to the functions associated to each class,
used to fetch a specific wait event.  On HEAD, all the wait event classes
use a dedicated function (activity, IPC, etc.) like that:
case PG_WAIT_BLAH:
{
WaitEventBlah w = (WaitEventBlah) wait_event_info;
event_name = pgstat_get_wait_blah(w);
break;
}

There are two exceptions to that, the wait event classes for extension
and buffer pin, that just do that because these classes have only one
wait event currently: 
case PG_WAIT_EXTENSION:
   event_name = "Extension";
   break
[...]
case PG_WAIT_BUFFER_PIN:
event_name = "BufferPin";
break;
The first thing changed is to introduce two new functions for these
two classes, to work the same way as the other classes.  The script in
charge of generating the code from the wait event .txt file will just
build the internals of these functions.

The second change is to rework the enum structures for extension and
buffer pin, to be consistent with the other classes, so as all the
classes have structures shaped like that:
typedef enum
{
   WAIT_EVENT_1 = PG_WAIT_BLAH,
   [...]
   WAIT_EVENT_N
} WaitEventBlah;

Then the perl script generates the same structures for all the wait
event classes, with all the events associated to that.  There may be a
point in keeping extension and buffer pin out of this refactoring, but
reworking these like that moves in a single place all the wait event
definitions, making future additions easier.  Buffer pin actually
needed a small rename to stick with the naming rules of the other
classes.

These are the two things refactored in the patch, explaining the what.
The reason behind the why is to make the script in charge of
generating all these structures and functions consistent for all the
wait event classes, simply.  Treating all the wait event classes 
together eases greatly the generation of the documentation, so that it
is possible to enforce an ordering of the tables of each class used to
list each wait event type attached to them.  Does this explanation
make sense?

Lock and LWLock are funkier because of the way they grab wait events
for the inner function, still these had better have their
documentation generated so as all the SGML tables created for all the
wait event tables are ordered alphabetically, in the same way as
HEAD.
--
Michael

signature.asc
Description: PGP signature

Missing warning on revokes with grant options

2023-05-15 Thread Joseph Koshakow

Hi Hackers,

I noticed some confusing behavior with REVOKE recently. Normally if
REVOKE fails to revoke anything a warning is printed. For example, see
the following scenario:

```
test=# SELECT current_role;
 current_role
--
 joe
(1 row)

test=# CREATE ROLE r1;
CREATE ROLE
test=# CREATE TABLE t ();
CREATE TABLE
test=# GRANT SELECT ON TABLE t TO r1;
GRANT
test=# SET ROLE r1;
SET
test=> REVOKE SELECT ON TABLE t FROM r1;
WARNING:  no privileges could be revoked for "t"
WARNING:  no privileges could be revoked for column "tableoid" of relation
"t"
WARNING:  no privileges could be revoked for column "cmax" of relation "t"
WARNING:  no privileges could be revoked for column "xmax" of relation "t"
WARNING:  no privileges could be revoked for column "cmin" of relation "t"
WARNING:  no privileges could be revoked for column "xmin" of relation "t"
WARNING:  no privileges could be revoked for column "ctid" of relation "t"
REVOKE
test=> SELECT relacl FROM pg_class WHERE relname = 't';
   relacl
-
 {joe=arwdDxtm/joe,r1=r/joe}
(1 row)

```

However, if the REVOKE fails and the revoker has a grant option on the
privilege, then no warning is emitted. For example, see the following
scenario:

```
test=# SELECT current_role;
 current_role
--
 joe
(1 row)

test=# CREATE ROLE r1;
CREATE ROLE
test=# CREATE TABLE t ();
CREATE TABLE
test=# GRANT SELECT ON TABLE t TO r1 WITH GRANT OPTION;
GRANT
test=# SET ROLE r1;
SET
test=> REVOKE SELECT ON TABLE t FROM r1;
REVOKE
test=> SELECT relacl FROM pg_class WHERE relname = 't';
relacl
--
 {joe=arwdDxtm/joe,r1=r*/joe}
(1 row)

```
The warnings come from restrict_and_check_grant() in aclchk.c. The
psuedo code is

  if (revoked_privileges & available_grant_options == 0)
emit_warning()

In the second example, `r1` does have the proper grant options so no
warning is emitted. However, the revoke has no actual effect.

Reading through the docs [0], I'm not actually sure if the REVOKE
in the second example should succeed or not. At first it says:

> A user can only revoke privileges that were granted directly by that
> user. If, for example, user A has granted a privilege with grant
> option to user B, and user B has in turn granted it to user C, then
> user A cannot revoke the privilege directly from C.

Which seems pretty clear that you can only revoke privileges that you
directly granted. However later on it says:

> As long as some privilege is available, the command will proceed, but
>it will revoke only those privileges for which the user has grant
> options.
...
> while the other forms will issue a warning if grant options for any
> of the privileges specifically named in the command are not held.

Which seems to imply that you can revoke a privilege as long as you
have a grant option on that privilege.

Either way I think the REVOKE should either fail and emit a warning
OR succeed and emit no warning.

I wasn't able to locate where the check for
> A user can only revoke privileges that were granted directly by that
> user.
is in the code, but we should probably just add a warning there.

- Joe Koshakow

[0] https://www.postgresql.org/docs/15/sql-revoke.html

pgbench: can we add a way to specify the schema to write to?

Currently, I believe that we are frowning on writing things directly to
Public by default.

Also, we had already taken that step in our systems.  Furthermore we force
Public to be the first Schema, so this restriction enforces that everything
created must be assigned to the appropriate schema.

That could make us a bit more sensitive to how pgbench operates.  But this
becomes an opportunity to sharpen a tool.

But logging in to our regular users it has no ability to create it's tables
and do it's work.  At the same time, we want exactly this login, because we
want to benchmark things in our application.

Our lives would be simplified if there was a simple way to specify a schema
for pgbench to use.  It would always reference that schema, and it would
not be in the path.  Making it operate "just like our apps" while benching
marking our items.

While we are here, SHOULD we consider having pgbench have a default schema
name it creates and then cleans up?  And then if someone wants to override
that, they can?

Kirk...

Re: psql: Could we get "-- " prefixing on the QUERY outputs? (ECHO_HIDDEN)

On Mon, May 15, 2023 at 10:28 AM Tom Lane  wrote:

> Alvaro Herrera  writes:
> > It's worth considering what will readline history do with the comment.
> > As I recall, we keep /* comments */ together with the query that
> > follows, but the -- comments are keep in a separate history entry.
> > So that's one more reason to prefer /*  */
>
> Good point.
>
> > (To me, that also suggests to remove the asterisk line after each query,
> > and to keep just the one before.)
>
> Meh ... the one after serves to separate a query from its output.
>
> regards, tom lane
>

Actually, I love the feedback!

I just tested whether or not you see the trailing comment line.  And I ONLY
see it in the windows version of PSQL.
And ONLY if you paste it directly in at the command line.
[Because it sends the text line by line, I assume]

Further Testing:

calling with: psql -f  -- no output of the comments (or the query is seen)
-- Windows/Linux

with \e editing... In Linux nothing is displayed from the query!

with \e editing in Windows... I found it buggy when I tossed in (\pset
pager 0) as the first line.  It blew everything up (LOL)
\pset: extra argument "attcollation," ignored
\pset: extra argument "a.attidentity," ignored
\pset: extra argument "a.attgenerated" ignored
\pset: extra argument "FROM" ignored
\pset: extra argument "pg_catalog.pg_attribute" ignored

With that said, I DEFINITELY Move to Remove the secondary comment.  It's
just noise.
and /* */ comments it will be for the topside.

Also, I will take a quick peek at the parse failure that is in windows \e
[Which always does this weird doubling of lines].  But no promises here.
It will be good enough to identify the problem.

Kirk...

Re: Introduce WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN

Hi,

On 2023-05-16 09:38:54 +0900, Michael Paquier wrote:
> On Mon, May 15, 2023 at 05:17:16PM -0700, Andres Freund wrote:
> > IMO the submission should include why automating requires these changes 
> > (yours
> > doesn't really either). I can probably figure it out if I stare a bit at the
> > code and read the other thread - but I shouldn't need to.
> 
> Hm?  My previous message includes two reasons..

It explained the motivation, but not why that requires the specific
changes. At least not in a way that I could quickly undestand.


> The extensions and buffer pin parts need a few internal tweaks to make
> the other changes much easier to do, which is what the patch of this
> thread is doing.

Why those tweaks are necessary is precisely what I am asking for.

Greetings,

Andres Freund

Re: benchmark results comparing versions 15.2 and 16

Hi,

On 2023-05-16 09:42:31 +0900, Michael Paquier wrote:
> > I get quite variable performance if I don't pin client / server to the same
> > core, but even the slow performance is faster than 45k.
> 
> Okay.  You mean with something like taskset or similar, I guess?

Yes. numactl --physcpubind ... in my case.  Linux has an optimization where it
does not need to send an IPI when the client and server are scheduled on the
same core. For single threaded ping-pong tasks like pgbench -c1, that can make
a huge difference, particularly on larger CPUs.  So you get a lot better
performance when forcing things to be colocated.

Greetings,

Andres Freund

Re: benchmark results comparing versions 15.2 and 16

On Mon, May 15, 2023 at 05:14:47PM -0700, Andres Freund wrote:
> On 2023-05-15 14:20:24 +0900, Michael Paquier wrote:
>> On Thu, May 11, 2023 at 01:28:40PM +0900, Michael Paquier wrote:
>> Extreme is adapted for a worst-case scenario.  Looking at my notes
>> from a few months back, that's kind of what I did on my laptop, which
>> was the only machine I had at hand back then:
>> - Compilation of code with -O2.
> 
> I assume without assertions as well?

Yup, no assertions.

> 45k seems too low for a modern machine, given that I get > 80k in such a
> workload, on a workstation with server CPUs (i.e. many cores, but not that
> fast individually).  Hence wondering about assertions being enabled...

Nope, disabled.

> I get quite variable performance if I don't pin client / server to the same
> core, but even the slow performance is faster than 45k.

Okay.  You mean with something like taskset or similar, I guess?
--
Michael


signature.asc
Description: PGP signature

Re: Introduce WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN

On Mon, May 15, 2023 at 05:17:16PM -0700, Andres Freund wrote:
> IMO the submission should include why automating requires these changes (yours
> doesn't really either). I can probably figure it out if I stare a bit at the
> code and read the other thread - but I shouldn't need to.

Hm?  My previous message includes two reasons..  Anyway, I assume that
my previous message is also missing the explanation from the other
thread that this is to translate a .txt file shaped similarly to
errcodes.txt for the wait events (sections as wait event classes,
listing the elements) into automatically-generated SGML and C code :)

The extensions and buffer pin parts need a few internal tweaks to make
the other changes much easier to do, which is what the patch of this
thread is doing.
--
Michael

signature.asc
Description: PGP signature

Re: Introduce WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN

Hi,

On 2023-05-16 07:30:54 +0900, Michael Paquier wrote:
> On Mon, May 15, 2023 at 11:29:56AM -0700, Andres Freund wrote:
> > Without an explanation for why this change is needed for [1], it's hard to
> > give useful feedback...
> 
> The point is to integrate the wait event classes for buffer pin and
> extension into the txt file that automates the creation of the SGML
> and C code associated to them.

IMO the submission should include why automating requires these changes (yours
doesn't really either). I can probably figure it out if I stare a bit at the
code and read the other thread - but I shouldn't need to.

Greetings,

Andres Freund

Re: benchmark results comparing versions 15.2 and 16

Hi,

On 2023-05-15 14:20:24 +0900, Michael Paquier wrote:
> On Thu, May 11, 2023 at 01:28:40PM +0900, Michael Paquier wrote:
> > On Tue, May 09, 2023 at 09:48:24AM -0700, Andres Freund wrote:
> >> On 2023-05-08 12:11:17 -0700, Andres Freund wrote:
> >>> I can reproduce a significant regression due to f193883fc of a workload 
> >>> just
> >>> running
> >>> SELECT CURRENT_TIMESTAMP;
> >>> 
> >>> A single session running it on my workstation via pgbench -Mprepared gets
> >>> before:
> >>> tps = 89359.128359 (without initial connection time)
> >>> after:
> >>> tps = 83843.585152 (without initial connection time)
> >>> 
> >>> Obviously this is an extreme workload, but that nevertheless seems too 
> >>> large
> >>> to just accept...
> 
> Extreme is adapted for a worst-case scenario.  Looking at my notes
> from a few months back, that's kind of what I did on my laptop, which
> was the only machine I had at hand back then:
> - Compilation of code with -O2.

I assume without assertions as well?


> I have re-run a bit more pgbench (1 client, prepared query with a
> single SELECT on a SQL keyword, etc.).  And, TBH, I am not seeing as
> much difference as you do (nothing with default pgbench setup, FWIW),
> still that's able to show a bit more difference than the other two
> cases.

> HEAD shows me an average output close to 43900 TPS (3 run of
> 60s each, for instance), while relying on SQLValueFunction shows an
> average of 45000TPS.  That counts for ~2.4% output regression here
> on bigbox based on these numbers.  Not a regression as high as
> mentioned above, still that's visible.

45k seems too low for a modern machine, given that I get > 80k in such a
workload, on a workstation with server CPUs (i.e. many cores, but not that
fast individually).  Hence wondering about assertions being enabled...

I get quite variable performance if I don't pin client / server to the same
core, but even the slow performance is faster than 45k.

Greetings,

Andres Freund

Re: Using make_ctags leaves tags files in git

On Mon, May 15, 2023 at 12:33:17PM +0200, Alvaro Herrera wrote:
> But make_ctags is *our* script, so I think this rule applies to them as
> well.  (In any case, what can be hurt?  We're not going to add any files
> to git named "tags" anyway.)

Yes, you have a point about the origin of the script generating the
tags.  One thing is that one can still add a file even if listed in
what to ignore, as long as it is done with git-add -f.  Okay, that's
not going to happen.

(FWIW, looking at my stuff, I have just set up that globally in 2018
after seeing the other thread.)
--
Michael

signature.asc
Description: PGP signature

Re: [DOC] Update ALTER SUBSCRIPTION documentation v2

2023-05-15 Thread Peter Smith

On Mon, May 15, 2023 at 11:36 PM Robert Sjöblom
 wrote:
>
>
>
> On 2023-05-05 15:17, Robert Sjöblom wrote:
> >
> > Hi,
> >
> > We have recently used the PostgreSQL documentation when setting up our
> > logical replication. We noticed there was a step missing in the
> > documentation on how to drop a logical replication subscription with a
> > replication slot attached.
>
> Following discussions, please see revised documentation patch.
>

LGTM.

BTW, in the previous thread, there was also a suggestion from Amit [1]
to change the errhint in a similar way. There was no reply to Amit's
idea, so it's not clear whether it's an accidental omission from your
v2 patch or not.

--
[1] 
https://www.postgresql.org/message-id/CAA4eK1J11phiaoCOmsjNqPZ9BOWyLXYrfgrm5vU2uCFPF2kN1Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Re: issue with meson builds on msys2

Hi,

On 2023-05-15 15:30:28 -0700, Andres Freund wrote:
> As soon as either the pg_ctl for the start, or the whole bash invocation, has
> stdin redirected, the problem vanishes.

For a moment I thought this could be related to InheritStdHandles() - but no,
it doesn't make a difference.

There's loads of handles referencing cygwin alive in pg_ctl.

Based on difference in strace output for bash -c "pg_ctl stop" for the case
where start redirected stdin (#1) and where not (#2), it looks like some part
of msys / cygwin sees that stdin is alive when preparing to execute "pg_ctl
stop", and then runs into trouble.

The way we start the child process on windows makes the use of cmd.exe for
redirection pretty odd.

I couldn't trivially reproduce this with a much simpler case (just nohup
sleep). Perhaps it's dependent on a wrapper cmd or such.

Greetings,

Andres Freund

Re: Conflict between regression tests namespace & transactions due to recent changes

On Mon, May 15, 2023 at 11:23:18PM +0300, Marina Polyakova wrote:
> On 2023-05-15 19:16, Tom Lane wrote:
>> Hmm.  I'd actually fix the blame on transactions.sql here.  Creating
>> a table named as generically as "abc" is horribly bad practice in
>> a set of concurrent tests.  namespace.sql is arguably okay, since
>> it's creating that table name in a private schema.
>> 
>> I'd be inclined to fix this by doing s/abc/something-else/g in
>> transactions.sql.
>
> Maybe use a separate schema for all new objects in the transaction test?..
> See diff_set_tx_schema.patch.

Sure, you could do that to bypass the failure (without the "public"
actually?), leaving non-generic names around.  Still I'd agree with
Tom here and just rename the objects to something more in line with
the context of the test to make things a bit more greppable.  These
could be renamed as transaction_tab or transaction_view, for example.
--
Michael


signature.asc
Description: PGP signature

Re: Introduce WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN

On Mon, May 15, 2023 at 11:29:56AM -0700, Andres Freund wrote:
> Without an explanation for why this change is needed for [1], it's hard to
> give useful feedback...

The point is to integrate the wait event classes for buffer pin and
extension into the txt file that automates the creation of the SGML
and C code associated to them.  Doing the refactoring of this patch
has two advantages:
- Being able to easily organize the tables for each wait event type
alphabetically, the same way as HEAD, for all wait event classes.
- Minimizing the number of exception rules needed in the perl script
that transforms the txt file into SGML and the C code to list all the
events associated in a class, where one function is used for each wait
event class.  Currently the wait event class extension does not use
that.

This impacts only the internal object names, not the wait event
strings or the class associated to them.  So this does not change the
contents of pg_stat_activity.
--
Michael


signature.asc
Description: PGP signature

Re: issue with meson builds on msys2

Hi,

On 2023-05-15 13:13:26 -0700, Andres Freund wrote:
> It wouldn't really - the echo $? inside the system() would report the
> error. Which it doesn't - note the "0" in the second output.

Ah. Interesting. Part of the issue is perl (or msys?) swalling some error
details.

I could see more details in strace once I added another layer of shell
evaluation inside the system() call.

  190  478261 [main] bash 44432 frok::parent: CreateProcessW 
(C:\tools\nmsys64\usr\bin\bash.exe, C:\tools\nmsys64\usr\bin\bash.exe, 0, 0, 1, 
0x420, 0, 0, 0x7BE10, 0x7
BDB0)
--- Process 7152 created
[...]
 1556  196093 [main] bash 44433 child_info_spawn::worker: pid 44433, prog_arg 
./tmp_install/tools/nmsys64/home/pgrunner/bf/root/HEAD/inst/bin/pg_ctl, cmd 
line C:\tools\nmsys6
4\home\pgrunner\bf\root\HEAD\pgsql.build\tmp_install\tools\nmsys64\home\pgrunner\bf\root\HEAD\inst\bin\pg_ctl.exe
 -D t -w -l logfile stop)
  128  196221 [main] bash 44433! child_info_spawn::worker: new process name 
\\?\C:\tools\nmsys64\home\pgrunner\bf\root\HEAD\pgsql.build\tmp_install\tools\nmsys64\home\pgrunne
r\bf\root\HEAD\inst\bin\pg_ctl.exe
[...]
--- Process 6136 (pid: 44433) exited with status 0x0
[...]
--- Process 7152 exited with status 0xc13a
5292450 5816310 [waitproc] bash 44432 pinfo::maybe_set_exit_code_from_windows: 
pid 44433, exit value - old 0x0, windows 0xC13A, MSYS 0x802

So indeed, pg_ctl exits with 0, but bash ends up with a different exit code.

What's very interesting here is that the error is 0xC13A, which is quite
different from the 33280 that perl then reports.  From what I can see bash
actually returns 0xC13A - I don't know how perl ends up with 33280 /
0x8200 from that.

Either way, 0xC13A is interesting - that's 0xC13A,
STATUS_CONTROL_C_EXIT.

Very interestingly the problem vanishes as soon as I add a redirection for
standard input into the mix.  Notably it suffices to redirect stdin in the
pg_ctl *start*, even if not done for pg_ctl stop.  There also is no issue if
perl's stdin is redirected from /dev/null.

My guess is that msys has an issue with refcounting consoles across multiple
processes.

After that I was able to reproduce the issue without really involving perl:

bash -c './tmp_install/tools/nmsys64/home/pgrunner/bf/root/HEAD/inst/bin/pg_ctl 
-D t -w -l logfile start > startlog 2>&1; 
./tmp_install/tools/nmsys64/home/pgrunner/bf/root/HEAD/inst/bin/pg_ctl -D t -w 
-l logfile stop > stoplog 2>&1; echo inner: $?'; echo outer: $?

+ bash -c 
'./tmp_install/tools/nmsys64/home/pgrunner/bf/root/HEAD/inst/bin/pg_ctl -D t -w 
-l logfile start > startlog 2>&1; 
./tmp_install/tools/nmsys64/home/pgrunner/bf/root/HEAD/inst/bin/pg_ctl -D t -w 
-l logfile stop > stoplog 2>&1; echo inner: $?'
inner: 130
+ echo outer: 0
outer: 0

If you add -e, the inner: is obviously "transferred" to the outer: output.

As soon as either the pg_ctl for the start, or the whole bash invocation, has
stdin redirected, the problem vanishes.

Greetings,

Andres Freund

Re: Memory leak from ExecutorState context?

2023-05-15 Thread Jehan-Guillaume de Rorthais

Hi,

Thanks for your review!

On Sat, 13 May 2023 23:47:53 +0200
Tomas Vondra  wrote:

> Thanks for the patches. A couple mostly minor comments, to complement
> Melanie's review:
> 
> 0001
> 
> I'm not really sure about calling this "hybrid hash-join". What does it
> even mean? The new comments simply describe the existing batching, and
> how we increment number of batches, etc.

Unless I'm wrong, I believed it comes from the "Hybrid hash join algorithm" as
described here (+ see pdf in this page ref):

  https://en.wikipedia.org/wiki/Hash_join#Hybrid_hash_join

I added the ref in the v7 documentation to avoid futur confusion, is it ok?

> 0002
> 
> I wouldn't call the new ExecHashJoinSaveTuple parameter spillcxt but
> just something generic (e.g. cxt). Yes, we're passing spillCxt, but from
> the function's POV it's just a pointer.

changed in v7.

> I also wouldn't move the ExecHashJoinSaveTuple comment inside - it just
> needs to be reworded that we're expecting the context to be with the
> right lifespan. The function comment is the right place to document such
> expectations, people are less likely to read the function body.

moved and reworded in v7.

> The modified comment in hashjoin.h has a some alignment issues.

I see no alignment issue. I suppose what bother you might be the spaces
before spillCxt and batchCxt to show they are childs of hashCxt? Should I
remove them?

Regards,
>From 997a82a0ee9eda1774b5210401b2d9bf281318da Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Thu, 30 Apr 2020 07:16:28 -0700
Subject: [PATCH 1/3] Describe hybrid hash join implementation

This is just a draft to spark conversation on what a good comment might
be like in this file on how the hybrid hash join algorithm is
implemented in Postgres. I'm pretty sure this is the accepted term for
this algorithm https://en.wikipedia.org/wiki/Hash_join#Hybrid_hash_join
---
 src/backend/executor/nodeHashjoin.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 0a3f32f731..a39164c15d 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -10,6 +10,47 @@
  * IDENTIFICATION
  *	  src/backend/executor/nodeHashjoin.c
  *
+ *   HYBRID HASH JOIN
+ *
+ *  This is based on the hybrid hash join algorythm described shortly in the
+ *  following page, and in length in the pdf in page's notes:
+ *
+ *https://en.wikipedia.org/wiki/Hash_join#Hybrid_hash_join
+ *
+ *  If the inner side tuples of a hash join do not fit in memory, the hash join
+ *  can be executed in multiple batches.
+ *
+ *  If the statistics on the inner side relation are accurate, planner chooses a
+ *  multi-batch strategy and estimates the number of batches.
+ *
+ *  The query executor measures the real size of the hashtable and increases the
+ *  number of batches if the hashtable grows too large.
+ *
+ *  The number of batches is always a power of two, so an increase in the number
+ *  of batches doubles it.
+ *
+ *  Serial hash join measures batch size lazily -- waiting until it is loading a
+ *  batch to determine if it will fit in memory. While inserting tuples into the
+ *  hashtable, serial hash join will, if that tuple were to exceed work_mem,
+ *  dump out the hashtable and reassign them either to other batch files or the
+ *  current batch resident in the hashtable.
+ *
+ *  Parallel hash join, on the other hand, completes all changes to the number
+ *  of batches during the build phase. If it increases the number of batches, it
+ *  dumps out all the tuples from all batches and reassigns them to entirely new
+ *  batch files. Then it checks every batch to ensure it will fit in the space
+ *  budget for the query.
+ *
+ *  In both parallel and serial hash join, the executor currently makes a best
+ *  effort. If a particular batch will not fit in memory, it tries doubling the
+ *  number of batches. If after a batch increase, there is a batch which
+ *  retained all or none of its tuples, the executor disables growth in the
+ *  number of batches globally. After growth is disabled, all batches that would
+ *  have previously triggered an increase in the number of batches instead
+ *  exceed the space allowed.
+ *
+ *  TODO: should we discuss that tuples can only spill forward?
+ *
  * PARALLELISM
  *
  * Hash joins can participate in parallel query execution in several ways.  A
-- 
2.40.1

>From 52e989d5eb6405e86e0d460dffbfaca292bc4274 Mon Sep 17 00:00:00 2001
From: Jehan-Guillaume de Rorthais 
Date: Mon, 27 Mar 2023 15:54:39 +0200
Subject: [PATCH 2/3] Allocate hash batches related BufFile in a dedicated
 context

---
 src/backend/executor/nodeHash.c   | 43 ---
 src/backend/executor/nodeHashjoin.c   | 22 
 src/backend/utils/sort/sharedtuplestore.c |  8 +
 src/include/executor/hashjoin.h   | 30 ++--
 src/include/executor/nodeHashjoin.h

Re: Order changes in PG16 since ICU introduction

2023-05-15 Thread Jeff Davis

On Mon, 2023-05-08 at 14:59 -0700, Jeff Davis wrote:
> The easiest thing to do is revert it for now, and after we sort out
> the
> memcmp() path for the ICU provider, then I can commit it again (after
> that point it would just be code cleanup and should have no
> functional
> impact).

The conversion won't be entirely dead code even after we handle the "C"
locale with memcmp(): for a locale like "C.UTF-8", it will still be
passed to the collation provider (same as with libc), and in that case,
we should still convert that to a language tag consistently across ICU
versions.

For it to be entirely dead code, we would need to convert any locale
with language "C" (e.g. "C.UTF-8") to use the memcmp() path. I'm fine
with that, but that's not what the libc provider does today, and
perhaps we should be consistent between the two. If we do leave the
code in place, we can document that specific "en-US-u-va-posix" locale
so that it's not too surprising for users.

Regards,
Jeff Davis

Re: Order changes in PG16 since ICU introduction

2023-05-15 Thread Jeff Davis

On Sat, 2023-05-13 at 13:00 +0300, Alexander Lakhin wrote:
> On the current master (after 455f948b0, and before f7faa9976, of
> course)
> I get an ASAN-detected failure with the following query:
> CREATE COLLATION col (provider = icu, locale = '123456789012');
> 

Thank you for the report!

ICU source specifically says:

  /** 
   * Useful constant for the maximum size of the language
 part of a locale ID. 
   * (including the terminating NULL).
   * @stable ICU 2.0  
   */
  #define ULOC_LANG_CAPACITY 12

So the fact that it returning success without nul-terminating the
result is an ICU bug in my opinion, and I reported it here:

https://unicode-org.atlassian.net/browse/ICU-22394

Unfortunately that means we need to be a bit more paranoid and always
check for that warning, even if we have a preallocated buffer of the
"correct" size. It also means that both U_STRING_NOT_TERMINATED_WARNING
and U_BUFFER_OVERFLOW_ERROR will be user-facing errors (potentially
scary), unless we check for those errors each time and report specific
errors for them.

Another approach is to always check the length and loop using dynamic
allocation so that we never overflow (and forget about any static
buffers). That seems like overkill given that the problem case is
obviously invalid input; I think as long as we catch it and throw an
ERROR it's fine. But I can do this if others think it's worthwhile.

Patch attached. It just checks for the U_STRING_NOT_TERMINATED_WARNING
in a few places and turns it into an ERROR. It also cleans up the loop
for uloc_getLanguageTag() to check for U_STRING_NOT_TERMINATED_WARNING
rather than (U_SUCCESS(status) && len >= buflen).

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS

From 9c8e9272ca807c9f75a7b32fa3190700cc600260 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Mon, 15 May 2023 13:35:07 -0700
Subject: [PATCH] ICU: check for U_STRING_NOT_TERMINATED_WARNING.

In some cases, ICU can fail to NUL-terminate a result string even if
using an appropriately-sized static buffer. The caller must either
check for len >= buflen or U_STRING_NOT_TERMINATED_WARNING.

The specific problem is related to uloc_getLanguage(), but add the
check in other cases as well.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/2098874d-c111-41e4-9063-30bcf1352...@gmail.com
---
 src/backend/utils/adt/pg_locale.c | 29 +++--
 src/bin/initdb/initdb.c   | 15 ---
 2 files changed, 15 insertions(+), 29 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index f0b6567da1..1cf93b2d20 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2468,7 +2468,7 @@ pg_ucol_open(const char *loc_str)

 		status = U_ZERO_ERROR;
 		uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, );
-		if (U_FAILURE(status))
+		if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
 		{
 			ereport(ERROR,
 	(errmsg("could not get language from locale \"%s\": %s",
@@ -2504,7 +2504,7 @@ pg_ucol_open(const char *loc_str)
 		 * Pretend the error came from ucol_open(), for consistent error
 		 * message across ICU versions.
 		 */
-		if (U_FAILURE(status))
+		if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
 		{
 			ucol_close(collator);
 			ereport(ERROR,
@@ -2639,7 +2639,8 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 	status = U_ZERO_ERROR;
 	len_result = ucnv_fromUChars(icu_converter, *result, len_result + 1,
  buff_uchar, len_uchar, );
-	if (U_FAILURE(status))
+	if (U_FAILURE(status) ||
+		status == U_STRING_NOT_TERMINATED_WARNING)
 		ereport(ERROR,
 (errmsg("%s failed: %s", "ucnv_fromUChars",
 		u_errorName(status;
@@ -2681,7 +2682,7 @@ icu_set_collation_attributes(UCollator *collator, const char *loc,
 	icu_locale_id = palloc(len + 1);
 	*status = U_ZERO_ERROR;
 	len = uloc_canonicalize(loc, icu_locale_id, len + 1, status);
-	if (U_FAILURE(*status))
+	if (U_FAILURE(*status) || *status == U_STRING_NOT_TERMINATED_WARNING)
 		return;

 	lower_str = asc_tolower(icu_locale_id, strlen(icu_locale_id));
@@ -2765,7 +2766,6 @@ icu_set_collation_attributes(UCollator *collator, const char *loc,

 	pfree(lower_str);
 }
-
 #endif

 /*
@@ -2789,7 +2789,7 @@ icu_language_tag(const char *loc_str, int elevel)

 	status = U_ZERO_ERROR;
 	uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, );
-	if (U_FAILURE(status))
+	if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
 	{
 		if (elevel > 0)
 			ereport(elevel,
@@ -2811,19 +2811,12 @@ icu_language_tag(const char *loc_str, int elevel)
 	langtag = palloc(buflen);
 	while (true)
 	{
-		int32_t		len;
-
 		status = U_ZERO_ERROR;
-		len = uloc_toLanguageTag(loc_str, langtag, buflen, strict, );
+

Re: Conflict between regression tests namespace & transactions due to recent changes

2023-05-15 Thread Marina Polyakova


On 2023-05-15 19:16, Tom Lane wrote:

Marina Polyakova  writes:

IIUC the conflict was caused by



+SET search_path to public, test_ns_schema_1;
+CREATE SCHEMA test_ns_schema_2
+   CREATE VIEW abc_view AS SELECT a FROM abc;



because the parallel regression test transactions had already created
the table abc and was trying to drop it.


Hmm.  I'd actually fix the blame on transactions.sql here.  Creating
a table named as generically as "abc" is horribly bad practice in
a set of concurrent tests.  namespace.sql is arguably okay, since
it's creating that table name in a private schema.

I'd be inclined to fix this by doing s/abc/something-else/g in
transactions.sql.

regards, tom lane


Maybe use a separate schema for all new objects in the transaction 
test?.. See diff_set_tx_schema.patch.


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companydiff --git a/src/test/regress/expected/transactions.out b/src/test/regress/expected/transactions.out
index 2b2cff7d9120a139532d1a6a2f2bc79e463bc4fa..6a3758bafb54327fbe341dfd89457e53ffce47dd 100644
--- a/src/test/regress/expected/transactions.out
+++ b/src/test/regress/expected/transactions.out
@@ -1,6 +1,10 @@
 --
 -- TRANSACTIONS
 --
+-- Use your schema to avoid conflicts with objects with the same names in
+-- parallel tests.
+CREATE SCHEMA test_tx_schema;
+SET search_path TO test_tx_schema,public;
 BEGIN;
 CREATE TABLE xacttest (a smallint, b real);
 INSERT INTO xacttest VALUES
diff --git a/src/test/regress/sql/transactions.sql b/src/test/regress/sql/transactions.sql
index 7ee5f6aaa55a686f5484664c3fae224c08bde42a..0e6d6453edaa776425524d43abf5ae3cfca3ac45 100644
--- a/src/test/regress/sql/transactions.sql
+++ b/src/test/regress/sql/transactions.sql
@@ -2,6 +2,11 @@
 -- TRANSACTIONS
 --
 
+-- Use your schema to avoid conflicts with objects with the same names in
+-- parallel tests.
+CREATE SCHEMA test_tx_schema;
+SET search_path TO test_tx_schema,public;
+
 BEGIN;
 
 CREATE TABLE xacttest (a smallint, b real);

Re: issue with meson builds on msys2

Hi,

On 2023-05-15 16:01:39 -0400, Andrew Dunstan wrote:
> On 2023-05-15 Mo 15:38, Andres Freund wrote:
> > Hi,
> > 
> > On 2023-05-05 07:08:39 -0400, Andrew Dunstan wrote:
> > > If you want to play I can arrange access.
> > Andrew did - thanks!
> > 
> > 
> > A first observeration is that making the shell command slightly more
> > complicated, by echoing $? after pg_ctl, prevents the error:
> > 
> > /usr/bin/perl -e 'system(qq{"bin/pg_ctl" -D data-C -w -l logfile start > 
> > startlog 2>&1}) ;system(qq{"bin/pg_ctl" -D data-C -w -l logfile stop > 
> > stoplog 2>&1;}) ; print $? ? "BANG: $?\n" : "OK\n";'
> > BANG: 33280
> > 
> > /usr/bin/perl -e 'system(qq{"bin/pg_ctl" -D data-C -w -l logfile start > 
> > startlog 2>&1}) ;system(qq{"bin/pg_ctl" -D data-C -w -l logfile stop > 
> > stoplog 2>&1; echo $?}) ; print $? ? "BANG: $?\n" : "OK\n";'
> > 0
> > OK
> 
> 
> You're now testing something else, namely the return of the echo rather than
> the call to pg_ctl, so I don't think this is any kind of answer. It would
> just be ignoring the result of pg_ctl.

It wouldn't really - the echo $? inside the system() would report the
error. Which it doesn't - note the "0" in the second output.


> > Andrew, is it ok if modify pg_ctl.c and rebuild? I don't know how "detached"
> > from the actual buildfarm animal the system you gave me access to is...
> > 
> 
> Feel free to do anything you want. This is a completely separate instance
> from the buildfarm animals. When we're done with this issue the EC2 instance
> will go away.

Thanks!

Greetings,

Andres Freund

Re: createuser --memeber and PG 16

2023-05-15 Thread Nathan Bossart

On Mon, May 15, 2023 at 04:27:04PM +0900, Michael Paquier wrote:
> On Fri, May 12, 2023 at 04:35:34PM +0200, Peter Eisentraut wrote:
>> it's not intuitive whether foo becomes a member of bar or bar becomes a
>> member of foo.  Maybe something more verbose like --member-of would help?
> 
> Indeed, presented like that it could be confusing, and --member-of
> sounds like it could be a good idea instead of --member.

--member specifieѕ an existing role that will be given membership to the
new role (i.e., GRANT newrole TO existingrole).  IMO --member-of sounds
like the new role will be given membership to the specified existing role
(i.e., GRANT existingrole TO newrole).  IOW a command like

createuser newrole --member-of existingrole

would make existingrole a "member of" newrole according to \du.  Perhaps
--role should be --member-of because it makes the new role a member of the
existing role.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: Upgrade of git.postgresql.org

2023-05-15 Thread Magnus Hagander

On Mon, May 15, 2023 at 9:37 PM Magnus Hagander  wrote:
>
> The pginfra team is about to begin an upgrade of git.postgresql.org to
> a new version of debian. During the operation there may be some
> intermittent outages -- we will let you know once the upgrade is
> complete.

This upgrade has now been completed. If you see any issues with it
from now on, please let us know and we'll look into it!

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: issue with meson builds on msys2

2023-05-15 Thread Andrew Dunstan



On 2023-05-15 Mo 15:38, Andres Freund wrote:

Hi,

On 2023-05-05 07:08:39 -0400, Andrew Dunstan wrote:

If you want to play I can arrange access.

Andrew did - thanks!


A first observeration is that making the shell command slightly more
complicated, by echoing $? after pg_ctl, prevents the error:

/usr/bin/perl -e 'system(qq{"bin/pg_ctl" -D data-C -w -l logfile start > startlog 2>&1}) 
;system(qq{"bin/pg_ctl" -D data-C -w -l logfile stop > stoplog 2>&1;}) ; print $? ? "BANG: $?\n" : 
"OK\n";'
BANG: 33280

/usr/bin/perl -e 'system(qq{"bin/pg_ctl" -D data-C -w -l logfile start > startlog 2>&1}) 
;system(qq{"bin/pg_ctl" -D data-C -w -l logfile stop > stoplog 2>&1; echo $?}) ; print $? ? "BANG: $?\n" : 
"OK\n";'
0
OK



You're now testing something else, namely the return of the echo rather 
than the call to pg_ctl, so I don't think this is any kind of answer. It 
would just be ignoring the result of pg_ctl.





So does manually or or via a subshell adding another layer of shell.


As Andrew observed earlier, the issue does not occur when not performing
redirection of the output. One interesting bit there is that the perl docs for
system include:
https://perldoc.perl.org/functions/system


If there are no shell metacharacters in the argument, it is split into words
and passed directly to execvp, which is more efficient. On Windows, only the
system PROGRAM LIST syntax will reliably avoid using the shell; system LIST,
even with more than one element, will fall back to the shell if the first
spawn fails.

My guesss is that the issue somehow is triggered around the shell handling.


One relevant bit: If I use strace (from msys) within system, the subprograms
(shell and pg_ctl) actually exit with 0, from what I can tell - but 33280
still is returned. Unfortunately, if I use strace for all of perl, the error
vanishes.


Perhaps are some odd interactions with the stuff that InheritstdHandles()
does?



I observed the same thing with strace. Kind of a Heisenbug.




Andrew, is it ok if modify pg_ctl.c and rebuild? I don't know how "detached"
from the actual buildfarm animal the system you gave me access to is...



Feel free to do anything you want. This is a completely separate 
instance from the buildfarm animals. When we're done with this issue the 
EC2 instance will go away.


If you use the script just run in test mode or from-source mode, so it 
doesn't try to report results (that would fail anyway, as it doesn't 
have a registered secret). You might have to force have_ipc_run to 0. Or 
you can just build / install manually.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com

Re: issue with meson builds on msys2

Hi,

On 2023-05-05 07:08:39 -0400, Andrew Dunstan wrote:
> If you want to play I can arrange access.

Andrew did - thanks!

A first observeration is that making the shell command slightly more
complicated, by echoing $? after pg_ctl, prevents the error:

/usr/bin/perl -e 'system(qq{"bin/pg_ctl" -D data-C -w -l logfile start > 
startlog 2>&1}) ;system(qq{"bin/pg_ctl" -D data-C -w -l logfile stop > stoplog 
2>&1;}) ; print $? ? "BANG: $?\n" : "OK\n";'
BANG: 33280

/usr/bin/perl -e 'system(qq{"bin/pg_ctl" -D data-C -w -l logfile start > 
startlog 2>&1}) ;system(qq{"bin/pg_ctl" -D data-C -w -l logfile stop > stoplog 
2>&1; echo $?}) ; print $? ? "BANG: $?\n" : "OK\n";'
0
OK

So does manually or or via a subshell adding another layer of shell.

As Andrew observed earlier, the issue does not occur when not performing
redirection of the output. One interesting bit there is that the perl docs for
system include:
https://perldoc.perl.org/functions/system

> If there are no shell metacharacters in the argument, it is split into words
> and passed directly to execvp, which is more efficient. On Windows, only the
> system PROGRAM LIST syntax will reliably avoid using the shell; system LIST,
> even with more than one element, will fall back to the shell if the first
> spawn fails.

My guesss is that the issue somehow is triggered around the shell handling.

One relevant bit: If I use strace (from msys) within system, the subprograms
(shell and pg_ctl) actually exit with 0, from what I can tell - but 33280
still is returned. Unfortunately, if I use strace for all of perl, the error
vanishes.

Perhaps are some odd interactions with the stuff that InheritstdHandles()
does?

Andrew, is it ok if modify pg_ctl.c and rebuild? I don't know how "detached"
from the actual buildfarm animal the system you gave me access to is...

Greetings,

Andres Freund

Upgrade of git.postgresql.org

2023-05-15 Thread Magnus Hagander

The pginfra team is about to begin an upgrade of git.postgresql.org to
a new version of debian. During the operation there may be some
intermittent outages -- we will let you know once the upgrade is
complete.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Move un-parenthesized syntax docs to "compatibility" for few SQL commands

2023-05-15 Thread Nathan Bossart

On Tue, Apr 25, 2023 at 03:18:44PM +0900, Michael Paquier wrote:
> On Mon, Apr 24, 2023 at 09:52:21AM -0700, Nathan Bossart wrote:
>> I think this can wait for v17, but if there's a strong argument for doing
>> some of this sooner, we can reevaluate.
> 
> FWIW, I agree to hold on this stuff for v17~ once it opens for
> business.  There is no urgency here.

There's still some time before we'll be able to commit any of these, but
here is an attempt at addressing all the feedback thus far.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From b27c3f95bb8da5a8e71cfb3a438a998ff9852cf2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 15 May 2023 12:02:09 -0700
Subject: [PATCH v4 1/3] Rearrange CLUSTER rules in gram.y.

This change moves the unparenthesized syntaxes for CLUSTER to the
end of the ClusterStmt rules in preparation for a follow-up commit
that will move these syntaxes to the "Compatibility" section of the
CLUSTER documentation.  The documentation for the unparenthesized
syntaxes has also been consolidated.

Suggested-by: Melanie Plageman
Discussion https://postgr.es/m/CAAKRu_bc5uHieG1976kGqJKxyWtyQt9yvktjsVX%2Bi7NOigDjOA%40mail.gmail.com
---
 doc/src/sgml/ref/cluster.sgml |  3 +--
 src/backend/parser/gram.y | 27 +--
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 29f0f1fd90..35b8deaec1 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -21,9 +21,8 @@ PostgreSQL documentation
 
  
 
-CLUSTER [VERBOSE] table_name [ USING index_name ]
 CLUSTER ( option [, ...] ) table_name [ USING index_name ]
-CLUSTER [VERBOSE]
+CLUSTER [VERBOSE] [ replaceable class="parameter">table_name [ USING index_name ] ]
 
 where option can be one of:
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d6426f3b8e..a0b741ea72 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -11574,33 +11574,32 @@ CreateConversionStmt:
 /*
  *
  *		QUERY:
- *CLUSTER [VERBOSE]  [ USING  ]
- *CLUSTER [ (options) ]  [ USING  ]
- *CLUSTER [VERBOSE]
+ *CLUSTER (options)  [ USING  ]
+ *CLUSTER [VERBOSE] [  [ USING  ] ]
  *CLUSTER [VERBOSE]  ON  (for pre-8.3)
  *
  */
 
 ClusterStmt:
-			CLUSTER opt_verbose qualified_name cluster_index_specification
+			CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
 {
 	ClusterStmt *n = makeNode(ClusterStmt);
 
-	n->relation = $3;
-	n->indexname = $4;
-	n->params = NIL;
-	if ($2)
-		n->params = lappend(n->params, makeDefElem("verbose", NULL, @2));
+	n->relation = $5;
+	n->indexname = $6;
+	n->params = $3;
 	$$ = (Node *) n;
 }
-
-			| CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+			/* unparenthesized VERBOSE kept for pre-14 compatibility */
+			| CLUSTER opt_verbose qualified_name cluster_index_specification
 {
 	ClusterStmt *n = makeNode(ClusterStmt);
 
-	n->relation = $5;
-	n->indexname = $6;
-	n->params = $3;
+	n->relation = $3;
+	n->indexname = $4;
+	n->params = NIL;
+	if ($2)
+		n->params = lappend(n->params, makeDefElem("verbose", NULL, @2));
 	$$ = (Node *) n;
 }
 			| CLUSTER opt_verbose
-- 
2.25.1

>From 98364f6360a1b48e772c4d1dc878030a7613dfff Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 15 May 2023 12:13:12 -0700
Subject: [PATCH v4 2/3] Support parenthesized syntax for CLUSTER without a
 table name.

b5913f6120 added a parenthesized syntax for CLUSTER, but it
requires specifying a table name.  This is unlike commands such as
VACUUM and ANALYZE, which do not require specifying a table in the
parenthesized syntax.  This change resolves this inconsistency.

This is preparatory work for a follow-up commit that will move the
unparenthesized syntax to the "Compatibility" section of the
CLUSTER documentation.

Reviewed-by: Melanie Plageman
Discussion: https://postgr.es/m/CAAKRu_bc5uHieG1976kGqJKxyWtyQt9yvktjsVX%2Bi7NOigDjOA%40mail.gmail.com
---
 doc/src/sgml/ref/cluster.sgml |  2 +-
 src/backend/parser/gram.y | 12 +++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 35b8deaec1..33aab6a4ab 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-CLUSTER ( option [, ...] ) table_name [ USING index_name ]
+CLUSTER ( option [, ...] ) [ table_name [ USING index_name ] ]
 CLUSTER [VERBOSE] [ replaceable class="parameter">table_name [ USING index_name ] ]
 
 where option can be one of:
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0b741ea72..d41ae1ab0e 100644
---

Re: cutting down the TODO list thread

2023-05-15 Thread Robert Haas

On Mon, May 15, 2023 at 2:05 PM Matthias van de Meent
 wrote:
> > Add SET PERFORMANCE_TIPS option to suggest INDEX, VACUUM, VACUUM ANALYZE, 
> > and CLUSTER
> > -> There are external tools that help with this kind of analysis
>
> Althrough there are external tools which help with the analysis, the
> silent complaint of this item seems to be "PostgreSQL doesn't provide
> the user with actionable maintenance tasks/DDL info", and I
> wholeheartedly agree with that.

Well, if SET PERFORMANCE_TIPS is adding a new GUC, that's inherently
something that could not happen in a contrib module, only in core. If
it's dedicated syntax of some kind, same thing. I am not at all
convinced that adding something spelled SET PERFORMANCE_TIPS in any
form is a good idea, but at the very least someone would have to
propose what the possible values of that option would be and what they
would do in a pretty detailed way for us to decide whether we liked it
or not. It seems to me that it would be quite difficult to get any
kind of consensus. If you came up with several different kinds of
performance tips and made the GUC a comma-separated list of tip types,
I suspect that you'd get a good bit of opposition: maybe Tom would
dislike one of the tip types for some reason, me a second, and Andres
a third. If you remove all three of those you've gutted your
implementation. Whee.

But even if we leave the syntax aside, it is very difficult, IMHO, to
come up with something in this area that makes sense to put in core.
There are so many things you could warn about, and so many possible
algorithms that you could use to emit warnings. We do have a few
things somewhat like this in core, like the warning that checkpoints
are happening too close together, or the hint that when you said DROP
TABLE olders maybe you really meant DROP TABLE orders. But in those
cases, the situation is fairly unambiguous: if your checkpoints are
happening too close together, you should probably raise max_wal_size,
as long as it's not going to run you out of disk space. If you
specified a non-existent object name, you should probably correct the
object name to something that does exist.

But things like CREATE INDEX or CLUSTER are a lot trickier. I struggle
to think of what individual PostgreSQL command would have enough
context to know that those things are a good idea. For example,
clustering a table on an index makes sense if (1) there are queries
that would run faster with the clustering and (2) they are run
frequently enough and are expensive enough that the savings would be
material and (3) the clustering wouldn't degrade so quickly as to be
pointless. But I don't see how it would be possible to discover this
situation without instrumenting the whole workload, or at least having
some trace of the workload. Even if you have the data, you probably
need to do a bunch of number-crunching to come up with good
recommendations, and that's expensive, and you probably have to be OK
with a significantly higher risk of wrong answers, too, because the
past may be different from the future, and the planner's estimates of
what the clustering would save might be wrong.

I wouldn't go so far as to say that doing anything of this sort is
absolutely and categorically hopeless, but suggesting to an aspiring
hacker (or even an established one) that they go try to implement SET
PERFORMANCE_TIPS isn't helpful at all. At least in my opinion, it's
not clear what that means, or that we want it, or what we might want
instead, or even that we want anything at all. We should aim to have a
TODO list filled with things that are actionable and likely to be
worth the effort someone might choose to invest in them.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Introduce WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN

Hi,

On 2023-05-15 10:07:04 +0200, Drouvot, Bertrand wrote:
> Please find attached a patch to $SUBJECT.
> 
> This is preliminary work to autogenerate some wait events
> code and documentation done in [1].
> 
> The patch introduces 2 new "wait events" (WAIT_EVENT_EXTENSION
> and WAIT_EVENT_BUFFER_PIN) and their associated functions
> (pgstat_get_wait_extension() and pgstat_get_wait_bufferpin() resp.)
> 
> Please note that that would not break extensions outside contrib
> that make use of the existing PG_WAIT_EXTENSION.
> 
> [1]: 
> https://www.postgresql.org/message-id/flat/77a86b3a-c4a8-5f5d-69b9-d70bbf2e9...@gmail.com
> 
> Looking forward to your feedback,

Without an explanation for why this change is needed for [1], it's hard to
give useful feedback...

Greetings,

Andres Freund

Re: cutting down the TODO list thread

Robert Haas  writes:
> On Sat, May 13, 2023 at 12:42 AM John Naylor
>  wrote:
>> Add support for polymorphic arguments and return types to languages other 
>> than PL/PgSQL
>> -> Seems if someone needed this, they would say so (no thread).
>> 
>> Add support for OUT and INOUT parameters to languages other than PL/PgSQL
>> -> Ditto

> These actually seem like pretty interesting projects.

Yeah.  I'm surprised that nobody has gotten around to scratching
this itch yet.

regards, tom lane

Re: cutting down the TODO list thread

2023-05-15 Thread Matthias van de Meent

On Sat, 13 May 2023 at 06:42, John Naylor  wrote:
>
>
> On Mon, Feb 6, 2023 at 11:04 AM John Naylor  
> wrote:
> > I'll try to get back to culling the list items at the end of April.
>
> I've made another pass at this. Previously, I went one or two sections at a 
> time, but this time I tried just skimming the whole thing and noting what 
> jumps out at me. Also, I've separated things into three categories: Remove, 
> move to "not wanted list", and revise. Comments and objections welcome, as 
> always.
>
> [...]
> 
> 2. Propose to move to the "Not Wanted list":
>
> (Anything already at the bottom under the heading "Features We Do Not Want", 
> with the exception of "threads in a single process". I'll just remove that -- 
> if we ever decide that's worth pursuing, it'll be because we decided we can't 
> really avoid it anymore, and in that case we surely don't need to put it 
> here.)
>
> Add SET PERFORMANCE_TIPS option to suggest INDEX, VACUUM, VACUUM ANALYZE, and 
> CLUSTER
> -> There are external tools that help with this kind of analysis

Althrough there are external tools which help with the analysis, the
silent complaint of this item seems to be "PostgreSQL doesn't provide
the user with actionable maintenance tasks/DDL info", and I
wholeheartedly agree with that. Finding out which plans are bad, why
they're bad, and how to fix them currently has quite a steep learning
curve; and while external tools do help, they are not at all commonly
available.

The result I got when searching for "automatic postgresql index
suggestions" was a combination of hypopg, pg_qualstats and some manual
glue to get suggested indexes in the current database - but none of
these are available in the main distribution.

I'm not asking this to be part of the main PostgreSQL binary, but I
don't think that the idea of 'automated suggestions' should be moved
to the "not wanted" list - I'd suggest adding it to a list for Contrib
instead, if we're insisting on removing it from the main TODO list.

Kind regards,

Matthias van de Meent

PS. note how we already have _some_ suggestions about vacuum and
reindex in PostgreSQL, but that is only when things are obviously
wrong, and we don't make what I would call intelligent suggestions -
in one place we still suggest to shut down the postmaster and then
vacuum specific databases in single-user mode.

Re: cutting down the TODO list thread

2023-05-15 Thread Robert Haas

On Sat, May 13, 2023 at 12:42 AM John Naylor
 wrote:
> Add support for polymorphic arguments and return types to languages other 
> than PL/PgSQL
> -> Seems if someone needed this, they would say so (no thread).
>
> Add support for OUT and INOUT parameters to languages other than PL/PgSQL
> -> Ditto

These actually seem like pretty interesting projects.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Large files for relations

2023-05-15 Thread Robert Haas

On Fri, May 12, 2023 at 9:53 AM Stephen Frost  wrote:
> While I tend to agree that 1GB is too small, 1TB seems like it's
> possibly going to end up on the too big side of things, or at least,
> if we aren't getting rid of the segment code then it's possibly throwing
> away the benefits we have from the smaller segments without really
> giving us all that much.  Going from 1G to 10G would reduce the number
> of open file descriptors by quite a lot without having much of a net
> change on other things.  50G or 100G would reduce the FD handles further
> but starts to make us lose out a bit more on some of the nice parts of
> having multiple segments.

This is my view as well, more or less. I don't really like our current
handling of relation segments; we know it has bugs, and making it
non-buggy feels difficult. And there are performance issues as well --
file descriptor consumption, for sure, but also probably that crossing
a file boundary likely breaks the operating system's ability to do
readahead to some degree. However, I think we're going to find that
moving to a system where we have just one file per relation fork and
that file can be arbitrarily large is not fantastic, either. Jim's
point about running into filesystem limits is a good one (hi Jim, long
time no see!) and the problem he points out with ext4 is almost
certainly not the only one. It doesn't just have to be filesystems,
either. It could be a limitation of an archiving tool (tar, zip, cpio)
or a file copy utility or whatever as well. A quick Google search
suggests that most such things have been updated to use 64-bit sizes,
but my point is that the set of things that can potentially cause
problems is broader than just the filesystem. Furthermore, even when
there's no hard limit at play, a smaller file size can occasionally be
*convenient*, as in Pavel's example of using hard links to share
storage between backups. From that point of view, a 16GB or 64GB or
256GB file size limit seems more convenient than no limit and more
convenient than a large limit like 1TB.

However, the bugs are the flies in the ointment (ahem). If we just
make the segment size bigger but don't get rid of segments altogether,
then we still have to fix the bugs that can occur when you do have
multiple segments. I think part of Thomas's motivation is to dodge
that whole category of problems. If we gradually deprecate
multi-segment mode in favor of single-file-per-relation-fork, then the
fact that the segment handling code has bugs becomes progressively
less relevant. While that does make some sense, I'm not sure I really
agree with the approach. The problem is that we're trading problems
that we at least theoretically can fix somehow by hitting our code
with a big enough hammer for an unknown set of problems that stem from
limitations of software we don't control, maybe don't even know about.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Large files for relations

2023-05-15 Thread MARK CALLAGHAN

On Fri, May 12, 2023 at 4:02 PM Thomas Munro  wrote:

> On Sat, May 13, 2023 at 4:41 AM MARK CALLAGHAN  wrote:
> > Repeating what was mentioned on Twitter, because I had some experience
> with the topic. With fewer files per table there will be more contention on
> the per-inode mutex (which might now be the per-inode rwsem). I haven't
> read filesystem source in a long time. Back in the day, and perhaps today,
> it was locked for the duration of a write to storage (locked within the
> kernel) and was briefly locked while setting up a read.
> >
> > The workaround for writes was one of:
> > 1) enable disk write cache or use battery-backed HW RAID to make writes
> faster (yes disks, I encountered this prior to 2010)
> > 2) use XFS and O_DIRECT in which case the per-inode mutex (rwsem) wasn't
> locked for the duration of a write
> >
> > I have a vague memory that filesystems have improved in this regard.
>
> (I am interpreting your "use XFS" to mean "use XFS instead of ext4".)
>

Yes, although when the decision was made it was probably ext-3 -> XFS.  We
suffered from fsync a file == fsync the filesystem
because MySQL binlogs use buffered IO and are appended on write. Switching
from ext-? to XFS was an easy perf win
so I don't have much experience with ext-? over the past decade.

> Right, 80s file systems like UFS (and I suspect ext and ext2, which
>

Late 80s is when I last hacked on Unix fileys code, excluding browsing XFS
and ext source. Unix was easy back then -- one big kernel lock covers
everything.

> some time sooner).  Currently our code believes that it is not safe to
> call fdatasync() for files whose size might have changed.  There is no
>

Long ago we added code for InnoDB to avoid fsync/fdatasync in some cases
when O_DIRECT was used. While great for performance
we also forgot to make sure they were still done when files were extended.
Eventually we fixed that.

Thanks for all of the details.

-- 
Mark Callaghan
mdcal...@gmail.com

Re: cutting down the TODO list thread

2023-05-15 Thread Bruce Momjian

On Sat, May 13, 2023 at 01:31:21AM -0400, Tom Lane wrote:
> John Naylor  writes:
> > I've made another pass at this. Previously, I went one or two sections at a
> > time, but this time I tried just skimming the whole thing and noting what
> > jumps out at me. Also, I've separated things into three categories: Remove,
> > move to "not wanted list", and revise. Comments and objections welcome, as
> > always.
> 
> Generally agree that the items you've listed are obsolete.  Some comments:

I agree with this email and John's.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Embrace your flaws.  They make you human, rather than perfect,
  which you will never be.

Re: Conflict between regression tests namespace & transactions due to recent changes

Marina Polyakova  writes:
> IIUC the conflict was caused by

> +SET search_path to public, test_ns_schema_1;
> +CREATE SCHEMA test_ns_schema_2
> +   CREATE VIEW abc_view AS SELECT a FROM abc;

> because the parallel regression test transactions had already created 
> the table abc and was trying to drop it.

Hmm.  I'd actually fix the blame on transactions.sql here.  Creating
a table named as generically as "abc" is horribly bad practice in
a set of concurrent tests.  namespace.sql is arguably okay, since
it's creating that table name in a private schema.

I'd be inclined to fix this by doing s/abc/something-else/g in
transactions.sql.

regards, tom lane

Conflict between regression tests namespace & transactions due to recent changes

2023-05-15 Thread Marina Polyakova


Hello, hackers!

When running tests for version 15, we found a conflict between 
regression tests namespace & transactions due to recent changes [1].


diff -w -U3 .../src/test/regress/expected/transactions.out 
.../src/bin/pg_upgrade/tmp_check/results/transactions.out

--- .../src/test/regress/expected/transactions.out ...
+++ .../src/bin/pg_upgrade/tmp_check/results/transactions.out ...
@@ -899,6 +899,9 @@

 RESET default_transaction_read_only;
 DROP TABLE abc;
+ERROR:  cannot drop table abc because other objects depend on it
+DETAIL:  view test_ns_schema_2.abc_view depends on table abc
+HINT:  Use DROP ... CASCADE to drop the dependent objects too.
 -- Test assorted behaviors around the implicit transaction block 
created
 -- when multiple SQL commands are sent in a single Query message.  
These
 -- tests rely on the fact that psql will not break SQL commands apart 
at a

...

IIUC the conflict was caused by

+SET search_path to public, test_ns_schema_1;
+CREATE SCHEMA test_ns_schema_2
+   CREATE VIEW abc_view AS SELECT a FROM abc;

because the parallel regression test transactions had already created 
the table abc and was trying to drop it.


ISTM the patch diff.patch fixes this problem...

[1] 
https://github.com/postgres/postgres/commit/dbd5795e7539ec9e15c0d4ed2d05b1b18d2a3b09


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companydiff --git a/src/test/regress/expected/namespace.out b/src/test/regress/expected/namespace.out
index a62fd8ded015e6ee56784e4341e5ac1a4ed5621f..bcb6a75cf230fdc30f728201c2a2baef1838ed18 100644
--- a/src/test/regress/expected/namespace.out
+++ b/src/test/regress/expected/namespace.out
@@ -35,10 +35,15 @@ SHOW search_path;
 
 -- verify that the correct search_path preserved
 -- after creating the schema and on commit
+-- create your table to not use the same table from transactions test
 BEGIN;
 SET search_path to public, test_ns_schema_1;
 CREATE SCHEMA test_ns_schema_2
-   CREATE VIEW abc_view AS SELECT a FROM abc;
+   CREATE VIEW abc_view AS SELECT a FROM abc
+   CREATE TABLE abc (
+  a serial,
+  b int UNIQUE
+   );
 SHOW search_path;
search_path
 --
@@ -53,7 +58,9 @@ SHOW search_path;
 (1 row)
 
 DROP SCHEMA test_ns_schema_2 CASCADE;
-NOTICE:  drop cascades to view test_ns_schema_2.abc_view
+NOTICE:  drop cascades to 2 other objects
+DETAIL:  drop cascades to table test_ns_schema_2.abc
+drop cascades to view test_ns_schema_2.abc_view
 -- verify that the objects were created
 SELECT COUNT(*) FROM pg_class WHERE relnamespace =
 (SELECT oid FROM pg_namespace WHERE nspname = 'test_ns_schema_1');
diff --git a/src/test/regress/sql/namespace.sql b/src/test/regress/sql/namespace.sql
index 3474f5ecf4215068fd89ec52f9ee8b95ddff36bb..95b9ed7b00807222e94e30221348f58039c63257 100644
--- a/src/test/regress/sql/namespace.sql
+++ b/src/test/regress/sql/namespace.sql
@@ -28,10 +28,16 @@ SHOW search_path;
 
 -- verify that the correct search_path preserved
 -- after creating the schema and on commit
+-- create your table to not use the same table from transactions test
 BEGIN;
 SET search_path to public, test_ns_schema_1;
 CREATE SCHEMA test_ns_schema_2
-   CREATE VIEW abc_view AS SELECT a FROM abc;
+   CREATE VIEW abc_view AS SELECT a FROM abc
+
+   CREATE TABLE abc (
+  a serial,
+  b int UNIQUE
+   );
 SHOW search_path;
 COMMIT;
 SHOW search_path;

Re: walsender performance regression due to logical decoding on standby changes

2023-05-15 Thread Bharath Rupireddy

On Mon, May 15, 2023 at 6:14 PM Masahiko Sawada  wrote:
>
> On Mon, May 15, 2023 at 1:49 PM Thomas Munro  wrote:
> >
> > On Fri, May 12, 2023 at 11:58 PM Bharath Rupireddy
> >  wrote:
> > > Andres, rightly put it - 'mis-using' CV infrastructure. It is simple,
> > > works, and makes the WalSndWakeup() easy solving the performance
> > > regression.
> >
> > Yeah, this seems OK, and better than the complicated alternatives.  If
> > one day we want to implement CVs some other way so that this
> > I-know-that-CVs-are-really-made-out-of-latches abstraction leak
> > becomes a problem, and we still need this, well then we can make a
> > separate latch-wait-list thing.
>
> +1

Thanks for acknowledging the approach.

On Mon, May 15, 2023 at 2:11 PM Drouvot, Bertrand
 wrote:
>
> Thanks for V2! It does look good to me and I like the fact that
> WalSndWakeup() does not need to loop on all the Walsenders slot
> anymore (for both the physical and logical cases).

Indeed, it doesn't have to.

On Mon, May 15, 2023 at 7:50 AM Zhijie Hou (Fujitsu)
 wrote:
>
> Thanks for updating the patch. I did some simple primary->standby replication 
> test for the
> patch and can see the degradation doesn't happen in the replication after 
> applying it[1].
>
> One nitpick in the comment:
>
> +* walsenders. It makes WalSndWakeup() callers life easy.
>
> callers life easy => callers' life easy.

Changed.

> [1]
> max_wal_senders = 100
> before regression(ms)after regression(ms)v2 patch(ms)
> 13394.4013  14141.2615  13455.2543
> Compared with before regression 5.58%   0.45%
>
> max_wal_senders = 200
> before regression(ms)after regression(ms) v2 patch(ms)
> 13280.8507  14597.1173  13632.0606
> Compared with before regression 9.91%   1.64%
>
> max_wal_senders = 300
> before regression(ms)after regression(ms) v2 patch(ms)
> 13535.0232  16735.7379  13705.7135
> Compared with before regression 23.65%  1.26%

Yes, the numbers with v2 patch look close to where we were before.
Thanks for confirming. Just wondering, where does this extra
0.45%/1.64%/1.26% coming from?

Please find the attached v3 with the review comment addressed.

Do we need to add an open item for this issue in
https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items? If yes, can
anyone in this loop add one?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v3-0001-Optimize-walsender-wake-up-logic-with-Conditional.patch
Description: Binary data

Re: psql: Could we get "-- " prefixing on the QUERY outputs? (ECHO_HIDDEN)

Alvaro Herrera  writes:
> It's worth considering what will readline history do with the comment.
> As I recall, we keep /* comments */ together with the query that
> follows, but the -- comments are keep in a separate history entry.
> So that's one more reason to prefer /*  */

Good point.

> (To me, that also suggests to remove the asterisk line after each query,
> and to keep just the one before.)

Meh ... the one after serves to separate a query from its output.

regards, tom lane

Re: Memory leak from ExecutorState context?

2023-05-15 Thread Jehan-Guillaume de Rorthais

On Sun, 14 May 2023 00:10:00 +0200
Tomas Vondra  wrote:

> On 5/12/23 23:36, Melanie Plageman wrote:
> > Thanks for continuing to work on this.
> > 
> > Are you planning to modify what is displayed for memory usage in
> > EXPLAIN ANALYZE?

Yes, I already start to work on this. Tracking spilling memory in
spaceUsed/spacePeak change the current behavior of the serialized HJ because it
will increase the number of batch much faster, so this is a no go for v16.

I'll try to accumulate the total allocated (used+not used) spill context memory
in instrumentation. This is gross, but it avoids to track the spilling memory
in its own structure entry.

> We could do that, but we can do that separately - it's a separate and
> independent improvement, I think.

+1

> Also, do you have a proposal how to change the explain output? In
> principle we already have the number of batches, so people can calculate
> the "peak" amount of memory (assuming they realize what it means).

We could add the batch memory consumption with the number of batches. Eg.:

  Buckets: 4096 (originally 4096)  
  Batches: 32768 (originally 8192) using 256MB
  Memory Usage: 192kB

> I think the main problem with adding this info to EXPLAIN is that I'm
> not sure it's very useful in practice. I've only really heard about this
> memory explosion issue when the query dies with OOM or takes forever,
> but EXPLAIN ANALYZE requires the query to complete.

It could be useful to help admins tuning their queries realize that the current
number of batches is consuming much more memory than the join itself.

This could help them fix the issue before OOM happen.

Regards,

Re: psql: Could we get "-- " prefixing on the QUERY outputs? (ECHO_HIDDEN)

2023-05-15 Thread Alvaro Herrera

On 2023-May-15, Tom Lane wrote:

> Laurenz Albe  writes:
> > On Mon, 2023-05-15 at 08:37 +0200, Pavel Stehule wrote:
> >> This looks little bit strange
> >> 
> >> What about /* comments
> >> Like
> >> /*** Query /
> >> Or just
> >>  Query 
> 
> > +1 for either of Pavel's suggestions.
> 
> +1.  Probably the slash-star way would be less visually surprising
> to people who are used to the current output.

It's worth considering what will readline history do with the comment.
As I recall, we keep /* comments */ together with the query that
follows, but the -- comments are keep in a separate history entry.
So that's one more reason to prefer /*  */

(To me, that also suggests to remove the asterisk line after each query,
and to keep just the one before.)

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Sallah, I said NO camels! That's FIVE camels; can't you count?"
(Indiana Jones)

Re: psql: Could we get "-- " prefixing on the QUERY outputs? (ECHO_HIDDEN)

Laurenz Albe  writes:
> On Mon, 2023-05-15 at 08:37 +0200, Pavel Stehule wrote:
>> This looks little bit strange
>> 
>> What about /* comments
>> Like
>> /*** Query /
>> Or just
>>  Query 

> +1 for either of Pavel's suggestions.

+1.  Probably the slash-star way would be less visually surprising
to people who are used to the current output.

Checking the psql source code for "", I see that the single-step
feature could use the same treatment.

regards, tom lane

Re: running logical replication as the subscription owner

2023-05-15 Thread Jelte Fennema

On Fri, 24 Mar 2023 at 19:37, Robert Haas  wrote:
> > > > I think there's some important tests missing related to this:
> > > > 1. Ensuring that SECURITY_RESTRICTED_OPERATION things are enforced
> > > > when the user **does not** have SET ROLE permissions to the
> > > > subscription owner, e.g. don't allow SET ROLE from a trigger.
> > > > 2. Ensuring that SECURITY_RESTRICTED_OPERATION things are not enforced
> > > > when the user **does** have SET ROLE permissions to the subscription
> > > > owner, e.g. allows SET ROLE from trigger.
> > > Yeah, if we stick with the current approach we should probably add
> > > tests for that stuff.
> >
> > Even if we don't, we should still have tests showing that the security 
> > restrictions that we intend to put in place actually do their job.
>
> Yeah, I just don't want to write the tests and then decide to change
> the behavior and then have to write them over again. It's not so much
> fun that I'm yearning to do it twice.

I forgot to follow up on this before, but based on the bug found by
Amit. I think it would be good to still add these tests.

Re: createuser --memeber and PG 16

2023-05-15 Thread Bruce Momjian

On Mon, May 15, 2023 at 04:33:27PM +0900, Michael Paquier wrote:
> On Thu, May 11, 2023 at 09:34:42AM -0400, Bruce Momjian wrote:
> > On Thu, May 11, 2023 at 02:21:22PM +0200, Daniel Gustafsson wrote:
> >> IIRC there were a number of ideas presented in that thread but backwards
> >> compatibility with --role already "taken" made it complicated, so --role 
> >> and
> >> --member were the least bad options.
> >> 
> >>> At a minimum I would like to apply the attached doc patch to PG 16 to
> >>> improve awkward wording in CREATE ROLE and createuser.
> >> 
> >> No objection.
> 
> None from here as well.
> 
> >> +role.  (This in effect makes the new role a group.)
> >> While not introduced here, isn't the latter part interesting enough to 
> >> warrant
> >> not being inside parenthesis?
> > 
> > The concept of group itself is deprecated, which I think is why the
> > parenthesis are used.
> 
> Not sure on this one.  The original docs come from 58d214e, and this
> sentence was already in there.

True.  I have removed the parenthesis in this updated patch.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Embrace your flaws.  They make you human, rather than perfect,
  which you will never be.
diff --git a/doc/src/sgml/ref/create_role.sgml b/doc/src/sgml/ref/create_role.sgml
index 4a84461b28..7249fc7432 100644
--- a/doc/src/sgml/ref/create_role.sgml
+++ b/doc/src/sgml/ref/create_role.sgml
@@ -285,10 +285,11 @@ in sync when changing the above synopsis!
   IN ROLE role_name
   

-The IN ROLE clause lists one or more existing
-roles to which the new role will be immediately added as a new
-member.  (Note that there is no option to add the new role as an
-administrator; use a separate GRANT command to do that.)
+The IN ROLE clause causes the new role to
+be automatically added as a member of the specified existing
+roles. (Note that there is no option to add the new role as an
+administrator; use a separate GRANT command
+to do that.)

   
  
@@ -306,9 +307,9 @@ in sync when changing the above synopsis!
   ROLE role_name
   

-The ROLE clause lists one or more existing
-roles which are automatically added as members of the new role.
-(This in effect makes the new role a group.)
+The ROLE clause causes one or more specified
+existing roles to be automatically added as members of the new
+role.  This in effect makes the new role a group.

   
  
diff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml
index c5c74b86a2..58ed111642 100644
--- a/doc/src/sgml/ref/createuser.sgml
+++ b/doc/src/sgml/ref/createuser.sgml
@@ -85,11 +85,10 @@ PostgreSQL documentation
   --admin=role
   

-Indicates a role that will be immediately added as a member of the new
+Indicates an existing role that will be automatically added as a member of the new
 role with admin option, giving it the right to grant membership in the
-new role to others.  Multiple roles to add as members (with admin
-option) of the new role can be specified by writing multiple
--a switches.
+new role to others.  Multiple existing roles can be specified by
+writing multiple -a switches.

   
  
@@ -153,11 +152,10 @@ PostgreSQL documentation
   --role=role
   

- Indicates a role to which this role will be added immediately as a new
- member. Multiple roles to which this role will be added as a member
- can be specified by writing multiple
- -g switches.
- 
+Indicates the new role should be automatically added as a member
+of the specified existing role. Multiple existing roles can be
+specified by writing multiple -g switches.
+   
   
  
 
@@ -227,9 +225,9 @@ PostgreSQL documentation
   --member=role
   

-Indicates role that will be immediately added as a member of the new
-role.  Multiple roles to add as members of the new role can be specified
-by writing multiple -m switches.
+Indicates the specified existing role should be automatically
+added as a member of the new role. Multiple existing roles can
+be specified by writing multiple -m switches.

Re: running logical replication as the subscription owner

2023-05-15 Thread Jelte Fennema

On Mon, 15 May 2023 at 14:47, Masahiko Sawada  wrote:
> Thank you for the patch! I think we might want to have tests for it.

Yes, code looks good. But indeed some tests would be great. It seems
we forgot to actually do:

On Fri, 12 May 2023 at 13:55, Ajin Cherian  wrote:
> CREATE ROLE regress_alice NOSUPERUSER LOGIN;
> CREATE ROLE regress_admin SUPERUSER LOGIN;
> ...
>
> What I see is that as part of tablesync, the trigger invokes an
> updates admin_audit which it shouldn't, as the table owner
> of alice.test should not have access to the
> table admin_audit. This means the table copy is being invoked as the
> subscription owner and not the table owner.

I think having this as a tap/regress test would be very useful.

Re: [DOC] Update ALTER SUBSCRIPTION documentation v2

2023-05-15 Thread Robert Sjöblom




On 2023-05-05 15:17, Robert Sjöblom wrote:


Hi,

We have recently used the PostgreSQL documentation when setting up our 
logical replication. We noticed there was a step missing in the 
documentation on how to drop a logical replication subscription with a 
replication slot attached.


Following discussions, please see revised documentation patch.

Best regards,
Robert Sjöblom
--
Innehållet i detta e-postmeddelande är konfidentiellt och avsett endast för 
adressaten.Varje spridning, kopiering eller utnyttjande av innehållet är 
förbjuden utan tillåtelse av avsändaren. Om detta meddelande av misstag 
gått till fel adressat vänligen radera det ursprungliga meddelandet och 
underrätta avsändaren via e-post
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index 8d997c983f..4be6ddb873 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -86,8 +86,9 @@ DROP SUBSCRIPTION [ IF EXISTS ] nameDROP SUBSCRIPTION command will fail.  To proceed in
-   this situation, disassociate the subscription from the replication slot by
-   executing ALTER SUBSCRIPTION ... SET (slot_name = NONE).
+   this situation, first DISABLE the subscription, and then 
+   disassociate it from the replication slot by executing 
+   ALTER SUBSCRIPTION ... SET (slot_name = NONE). 
After that, DROP SUBSCRIPTION will no longer attempt any
actions on a remote host.  Note that if the remote replication slot still
exists, it (and any related table synchronization slots) should then be

Re: running logical replication as the subscription owner

2023-05-15 Thread Masahiko Sawada

On Mon, May 15, 2023 at 5:44 PM Ajin Cherian  wrote:
>
> On Fri, May 12, 2023 at 9:55 PM Ajin Cherian  wrote:
> >
> > If nobody else is working on this, I can come up with a patch to fix this
> >
>
> Attaching a patch which attempts to fix this.
>

Thank you for the patch! I think we might want to have tests for it.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: walsender performance regression due to logical decoding on standby changes

2023-05-15 Thread Masahiko Sawada

On Mon, May 15, 2023 at 1:49 PM Thomas Munro  wrote:
>
> On Fri, May 12, 2023 at 11:58 PM Bharath Rupireddy
>  wrote:
> > Andres, rightly put it - 'mis-using' CV infrastructure. It is simple,
> > works, and makes the WalSndWakeup() easy solving the performance
> > regression.
>
> Yeah, this seems OK, and better than the complicated alternatives.  If
> one day we want to implement CVs some other way so that this
> I-know-that-CVs-are-really-made-out-of-latches abstraction leak
> becomes a problem, and we still need this, well then we can make a
> separate latch-wait-list thing.

+1

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Using make_ctags leaves tags files in git

2023-05-15 Thread Alvaro Herrera

On 2023-May-14, Tom Lane wrote:

> Steve Chavez  writes:
> > In this case I just propose adding 'tags'. I believe it's reasonable to
> > ignore these as they're produced by make_ctags.
> 
> Our policy on this is that the project's .gitignore files should ignore
> files that are produced by our standard build scripts.

But make_ctags is *our* script, so I think this rule applies to them as
well.  (In any case, what can be hurt?  We're not going to add any files
to git named "tags" anyway.)

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/

Re: Order changes in PG16 since ICU introduction

2023-05-15 Thread Peter Eisentraut

On 11.05.23 23:29, Jeff Davis wrote:

New patch series attached.

=== 0001: fix bug that allows creating hidden collations

Bug:
https://www.postgresql.org/message-id/051c9395cf880307865ee8b17acdbf7f838c1e39.ca...@j-davis.com

This is still being debated in the other thread. Not really related to
this thread, so I suggest dropping it from this patch series.

=== 0002: handle some kinds of libc-stlye locale strings

ICU used to handle libc locale strings like 'fr_FR@euro', but doesn't
in later versions. Handle them in postgres for consistency.

I tend to agree with ICU that these variants are obsolete, and we don't
need to support them anymore. If this were a tiny patch, then maybe ok,
but the way it's presented here the whole code is duplicated between
pg_locale.c and initdb.c, which is not great.

=== 0003: reduce icu_validation_level to WARNING

Given that we've seen some inconsistency in which locale names are
accepted in different ICU versions, it seems best not to be too strict.
Peter Eisentraut suggested that it be set to ERROR originally, but a
WARNING should be sufficient to see problems without introducing risks
migrating to version 16.

I'm not sure why this is the conclusion. Presumably, the detection
capabilities of ICU improve over time, so we want to take advantage of
that? What are some example scenarios where this change would help?

=== 0004-0006:

To solve the issues that have come up in this thread, we need CREATE
DATABASE (and createdb and initdb) to use LOCALE to mean the collation
locale regardless of which provider is in use (which is what 0006
does).

0006 depends on ICU handling libc locale names. It already does a good
job for most libc locale names (though patch 0002 fixes a few cases
where it doesn't). There may be more cases, but for the most part libc
names are interpreted in a reasonable way. But one important case is
missing: ICU does not handle the "C" locale as we expect (that is,
using memcmp()).

We've already allowed users to create ICU collations with the C locale
in the past, which uses the root collation (not memcmp()), and we need
to keep supporting that for upgraded clusters.

I'm not sure I agree that we need to keep supporting that. The only way
you could get that in past releases is if you specify explicitly, "give
me provider ICU and locale C", and then it wouldn't actually even work
correctly. So nobody should be using that in practice, and nobody
should have stumbled into that combination of settings by accident.

3. Introduce collation provider "none", which is always memcmp-based
(patch 0004). It's equivalent to the libc locale=C, but it allows
specifying the LC_COLLATE and LC_CTYPE independently. A command like
CREATE DATABASE ... LOCALE_PROVIDER='icu' ICU_LOCALE='C'
LC_COLLATE='en_US' would get changed (with a NOTICE) to provider "none"
(patch 0005), so you'd have datlocprovider=none, datcollate=en_US. For
the database default collation, that would always use memcmp(), but the
server environment LC_COLLATE would be set to en_US as the user
specified.

This seems most attractive, but I think it's quite invasive at this
point, especially given the dubious premise (see above).

=== 0007: Add a GUC to control the default collation provider

Having a GUC would make it easier to migrate to ICU without surprises.
This only affects the default for CREATE COLLATION, not CREATE DATABASE
(and obviously not initdb).

It's not clear to me why we would want that. Also not clear why it
should only affect CREATE COLLATION.

Re: Missing update of all_hasnulls in BRIN opclasses

2023-05-15 Thread Alvaro Herrera

On 2023-May-07, Tomas Vondra wrote:

> > Álvaro wrote:
> >> In backbranches, the new field to BrinMemTuple needs to be at the end of
> >> the struct, to avoid ABI breakage.
> 
> Unfortunately, this is not actually possible :-(
> 
> The BrinMemTuple has a FLEXIBLE_ARRAY_MEMBER at the end, so we can't
> place anything after it. I think we have three options:
> 
> a) some other approach? - I really can't see any, except maybe for going
> back to the previous approach (i.e. encoding the info using the existing
> BrinValues allnulls/hasnulls flags)

Actually, mine was quite the stupid suggestion: the BrinMemTuple already
has a 3 byte hole in the place where you originally wanted to add the
flag:

struct BrinMemTuple {
_Bool  bt_placeholder;   /* 0 1 */

/* XXX 3 bytes hole, try to pack */

BlockNumberbt_blkno; /* 4 4 */
MemoryContext  bt_context;   /* 8 8 */
Datum *bt_values;/*16 8 */
_Bool *bt_allnulls;  /*24 8 */
_Bool *bt_hasnulls;  /*32 8 */
BrinValues bt_columns[]; /*40 0 */

/* size: 40, cachelines: 1, members: 7 */
/* sum members: 37, holes: 1, sum holes: 3 */
/* last cacheline: 40 bytes */
};

so putting it there was already not causing any ABI breakage.  So, the
solution to this problem of not being able to put it at the end is just
to return the struct to your original formulation.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)

Re: pgsql: Clean up role created in new subscription test.

2023-05-15 Thread Daniel Gustafsson

> On 30 Mar 2023, at 22:29, Tom Lane  wrote:

> Well, we could do "select rolname from pg_roles order by 1" and
> actually compare the results of the two selects.  That might be
> advisable anyway, in order to produce a complaint with useful
> detail when there is something wrong.

I took a look at this and came up with the attached.  This adds a new parameter
to pg_regress for specifying a test which will be executed before and after the
suite, where the first invocation creates the expectfile for the second.  For
storing the expecfile the temp dir creation is somewhat refactored.  I've added
a sample test in the patch (to regress, not ECPG), but I'm sure it can be
expanded to be a bit more interesting.  The comment which is now incorrectly
formatted was left like that to make review easier, if this gets committed it
will be fixed then.

I opted for this to use the machinery that pg_regress already has rather than
add a new mechanism (and dependency) for running and verifying queries.  This
also avoids hardcoding the test making it easier to have custom queries during
hacking etc.

Looking at this I also found a bug introduced in the TAP format patch, which
made failed single run tests report as 0ms due to the parameters being mixed up
in the report function call.  This is in 0002, which I'll apply to HEAD
regardless of 0001 as they are unrelated.

--
Daniel Gustafsson

v1-0002-pg_regress-Fix-reported-runtime-for-failed-single.patch
Description: Binary data

v1-0001-pg_regress-Add-database-verification-test.patch
Description: Binary data

Re: running logical replication as the subscription owner

2023-05-15 Thread Ajin Cherian

On Fri, May 12, 2023 at 9:55 PM Ajin Cherian  wrote:
>
> If nobody else is working on this, I can come up with a patch to fix this
>

Attaching a patch which attempts to fix this.

regards,
Ajin Cherian
Fujitsu Australia


v1-0001-Fix-issue-where-the-copy-command-does-not-adhere-.patch
Description: Binary data

Re: walsender performance regression due to logical decoding on standby changes

2023-05-15 Thread Drouvot, Bertrand


Hi,

On 5/15/23 4:19 AM, Zhijie Hou (Fujitsu) wrote:

On Friday, May 12, 2023 7:58 PM Bharath Rupireddy 
 wrote:


On Wed, May 10, 2023 at 3:23 PM Drouvot, Bertrand

That's not the case with the attached v2 patch. Please have a look.




Thanks for V2! It does look good to me and I like the fact that
WalSndWakeup() does not need to loop on all the Walsenders slot
anymore (for both the physical and logical cases).


Thanks for updating the patch. I did some simple primary->standby replication 
test for the
patch and can see the degradation doesn't happen in the replication after 
applying it[1].



Thanks for the performance regression measurement!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Autogenerate some wait events code and documentation

2023-05-15 Thread Drouvot, Bertrand


Hi,

On 5/6/23 4:23 AM, Michael Paquier wrote:

On Thu, May 04, 2023 at 08:39:49AM +0200, Drouvot, Bertrand wrote:

On 5/1/23 1:59 AM, Michael Paquier wrote:
I'm not sure I like it. First, it does break the "Note" ordering as compare
to the current documentation. That's not a big deal though.

Secondly, what If we need to add some note(s) in the future for
another wait class? Having all the notes after all the tables are
generated would sound weird to me.


Appending these notes at the end of all the tables does not strike me
as a big dea, TBH.  But, well, my sole opinion is not the final choice
either.  For now, I am mostly tempted to keep the generation script as
minimalistic as possible.



I agree that's not a big deal and I'm not against having these notes at the end
of all the tables.


We could discuss another approach for the "Note" part if there is a
need to add one for an existing/new wait class though.




means, that was more a NIT comment from my side.


Documenting what's expected from the wait event classes is critical in
the .txt file as that's what developers are going to look at when
adding a new wait event.  Adding them in the header is less appealing
to me considering that is it now generated, and the docs provide a lot
of explanation as well.



Your argument that the header is now generated makes me change my mind: I
know think that having the comments in the .txt file is enough.


This has as extra consequence to require a change in
wait_event.h so as PG_WAIT_BUFFER_PIN is renamed to PG_WAIT_BUFFERPIN,
equally fine by me.  Logically, this rename should be done in a patch
of its own, for clarity.


Yes, I can look at it.
[...]
Agree, I'll look at this.


Thanks!


Please find the dedicated patch proposal in [1].

[1]: 
https://www.postgresql.org/message-id/c6f35117-4b20-4c78-1df5-d3056010dcf5%40gmail.com

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Introduce WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN

2023-05-15 Thread Drouvot, Bertrand


Hi hackers,

Please find attached a patch to $SUBJECT.

This is preliminary work to autogenerate some wait events
code and documentation done in [1].

The patch introduces 2 new "wait events" (WAIT_EVENT_EXTENSION
and WAIT_EVENT_BUFFER_PIN) and their associated functions
(pgstat_get_wait_extension() and pgstat_get_wait_bufferpin() resp.)

Please note that that would not break extensions outside contrib
that make use of the existing PG_WAIT_EXTENSION.

[1]: 
https://www.postgresql.org/message-id/flat/77a86b3a-c4a8-5f5d-69b9-d70bbf2e9...@gmail.com

Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From b48b0189067b6ce799636e469a10b0e265ff4473 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Sat, 13 May 2023 07:59:08 +
Subject: [PATCH v1] Introducing WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN

---
 contrib/dblink/dblink.c  |  4 +-
 contrib/pg_prewarm/autoprewarm.c |  4 +-
 contrib/postgres_fdw/connection.c|  6 +--
 src/backend/storage/buffer/bufmgr.c  |  2 +-
 src/backend/storage/ipc/standby.c|  2 +-
 src/backend/utils/activity/wait_event.c  | 66 +---
 src/include/utils/wait_event.h   | 20 ++-
 src/test/modules/test_shm_mq/setup.c |  2 +-
 src/test/modules/test_shm_mq/test.c  |  2 +-
 src/test/modules/worker_spi/worker_spi.c |  2 +-
 10 files changed, 91 insertions(+), 19 deletions(-)
   8.9% contrib/dblink/
   4.7% contrib/pg_prewarm/
   9.4% contrib/postgres_fdw/
   3.4% src/backend/storage/buffer/
   3.3% src/backend/storage/ipc/
  48.6% src/backend/utils/activity/
  14.4% src/include/utils/
   4.6% src/test/modules/test_shm_mq/

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 55f75eff36..f167cb71d4 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -203,7 +203,7 @@ dblink_get_conn(char *conname_or_str,
dblink_connstr_check(connstr);
 
/* OK to make connection */
-   conn = libpqsrv_connect(connstr, PG_WAIT_EXTENSION);
+   conn = libpqsrv_connect(connstr, WAIT_EVENT_EXTENSION);
 
if (PQstatus(conn) == CONNECTION_BAD)
{
@@ -293,7 +293,7 @@ dblink_connect(PG_FUNCTION_ARGS)
dblink_connstr_check(connstr);
 
/* OK to make connection */
-   conn = libpqsrv_connect(connstr, PG_WAIT_EXTENSION);
+   conn = libpqsrv_connect(connstr, WAIT_EVENT_EXTENSION);
 
if (PQstatus(conn) == CONNECTION_BAD)
{
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index 93835449c0..d0efc9e524 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -237,7 +237,7 @@ autoprewarm_main(Datum main_arg)
(void) WaitLatch(MyLatch,
 WL_LATCH_SET | 
WL_EXIT_ON_PM_DEATH,
 -1L,
-PG_WAIT_EXTENSION);
+WAIT_EVENT_EXTENSION);
}
else
{
@@ -264,7 +264,7 @@ autoprewarm_main(Datum main_arg)
(void) WaitLatch(MyLatch,
 WL_LATCH_SET | 
WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 delay_in_ms,
-PG_WAIT_EXTENSION);
+WAIT_EVENT_EXTENSION);
}
 
/* Reset the latch, loop. */
diff --git a/contrib/postgres_fdw/connection.c 
b/contrib/postgres_fdw/connection.c
index da32d503bc..25d0c43b64 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -530,7 +530,7 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
/* OK to make connection */
conn = libpqsrv_connect_params(keywords, values,
   
false,   /* expand_dbname */
-  
PG_WAIT_EXTENSION);
+  
WAIT_EVENT_EXTENSION);
 
if (!conn || PQstatus(conn) != CONNECTION_OK)
ereport(ERROR,
@@ -863,7 +863,7 @@ pgfdw_get_result(PGconn *conn, const char *query)
   
WL_LATCH_SET | WL_SOCKET_READABLE |
   
WL_EXIT_ON_PM_DEATH,
   
PQsocket(conn),
-

Re: Redundant strlen(query) in get_rel_infos

2023-05-15 Thread Daniel Gustafsson

> On 15 May 2023, at 09:45, Michael Paquier  wrote:
> On Thu, May 11, 2023 at 11:57:37AM +0200, Daniel Gustafsson wrote:

>> Looking at the snprintf sites made me remember a patchset I worked on last 
>> year
>> (but I don't remember if I ended up submitting); there is no need to build 
>> one
>> of the queries on the stack as it has no variables.  The attached 0003 (which
>> needs a reindent of the query text) comes from that patchset.  I think we
>> should do this regardless.
> 
> Not sure that this is an improvement in itself as
> get_tablespace_paths() includes QUERY_ALLOC because
> executeQueryOrDie() does so, so this could become a problem if
> someones decides to copy-paste this code with a query becomes longer
> than QUERY_ALLOC once built?  Perhaps that's not worth worrying,

We already have lots of invocations of executeQueryOrDie which doesn't pass via
a QUERY_ALLOC buffer so I don't see any risk with adding one more.

--
Daniel Gustafsson

Re: psql: Could we get "-- " prefixing on the QUERY outputs? (ECHO_HIDDEN)

2023-05-15 Thread Laurenz Albe

On Mon, 2023-05-15 at 08:37 +0200, Pavel Stehule wrote:
> Dne po 15. 5. 2023 8:01 uživatel Kirk Wolak  napsal:
> > This would be a trivial change.  Willing to do it, and push it.
> > 
> > In effect, we have this GREAT feature:
> > \set ECHO_HIDDON on
> > 
> > Which outputs a bunch of queries (as you all know).
> > But somehow nobody thought that a user might want to paste ALL of the 
> > queries into their query editor, or even into another psql session, via (\e)
> > and NOT get a ton of syntax errors?
> > 
> > As an example: (added -- and a space)
> > 
> > -- * QUERY **
> > SELECT c2.relname, i.indisprimary, i.indisunique, i.indisclustered, 
> > i.indisvalid, pg_catalog.pg_get_indexdef(i.indexrelid, 0, true),
> >   pg_catalog.pg_get_constraintdef(con.oid, true), contype, condeferrable, 
> > condeferred, i.indisreplident, c2.reltablespace
> > FROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i
> >   LEFT JOIN pg_catalog.pg_constraint con ON (conrelid = i.indrelid AND 
> > conindid = i.indexrelid AND contype IN ('p','u','x'))
> > WHERE c.oid = '21949943' AND c.oid = i.indrelid AND i.indexrelid = c2.oid
> > ORDER BY i.indisprimary DESC, c2.relname;
> > -- **
> 
> This looks little bit strange
> 
> What about /* comments
> 
> Like
> 
> /*** Query /
> 
> Or just 
> 
>  Query 

+1 for either of Pavel's suggestions.

Yours,
Laurenz Albe

Re: Redundant strlen(query) in get_rel_infos

On Thu, May 11, 2023 at 11:57:37AM +0200, Daniel Gustafsson wrote:
> I think it's intentionally done in 73b9952e82 as defensive coding, and given
> that this is far from a hot codepath I think leaving them is better.
> 
> Instead I think it would be more worthwhile to remove these snprintf() made
> queries and use PQExpbuffers.  29aeda6e4e6 introduced that in pg_upgrade and 
> it
> is more in line with how we build queries in other tools.

Good idea to reduce the overall presence of QUERY_ALLOC in the
surroundings.

> Looking at the snprintf sites made me remember a patchset I worked on last 
> year
> (but I don't remember if I ended up submitting); there is no need to build one
> of the queries on the stack as it has no variables.  The attached 0003 (which
> needs a reindent of the query text) comes from that patchset.  I think we
> should do this regardless.

Not sure that this is an improvement in itself as
get_tablespace_paths() includes QUERY_ALLOC because
executeQueryOrDie() does so, so this could become a problem if
someones decides to copy-paste this code with a query becomes longer
than QUERY_ALLOC once built?  Perhaps that's not worth worrying, but I
like your suggestion of applying more PQExpbuffers, particularly if
applied in a consistent way across the board.  It could matter if the
code of get_tablespace_paths() is changed to use a query with
parameters.
--
Michael


signature.asc
Description: PGP signature

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-05-15 Thread Peter Smith

Hi Kuroda-san.

I looked at the latest patch v13-0001. Here are some minor comments.

==
src/bin/pg_upgrade/info.c

1. get_logical_slot_infos_per_db

I noticed that the way this is coded, 'ntups' and 'num_slots' seems to
have exactly the same meaning. IMO you can simplify this by removing
'ntups'.

BEFORE
+ int ntups;
+ int num_slots = 0;

SUGGESTION
+ int num_slots;

~

BEFORE
+ ntups = PQntuples(res);
+
+ if (ntups)
+ {

SUGGESTION
+ num_slots = PQntuples(res);
+
+ if (num_slots)
+ {

~

BEFORE
+ slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * ntups);

SUGGESTION
+ slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) *
num_slots);

~

BEFORE
+ for (slotnum = 0; slotnum < ntups; slotnum++)
+ {
+ LogicalSlotInfo *curr = [num_slots++];

SUGGESTION
+ for (slotnum = 0; slotnum < ntups; slotnum++)
+ {
+ LogicalSlotInfo *curr = [slotnum];

==

2. get_logical_slot_infos, print_slot_infos

> >
> > In another thread [1] I am posting some minor patch changes to the
> > VERBOSE logging (changes to double-quotes and commas etc.). Please
> > keep a watch on that thread because if gets pushed then this one will
> > be impacted. e.g. your logging here ought also to include the same
> > suggested double quotes.
>
> I thought it would be pushed soon, so the suggestion was included.

OK, but I think you have accidentally missed adding similar new double
quotes to all other VERBOSE logging in your patch.

e.g. see get_logical_slot_infos:
pg_log(PG_VERBOSE, "Database: %s", pDbInfo->db_name);

--
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: createuser --memeber and PG 16

On Thu, May 11, 2023 at 09:34:42AM -0400, Bruce Momjian wrote:
> On Thu, May 11, 2023 at 02:21:22PM +0200, Daniel Gustafsson wrote:
>> IIRC there were a number of ideas presented in that thread but backwards
>> compatibility with --role already "taken" made it complicated, so --role and
>> --member were the least bad options.
>> 
>>> At a minimum I would like to apply the attached doc patch to PG 16 to
>>> improve awkward wording in CREATE ROLE and createuser.
>> 
>> No objection.

None from here as well.

>> +role.  (This in effect makes the new role a group.)
>> While not introduced here, isn't the latter part interesting enough to 
>> warrant
>> not being inside parenthesis?
> 
> The concept of group itself is deprecated, which I think is why the
> parenthesis are used.

Not sure on this one.  The original docs come from 58d214e, and this
sentence was already in there.
--
Michael


signature.asc
Description: PGP signature

Re: createuser --memeber and PG 16

On Fri, May 12, 2023 at 04:35:34PM +0200, Peter Eisentraut wrote:
> it's not intuitive whether foo becomes a member of bar or bar becomes a
> member of foo.  Maybe something more verbose like --member-of would help?

Indeed, presented like that it could be confusing, and --member-of
sounds like it could be a good idea instead of --member.
--
Michael


signature.asc
Description: PGP signature

Re: psql: Could we get "-- " prefixing on the QUERY outputs? (ECHO_HIDDEN)

On Mon, May 15, 2023 at 2:37 AM Pavel Stehule 
wrote:

> Hi
>
> Dne po 15. 5. 2023 8:01 uživatel Kirk Wolak  napsal:
>
>> This would be a trivial change.  Willing to do it, and push it.
>>
>> In effect, we have this GREAT feature:
>> \set ECHO_HIDDON on
>> -- **
>>
>> Kirk...
>>
>
> This looks little bit strange
>
> What about /* comments
>
> Like
>
> /*** Query /
>
> Or just
>
>  Query 
>
> Regards
>
> Pavel
>

Actually, I am open to suggestions.
/* */
Are good comments, usually safer!

Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~

On Mon, May 15, 2023 at 03:22:29PM +1200, Thomas Munro wrote:
> LGTM.

Thanks, fixed!
--
Michael


signature.asc
Description: PGP signature

Re: psql: Could we get "-- " prefixing on the QUERY outputs? (ECHO_HIDDEN)

2023-05-15 Thread Pavel Stehule

Hi

Dne po 15. 5. 2023 8:01 uživatel Kirk Wolak  napsal:

> This would be a trivial change.  Willing to do it, and push it.
>
> In effect, we have this GREAT feature:
> \set ECHO_HIDDON on
>
> Which outputs a bunch of queries (as you all know).
> But somehow nobody thought that a user might want to paste ALL of the
> queries into their query editor, or even into another psql session, via (\e)
> and NOT get a ton of syntax errors?
>
> As an example: (added -- and a space)
>
> -- * QUERY **
> SELECT c2.relname, i.indisprimary, i.indisunique, i.indisclustered,
> i.indisvalid, pg_catalog.pg_get_indexdef(i.indexrelid, 0, true),
>   pg_catalog.pg_get_constraintdef(con.oid, true), contype, condeferrable,
> condeferred, i.indisreplident, c2.reltablespace
> FROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i
>   LEFT JOIN pg_catalog.pg_constraint con ON (conrelid = i.indrelid AND
> conindid = i.indexrelid AND contype IN ('p','u','x'))
> WHERE c.oid = '21949943' AND c.oid = i.indrelid AND i.indexrelid = c2.oid
> ORDER BY i.indisprimary DESC, c2.relname;
> -- **
>
> -- * QUERY **
> SELECT pol.polname, pol.polpermissive,
>   CASE WHEN pol.polroles = '{0}' THEN NULL ELSE
> pg_catalog.array_to_string(array(select rolname from pg_catalog.pg_roles
> where oid = any (pol.polroles) order by 1),',') END,
>   pg_catalog.pg_get_expr(pol.polqual, pol.polrelid),
>   pg_catalog.pg_get_expr(pol.polwithcheck, pol.polrelid),
>   CASE pol.polcmd
> WHEN 'r' THEN 'SELECT'
> WHEN 'a' THEN 'INSERT'
> WHEN 'w' THEN 'UPDATE'
> WHEN 'd' THEN 'DELETE'
> END AS cmd
> FROM pg_catalog.pg_policy pol
> WHERE pol.polrelid = '21949943' ORDER BY 1;
> -- **
>
> Kirk...
>

This looks little bit strange

What about /* comments

Like

/*** Query /

Or just

 Query 

Regards

Pavel

>

RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-05-15 Thread Hayato Kuroda (Fujitsu)

Dear Peter,

Thank you for reviewing! PSA new version.

> 1. check_new_cluster
> 
> + if (user_opts.include_logical_slots)
> + {
> + get_logical_slot_infos(_cluster);
> + check_for_parameter_settings(_cluster);
> + }
> +
>   check_new_cluster_is_empty();
> ~
> 
> The code is OK, but maybe your reply/explanation (see [2] #2) saying
> get_logical_slot_infos() needs to be called before
> check_new_cluster_is_empty() would be good to have in a comment here?

Indeed, added.

> src/bin/pg_upgrade/info.c
> 
> 2. get_logical_slot_infos
> 
> + if (ntups)
> + slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * ntups);
> + else
> + {
> + slotinfos = NULL;
> + goto cleanup;
> + }
> +
> + i_slotname = PQfnumber(res, "slot_name");
> + i_plugin = PQfnumber(res, "plugin");
> + i_twophase = PQfnumber(res, "two_phase");
> +
> + for (slotnum = 0; slotnum < ntups; slotnum++)
> + {
> + LogicalSlotInfo *curr = [num_slots++];
> +
> + curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
> + curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
> + curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
> + }
> +
> +cleanup:
> + PQfinish(conn);
> 
> IMO the goto/label coding is not warranted here - a simple if/else can
> do the same thing.

Yeah, I could simplify by if-statement. Additionally, some definitions of 
variables
are moved to the code block.

> 3. free_db_and_rel_infos, free_logical_slot_infos
> 
> static void
> free_db_and_rel_infos(DbInfoArr *db_arr)
> {
> int dbnum;
> 
> for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
> {
> free_rel_infos(_arr->dbs[dbnum].rel_arr);
> pg_free(db_arr->dbs[dbnum].db_name);
> }
> pg_free(db_arr->dbs);
> db_arr->dbs = NULL;
> db_arr->ndbs = 0;
> }
> 
> ~
> 
> In v12 now you removed the free_logical_slot_infos(). But isn't it
> better to still call free_logical_slot_infos() from the above
> free_db_and_rel_infos() still so the slot memory
> (dbinfo->slot_arr.slots) won't stay lying around?

The free_db_and_rel_infos() is called at restore phase, and slot_arr has 
malloc'd
members only when logical slots are defined on new_cluster. In this case the 
FATAL
error is occured in the checking phase, so there is no possibility to reach 
restore
phase.

> 4. get_logical_slot_infos, print_slot_infos
> 
> In another thread [1] I am posting some minor patch changes to the
> VERBOSE logging (changes to double-quotes and commas etc.). Please
> keep a watch on that thread because if gets pushed then this one will
> be impacted. e.g. your logging here ought also to include the same
> suggested double quotes.

I thought it would be pushed soon, so the suggestion was included.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v13-0001-pg_upgrade-Add-include-logical-replication-slots.patch
Description:  v13-0001-pg_upgrade-Add-include-logical-replication-slots.patch


v13-0002-Always-persist-to-disk-logical-slots-during-a-sh.patch
Description:  v13-0002-Always-persist-to-disk-logical-slots-during-a-sh.patch


v13-0003-pg_upgrade-Add-check-function-for-include-logica.patch
Description:  v13-0003-pg_upgrade-Add-check-function-for-include-logica.patch


v13-0004-Change-the-method-used-to-check-logical-replicat.patch
Description:  v13-0004-Change-the-method-used-to-check-logical-replicat.patch

Re: Allow pg_archivecleanup to remove backup history files

On Mon, May 15, 2023 at 03:16:46PM +0900, torikoshia wrote:
> On 2023-05-15 09:18, Michael Paquier wrote:
>> How about plugging in some long options, and use something more
>> explicit like --clean-backup-history?
> 
> Agreed.

If you begin to implement that, it seems to me that this should be
shaped with a first separate patch that refactors the code to use
getopt_long(), and a second patch for the proposed feature that builds
on top of it.
--
Michael


signature.asc
Description: PGP signature

Re: Allow pg_archivecleanup to remove backup history files

2023-05-15 Thread torikoshia


On 2023-05-15 09:18, Michael Paquier wrote:

On Fri, May 12, 2023 at 05:53:45PM +0900, torikoshia wrote:

On 2023-05-10 17:52, Bharath Rupireddy wrote:
I was a little concerned about what to do when deleting both the files
ending in .gz and backup history files.
Is making it possible to specify both "-x .backup" and "-x .gz" the 
way to

go?

I also concerned someone might add ".backup" to WAL files, but does 
that

usually not happen?


Depends on the archive command, of course.  I have seen code using
this suffix for some segment names in the past, FWIW.


Thanks for the information.
I'm going to stop adding special logic for "-x .backup" and add a new
option for removing backup history files.


Comments on the patch:
1. Why just only the backup history files? Why not remove the 
timeline
history files too? Is it because there may not be as many tli 
switches

happening as backups?


Yeah, do you think we should also add logic for '-x .history'?


Timeline history files can be critical pieces when it comes to
assigning a TLI as these could be retrieved by a restore_command
during recovery for a TLI jump or just assign a new TLI number at the
end of recovery, still you could presumably remove the TLI history
files that are older than the TLI the segment defined refers too.
(pg_archivecleanup has no specific logic to look at the history with
child TLIs for example, to keep it simple, and I'd rather keep it this
way).  There may be an argument for making that optional, of course,
but it does not strike me as really necessary compared to the backup
history files which are just around for debugging purposes.


Agreed.


2.+sub remove_backuphistoryfile_run_check
+{
Why to invent a new function when run_check() can be made generic 
with

few arguments passed?


Thanks, I'm going to reconsider it.


+   
+ Remove files including backup history files, whose suffix is
.backup.
+ Note that when oldestkeptwalfile
is a backup history file,
+ specified file is kept and only preceding WAL files and
backup history files are removed.
+   

This addition to the documentation looks unprecise to me.  Backup
history files have a more complex format than just the .backup
suffix, and this is documented in backup.sgml.


I'm going to remove the explanation for the backup history files and
just add the hyperlink to the original explanation in backup.sgml.


How about plugging in some long options, and use something more
explicit like --clean-backup-history?


Agreed.



-   if ((IsXLogFileName(walfile) || IsPartialXLogFileName(walfile)) &&
+   if (((IsXLogFileName(walfile) || IsPartialXLogFileName(walfile)) ||
+   (removeBackupHistoryFile && 
IsBackupHistoryFileName(walfile))) &&

strcmp(walfile + 8, exclusiveCleanupFileName + 8) < 0)

Could it be a bit cleaner to split this check in two, saving one level
of indentation on the way for its most inner loop?  I would imagine
something like:
/* Check file name */
if (!IsXLogFileName(walfile) &&
!IsPartialXLogFileName(walfile))
continue;
/* Check cutoff point */
if (strcmp(walfile + 8, exclusiveCleanupFileName + 8) >= 0)
continue;
//rest of the code doing the unlinks.
--

Thanks, that looks better.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

psql: Could we get "-- " prefixing on the QUERY outputs? (ECHO_HIDDEN)