Re: Performance degradation on concurrent COPY into a single relation in PG16.

2023-07-24 Thread Andres Freund
Hi,

Hm, in some cases your patch is better, but in others both the old code
(8692f6644e7) and HEAD beat yours on my machine. TBH, not entirely sure why.

prep:
COPY (SELECT generate_series(1, 200) a, (random() * 10 - 5)::int b, 3243423 c) TO '/tmp/lotsaints.copy';
DROP TABLE lotsaints; CREATE UNLOGGED TABLE lotsaints(a int, b int, c int);

benchmark:
psql -qX -c 'truncate lotsaints' && pgbench -n -P1 -f <( echo "COPY lotsaints FROM '/tmp/lotsaints.copy';") -t 15

I disabled turbo mode, pinned the server to a single core of my Xeon Gold 5215:

HEAD:   812.690

your patch: 821.354

strtoint from 8692f6644e7:  824.543

strtoint from 6b423ec677d^: 806.678

(for the "strtoint from" variants, I did not replace the goto labels, so that
part is unchanged and unrelated)


IOW, for me the code from 15 is the fastest by a good bit... There's an imul,
sure, but the fact that it sets a flag makes it faster than having to add more
tests and branches.


Looking at a profile reminded me of how silly it is that we are wasting a good
chunk of the time in these isdigit() checks, even though we already rely on
the ASCII values (via *ptr++ - '0'). I think isdigit() is inlined in the
headers on some platforms, but not others (glibc). And we've even done so
already for octal and binary!
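
Concretely, the idea is something like the following sketch (not the actual
pg_strtoint32() code, and with overflow handling omitted to keep it short):
the ctype.h isdigit() call is replaced by a plain ASCII range check, matching
the assumption the code already makes with *ptr++ - '0'.

#include <stdbool.h>
#include <stdint.h>

/* Plain ASCII range check instead of a ctype.h isdigit() lookup. */
static inline bool
ascii_isdigit(char c)
{
	return c >= '0' && c <= '9';
}

/*
 * Consume a run of decimal digits starting at ptr, accumulating the value
 * into *result.  Returns the number of digits consumed.
 */
static int
consume_decimal_digits(const char *ptr, int32_t *result)
{
	const char *start = ptr;
	int32_t		tmp = 0;

	while (ascii_isdigit(*ptr))
		tmp = tmp * 10 + (*ptr++ - '0');

	*result = tmp;
	return (int) (ptr - start);
}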

Open coding isdigit() gives us:


HEAD:   797.434

your patch: 803.570

strtoint from 8692f6644e7:  778.943

strtoint from 6b423ec677d^: 777.741


It's somewhat odd that HEAD and your patch switch position here...


> - else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
> + /* process hex digits */
> + else if (ptr[1] == 'x' || ptr[1] == 'X')
>   {
>
>   firstdigit = ptr += 2;

I find this unnecessarily hard to read. I realize it's been added in
6fcda9aba83, but I don't see a reason to use such a construct here.


I find it somewhat grating how much duplication there now is in this
code, with essentially the same logic repeated for all the bases...


Greetings,

Andres Freund




Re: [BUG] Crash on pgbench initialization.

2023-07-24 Thread Andres Freund
Hi,

On 2023-07-24 09:42:44 -0700, Andres Freund wrote:
> > I don't know this code at all, but I hope that this can be solved with
> > just Anton's proposed patch.
> 
> Yes, it's just that off-by-one.  I need to check if there's a similar bug for
> local / temp table buffers though.

Doesn't appear that way. We *do* fail if there's only 1 remaining buffer, but
we already did before my change (because we also need to pin the FSM). I don't
think that's an issue worth worrying about; if all but one of the local
buffers are pinned, you're going to have problems anyway.

Thanks Anton / Victoria for the report and fix. Pushed.

Greetings,

Andres Freund




Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2023-07-24 Thread Thomas Munro
On Tue, Jul 25, 2023 at 8:18 AM Robert Haas  wrote:
> (Yeah, I know we have code to verify checksums during a base
> backup, but as discussed elsewhere, it doesn't work.)

BTW the code you are referring to there seems to think 4KB page-halves
are atomic. Not sure if that's imagining page-level locking in ancient
Linux (?), or imagining the default setvbuf() buffer size observed with
some specific implementation of fread(), or confusing power-failure
sector-based atomicity with concurrent-access atomicity, or something
else. But for the record, what we actually see in this scenario on ext4
is the old/new page contents mashed together on much smaller boundaries
(maybe cache lines), caused by duelling concurrent memcpy() to/from the
buffer, independent of any buffer/page-level implementation details we
might have been thinking of with that code. Makes me wonder if it's even
technically sound to examine the LSN.

> It's also why we
> have to force full-page write on during a backup. But the whole thing
> is nasty because you can't really verify anything about the backup you
> just took. It may be full of gibberish blocks but don't worry because,
> if all goes well, recovery will fix it. But you won't really know
> whether recovery actually does fix it. You just kind of have to cross
> your fingers and hope.

Well, not without also scanning the WAL for FPIs, anyway...  And
conceptually, that's why I think we probably want an 'FPI' of the
control file somewhere.




Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

2023-07-24 Thread Peter Geoghegan
I've been working on a variety of improvements to nbtree's native
ScalarArrayOpExpr execution. This builds on Tom's work in commit
9e8da0f7.

Attached patch is still at the prototype stage. I'm posting it as v1 a
little earlier than I usually would because there has been much back
and forth about it on a couple of other threads involving Tomas Vondra
and Jeff Davis -- seems like it would be easier to discuss with
working code available.

The patch adds two closely related enhancements to ScalarArrayOp
execution by nbtree:

1. Execution of quals with ScalarArrayOpExpr clauses during nbtree
index scans (for equality-strategy SK_SEARCHARRAY scan keys) can now
"advance the scan's array keys locally", which sometimes avoids
significant amounts of unneeded pinning/locking of the same set of
index pages.

SAOP index scans become capable of eliding primitive index scans for
the next set of array keys in line in cases where it isn't truly
necessary to descend the B-Tree again. Index scans are now capable of
"sticking with the existing leaf page for now" when it is determined
that the end of the current set of array keys is physically close to
the start of the next set of array keys (the next set in line to be
materialized by the _bt_advance_array_keys state machine). This is
often possible.

Naturally, we still prefer to advance the array keys in the
traditional way ("globally") much of the time. That means we'll
perform another _bt_first/_bt_search descent of the index, starting a
new primitive index scan. Whether we try to skip pages on the leaf
level or stick with the current primitive index scan (by advancing
array keys locally) is likely to vary a great deal. Even during the
same index scan. Everything is decided dynamically, which is the only
approach that really makes sense.

This optimization can significantly lower the number of buffers pinned
and locked in cases with significant locality, and/or with many array
keys with no matches. The savings (when measured in buffers
pinned/locked) can be as high as 10x, 100x, or even more. Benchmarking
has shown that transaction throughput for variants of "pgbench -S"
designed to stress the implementation (hundreds of array constants)
under concurrent load can have up to 5.5x higher transaction
throughput with the patch. Less extreme cases (10 array constants,
spaced apart) see about a 20% improvement in throughput. There are
similar improvements to latency for the patch, in each case.

2. The optimizer now produces index paths with multiple SAOP clauses
(or other clauses we can safely treat as "equality constraints") on
each of the leading columns from a composite index -- all while
preserving index ordering/useful pathkeys in most cases.

The nbtree work from item 1 is useful even with the simplest IN() list
query involving a scan of a single column index. Obviously, it's very
inefficient for the nbtree code to use 100 primitive index scans when
1 is sufficient. But that's not really why I'm pursuing this project.
My real goal is to implement (or to enable the implementation of) a
whole family of useful techniques for multi-column indexes. I call
these "MDAM techniques", after the 1995 paper "Efficient Search of
Multidimensional B-Trees" [1][2]-- MDAM is short for "multidimensional
access method". In the context of the paper, "dimension" refers to
dimensions in a decision support system.

The most compelling cases for the patch all involve multiple index
columns with multiple SAOP clauses (especially where each column
represents a separate "dimension", in the DSS sense). It's important
that index sort order be preserved whenever possible, too. Sometimes this is
directly useful (e.g., because the query has an ORDER BY), but it's
always indirectly needed, on the nbtree side (when the optimizations
are applicable at all). The new nbtree code now has special
requirements surrounding SAOP search type scan keys with composite
indexes. These requirements make changes in the optimizer all but
essential.

Index order
===

As I said, there are cases where preserving index order is immediately
and obviously useful, in and of itself. Let's start there.

Here's a test case that you can run against the regression test database:

pg@regression:5432 =# create index order_by_saop on tenk1(two,four,twenty);
CREATE INDEX

pg@regression:5432 =# EXPLAIN (ANALYZE, BUFFERS)
select ctid, thousand from tenk1
where two in (0,1) and four in (1,2) and twenty in (1,2)
order by two, four, twenty limit 20;

With the patch, this query gets 13 buffer hits. On the master branch,
it gets 1377 buffer hits -- which exceeds the number you'll get from a
sequential scan by about 4x. No coaxing was required to get the
planner to produce this plan on the master branch. Almost all of the
savings shown here are related to heap page buffer hits -- the nbtree
changes don't directly help in this particular example (strictly
speaking, you only need the optimizer changes to get this result).

Obviously, the 

Re: Removing the fixed-size buffer restriction in hba.c

2023-07-24 Thread Michael Paquier
On Mon, Jul 24, 2023 at 08:21:54PM -0400, Tom Lane wrote:
> Yeah, I dithered about that.  I felt like removing it only to put it
> back later would be silly, but then again maybe there won't ever be
> a need to put it back.  I'm OK with removing it if there's not
> objections.

Seeing what this code does, the odds of needing an err_msg seem rather
low to me, so I would just remove it for now for the sake of clarity.
It can always be added later if need be.
--
Michael


signature.asc
Description: PGP signature


Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2023-07-24 Thread Thomas Munro
On Tue, Jul 25, 2023 at 6:04 AM Stephen Frost  wrote:
> * Thomas Munro (thomas.mu...@gmail.com) wrote:
> > Here's a new minimal patch that solves only the bugs in basebackup +
> > the simple SQL-facing functions that read the control file, by simply
> > acquiring ControlFileLock in the obvious places.  This should be
> > simple enough for back-patching?
>
> I don't particularly like fixing this in a way that only works for
> pg_basebackup and means that the users of other backup tools don't have
> a solution to this problem.  What are we supposed to tell users of
> pgBackRest when they see this fix for pg_basebackup in the next set of
> minor releases and they ask us if we've addressed this risk?
>
> We might be able to accept the 're-read on CRC failure' approach, if it
> were being used for pg_controldata and we documented that external
> tools and especially backup tools using the low-level API are required
> to check the CRC and to re-read on failure if accessing a running
> system.

Hi Stephen, David, and thanks for looking.  Alright, let's try that idea out.

0001 + 0002.  Acquire the lock in the obvious places in the backend,
to completely exclude the chance of anything going wrong for the easy
cases, including pg_basebackup.  (This is the v4 from yesterday).
And...

0003.  Document this problem and how to detect it, in the
low-level-backup section.  Better words welcome.  And...

0004.  Retries for front-end programs, using the idea suggested by Tom
(to wit: if it fails, retry until it fails with the same CRC value
twice).  It's theoretically imperfect, but it's highly unlikely to
fail in practice and the best we can do without file locks or a
connection to the server AFAICT.  (This is the first patch I posted,
adjusted to give up after 10 (?) loops, and not to bother at all in
backend code since that takes ControlFileLock with the 0001 patch).
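
In rough outline, the frontend retry loop is shaped like the sketch below.
This is not the attached 0004 patch; read_control_image(), control_crc_ok()
and stored_crc() are hypothetical stand-ins for the real frontend routines
that read pg_control, verify its CRC, and extract the CRC value recorded in
the image just read.

#include <stdbool.h>
#include <stdint.h>

extern bool read_control_image(void *buf);		/* hypothetical */
extern bool control_crc_ok(const void *buf);	/* hypothetical */
extern uint32_t stored_crc(const void *buf);	/* hypothetical */

#define CONTROL_READ_RETRIES 10

static bool
read_control_retrying(void *buf)
{
	uint32_t	last_bad_crc = 0;
	bool		seen_bad_crc = false;

	for (int i = 0; i < CONTROL_READ_RETRIES; i++)
	{
		if (!read_control_image(buf))
			return false;			/* I/O error, not a torn read */
		if (control_crc_ok(buf))
			return true;			/* consistent copy */
		if (seen_bad_crc && stored_crc(buf) == last_bad_crc)
			return false;			/* same bad CRC twice: report corruption */
		last_bad_crc = stored_crc(buf);
		seen_bad_crc = true;
	}
	return false;					/* give up after too many torn reads */
}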

> While it's not ideal, maybe we could get away with changing the contents
> of the backup_label as part of a back-patch?  The documentation, at
> least on a quick look, says to copy the second field from pg_backup_stop
> into a backup_label file but doesn't say anything about what those
> contents are or if they can change.  That would at least address the
> concern of backups ending up with a corrupt pg_control file and not
> being able to be restored, even if tools aren't updated to verify the
> CRC or similar.  Of course, that's a fair bit of code churn for a
> backpatch, which I certainly understand as being risky.  If we can't
> back-patch that, it might still be the way to go moving forward, while
> also telling tools to check the CRC.  (I'm not going to try to figure
> out some back-patchable pretend solution for this for shell scripts that
> pretend to be able to backup running PG databases; this issue is
> probably the least of their issues anyway...)

I think you're probably right that anyone following those instructions
would be OK if we just back-patched such a thing, but it all seems a
little too much to me.  +1 to looking into that for v17 (and I guess
maybe someone could eventually argue for back-patching much later with
experience).  I'm sure other solutions are possible too... other
places to put a safe atomic copy of the control file could include: in
the WAL, or in extra files (pg_control.XXX) in the data directory +
some infallible way for recovery to choose which one to start up from.
Or something.
From 6322d7159ae418582392fa6c0e11863a016378f4 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 31 Jan 2023 20:01:49 +1300
Subject: [PATCH v5 1/4] Acquire ControlFileLock in base backups.

The control file could previously be overwritten while a base backup was
reading it.  On some file systems (ext4, ntfs), the concurrent read
could see a mixture of old and new contents.  The resulting copy would
be corrupted, and the new cluster would not be able to start up.  Other
files are read without interlocking, but those will be corrected by
replaying the log.

Exclude this possibility by acquiring ControlFileLock.  Copy into a
temporary buffer first because we don't want to hold the lock while in
sendFile() code that might sleep if throttling is configured.  This
requires a minor adjustment to sendFileWithContent() to support binary
data.

This does not address the same possibility when using external
file system copy tools (as described in the documentation under "Making
a Base Backup Using the Low Level API").

Back-patch to all supported releases.

Reviewed-by: Anton A. Melnikov  (earlier version)
Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
---
 src/backend/backup/basebackup.c | 36 ++---
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 45be21131c..f03496665f 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -24,6 +24,7 @@
 #include 

Re: Removing the fixed-size buffer restriction in hba.c

2023-07-24 Thread Tom Lane
Michael Paquier  writes:
> I find the choice to keep err_msg in next_token() a bit confusing in
> next_field_expand().  If no errors are possible, why not just get rid
> of it?

Yeah, I dithered about that.  I felt like removing it only to put it
back later would be silly, but then again maybe there won't ever be
a need to put it back.  I'm OK with removing it if there's not
objections.

regards, tom lane




Re: Removing the fixed-size buffer restriction in hba.c

2023-07-24 Thread Michael Paquier
On Mon, Jul 24, 2023 at 03:58:31PM -0700, Nathan Bossart wrote:
> On Mon, Jul 24, 2023 at 03:07:07PM -0300, Fabrízio de Royes Mello wrote:
>>> Given the infrequency of complaints, I'm inclined to apply
>>> the more thorough fix only in HEAD, and to just raise MAX_TOKEN
>>> in the back branches.  Thoughts?
>>>
>> 
>> It makes sense to change it only in HEAD.
> 
> I wouldn't be opposed to back-patching the more thorough fix, but I don't
> feel too strongly about it.

- * The token, if any, is returned at *buf (a buffer of size bufsz), and
+ * The token, if any, is returned into *buf, and
  * *lineptr is advanced past the token.

The comment indentation is a bit off here.

  * In event of an error, log a message at ereport level elevel, and also
- * set *err_msg to a string describing the error.  Currently the only
- * possible error is token too long for buf.
+ * set *err_msg to a string describing the error.  Currently no error
+ * conditions are defined.

I find the choice to keep err_msg in next_token() a bit confusing in
next_field_expand().  If no errors are possible, why not just get rid
of it?

FWIW, I don't feel strongly about backpatching that either, so for the
back branches I'd choose to bump up the token size limit and call it a
day.
--
Michael


signature.asc
Description: PGP signature


Re: Row pattern recognition

2023-07-24 Thread Vik Fearing

On 7/24/23 02:22, Tatsuo Ishii wrote:

What we are talking about here is *defining* a window
frame.

Well, we are defining a "reduced" window frame within a (full) window
frame. A "reduced" window frame is calculated each time when a window
function is called.



Why?  It should only be recalculated when the current row changes and
we need a new frame.  The reduced window frame *is* the window frame
for all functions over that window.


We already recalculate a frame each time a row is processed even
without RPR. See ExecWindowAgg.


Yes, after each row.  Not for each function.


Also RPR always requires a frame option ROWS BETWEEN CURRENT ROW,
which means the frame head is changed each time current row position
changes.


Off topic for now: I wonder why this restriction is in place and whether 
we should respect or ignore it.  That is a discussion for another time, 
though.



I strongly disagree with this.  Window functions do not need to know
how the frame is defined, and indeed they should not.


We already break the rule by defining *support functions. See
windowfuncs.c.


The support functions don't know anything about the frame, they just 
know when a window function is monotonically increasing and execution 
can either stop or be "passed through".



WinGetFuncArgInFrame should answer yes or no and the window function
just works on that. Otherwise we will get extension (and possibly even
core) functions that don't handle the frame properly.


Maybe I can move row_is_in_reduced_frame into WinGetFuncArgInFrame
just for convenience.


I have two comments about this:

It isn't just for convenience, it is for correctness.  The window 
functions do not need to know which rows they are *not* operating on.


There is no such thing as a "full" or "reduced" frame.  The standard 
uses those terms to explain the difference between before and after RPR 
is applied, but window functions do not get to choose which frame they 
apply over.  They only ever apply over the reduced window frame.

--
Vik Fearing





Re: [PATCH] Check more invariants during syscache initialization

2023-07-24 Thread Michael Paquier
On Mon, Jul 24, 2023 at 09:58:15PM +0300, Aleksander Alekseev wrote:
> Currently InitCatalogCache() has only one Assert() for cacheinfo[]
> that checks .reloid. The proposed patch adds sanity checks for the
> rest of the fields.

-   Assert(cacheinfo[cacheId].reloid != 0);
+   Assert(cacheinfo[cacheId].reloid != InvalidOid);
+   Assert(cacheinfo[cacheId].indoid != InvalidOid);
No objections to checking the index OID as well, given that it would catch
any failures at an early stage before doing an actual lookup.  I guess
that you added an incorrect entry and noticed the problem only when
triggering a syscache lookup for the new incorrect entry?

+   Assert(key[i] != InvalidAttrNumber);

Yeah, same here.

+   Assert((cacheinfo[cacheId].nkeys > 0) && (cacheinfo[cacheId].nkeys <= 4));

Is this addition actually useful?  The maximum number of lookup keys
is enforced at API level with the SearchSysCache() family.  Or perhaps
you've been able to break this layer in a way I cannot imagine and
were looking at a quick way to spot the inconsistency?
--
Michael


signature.asc
Description: PGP signature


Re: Removing the fixed-size buffer restriction in hba.c

2023-07-24 Thread Nathan Bossart
On Mon, Jul 24, 2023 at 03:07:07PM -0300, Fabrízio de Royes Mello wrote:
> On Mon, Jul 24, 2023 at 2:53 PM Tom Lane  wrote:
>>
>> We got a complaint at [1] about how a not-so-unreasonable LDAP
>> configuration can hit the "authentication file token too long,
>> skipping" error case in hba.c's next_token().  I think we've
>> seen similar complaints before, although a desultory archives
>> search didn't turn one up.
>>
>> A minimum-change response would be to increase the MAX_TOKEN
>> constant from 256 to (say) 1K or 10K.  But it wouldn't be all
>> that hard to replace the fixed-size buffer with a StringInfo,
>> as attached.
>>
> 
> +1 for replacing it with StringInfo. And the patch LGTM!

I'd suggest noting that next_token() clears buf, but otherwise it looks
reasonable to me, too.

>> Given the infrequency of complaints, I'm inclined to apply
>> the more thorough fix only in HEAD, and to just raise MAX_TOKEN
>> in the back branches.  Thoughts?
>>
> 
> It makes sense to change it only in HEAD.

I wouldn't be opposed to back-patching the more thorough fix, but I don't
feel too strongly about it.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [BUG] Crash on pgbench initialization.

2023-07-24 Thread Michael Paquier
On Mon, Jul 24, 2023 at 03:54:33PM +0200, Alvaro Herrera wrote:
> I see Michael marked it as such in the open items
> page, but did not CC Andres, so I'm doing that here now.

Indeed, thanks!
--
Michael


signature.asc
Description: PGP signature


Re: Improve pg_stat_statements by making jumble handle savepoint names better

2023-07-24 Thread Michael Paquier
On Mon, Jul 24, 2023 at 04:09:23PM -0400, Greg Sabino Mullane wrote:
> Without the patch, the only solution is to keep raising
> pg_stat_statements.max to larger and larger values to compensate for the
> pollution of the
> statement pool.

Yes, that can be painful depending on your workload.

	bool		chain;			/* AND CHAIN option */
+   int location;   /* token location, or -1 if unknown */
 } TransactionStmt;
[...]
+   if ($f eq 'savepoint_name') {
+   print $jff "\tJUMBLE_LOCATION(location);\n";
+   next;
+   }

Shouldn't this new field be marked as query_jumble_location instead of
forcing the perl script to do that?  Or is there something in the
structure of TransactionStmt that makes the move difficult because of
the other transaction commands supported?
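
Roughly, I mean something like the sketch below (abbreviated field list, not
the full TransactionStmt definition; the pg_node_attr stub is only there so
the fragment stands alone outside the node-support generator):

#ifndef pg_node_attr
#define pg_node_attr(...)	/* expands to nothing when compiled as plain C */
#endif

typedef struct TransactionStmtSketch
{
	/* ... other TransactionStmt fields elided ... */
	char	   *savepoint_name; /* name of savepoint, if any */
	bool		chain;			/* AND CHAIN option */
	/* marked so gen_node_support.pl emits JUMBLE_LOCATION() automatically */
	int			location pg_node_attr(query_jumble_location);
} TransactionStmtSketch;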

Testing for new query patterns is very important for such patches.
Could you add something to pg_stat_statements's utility.sql?  I think
that you should check that at least two savepoints with different names
compute the same query ID, for a bunch of patterns in the grammar, say:
BEGIN;
SAVEPOINT p1;
ROLLBACK TO SAVEPOINT p1;
ROLLBACK TRANSACTION TO SAVEPOINT p1;
RELEASE SAVEPOINT p1;
SAVEPOINT p2;
ROLLBACK TO SAVEPOINT p2;
ROLLBACK TRANSACTION TO SAVEPOINT p2;
RELEASE SAVEPOINT p2;
COMMIT;
--
Michael


signature.asc
Description: PGP signature


Re: Avoid unused value (src/fe_utils/print.c)

2023-07-24 Thread Ranier Vilela
On Fri, Jul 21, 2023 at 09:13, Karina Litskevich <litskevichkar...@gmail.com> wrote:

>
>
> On Wed, Jul 12, 2023 at 1:46 AM Ranier Vilela  wrote:
>
>>  As there is consensus to keep the no-op assignment,
>> I will go ahead and reject the patch.
>>
>
> In your patch you suggest removing two assignments, and we only have
> consensus to keep one of them. The other one is an obvious no-op.
>
> I attached a patch that removes only one assignment. Could you please try
> it and check whether Coverity is still complaining about need_recordsep
> variable?
>
Yeah.
Checked today, Coverity does not complain about need_recordsep.

best regards,
Ranier Vilela


Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2023-07-24 Thread Robert Haas
On Fri, Jul 21, 2023 at 8:52 PM Thomas Munro  wrote:
> Idea for future research:  Perhaps pg_backup_stop()'s label-file
> output should include the control file image (suitably encoded)?  Then
> the recovery-from-label code could completely ignore the existing
> control file, and overwrite it using that copy.  It's already
> partially ignoring it, by using the label file's checkpoint LSN
> instead of the control file's.  Perhaps the captured copy could
> include the correct LSN already, simplifying that code, and the low
> level backup procedure would not need any additional steps or caveats.
> No more atomicity problem for low-level-backups... but probably not
> something we would back-patch, for such a rare failure mode.

I don't really know what the solution is, but this is a general
problem with the low-level backup API, and I think it sucks pretty
hard. Here, we're talking about the control file, but the same problem
exists with the data files. We try to work around that but it's all
hacks. Unless your backup tool has special magic powers of some kind,
you can't take a backup using either pg_basebackup or the low-level
API and then check that individual blocks have valid checksums, or
that they have sensible, interpretable contents, because they might
not. (Yeah, I know we have code to verify checksums during a base
backup, but as discussed elsewhere, it doesn't work.) It's also why we
have to force full-page write on during a backup. But the whole thing
is nasty because you can't really verify anything about the backup you
just took. It may be full of gibberish blocks but don't worry because,
if all goes well, recovery will fix it. But you won't really know
whether recovery actually does fix it. You just kind of have to cross
your fingers and hope.

It's unclear to me how we could do better, especially when using the
low-level API. BASE_BACKUP could read via shared_buffers instead of
the FS, and I think that might be a good idea if we can defend
adequately against cache poisoning, but with the low-level API someone
may just be calling a FS-level snapshot primitive. Unless we're
prepared to pause all writes while that happens, I don't know how to
do better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Improve pg_stat_statements by making jumble handle savepoint names better

2023-07-24 Thread Greg Sabino Mullane
Please find attached a patch to jumble savepoint name, to prevent certain
transactional commands from filling up pg_stat_statements.
This has been a problem with some busy systems that use Django, which likes
to wrap everything in uniquely named savepoints. Soon, over 50% of
your pg_stat_statements buffer is filled with savepoint stuff, pushing out
the more useful queries. As each query is unique, it looks like this:

postgres=# select calls, query, queryid from pg_stat_statements where query
~ 'save|release|rollback' order by 2;

 calls |query |   queryid
---+--+--
 1 | release b900150983cd24fb0d6963f7d28e17f7 |  8797482500264589878
 1 | release ed9407630eb1000c0f6b63842defa7de | -9206510099095862114
 1 | rollback | -2049453941623996126
 1 | rollback to c900150983cd24fb0d6963f7d28e17f7 | -5335832667999552746
 1 | savepoint b900150983cd24fb0d6963f7d28e17f7   | -117254996647181
 1 | savepoint c47bce5c74f589f4867dbd57e9ca9f80   |   355123032993044571
 1 | savepoint c900150983cd24fb0d6963f7d28e17f7   | -5921314469994822125
 1 | savepoint d8f8e0260c64418510cefb2b06eee5cd   |  -981090856656063578
 1 | savepoint ed9407630eb1000c0f6b63842defa7de   |   -25952890433218603

As the actual name of the savepoint is not particularly useful, the patch
will basically ignore the savepoint name and allow things to be collapsed:

 calls | query  |   queryid
---++--
 2 | release $1 | -7998168840889089775
 1 | rollback   |  3749380189022910195
 1 | rollback to $1 | -1816677871228308673
 5 | savepoint $1   |  6160699978368237767

Without the patch, the only solution is to keep raising
pg_stat_statements.max to larger and larger values to compensate for the
pollution of the
statement pool.

Cheers,
Greg


savepoint_jumble.v1.patch
Description: Binary data


Re: psql not responding to SIGINT upon db reconnection

2023-07-24 Thread Tristan Partin

On Mon Jul 24, 2023 at 12:43 PM CDT, Tom Lane wrote:

"Tristan Partin"  writes:
> v3 is attached which fixes up some code comments I added which I hadn't 
> attached to the commit already, sigh.


I don't care for this patch at all.  You're bypassing the pqsignal
abstraction layer that the rest of psql goes through, and the behavior
you're implementing isn't very nice.  People do not expect ^C to
kill psql - it should just stop the \c attempt and leave you as you
were.

Admittedly, getting PQconnectdbParams to return control on SIGINT
isn't too practical.  But you could probably replace that with a loop
around PQconnectPoll and test for CancelRequested in the loop.


That sounds like a much better solution. Attached you will find a v4 
that implements your suggestion. Please let me know if there is 
something that I missed. I can confirm that the patch works.
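
For reference, the documented non-blocking-connect pattern with a cancel
check looks roughly like the sketch below (an illustration only, assuming a
psql-style CancelRequested flag; the attached v4 patch takes a simpler
approach that does not wait on the socket between polls):

#include <stdbool.h>
#include <sys/select.h>
#include "libpq-fe.h"

extern volatile bool CancelRequested;	/* assumed set by the SIGINT handler */

/*
 * Drive an in-progress connection started with PQconnectStartParams() to
 * completion.  Returns false if the user cancelled; otherwise the caller
 * checks PQstatus() for the final outcome.
 */
static bool
poll_connection(PGconn *conn)
{
	PostgresPollingStatusType status = PGRES_POLLING_WRITING;

	while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED)
	{
		int			sock = PQsocket(conn);
		fd_set		fds;
		struct timeval tv = {1, 0};	/* recheck the cancel flag every second */

		if (CancelRequested)
			return false;
		if (sock < 0)
			break;					/* connection already failed */

		FD_ZERO(&fds);
		FD_SET(sock, &fds);
		if (status == PGRES_POLLING_READING)
			(void) select(sock + 1, &fds, NULL, NULL, &tv);
		else
			(void) select(sock + 1, NULL, &fds, NULL, &tv);

		status = PQconnectPoll(conn);
	}

	return true;
}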


$ ./build/src/bin/psql/psql -h pg.neon.tech
NOTICE:  Welcome to Neon!
Authenticate by visiting:
https://console.neon.tech/psql_session/xxx


NOTICE:  Connecting to database.
psql (17devel, server 15.3)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, 
compression: off)
Type "help" for help.

tristan957=> \c
NOTICE:  Welcome to Neon!
Authenticate by visiting:
https://console.neon.tech/psql_session/yyy


	^Cconnection to server at "pg.neon.tech" (3.18.6.96), port 5432 failed: 
	Previous connection kept
	tristan957=> 


--
Tristan Partin
Neon (https://neon.tech)
From 73a95bc537f205cdac567f3f1860b08cf8323e17 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Mon, 24 Jul 2023 11:12:59 -0500
Subject: [PATCH v4] Allow SIGINT to cancel psql database reconnections

After installing the SIGINT handler in psql, SIGINT can no longer cancel
database reconnections. For instance, if the user starts a reconnection
and then needs to do some form of interaction (ie psql is polling),
there is no way to cancel the reconnection process currently.

Use PQconnectStartParams() in order to insert a CancelRequested check
into the polling loop.
---
 src/bin/psql/command.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 6733f008fd..bfecc25801 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -40,6 +40,7 @@
 #include "large_obj.h"
 #include "libpq-fe.h"
 #include "libpq/pqcomm.h"
+#include "libpq/pqsignal.h"
 #include "mainloop.h"
 #include "portability/instr_time.h"
 #include "pqexpbuffer.h"
@@ -3577,11 +3578,33 @@ do_connect(enum trivalue reuse_previous_specification,
 		values[paramnum] = NULL;
 
 		/* Note we do not want libpq to re-expand the dbname parameter */
-		n_conn = PQconnectdbParams(keywords, values, false);
+		n_conn = PQconnectStartParams(keywords, values, false);
 
 		pg_free(keywords);
 		pg_free(values);
 
+		while (true) {
+			int		socket;
+
+			if (CancelRequested)
+				break;
+
+			socket = PQsocket(n_conn);
+			if (socket == -1)
+				break;
+
+			switch (PQconnectPoll(n_conn)) {
+			case PGRES_POLLING_OK:
+			case PGRES_POLLING_FAILED:
+				break;
+			case PGRES_POLLING_READING:
+			case PGRES_POLLING_WRITING:
+				continue;
+			case PGRES_POLLING_ACTIVE:
+				pg_unreachable();
+			}
+		}
+
 		if (PQstatus(n_conn) == CONNECTION_OK)
 			break;
 
-- 
Tristan Partin
Neon (https://neon.tech)



WaitForOlderSnapshots in DETACH PARTITION causes deadlocks

2023-07-24 Thread Imseih (AWS), Sami
Hi,

While recently looking into partition maintenance, I found a case in which
DETACH PARTITION FINALIZE could cause deadlocks. This occurs when an
ALTER TABLE DETACH CONCURRENTLY is followed by a cancel, and then by an
ALTER TABLE DETACH FINALIZE, with this sequence of steps occurring in the
middle of other REPEATABLE READ/SERIALIZABLE transactions. These
RR/SERIALIZABLE transactions must not have accessed the partition until the
FINALIZE step starts. See the attached repro.

This seems to occur because the FINALIZE step calls WaitForOlderSnapshots [1]
to make sure that all older snapshots are completed before the finalize
is completed and the detached partition is removed from the parent table
association.

WaitForOlderSnapshots is used here to ensure that snapshots older than
the start of the ALTER TABLE DETACH CONCURRENTLY are completely removed
to guarantee consistency, however it does seem to cause deadlocks for at
least RR/SERIALIZABLE transactions.

There are cases [2] in which certain operations are accepted as not being
MVCC-safe, and now I am wondering if this is another such case. Deadlocks are
not a good scenario, and WaitForOlderSnapshots does not appear to do
anything for READ COMMITTED transactions.

So do we actually need WaitForOlderSnapshots in the FINALIZE code? And
could it be acceptable to mark this operation as not MVCC-safe, like the
other aforementioned operations?

Perhaps I am missing some important point here, so any feedback will be
appreciated.

Regards,

Sami Imseih
Amazon Web Services (AWS)


[1] 
https://github.com/postgres/postgres/blob/master/src/backend/commands/tablecmds.c#L18757-L18764
[2] https://www.postgresql.org/docs/current/mvcc-caveats.html


partition_detach_finalize_deadlock_repro.sql
Description: partition_detach_finalize_deadlock_repro.sql


Re: cataloguing NOT NULL constraints

2023-07-24 Thread Robert Haas
On Mon, Jul 24, 2023 at 6:33 AM Alvaro Herrera  wrote:
> That's the first thing I proposed actually.  I got one vote down from
> Robert Haas[1], but while the idea seems to have had support from Justin
> Pryzby (in \dt++) [2] and definitely did from Peter Eisentraut [3], I do
> not like it too much myself, mainly because the partition list has a
> very similar treatment and I find that one an annoyance.

I think I might want to retract my earlier -1 vote. I mean, I agree
with former me that having the \d+ output get a whole lot longer is
not super-appealing. But I also agree with Dean that having this
information available somewhere is probably important, and I also
agree with your point that inventing \d++ for this isn't necessarily a
good idea. I fear that will just result in having to type an extra
plus sign any time you want to see all of the table details, to make
sure that psql knows that you really mean it. So, maybe showing it in
the \d+ output as Dean proposes is the least of evils.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: proposal: psql: show current user in prompt

2023-07-24 Thread Pavel Stehule
Hi

On Mon, May 8, 2023 at 14:22, Jelte Fennema  wrote:

> I'm very much in favor of adding a way to support reporting other GUC
> variables than the current hardcoded list. This can be quite useful to
> support some amount of session state in connection poolers.
>
> Some general feedback on the patch:
> 1. I don't think the synchronization mechanism that you added should
> be part of PQlinkParameterStatus. It seems likely for people to want
> to turn on reporting for multiple GUCs in one go. Having to
> synchronize for each would introduce unnecessary round trips. Maybe
> you also don't care about syncing at all at this point in time.
>

I don't understand how it can be possible to do it without. I need to
process possible errors, and then I need to read and synchronize the
protocol. I didn't inject this feature into some other flow, so I need to
implement a complete process. For example, PQsetClientEncoding does a PQexec
call, which is much more expensive. Unfortunately, for this feature, I cannot
just change some local state variables; I need to change the state of the
server. A dedicated message is necessary, because we don't want to be limited
by the current transaction state, and then we cannot reuse PQexec.

The API can be changed from PQlinkParameterStatus(PGconn *conn, const char *paramName)

to

PQlinkParameterStatus(PGconn *conn, int nParamNames, const char **paramNames)

What do you think?

Regards

Pavel


>
> 2. Support for this message should probably require a protocol
> extension. There is another recent thread that discusses about adding
> message types and protocol extensions:
>
> https://www.postgresql.org/message-id/flat/CA%2BTgmoaxfJ3whOqnxTjT-%2BHDgZYbEho7dVHHsuEU2sgRw17OEQ%40mail.gmail.com#acd99fde0c037cc6cb35d565329b6e00
>
> 3. Some tests for this new libpq API should be added in
> src/test/modules/libpq_pipeline
>
> 4. s/massages/messages
>
>
> Finally, I think this patch should be split into two commits:
> 1. adding custom GUC_REPORT protocol support+libpq API
> 2. using this libpq API in psql for the user prompt
>
> If you have multiple commits (which are rebased on master), you can
> very easily create multiple patch files using this command:
> git format-patch master --base=master --reroll-count={version_number_here}
>
>
> On Fri, 28 Apr 2023 at 07:00, Pavel Stehule 
> wrote:
> >
> >
> >
> > čt 27. 4. 2023 v 7:39 odesílatel Pavel Stehule 
> napsal:
> >>
> >> Hi
> >>
> >> rebased version + fix warning possibly uninitialized variable
> >
> >
> > fix not unique function id
> >
> > Regards
> >
> > Pavel
> >
> >>
> >> Regards
> >>
> >> Pavel
> >>
>


Re: cataloguing NOT NULL constraints

2023-07-24 Thread Dean Rasheed
On Mon, 24 Jul 2023 at 17:42, Alvaro Herrera  wrote:
>
> On 2023-Jul-24, Dean Rasheed wrote:
>
> > Something else I noticed: the error message from ALTER TABLE ... ADD
> > CONSTRAINT in the case of a duplicate constraint name is not very
> > friendly:
> >
> > ERROR:  duplicate key value violates unique constraint
> > "pg_constraint_conrelid_contypid_conname_index"
> > DETAIL:  Key (conrelid, contypid, conname)=(16540, 0, nn) already exists.
> >

To reproduce this error, try to create 2 constraints with the same
name on different columns:

create table foo(a int, b int);
alter table foo add constraint nn not null a;
alter table foo add constraint nn not null b;

I found another, separate issue:

create table p1(a int not null);
create table p2(a int);
create table foo () inherits (p1,p2);
alter table p2 add not null a;

ERROR:  column "a" of table "foo" is already NOT NULL

whereas doing "alter table p2 alter column a set not null" works OK,
merging the constraints as expected.

Regards,
Dean




Re: Inefficiency in parallel pg_restore with many tables

2023-07-24 Thread Nathan Bossart
On Sat, Jul 22, 2023 at 10:57:03PM -0700, Nathan Bossart wrote:
> On Sat, Jul 22, 2023 at 07:47:50PM -0400, Tom Lane wrote:
>> I wonder whether we can't provide some alternate definition or "skin"
>> for binaryheap that preserves the Datum API for backend code that wants
>> that, while providing a void *-based API for frontend code to use.
> 
> I can give this a try next, but it might be rather #ifdef-heavy.

Here is a sketch of this approach.  It required fewer #ifdefs than I was
expecting.  At the moment, this one seems like the winner to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 13017359cc6dd35f3e465280373b4b82e87b7c5b Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Sat, 22 Jul 2023 15:04:45 -0700
Subject: [PATCH v5 1/3] make binaryheap available to frontend

---
 src/backend/lib/Makefile |  1 -
 src/backend/lib/meson.build  |  1 -
 src/common/Makefile  |  1 +
 src/{backend/lib => common}/binaryheap.c | 37 +---
 src/common/meson.build   |  1 +
 src/include/lib/binaryheap.h | 23 ++-
 6 files changed, 44 insertions(+), 20 deletions(-)
 rename src/{backend/lib => common}/binaryheap.c (91%)

diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index 9dad31398a..b6cefd9cca 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -13,7 +13,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = \
-	binaryheap.o \
 	bipartite_match.o \
 	bloomfilter.o \
 	dshash.o \
diff --git a/src/backend/lib/meson.build b/src/backend/lib/meson.build
index 974cab8776..b4e88f54ae 100644
--- a/src/backend/lib/meson.build
+++ b/src/backend/lib/meson.build
@@ -1,7 +1,6 @@
 # Copyright (c) 2022-2023, PostgreSQL Global Development Group
 
 backend_sources += files(
-  'binaryheap.c',
   'bipartite_match.c',
   'bloomfilter.c',
   'dshash.c',
diff --git a/src/common/Makefile b/src/common/Makefile
index 113029bf7b..cc5c54dcee 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -48,6 +48,7 @@ LIBS += $(PTHREAD_LIBS)
 OBJS_COMMON = \
 	archive.o \
 	base64.o \
+	binaryheap.o \
 	checksum_helper.o \
 	compression.o \
 	config_info.o \
diff --git a/src/backend/lib/binaryheap.c b/src/common/binaryheap.c
similarity index 91%
rename from src/backend/lib/binaryheap.c
rename to src/common/binaryheap.c
index 1737546757..a6be4b42ae 100644
--- a/src/backend/lib/binaryheap.c
+++ b/src/common/binaryheap.c
@@ -6,15 +6,22 @@
  * Portions Copyright (c) 2012-2023, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
- *	  src/backend/lib/binaryheap.c
+ *	  src/common/binaryheap.c
  *
  *-
  */
 
+#ifdef FRONTEND
+#include "postgres_fe.h"
+#else
 #include "postgres.h"
+#endif
 
 #include 
 
+#ifdef FRONTEND
+#include "common/logging.h"
+#endif
 #include "lib/binaryheap.h"
 
 static void sift_down(binaryheap *heap, int node_off);
@@ -34,7 +41,7 @@ binaryheap_allocate(int capacity, binaryheap_comparator compare, void *arg)
 	int			sz;
 	binaryheap *heap;
 
-	sz = offsetof(binaryheap, bh_nodes) + sizeof(Datum) * capacity;
+	sz = offsetof(binaryheap, bh_nodes) + sizeof(bh_elem_type) * capacity;
 	heap = (binaryheap *) palloc(sz);
 	heap->bh_space = capacity;
 	heap->bh_compare = compare;
@@ -106,10 +113,14 @@ parent_offset(int i)
  * afterwards.
  */
 void
-binaryheap_add_unordered(binaryheap *heap, Datum d)
+binaryheap_add_unordered(binaryheap *heap, bh_elem_type d)
 {
 	if (heap->bh_size >= heap->bh_space)
+#ifdef FRONTEND
+		pg_fatal("out of binary heap slots");
+#else
 		elog(ERROR, "out of binary heap slots");
+#endif
 	heap->bh_has_heap_property = false;
 	heap->bh_nodes[heap->bh_size] = d;
 	heap->bh_size++;
@@ -138,10 +149,14 @@ binaryheap_build(binaryheap *heap)
  * the heap property.
  */
 void
-binaryheap_add(binaryheap *heap, Datum d)
+binaryheap_add(binaryheap *heap, bh_elem_type d)
 {
 	if (heap->bh_size >= heap->bh_space)
+#ifdef FRONTEND
+		pg_fatal("out of binary heap slots");
+#else
 		elog(ERROR, "out of binary heap slots");
+#endif
 	heap->bh_nodes[heap->bh_size] = d;
 	heap->bh_size++;
 	sift_up(heap, heap->bh_size - 1);
@@ -154,7 +169,7 @@ binaryheap_add(binaryheap *heap, Datum d)
  * without modifying the heap. The caller must ensure that this
  * routine is not used on an empty heap. Always O(1).
  */
-Datum
+bh_elem_type
 binaryheap_first(binaryheap *heap)
 {
 	Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
@@ -169,10 +184,10 @@ binaryheap_first(binaryheap *heap)
  * that this routine is not used on an empty heap. O(log n) worst
  * case.
  */
-Datum
+bh_elem_type
 binaryheap_remove_first(binaryheap *heap)
 {
-	Datum		result;
+	bh_elem_type result;
 
 	Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
 
@@ -204,7 +219,7 @@ binaryheap_remove_first(binaryheap *heap)
  * sifting the new node down.
  */
 void

Re: Use of additional index columns in rows filtering

2023-07-24 Thread Peter Geoghegan
On Mon, Jul 24, 2023 at 11:37 AM Jeff Davis  wrote:
> I see. You're concerned that lowering the cost of an index scan path
> too much, due to pushing down a clause as an Index Filter, could cause
> it to out-compete a more "robust" plan.

The optimizer correctly determines that 3 index scans (plus a bitmap
OR node) are more expensive than 1 very similar index scan. It's hard
to argue with that.

> That might be true but I'm not sure what to do about that unless we
> incorporate some "robustness" measure into the costing. If every
> measure we have says one plan is better, don't we have to choose it?

I'm mostly concerned about the possibility itself -- it's not a matter
of tuning the cost. I agree that that approach would probably be
hopeless.

There is a principled (albeit fairly involved) way of addressing this.
The patch allows the optimizer to produce a plan that has 1 index
scan, that treats the first column as an index qual, and the second
column as a filter condition. There is no fundamental reason why we
can't just have 1 index scan that makes both columns into index quals
(instead of 3 highly duplicative variants of the same index scan).
That's what I'm working towards right now.

> If I understand correctly, you mean we couldn't use an Index Filter on
> a key column? That seems overly restrictive, there are plenty of
> clauses that might be useful as an Index Filter but cannot be an Index
> Cond for one reason or another.

I think that you're probably right about it being overly restrictive
-- that was just a starting point for discussion. Perhaps there is an
identifiable class of clauses that can benefit, but don't have the
downside that I'm concerned about.

-- 
Peter Geoghegan




[PATCH] Check more invariants during syscache initialization

2023-07-24 Thread Aleksander Alekseev
Hi,

Currently InitCatalogCache() has only one Assert() for cacheinfo[]
that checks .reloid. The proposed patch adds sanity checks for the
rest of the fields.

-- 
Best regards,
Aleksander Alekseev


v1-0001-Check-more-invariants-during-syscache-initializat.patch
Description: Binary data


Re: Use of additional index columns in rows filtering

2023-07-24 Thread Tom Lane
Jeff Davis  writes:
> On Mon, 2023-07-24 at 13:58 -0400, Tom Lane wrote:
>> Please do not put in any code that assumes that restriction clause
>> order is preserved, or encourages users to think it is.

> Agreed. I didn't mean to add any extra guarantee of preserving clause
> order; just to follow the current way order_qual_clauses() works, which
> has a comment saying:
> "So we just order by security level then estimated per-tuple cost,
> being careful not to change the order when (as is often the case) the
> estimates are identical."
> I assumed that the reason for "being careful" above was to not
> unnecessarily override how the user writes the qual clauses, but
> perhaps there's another reason?

I think the point was just to not make any unnecessary behavioral
changes from the way things were before we added that sorting logic.
But there are other places that will result in clause ordering changes,
plus there's the whole business of possibly-intermixed restriction and
join clauses.

regards, tom lane




Re: Use of additional index columns in rows filtering

2023-07-24 Thread Jeff Davis
On Mon, 2023-07-24 at 10:54 -0700, Peter Geoghegan wrote:
> The case in question shows a cheaper plan replacing a more expensive
> plan -- so it's a win by every conventional measure. But, the new
> plan
> is less robust in the sense that I described yesterday: it will be
> much slower than the current plan when there happens to be many more
> "thousand = 42" tuples than expected. We have a very high chance of a
> small benefit (we save repeated index page accesses), but a very low
> chance of a high cost (we incur many more heap accesses). Which seems
> less than ideal.

I see. You're concerned that lowering the cost of an index scan path
too much, due to pushing down a clause as an Index Filter, could cause
it to out-compete a more "robust" plan.

That might be true but I'm not sure what to do about that unless we
incorporate some "robustness" measure into the costing. If every
measure we have says one plan is better, don't we have to choose it?

> The original concern was limited to non-key columns from INCLUDE
> indexes. If you only apply the optimization there then you don't run
> the risk of generating a path that "out competes" a more robust path
> in the sense that I've described. This is obviously true because
> there
> can't possibly be index quals/scan keys for non-key columns within
> the
> index AM.

If I understand correctly, you mean we couldn't use an Index Filter on
a key column? That seems overly restrictive, there are plenty of
clauses that might be useful as an Index Filter but cannot be an Index
Cond for one reason or another.

Regards,
Jeff Davis





Re: Use of additional index columns in rows filtering

2023-07-24 Thread Jeff Davis
On Mon, 2023-07-24 at 13:58 -0400, Tom Lane wrote:
> Please do not put in any code that assumes that restriction clause
> order is preserved, or encourages users to think it is.

Agreed. I didn't mean to add any extra guarantee of preserving clause
order; just to follow the current way order_qual_clauses() works, which
has a comment saying:

"So we just order by security level then estimated per-tuple cost,
being careful not to change the order when (as is often the case) the
estimates are identical."

I assumed that the reason for "being careful" above was to not
unnecessarily override how the user writes the qual clauses, but
perhaps there's another reason?

Regardless, my point was just to make minimal changes now that are
unlikely to cause regressions. If we come up with better ways of
ordering the clauses later, that could be part of a separate change. (I
think Peter G. is pointing out a complication with that idea, to which
I'll respond separately.)

Regards,
Jeff Davis





Re: Removing the fixed-size buffer restriction in hba.c

2023-07-24 Thread Fabrízio de Royes Mello
On Mon, Jul 24, 2023 at 2:53 PM Tom Lane  wrote:
>
> We got a complaint at [1] about how a not-so-unreasonable LDAP
> configuration can hit the "authentication file token too long,
> skipping" error case in hba.c's next_token().  I think we've
> seen similar complaints before, although a desultory archives
> search didn't turn one up.
>
> A minimum-change response would be to increase the MAX_TOKEN
> constant from 256 to (say) 1K or 10K.  But it wouldn't be all
> that hard to replace the fixed-size buffer with a StringInfo,
> as attached.
>

+1 for replacing it with StringInfo. And the patch LGTM!


>
> Given the infrequency of complaints, I'm inclined to apply
> the more thorough fix only in HEAD, and to just raise MAX_TOKEN
> in the back branches.  Thoughts?
>

It makes sense to change it only in HEAD.

Regards,

--
Fabrízio de Royes Mello


Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2023-07-24 Thread Stephen Frost
Greetings,

(Adding David Steele into the CC on this one...)

* Thomas Munro (thomas.mu...@gmail.com) wrote:
> This is a frustrating thread, because despite the last patch solving
> most of the problems we discussed, it doesn't address the
> low-level-backup procedure in a nice way.  We'd have to tell users
> they have to flock that file, or add a new step "pg_controldata --raw
> > pg_control", which seems weird when they already have a connection
> to the server.

flock'ing the file strikes me as dangerous to ask external tools to do
due to the chances of breaking the running system if they don't do it
right.  I'm generally open to the idea of having the backup tool have to
do more complicated work to be correct but that just seems likely to
cause issues.  Also- haven't looked yet, but I'm not sure that's even
possible if your backup tool is running as a user who only has read
access to the data directory?  I don't want us to give up on that
feature.

> Maybe it just doesn't matter if eg the pg_controldata program can
> spuriously fail if pointed at a running system, and I was being too
> dogmatic trying to fix even that.  Maybe we should just focus on
> fixing backups.  Even there, I am beginning to suspect we are solving
> this problem in the wrong place when a higher level change could
> simplify the problem away.

For a running system.. perhaps pg_controldata should be connecting to
the database and calling functions there?  Or just complain that the
system is online and tell the user to do that?

> Idea for future research:  Perhaps pg_backup_stop()'s label-file
> output should include the control file image (suitably encoded)?  Then
> the recovery-from-label code could completely ignore the existing
> control file, and overwrite it using that copy.  It's already
> partially ignoring it, by using the label file's checkpoint LSN
> instead of the control file's.  Perhaps the captured copy could
> include the correct LSN already, simplifying that code, and the low
> level backup procedure would not need any additional steps or caveats.
> No more atomicity problem for low-level-backups... but probably not
> something we would back-patch, for such a rare failure mode.

I like this general direction and wonder if we could possibly even push
a bit harder on it: have the backup_label include the control file's
contents in some form that is understood and then tell tools to *not*
copy the running pg_control file ... and maybe even complain if a
pg_control file exists when we detect that backup_label has the control
file's image.  We've certainly had problems in the past where people
would just nuke the backup_label file, even though they were restoring
from a backup, because they couldn't start the system up since their
restore command wasn't set up properly or their WAL archive was missing.

Being able to get rid of the control file being in the backup at all
would make it harder for someone to get to a corrupted-but-running
system and that seems like it's a good thing.

> Here's a new minimal patch that solves only the bugs in basebackup +
> the simple SQL-facing functions that read the control file, by simply
> acquiring ControlFileLock in the obvious places.  This should be
> simple enough for back-patching?

I don't particularly like fixing this in a way that only works for
pg_basebackup and means that the users of other backup tools don't have
a solution to this problem.  What are we supposed to tell users of
pgBackRest when they see this fix for pg_basebackup in the next set of
minor releases and they ask us if we've addressed this risk?

We might be able to accept the 're-read on CRC failure' approach, if it
were being used for pg_controldata and we documented that external
tools and especially backup tools using the low-level API are required
to check the CRC and to re-read on failure if accessing a running
system.

While it's not ideal, maybe we could get away with changing the contents
of the backup_label as part of a back-patch?  The documentation, at
least on a quick look, says to copy the second field from pg_backup_stop
into a backup_label file but doesn't say anything about what those
contents are or if they can change.  That would at least address the
concern of backups ending up with a corrupt pg_control file and not
being able to be restored, even if tools aren't updated to verify the
CRC or similar.  Of course, that's a fair bit of code churn for a
backpatch, which I certainly understand as being risky.  If we can't
back-patch that, it might still be the way to go moving forward, while
also telling tools to check the CRC.  (I'm not going to try to figure
out some back-patchable pretend solution for this for shell scripts that
pretend to be able to backup running PG databases; this issue is
probably the least of their issues anyway...)

A couple of interesting notes on this though- pgBackRest doesn't only
read the pg_control file at backup time, we also check it at
archive_command time, to 

Re: Use of additional index columns in rows filtering

2023-07-24 Thread Tom Lane
Peter Geoghegan  writes:
> On Mon, Jul 24, 2023 at 10:36 AM Jeff Davis  wrote:
>> Could we start out conservatively and push down as an Index Filter
>> unless there is some other clause ahead of it that can't be pushed
>> down? That would allow users to have some control by writing clauses in
>> the desired order or wrapping them in functions with a declared cost.

> I'm a bit concerned about cases like the one I described from the
> regression tests.

Please do not put in any code that assumes that restriction clause
order is preserved, or encourages users to think it is.  There are
already cases where that's not so, eg equivalence clauses tend to
get shuffled around.

regards, tom lane




Re: Use of additional index columns in rows filtering

2023-07-24 Thread Peter Geoghegan
On Mon, Jul 24, 2023 at 10:36 AM Jeff Davis  wrote:
> I'm getting a bit lost in this discussion as well -- for the purposes
> of this feature, we only need to know whether to push down a clause as
> an Index Filter or not, right?

I think so.

> Could we start out conservatively and push down as an Index Filter
> unless there is some other clause ahead of it that can't be pushed
> down? That would allow users to have some control by writing clauses in
> the desired order or wrapping them in functions with a declared cost.

I'm a bit concerned about cases like the one I described from the
regression tests.

The case in question shows a cheaper plan replacing a more expensive
plan -- so it's a win by every conventional measure. But, the new plan
is less robust in the sense that I described yesterday: it will be
much slower than the current plan when there happens to be many more
"thousand = 42" tuples than expected. We have a very high chance of a
small benefit (we save repeated index page accesses), but a very low
chance of a high cost (we incur many more heap accesses). Which seems
less than ideal.

One obvious way of avoiding that problem (that's probably overly
conservative) is to just focus on the original complaint from Maxim.
The original concern was limited to non-key columns from INCLUDE
indexes. If you only apply the optimization there then you don't run
the risk of generating a path that "out competes" a more robust path
in the sense that I've described. This is obviously true because there
can't possibly be index quals/scan keys for non-key columns within the
index AM.

-- 
Peter Geoghegan




Removing the fixed-size buffer restriction in hba.c

2023-07-24 Thread Tom Lane
We got a complaint at [1] about how a not-so-unreasonable LDAP
configuration can hit the "authentication file token too long,
skipping" error case in hba.c's next_token().  I think we've
seen similar complaints before, although a desultory archives
search didn't turn one up.

A minimum-change response would be to increase the MAX_TOKEN
constant from 256 to (say) 1K or 10K.  But it wouldn't be all
that hard to replace the fixed-size buffer with a StringInfo,
as attached.

Given the infrequency of complaints, I'm inclined to apply
the more thorough fix only in HEAD, and to just raise MAX_TOKEN
in the back branches.  Thoughts?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/PH0PR04MB8294A4C5A65D9D492CBBD349C002A%40PH0PR04MB8294.namprd04.prod.outlook.com

diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 5d4ddbb04d..629463a00f 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -56,8 +56,6 @@
 #endif
 
 
-#define MAX_TOKEN	256
-
 /* callback data for check_network_callback */
 typedef struct check_network_data
 {
@@ -161,8 +159,8 @@ pg_isblank(const char c)
  * commas, beginning of line, and end of line.  Blank means space or tab.
  *
  * Tokens can be delimited by double quotes (this allows the inclusion of
- * blanks or '#', but not newlines).  As in SQL, write two double-quotes
- * to represent a double quote.
+ * commas, blanks, and '#', but not newlines).  As in SQL, write two
+ * double-quotes to represent a double quote.
  *
  * Comments (started by an unquoted '#') are skipped, i.e. the remainder
  * of the line is ignored.
@@ -171,7 +169,7 @@ pg_isblank(const char c)
  * Thus, if a continuation occurs within quoted text or a comment, the
  * quoted text or comment is considered to continue to the next line.)
  *
- * The token, if any, is returned at *buf (a buffer of size bufsz), and
+ * The token, if any, is returned into *buf, and
  * *lineptr is advanced past the token.
  *
  * Also, we set *initial_quote to indicate whether there was quoting before
@@ -180,30 +178,28 @@ pg_isblank(const char c)
  * we want to allow that to support embedded spaces in file paths.)
  *
  * We set *terminating_comma to indicate whether the token is terminated by a
- * comma (which is not returned).
+ * comma (which is not returned, nor advanced over).
  *
  * In event of an error, log a message at ereport level elevel, and also
- * set *err_msg to a string describing the error.  Currently the only
- * possible error is token too long for buf.
+ * set *err_msg to a string describing the error.  Currently no error
+ * conditions are defined.
  *
- * If successful: store null-terminated token at *buf and return true.
- * If no more tokens on line: set *buf = '\0' and return false.
+ * If successful: store dequoted token in *buf and return true.
+ * If no more tokens on line: set *buf to empty and return false.
  * If error: fill buf with truncated or misformatted token and return false.
  */
 static bool
-next_token(char **lineptr, char *buf, int bufsz,
+next_token(char **lineptr, StringInfo buf,
 		   bool *initial_quote, bool *terminating_comma,
 		   int elevel, char **err_msg)
 {
 	int			c;
-	char	   *start_buf = buf;
-	char	   *end_buf = buf + (bufsz - 1);
 	bool		in_quote = false;
 	bool		was_quote = false;
 	bool		saw_quote = false;
 
-	Assert(end_buf > start_buf);
-
+	/* Initialize output parameters */
+	resetStringInfo(buf);
 	*initial_quote = false;
 	*terminating_comma = false;
 
@@ -226,22 +222,6 @@ next_token(char **lineptr, char *buf, int bufsz,
 			break;
 		}
 
-		if (buf >= end_buf)
-		{
-			*buf = '\0';
-			ereport(elevel,
-	(errcode(ERRCODE_CONFIG_FILE_ERROR),
-	 errmsg("authentication file token too long, skipping: \"%s\"",
-			start_buf)));
-			*err_msg = "authentication file token too long";
-			/* Discard remainder of line */
-			while ((c = (*(*lineptr)++)) != '\0')
-;
-			/* Un-eat the '\0', in case we're called again */
-			(*lineptr)--;
-			return false;
-		}
-
 		/* we do not pass back a terminating comma in the token */
 		if (c == ',' && !in_quote)
 		{
@@ -250,7 +230,7 @@ next_token(char **lineptr, char *buf, int bufsz,
 		}
 
 		if (c != '"' || was_quote)
-			*buf++ = c;
+			appendStringInfoChar(buf, c);
 
 		/* Literal double-quote is two double-quotes */
 		if (in_quote && c == '"')
@@ -262,7 +242,7 @@ next_token(char **lineptr, char *buf, int bufsz,
 		{
 			in_quote = !in_quote;
 			saw_quote = true;
-			if (buf == start_buf)
+			if (buf->len == 0)
 *initial_quote = true;
 		}
 
@@ -275,9 +255,7 @@ next_token(char **lineptr, char *buf, int bufsz,
 	 */
 	(*lineptr)--;
 
-	*buf = '\0';
-
-	return (saw_quote || buf > start_buf);
+	return (saw_quote || buf->len > 0);
 }
 
 /*
@@ -409,21 +387,23 @@ static List *
 next_field_expand(const char *filename, char **lineptr,
   int elevel, int depth, char **err_msg)
 {
-	char		buf[MAX_TOKEN];
+	StringInfoData buf;
 	bool		trailing_comma;
 	bool		

Re: psql not responding to SIGINT upon db reconnection

2023-07-24 Thread Tom Lane
"Tristan Partin"  writes:
> v3 is attached which fixes up some code comments I added which I hadn't 
> attached to the commit already, sigh.

I don't care for this patch at all.  You're bypassing the pqsignal
abstraction layer that the rest of psql goes through, and the behavior
you're implementing isn't very nice.  People do not expect ^C to
kill psql - it should just stop the \c attempt and leave you as you
were.

Admittedly, getting PQconnectdbParams to return control on SIGINT
isn't too practical.  But you could probably replace that with a loop
around PQconnectPoll and test for CancelRequested in the loop.
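
Untested sketch of what I have in mind, assuming psql's existing headers
(libpq-fe.h, fe_utils/cancel.h) plus <sys/select.h>; the helper name is
made up and error reporting is elided:

static PGconn *
do_connect_interruptible(const char **keywords, const char **values)
{
	PGconn	   *conn = PQconnectStartParams(keywords, values, false);
	PostgresPollingStatusType status = PGRES_POLLING_WRITING;

	while (conn != NULL &&
		   status != PGRES_POLLING_OK &&
		   status != PGRES_POLLING_FAILED)
	{
		int			sock = PQsocket(conn);
		fd_set		mask;
		struct timeval timeout = {0, 100 * 1000};	/* re-check ^C every 100ms */

		if (CancelRequested || sock < 0)
		{
			PQfinish(conn);
			return NULL;		/* abandon the attempt, keep the old connection */
		}

		FD_ZERO(&mask);
		FD_SET(sock, &mask);
		if (select(sock + 1,
				   status == PGRES_POLLING_READING ? &mask : NULL,
				   status == PGRES_POLLING_WRITING ? &mask : NULL,
				   NULL, &timeout) > 0)
			status = PQconnectPoll(conn);
	}

	return conn;				/* caller checks PQstatus() as before */
}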

regards, tom lane




Re: Use of additional index columns in rows filtering

2023-07-24 Thread Jeff Davis
On Sun, 2023-07-23 at 14:04 +0200, Tomas Vondra wrote:
> But I'm getting a bit lost regarding what's the proposed costing
> strategy. It's hard to follow threads spanning days, with various
> other
> distractions, etc.

I'm getting a bit lost in this discussion as well -- for the purposes
of this feature, we only need to know whether to push down a clause as
an Index Filter or not, right?

Could we start out conservatively and push down as an Index Filter
unless there is some other clause ahead of it that can't be pushed
down? That would allow users to have some control by writing clauses in
the desired order or wrapping them in functions with a declared cost.
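
To spell the rule out, a hand-wavy sketch with invented names (not
actual planner code) of what I mean by pushing down unless some earlier
clause can't be pushed down:

List	   *index_filter_quals = NIL;
List	   *table_filter_quals = NIL;
bool		blocked = false;
ListCell   *lc;

/* walk the clauses in the order given */
foreach(lc, qual_clauses)
{
	RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

	if (!blocked && clause_is_computable_from_index(index, rinfo))
		index_filter_quals = lappend(index_filter_quals, rinfo);
	else
	{
		/* this clause and everything after it stays as a table filter */
		blocked = true;
		table_filter_quals = lappend(table_filter_quals, rinfo);
	}
}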

Regards,
Jeff Davis





Re: Inefficiency in parallel pg_restore with many tables

2023-07-24 Thread Pierre Ducroquet
On Saturday, July 15, 2023 7:47:12 PM CEST Tom Lane wrote:
> I'm not sure how big a deal this is in practice: in most situations
> the individual jobs are larger than they are in this toy example,
> plus the initial non-parallelizable part of the restore is a bigger
> bottleneck anyway with this many tables.  Still, we do have one
> real-world complaint, so maybe we should look into improving it.

Hi

For what it's worth, at my current job it's kind of a big deal. I was about to 
start looking at the bad pg_restore performance I was getting on some databases 
with over 50k tables (in 200 namespaces) when I found this thread. The dump 
weighs in at about 2.8GB; the toc.dat file is 230MB, with 50,120 tables, 142,069 
constraints and 73,669 indexes.

HEAD pg_restore duration: 30 minutes
pg_restore with latest patch from Nathan Bossart: 23 minutes

This is indeed better, but there is still a lot of room for improvement. With 
such use cases, I was able to go much faster using the patched pg_restore with 
a script that parallelizes per schema instead of relying on the choices 
made by pg_restore. It seems the choice of parallelizing only the data loading 
loses nice speedup opportunities with a huge number of objects.

patched pg_restore + parallel restore of schemas: 10 minutes

Anyway, the patch works really fine as is, and I will certainly keep trying 
future iterations.

Regards

 Pierre







Re: cataloguing NOT NULL constraints

2023-07-24 Thread Vik Fearing

On 7/24/23 18:42, Alvaro Herrera wrote:

55490 17devel 3166154=# create table foo (a int constraint nn not null);
CREATE TABLE
55490 17devel 3166154=# alter table foo add constraint nn not null a;
ERROR:  column "a" of table "foo" is already NOT NULL


Surely this should be a WARNING or INFO?  I see no reason to ERROR here.
--
Vik Fearing





Re: cataloguing NOT NULL constraints

2023-07-24 Thread Alvaro Herrera
Hello,

While discussing the matter of multiple constraints with Vik Fearing, I
noticed that we were throwing an unnecessary error if you used

CREATE TABLE foo (a int NOT NULL NOT NULL);

That would die with "redundant NOT NULL declarations", but current
master doesn't do that; and we don't do it for UNIQUE UNIQUE either.
So I modified the patch to make it ignore the dupe and create a single
constraint.  This (and rebasing to current master) are the only changes
in v15.

I have not changed the psql presentation, but I'll do so as soon as we have
rough consensus on what to do.  To reiterate, the options are:

1. Don't show the constraint names.  This is what the current patch does

2. Show the constraint name in \d+ in the "nullable" column.
   I did this early on, to much booing.

3. Show the constraint name in \d++ (a new command) tabular output

4. Show the constraint name in the footer of \d+
   I also did this at some point; there are some +1s and some -1s.

5. Show the constraint name in the footer of \d++


Many thanks, Dean, for the discussion so far.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green
stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'.
After collecting 500 such letters, he mused, a university somewhere in
Arizona would probably grant him a degree.  (Don Knuth)




Re: Permute underscore separated components of columns before fuzzy matching

2023-07-24 Thread Mikhail Gribkov
Hello Arne,

Yep, now the warnings are gone. And I must thank you for quite a fun time
I had testing your patch :) I even tried some weird combinations like
this:
postgres=# create table t("_ __ ___" int);
CREATE TABLE
postgres=# select "__ _ ___" from t;
ERROR:  column "__ _ ___" does not exist
LINE 1: select "__ _ ___" from t;
   ^
HINT:  Perhaps you meant to reference the column "t._ __ ___".
postgres=# select "___ __ _" from t;
ERROR:  column "___ __ _" does not exist
LINE 1: select "___ __ _" from t;
   ^
HINT:  Perhaps you meant to reference the column "t._ __ ___".
postgres=#

... and it still worked fine.
Honestly I'm not entirely sure fixing only two switched words is worth the
effort, but the declared goal is clearly achieved.

I think the patch is good to go, although you need to fix the code formatting.
At least the char* definitions and the opening "{" brackets are conspicuous.
There may be more: it is worth running the pgindent tool.

And it would be much more convenient to work with your patch if each new
version of the file had a unique name (maybe with "_v2", "_v3"
etc. suffixes).

--
 best regards,
Mikhail A. Gribkov

e-mail: youzh...@gmail.com
http://www.flickr.com/photos/youzhick/albums
http://www.strava.com/athletes/5085772
phone: +7(916)604-71-12
Telegram: @youzhick



On Mon, Jul 17, 2023 at 1:42 AM Arne Roland  wrote:

> Hello Mikhail,
>
> I'm sorry. Please try attached patch instead.
>
> Thank you for having a look!
>
> Regards
> Arne
>
> --
> *From:* Mikhail Gribkov 
> *Sent:* Thursday, July 6, 2023 13:31
> *To:* Arne Roland 
> *Cc:* Pg Hackers 
> *Subject:* Re: Permute underscore separated components of columns before
> fuzzy matching
>
> Hello Arne,
>
> The goal of supporting words-switching hints sounds interesting and I've
> tried to apply your patch.
> The patch was applied smoothly to the latest master and check-world
> reported no problems. Although I had problems after trying to test the new
> functionality.
>
> I tried to simply mix words in pg_stat_activity.wait_event_type:
>
> postgres=# select wait_type_event from pg_stat_activity ;
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end
> in MessageContext 0x559d668aaf30
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end
> in MessageContext 0x559d668aaf30
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end
> in MessageContext 0x559d668aaf30
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end
> in MessageContext 0x559d668aaf30
> WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end
> in MessageContext 0x559d668aaf30
> WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end
> in MessageContext 0x559d668aaf30
> WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
> WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end
> in MessageContext 0x559d668aaf30
> WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
> WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end
> in MessageContext 0x559d668aaf30
> WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
> WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end
> in MessageContext 0x559d668aaf30
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end
> in MessageContext 0x559d668aaf30
> WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end
> in MessageContext 0x559d668aaf30
> WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end
> in MessageContext 0x559d668aaf30
> WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end
> in MessageContext 0x559d668aaf30
> WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
> WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end
> in MessageContext 0x559d668aaf30
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end
> in MessageContext 0x559d668aaf30
> WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
> 2023-07-06 14:12:35.968 MSK [1480] WARNING:  

Re: psql not responding to SIGINT upon db reconnection

2023-07-24 Thread Tristan Partin
v3 is attached which fixes up some code comments I added which I hadn't 
attached to the commit already, sigh.


--
Tristan Partin
Neon (https://neon.tech)
From 7f9554944911c77aa1a1900537a91e1e7bd75d93 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Mon, 24 Jul 2023 11:12:59 -0500
Subject: [PATCH v3] Allow SIGINT to cancel psql database reconnections

After installing the SIGINT handler in psql, SIGINT can no longer cancel
database reconnections. For instance, if the user starts a reconnection
and then needs to do some form of interaction (ie psql is polling),
there is no way to cancel the reconnection process currently.

Restore the default SIGINT handler while polling in PQconnectdbParams(),
and put the custom handler back afterward.
---
 src/bin/psql/command.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 6733f008fd..11dedbb4c6 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -40,6 +40,7 @@
 #include "large_obj.h"
 #include "libpq-fe.h"
 #include "libpq/pqcomm.h"
+#include "libpq/pqsignal.h"
 #include "mainloop.h"
 #include "portability/instr_time.h"
 #include "pqexpbuffer.h"
@@ -3530,8 +3531,11 @@ do_connect(enum trivalue reuse_previous_specification,
 	{
 		const char **keywords = pg_malloc((nconnopts + 1) * sizeof(*keywords));
 		const char **values = pg_malloc((nconnopts + 1) * sizeof(*values));
+		int			rc;
 		int			paramnum = 0;
 		PQconninfoOption *ci;
+		struct sigaction oldact;
+		struct sigaction newact = { 0 };
 
 		/*
 		 * Copy non-default settings into the PQconnectdbParams parameter
@@ -3576,9 +3580,24 @@ do_connect(enum trivalue reuse_previous_specification,
 		keywords[paramnum] = NULL;
 		values[paramnum] = NULL;
 
+		/*
+		 * Restore the default SIGINT behavior while within libpq. Otherwise,
+		 * SIGINT can never exit from polling for the database connection.
+		 * Failure to restore the default is non-fatal.
+		 */
+		newact.sa_handler = SIG_DFL;
+		rc = sigaction(SIGINT, , );
+
 		/* Note we do not want libpq to re-expand the dbname parameter */
 		n_conn = PQconnectdbParams(keywords, values, false);
 
+		/*
+		 * Only attempt to restore the old handler if we successfully overrode
+		 * it initially.
+		 */
+		if (rc == 0)
+			sigaction(SIGINT, , NULL);
+
 		pg_free(keywords);
 		pg_free(values);
 
-- 
Tristan Partin
Neon (https://neon.tech)



Re: psql not responding to SIGINT upon db reconnection

2023-07-24 Thread Tristan Partin

v2 is attached which fixes a grammatical issue in a comment.

--
Tristan Partin
Neon (https://neon.tech)
From b9ccfc3c84a25b8616fd40495954bb6f77788e28 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Mon, 24 Jul 2023 11:12:59 -0500
Subject: [PATCH v2] Allow SIGINT to cancel psql database reconnections

After installing the SIGINT handler in psql, SIGINT can no longer cancel
database reconnections. For instance, if the user starts a reconnection
and then needs to do some form of interaction (ie psql is polling),
there is no way to cancel the reconnection process currently.

Restore the default SIGINT handler while polling in PQconnectdbParams(),
and put the custom handler back afterward.
---
 src/bin/psql/command.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 6733f008fd..e40413a229 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -40,6 +40,7 @@
 #include "large_obj.h"
 #include "libpq-fe.h"
 #include "libpq/pqcomm.h"
+#include "libpq/pqsignal.h"
 #include "mainloop.h"
 #include "portability/instr_time.h"
 #include "pqexpbuffer.h"
@@ -3530,8 +3531,11 @@ do_connect(enum trivalue reuse_previous_specification,
 	{
 		const char **keywords = pg_malloc((nconnopts + 1) * sizeof(*keywords));
 		const char **values = pg_malloc((nconnopts + 1) * sizeof(*values));
+		int			rc;
 		int			paramnum = 0;
 		PQconninfoOption *ci;
+		struct sigaction oldact;
+		struct sigaction newact = { 0 };
 
 		/*
 		 * Copy non-default settings into the PQconnectdbParams parameter
@@ -3576,9 +3580,24 @@ do_connect(enum trivalue reuse_previous_specification,
 		keywords[paramnum] = NULL;
 		values[paramnum] = NULL;
 
+		/*
+		 * Restore the default SIGINT behavior while within libpq. Otherwise, we
+		 * can never exit from polling for the database connection. Failure to
+		 * restore is non-fatal.
+		 */
+		newact.sa_handler = SIG_DFL;
+		rc = sigaction(SIGINT, , );
+
 		/* Note we do not want libpq to re-expand the dbname parameter */
 		n_conn = PQconnectdbParams(keywords, values, false);
 
+		/*
+		 * Only attempt to restore the old handler if we successfully overwrote
+		 * it initially.
+		 */
+		if (rc == 0)
+			sigaction(SIGINT, , NULL);
+
 		pg_free(keywords);
 		pg_free(values);
 
-- 
Tristan Partin
Neon (https://neon.tech)



Re: psql not responding to SIGINT upon db reconnection

2023-07-24 Thread Tristan Partin

On Mon Jul 24, 2023 at 12:00 PM CDT, Gurjeet Singh wrote:

On Mon, Jul 24, 2023 at 9:26 AM Tristan Partin  wrote:

> attached patch

+/*
+ * Restore the default SIGINT behavior while within libpq.
Otherwise, we
+ * can never exit from polling for the database connection. Failure to
+ * restore is non-fatal.
+ */
+newact.sa_handler = SIG_DFL;
+rc = sigaction(SIGINT, , );

There's no action taken if rc != 0. It doesn't seem right to
continue as if everything's fine when the handler registration fails.
At least a warning is warranted, so that the user reports such
failures to the community.


If we fail to go back to the default handler, then we just get the 
behavior we currently have. I am not sure logging a message like "Failed 
to restore default SIGINT handler" is that useful to the user. It isn't 
actionable, and at the end of the day isn't going to affect them for the 
most part. They also aren't even aware that the default handler was ever 
overridden in the first place. I'm more than happy to add a debug log if 
that is the blocker to getting this patch accepted, however.


--
Tristan Partin
Neon (https://neon.tech)




Re: psql not responding to SIGINT upon db reconnection

2023-07-24 Thread Gurjeet Singh
On Mon, Jul 24, 2023 at 9:26 AM Tristan Partin  wrote:

> attached patch

+/*
+ * Restore the default SIGINT behavior while within libpq.
Otherwise, we
+ * can never exit from polling for the database connection. Failure to
+ * restore is non-fatal.
+ */
+newact.sa_handler = SIG_DFL;
+rc = sigaction(SIGINT, , );

There's no action taken if rc != 0. It doesn't seem right to
continue as if everything's fine when the handler registration fails.
At least a warning is warranted, so that the user reports such
failures to the community.

Best regards,
Gurjeet
http://Gurje.et




Re: [BUG] Crash on pgbench initialization.

2023-07-24 Thread Andres Freund
Hi,

On 2023-07-24 15:54:33 +0200, Alvaro Herrera wrote:
> On 2023-Jul-24, Michael Paquier wrote:
> 
> > On Sun, Jul 23, 2023 at 11:21:47PM +0300, Anton A. Melnikov wrote:
> > > After some research, found this happens when the LimitAdditionalPins() 
> > > returns exactly zero.
> > > In the current master, this will happen e.g. if shared_buffers = 10MB and 
> > > max_worker_processes = 40.
> > > Then the command "pgbench --initialize postgres" will lead to crash.
> > > See the backtrace attached.
> > > 
> > > There is a fix in the patch applied. Please take a look on it.
> > 
> > Nice catch, issue reproduced here so I am adding an open item for now.
> > (I have not looked at the patch, yet.)
> 
> Hmm, I see that all this code was added by 31966b151e6a, which makes
> this Andres' item.  I see Michael marked it as such in the open items
> page, but did not CC Andres, so I'm doing that here now.

Thanks - I had indeed not seen this.  I can't really keep up with the list at
all times...


> I don't know this code at all, but I hope that this can be solved with
> just Anton's proposed patch.

Yes, it's just that off-by-one.  I need to check if there's a similar bug for
local / temp table buffers though.

Greetings,

Andres Freund




Re: cataloguing NOT NULL constraints

2023-07-24 Thread Alvaro Herrera
On 2023-Jul-24, Dean Rasheed wrote:

> Something else I noticed: the error message from ALTER TABLE ... ADD
> CONSTRAINT in the case of a duplicate constraint name is not very
> friendly:
> 
> ERROR:  duplicate key value violates unique constraint
> "pg_constraint_conrelid_contypid_conname_index"
> DETAIL:  Key (conrelid, contypid, conname)=(16540, 0, nn) already exists.
> 
> To match the error message for other constraint types, this should be:
> 
> ERROR:  constraint "nn" for relation "foo" already exists

Hmm, how did you get this one?  I can't reproduce it:

55490 17devel 3166154=# create table foo (a int constraint nn not null);
CREATE TABLE
55490 17devel 3166154=# alter table foo add constraint nn not null a;
ERROR:  column "a" of table "foo" is already NOT NULL

55490 17devel 3166154=# drop table foo;
DROP TABLE

55490 17devel 3166154=# create table foo (a int);
CREATE TABLE
Duración: 1,472 ms
55490 17devel 3166154=# alter table foo add constraint nn not null a, add 
constraint nn not null a;
ERROR:  column "a" of table "foo" is already NOT NULL

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"El número de instalaciones de UNIX se ha elevado a 10,
y se espera que este número aumente" (UPM, 1972)




psql not responding to SIGINT upon db reconnection

2023-07-24 Thread Tristan Partin
Neon provides a quick start mechanism for psql using the following 
workflow:


$ psql -h pg.neon.tech
NOTICE:  Welcome to Neon!
Authenticate by visiting:
https://console.neon.tech/psql_session/xxx

Upon navigating to the link, the user selects their database to work 
with. The psql process is then connected to a Postgres database at Neon.


psql provides a way to reconnect to the database from within itself: \c. 
If, after connecting to a database via the process previously 
mentioned, the user starts a reconnection to the database from within 
psql, the process will refuse to interrupt the reconnection attempt 
when SIGINT is sent to it. 


tristan=> \c
NOTICE:  Welcome to Neon!
Authenticate by visiting:
https://console.neon.tech/psql_session/yyy

^C
^C^C
^C^C

I am not really sure whether this was a problem on Windows machines, but 
the attached patch should handle it if it was. If this was never a 
problem on Windows, it might be best to check for any regressions.


This was originally discussed in the a GitHub issue[0] at Neon.

[0]: https://github.com/neondatabase/neon/issues/1449

--
Tristan Partin
Neon (https://neon.tech)
From b9ccfc3c84a25b8616fd40495954bb6f77788e28 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Mon, 24 Jul 2023 11:12:59 -0500
Subject: [PATCH v1] Allow SIGINT to cancel psql database reconnections

After installing the SIGINT handler in psql, SIGINT can no longer cancel
database reconnections. For instance, if the user starts a reconnection
and then needs to do some form of interaction (ie psql is polling),
there is no way to cancel the reconnection process currently.

Restore the default SIGINT handler while polling in PQconnectdbParams(),
and put the custom handler back afterward.
---
 src/bin/psql/command.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 6733f008fd..e40413a229 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -40,6 +40,7 @@
 #include "large_obj.h"
 #include "libpq-fe.h"
 #include "libpq/pqcomm.h"
+#include "libpq/pqsignal.h"
 #include "mainloop.h"
 #include "portability/instr_time.h"
 #include "pqexpbuffer.h"
@@ -3530,8 +3531,11 @@ do_connect(enum trivalue reuse_previous_specification,
 	{
 		const char **keywords = pg_malloc((nconnopts + 1) * sizeof(*keywords));
 		const char **values = pg_malloc((nconnopts + 1) * sizeof(*values));
+		int			rc;
 		int			paramnum = 0;
 		PQconninfoOption *ci;
+		struct sigaction oldact;
+		struct sigaction newact = { 0 };
 
 		/*
 		 * Copy non-default settings into the PQconnectdbParams parameter
@@ -3576,9 +3580,24 @@ do_connect(enum trivalue reuse_previous_specification,
 		keywords[paramnum] = NULL;
 		values[paramnum] = NULL;
 
+		/*
+		 * Restore the default SIGINT behavior while within libpq. Otherwise, we
+		 * can never exit from polling for the database connection. Failure to
+		 * restore is non-fatal.
+		 */
+		newact.sa_handler = SIG_DFL;
+		rc = sigaction(SIGINT, , );
+
 		/* Note we do not want libpq to re-expand the dbname parameter */
 		n_conn = PQconnectdbParams(keywords, values, false);
 
+		/*
+		 * Only attempt to restore the old handler if we successfully overwrote
+		 * it initially.
+		 */
+		if (rc == 0)
+			sigaction(SIGINT, , NULL);
+
 		pg_free(keywords);
 		pg_free(values);
 
-- 
Tristan Partin
Neon (https://neon.tech)



Re: logical decoding and replication of sequences, take 2

2023-07-24 Thread Tomas Vondra
On 7/24/23 12:40, Amit Kapila wrote:
> On Wed, Jul 5, 2023 at 8:21 PM Ashutosh Bapat
>  wrote:
>>
>> 0005, 0006 and 0007 are all related to the initial sequence sync. [3]
>> resulted in 0007 and I think we need it. That leaves 0005 and 0006 to
>> be reviewed in this response.
>>
>> I followed the discussion starting [1] till [2]. The second one
>> mentions the interlock mechanism which has been implemented in 0005
>> and 0006. While I don't have an objection to allowing LOCKing a
>> sequence using the LOCK command, I am not sure whether it will
>> actually work or is even needed.
>>
>> The problem described in [1] seems to be the same as the problem
>> described in [2]. In both cases we see the sequence moving backwards
>> during CATCHUP. At the end of catchup the sequence is in the right
>> state in both the cases.
>>
> 
> I think we could see backward sequence value even after the catchup
> phase (after the sync worker is exited and or the state of rel is
> marked as 'ready' in pg_subscription_rel). The point is that there is
> no guarantee that we will process all the pending WAL before
> considering the sequence state is 'SYNCDONE' and or 'READY'. For
> example, after copy_sequence, I see values like:
> 
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
> 165 |   0 | t
> (1 row)
> postgres=# select nextval('s');
>  nextval
> -
>  166
> (1 row)
> postgres=# select nextval('s');
>  nextval
> -
>  167
> (1 row)
> postgres=# select currval('s');
>  currval
> -
>  167
> (1 row)
> 
> Then during the catchup phase:
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
>  33 |   0 | t
> (1 row)
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
>  66 |   0 | t
> (1 row)
> 
> postgres=# select * from pg_subscription_rel;
>  srsubid | srrelid | srsubstate | srsublsn
> -+-++---
>16394 |   16390 | r  | 0/16374E8
>16394 |   16393 | s  | 0/1637700
> (2 rows)
> 
> postgres=# select * from pg_subscription_rel;
>  srsubid | srrelid | srsubstate | srsublsn
> -+-++---
>16394 |   16390 | r  | 0/16374E8
>16394 |   16393 | r  | 0/1637700
> (2 rows)
> 
> Here Sequence relid id 16393. You can see sequence state is marked as ready.
> 

Right, but "READY" just means the apply caught up if the LSN where the
sync finished ...

> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
>  66 |   0 | t
> (1 row)
> 
> Even after that, see below the value of the sequence is still not
> caught up. Later, when the apply worker processes all the WAL, the
> sequence state will be caught up.
> 

And how is this different from what tablesync does for tables? For that,
'r' also does not mean it's fully caught up, IIRC. What matters is
whether the sequence can go backwards from this moment on. And I don't
think it can, because that would require replaying changes from before
we did copy_sequence ...

> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
> 165 |   0 | t
> (1 row)
> 
> So, there will be a window where the sequence won't be caught up for a
> certain period of time and any usage of it (even after the sync is
> finished) during that time could result in inconsistent behaviour.
> 
> The other question is whether it is okay to allow the sequence to go
> backwards even during the initial sync phase? The reason I am asking
> this question is that for the time sequence value moves backwards, one
> is allowed to use it on the subscriber which will result in using
> out-of-sequence values. For example, immediately, after copy_sequence
> the values look like this:
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
> 133 |  32 | t
> (1 row)
> postgres=# select nextval('s');
>  nextval
> -
>  134
> (1 row)
> postgres=# select currval('s');
>  currval
> -
>  134
> (1 row)
> 
> But then during the sync phase, it can go backwards and one is allowed
> to use it on the subscriber:
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
>  66 |   0 | t
> (1 row)
> postgres=# select nextval('s');
>  nextval
> -
>   67
> (1 row)
> 

As I wrote earlier, I think the agreement was we make no guarantees
about what happens during the sync.

Also, not sure what you mean by "no one is allowed to use it on
subscriber" - that is only allowed after a failover/switchover, after
sequence sync completes.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: logical decoding and replication of sequences, take 2

2023-07-24 Thread Alvaro Herrera
On 2023-Jul-24, Tomas Vondra wrote:

> On 7/24/23 13:14, Alvaro Herrera wrote:

> > Do you mind if I get this one pushed later today?  Or feel free to push
> > it yourself, if you want.  It's an annoying patch to keep seeing posted
> > over and over, with no further value.  
> 
> Feel free to push. It's your patch, after all.

Thanks, done.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Learn about compilers. Then everything looks like either a compiler or
a database, and now you have two problems but one of them is fun."
https://twitter.com/thingskatedid/status/1456027786158776329




Re: logical decoding and replication of sequences, take 2

2023-07-24 Thread Tomas Vondra



On 7/24/23 14:53, Ashutosh Bapat wrote:
> On Thu, Jul 20, 2023 at 8:22 PM Tomas Vondra
>  wrote:
> 
>>>
>>> PFA such edits in 0002 and 0006 patches. Let me know if those look
>>> correct. I think we
>>> need similar changes to the documentation and comments in other places.
>>>
>>
>> OK, I merged the changes into the patches, with some minor changes to
>> the wording etc.
> 
> Thanks.
> 
> 
>>
>>> In sequence_decode() we skip sequence changes when fast forwarding.
>>> Given that smgr_decode() is only to supplement sequence_decode(), I
>>> think it's correct to do the same in smgr_decode() as well. Simillarly
>>> skipping when we don't have full snapshot.
>>>
>>
>> I don't follow, smgr_decode already checks ctx->fast_forward.
> 
> In your earlier email you seemed to express some doubts about the
> change skipping code in smgr_decode(). To that, I gave my own
> perspective on why the change skipping code in smgr_decode() is
> correct. I think smgr_decode is doing the right thing, IMO. No change
> required there.
> 

I think that was referring to the skipping we do for logical messages:

if (message->transactional &&
!SnapBuildProcessChange(builder, xid, buf->origptr))
return;
else if (!message->transactional &&
 (SnapBuildCurrentState(builder) != SNAPBUILD_CONSISTENT ||
  SnapBuildXactNeedsSkip(builder, buf->origptr)))
return;

I concluded we don't need to do that here.

>>
>>> Some minor comments on 0006 patch
>>>
>>> +/* make sure the relfilenode creation is associated with the XID */
>>> +if (XLogLogicalInfoActive())
>>> +GetCurrentTransactionId();
>>>
>>> I think this change is correct and is inline with similar changes in 0002. 
>>> But
>>> I looked at other places from where DefineRelation() is called. For regular
>>> tables it is called from ProcessUtilitySlow() which in turn does not call
>>> GetCurrentTransactionId(). I am wondering whether we are just discovering a
>>> class of bugs caused by not associating an xid with a newly created
>>> relfilenode.
>>>
>>
>> Not sure. Why would it be a bug?
> 
> This discussion is unrelated to sequence decoding but let me add it
> here. If we don't know the transaction ID that created a relfilenode,
> we wouldn't know whether to roll back that creation if the transaction
> gets rolled back during recovery. But maybe that doesn't matter since
> the relfilenode is not visible in any of the catalogs, so it just lies
> there unused.
> 

I think that's unrelated to this patch.

> 
>>
>>> +void
>>> +ReorderBufferAddRelFileLocator(ReorderBuffer *rb, TransactionId xid,
>>> +   RelFileLocator rlocator)
>>> +{
>>> ... snip ...
>>> +
>>> +/* sequence changes require a transaction */
>>> +if (xid == InvalidTransactionId)
>>> +return;
>>>
>>> IIUC, with your changes in DefineSequence() in this patch, this should not
>>> happen. So this condition will never be true. But in case it happens, this 
>>> code
>>> will not add the relfilelocation to the hash table and we will deem the
>>> sequence change as non-transactional. Isn't it better to just throw an error
>>> and stop replication if that (ever) happens?
>>>
>>
>> It can't happen for sequence, but it may happen when creating a
>> non-sequence relfilenode. In a way, it's a way to skip (some)
>> unnecessary relfilenodes.
> 
> Ah! The comment is correct but cryptic. I didn't read it to mean this.
> 

OK, I'll improve the comment.

>>> +/*
>>> + * To support logical decoding of sequences, we require the sequence
>>> + * callback. We decide it here, but only check it later in the 
>>> wrappers.
>>> + *
>>> + * XXX Isn't it wrong to define only one of those callbacks? Say we
>>> + * only define the stream_sequence_cb() - that may get strange results
>>> + * depending on what gets streamed. Either none or both?
>>> + *
>>> + * XXX Shouldn't sequence be defined at slot creation time, similar
>>> + * to two_phase? Probably not.
>>> + */
>>>
>>> Do you intend to keep these XXX's as is? My previous comments on this 
>>> comment
>>> block are in [1].
> 
> This comment remains unanswered.
> 

I think the conclusion was we don't need to do that. I forgot to remove
the comment, though.

>>>
>>> In fact, given that whether or not sequences are replicated is decided by 
>>> the
>>> protocol version, do we really need LogicalDecodingContext::sequences? 
>>> Drawing
>>> parallel with WAL messages, I don't think it's needed.
>>>
>>
>> Right. We do that for two_phase because you can override that when
>> creating the subscription - sequences allowed that too initially, but
>> then we ditched that. So I don't think we need this.
> 
> Then we should just remove that member and its references.
> 

The member is still needed - it says whether the plugin has callbacks
for sequence decoding or not (just like we have a flag for streaming,
for example). I see the XXX comment in 

Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-07-24 Thread Alvaro Herrera
On 2023-Jul-20, Andrew Dunstan wrote:

> On 2023-07-20 Th 05:52, Alvaro Herrera wrote:
> > On 2023-Jul-19, Andrew Dunstan wrote:
> > 
> > > The result you report suggest to me that somehow the old version is no
> > > longer a PostgreSQL::Version object.  Here's the patch I suggest:
> > Ahh, okay, that makes more sense; and yes, it does work.
> 
> Your patch LGTM

Thanks for looking.  I pushed it to 16 and master.  I considered
applying all the way down to 9.2, but I decided it'd be pointless.
We can backpatch later if we find there's need.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green
stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'.
After collecting 500 such letters, he mused, a university somewhere in
Arizona would probably grant him a degree.  (Don Knuth)




bt_recheck_sibling_links contrib/amcheck/verify_nbtree.c

2023-07-24 Thread jian he
hi.
I am confused by bt_recheck_sibling_links.

>
>  ereport(ERROR,
>  (errcode(ERRCODE_INDEX_CORRUPTED),
>  errmsg("left link/right link pair in index \"%s\" not in agreement",
> RelationGetRelationName(state->rel)),
> errdetail_internal("Block=%u left block=%u left link from block=%u.",
> state->targetblock, leftcurrent,
> btpo_prev_from_target)));

Should we put the above code inside the {if (!state->readonly)} branch?
I could understand it if it were inside.
Right now the whole function looks like:

if (!state->readonly)
{}
ereport{}.

-
if (btpo_prev_from_target == leftcurrent)
{
/* Report split in left sibling, not target (or new target) */
ereport(DEBUG1,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg_internal("harmless concurrent page split detected in index \"%s\"",
RelationGetRelationName(state->rel)),
errdetail_internal("Block=%u new right sibling=%u original right sibling=%u.",
leftcurrent, newtargetblock,
state->targetblock)));
return;
}

Is (btpo_prev_from_target == leftcurrent) only true in the concurrent
read case? If so, then I am confused by the ereport content.




Re: [BUG] Crash on pgbench initialization.

2023-07-24 Thread Alvaro Herrera
On 2023-Jul-24, Michael Paquier wrote:

> On Sun, Jul 23, 2023 at 11:21:47PM +0300, Anton A. Melnikov wrote:
> > After some research, found this happens when the LimitAdditionalPins() 
> > returns exactly zero.
> > In the current master, this will happen e.g. if shared_buffers = 10MB and 
> > max_worker_processes = 40.
> > Then the command "pgbench --initialize postgres" will lead to crash.
> > See the backtrace attached.
> > 
> > There is a fix in the patch applied. Please take a look on it.
> 
> Nice catch, issue reproduced here so I am adding an open item for now.
> (I have not looked at the patch, yet.)

Hmm, I see that all this code was added by 31966b151e6a, which makes
this Andres' item.  I see Michael marked it as such in the open items
page, but did not CC Andres, so I'm doing that here now.

I don't know this code at all, but I hope that this can be solved with
just Anton's proposed patch.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Find a bug in a program, and fix it, and the program will work today.
Show the program how to find and fix a bug, and the program
will work forever" (Oliver Silfridge)




Re: Use COPY for populating all pgbench tables

2023-07-24 Thread Tristan Partin

Michael,

Once again I appreciate your patience with this patchset. Thanks for 
your help and reviews.


--
Tristan Partin
Neon (https://neon.tech)




Re: POC: GROUP BY optimization

2023-07-24 Thread Tomas Vondra



On 7/24/23 14:04, Andrey Lepikhov wrote:
> On 24/7/2023 16:56, Tomas Vondra wrote:
>> On 7/24/23 04:10, Andrey Lepikhov wrote:
>>> On 20/7/2023 18:46, Tomas Vondra wrote:
 On 7/20/23 08:37, Andrey Lepikhov wrote:
> On 3/10/2022 21:56, Tom Lane wrote:
>> Revert "Optimize order of GROUP BY keys".
>>
>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>> several follow-on fixes.
>> ...
>> Since we're hard up against the release deadline for v15, let's
>> revert these changes for now.  We can always try again later.
>
> It may be time to restart the project. As a first step, I rebased the
> patch on the current master. It wasn't trivial because of some latest
> optimizations (a29eab, 1349d27 and 8d83a5d).
> Now, Let's repeat the review and rewrite the current path according to
> the reasons uttered in the revert commit.

 I think the fundamental task is to make the costing more reliable, and
 the commit message 443df6e2db points out a couple challenges in this
 area. Not sure how feasible it is to address enough of them ...

 1) procost = 1.0 - I guess we could make this more realistic by doing
 some microbenchmarks and tuning the costs for the most expensive cases.

 2) estimating quicksort comparisons - This relies on ndistinct
 estimates, and I'm not sure how much more reliable we can make those.
 Probably not much :-( Not sure what to do about this, the only thing I
 can think of is to track "reliability" of the estimates and only do the
 reordering if we have high confidence in the estimates. That means
 we'll
 miss some optimization opportunities, but it should limit the risk.
>>> I read up on the history of this thread.
>>> As I see, all the problems mentioned above can be beaten by excluding
>>> the new cost model at all. We can sort GROUP BY columns according to the
>>> 'ndistinct' value.
>>> I see the reason for introducing the cost model in [1]. The main
>>> supporting point here is that with this patch, people couldn't optimize
>>> the query by themselves, organizing the order of the columns in a more
>>> optimal way. But now we have at least the GUC to switch off the
>>> behaviour introduced here. Also, some extensions, like the well-known
>>> pg_hint_plan, can help with automation.
>>
>> I think the main concern is that if we reorder the group keys and get it
>> wrong, it's a regression. If that happens often (due to costing based on
>> poor stats), it's a problem. Yes, there's a GUC, but that's a rather
>> blunt instrument, unfortunately.
> I see. My point here is if the ndistinct of one column is much more than
> the ndistinct of another, it is more probable that this correlation will
> be kept in the grouping phase. Here we can get some regression, which
> can be overweighed by the possibility below.

I think the word "probable" gets at exactly what the problem is. Because
what if it isn't? Also, we don't actually need ndistinct for individual
attributes, but for groups of attributes. Imagine you do

GROUP BY a, b

so you need to estimate whether to do (a,b) or (b,a). You need to calculate

  ndistinct(a,b) / ndistinct(a)

and

  ndistinct(b,a) / ndistinct(b)

And similarly for more grouping keys. This is why we have ndistinct
extended statistics, mostly.
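
To put made-up numbers on it: say ndistinct(a) = 1000, ndistinct(b) = 10
and ndistinct(a,b) = 2000. Then the two ratios are 2000/1000 = 2 and
2000/10 = 200, i.e. on average 2 distinct values of b within each group
of a, but 200 distinct values of a within each group of b - and you only
learn that from the multi-column estimate, not from the per-column
ndistincts (1000 vs. 10) alone.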

>>
>>> So, how about committing of the undoubted part of the feature and
>>> working on the cost model in a new thread?
>>>
>>
>> But Tom's commit message says this:
>>
>>  Worse, to arrive at estimates of the number of calls made to the
>>  lower-order-column comparison functions, the code needs to make
>>  estimates of the numbers of distinct values of multiple columns,
>>  which are necessarily even less trustworthy than per-column stats.
>>
>> so I'm not sure this really counts as "undoubted".
> Don't try to estimate multiple  columns - just sort columns according to
> the value of ndistinct as a heuristic.

Surely you realize how easy such simple heuristics can fail, right?

> I think we should estimate the number of values of multiple columns only
> if we have extended statistics on these columns. And this can extend the
> applicability of extended statistics.
> 

I don't see why this should estimate ndistinct differently from every
other place. That'd certainly be surprising. I can be convinced, but
there needs to be sound justification why that's the right way to do it
(i.e. why it's less likely to cause poor planning choices).

> The suggestion above is just an attempt to gather low-hanging fruit
> right now. If it is not applicable, we will go a long way.
> 
I'm not sure this qualifies as "low hanging fruit" - that generally
means simple changes with little risk. But you're using it for a rather
simplistic heuristic that can easily misfire (I think). At least that's
my impression; I can be convinced otherwise.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The 

Re: logical decoding and replication of sequences, take 2

2023-07-24 Thread Ashutosh Bapat
On Thu, Jul 20, 2023 at 10:19 PM Tomas Vondra
 wrote:
>
> FWIW there's two questions related to the switch to XLOG_SMGR_CREATE.
>
> 1) Does smgr_decode() need to do the same block as sequence_decode()?
>
> /* Skip the change if already processed (per the snapshot). */
> if (transactional &&
> !SnapBuildProcessChange(builder, xid, buf->origptr))
> return;
> else if (!transactional &&
>  (SnapBuildCurrentState(builder) != SNAPBUILD_CONSISTENT ||
>   SnapBuildXactNeedsSkip(builder, buf->origptr)))
> return;
>
> I don't think it does. Also, we don't have any transactional flag here.
> Or rather, everything is transactional ...

Right.

>
>
> 2) Currently, the sequences hash table is in reorderbuffer, i.e. global.
> I was thinking maybe we should have it in the transaction (because we
> need to do cleanup at the end). It seem a bit inconvenient, because then
> we'd need to either search htabs in all subxacts, or transfer the
> entries to the top-level xact (otoh, we already do that with snapshots),
> and cleanup on abort.
>
> What do you think?

A hash table per transaction seems like a saner design. Adding it to
the top-level transaction should be fine. The entry will contain an XID
anyway. If we add it to every subtransaction, we will need to search the
hash table in each of the subtransactions when deciding whether a
sequence change is transactional or not. The top-level transaction is a
reasonable trade-off.
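
Just to sketch the shape I have in mind (invented names, not actual
code - in particular the sequences_hash field is hypothetical):

typedef struct ReorderBufferSequenceEnt
{
	RelFileLocator rlocator;	/* hash key: relfilenode created by this xact */
} ReorderBufferSequenceEnt;

static bool
sequence_change_is_transactional(ReorderBufferTXN *txn, RelFileLocator rlocator)
{
	/* subxacts don't carry the hash; look it up in the top-level xact */
	if (rbtxn_is_known_subxact(txn))
		txn = txn->toptxn;

	if (txn->sequences_hash == NULL)
		return false;

	return hash_search(txn->sequences_hash, &rlocator,
					   HASH_FIND, NULL) != NULL;
}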

-- 
Best Wishes,
Ashutosh Bapat




Re: logical decoding and replication of sequences, take 2

2023-07-24 Thread Tomas Vondra



On 7/24/23 13:14, Alvaro Herrera wrote:
> On 2023-Jul-20, Tomas Vondra wrote:
> 
>> From 809d60be7e636b8505027ad87bcb9fc65224c47b Mon Sep 17 00:00:00 2001
>> From: Tomas Vondra 
>> Date: Wed, 5 Apr 2023 22:49:41 +0200
>> Subject: [PATCH 1/6] Make test_decoding ddl.out shorter
>>
>> Some of the test_decoding test output was extremely wide, because it
>> deals with toasted values, and the aligned mode causes psql to produce
>> 200kB of dashes. Turn that off temporarily using \pset to avoid it.
> 
> Do you mind if I get this one pushed later today?  Or feel free to push
> it yourself, if you want.  It's an annoying patch to keep seeing posted
> over and over, with no further value.  
> 

Feel free to push. It's your patch, after all.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: logical decoding and replication of sequences, take 2

2023-07-24 Thread Ashutosh Bapat
On Thu, Jul 20, 2023 at 8:22 PM Tomas Vondra
 wrote:

> >
> > PFA such edits in 0002 and 0006 patches. Let me know if those look
> > correct. I think we
> > need similar changes to the documentation and comments in other places.
> >
>
> OK, I merged the changes into the patches, with some minor changes to
> the wording etc.

Thanks.


>
> > In sequence_decode() we skip sequence changes when fast forwarding.
> > Given that smgr_decode() is only to supplement sequence_decode(), I
> > think it's correct to do the same in smgr_decode() as well. Simillarly
> > skipping when we don't have full snapshot.
> >
>
> I don't follow, smgr_decode already checks ctx->fast_forward.

In your earlier email you seemed to express some doubts about the
change skipping code in smgr_decode(). To that, I gave my own
perspective on why the change skipping code in smgr_decode() is
correct. I think smgr_decode is doing the right thing, IMO. No change
required there.

>
> > Some minor comments on 0006 patch
> >
> > +/* make sure the relfilenode creation is associated with the XID */
> > +if (XLogLogicalInfoActive())
> > +GetCurrentTransactionId();
> >
> > I think this change is correct and is inline with similar changes in 0002. 
> > But
> > I looked at other places from where DefineRelation() is called. For regular
> > tables it is called from ProcessUtilitySlow() which in turn does not call
> > GetCurrentTransactionId(). I am wondering whether we are just discovering a
> > class of bugs caused by not associating an xid with a newly created
> > relfilenode.
> >
>
> Not sure. Why would it be a bug?

This discussion is unrelated to sequence decoding but let me add it
here. If we don't know the transaction ID that created a relfilenode,
we wouldn't know whether to roll back that creation if the transaction
gets rolled back during recovery. But maybe that doesn't matter since
the relfilenode is not visible in any of the catalogs, so it just lies
there unused.


>
> > +void
> > +ReorderBufferAddRelFileLocator(ReorderBuffer *rb, TransactionId xid,
> > +   RelFileLocator rlocator)
> > +{
> > ... snip ...
> > +
> > +/* sequence changes require a transaction */
> > +if (xid == InvalidTransactionId)
> > +return;
> >
> > IIUC, with your changes in DefineSequence() in this patch, this should not
> > happen. So this condition will never be true. But in case it happens, this 
> > code
> > will not add the relfilelocation to the hash table and we will deem the
> > sequence change as non-transactional. Isn't it better to just throw an error
> > and stop replication if that (ever) happens?
> >
>
> It can't happen for sequence, but it may happen when creating a
> non-sequence relfilenode. In a way, it's a way to skip (some)
> unnecessary relfilenodes.

Ah! The comment is correct but cryptic. I didn't read it to mean this.

> > +/*
> > + * To support logical decoding of sequences, we require the sequence
> > + * callback. We decide it here, but only check it later in the 
> > wrappers.
> > + *
> > + * XXX Isn't it wrong to define only one of those callbacks? Say we
> > + * only define the stream_sequence_cb() - that may get strange results
> > + * depending on what gets streamed. Either none or both?
> > + *
> > + * XXX Shouldn't sequence be defined at slot creation time, similar
> > + * to two_phase? Probably not.
> > + */
> >
> > Do you intend to keep these XXX's as is? My previous comments on this 
> > comment
> > block are in [1].

This comment remains unanswered.

> >
> > In fact, given that whether or not sequences are replicated is decided by 
> > the
> > protocol version, do we really need LogicalDecodingContext::sequences? 
> > Drawing
> > parallel with WAL messages, I don't think it's needed.
> >
>
> Right. We do that for two_phase because you can override that when
> creating the subscription - sequences allowed that too initially, but
> then we ditched that. So I don't think we need this.

Then we should just remove that member and its references.

-- 
Best Wishes,
Ashutosh Bapat




Re: logicalrep_message_type throws an error

2023-07-24 Thread Ashutosh Bapat
On Sat, Jul 22, 2023 at 10:18 AM Amit Kapila  wrote:

> >
> > I think it could confuse us if an invalid message is not a printable 
> > character.
> >
>

That's a good point.

> Right. I'll push and backpatch this till 15 by Tuesday unless you guys
> think otherwise.

WFM.

-- 
Best Wishes,
Ashutosh Bapat




Re: POC: GROUP BY optimization

2023-07-24 Thread Andrey Lepikhov

On 24/7/2023 16:56, Tomas Vondra wrote:
> On 7/24/23 04:10, Andrey Lepikhov wrote:
>> On 20/7/2023 18:46, Tomas Vondra wrote:
>>> On 7/20/23 08:37, Andrey Lepikhov wrote:
>>>> On 3/10/2022 21:56, Tom Lane wrote:
>>>>> Revert "Optimize order of GROUP BY keys".
>>>>>
>>>>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>>>>> several follow-on fixes.
>>>>> ...
>>>>> Since we're hard up against the release deadline for v15, let's
>>>>> revert these changes for now.  We can always try again later.
>>>>
>>>> It may be time to restart the project. As a first step, I rebased the
>>>> patch on the current master. It wasn't trivial because of some latest
>>>> optimizations (a29eab, 1349d27 and 8d83a5d).
>>>> Now, Let's repeat the review and rewrite the current path according to
>>>> the reasons uttered in the revert commit.
>>>
>>> I think the fundamental task is to make the costing more reliable, and
>>> the commit message 443df6e2db points out a couple challenges in this
>>> area. Not sure how feasible it is to address enough of them ...
>>>
>>> 1) procost = 1.0 - I guess we could make this more realistic by doing
>>> some microbenchmarks and tuning the costs for the most expensive cases.
>>>
>>> 2) estimating quicksort comparisons - This relies on ndistinct
>>> estimates, and I'm not sure how much more reliable we can make those.
>>> Probably not much :-( Not sure what to do about this, the only thing I
>>> can think of is to track "reliability" of the estimates and only do the
>>> reordering if we have high confidence in the estimates. That means we'll
>>> miss some optimization opportunities, but it should limit the risk.
>>
>> I read up on the history of this thread.
>> As I see, all the problems mentioned above can be beaten by excluding
>> the new cost model at all. We can sort GROUP BY columns according to the
>> 'ndistinct' value.
>> I see the reason for introducing the cost model in [1]. The main
>> supporting point here is that with this patch, people couldn't optimize
>> the query by themselves, organizing the order of the columns in a more
>> optimal way. But now we have at least the GUC to switch off the
>> behaviour introduced here. Also, some extensions, like the well-known
>> pg_hint_plan, can help with automation.
>
> I think the main concern is that if we reorder the group keys and get it
> wrong, it's a regression. If that happens often (due to costing based on
> poor stats), it's a problem. Yes, there's a GUC, but that's a rather
> blunt instrument, unfortunately.
I see. My point here is that if the ndistinct of one column is much
greater than the ndistinct of another, it is more probable that this
relationship will hold during the grouping phase. We can get some
regressions here, but they can be outweighed by the opportunity described
below.



>> So, how about committing of the undoubted part of the feature and
>> working on the cost model in a new thread?
>
> But Tom's commit message says this:
>
>     Worse, to arrive at estimates of the number of calls made to the
>     lower-order-column comparison functions, the code needs to make
>     estimates of the numbers of distinct values of multiple columns,
>     which are necessarily even less trustworthy than per-column stats.
>
> so I'm not sure this really counts as "undoubted".
Don't try to estimate across multiple columns - just sort the columns
according to their ndistinct values as a heuristic.
I think we should estimate the number of distinct values across multiple
columns only if we have extended statistics on those columns. That would
also extend the applicability of extended statistics.
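
For illustration, the kind of multivariate ndistinct statistics this
would rely on (just a sketch; the table and column names are made up):

  CREATE TABLE orders (region int, customer int, order_day date);
  CREATE STATISTICS orders_nd (ndistinct) ON region, customer, order_day
    FROM orders;
  ANALYZE orders;

  -- the collected group-size estimates are then visible here
  SELECT s.stxname, d.stxdndistinct
    FROM pg_statistic_ext s
    JOIN pg_statistic_ext_data d ON d.stxoid = s.oid;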


The suggestion above is just an attempt to gather low-hanging fruit
right now. If it is not applicable, we will have to take the long road.


--
regards,
Andrey Lepikhov
Postgres Professional





Re: logical decoding and replication of sequences, take 2

2023-07-24 Thread Alvaro Herrera
On 2023-Jul-20, Tomas Vondra wrote:

> From 809d60be7e636b8505027ad87bcb9fc65224c47b Mon Sep 17 00:00:00 2001
> From: Tomas Vondra 
> Date: Wed, 5 Apr 2023 22:49:41 +0200
> Subject: [PATCH 1/6] Make test_decoding ddl.out shorter
> 
> Some of the test_decoding test output was extremely wide, because it
> deals with toasted values, and the aligned mode causes psql to produce
> 200kB of dashes. Turn that off temporarily using \pset to avoid it.

Do you mind if I get this one pushed later today?  Or feel free to push
it yourself, if you want.  It's an annoying patch to keep seeing posted
over and over, with no further value.  

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"El que vive para el futuro es un iluso, y el que vive para el pasado,
un imbécil" (Luis Adler, "Los tripulantes de la noche")




Re: cataloguing NOT NULL constraints

2023-07-24 Thread Alvaro Herrera
On 2023-Jul-24, Dean Rasheed wrote:

> Hmm, I'm not so sure. I think perhaps multiple NOT NULL constraints on
> the same column should just be allowed, otherwise things might get
> confusing. For example:
> 
 create table p1 (a int not null check (a > 0));
 create table p2 (a int not null check (a > 0));
 create table foo () inherits (p1, p2);

Have a look at the conislocal / coninhcount values.  These should
reflect the fact that the constraint has multiple sources; and the
constraint does disappear if you drop it from both sources.
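
For instance (a quick sketch against the tables in your example, assuming
they still exist):

  SELECT conname, contype, conislocal, coninhcount
    FROM pg_constraint
   WHERE conrelid = 'foo'::regclass
   ORDER BY conname;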

> If I then drop the p1 constraints:
> 
> alter table p1 drop constraint p1_a_check;
> alter table p1 drop constraint p1_a_not_null;
> 
> I end up with column "a" still being not null, and the "p1_a_not_null"
> constraint still being there on foo, which seems even more
> counter-intuitive, because I just dropped that constraint, and it
> really should now be the "p2_a_not_null" constraint that makes "a" not
> null:

I can see that it might make sense to not inherit the constraint name in
some cases.  Perhaps:

1. never inherit a name.  Each table has its own constraint name always
2. only inherit if there's a single parent
3. always inherit the name from the first parent (current implementation)

> So I'd say that ALTER TABLE ... ADD NOT NULL should always add a
> constraint, even if there already is one. For example ALTER TABLE ...
> ADD UNIQUE does nothing to prevent multiple unique constraints on the
> same column(s). It seems pretty dumb, but maybe there is a reason to
> allow it, and it doesn't feel like we should be second-guessing what
> the user wants.

That was my initial implementation but I changed it to allowing a single
constraint because of the way the standard describes SET NOT NULL;
specifically, 11.15  says that "If the
column descriptor of C does not contain an indication that C is defined
as NOT NULL, then:" a constraint is added; otherwise (i.e., such an
indication does exist), nothing happens.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La virtud es el justo medio entre dos defectos" (Aristóteles)




Re: logical decoding and replication of sequences, take 2

2023-07-24 Thread Tomas Vondra



On 7/24/23 08:31, Amit Kapila wrote:
> On Thu, Jul 20, 2023 at 8:22 PM Tomas Vondra
>  wrote:
>>
>> OK, I merged the changes into the patches, with some minor changes to
>> the wording etc.
>>
> 
> I think we can do 0001-Make-test_decoding-ddl.out-shorter-20230720
> even without the rest of the patches. Isn't it a separate improvement?
> 

True.

> I see that origin filtering (origin=none) doesn't work with this
> patch. You can see this by using the following statements:
> Node-1:
> postgres=# create sequence s;
> CREATE SEQUENCE
> postgres=# create publication mypub for all sequences;
> CREATE PUBLICATION
> 
> Node-2:
> postgres=# create sequence s;
> CREATE SEQUENCE
> postgres=# create subscription mysub_sub connection '' publication
> mypub with (origin=none);
> NOTICE:  created replication slot "mysub_sub" on publisher
> CREATE SUBSCRIPTION
> postgres=# create publication mypub_sub for all sequences;
> CREATE PUBLICATION
> 
> Node-1:
> create subscription mysub_pub connection '...' publication mypub_sub
> with (origin=none);
> NOTICE:  created replication slot "mysub_pub" on publisher
> CREATE SUBSCRIPTION
> 
> SELECT nextval('s') FROM generate_series(1,100);
> 
> After that, you can check on the subscriber that sequences values are
> overridden with older values:
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
>  67 |   0 | t
> (1 row)
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
> 100 |   0 | t
> (1 row)
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
> 133 |   0 | t
> (1 row)
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
>  67 |   0 | t
> (1 row)
> 
> I haven't verified all the details but I think that is because we
> don't set XLOG_INCLUDE_ORIGIN while logging sequence values.
> 

Hmmm, yeah. I guess we'll need to set XLOG_INCLUDE_ORIGIN with
wal_level=logical.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: logical decoding and replication of sequences, take 2

2023-07-24 Thread Tomas Vondra



On 7/24/23 12:40, Amit Kapila wrote:
> On Wed, Jul 5, 2023 at 8:21 PM Ashutosh Bapat
>  wrote:
>>
>> 0005, 0006 and 0007 are all related to the initial sequence sync. [3]
>> resulted in 0007 and I think we need it. That leaves 0005 and 0006 to
>> be reviewed in this response.
>>
>> I followed the discussion starting [1] till [2]. The second one
>> mentions the interlock mechanism which has been implemented in 0005
>> and 0006. While I don't have an objection to allowing LOCKing a
>> sequence using the LOCK command, I am not sure whether it will
>> actually work or is even needed.
>>
>> The problem described in [1] seems to be the same as the problem
>> described in [2]. In both cases we see the sequence moving backwards
>> during CATCHUP. At the end of catchup the sequence is in the right
>> state in both the cases.
>>
> 
> I think we could see backward sequence value even after the catchup
> phase (after the sync worker is exited and or the state of rel is
> marked as 'ready' in pg_subscription_rel). The point is that there is
> no guarantee that we will process all the pending WAL before
> considering the sequence state is 'SYNCDONE' and or 'READY'. For
> example, after copy_sequence, I see values like:
> 
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
> 165 |   0 | t
> (1 row)
> postgres=# select nextval('s');
>  nextval
> -
>  166
> (1 row)
> postgres=# select nextval('s');
>  nextval
> -
>  167
> (1 row)
> postgres=# select currval('s');
>  currval
> -
>  167
> (1 row)
> 
> Then during the catchup phase:
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
>  33 |   0 | t
> (1 row)
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
>  66 |   0 | t
> (1 row)
> 
> postgres=# select * from pg_subscription_rel;
>  srsubid | srrelid | srsubstate | srsublsn
> -+-++---
>16394 |   16390 | r  | 0/16374E8
>16394 |   16393 | s  | 0/1637700
> (2 rows)
> 
> postgres=# select * from pg_subscription_rel;
>  srsubid | srrelid | srsubstate | srsublsn
> -+-++---
>16394 |   16390 | r  | 0/16374E8
>16394 |   16393 | r  | 0/1637700
> (2 rows)
> 
> Here Sequence relid id 16393. You can see sequence state is marked as ready.
> 
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
>  66 |   0 | t
> (1 row)
> 
> Even after that, see below the value of the sequence is still not
> caught up. Later, when the apply worker processes all the WAL, the
> sequence state will be caught up.
> 
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
> 165 |   0 | t
> (1 row)
> 
> So, there will be a window where the sequence won't be caught up for a
> certain period of time and any usage of it (even after the sync is
> finished) during that time could result in inconsistent behaviour.
> 

I'm rather confused about which node these queries are executed on.
Presumably some of it is on publisher, some on subscriber?

Can you create a reproducer (a TAP test demonstrating this)? I guess it
might require adding some sleeps to hit the right timing ...

> The other question is whether it is okay to allow the sequence to go
> backwards even during the initial sync phase? The reason I am asking
> this question is that for the time sequence value moves backwards, one
> is allowed to use it on the subscriber which will result in using
> out-of-sequence values. For example, immediately, after copy_sequence
> the values look like this:
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
> 133 |  32 | t
> (1 row)
> postgres=# select nextval('s');
>  nextval
> -
>  134
> (1 row)
> postgres=# select currval('s');
>  currval
> -
>  134
> (1 row)
> 
> But then during the sync phase, it can go backwards and one is allowed
> to use it on the subscriber:
> postgres=# select * from s;
>  last_value | log_cnt | is_called
> +-+---
>  66 |   0 | t
> (1 row)
> postgres=# select nextval('s');
>  nextval
> -
>   67
> (1 row)
> 

Well, as for going back during the sync phase, I think the agreement was
that's acceptable, as we don't make guarantees about that. The question
is what's the state at the end of the sync (which I think leads to the
first part of your message).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: logical decoding and replication of sequences, take 2

2023-07-24 Thread Amit Kapila
On Wed, Jul 5, 2023 at 8:21 PM Ashutosh Bapat
 wrote:
>
> 0005, 0006 and 0007 are all related to the initial sequence sync. [3]
> resulted in 0007 and I think we need it. That leaves 0005 and 0006 to
> be reviewed in this response.
>
> I followed the discussion starting [1] till [2]. The second one
> mentions the interlock mechanism which has been implemented in 0005
> and 0006. While I don't have an objection to allowing LOCKing a
> sequence using the LOCK command, I am not sure whether it will
> actually work or is even needed.
>
> The problem described in [1] seems to be the same as the problem
> described in [2]. In both cases we see the sequence moving backwards
> during CATCHUP. At the end of catchup the sequence is in the right
> state in both the cases.
>

I think we could see backward sequence value even after the catchup
phase (after the sync worker is exited and or the state of rel is
marked as 'ready' in pg_subscription_rel). The point is that there is
no guarantee that we will process all the pending WAL before
considering the sequence state is 'SYNCDONE' and or 'READY'. For
example, after copy_sequence, I see values like:

postgres=# select * from s;
 last_value | log_cnt | is_called
+-+---
165 |   0 | t
(1 row)
postgres=# select nextval('s');
 nextval
-
 166
(1 row)
postgres=# select nextval('s');
 nextval
-
 167
(1 row)
postgres=# select currval('s');
 currval
-
 167
(1 row)

Then during the catchup phase:
postgres=# select * from s;
 last_value | log_cnt | is_called
+-+---
 33 |   0 | t
(1 row)
postgres=# select * from s;
 last_value | log_cnt | is_called
+-+---
 66 |   0 | t
(1 row)

postgres=# select * from pg_subscription_rel;
 srsubid | srrelid | srsubstate | srsublsn
-+-++---
   16394 |   16390 | r  | 0/16374E8
   16394 |   16393 | s  | 0/1637700
(2 rows)

postgres=# select * from pg_subscription_rel;
 srsubid | srrelid | srsubstate | srsublsn
-+-++---
   16394 |   16390 | r  | 0/16374E8
   16394 |   16393 | r  | 0/1637700
(2 rows)

Here Sequence relid id 16393. You can see sequence state is marked as ready.

postgres=# select * from s;
 last_value | log_cnt | is_called
+-+---
 66 |   0 | t
(1 row)

Even after that, see below the value of the sequence is still not
caught up. Later, when the apply worker processes all the WAL, the
sequence state will be caught up.

postgres=# select * from s;
 last_value | log_cnt | is_called
+-+---
165 |   0 | t
(1 row)

So, there will be a window where the sequence won't be caught up for a
certain period of time and any usage of it (even after the sync is
finished) during that time could result in inconsistent behaviour.

The other question is whether it is okay to allow the sequence to go
backwards even during the initial sync phase? The reason I am asking
this question is that for the time sequence value moves backwards, one
is allowed to use it on the subscriber which will result in using
out-of-sequence values. For example, immediately, after copy_sequence
the values look like this:
postgres=# select * from s;
 last_value | log_cnt | is_called
+-+---
133 |  32 | t
(1 row)
postgres=# select nextval('s');
 nextval
-
 134
(1 row)
postgres=# select currval('s');
 currval
-
 134
(1 row)

But then during the sync phase, it can go backwards and one is allowed
to use it on the subscriber:
postgres=# select * from s;
 last_value | log_cnt | is_called
+-+---
 66 |   0 | t
(1 row)
postgres=# select nextval('s');
 nextval
-
  67
(1 row)

-- 
With Regards,
Amit Kapila.




Re: Issue in _bt_getrootheight

2023-07-24 Thread Gurjeet Singh
On Fri, Jul 21, 2023 at 10:42 AM Tom Lane  wrote:
>
> Gurjeet Singh  writes:
> > Please see attached the patch that does this. Let me know if this patch 
> > helps.
>
> I don't like this patch one bit, because it adds a lot of overhead
> (i.e., an extra index_open/close cycle for every btree index in every
> query) to support a tiny minority use-case.

I anticipated that the patch's performance impact might be a concern, but
before addressing it I wanted to see if the patch actually helped the
Index Adviser. Ahmed has confirmed that my proposed patch works for him.

I believe the additional index_open() would not affect the performance
significantly, since the very same indexes were index_open()ed just
before calling the get_relation_info_hook. All the relevant caches
would be quite fresh because of the index_open() in the same function
above. And since the locks taken on these indexes haven't been
released, we don't have to work hard to take any new locks (hence the
index_open() with NoLock flag).

> How come we don't
> already know whether the index is hypothetical at the point where
> _bt_getrootheight is called now?

Because the 'hypothetical' flag is not stored in the catalogs, and that's
okay; see below.

At that point, the only indication that an index may be a hypothetical
index is that RelationGetNumberOfBlocks() returns 0 for it, and that's
what Ahmed's proposed patch relied on. But I think extrapolating from
info->pages == 0 to "this is a hypothetical index" stretches that
assumption too far.

> Actually, looking at the existing comment at the call site:
>
> /*
>  * Allow a plugin to editorialize on the info we obtained from the
>  * catalogs.  Actions might include altering the assumed relation size,
>  * removing an index, or adding a hypothetical index to the indexlist.
>  */
> if (get_relation_info_hook)
> (*get_relation_info_hook) (root, relationObjectId, inhparent, rel);
>
> reminds me that the design intention was that hypothetical indexes
> would get added to the list by get_relation_info_hook itself.
> If that's not how the index adviser is operating, maybe we need
> to have a discussion about that.

Historically, to avoid having to hand-create the IndexOptInfo and risk
getting something wrong, the Index Adviser has used index_create() to
create a full-blown btree index (sans the actual build step, with
skip_build = true) and saved the returned OID. This ensured that all
the catalog entries were in place before it called standard_planner().
This way Postgres would build the IndexOptInfo from the entries in the
catalog, as usual. Then, inside the get_relation_info_hook() callback,
the Index Adviser identifies these virtual indexes by their OIDs, and at
that point marks them with hypothetical = true.

After planning is complete, the Index Adviser scans the plan to find
any IndexScan objects that have indexid matching the saved OIDs.

Index Adviser performs the whole thing in a subtransaction, which gets
rolled back. So the hypothetical indexes are not visible to any other
transaction, ever.

Assigning OID to a hypothetical index is necessary, and I believe
index_create() is the right way to do it. In fact, in the 9.1 cycle
there was a bug fixed (a2095f7fb5, where the hypothetical flag was
also invented), to solve precisely this problem; to allow the Index
Adviser to use OIDs to identify hypothetical indexes that were
used/chosen by the planner.

But now I believe this architecture of the Index Adviser needs to
change, primarily to alleviate the performance impact of creating
catalog entries, subtransaction overhead, and the catalog bloat caused
by index_create() (and then rolling back the subtransaction). As part
of this architecture change, the Index Adviser will have to cook up
IndexOptInfo objects and append them to the relation. And that should
line up with the design intention you mention.

But the immediate blocker is how to assign OIDs to the hypothetical
indexes so that all hypothetical indexes chosen by the planner can be
identified by the Index Adviser. I'd like the Index Adviser to work on
read-only / standby nodes as well, but that wouldn't be possible
because calling GetNewObjectId() is not allowed during recovery. I see
that HypoPG uses a neat little hack [1]. Perhaps the Index Adviser will
also have to resort to that trick.

[1]: hypo_get_min_fake_oid() finds the usable oid range below
FirstNormalObjectId
https://github.com/HypoPG/hypopg/blob/57d832ce7a2937fe7d42b113c7e95dd1f129795b/hypopg.c#L458
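
Roughly the idea, expressed in SQL (only an illustration of "find an
unused OID below FirstNormalObjectId", not HypoPG's exact query):

  -- FirstNormalObjectId is 16384; an unused OID below it cannot collide
  -- with any user relation's OID
  SELECT min(oid) - 1 AS fake_oid
    FROM pg_class
   WHERE oid < 16384;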

Best regards,
Gurjeet
http://Gurje.et




Re: cataloguing NOT NULL constraints

2023-07-24 Thread Alvaro Herrera
On 2023-Jul-24, Dean Rasheed wrote:

> Hmm, I don't particularly like that approach, because I think it will
> be difficult to cram any additional details into the table, and also I
> don't know whether having multiple not null constraints for a
> particular column can be entirely ruled out.
> 
> I may well be in the minority here, but I think the best way is to
> list them in a separate footer section, in the same way as CHECK
> constraints, allowing other constraint properties to be included. So
> it might look something like:

That's the first thing I proposed actually.  I got one vote down from
Robert Haas[1], but while the idea seems to have had support from Justin
Pryzby (in \dt++) [2] and definitely did from Peter Eisentraut [3], I do
not like it too much myself, mainly because the partition list has a
very similar treatment and I find that one an annoyance.

> and also I don't know whether having multiple not null constraints for
> a particular column can be entirely ruled out.

I had another look at the standard.  In 11.26 () it says that "If [the constraint being removed]
causes some column COL to be known not nullable and no other constraint
causes COL to be known not nullable, then the nullability characteristic
of the column descriptor of COL is changed to possibly nullable".  Which
supports the idea that there might be multiple such constraints.
(However, we could also read this as meaning that the PK could be one
such constraint while NOT NULL is another one.)

However, 11.16 ( as part of 11.12 ), says that DROP NOT NULL causes the indication of
the column as NOT NULL to be removed.  This, to me, says that if you do
have multiple such constraints, you'd better remove them all with that
command.  All in all, I lean towards allowing just one as best as we
can.

[1] 
https://postgr.es/m/ca+tgmobnoxt83y1qesbnvarhfm-flwwkduyiv84e+psaydw...@mail.gmail.com
[2] https://postgr.es/m/20230301223214.GC4268%40telsasoft.com
[3] https://postgr.es/m/1c4f3755-2d10-cae9-647f-91a9f006410e%40enterprisedb.com

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
“Cuando no hay humildad las personas se degradan” (A. Christie)




Re: logical decoding and replication of sequences, take 2

2023-07-24 Thread Bharath Rupireddy
On Mon, Jul 24, 2023 at 12:01 PM Amit Kapila  wrote:
>
> On Thu, Jul 20, 2023 at 8:22 PM Tomas Vondra
>  wrote:
> >
> > OK, I merged the changes into the patches, with some minor changes to
> > the wording etc.
> >
>
> I think we can do 0001-Make-test_decoding-ddl.out-shorter-20230720
> even without the rest of the patches. Isn't it a separate improvement?

+1. Yes, it can go separately. It would be even better if the test could
be modified to capture the toasted data into a psql variable before
inserting it into the table, and then compare it with the output of
pg_logical_slot_get_changes.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: cataloguing NOT NULL constraints

2023-07-24 Thread Dean Rasheed
Something else I noticed: the error message from ALTER TABLE ... ADD
CONSTRAINT in the case of a duplicate constraint name is not very
friendly:

ERROR:  duplicate key value violates unique constraint
"pg_constraint_conrelid_contypid_conname_index"
DETAIL:  Key (conrelid, contypid, conname)=(16540, 0, nn) already exists.

To match the error message for other constraint types, this should be:

ERROR:  constraint "nn" for relation "foo" already exists
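
For reference, a minimal way to hit this (assuming the patch's
ADD CONSTRAINT ... NOT NULL syntax, with the names from the error above):

  CREATE TABLE foo (a int);
  ALTER TABLE foo ADD CONSTRAINT nn NOT NULL a;
  ALTER TABLE foo ADD CONSTRAINT nn NOT NULL a;  -- second one fails as above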

Regards,
Dean




Re: cataloguing NOT NULL constraints

2023-07-24 Thread Dean Rasheed
On Thu, 20 Jul 2023 at 16:31, Alvaro Herrera  wrote:
>
> On 2023-Jul-13, Dean Rasheed wrote:
>
> > Something else I noticed is that the result from "ALTER TABLE ...
> > ALTER COLUMN ... DROP NOT NULL" is no longer easily predictable -- if
> > there are multiple NOT NULL constraints on the column, it just drops
> > one (chosen at random?) and leaves the others. I think that it should
> > either drop all the constraints, or throw an error. Either way, I
> > would expect that if DROP NOT NULL succeeds, the result is that the
> > column is nullable.
>
> Hmm, there shouldn't be multiple NOT NULL constraints for the same
> column; if there's one, a further SET NOT NULL should do nothing.  At
> some point the code was creating two constraints, but I realized that
> trying to support multiple constraints caused other problems, and it
> seems to serve no purpose, so I removed it.  Maybe there are ways to end
> up with multiple constraints, but at this stage I would say that those
> are bugs to be fixed, rather than something we want to keep.
>

Hmm, I'm not so sure. I think perhaps multiple NOT NULL constraints on
the same column should just be allowed, otherwise things might get
confusing. For example:

create table p1 (a int not null check (a > 0));
create table p2 (a int not null check (a > 0));
create table foo () inherits (p1, p2);

causes foo to have 2 CHECK constraints, but only 1 NOT NULL constraint:

\d foo
Table "public.foo"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   | not null |
Check constraints:
"p1_a_check" CHECK (a > 0)
"p2_a_check" CHECK (a > 0)
Inherits: p1,
  p2

select conname from pg_constraint where conrelid = 'foo'::regclass;
conname
---
 p1_a_not_null
 p2_a_check
 p1_a_check
(3 rows)

which I find a little counter-intuitive / inconsistent. If I then drop
the p1 constraints:

alter table p1 drop constraint p1_a_check;
alter table p1 drop constraint p1_a_not_null;

I end up with column "a" still being not null, and the "p1_a_not_null"
constraint still being there on foo, which seems even more
counter-intuitive, because I just dropped that constraint, and it
really should now be the "p2_a_not_null" constraint that makes "a" not
null:

\d foo
Table "public.foo"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   | not null |
Check constraints:
"p2_a_check" CHECK (a > 0)
Inherits: p1,
  p2

select conname from pg_constraint where conrelid = 'foo'::regclass;
conname
---
 p1_a_not_null
 p2_a_check
(2 rows)

I haven't thought through various other cases in any detail, but I
can't help feeling that it would be simpler and more logical /
consistent to just allow multiple NOT NULL constraints on a column,
rather than trying to enforce a rule that only one is allowed. That
way, I think it would be easier for the user to keep track of why a
column is not null.

So I'd say that ALTER TABLE ... ADD NOT NULL should always add a
constraint, even if there already is one. For example ALTER TABLE ...
ADD UNIQUE does nothing to prevent multiple unique constraints on the
same column(s). It seems pretty dumb, but maybe there is a reason to
allow it, and it doesn't feel like we should be second-guessing what
the user wants.

Regards,
Dean




Re: POC: GROUP BY optimization

2023-07-24 Thread Tomas Vondra
On 7/24/23 04:10, Andrey Lepikhov wrote:
> On 20/7/2023 18:46, Tomas Vondra wrote:
>> On 7/20/23 08:37, Andrey Lepikhov wrote:
>>> On 3/10/2022 21:56, Tom Lane wrote:
 Revert "Optimize order of GROUP BY keys".

 This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
 several follow-on fixes.
 ...
 Since we're hard up against the release deadline for v15, let's
 revert these changes for now.  We can always try again later.
>>>
>>> It may be time to restart the project. As a first step, I rebased the
>>> patch on the current master. It wasn't trivial because of some latest
>>> optimizations (a29eab, 1349d27 and 8d83a5d).
>>> Now, Let's repeat the review and rewrite the current path according to
>>> the reasons uttered in the revert commit.
>>
>> I think the fundamental task is to make the costing more reliable, and
>> the commit message 443df6e2db points out a couple challenges in this
>> area. Not sure how feasible it is to address enough of them ...
>>
>> 1) procost = 1.0 - I guess we could make this more realistic by doing
>> some microbenchmarks and tuning the costs for the most expensive cases.
>>
>> 2) estimating quicksort comparisons - This relies on ndistinct
>> estimates, and I'm not sure how much more reliable we can make those.
>> Probably not much :-( Not sure what to do about this, the only thing I
>> can think of is to track "reliability" of the estimates and only do the
>> reordering if we have high confidence in the estimates. That means we'll
>> miss some optimization opportunities, but it should limit the risk.
> I read up on the history of this thread.
> As I see, all the problems mentioned above can be beaten by excluding
> the new cost model at all. We can sort GROUP BY columns according to the
> 'ndistinct' value.
> I see the reason for introducing the cost model in [1]. The main
> supporting point here is that with this patch, people couldn't optimize
> the query by themselves, organizing the order of the columns in a more
> optimal way. But now we have at least the GUC to switch off the
> behaviour introduced here. Also, some extensions, like the well-known
> pg_hint_plan, can help with automation.

I think the main concern is that if we reorder the group keys and get it
wrong, it's a regression. If that happens often (due to costing based on
poor stats), it's a problem. Yes, there's a GUC, but that's a rather
blunt instrument, unfortunately.

> So, how about committing of the undoubted part of the feature and
> working on the cost model in a new thread?
> 

But Tom's commit message says this:

Worse, to arrive at estimates of the number of calls made to the
lower-order-column comparison functions, the code needs to make
estimates of the numbers of distinct values of multiple columns,
which are necessarily even less trustworthy than per-column stats.

so I'm not sure this really counts as "undoubted".


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: multiple membership grants and information_schema.applicable_roles

2023-07-24 Thread Pavel Luzanov

On 24.07.2023 09:42, Pavel Luzanov wrote:

> Is IS_GRANTABLE a key column for ROLE_AUTHORIZATION_DESCRIPTORS?
> If not, duplicates are not possible. Right?


The answer is: no.
Duplicate (grantee, role_name) pairs are impossible only if a key is
defined on those two columns.

If there is no such key, or the key contains another column (for example,
grantor), then the information_schema.applicable_roles view definition is
correct in this respect.
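
As a side note, a quick way to see whether the same membership has been
granted more than once (by different grantors) on a running server - just
a sketch:

  SELECT roleid::regrole  AS role,
         member::regrole  AS member,
         grantor::regrole AS grantor,
         admin_option
    FROM pg_auth_members
   ORDER BY roleid, member, grantor;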


--
Pavel Luzanov
Postgres Professional: https://postgrespro.com





Re: cataloguing NOT NULL constraints

2023-07-24 Thread Dean Rasheed
On Thu, 20 Jul 2023 at 16:31, Alvaro Herrera  wrote:
>
> On 2023-Jul-13, Dean Rasheed wrote:
>
> > I see that it's already been discussed, but I don't like the fact that
> > there is no way to get hold of the new constraint names in psql. I
> > think for the purposes of dropping named constraints, and also
> > possible future stuff like NOT VALID / DEFERRABLE constraints, having
> > some way to get their names will be important.
>
> Yeah, so there are two proposals:
>
> 1. Have \d+ replace the "not null" literal in the \d+ display with the
> constraint name; if the column is not nullable because of the primary
> key, it says "(primary key)" instead.  There's a patch for this in the
> thread somewhere.
>
> 2. Same, but use \d++ for this purpose
>
> Using ++ would be a novelty in psql, so I'm hesitant to make that an
> integral part of the current proposal.  However, to me (2) seems to most
> comfortable way forward, because while you're right that people do need
> the constraint name from time to time, this is very seldom the case, so
> polluting \d+ might inconvenience people for not much gain.  And people
> didn't like having the constraint name in \d+.
>
> Do you have an opinion on these ideas?
>

Hmm, I don't particularly like that approach, because I think it will
be difficult to cram any additional details into the table, and also I
don't know whether having multiple not null constraints for a
particular column can be entirely ruled out.

I may well be in the minority here, but I think the best way is to
list them in a separate footer section, in the same way as CHECK
constraints, allowing other constraint properties to be included. So
it might look something like:

\d foo
Table "public.foo"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   | not null |
 b  | integer |   | not null |
 c  | integer |   | not null |
 d  | integer |   | not null |
Indexes:
"foo_pkey" PRIMARY KEY, btree (a, b)
Check constraints:
"foo_a_check" CHECK (a > 0)
"foo_b_check" CHECK (b > 0) NO INHERIT NOT VALID
Not null constraints:
"foo_c_not_null" NOT NULL c
"foo_d_not_null" NOT NULL d NO INHERIT

As for CHECK constraints, the contents of each constraint line would
match the "table_constraint" SQL syntax needed to reconstruct the
constraint. Doing it this way allows for things like NOT VALID and
DEFERRABLE to be added in the future.

I think that's probably too verbose for a plain "\d", but I think it
would be OK with "\d+".

Regards,
Dean




Re: Support worker_spi to execute the function dynamically.

2023-07-24 Thread Michael Paquier
On Mon, Jul 24, 2023 at 01:50:45PM +0530, Bharath Rupireddy wrote:
> I disagree with removing SQL tests from the worker_spi module. As said
> upthread, it makes the worker_spi a fully demonstrable
> extension/module - one can just take it, start adding required
> functionality and test-cases (both SQL and TAP) for a new module.

Which is basically the same thing with TAP except that these are
grouped now?  The value of a few raw SQL queries with a
NO_INSTALLCHECK does not strike me as enough on top of having to
maintain two different sets of tests.  I'd still choose the cheap and
extensible path here.

> I agree that moving to TAP tests will reduce test run time by 1.9
> seconds, but to me personally this is not an optimization we must be
> doing at the expense of demonstrability.

In a large parallel run, the difference can be felt.
--
Michael


signature.asc
Description: PGP signature


Re: Support worker_spi to execute the function dynamically.

2023-07-24 Thread Bharath Rupireddy
On Mon, Jul 24, 2023 at 1:10 PM Michael Paquier  wrote:
>
> On Mon, Jul 24, 2023 at 08:31:01AM +0530, Bharath Rupireddy wrote:
> > I also added a note atop worker_spi.c that the module also
> > demonstrates how to write core (SQL) tests and extended (TAP) tests.
>
> In terms of runtime the benefits are here for me.  Note that with the
> first part of the test (previously in sql/), we don't lose coverage
> with the loop of the workers so I agree that only checking that these
> are launched is OK once worker_spi is in shared_preload_libraries.
> However, I think that we should make sure that they are connected to
> the correct database 'mydb'.  I have updated the test to do that.
>
> So, what do you think about the attached?

I disagree with removing SQL tests from the worker_spi module. As said
upthread, it makes the worker_spi a fully demonstrable
extension/module - one can just take it, start adding required
functionality and test-cases (both SQL and TAP) for a new module. I
agree that moving to TAP tests will reduce test run time by 1.9
seconds, but to me personally this is not an optimization we must be
doing at the expense of demonstrability.

Having said that, others might have a different opinion here.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [BUG] Crash on pgbench initialization.

2023-07-24 Thread Michael Paquier
On Sun, Jul 23, 2023 at 11:21:47PM +0300, Anton A. Melnikov wrote:
> After some research, I found this happens when LimitAdditionalPins()
> returns exactly zero.
> In the current master, this will happen e.g. if shared_buffers = 10MB and
> max_worker_processes = 40.
> Then the command "pgbench --initialize postgres" will lead to a crash.
> See the attached backtrace.
> 
> There is a fix in the attached patch. Please take a look at it.

Nice catch, issue reproduced here so I am adding an open item for now.
(I have not looked at the patch, yet.)
--
Michael


signature.asc
Description: PGP signature


Re: Support worker_spi to execute the function dynamically.

2023-07-24 Thread Michael Paquier
On Mon, Jul 24, 2023 at 08:31:01AM +0530, Bharath Rupireddy wrote:
> I also added a note atop worker_spi.c that the module also
> demonstrates how to write core (SQL) tests and extended (TAP) tests.

The value of the SQL tests comes down to the DO blocks that emulate
what the TAP tests are equally able to do.  While we already have
some places that do something similar (slot.sql or postgres_fdw.sql),
the SQL tests of worker_spi amount to a total of five queries, which
is not much for one initialized cluster:
- One pg_reload_conf() to make a loop happen in the worker.
- Two sanity checks.
- Two wait emulations.

Anyway, most people who do serious hacking on this list care about
the runtime of the tests all the time, and I am not on board with making
things slower for the sake of showing a test example here, particularly
if there are ways to make them faster (long-term, we should be able to
do the init step only once for most cases), and because we *have to*
switch to TAP to get more advanced scenarios for the custom wait events
or dynamic worker launches based on what we set in
shared_preload_libraries.  On top of that, we have other examples in
the tree that emulate waits in plain SQL tests to satisfy assumptions
with some follow-up query.

So, I don't really agree with the value gained here compared to the
execution cost of initializing two clusters for this module.  I have
taken the time to check how the runtime changes when switching to TAP
for all the scenarios discussed here, and from my laptop, I can see
that:
- HEAD takes 4.4s, for only the sql/ test.
- Your latest patch is at 5.6s.
- My version attached to this message is at 3.7s.

In terms of runtime the benefits are here for me.  Note that with the
first part of the test (previously in sql/), we don't lose coverage
with the loop of the workers so I agree that only checking that these
are launched is OK once worker_spi is in shared_preload_libraries.
However, I think that we should make sure that they are connected to
the correct database 'mydb'.  I have updated the test to do that.
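
For instance, something along these lines (a sketch of the kind of check
the TAP test runs; it assumes worker_spi.database = 'mydb'):

  SELECT backend_type, count(*)
    FROM pg_stat_activity
   WHERE datname = 'mydb'
   GROUP BY backend_type;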

So, what do you think about the attached?
--
Michael
From ea9f1b905723716037d9f44b9afbcd05075e9b27 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 24 Jul 2023 16:37:10 +0900
Subject: [PATCH v4] Add TAP tests to worker_spi module

This commit adds TAP tests to check if worker_spi starts background
workers as expected.  sql/worker_spi.sql is gone, replaced by an
equivalent in TAP.

While here, this commit changes the names of bg workers to
distinguish static and dynamic bg workers.
---
 src/test/modules/worker_spi/Makefile  |  8 +-
 src/test/modules/worker_spi/dynamic.conf  |  2 -
 .../worker_spi/expected/worker_spi.out| 50 -
 src/test/modules/worker_spi/meson.build   |  9 +--
 .../modules/worker_spi/sql/worker_spi.sql | 35 -
 .../modules/worker_spi/t/001_worker_spi.pl| 75 +++
 src/test/modules/worker_spi/worker_spi.c  |  8 +-
 7 files changed, 83 insertions(+), 104 deletions(-)
 delete mode 100644 src/test/modules/worker_spi/dynamic.conf
 delete mode 100644 src/test/modules/worker_spi/expected/worker_spi.out
 delete mode 100644 src/test/modules/worker_spi/sql/worker_spi.sql
 create mode 100644 src/test/modules/worker_spi/t/001_worker_spi.pl

diff --git a/src/test/modules/worker_spi/Makefile b/src/test/modules/worker_spi/Makefile
index cbf9b2e37f..024b34cdbb 100644
--- a/src/test/modules/worker_spi/Makefile
+++ b/src/test/modules/worker_spi/Makefile
@@ -6,13 +6,7 @@ EXTENSION = worker_spi
 DATA = worker_spi--1.0.sql
 PGFILEDESC = "worker_spi - background worker example"
 
-REGRESS = worker_spi
-
-# enable our module in shared_preload_libraries for dynamic bgworkers
-REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/worker_spi/dynamic.conf
-
-# Disable installcheck to ensure we cover dynamic bgworkers.
-NO_INSTALLCHECK = 1
+TAP_TESTS = 1
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/src/test/modules/worker_spi/dynamic.conf b/src/test/modules/worker_spi/dynamic.conf
deleted file mode 100644
index bfe015f664..00
--- a/src/test/modules/worker_spi/dynamic.conf
+++ /dev/null
@@ -1,2 +0,0 @@
-shared_preload_libraries = worker_spi
-worker_spi.database = contrib_regression
diff --git a/src/test/modules/worker_spi/expected/worker_spi.out b/src/test/modules/worker_spi/expected/worker_spi.out
deleted file mode 100644
index dc0a79bf75..00
--- a/src/test/modules/worker_spi/expected/worker_spi.out
+++ /dev/null
@@ -1,50 +0,0 @@
-CREATE EXTENSION worker_spi;
-SELECT worker_spi_launch(4) IS NOT NULL;
- ?column? 
---
- t
-(1 row)
-
--- wait until the worker completes its initialization
-DO $$
-DECLARE
-	visible bool;
-	loops int := 0;
-BEGIN
-	LOOP
-		visible := table_name IS NOT NULL
-			FROM information_schema.tables
-			WHERE table_schema = 'schema4' AND table_name = 'counted';
-		IF visible OR loops > 120 * 10 THEN EXIT; END IF;
-		PERFORM 

Re: Support worker_spi to execute the function dynamically.

2023-07-24 Thread Masahiro Ikeda

On 2023-07-24 12:01, Bharath Rupireddy wrote:

> I'm attaching the v3 patch.


I verified it works and it looks good to me.
Thanks to your work, I will be able to implement tests for
custom wait events.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: multiple membership grants and information_schema.applicable_roles

2023-07-24 Thread Pavel Luzanov

On 23.07.2023 23:03, Tom Lane wrote:

> CREATE RECURSIVE VIEW APPLICABLE_ROLES ( GRANTEE, ROLE_NAME, IS_GRANTABLE ) AS
>  ( ( SELECT GRANTEE, ROLE_NAME, IS_GRANTABLE
>  FROM DEFINITION_SCHEMA.ROLE_AUTHORIZATION_DESCRIPTORS
>  WHERE ( GRANTEE IN
>  ( CURRENT_USER, 'PUBLIC' )
>   OR
>  GRANTEE IN
>  ( SELECT ROLE_NAME
>    FROM ENABLED_ROLES ) ) )
>    UNION
>    ( SELECT RAD.GRANTEE, RAD.ROLE_NAME, RAD.IS_GRANTABLE
>  FROM DEFINITION_SCHEMA.ROLE_AUTHORIZATION_DESCRIPTORS RAD
>    JOIN
>   APPLICABLE_ROLES R
>  ON
> RAD.GRANTEE = R.ROLE_NAME ) );
>
> The UNION would remove rows only when they are duplicates across all
> three columns.


Hm, I think there is one more thing to check in the SQL standard.
Is IS_GRANTABLE a key column for ROLE_AUTHORIZATION_DESCRIPTORS?
If not, duplicates are not possible. Right?

I can't check right now, since I don't have access to the SQL standard text.


> I do see what seems like a different issue: the standard appears to expect
> that indirect role grants should also be shown (via the recursive CTE),
> and we are not doing that.


I noticed this, but the view has stayed unchanged for such a long time
that I thought it was done intentionally.

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com





Re: logical decoding and replication of sequences, take 2

2023-07-24 Thread Amit Kapila
On Thu, Jul 20, 2023 at 8:22 PM Tomas Vondra
 wrote:
>
> OK, I merged the changes into the patches, with some minor changes to
> the wording etc.
>

I think we can do 0001-Make-test_decoding-ddl.out-shorter-20230720
even without the rest of the patches. Isn't it a separate improvement?

I see that origin filtering (origin=none) doesn't work with this
patch. You can see this by using the following statements:
Node-1:
postgres=# create sequence s;
CREATE SEQUENCE
postgres=# create publication mypub for all sequences;
CREATE PUBLICATION

Node-2:
postgres=# create sequence s;
CREATE SEQUENCE
postgres=# create subscription mysub_sub connection '' publication
mypub with (origin=none);
NOTICE:  created replication slot "mysub_sub" on publisher
CREATE SUBSCRIPTION
postgres=# create publication mypub_sub for all sequences;
CREATE PUBLICATION

Node-1:
create subscription mysub_pub connection '...' publication mypub_sub
with (origin=none);
NOTICE:  created replication slot "mysub_pub" on publisher
CREATE SUBSCRIPTION

SELECT nextval('s') FROM generate_series(1,100);

After that, you can check on the subscriber that sequences values are
overridden with older values:
postgres=# select * from s;
 last_value | log_cnt | is_called
+-+---
 67 |   0 | t
(1 row)
postgres=# select * from s;
 last_value | log_cnt | is_called
+-+---
100 |   0 | t
(1 row)
postgres=# select * from s;
 last_value | log_cnt | is_called
+-+---
133 |   0 | t
(1 row)
postgres=# select * from s;
 last_value | log_cnt | is_called
+-+---
 67 |   0 | t
(1 row)

I haven't verified all the details but I think that is because we
don't set XLOG_INCLUDE_ORIGIN while logging sequence values.

-- 
With Regards,
Amit Kapila.