Re: Regression in COPY FROM caused by 9f8377f7a2

2023-09-25 Thread Laurenz Albe
On Mon, 2023-09-25 at 17:49 -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
> > On 2023-09-25 Mo 11:06, Andrew Dunstan wrote:
> > > On 2023-09-25 Mo 04:59, Laurenz Albe wrote:
> > > > CREATE TABLE boom (t character varying(5) DEFAULT 'a long string');
> 
> > Thinking about this a little more, wouldn't it be better if we checked 
> > at the time we set the default that the value is actually valid for the 
> > given column? This is only one manifestation of a problem you could run 
> > into given this table definition.
> 
> I dunno, it seems at least possible that someone would do this
> deliberately as a means of preventing the column from being defaulted.
> In any case, the current behavior has stood for a very long time and
> no one has complained that an error should be thrown sooner.

Moreover, this makes restoring a pg_dump from v15 to v16 fail, which
should never happen.  This is how I got that bug report.
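For reference, a minimal sketch of the failing scenario as I understand it,
using the table quoted above (the error text is approximate):

CREATE TABLE boom (t character varying(5) DEFAULT 'a long string');
COPY boom FROM stdin;
short
\.
-- v15: succeeds, because the default is never evaluated.
-- v16 with 9f8377f7a2: fails up front with something like
--   ERROR:  value too long for type character varying(5)
-- even though this COPY never uses the default.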

Yours,
Laurenz Albe




Re: Incorrect handling of OOM in WAL replay leading to data loss

2023-09-25 Thread Michael Paquier
On Thu, Aug 10, 2023 at 02:59:07PM +0900, Michael Paquier wrote:
> My apologies if I sounded unclear here.  It seems to me that we should
> wrap the patch on [1] first, and get it backpatched.  At least that
> makes for less conflicts when 0001 gets merged for HEAD when we are
> able to set a proper error code.  (Was looking at it, actually.)

Now that Thomas Munro has addressed the original problem so that
xl_tot_len can be trusted (bae868caf22), I am coming back to this
thread.

First, attached is a rebased set:
- 0001 to introduce the new error infra for xlogreader.c with an error
code, so that callers can tell the difference between an OOM and an
invalid record.
- 0002 to tweak the startup process.  Once again, I've taken the
approach to make the startup process behave like a standby on crash
recovery: each time that an OOM is found, we loop and retry.
- 0003 to emulate an OOM failure, that can be used with the script
attached to see that we don't stop recovery too early.

>>> I guess that it depends on how much responsiveness one may want.
>>> Forcing a failure on OOM is at least something that users would be
>>> immediately able to act on when we don't run a standby but just
>>> recover from a crash, while a standby would do what it is designed to
>>> do, aka continue to replay what it can see.  One issue with the
>>> wait-and-continue is that a standby may loop continuously on OOM,
>>> which could be also bad if there's a replication slot retaining WAL on
>>> the primary.  Perhaps that's just OK to keep doing that for a
>>> standby.  At least this makes the discussion easier for the sake of
>>> this thread: just consider the case of crash recovery when we don't
>>> have a standby.
>> 
>> Yeah, I'm with you on focusing on crash recovery cases; that's what I
>> intended. Sorry for any confusion.
> 
> Okay, so we're on the same page here, keeping standbys as they are and
> do something for the crash recovery case.

For the crash recovery case, one argument that stood out in my mind is
that causing a hard failure has the disadvantage of forcing users to
redo WAL replay from the last redo position, which may be far away even
if the checkpointer now runs during crash recovery.  What I am
proposing on this thread has the merit of avoiding that.  Anyway, let's
discuss more before settling this point for the crash recovery case.

By the way, anything that I am proposing here cannot be backpatched
because of the infrastructure changes required in xlogreader.c, so I am
going to create a second thread with something that could be
backpatched (yeah, likely FATALs on OOM to stop recovery from doing
something bad).
--
Michael
From 3d7bb24fc2f9f070273b63208819c4e54e428d18 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 26 Sep 2023 15:40:05 +0900
Subject: [PATCH v4 1/3] Add infrastructure to report error codes in WAL reader

This commit moves the error state coming from WAL readers into a new
structure that includes the existing pointer to the error message
buffer, and also gains an error code that is fed back to the callers of
the following routines:
XLogPrefetcherReadRecord()
XLogReadRecord()
XLogNextRecord()
DecodeXLogRecord()

This will help in improving the decisions to take during recovery
depending on the failure mode reported.
---
 src/include/access/xlogprefetcher.h   |   2 +-
 src/include/access/xlogreader.h   |  33 +++-
 src/backend/access/transam/twophase.c |   8 +-
 src/backend/access/transam/xlog.c |   6 +-
 src/backend/access/transam/xlogprefetcher.c   |   4 +-
 src/backend/access/transam/xlogreader.c   | 170 --
 src/backend/access/transam/xlogrecovery.c |  14 +-
 src/backend/access/transam/xlogutils.c|   2 +-
 src/backend/replication/logical/logical.c |   9 +-
 .../replication/logical/logicalfuncs.c|   9 +-
 src/backend/replication/slotfuncs.c   |   8 +-
 src/backend/replication/walsender.c   |   8 +-
 src/bin/pg_rewind/parsexlog.c |  24 +--
 src/bin/pg_waldump/pg_waldump.c   |  10 +-
 contrib/pg_walinspect/pg_walinspect.c |  11 +-
 src/tools/pgindent/typedefs.list  |   2 +
 16 files changed, 201 insertions(+), 119 deletions(-)

diff --git a/src/include/access/xlogprefetcher.h b/src/include/access/xlogprefetcher.h
index 7dd7f20ad0..5563ad1a67 100644
--- a/src/include/access/xlogprefetcher.h
+++ b/src/include/access/xlogprefetcher.h
@@ -48,7 +48,7 @@ extern void XLogPrefetcherBeginRead(XLogPrefetcher *prefetcher,
 	XLogRecPtr recPtr);
 
 extern XLogRecord *XLogPrefetcherReadRecord(XLogPrefetcher *prefetcher,
-			char **errmsg);
+			XLogReaderError *errordata);
 
 extern void XLogPrefetcherComputeStats(XLogPrefetcher *prefetcher);
 
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index da32c7db77..06664dc6fb 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogre

Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set

2023-09-25 Thread Daniel Gustafsson
> On 26 Sep 2023, at 00:20, Nathan Bossart  wrote:
> 
> On Thu, Sep 21, 2023 at 11:18:00AM +0900, bt23nguyent wrote:
>> -basic_archive_configured(ArchiveModuleState *state)
>> +basic_archive_configured(ArchiveModuleState *state, char **logdetail)
> 
> Could we do something more like GUC_check_errdetail() instead to maintain
> backward compatibility with v16?

We'd still need something exported to call into which isn't in 16, so it
wouldn't be more than optically backwards compatible since a module written for
17 won't compile for 16, or am I missing something?

--
Daniel Gustafsson





Re: [PGdocs] fix description for handling pf non-ASCII characters

2023-09-25 Thread Karl O. Pinc
Hello Hayato and Jian,

On Tue, 4 Jul 2023 01:30:56 +
"Hayato Kuroda (Fujitsu)"  wrote:

> Dear Jian,

> > looks fine. Do you need to add to commitfest?  
> 
> Thank you for your confirmation!  I registered it to the following:
> 
> https://commitfest.postgresql.org/44/4437/

The way the Postgres commitfest process works is that
someone has to update the page to mark "reviewed" and the
reviewer has to use the commitfest website to pass
the patches to a committer.

I see a few problems with the English and style of the patches
and am commenting below and have signed up as a reviewer.  At
commitfest.postgresql.org I have marked the thread
as needing author attention.  Hayato, you will need
to mark the thread as needing review when you have
replied to this message.

Jian, you might want to sign on as a reviewer as well.
It can be nice to have that record of your participation.

Now, to reviewing the patch:

First, it is now best practice in the PG docs to
put a line break at the end of each sentence.
At least for the sentences on the lines you change.
(No need to update the whole document!)  Please
do this in the next version of your patch.  I don't
know if this is a requirement for acceptance by
a committer, but it won't hurt.

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e700782d3c..a4ce99ba4d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7040,9 +7040,8 @@ local0.*/var/log/postgresql
 The name will be displayed in the
pg_stat_activity view and included in CSV log
entries.  It can also be included in regular log entries via the  parameter.
-Only printable ASCII characters may be used in the
-application_name value. Other characters will be
-replaced with question marks (?).
+Non-ASCII characters used in the application_name
+will be replaced with hexadecimal strings.

   
  

Don't use the future tense to describe the system's behavior.  Instead
of "will be" write "are".  (Yes, this change would be an improvement
on the original text.  We should fix it while we're working on it
and our attention is focused.)

It is more accurate, if I understand the issue, to say that characters
are replaced with hexadecimal _representations_ of the input bytes.
Finally, it would be good to know what representation we're talking
about.  Perhaps just give the \xhh example and say: the Postgres
C-style escaped hexadecimal byte value.  And hyperlink to
https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-ESCAPE

(The docbook would be, depending on text you want to link:

C-style escaped hexadecimal
byte value.

I think.  You link to id="someidvalue" attribute values.)


@@ -8037,10 +8036,9 @@ COPY postgres_log FROM
'/full/path/to/logfile.csv' WITH csv; 
 The name can be any string of less
 than NAMEDATALEN characters (64 characters in
a standard
-build). Only printable ASCII characters may be used in the
-cluster_name value. Other characters will be
-replaced with question marks (?).  No name is shown
-if this parameter is set to the empty string '' (which is
+build). Non-ASCII characters used in the cluster_name
+will be replaced with hexadecimal strings. No name is shown if this
+parameter is set to the empty string '' (which is
 the default). This parameter can only be set at server start.

   

Same review as for the first patch hunk.

diff --git a/doc/src/sgml/postgres-fdw.sgml
b/doc/src/sgml/postgres-fdw.sgml index 5062d712e7..98785e87ea 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -1067,9 +1067,8 @@ postgres=# SELECT postgres_fdw_disconnect_all();
   of any length and contain even non-ASCII characters.  However
when it's passed to and used as application_name
   in a foreign server, note that it will be truncated to less than
-  NAMEDATALEN characters and anything other than
-  printable ASCII characters will be replaced with question
-  marks (?).
+  NAMEDATALEN characters and non-ASCII characters will be
+  replaced with hexadecimal strings.
   See  for details.
  
 
Same review as for the first patch hunk.

Since both of you have looked and confirmed that the actual
behavior matches the new documentation, I have not done so
myself.

But, have either of you checked that we're really talking about
replacing everything outside the 7-bit ASCII encodings?  
My reading of the commit referenced in the first email of this
thread says that it's everything outside of the _printable_
ASCII encodings, ASCII values outside of the range 32 to 127,
inclusive.  

Please check.  The docs should probably say "printable ASCII",
or "non-printable ASCII", depending.  I think the meaning
of "printable ASCII" is widely enough known to be 32-127.
So "printable" is good enough, right?

Regards,

Karl 
Free Software:  "You don't pay back, you pay fo

Re: POC: GROUP BY optimization

2023-09-25 Thread Andrey Lepikhov

On 20/7/2023 18:46, Tomas Vondra wrote:

2) estimating quicksort comparisons - This relies on ndistinct
estimates, and I'm not sure how much more reliable we can make those.
Probably not much :-( Not sure what to do about this, the only thing I
can think of is to track "reliability" of the estimates and only do the
reordering if we have high confidence in the estimates. That means we'll
miss some optimization opportunities, but it should limit the risk.

Regarding this issue, I see two options:
1. Go through the grouping column list and find the most reliable one.
If we have a unique column, or a column whose ndistinct statistic is
significantly higher than that of the other grouping columns, we can place
this column first in the grouping.  That should guarantee the reliability
of such a decision, shouldn't it?
2. If we have extended statistics on distinct values, and these
statistics cover some set of leading columns in the grouping list, we can
optimize those positions (see the sketch below).  That also looks reliable.
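
To illustrate option 2, here is a hedged SQL sketch (table and column names
are made up) of the kind of extended statistics that would make the leading
grouping columns reliable:

CREATE TABLE sales (region text, product text, sold_on date, amount numeric);
CREATE STATISTICS sales_nd (ndistinct) ON region, product FROM sales;
ANALYZE sales;
-- With ndistinct statistics covering (region, product), the planner could
-- reorder the keys of the following grouping with more confidence:
SELECT region, product, sum(amount) FROM sales GROUP BY region, product;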


Any thoughts?

--
regards,
Andrey Lepikhov
Postgres Professional





RE: pg_upgrade and logical replication

2023-09-25 Thread Hayato Kuroda (Fujitsu)
Dear Michael,

> > 1. Your approach must be back-patched to older versions which support 
> > logical
> >replication feature, but the oldest one (PG10) has already been
> unsupported.
> >We should not modify such a branch.
> 
> This suggestion would be only for HEAD as it changes the behavior of -b.
> 
> > 2. Also, "max_logical_replication_workers = 0" approach would be consistent
> >with what we are doing now and for upgrade of publisher patch.
> >Please see the previous discussion [1].
> 
> Yeah, you're right.  Consistency would be good across the board, and
> we'd need to take care of the old clusters as well, so the GUC
> enforcement would be needed as well.  It does not strike me that this
> extra IsBinaryUpgrade would hurt anyway?  Forcing the hand of the
> backend has the merit of allowing the removal of the tweak with
> max_logical_replication_workers at some point in the future.

Hmm, our initial motivation is to suppress registering the launcher, and adding
a GUC setting is sufficient for that.  Indeed, registering a launcher may be
harmful, but that does not seem to be the goal of this thread (changing the -b
workflow in HEAD alone is not sufficient for the issue).  I'm not sure it
should be included in the patch sets here.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: Avoid a possible out-of-bounds access (src/backend/optimizer/util/relnode.c)

2023-09-25 Thread Ashutosh Bapat
On Mon, Sep 25, 2023 at 7:14 PM Ranier Vilela  wrote:
>
> Em seg., 25 de set. de 2023 às 08:23, Ashutosh Bapat 
>  escreveu:
>>
>> On Sat, Sep 23, 2023 at 7:29 PM Ranier Vilela  wrote:
>> >
>> > Hi,
>> >
>> > Per Coverity.
>> > CID 1518088 (#2 of 2): Improper use of negative value (NEGATIVE_RETURNS)
>> >
>> > The function bms_singleton_member can returns a negative number.
>> >
>> > /*
>> > * Get a child rel for rel2 with the relids.  See above comments.
>> > */
>> > if (rel2_is_simple)
>> > {
>> > int varno = bms_singleton_member(child_relids2);
>> >
>> > child_rel2 = find_base_rel(root, varno);
>> > }
>> >
>> > It turns out that in the get_matching_part_pairs function (joinrels.c), 
>> > the return of bms_singleton_member is passed to the find_base_rel 
>> > function, which cannot receive a negative value.
>> >
>> > find_base_rel is protected by an Assertion, which effectively indicates 
>> > that the error does not occur in tests and in DEBUG mode.
>> >
>> > But this does not change the fact that bms_singleton_member can return a 
>> > negative value, which may occur on some production servers.
>> >
>> > Fix by changing the Assertion into a real test, to protect the 
>> > simple_rel_array array.
>>
>> Do you have a scenario where bms_singleton_member() actually returns a
>> -ve number OR it's just a possibility.
>
> Just a possibility.
>
>>
>> bms_make_singleton() has an
>> assertion at the end: Assert(result >= 0);
>> bms_make_singleton(), bms_add_member() explicitly forbids negative
>> values. It looks like we have covered all the places which can add a
>> negative value to a bitmapset. May be we are missing a place or two.
>> It will be good to investigate it.
>
> I try to do the research, mostly, with runtime compilation.
> As previously stated, the error does not appear in the tests.
> That said, although Assert protects in most cases, that doesn't mean it can't 
> occur in a query, running on a server in production mode.
>
> Now thinking about what you said about Assertion in bms_make_singleton.
> I think it's nonsense, no?

Sorry, I didn't write it correctly.  bms_make_singleton() doesn't
accept a negative integer, and bms_get_singleton_member() and
bms_singleton_member() have assertions at the end.  Since there is no
possibility of a negative integer making its way into a bitmapset,
having the two functions Assert instead of elog is better.  Asserts are
cheaper in production.

> Why design a function that in DEBUG mode prohibits negative returns, but in 
> runtime mode allows it?
> After all, why allow a negative return, if for all practical purposes this is 
> prohibited?

You haven't given any proof that a negative value can actually be
returned.  We do not allow a negative value to be returned at all.

> Regarding the find_base_rel function, it is nonsense to protect the array 
> with Assertion.
> After all, we have already protected the upper limit with a real test, why 
> not also protect the lower limit.
> The additional testing is cheap and makes perfect sense, making the function 
> more robust in production mode.
> As an added bonus, modern compilers will probably be able to remove the 
> additional test if it deems it not necessary.
> Furthermore, all protections that were added to protect find_base_real calls 
> can eventually be removed,
> since find_base_real will accept parameters with negative values.

However, I agree that changing find_base_rel() the way you have done
in your patch is fine and mildly future-proof. +1 to that idea
irrespective of what bitmapset functions do.

-- 
Best Wishes,
Ashutosh Bapat




Re: pg_upgrade and logical replication

2023-09-25 Thread vignesh C
On Wed, 20 Sept 2023 at 16:54, Amit Kapila  wrote:
>
> On Fri, Sep 15, 2023 at 3:08 PM vignesh C  wrote:
> >
> > The attached v8 version patch has the changes for the same.
> >
>
> Is the check to ensure remote_lsn is valid correct in function
> check_for_subscription_state()? How about the case where the apply
> worker didn't receive any change but just marked the relation as
> 'ready'?

I agree that remote_lsn will not be valid in the case when all the
tables are in ready state and there are no changes to be sent by the
walsender to the worker. I was not sure if this check is required in
this case in the check_for_subscription_state function. I was thinking
that this check could be removed.
I'm also looking at why the tables should only be in the ready state (the
check that is in the same function), and whether we can support upgrades
when the tables are in the syncdone state.  I will post my analysis once
I have finished checking.

Regards,
Vignesh




Re: Remove MSVC scripts from the tree

2023-09-25 Thread NINGWEI CHEN
On Fri, 22 Sep 2023 10:12:29 +0900
Michael Paquier  wrote:

> As of today, I can see that the only buildfarm members relying on
> these scripts are bowerbird and hamerkop, so these two would fail if
> the patch attached were to be applied today.  I am adding Andrew
> D. and the maintainers of hamerkop (SRA-OSS) in CC for comments.

hamerkop is not yet prepared for Meson builds, but we plan to work on this
support soon.
If we go with Meson builds exclusively right now, we will have to stop
building master/HEAD for a while.

Best Regards.
-- 
SRA OSS LLC
Chen Ningwei





RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-25 Thread Hayato Kuroda (Fujitsu)
Dear Bharath,

Again, thank you for reviewing! PSA a new version.

> 
> Here are some more comments/thoughts on the v44 patch:
> 
> 1.
> +# pg_upgrade will fail because the slot still has unconsumed WAL records
> +command_fails(
> +[
> 
> Add a test case to hit fprintf(script, "The slot \"%s\" is invalid\n",
> file as well?

Added.  The test had not been added before because 002_pg_upgrade.pl did not
do similar checks, but it is worth verifying.  One difficulty was that the
output directory has a millisecond timestamp, so the absolute path could not
be predicted; File::Find::find was therefore used to detect the file.

> 2.
> +'run of pg_upgrade where the new cluster has insufficient
> max_replication_slots');
> +ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
> +"pg_upgrade_output.d/ not removed after pg_upgrade failure");
> 
> +'run of pg_upgrade where the new cluster has the wrong wal_level');
> +ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
> +"pg_upgrade_output.d/ not removed after pg_upgrade failure");
> 
> +'run of pg_upgrade of old cluster with idle replication slots');
> +ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
> +"pg_upgrade_output.d/ not removed after pg_upgrade failure");
> 
> How do these tests recognize the failures are the intended ones? I
> mean, for instance when pg_upgrade fails for unused replication
> slots/unconsumed WAL records, then just looking at the presence of
> pg_upgrade_output.d might not be sufficient, no? Using
> command_fails_like instead of command_fails and looking at the
> contents of invalid_logical_relication_slots.txt might help make these
> tests more focused.

Yeah, currently the output was not checked.  I checked and found that pg_upgrade
outputs all messages (including error messages) to stdout, so
command_fails_like() could not be used.  Therefore, command_checks_all() was
used instead.

> 3.
> +pg_log(PG_REPORT, "fatal");
> +pg_fatal("Your installation contains invalid logical
> replication slots.\n"
> + "These slots can't be copied, so this cluster cannot
> be upgraded.\n"
> + "Consider removing such slots or consuming the
> pending WAL if any,\n"
> + "and then restart the upgrade.\n"
> + "A list of invalid logical replication slots is in
> the file:\n"
> + "%s", output_path);
> 
> It's not just the invalid logical replication slots, but also the
> slots with unconsumed WALs which aren't invalid and can be upgraded if
> ensured the WAL is consumed. So, a better wording would be:
> pg_fatal("Your installation contains logical replication slots
> that cannot be upgraded.\n"
>  "List of all such logical replication slots is in the 
> file:\n"
>  "These slots can't be copied, so this cluster cannot
> be upgraded.\n"
>  "Consider removing invalid slots and/or consuming the
> pending WAL if any,\n"
>  "and then restart the upgrade.\n"
>  "%s", output_path);

Fixed.
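
For reference, slots with unconsumed WAL can be spotted manually on the old
cluster with a query like the one below.  This is only a hedged illustration
of the condition the message describes, not what pg_upgrade itself runs:

SELECT slot_name, confirmed_flush_lsn,
       pg_current_wal_insert_lsn() AS current_lsn, wal_status
FROM pg_replication_slots
WHERE slot_type = 'logical';
-- A slot whose confirmed_flush_lsn lags current_lsn may still have WAL to
-- consume; wal_status reports slots whose WAL has been removed ('lost').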

Also, I ran pgperltidy. Some formattings were changed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v45-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v45-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


Re: Doc: vcregress .bat commands list lacks "taptest"

2023-09-25 Thread Yugo NAGATA
On Tue, 26 Sep 2023 08:18:01 +0900
Michael Paquier  wrote:

> On Mon, Sep 25, 2023 at 11:07:57AM -0400, Andrew Dunstan wrote:
> > +1
> 
> Thanks, applied and backpatched then.

Thank you!

Regards,
Yugo Nagata

> --
> Michael


-- 
Yugo NAGATA 




Re: Remove distprep

2023-09-25 Thread Michael Paquier
On Wed, Aug 23, 2023 at 12:46:45PM +0200, Peter Eisentraut wrote:
> Apparently, the headerscheck and cpluspluscheck targets still didn't work
> right in the Cirrus jobs.  Here is an updated patch to address that.  This
> is also rebased over some recent changes that affected this patch (generated
> wait events stuff).

-gettext-files: distprep
+gettext-files: postgres

This bit in src/backend/nls.mk does not seem right to me.  The
following sequence works on HEAD, but breaks with the patch because
the files that should be automatically generated from the perl scripts
aren't anymore:
$ ./configure ...
$ make -C src/backend/ gettext-files
[...]
In file included from ../../../../src/include/postgres.h:46,
from brin.c:16:
../../../../src/include/utils/elog.h:79:10: fatal error:
utils/errcodes.h: No such file or directory
79 | #include "utils/errcodes.h" 

# Technically, this should depend on Makefile.global, but then
# version.sgml would need to be rebuilt after every configure run,
# even in distribution tarballs.  So this is cheating a bit, but it

There is this comment in doc/src/sgml/Makefile.  Does it still apply?

   Note that building PostgreSQL from the source
   repository requires reasonably up-to-date versions of 
bison,
   flex, and Perl.
   These tools are not needed to build from a distribution tarball, because
   the files generated with these tools are included in the tarball.
   Other tool requirements

This paragraph exists in sourcerepo.sgml, but it should be updated, I
guess, because now these three binaries would be required when
building from a tarball.

# specparse.c and specscanner.c are in the distribution tarball,
# so do not clean them here

This comment in src/test/isolation/Makefile should be removed.
--
Michael


signature.asc
Description: PGP signature


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-25 Thread Zhijie Hou (Fujitsu)
On Monday, September 25, 2023 7:01 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> To: 'Bharath Rupireddy' 
> Cc: Amit Kapila ; Dilip Kumar
> >
> > 5. In continuation to the above comment:
> >
> > Why can't this logic be something like - if there's any WAL record
> > seen after a slot's confirmed flush LSN is of type generated by WAL
> > resource manager having the rm_decode function defined, then the slot
> > can't be upgraded.
> 
> Thank you for giving new approach! We have never seen the approach before,
> but at least XLOG and HEAP2 rmgr have a decode function. So that
> XLOG_CHECKPOINT_SHUTDOWN, XLOG_CHECKPOINT_ONLINE, and
> XLOG_HEAP2_PRUNE cannot be ignored the approach, seems not appropriate.
> If you have another approach, I'm very happy if you post.

Another idea around decoding is to check if there is any decoding output for
the WAL records.

Like we can create a temp slot and use test_decoding to decode the WAL from the
confirmed_flush_lsn among existing logical replication slots. And if there is
any output from the output plugin, then we consider WAL has not been consumed
yet.

But this means we need to ignore some of the WALs like XLOG_XACT_INVALIDATIONS
which won't be decoded into the output. Also, this approach could be costly as
it needs to do the extra decoding and output, and we need to assume that "all 
the
WAL records including custom records will be decoded and output if they need to
be consumed" .

So it may not be better, but just share it for reference.

Best Regards,
Hou zj


Re: Various small doc improvements; plpgsql, schemas, permissions, oidvector

2023-09-25 Thread Karl O. Pinc
Forgot to attach.  Sorry.

On Mon, 25 Sep 2023 23:30:38 -0500
"Karl O. Pinc"  wrote:

> Version 3.
> 
> Re-do title, which is all of patch v3-003.
> 
> On Mon, 25 Sep 2023 17:55:59 -0500
> "Karl O. Pinc"  wrote:
> 
> > On Mon, 25 Sep 2023 14:14:37 +0200
> > Daniel Gustafsson  wrote:  
> 
> > > Once done you can do "git format-patch origin/master -v 1" which
> > > will generate a set of n patches named v1-0001 through v1-000n.  
> 
> > Done.  11 patches attached.  Thanks for the help.  
> 
> > I am not particularly confident in the top-line commit
> > descriptions.  
> 
> > The bulk of the commit descriptions are very wordy  
> 
> > Listing all the attachments here for future discussion:  
> 
> v3-0001-Change-section-heading-to-better-reflect-saving-a.patch
> v3-0002-Change-section-heading-to-better-describe-referen.patch
> v3-0003-Better-section-heading-for-plpgsql-exception-trap.patch
> v3-0004-Describe-how-to-raise-an-exception-in-the-excepti.patch
> v3-0005-Improve-sentences-in-overview-of-system-configura.patch
> v3-0006-Provide-examples-of-listing-all-settings.patch
> v3-0007-Cleanup-summary-of-role-powers.patch
> v3-0008-Explain-the-difference-between-role-attributes-an.patch
> v3-0009-Document-the-oidvector-type.patch
> v3-0010-Improve-sentences-about-the-significance-of-the-s.patch
> v3-0011-Add-a-sub-section-to-describe-schema-resolution.patch

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein
>From 122665c4155698abe88e2bd17639a991791b94e3 Mon Sep 17 00:00:00 2001
From: "Karl O. Pinc" 
Date: Sun, 24 Sep 2023 15:49:30 -0500
Subject: [PATCH v3 01/11] Change section heading to better reflect saving a
 result in variable(s)

The current section title of "Executing a Command with a Single-Row
Result" does not reflect what the section is really about.  Other
sections make clear how to _execute_ commands, single-row result or not.
What this section is about is how to _save_ a single row of results into
variable(s).

It would be nice to talk about saving results into variables in the
section heading but I couldn't come up with anything pithy.  "Saving a
Single-Row of a Command's Result" seems good enough, especially since
there's few other places to save results other than in variables.
---
 doc/src/sgml/plpgsql.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index f55e901c7e..8747e84245 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -1126,7 +1126,7 @@ PERFORM create_mv('cs_session_page_requests_mv', my_query);

 

-Executing a Command with a Single-Row Result
+Saving a Single-Row of a Command's Result
 
 
  SELECT INTO
-- 
2.30.2

>From 4de4a31d41124dfa793cc5cce0516673811ea414 Mon Sep 17 00:00:00 2001
From: "Karl O. Pinc" 
Date: Sun, 24 Sep 2023 15:52:21 -0500
Subject: [PATCH v3 02/11] Change section heading to better describe reference
 of existing types

The section heading of "Copying Types" does not reflect what the
section is about.  It is not about making copies of data types but
about using the data type of existing columns (or rows) in new type
declarations without having to know what the existing type is.

"Re-Using the Type of Columns and Variables" seems adequate.  Getting
something in there about declartions seems too wordy.  I thought
perhaps "Referencing" instead of "Re-Using", but "referencing" isn't
perfect and "re-using" is generic enough, shorter, and simpler to read.
---
 doc/src/sgml/plpgsql.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 8747e84245..874578265e 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -672,7 +672,7 @@ DECLARE

 
   
-   Copying Types
+   Re-Using the Type of Columns and Variables
 
 
 variable%TYPE
-- 
2.30.2

>From 80c2b8ef7ad6e610f5c7bdc61b827983a87110e2 Mon Sep 17 00:00:00 2001
From: "Karl O. Pinc" 
Date: Sun, 24 Sep 2023 16:03:29 -0500
Subject: [PATCH v3 03/11] Better section heading for plpgsql exception
 trapping

The docs seem to use "error" and "exception" interchangeably, perhaps
50% each.  But they never say that the are the same thing, and in the
larger world they are not.  Errors tend to be something that drop on
the floor and usually halt execution whereas exceptions can be trapped
and give the programmer more control over the flow of the program.
(Although, to be fair, exceptions are a subset of errors.)  "Trapping
Errors" is not a good section title for these reasons, and because
when it comes to programmatically raising errors in Pl/PgSQL you
don't, you raise exceptions.  The current section heading does not
stand out in a scan of the table of contents when you're looking for
exception handling, IMHO.

"Error Processing and Trapping Exceptions" is a little long but it
does accurately reflect that the section is about how Pl/PgSQL behaves
under er

Re: Various small doc improvements; plpgsql, schemas, permissions, oidvector

2023-09-25 Thread Karl O. Pinc
Version 3.

Re-do title, which is all of patch v3-003.

On Mon, 25 Sep 2023 17:55:59 -0500
"Karl O. Pinc"  wrote:

> On Mon, 25 Sep 2023 14:14:37 +0200
> Daniel Gustafsson  wrote:

> > Once done you can do "git format-patch origin/master -v 1" which
> > will generate a set of n patches named v1-0001 through v1-000n.

> Done.  11 patches attached.  Thanks for the help.

> I am not particularly confident in the top-line commit
> descriptions.

> The bulk of the commit descriptions are very wordy

> Listing all the attachments here for future discussion:

v3-0001-Change-section-heading-to-better-reflect-saving-a.patch
v3-0002-Change-section-heading-to-better-describe-referen.patch
v3-0003-Better-section-heading-for-plpgsql-exception-trap.patch
v3-0004-Describe-how-to-raise-an-exception-in-the-excepti.patch
v3-0005-Improve-sentences-in-overview-of-system-configura.patch
v3-0006-Provide-examples-of-listing-all-settings.patch
v3-0007-Cleanup-summary-of-role-powers.patch
v3-0008-Explain-the-difference-between-role-attributes-an.patch
v3-0009-Document-the-oidvector-type.patch
v3-0010-Improve-sentences-about-the-significance-of-the-s.patch
v3-0011-Add-a-sub-section-to-describe-schema-resolution.patch

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




How are jsonb_path_query SRFs $.datetime() defined ?

2023-09-25 Thread Markur Sens
Hi,

In the regression tests I see tests like the one below [1], but generally
jsonb_path_query is an SRF, and the definition in pg_proc.dat has it as such
[0].  Looking at the implementation, it doesn't look like it calls
jsonb_path_query_first behind the scenes (say, if it detects it's being called
in a SELECT list), which would explain it.

How would I re-create this same behaviour with, say, another $.func() in jsonb,
or with any SRF in general?

{ oid => '4006', descr => 'jsonpath query',
  proname => 'jsonb_path_query', prorows => '1000', proretset => 't',
  prorettype => 'jsonb', proargtypes => 'jsonb jsonpath jsonb bool',
  prosrc => 'jsonb_path_query' },


select jsonb_path_query('"10-03-2017"', '$.datetime("dd-mm-")');
 jsonb_path_query 
--
 "2017-03-10"
(1 row)
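
For comparison, a hedged sketch (not from the original report) contrasting the
set-returning function, which can produce multiple rows, with
jsonb_path_query_first, which returns only the first item of the jsonpath
result:

select jsonb_path_query('[1, 2, 3]', '$[*] ? (@ > 1)');
 jsonb_path_query
------------------
 2
 3
(2 rows)

select jsonb_path_query_first('"10-03-2017"', '$.datetime("dd-mm-yyyy")');
 jsonb_path_query_first
------------------------
 "2017-03-10"
(1 row)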

Re: [HACKERS] Should logtape.c blocks be of type long?

2023-09-25 Thread Michael Paquier
On Sun, Sep 24, 2023 at 10:42:49AM +0900, Michael Paquier wrote:
> Indeed, or Windows decides that making long 8-byte is wiser, but I
> doubt that's ever going to happen on backward-compatibility ground.

While looking more at that, I've noticed that I missed BufFileAppend()
and BufFileSeekBlock(), that themselves rely on long.  The other code
paths calling these two routines rely on BlockNumber (aka uint32), so
that seems to be the bottom of it.

For now, I have registered this patch to the next CF:
https://commitfest.postgresql.org/45/4589/

Comments are welcome.
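
As a quick sanity check of the 16TB figure mentioned in the commit message
below (assuming the default 8kB block size and a 4-byte signed long, hence at
most 2^31 block numbers):

SELECT pg_size_pretty(2::numeric ^ 31 * 8192);
 pg_size_pretty
----------------
 16 TB
(1 row)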
--
Michael
From 331b2433bcf11f4e028002f52a64f6af91266b88 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 26 Sep 2023 13:07:56 +0900
Subject: [PATCH] Change logtape/tuplestore code to use int64 for block numbers

The code previously relied on long to track block numbers, which would
be 4 bytes in all Windows builds or any 32-bit builds.  This limited the
code to be able to handle up to 16TB of data with the default block
size, like CLUSTER.

Discussion: https://postgr.es/m/CAH2-WznCscXnWmnj=STC0aSa7QG+BRedDnZsP=jo_r9guzv...@mail.gmail.com
---
 src/include/storage/buffile.h  |   4 +-
 src/include/utils/logtape.h|   8 +-
 src/backend/storage/file/buffile.c |  10 +--
 src/backend/utils/sort/logtape.c   | 126 ++---
 src/backend/utils/sort/tuplesort.c |  12 +--
 5 files changed, 80 insertions(+), 80 deletions(-)

diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 6583766719..cbffc9c77e 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -44,9 +44,9 @@ extern size_t BufFileReadMaybeEOF(BufFile *file, void *ptr, size_t size, bool eo
 extern void BufFileWrite(BufFile *file, const void *ptr, size_t size);
 extern int	BufFileSeek(BufFile *file, int fileno, off_t offset, int whence);
 extern void BufFileTell(BufFile *file, int *fileno, off_t *offset);
-extern int	BufFileSeekBlock(BufFile *file, long blknum);
+extern int	BufFileSeekBlock(BufFile *file, int64 blknum);
 extern int64 BufFileSize(BufFile *file);
-extern long BufFileAppend(BufFile *target, BufFile *source);
+extern int64 BufFileAppend(BufFile *target, BufFile *source);
 
 extern BufFile *BufFileCreateFileSet(FileSet *fileset, const char *name);
 extern void BufFileExportFileSet(BufFile *file);
diff --git a/src/include/utils/logtape.h b/src/include/utils/logtape.h
index 5420a24ac9..2de7add81c 100644
--- a/src/include/utils/logtape.h
+++ b/src/include/utils/logtape.h
@@ -51,7 +51,7 @@ typedef struct TapeShare
 	 * Currently, all the leader process needs is the location of the
 	 * materialized tape's first block.
 	 */
-	long		firstblocknumber;
+	int64		firstblocknumber;
 } TapeShare;
 
 /*
@@ -70,8 +70,8 @@ extern void LogicalTapeWrite(LogicalTape *lt, const void *ptr, size_t size);
 extern void LogicalTapeRewindForRead(LogicalTape *lt, size_t buffer_size);
 extern void LogicalTapeFreeze(LogicalTape *lt, TapeShare *share);
 extern size_t LogicalTapeBackspace(LogicalTape *lt, size_t size);
-extern void LogicalTapeSeek(LogicalTape *lt, long blocknum, int offset);
-extern void LogicalTapeTell(LogicalTape *lt, long *blocknum, int *offset);
-extern long LogicalTapeSetBlocks(LogicalTapeSet *lts);
+extern void LogicalTapeSeek(LogicalTape *lt, int64 blocknum, int offset);
+extern void LogicalTapeTell(LogicalTape *lt, int64 *blocknum, int *offset);
+extern int64 LogicalTapeSetBlocks(LogicalTapeSet *lts);
 
 #endif			/* LOGTAPE_H */
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 41ab64100e..919e1106d6 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -841,14 +841,14 @@ BufFileTell(BufFile *file, int *fileno, off_t *offset)
  *
  * Performs absolute seek to the start of the n'th BLCKSZ-sized block of
  * the file.  Note that users of this interface will fail if their files
- * exceed BLCKSZ * LONG_MAX bytes, but that is quite a lot; we don't work
- * with tables bigger than that, either...
+ * exceed BLCKSZ * PG_INT64_MAX bytes, but that is quite a lot; we don't
+ * work with tables bigger than that, either...
  *
  * Result is 0 if OK, EOF if not.  Logical position is not moved if an
  * impossible seek is attempted.
  */
 int
-BufFileSeekBlock(BufFile *file, long blknum)
+BufFileSeekBlock(BufFile *file, int64 blknum)
 {
 	return BufFileSeek(file,
 	   (int) (blknum / BUFFILE_SEG_SIZE),
@@ -919,10 +919,10 @@ BufFileSize(BufFile *file)
  * begins.  Caller should apply this as an offset when working off block
  * positions that are in terms of the original BufFile space.
  */
-long
+int64
 BufFileAppend(BufFile *target, BufFile *source)
 {
-	long		startBlock = target->numFiles * BUFFILE_SEG_SIZE;
+	int64		startBlock = target->numFiles * BUFFILE_SEG_SIZE;
 	int			newNumFiles = target->numFiles + source->numFiles;
 	int			i;
 
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index f

Re: pg_upgrade and logical replication

2023-09-25 Thread Amit Kapila
On Mon, Sep 25, 2023 at 11:43 AM Michael Paquier  wrote:
>
> On Mon, Sep 25, 2023 at 10:05:41AM +0530, Amit Kapila wrote:
> > I also don't think that this patch has to solve the problem of
> > publishers in any way but as per my understanding, if due to some
> > reason we are not able to do the upgrade of publishers, this can add
> > more steps for users than they have to do now for logical replication
> > set up after upgrade. This is because now after restoring the
> > subscription rel's and origin, as soon as we start replication after
> > creating the slots on the publisher, we will never be able to
> > guarantee data consistency. So, they need to drop the entire
> > subscription setup including truncating the relations, and then set it
> > up from scratch which also means they need to somehow remember or take
> > a dump of the current subscription setup. According to me, the key
> > point is to have a mechanism to set up slots correctly to allow
> > replication (or subscriptions) to work after the upgrade. Without
> > that, it appears to me that we are restoring a subscription where it
> > can start from some random LSN and can easily lead to data consistency
> > issues where it can miss some of the updates.
>
> Sure, that's assuming that the publisher side is upgraded.
>

At some point, the user needs to upgrade the publisher, and the subscriber
could itself have some publications defined, which means the downstream
subscribers will have the same problem.

>  FWIW, my
> take is that there's room to move forward with this patch anyway in
> favor of cases like rollover upgrades to the subscriber.
>
> > This is the primary reason why I prioritized to work on the publisher
> > side before getting this patch done, otherwise, the solution for this
> > patch was relatively clear. I am not sure but I guess this could be
> > the reason why originally we left it in the current state, otherwise,
> > restoring subscription rel's or origin doesn't seem to be too much of
> > an additional effort than what we are doing now.
>
> By "additional effort", you are referring to what the patch is doing,
> with the binary dump of pg_subscription_rel, right?
>

Yes.

-- 
With Regards,
Amit Kapila.




Could not run generate_unaccent_rules.py script when update unicode

2023-09-25 Thread Japin Li


Hi, hackers

When I tried to update the Unicode mapping tables using make update-unicode [1],
I encountered the following error:

generate_unaccent_rules.py --unicode-data-file 
../../src/common/unicode/UnicodeData.txt --latin-ascii-file Latin-ASCII.xml 
>unaccent.rules
/bin/sh: 1: generate_unaccent_rules.py: not found
make: *** [Makefile:33: unaccent.rules] Error 127
make: *** Deleting file 'unaccent.rules'

generate_unaccent_rules.py is in contrib/unaccent, and the relevant part of the Makefile is:

# Allow running this even without --with-python
PYTHON ?= python

$(srcdir)/unaccent.rules: generate_unaccent_rules.py 
../../src/common/unicode/UnicodeData.txt Latin-ASCII.xml
$(PYTHON) $< --unicode-data-file $(word 2,$^) --latin-ascii-file $(word 
3,$^) >$@

It uses python to run generate_unaccent_rules.py.  However, the ?= operator in
a Makefile only checks whether the variable is defined, not whether it is
empty.  Since PYTHON is defined in src/Makefile.global, PYTHON ends up empty
here when building without --with-python.

Here are some examples:

japin@coltd-devel:~$ cat Makefile
PYTHON =
PYTHON ?= python

test:
echo '$(PYTHON)'
japin@coltd-devel:~$ make
echo ''

japin@coltd-devel:~$ cat Makefile
PYTHON = python3
PYTHON ?= python

test:
echo '$(PYTHON)'
japin@coltd-devel:~$ make
echo 'python3'
python3

japin@coltd-devel:~$ cat Makefile
PYTHON =
ifeq ($(PYTHON),)
PYTHON = python
endif

test:
echo '$(PYTHON)'
japin@coltd-devel:~$ make
echo 'python'
python
japin@coltd-devel:~$ cat Makefile
PYTHON = python3
ifeq ($(PYTHON),)
PYTHON = python
endif

test:
echo '$(PYTHON)'
japin@coltd-devel:~$ make
echo 'python3'
python3

Here is a patch to fix this, any thoughts?

diff --git a/contrib/unaccent/Makefile b/contrib/unaccent/Makefile
index 652a3e774c..3ff49ba1e9 100644
--- a/contrib/unaccent/Makefile
+++ b/contrib/unaccent/Makefile
@@ -26,7 +26,9 @@ endif
 update-unicode: $(srcdir)/unaccent.rules
 
 # Allow running this even without --with-python
-PYTHON ?= python
+ifeq ($(PYTHON),)
+PYTHON = python
+endif
 
 $(srcdir)/unaccent.rules: generate_unaccent_rules.py 
../../src/common/unicode/UnicodeData.txt Latin-ASCII.xml
$(PYTHON) $< --unicode-data-file $(word 2,$^) --latin-ascii-file $(word 
3,$^) >$@


[1] 
https://www.postgresql.org/message-id/MEYP282MB1669AC78EE8374B3DE797A09B6FCA%40MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM

-- 
Regrads,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.




Re: Fix a wrong comment in setrefs.c

2023-09-25 Thread Richard Guo
On Tue, Sep 26, 2023 at 5:45 AM Tom Lane  wrote:

> Hmm.  This kind of makes me itch, because in principle a ressortgroupref
> identifier should uniquely identify a sorting/grouping column.  If it
> fails to do so here, maybe there are outright bugs lurking elsewhere?
>
> I poked into it a little and determined that the problematic
> ressortgroupref values are being assigned during prepunion.c,
> which says
>
>  * By convention, all non-resjunk columns in a setop tree have
>  * ressortgroupref equal to their resno.  In some cases the ref
> isn't
>  * needed, but this is a cleaner way than modifying the tlist
> later.
>
> So at the end of that, we can have Vars in the upper relations'
> targetlists that are associated by ressortgroupref with values
> in the setop input relations' tlists, but don't match.
> (You are right that added-on implicit coercions are one reason for
> the expressions to be different, but it's not the only one.)


Ah.  Thanks for the investigation.


> I'm inclined to write the comment more like "Usually the equal()
> check is redundant, but in setop plans it may not be, since
> prepunion.c assigns ressortgroupref equal to the column resno
> without regard to whether that matches the topmost level's
> sortgrouprefs and without regard to whether any implicit coercions
> are added in the setop tree.  We might have to clean that up someday;
> but for now, just ignore any false matches."


+1.  It explains the situation much more clearly and accurately.

Thanks
Richard


Re: How to update unicode mapping table?

2023-09-25 Thread Japin Li


On Tue, 26 Sep 2023 at 06:20, Tom Lane  wrote:
> Peter Eisentraut  writes:
>> On 25.09.23 08:02, Japin Li wrote:
>>> When I try to update the unicode mapping table through *.xml in
>>> src/backend/utils/mb/Unicode/, it doesn't update the *.map files.
>>> I find the make cannot go to this directory, what can I do to update
>>> the mapping tables?
>
>> This is done by "make update-unicode".
>
> On a slightly related note, I noticed while preparing 3aff1d3fd
> that src/backend/utils/mb/Unicode/Makefile seems a little screwy.
>
> So there doesn't seem to be any clean way to regenerate the fileset
> present in git.  Maybe these targets aren't supposed to be invoked
> here, but then why have a Makefile here at all?  Alternatively,
> maybe we have files in git that shouldn't be there (very likely due
> to the fact that this directory also lacks a .gitignore file).
>

I find those files are not removed when using a VPATH build.

-- 
Regrads,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.




Re: How to update unicode mapping table?

2023-09-25 Thread Japin Li


On Tue, 26 Sep 2023 at 05:58, Peter Eisentraut  wrote:
> On 25.09.23 08:02, Japin Li wrote:
>> When I try to update the unicode mapping table through *.xml in
>> src/backend/utils/mb/Unicode/, it doesn't update the *.map files.
>> I find the make cannot go to this directory, what can I do to update
>> the mapping tables?
>
> This is done by "make update-unicode".

Thanks, it seems to work.

-- 
Regrads,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.




Re: bug fix and documentation improvement about vacuumdb

2023-09-25 Thread Kuwamura Masaki
>
> I've applied this down to v16 now, thanks for the submission!
>

Thanks for pushing!

Masaki Kuwamura


Re: Add 'worker_type' to pg_stat_subscription

2023-09-25 Thread Peter Smith
On Tue, Sep 26, 2023 at 7:16 AM Nathan Bossart  wrote:
>
> On Thu, Sep 21, 2023 at 04:01:20PM +0530, Amit Kapila wrote:
> > The changes looks good to me, though I haven't tested it. But feel
> > free to commit if you are fine with this patch.
>
> Committed.
>

Thanks for pushing.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Doesn't pgstat_report_wal() handle the argument "force" incorrectly

2023-09-25 Thread Michael Paquier
On Mon, Sep 25, 2023 at 02:49:50PM +0900, Ryoga Yoshida wrote:
> On 2023-09-25 14:38, Michael Paquier wrote:
>> Another idea would be to do like in pgstat.c by adding the following
>> line, then use "nowait" to call each sub-function:
>> nowait = !force;
>> pgstat_flush_wal(nowait);
>> pgstat_flush_io(nowait);
> 
> That's very clear and I think it's good.

Done this way down to 15, then, with more comment polishing.
--
Michael


signature.asc
Description: PGP signature


Re: Doc: vcregress .bat commands list lacks "taptest"

2023-09-25 Thread Michael Paquier
On Mon, Sep 25, 2023 at 11:07:57AM -0400, Andrew Dunstan wrote:
> +1

Thanks, applied and backpatched then.
--
Michael


signature.asc
Description: PGP signature


Re: Various small doc improvements; plpgsql, schemas, permissions, oidvector

2023-09-25 Thread Karl O. Pinc
On Mon, 25 Sep 2023 14:14:37 +0200
Daniel Gustafsson  wrote:

> > On 25 Sep 2023, at 14:00, Karl O. Pinc  wrote:  
> 
> > Is there a preferred data format or should I send
> > each patch in a separate attachment with description?  

> Once done you can do "git format-patch origin/master -v 1" which will
> generate a set of n patches named v1-0001 through v1-000n.  You can
> then attache those to the thread. 

Done.  11 patches attached.  Thanks for the help.

(This is v2, since I made some changes upon review.)

I am not particularly confident in the top-line commit
descriptions.  Some seem kind of long and not a whole
lot of thought went into them.  But the commit descriptions
are for the committer to decide anyway.

The bulk of the commit descriptions are very wordy
and will surely need at least some editing.

Listing all the attachments here for future discussion:

v2-0001-Change-section-heading-to-better-reflect-saving-a.patch
v2-0002-Change-section-heading-to-better-describe-referen.patch
v2-0003-Better-section-heading-for-plpgsql-exception-trap.patch
v2-0004-Describe-how-to-raise-an-exception-in-the-excepti.patch
v2-0005-Improve-sentences-in-overview-of-system-configura.patch
v2-0006-Provide-examples-of-listing-all-settings.patch
v2-0007-Cleanup-summary-of-role-powers.patch
v2-0008-Explain-the-difference-between-role-attributes-an.patch
v2-0009-Document-the-oidvector-type.patch
v2-0010-Improve-sentences-about-the-significance-of-the-s.patch
v2-0011-Add-a-sub-section-to-describe-schema-resolution.patch

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein
>From 122665c4155698abe88e2bd17639a991791b94e3 Mon Sep 17 00:00:00 2001
From: "Karl O. Pinc" 
Date: Sun, 24 Sep 2023 15:49:30 -0500
Subject: [PATCH v2 01/11] Change section heading to better reflect saving a
 result in variable(s)

The current section title of "Executing a Command with a Single-Row
Result" does not reflect what the section is really about.  Other
sections make clear how to _execute_ commands, single-row result or not.
What this section is about is how to _save_ a single row of results into
variable(s).

It would be nice to talk about saving results into variables in the
section heading but I couldn't come up with anything pithy.  "Saving a
Single-Row of a Command's Result" seems good enough, especially since
there's few other places to save results other than in variables.
---
 doc/src/sgml/plpgsql.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index f55e901c7e..8747e84245 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -1126,7 +1126,7 @@ PERFORM create_mv('cs_session_page_requests_mv', my_query);

 

-Executing a Command with a Single-Row Result
+Saving a Single-Row of a Command's Result
 
 
  SELECT INTO
-- 
2.30.2

>From 4de4a31d41124dfa793cc5cce0516673811ea414 Mon Sep 17 00:00:00 2001
From: "Karl O. Pinc" 
Date: Sun, 24 Sep 2023 15:52:21 -0500
Subject: [PATCH v2 02/11] Change section heading to better describe reference
 of existing types

The section heading of "Copying Types" does not reflect what the
section is about.  It is not about making copies of data types but
about using the data type of existing columns (or rows) in new type
declarations without having to know what the existing type is.

"Re-Using the Type of Columns and Variables" seems adequate.  Getting
something in there about declartions seems too wordy.  I thought
perhaps "Referencing" instead of "Re-Using", but "referencing" isn't
perfect and "re-using" is generic enough, shorter, and simpler to read.
---
 doc/src/sgml/plpgsql.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 8747e84245..874578265e 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -672,7 +672,7 @@ DECLARE

 
   
-   Copying Types
+   Re-Using the Type of Columns and Variables
 
 
 variable%TYPE
-- 
2.30.2

>From 0252dd434bb9ab2487cd37a93912d19ca1ef5149 Mon Sep 17 00:00:00 2001
From: "Karl O. Pinc" 
Date: Sun, 24 Sep 2023 16:03:29 -0500
Subject: [PATCH v2 03/11] Better section heading for plpgsql exception
 trapping

The docs seem to use "error" and "exception" interchangeably, perhaps
50% each.  But they never say that the are the same thing, and in the
larger world they are not.  Errors tend to be something that drop on
the floor and usually halt execution whereas exceptions can be trapped
and give the programmer more control over the flow of the program.
"Trapping Errors" is not a good section title for these reasons, and
because when it comes to programmatically raising errors in Pl/PgSQL
you don't, you raise exceptions.  The current section heading does not
stand out in a scan of the table of contents when you're looking for
exception handling, IMHO.

"Error Handling and Exception Trapping" is a lit

Re: Partial aggregates pushdown

2023-09-25 Thread Bruce Momjian
On Mon, Sep 25, 2023 at 03:18:13AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi Mr.Bruce, Mr.Pyhalov, Mr.Finnerty, hackers.
> 
> Thank you for your valuable comments. I sincerely apologize for the very late 
> reply.
> Here is a response to your comments or a fix to the patch.
> 
> Tuesday, August 8, 2023 at 3:31 Bruce Momjian
> > > I have modified the program except for the point "if the version of the 
> > > remote server is less than PG17".
> > > Instead, we have addressed the following.
> > > "If check_partial_aggregate_support is true and the remote server version 
> > > is older than the local server
> > > version, postgres_fdw does not assume that the partial aggregate function 
> > > is on the remote server unless
> > > the partial aggregate function and the aggregate function match."
> > > The reason for this is to maintain compatibility with any aggregate 
> > > function that does not support partial
> > > aggregate in one version of V1 (V1 is PG17 or higher), even if the next 
> > > version supports partial aggregate.
> > > For example, string_agg does not support partial aggregation in PG15, but 
> > > it will support partial aggregation
> > > in PG16.
> >
> > Just to clarify, I think you are saying:
> >
> > If check_partial_aggregate_support is true and the remote server
> > version is older than the local server version, postgres_fdw
> > checks if the partial aggregate function exists on the remote
> > server during planning and only uses it if it does.
> >
> > I tried to phrase it in a positive way, and mentioned the plan time
> > distinction.  Also, I am sorry I was away for most of July and am just
> > getting to this.
> Thanks for your comment. In the documentation, the description of 
> check_partial_aggregate_support is as follows
> (please see postgres-fdw.sgml).
> --
> check_partial_aggregate_support (boolean)
> Only if this option is true, during query planning, postgres_fdw connects to 
> the remote server and check if the remote server version is older than the 
> local server version. If so, postgres_fdw assumes that for each built-in 
> aggregate function, the partial aggregate function is not defined on the 
> remote server unless the partial aggregate function and the aggregate 
> function match. The default is false.
> --

My point is that there are three behaviors:

*  false - no check
*  true, remote version >= sender - no check
*  true, remote version < sender - check
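
As an aside, a hedged sketch of how a user would enable the check being
described; the server name "loopback" is hypothetical, while the option name
comes from the patch:

ALTER SERVER loopback OPTIONS (ADD check_partial_aggregate_support 'true');

Only with this option enabled does the third case above (the per-query remote
version check) come into play.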

Here is your code:

+ * Check that a built-in aggpartialfunc exists on the remote server. If
+ * check_partial_aggregate_support is false, we assume the partial aggregate
+ * function exists on the remote server. Otherwise we assume the partial
+ * aggregate function exists on the remote server only if the remote server
+ * version is not less than the local server version.
+ */
+static bool
+is_builtin_aggpartialfunc_shippable(Oid aggpartialfn, PgFdwRelationInfo *fpinfo)
+{
+    bool        shippable = true;
+
+    if (fpinfo->check_partial_aggregate_support)
+    {
+        if (fpinfo->remoteversion == 0)
+        {
+            PGconn     *conn = GetConnection(fpinfo->user, false, NULL);
+
+            fpinfo->remoteversion = PQserverVersion(conn);
+        }
+        if (fpinfo->remoteversion < PG_VERSION_NUM)
+            shippable = false;
+    }
+    return shippable;
+}

I think this needs to be explained in the docs.  I am ready to adjust
the patch to improve the wording whenever you are ready.  Should I do it
now and post an updated version for you to use?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: ALTER ROLE documentation improvement

2023-09-25 Thread Nathan Bossart
On Fri, Sep 15, 2023 at 02:25:38PM -0700, Yurii Rashkovskii wrote:
> Thank you for the feedback. I've updated the glossary and updated the
> terminology to be consistent. Please see the new patch attached.

Thanks for the new version of the patch.

  This user owns all system catalog tables in each database.  It is also the role
  from which all granted permissions originate.  Because of these things, this
- role may not be dropped.
+ role may not be dropped. This role must always be a superuser, it can't be changed
+ to be a non-superuser.

I think we should move this note to the sentence just below that mentions
its superuserness.  Maybe it could look something like this:

This role also behaves as a normal database superuser, and its
superuser status cannot be revoked.

+   Database superusers can change any of these settings for any role, except for
+   changing SUPERUSER to NOSUPERUSER
+   for a bootstrap superuser.

nitpick: s/a bootstrap superuser/the bootstrap superuser

 #: commands/user.c:871
 #, c-format
-msgid "The bootstrap user must have the %s attribute."
+msgid "The bootstrap superuser must have the %s attribute."
 msgstr "Der Bootstrap-Benutzer muss das %s-Attribut haben."

No need to update the translation files.  Those are updated separately in
the pgtranslation repo.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set

2023-09-25 Thread Nathan Bossart
On Thu, Sep 21, 2023 at 11:18:00AM +0900, bt23nguyent wrote:
> -basic_archive_configured(ArchiveModuleState *state)
> +basic_archive_configured(ArchiveModuleState *state, char **logdetail)

Could we do something more like GUC_check_errdetail() instead to maintain
backward compatibility with v16?
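
For reference, the existing GUC mechanism keeps the hook signature stable and reports
detail through a macro that writes into a module-level string.  A minimal sketch of
that pattern (the GUC name and check_my_setting are made up for illustration):

#include "postgres.h"
#include "utils/guc.h"

static bool
check_my_setting(int *newval, void **extra, GucSource source)
{
	if (*newval < 0)
	{
		/* report detail without changing the hook's signature */
		GUC_check_errdetail("\"my_setting\" must not be negative.");
		return false;
	}
	return true;
}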

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: How to update unicode mapping table?

2023-09-25 Thread Tom Lane
Peter Eisentraut  writes:
> On 25.09.23 08:02, Japin Li wrote:
>> When I try to update the unicode mapping table through *.xml in
>> src/backend/utils/mb/Unicode/, it doesn't update the *.map files.
>> I find the make cannot go to this directory, what can I do to update
>> the mapping tables?

> This is done by "make update-unicode".

On a slightly related note, I noticed while preparing 3aff1d3fd
that src/backend/utils/mb/Unicode/Makefile seems a little screwy.
If you go into that directory and type "make distclean", you'll
find it removes some files that are in git:

[postgres@sss1 Unicode]$ make distclean
rm -f 8859-10.TXT 8859-13.TXT 8859-14.TXT 8859-15.TXT 8859-16.TXT 8859-2.TXT 
8859-3.TXT 8859-4.TXT 8859-5.TXT 8859-6.TXT 8859-7.TXT 8859-8.TXT 8859-9.TXT 
BIG5.TXT CNS11643.TXT CP1250.TXT CP1251.TXT CP1252.TXT CP1253.TXT CP1254.TXT 
CP1255.TXT CP1256.TXT CP1257.TXT CP1258.TXT CP866.TXT CP874.TXT CP932.TXT 
CP936.TXT CP950.TXT JIS0212.TXT JOHAB.TXT KOI8-R.TXT KOI8-U.TXT KSX1001.TXT 
euc-jis-2004-std.txt gb-18030-2000.xml sjis-0213-2004-std.txt 
windows-949-2000.xml
[postgres@sss1 Unicode]$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add/rm ..." to update what will be committed)
  (use "git restore ..." to discard changes in working directory)
deleted:euc-jis-2004-std.txt
deleted:gb-18030-2000.xml
deleted:sjis-0213-2004-std.txt

no changes added to commit (use "git add" and/or "git commit -a")

This seems wrong.  If you "make maintainer-clean", that removes even more:

$ git status
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
  (use "git add/rm ..." to update what will be committed)
  (use "git restore ..." to discard changes in working directory)
deleted:big5_to_utf8.map
deleted:euc-jis-2004-std.txt
deleted:euc_cn_to_utf8.map
deleted:euc_jis_2004_to_utf8.map
deleted:euc_jp_to_utf8.map
deleted:euc_kr_to_utf8.map
deleted:euc_tw_to_utf8.map
deleted:gb-18030-2000.xml
deleted:gb18030_to_utf8.map
deleted:gbk_to_utf8.map
deleted:iso8859_10_to_utf8.map
deleted:iso8859_13_to_utf8.map
deleted:iso8859_14_to_utf8.map
deleted:iso8859_15_to_utf8.map
deleted:iso8859_16_to_utf8.map
deleted:iso8859_2_to_utf8.map
deleted:iso8859_3_to_utf8.map
deleted:iso8859_4_to_utf8.map
deleted:iso8859_5_to_utf8.map
deleted:iso8859_6_to_utf8.map
deleted:iso8859_7_to_utf8.map
deleted:iso8859_8_to_utf8.map
deleted:iso8859_9_to_utf8.map
deleted:johab_to_utf8.map
deleted:koi8r_to_utf8.map
deleted:koi8u_to_utf8.map
deleted:shift_jis_2004_to_utf8.map
deleted:sjis-0213-2004-std.txt
deleted:sjis_to_utf8.map
deleted:uhc_to_utf8.map
deleted:utf8_to_big5.map
deleted:utf8_to_euc_cn.map
deleted:utf8_to_euc_jis_2004.map
deleted:utf8_to_euc_jp.map
deleted:utf8_to_euc_kr.map
deleted:utf8_to_euc_tw.map
deleted:utf8_to_gb18030.map
deleted:utf8_to_gbk.map
deleted:utf8_to_iso8859_10.map
deleted:utf8_to_iso8859_13.map
deleted:utf8_to_iso8859_14.map
deleted:utf8_to_iso8859_15.map
deleted:utf8_to_iso8859_16.map
deleted:utf8_to_iso8859_2.map
deleted:utf8_to_iso8859_3.map
deleted:utf8_to_iso8859_4.map
deleted:utf8_to_iso8859_5.map
deleted:utf8_to_iso8859_6.map
deleted:utf8_to_iso8859_7.map
deleted:utf8_to_iso8859_8.map
deleted:utf8_to_iso8859_9.map
deleted:utf8_to_johab.map
deleted:utf8_to_koi8r.map
deleted:utf8_to_koi8u.map
deleted:utf8_to_shift_jis_2004.map
deleted:utf8_to_sjis.map
deleted:utf8_to_uhc.map
deleted:utf8_to_win1250.map
deleted:utf8_to_win1251.map
deleted:utf8_to_win1252.map
deleted:utf8_to_win1253.map
deleted:utf8_to_win1254.map
deleted:utf8_to_win1255.map
deleted:utf8_to_win1256.map
deleted:utf8_to_win1257.map
deleted:utf8_to_win1258.map
deleted:utf8_to_win866.map
deleted:utf8_to_win874.map
deleted:win1250_to_utf8.map
deleted:win1251_to_utf8.map
deleted:win1252_to_utf8.map
deleted:win1253_to_utf8.map
deleted:win1254_to_utf8.map
deleted:win1255_to_utf8.map
deleted:win1256_to_utf8.map
deleted:win1257_to_utf8.map
deleted:win1258_to_utf8.map
deleted:win866_to_utf8.map

Re: SET ROLE documentation improvement

2023-09-25 Thread Nathan Bossart
On Fri, Sep 15, 2023 at 02:36:16PM -0700, Yurii Rashkovskii wrote:
> On Fri, Sep 15, 2023 at 1:47 PM Nathan Bossart 
> wrote:
>> I think another issue is that the aforementioned note doesn't mention the
>> new SET option added in 3d14e17.
> 
> How do you think we should word it in that note to make it useful?

Maybe something like this:

The current session user must have the SET option for the specified
role_name, either directly or indirectly via a chain of memberships
with the SET option.
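
A minimal example of the "chain of memberships" case that wording covers (role
names are invented for illustration):

CREATE ROLE alice LOGIN;
CREATE ROLE staff;
CREATE ROLE admin;
GRANT staff TO alice WITH SET TRUE;   -- direct membership with SET
GRANT admin TO staff WITH SET TRUE;   -- gives alice SET on admin indirectly
-- connected as alice:
SET ROLE admin;                       -- allowed via the chain of SET memberships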

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: How to update unicode mapping table?

2023-09-25 Thread Peter Eisentraut

On 25.09.23 08:02, Japin Li wrote:

When I try to update the unicode mapping table through *.xml in
src/backend/utils/mb/Unicode/, it doesn't update the *.map files.
I find the make cannot go to this directory, what can I do to update
the mapping tables?


This is done by "make update-unicode".





Re: Regression in COPY FROM caused by 9f8377f7a2

2023-09-25 Thread Tom Lane
Andrew Dunstan  writes:
> On 2023-09-25 Mo 11:06, Andrew Dunstan wrote:
>> On 2023-09-25 Mo 04:59, Laurenz Albe wrote:
>>> CREATE TABLE boom (t character varying(5) DEFAULT 'a long string');

> Thinking about this a little more, wouldn't it be better if we checked 
> at the time we set the default that the value is actually valid for the 
> given column? This is only one manifestation of a problem you could run 
> into given this table definition.

I dunno, it seems at least possible that someone would do this
deliberately as a means of preventing the column from being defaulted.
In any case, the current behavior has stood for a very long time and
no one has complained that an error should be thrown sooner.

regards, tom lane




Re: Fix a wrong comment in setrefs.c

2023-09-25 Thread Tom Lane
Richard Guo  writes:
> I noticed a wrong comment in search_indexed_tlist_for_sortgroupref().

> /* The equal() check should be redundant, but let's be paranoid */

> It turns out that the equal() check is necessary, because the given
> sort/group expression might be type of FuncExpr which converts integer
> to numeric.

Hmm.  This kind of makes me itch, because in principle a ressortgroupref
identifier should uniquely identify a sorting/grouping column.  If it
fails to do so here, maybe there are outright bugs lurking elsewhere?

I poked into it a little and determined that the problematic
ressortgroupref values are being assigned during prepunion.c,
which says

 * By convention, all non-resjunk columns in a setop tree have
 * ressortgroupref equal to their resno.  In some cases the ref isn't
 * needed, but this is a cleaner way than modifying the tlist later.

So at the end of that, we can have Vars in the upper relations'
targetlists that are associated by ressortgroupref with values
in the setop input relations' tlists, but don't match.
(You are right that added-on implicit coercions are one reason for
the expressions to be different, but it's not the only one.)

I'm inclined to write the comment more like "Usually the equal()
check is redundant, but in setop plans it may not be, since
prepunion.c assigns ressortgroupref equal to the column resno
without regard to whether that matches the topmost level's
sortgrouprefs and without regard to whether any implicit coercions
are added in the setop tree.  We might have to clean that up someday;
but for now, just ignore any false matches."
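
(For what it's worth, an implicit coercion of that sort is easy to produce; in the
made-up query below the integer branch picks up an int-to-numeric coercion so both
sides of the UNION share a common type.  Illustrative only, not taken from the
original report:

SELECT 1 AS x
UNION
SELECT 2.5
ORDER BY x;
)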

regards, tom lane




Re: Performance degradation on concurrent COPY into a single relation in PG16.

2023-09-25 Thread Andres Freund
Hi,

On 2023-09-25 12:48:30 -0700, Andres Freund wrote:
> On 2023-09-25 15:42:26 -0400, Tom Lane wrote:
> > I just did a git bisect run to discover when the failure documented
> > in bug #18130 [1] started.  And the answer is commit 82a4edabd.
> > Now, it's pretty obvious that that commit didn't in itself cause
> > problems like this:
> > 
> > postgres=# \copy test from 'bug18130.csv' csv
> > ERROR:  could not read block 5 in file "base/5/17005": read only 0 of 8192 
> > bytes
> > CONTEXT:  COPY test, line 472: 
> > "0,185647715,222655,489637,2,2023-07-31,9100.000,302110385,2023-07-30 
> > 14:16:36.750981+00,14026347..."
> 
> Ugh.
> 
> 
> > IMO there must be some very nasty bug lurking in the new
> > multiple-block extension logic, that happens to be exposed by this
> > test case with 82a4edabd's adjustments to the when-to-extend choices
> > but wasn't before that.
> 
> > To save other people the trouble of extracting the in-line data
> > in the bug submission, I've attached the test files I was using.
> 
> Thanks, looking at this now.

(had to switch locations in between)

Uh, huh.  The problem is that COPY uses a single BulkInsertState for multiple
partitions. Which to me seems to run counter to the following comment:
 *  The caller can also provide a BulkInsertState object to optimize many
 *  insertions into the same relation.  This keeps a pin on the current
 *  insertion target page (to save pin/unpin cycles) and also passes a
 *  BULKWRITE buffer selection strategy object to the buffer manager.
 *  Passing NULL for bistate selects the default behavior.

The reason this doesn't cause straight up corruption due to reusing a pin from
another relation is that b1ecb9b3fcfb added ReleaseBulkInsertStatePin() and a
call to it. But I didn't make ReleaseBulkInsertStatePin() reset the bulk
insertion state, which is what leads to the errors from the bug report.

Resetting the relevant BulkInsertState fields fixes the problem. But I'm not
sure that's the right fix. ISTM that independent of whether we fix this via
ReleaseBulkInsertStatePin() resetting the fields or via not reusing
BulkInsertState, we should add assertions defending against future issues like
this (e.g. by adding a relation field to BulkInsertState in cassert builds,
and asserting that the relation is the same as in prior calls unless
ReleaseBulkInsertStatePin() has been called).
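
A minimal sketch of the "reset the relevant fields" variant (not necessarily the
final fix; the field names are those of BulkInsertStateData in hio.h as of v16):

void
ReleaseBulkInsertStatePin(BulkInsertState bistate)
{
	if (bistate->current_buf != InvalidBuffer)
		ReleaseBuffer(bistate->current_buf);
	bistate->current_buf = InvalidBuffer;

	/*
	 * Also forget the free-block range cached by the bulk-extension logic,
	 * so a later insertion into a different relation cannot reuse blocks
	 * that belong to the previous target relation.
	 */
	bistate->next_free = InvalidBlockNumber;
	bistate->last_free = InvalidBlockNumber;
}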

Greetings,

Andres Freund




Re: Eager page freeze criteria clarification

2023-09-25 Thread Peter Geoghegan
On Mon, Sep 25, 2023 at 11:45 AM Robert Haas  wrote:
> > The reason I was thinking of using the "lsn at the end of the last vacuum", 
> > is
> > that it seems to be more adapative to the frequency of vacuuming.
>
> Yes, but I think it's *too* adaptive. The frequency of vacuuming can
> plausibly be multiple times per minute or not even annually. That's
> too big a range of variation.

+1. The risk of VACUUM chasing its own tail seems very real. We want
VACUUM to be adaptive to the workload, not adaptive to itself.

> Yeah, I don't know if that's exactly the right idea, but I think it's
> in the direction that I was thinking about. I'd even be happy with
> 100% of the time-between-recent checkpoints, maybe even 200% of
> time-between-recent checkpoints. But I think there probably should be
> some threshold beyond which we say "look, this doesn't look like it
> gets touched that much, let's just freeze it so we don't have to come
> back to it again later."

The sole justification for any strategy that freezes lazily is that it
can avoid useless freezing when freezing turns out to be unnecessary
-- that's it. So I find it more natural to think of freezing as the
default action, and *not freezing* as the thing that requires
justification. Thinking about it "backwards" like that just seems
simpler to me. There is only one possible reason to not freeze, but
several reasons to freeze.

> I think part of the calculus here should probably be that when the
> freeze threshold is long, the potential gains from making it even
> longer are not that much. If I change the freeze threshold on a table
> from 1 minute to 1 hour, I can potentially save uselessly freezing
> that page 59 times per hour, every hour, forever, if the page always
> gets modified right after I touch it. If I change the freeze threshold
> on a table from 1 hour to 1 day, I can only save 23 unnecessary
> freezes per day.

I totally agree with you on this point. It seems related to my point
about "freezing being the conceptual default action" in VACUUM.

Generally speaking, over-freezing is a problem when we reach the same
wrong conclusion (freeze the page) about the same relatively few pages
over and over -- senselessly repeating those mistakes really adds up
when you're vacuuming the same table very frequently. On the other
hand, under-freezing is typically a problem when we reach the same
wrong conclusion (don't freeze the page) about lots of pages only once
in a very long while. I strongly suspect that there is very little
gray area between the two, across the full spectrum of application
characteristics.

Most individual pages have very little chance of being modified in the
short to medium term. In a perfect world, with a perfect algorithm,
we'd almost certainly be freezing most pages at the earliest
opportunity. It is nevertheless also true that a freezing policy that
is only somewhat more aggressive than this ideal oracle algorithm will
freeze way too aggressively (by at least some measures). There isn't
much of a paradox to resolve here: it's all down to the cadence of
vacuuming, and of rows subject to constant churn.

As you point out, the "same policy" can produce dramatically different
outcomes when you actually consider what the consequences of the
policy are over time, when applied by VACUUM under a variety of
different workload conditions. So any freezing policy must be designed
with due consideration for those sorts of things. If VACUUM doesn't
freeze the page now, then when will it freeze it? For most individual
pages, that time will come (again, pages that benefit from lazy
vacuuming are the exception rather than the rule). Right now, VACUUM
almost behaves as if it thought "that's not my problem, it's a problem
for future me!".

Trying to differentiate between pages that we must not over freeze and
pages that we must not under freeze seems important. Generational
garbage collection (as used by managed VM runtimes) does something
that seems a little like this. It's based on the empirical observation
that "most objects die young". What the precise definition of "young"
really is varies significantly, but that turns out to be less of a
problem than you might think -- it can be derived through feedback
cycles. If you look at memory lifetimes on a logarithmic scale, very
different sorts of applications tend to look like they have remarkably
similar memory allocation characteristics.

> Percentage-wise, the overhead of being wrong is the
> same in both cases: I can have as many extra freeze operations as I
> have page modifications, if I pick the worst possible times to freeze
> in every case. But in absolute terms, the savings in the second
> scenario are a lot less.

Very true.

I'm surprised that there hasn't been any discussion of the absolute
amount of system-wide freeze debt on this thread. If 90% of the pages
in the entire database are frozen, it'll generally be okay if we make
the wrong call by freezing lazily when we shouldn't. This i

Re: Add 'worker_type' to pg_stat_subscription

2023-09-25 Thread Nathan Bossart
On Thu, Sep 21, 2023 at 04:01:20PM +0530, Amit Kapila wrote:
> The changes look good to me, though I haven't tested it. But feel
> free to commit if you are fine with this patch.

Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: SQL:2011 application time

2023-09-25 Thread Peter Eisentraut

On 25.09.23 21:20, Paul Jungwirth wrote:

On 9/24/23 21:52, jian he wrote:

On Wed, Sep 20, 2023 at 10:50 AM Paul Jungwirth
 wrote:


On 9/17/23 20:11, jian he wrote:

small issues so far I found, v14.


Thank you again for the review! v15 is attached.



hi. some tiny issues.


Rebased v16 patches attached.


Looking through the tests in v16-0001:

+-- PK with no columns just WITHOUT OVERLAPS:
+CREATE TABLE temporal_rng (
+   valid_at tsrange,
+   CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OVERLAPS)
+);
+ERROR:  syntax error at or near "WITHOUT"
+LINE 3:  CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OV...
+  ^

I think this error is confusing.  The SQL standard requires at least one 
non-period column in a PK.  I don't know why that is or why we should 
implement it.  But if we want to implement it, maybe we should enforce 
that in parse analysis rather than directly in the parser, to be able to 
produce a more friendly error message.


+-- PK with a range column/PERIOD that isn't there:
+CREATE TABLE temporal_rng (
+   id INTEGER,
+   CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
+);
+ERROR:  range or PERIOD "valid_at" in WITHOUT OVERLAPS does not exist

I think here we should just produce a "column doesn't exist" error 
message, the same as if the "id" column was invalid.  We don't need to 
get into the details of what kind of column it should be.  That is done 
in the next test


+ERROR:  column "valid_at" in WITHOUT OVERLAPS is not a range type

Also, in any case it would be nice to have a location pointer here (for 
both cases).


+-- PK with one column plus a range:
+CREATE TABLE temporal_rng (
+   -- Since we can't depend on having btree_gist here,
+   -- use an int4range instead of an int.
+   -- (The rangetypes regression test uses the same trick.)
+   id int4range,
+   valid_at tsrange,
+   CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
+);

I'm confused why you are using int4range here (and in further tests) for 
the scalar (non-range) part of the primary key.  Wouldn't a plain int4 
serve here?


+SELECT pg_get_indexdef(conindid, 0, true) FROM pg_constraint WHERE conname = 'temporal_rng_pk';
+pg_get_indexdef
+---
+ CREATE UNIQUE INDEX temporal_rng_pk ON temporal_rng USING gist (id, valid_at)


Shouldn't this somehow show the operator classes for the columns?  We 
are using different operator classes for the id and valid_at columns, 
aren't we?


+-- PK with USING INDEX (not possible):
+CREATE TABLE temporal3 (
+   id int4range,
+   valid_at tsrange
+);
+CREATE INDEX idx_temporal3_uq ON temporal3 USING gist (id, valid_at);
+ALTER TABLE temporal3
+   ADD CONSTRAINT temporal3_pk
+   PRIMARY KEY USING INDEX idx_temporal3_uq;
+ERROR:  "idx_temporal3_uq" is not a unique index
+LINE 2:  ADD CONSTRAINT temporal3_pk
+ ^
+DETAIL:  Cannot create a primary key or unique constraint using such an 
index.


Could you also add a test where the index is unique and the whole thing 
does work?



Apart from the tests, how about renaming the column 
pg_constraint.contemporal to something like conwithoutoverlaps?






Re: Regression in COPY FROM caused by 9f8377f7a2

2023-09-25 Thread Andrew Dunstan


On 2023-09-25 Mo 11:06, Andrew Dunstan wrote:



On 2023-09-25 Mo 04:59, Laurenz Albe wrote:

On Mon, 2023-09-25 at 09:54 +0200, Laurenz Albe wrote:

In v16 and later, the following fails:

CREATE TABLE boom (t character varying(5) DEFAULT 'a long string');

COPY boom FROM STDIN;
ERROR:  value too long for type character varying(5)

In PostgreSQL v15 and earlier, the COPY statement succeeds.

The error is thrown in BeginCopyFrom in line 1578 (HEAD)

   defexpr = expression_planner(defexpr);

Bisecting shows that the regression was introduced by commit 9f8377f7a2,
which introduced DEFAULT values for COPY FROM.






Thinking about this a little more, wouldn't it be better if we checked 
at the time we set the default that the value is actually valid for the 
given column? This is only one manifestation of a problem you could run 
into given this table definition.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Eager page freeze criteria clarification

2023-09-25 Thread Robert Haas
On Sat, Sep 23, 2023 at 3:53 PM Melanie Plageman
 wrote:
> Freeze tuples on a page opportunistically if the page would be totally
> frozen and:
>
>  4. Buffer is already dirty and no FPI is required OR page LSN is older
>  than 33% of the LSNs since the last vacuum of the table.
>
>  5. Buffer is already dirty and no FPI is required AND page LSN is older
>  than 33% of the LSNs since the last vacuum of the table.
>
> On master, the heuristic is to freeze a page opportunistically if it
> would be totally frozen and if pruning emitted an FPI.
> ---
>
> My takeaways from all of the workload results are as follows:
>
> Algorithm 4 is too aggressive and regresses performance compared to
> master.
>
> Algorithm 5 freezes surprisingly few pages, especially in workloads
> where only the most recent data is being accessed or modified
> (append-only, update recent data).
>
> A work queue-like workload with other concurrent workloads is a
> particular challenge for the freeze heuristic, and we should think more
> about how to handle this.

I feel like we have a sort of Goldilocks problem here: the porridge is
either too hot or too cold, never just right. Say we just look at
workload B, looking at the difference between pgbench_accounts (which
is randomly and frequently updated and thus shouldn't be
opportunistically frozen) and pgbench_history (which is append-only
and should thus be frozen aggressively). Algorithm 4 gets
pgbench_history right and pgbench_accounts wrong, and master does the
opposite. In a perfect world, we'd have an algorithm which could
distinguish sharply between those cases, ramping up to maximum
aggressiveness on pgbench_history while doing nothing at all to
pgbench_accounts.

Algorithm 5 partially accomplishes this, but the results aren't
super-inspiring either. It doesn't add many page freezes in the case
where freezing is bad, but it also only manages to freeze a quarter of
pgbench_history, where algorithm 4 manages to freeze basically all of
it. That's a pretty stark difference. Given that algorithm 5 seems to
make some mistakes on some of the other workloads, I don't think it's
entirely clear that it's an improvement over master, at least in
practical terms.

It might be worth thinking harder about what it takes specifically to
get the pgbench_history case, aka the append-only table case, correct.
One thing that probably doesn't work very well is to freeze pages that
are more than X minutes old. Algorithm 5 uses an LSN threshold instead
of a wall-clock based threshold, but the effect is the same. I think
the problem here is that the vacuum operation essentially happens in
an instant. At the instant that it happens, some fraction of the data
added since the last vacuum is older than whatever threshold you pick,
and the rest is newer. If data is added at a constant rate and you
want to freeze at least 90% of the data, your recency threshold has to
be no more than 10% of the time since the last vacuum. But with
autovacuum_naptime=60s, that's like 6 seconds, and that's way too
aggressive for a table like pgbench_accounts. It seems to me that it's
not possible to get both cases right by twiddling the threshold,
because pgbench_history wants the threshold to be 0, and
pgbench_accounts wants it to be ... perhaps not infinity, because
maybe the distribution is Gaussian or Zipfian or something rather than
uniform, but probably a couple of minutes.

So I feel like if we want to get both pgbench_history and
pgbench_accounts right, we need to consider some additional piece of
information that makes those cases distinguishable. Either of those
tables can contain a page that hasn't been accessed in 20 seconds, but
the correct behavior for such a page differs between one case and the
other. One random idea that I had was to refuse to opportunistically
freeze a page more than once while it remains resident in
shared_buffers. The first time we do it, we set a bit in the buffer
header or something that suppresses further opportunistic freezing.
When the buffer is evicted the bit is cleared. So we can still be
wrong on a heavily updated table like pgbench_accounts, but if the
table fits in shared_buffers, we'll soon realize that we're getting it
wrong a lot and will stop making the same mistake over and over. But
this kind of idea only works if the working set is small enough to fit
in shared_buffers, so I don't think it's actually a great plan, unless
we only care about suppressing excess freezing on workloads that fit
in shared_buffers.

A variant on the same theme could be to keep some table-level counters
and use them to assess how often we're getting it wrong. If we're
often thawing recently-frozen pages, don't freeze so aggressively. But
this will not work if different parts of the same table behave
differently.

If we don't want to do something like this that somehow responds to
the characteristics of a particular page or table, then it seems we
either have to freeze quite aggressively to pick up 

Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

2023-09-25 Thread Joe Conway

On 9/25/23 14:03, Jeff Davis wrote:

On Mon, 2023-09-25 at 12:00 -0400, Joe Conway wrote:

Should there be a way to have a separate "execution" search_path?


I hadn't considered that and I like that idea for a few reasons:

   * a lot of the problem cases are for functions that don't need to
access tables at all, e.g., in an index expression.
   * it avoids annoyances with pg_temp, because that's not searched for
functions/operators anyway
   * perhaps we could force the object search_path to be empty for
IMMUTABLE functions?

I haven't thought it through in detail, but it seems like a promising
approach.



Related to this, it would be useful if you could grant create on schema 
for only non-executable objects. You may want to allow a user to create 
their own tables but not allow them to create their own functions, for 
example. Right now "GRANT CREATE ON SCHEMA foo" gives the grantee the 
ability to create "all the things".


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Performance degradation on concurrent COPY into a single relation in PG16.

2023-09-25 Thread Andres Freund
Hi,

On 2023-09-25 15:42:26 -0400, Tom Lane wrote:
> I just did a git bisect run to discover when the failure documented
> in bug #18130 [1] started.  And the answer is commit 82a4edabd.
> Now, it's pretty obvious that that commit didn't in itself cause
> problems like this:
> 
> postgres=# \copy test from 'bug18130.csv' csv
> ERROR:  could not read block 5 in file "base/5/17005": read only 0 of 8192 
> bytes
> CONTEXT:  COPY test, line 472: 
> "0,185647715,222655,489637,2,2023-07-31,9100.000,302110385,2023-07-30 
> 14:16:36.750981+00,14026347..."

Ugh.


> IMO there must be some very nasty bug lurking in the new
> multiple-block extension logic, that happens to be exposed by this
> test case with 82a4edabd's adjustments to the when-to-extend choices
> but wasn't before that.

> To save other people the trouble of extracting the in-line data
> in the bug submission, I've attached the test files I was using.

Thanks, looking at this now.


> The DDL is simplified slightly from what was submitted.  I'm not
> entirely sure why a no-op trigger is needed to provoke the bug...

A trigger might prevent using the multi-insert API, which would lead to
different execution paths...


Greetings,

Andres Freund




Re: Performance degradation on concurrent COPY into a single relation in PG16.

2023-09-25 Thread Tom Lane
Andres Freund  writes:
> On 2023-09-06 18:01:53 -0400, Tom Lane wrote:
>> It turns out that this patch is what's making buildfarm member
>> chipmunk fail in contrib/pg_visibility [1].  That's easily reproduced
>> by running the test with shared_buffers = 10MB.  I didn't dig further
>> than the "git bisect" result:

> At first I was a bit confounded by not being able to reproduce this. My test
> environment had max_connections=110 for some other reason - and the problem
> doesn't reproduce with that setting...

I just did a git bisect run to discover when the failure documented
in bug #18130 [1] started.  And the answer is commit 82a4edabd.
Now, it's pretty obvious that that commit didn't in itself cause
problems like this:

postgres=# \copy test from 'bug18130.csv' csv
ERROR:  could not read block 5 in file "base/5/17005": read only 0 of 8192 bytes
CONTEXT:  COPY test, line 472: 
"0,185647715,222655,489637,2,2023-07-31,9100.000,302110385,2023-07-30 
14:16:36.750981+00,14026347..."

IMO there must be some very nasty bug lurking in the new
multiple-block extension logic, that happens to be exposed by this
test case with 82a4edabd's adjustments to the when-to-extend choices
but wasn't before that.

To save other people the trouble of extracting the in-line data
in the bug submission, I've attached the test files I was using.
The DDL is simplified slightly from what was submitted.  I'm not
entirely sure why a no-op trigger is needed to provoke the bug...

regards, tom lane

[1] 
https://www.postgresql.org/message-id/18130-7a86a7356a75209d%40postgresql.org

-- run this, then \copy test from 'bug18130.csv' csv

create table if not exists test (
a smallint,
b bigint,
c bigint,
d bigint,
e smallint,
plan_date date,
g numeric(18,7),
h bigint,
i timestamptz,
j bigint,
k integer,
l smallint,
primary key (a, b, c, d, e, plan_date)
) partition by list(a);

/* this is just to generate partition structure automatically */
create or replace function test_add_partitions(a integer, year integer)
returns void
volatile strict
as
$$
declare
parent text;
root text;
begin
root := 'test_' || a::text;
execute 'create table if not exists ' || root || ' partition of test for
values in (' || a::text || ') partition by hash (b);';
for b in 0..7 loop
parent := root || '_' || b::text;

execute 'create table if not exists ' || parent ||
' partition of ' || root || ' for values with (modulus
8, remainder ' || b::text || ')' ||
' partition by range (plan_date);';

execute 'create table if not exists ' ||
parent || '_y' || year::text || 'm1 partition of ' ||
parent ||
' for values from (''' || year::text || '-01-01'') to
(''' || year::text ||
'-02-01'');';
execute 'create table if not exists ' ||
parent || '_y' || year::text || 'm2 partition of ' ||
parent ||
' for values from (''' || year::text || '-02-01'') to
(''' || year::text ||
'-03-01'');';
execute 'create table if not exists ' ||
parent || '_y' || year::text || 'm3 partition of ' ||
parent ||
' for values from (''' || year::text || '-03-01'') to
(''' || year::text ||
'-04-01'');';
execute 'create table if not exists ' ||
parent || '_y' || year::text || 'm4 partition of ' ||
parent ||
' for values from (''' || year::text || '-04-01'') to
(''' || year::text ||
'-05-01'');';
execute 'create table if not exists ' ||
parent || '_y' || year::text || 'm5 partition of ' ||
parent ||
' for values from (''' || year::text || '-05-01'') to
(''' || year::text ||
'-06-01'');';
execute 'create table if not exists ' ||
parent || '_y' || year::text || 'm6 partition of ' ||
parent ||
' for values from (''' || year::text || '-06-01'') to
(''' || year::text ||
'-07-01'');';
execute 'create table if not exists ' ||
parent || '_y' || year::text || 'm7 partition of ' ||
parent ||
' for values from (''' || year::text || '-07-01'') to
(''' || year::text ||
'-08-01'');';
execute 'create table if not exists ' ||
parent || '_y' || year::text || 'm8 partition of ' ||
parent ||
' for values from (''' || year::text || '-08-01'') to
(''' || year::text ||
'-09-01'');';
execute 'create table if not exists ' ||
parent || '_y' || year::text || 'm9 partition of ' ||
parent ||
' for values from (''' || year::text || '-09-01'') to
(''' 

Re: DROP DATABASE is interruptible

2023-09-25 Thread Andres Freund
Hi,

On 2023-09-25 01:48:31 +0100, Peter Eisentraut wrote:
> I noticed that this patch set introduced this pg_dump test:
> 
> On 12.07.23 03:59, Andres Freund wrote:
> > +   'CREATE DATABASE invalid...' => {
> > +   create_order => 1,
> > +   create_sql => q(CREATE DATABASE invalid; UPDATE pg_database SET 
> > datconnlimit = -2 WHERE datname = 'invalid'),
> > +   regexp => qr/^CREATE DATABASE invalid/m,
> > +   not_like => {
> > +   pg_dumpall_dbprivs => 1,
> > +   },
> > +   },
> 
> But the key "not_like" isn't used for anything by that test suite. Maybe
> "unlike" was meant?

It's not clear to me either. Invalid databases shouldn't *ever* be dumped, so
explicitly listing pg_dumpall_dbprivs is odd.

TBH, I find this testsuite the most opaque in postgres...


> But even then it would be useless because the "like" key is empty, so there
> is nothing that "unlike" can subtract from.  Was there something expected
> from the mention of "pg_dumpall_dbprivs"?

Not that I can figure out...


> Perhaps it would be better to write out
> 
> like => {},
> 
> explicitly, with a comment, like some other tests are doing.

Yea, that looks like the right direction.

I'll go and backpatch the adjustment.
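
Roughly what the adjusted entry might look like (a sketch only, following the
style of other entries in the suite):

	'CREATE DATABASE invalid...' => {
		create_order => 1,
		create_sql => q(CREATE DATABASE invalid;
			UPDATE pg_database SET datconnlimit = -2 WHERE datname = 'invalid'),
		regexp => qr/^CREATE DATABASE invalid/m,
		# invalid databases must never be dumped, so nothing should match
		like => {},
	},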

Greetings,

Andres Freund




Re: Eager page freeze criteria clarification

2023-09-25 Thread Robert Haas
On Fri, Sep 8, 2023 at 12:07 AM Andres Freund  wrote:
> > Downthread, I proposed using the RedoRecPtr of the latest checkpoint
> > rather than the LSN of the previou vacuum. I still like that idea.
>
> Assuming that "downthread" references
> https://postgr.es/m/CA%2BTgmoYb670VcDFbekjn2YQOKF9a7e-kBFoj2WJF1HtH7YPaWQ%40mail.gmail.com
> could you sketch out the logic you're imagining a bit more?

I'm not exactly sure what the question is here. I mean, it doesn't
quite make sense to just ask whether the page LSN is newer than the
last checkpoint's REDO record, because I think that's basically just
asking whether or not we would need an FPI, and the freeze criteria
that Melanie has been considering incorporate that check in some other
way already. But maybe some variant on that idea is useful - the
distance to the second-most-recent checkpoint, or a multiple or
percentage of the distance to the most-recent checkpoint, or whatever.

> The reason I was thinking of using the "lsn at the end of the last vacuum", is
> that it seems to be more adapative to the frequency of vacuuming.

Yes, but I think it's *too* adaptive. The frequency of vacuuming can
plausibly be multiple times per minute or not even annually. That's
too big a range of variation. The threshold for freezing can vary by
how actively the table or the page is updated, but I don't think it
should vary by six orders of magnitude. Under what theory does it make
sense to say "this row in table A hasn't been modified in 20
seconds, so let's freeze it, but this row in table B hasn't been
modified in 8 months, so let's not freeze it because it might be
modified again soon"? If you use the LSN at the end of the last
vacuum, you're going to end up making decisions exactly like that,
which seems wrong to me.
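
(To make the mechanism concrete, the last-vacuum-LSN idea amounts to something
like the following; all names are invented for illustration and this is not code
from the proposed patches:

static bool
page_old_enough_to_freeze(XLogRecPtr page_lsn,
						  XLogRecPtr last_vacuum_end_lsn,
						  XLogRecPtr current_insert_lsn)
{
	/*
	 * Freeze if the page LSN falls in the oldest third of the WAL range
	 * generated since the table was last vacuumed.
	 */
	XLogRecPtr	cutoff;

	cutoff = last_vacuum_end_lsn +
		(current_insert_lsn - last_vacuum_end_lsn) / 3;
	return page_lsn < cutoff;
}
)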

> Perhaps we can mix both approaches. We can use the LSN and time of the last
> vacuum to establish an LSN->time mapping that's reasonably accurate for a
> relation. For infrequently vacuumed tables we can use the time between
> checkpoints to establish a *more aggressive* cutoff for freezing then what a
> percent-of-time-since-last-vacuum appach would provide. If e.g. a table gets
> vacuumed every 100 hours and checkpoint timeout is 1 hour, no realistic
> percent-of-time-since-last-vacuum setting will allow freezing, as all dirty
> pages will be too new. To allow freezing a decent proportion of those, we
> could allow freezing pages that lived longer than ~20%
> time-between-recent-checkpoints.

Yeah, I don't know if that's exactly the right idea, but I think it's
in the direction that I was thinking about. I'd even be happy with
100% of the time-between-recent checkpoints, maybe even 200% of
time-between-recent checkpoints. But I think there probably should be
some threshold beyond which we say "look, this doesn't look like it
gets touched that much, let's just freeze it so we don't have to come
back to it again later."

I think part of the calculus here should probably be that when the
freeze threshold is long, the potential gains from making it even
longer are not that much. If I change the freeze threshold on a table
from 1 minute to 1 hour, I can potentially save uselessly freezing
that page 59 times per hour, every hour, forever, if the page always
gets modified right after I touch it. If I change the freeze threshold
on a table from 1 hour to 1 day, I can only save 23 unnecessary
freezes per day. Percentage-wise, the overhead of being wrong is the
same in both cases: I can have as many extra freeze operations as I
have page modifications, if I pick the worst possible times to freeze
in every case. But in absolute terms, the savings in the second
scenario are a lot less. I think if a user is accessing a table
frequently, the overhead of jamming a useless freeze in between every
table access is going to be a lot more noticeable then when the table
is only accessed every once in a while. And I also think it's a lot
less likely that we'll reliably get it wrong. Workloads that touch a
page and then touch it again ~N seconds later can exist for all values
of N, but I bet they're way more common for small values of N than
large ones.

Is there also a need for a similar guard in the other direction? Let's
say that autovacuum_naptime=15s and on some particular table it
triggers every time. I've actually seen this on small queue tables. Do
you think that, in such tables, we should freeze pages that haven't
been modified in 15s?

> Hm, possibly stupid idea: What about using shared_buffers residency as a
> factor? If vacuum had to read in a page to vacuum it, a) we would need read IO
> to freeze it later, as we'll soon evict the page via the ringbuffer b)
> non-residency indicates the page isn't constantly being modified?

This doesn't seem completely stupid, but I fear it would behave
dramatically differently on a workload a little smaller than s_b vs.
one a little larger than s_b, and that doesn't seem good.

-- 
Robert Haas
EDB: http://www.enterprisedb

Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

2023-09-25 Thread Jeff Davis
On Mon, 2023-09-25 at 12:00 -0400, Joe Conway wrote:
> Should there be a way to have a separate "execution" search_path?

I hadn't considered that and I like that idea for a few reasons:

  * a lot of the problem cases are for functions that don't need to
access tables at all, e.g., in an index expression.
  * it avoids annoyances with pg_temp, because that's not searched for
functions/operators anyway
  * perhaps we could force the object search_path to be empty for
IMMUTABLE functions?

I haven't thought it through in detail, but it seems like a promising
approach.

Regards,
Jeff Davis





Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

2023-09-25 Thread Jeff Davis
On Mon, 2023-09-25 at 11:30 -0400, Robert Haas wrote:
> On Fri, Sep 22, 2023 at 4:05 PM Jeff Davis  wrote:
> > You expect
> > Bob to do something like:
> > 
> >   CREATE FUNCTION ... SET search_path = pg_catalog, pg_temp ...
> > 
> > for all functions, not just SECURITY DEFINER functions, is that
> > right?
> 
> Yes, I do.

Do users like Bob do that today? If not, what causes you to expect them
to do so in the future?

> I think it's self-evident that a SQL function's behavior is
> not guaranteed to be invariant under all possible values of
> search_path.

It's certainly not self-evident in a literal sense. I think you mean
that it's "obvious" or something, and perhaps that narrow question is,
but it's also not terribly helpful.

If the important behaviors here were so obvious, how did we end up in
this mess in the first place?

> > We've already established that even postgres hackers are having
> > difficulty keeping up with these nuances. Even though the SET
> > clause
> > has been there for a long time, our documentation on the subject is
> > insufficient and misleading. And on top of that, it's extra typing
> > and
> > noise for every schema file. Until we make some changes I don't
> > think
> > we can expect users to do as you suggest.
> 
> Respectfully, I find this position unreasonable, to the point of
> finding it difficult to take seriously.

Which part exactly is unreasonable?

 * Hackers are having trouble keeping up with the nuances.
 * Our documentation on the subject *is* insufficient and misleading.
 * "pg_temp" is noise.

It seems like you think that users are already doing "SET search_path =
pg_catalog, pg_temp" in all the necessary places, and therefore no
change is required?


> Most of the problems that we're dealing with here have analogues in
> the world of shell scripts.

I think analogies to unix are what caused a lot of the problems we have
today, because postgres is a very different world. In unix-like
environments, a file is just a file; in postgres, we have all kinds of
code attached in interesting ways.


Regards,
Jeff Davis





Re: XLog size reductions: Reduced XLog record header size for PG17

2023-09-25 Thread Matthias van de Meent
On Wed, 20 Sept 2023 at 07:06, Michael Paquier  wrote:
>
> On Tue, Sep 19, 2023 at 12:07:07PM +0200, Matthias van de Meent wrote:
> > V5 is a rebased version of v4, and includes the latest patch from
> > "smaller XLRec block header" [0] as 0001.
>
> 0001 and 0007 are the meat of the changes.

Correct.

> -#define XLR_CHECK_CONSISTENCY  0x02
> +#define XLR_CHECK_CONSISTENCY  (0x20)
>
> I can't help but notice that there are a few stylistic choices like
> this one that are part of the patch.  Using parenthesis in the case of
> hexa values is inconsistent with the usual practices I've seen in the
> tree.

Yes, I'll take another look at that.

>  #define COPY_HEADER_FIELD(_dst, _size)\
>  do {\
> -if (remaining < _size)\
> +if (remaining < (_size))\
>  goto shortdata_err;\
>
> There are a couple of stylistic changes like this one, that I guess
> could just use their own patch to make these macros easier to use.

They actually fix complaints of my IDE, but are otherwise indeed stylistic.

> -#define XLogRecGetInfo(decoder) ((decoder)->record->header.xl_info)
> +#define XLogRecGetInfo(decoder) ((decoder)->record->header.xl_info & XLR_INFO_MASK)
> +#define XLogRecGetRmgrInfo(decoder) (((decoder)->record->header.xl_info) & XLR_RMGR_INFO_MASK)
>
> This stuff in 0002 is independent of 0001, am I right?  Doing this
> split with an extra macro is okay by me, reducing the presence of
> XLR_INFO_MASK and bitwise operations based on it.

Yes, that change is to stop making use of (~XLR_INFO_MASK) where
XLR_RMGR_INFO_MASK is the correct bitmask (whilst also being quite
useful in the later patch).

> 0003 is also mechanical, but if you begin to enforce the use of
> XLR_RMGR_INFO_MASK as the bits allowed to be passed down to the RMGR
> identity callback, we should have at least a validity check to make
> sure that nothing, even custom RMGRs, pass down unexpected bits?

I think that's already handled in XLogInsert(), but I'll make sure to
add more checks if they're not in place yet.

> I am not convinced that XLOG_INCLUDE_XID is a good interface, TBH, and
> I fear that people are going to forget to set it.  Wouldn't it be
> better to use an option where the XID is excluded instead, making the
> inclusing the an XID the default?

Most rmgrs don't actually use the XID. Only XACT, MULTIXACT, HEAP,
HEAP2, and LOGICALMSG use the xid, so I thought it would be easier to
just find the places where those RMGR's records were being logged than
to update all other places.

I don't mind changing how we decide to log the XID, but I don't think
EXCLUDE_XID is a good alternative: most records just don't need the
transaction ID. There are many more index AMs with logging than table
AMs, so I don't think it is that weird to default to 'not included'.

> > The resource manager has ID = 0, thus requiring some special
> > handling in other code. Apart from being generally useful, it is
> > used in future patches to detect the end of wal in lieu of a zero-ed
> > fixed-size xl_tot_len field.
>
> Err, no, that may not be true.  See for example this thread where the
> topic of improving the checks of xl_tot_len and rely on this value on
> when a record header has been validated, even across page borders:
> https://www.postgresql.org/message-id/17928-aa92416a70ff4...@postgresql.org

Yes, there are indeed exceptions when reusing WAL segments, but it's
still a good canary, like xl_tot_len before this patch.

> Except that, in which cases could an invalid RMGR be useful?

A sentinel value that is obviously invalid is available for several
types, e.g. BlockNumber, TransactionId, XLogRecPtr, Buffer, and this
is quite useful if you want to check if something is definitely
invalid. I think that's fine in principle, we're already "wasting"
some IDs in the gap between RM_MAX_BUILTIN_ID and RM_MIN_CUSTOM_ID.

In the current xlog infrastructure, we use xl_tot_len as that sentinel
to detect whether a new record may exist, but in this patch that can't
be used because the field may not exist and depends on other bytes. So
I used xl_rmgr_id as the field to base the 'may a next record exist'
checks on, which required the 0 rmgr ID to be invalid.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

2023-09-25 Thread Robert Haas
On Mon, Sep 25, 2023 at 12:00 PM Joe Conway  wrote:
> Should there be a way to have a separate "execution" search_path?

I have heard that idea proposed before, and I don't think it's a
terrible idea, but I also don't think it fixes anything terribly
fundamental. I think it's pretty normal for people to define functions
and procedures and then call them from other functions and procedures,
and if you do that, then you need that schema in your execution search
path. Of course, if somebody doesn't do that, or schema-qualifies all
such references, then this becomes useful for defense in depth. But I
find it hard to see it as anything more than defense in depth because
I think a lot of people will need to have use cases where they need to
put non-system schemas into the execution search path, and such people
wouldn't really benefit from the existence of this feature.

Slightly off-topic, but I wonder whether, if we do this, we ought to
do it by adding some kind of a marker to the existing search_path,
rather than by creating a new GUC. For example, maybe putting & before
a schema name means that it can be searched, but only for
non-executable things. Then you could set search_path = &jconway,
pg_catalog or something of that kind. It could potentially be more
powerful to have it be a completely separate setting, but if we do
that, everyone who currently needs to secure search_path properly
starts needing to also secure execution_search_path properly. This is
one of those cases where two is not better than one.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: nbtree's ScalarArrayOp array mark/restore code appears to be buggy

2023-09-25 Thread Peter Geoghegan
On Sat, Sep 23, 2023 at 4:22 PM Peter Geoghegan  wrote:
> Attached draft patch shows how this could work.
>
> _bt_restore_array_keys() has comments that seem to suppose that
> calling _bt_preprocess_keys is fairly expensive, and something that's
> well worth avoiding. But...is it, really? I wonder if we'd be better
> off just biting the bullet and always calling _bt_preprocess_keys
> here.

My current plan is to commit something close to my original patch on
Wednesday or Thursday. My proposed fix is minimally invasive (it still
allows _bt_restore_array_keys to avoid most calls to
_bt_preprocess_keys), so I don't see any reason to delay acting here.

-- 
Peter Geoghegan




Re: XLog size reductions: smaller XLRec block header for PG17

2023-09-25 Thread Matthias van de Meent
On Tue, 19 Sept 2023 at 01:03, Andres Freund  wrote:
>
> Hi,
>
> On 2023-05-18 19:22:26 +0300, Heikki Linnakangas wrote:
> > On 18/05/2023 17:59, Matthias van de Meent wrote:
> > > It changes the block IDs used to fit in 6 bits, using the upper 2 bits
> > > of the block_id field to store how much data is contained in the
> > > record (0, <=UINT8_MAX, or <=UINT16_MAX bytes).
> >
> > Perhaps we should introduce a few generic inline functions to do varint
> > encoding. That could be useful in many places, while this scheme is very
> > tailored for XLogRecordBlockHeader.

This scheme is reused later for the XLogRecord xl_tot_len field over
at [0], and FWIW is thus being reused. Sure, it's tailored to this WAL
use case, but IMO we're getting good value from it. We don't use
protobuf or JSON for WAL, we use our own serialization format. Having
some specialized encoding/decoding in that format for certain fields
is IMO quite acceptable.
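
(As a rough sketch of that scheme, with invented names and ignoring the actual
patch's details: the low 6 bits carry the block ID, the top 2 bits say how wide
the following data-length field is.

#define BLK_ID_MASK			0x3F	/* block IDs 0..63 */
#define BLK_LENCLASS_SHIFT	6		/* 0 = no data, 1 = uint8 length, 2 = uint16 length */

static inline uint8
encode_block_header_byte(uint8 block_id, uint32 data_len)
{
	uint8		lenclass;

	if (data_len == 0)
		lenclass = 0;
	else if (data_len <= UINT8_MAX)
		lenclass = 1;
	else
	{
		Assert(data_len <= UINT16_MAX);
		lenclass = 2;
	}
	return (uint8) ((lenclass << BLK_LENCLASS_SHIFT) | (block_id & BLK_ID_MASK));
}
)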

> Yes - I proposed that and wrote an implementation of reasonably efficient
> varint encoding. Here's my prototype:
> https://postgr.es/m/20221004234952.anrguppx5owewb6n%40awork3.anarazel.de

As I mentioned on that thread, that prototype has a significant
probability of doing nothing to improve WAL size, or even increasing
the WAL size for installations which consume a lot of OIDs.

> I think it's a bad tradeoff to write lots of custom varint encodings, just to
> eek out a bit more space savings.

This is only a single "custom" varint encoding though, if you can even
call it that. It makes a field's size depend on flags set in another
byte, which is not that much different from the existing use of
XLR_BLOCK_ID_DATA_[LONG, SHORT].

> The increase in code complexity IMO makes it a bad tradeoff.

Pardon me for asking, but what would you consider to be a good
tradeoff then? I think the code relating to the WAL storage format is
about as simple as you can get it within the feature set it provides
and the size of the resulting records. While I think there is still
much to gain w.r.t. WAL record size, I don't think we can get much of
those improvements without adding at least some amount of complexity,
something I think to be true for most components in PostgreSQL.

So, except for redesigning significant parts of the public WAL APIs,
are we just going to ignore any potential improvements because they
"increase code complexity"?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0] https://commitfest.postgresql.org/43/4386/




Re: Improve tab completion for ALTER DEFAULT PRIVILEGE and ALTER TABLE

2023-09-25 Thread vignesh C
On Sun, 2 Jul 2023 at 20:42, vignesh C  wrote:
>
> Hi,
>
> Improved tab completion for "ALTER DEFAULT PRIVILEGE" and "ALTER TABLE":
> 1) GRANT, REVOKE and FOR USER keywords were not displayed in tab
> completion of alter default privileges like the below statement:
> ALTER DEFAULT PRIVILEGES GRANT INSERT ON tables TO PUBLIC;
> ALTER DEFAULT PRIVILEGES REVOKE INSERT ON tables FROM PUBLIC;
> ALTER DEFAULT PRIVILEGES FOR USER vignesh revoke INSERT ON tables FROM dba1;
>
> 2) USER was not displayed for "ALTER DEFAULT PRIVILEGES IN SCHEMA
> public FOR " like in below statement:
> ALTER DEFAULT PRIVILEGES IN SCHEMA public FOR USER dba1 GRANT INSERT
> ON TABLES TO PUBLIC;
>
> 3) "FOR GRANT OPTION" was not display for "ALTER DEFAULT PRIVILEGES
> REVOKE " like in below statement:
> alter default privileges revoke grant option for select ON tables FROM dba1;
>
> 4) "DATA TYPE" was missing in "ALTER TABLE table-name ALTER COLUMN
> column-name SET" like in:
> ALTER TABLE t1 ALTER COLUMN c1 SET DATA TYPE text;
>
> Attached patch has the changes for the same.

Added a commitfest entry for this:
https://commitfest.postgresql.org/45/4587/

Regards,
Vignesh




Re: Improving btree performance through specializing by key shape, take 2

2023-09-25 Thread Matthias van de Meent
On Tue, 19 Sept 2023 at 22:49, Peter Geoghegan  wrote:
>
> On Tue, Sep 19, 2023 at 6:28 AM Matthias van de Meent
>  wrote:
> > > To be clear, page deletion does what I described here (it does an
> > > in-place update of the downlink to the deleted page, so the same pivot
> > > tuple now points to its right sibling, which is our page of concern),
> > > in addition to fully removing the original pivot tuple whose downlink
> > > originally pointed to our page of concern. This is why page deletion
> > > makes the key space "move to the right", very much like a page split
> > > would.
> >
> > I am still aware of this issue, and I think we've discussed it in
> > detail earlier. I think it does not really impact this patchset. Sure,
> > I can't use dynamic prefix compression to its full potential, but I
> > still do get serious performance benefits:
>
> Then why have you linked whatever the first patch does with the high
> key to dynamic prefix compression in the first place? Your commit
> message makes it sound like it's a way to get around the race
> condition that affects dynamic prefix compression, but as far as I can
> tell it has nothing whatsoever to do with that race condition.

We wouldn't have to store the downlink's right separator and compare
it to the highkey if we didn't deviate from L&Y's algorithm for DELETE
operations (which causes the race condition): just the right sibling's
block number would be enough.

(Yes, the right sibling's block number isn't available for the
rightmost downlink of a page. In those cases, we'd have to reuse the
parent page's high key with that of the downlink page, but I suppose
that'll be relatively rare).

> Questions:
>
> 1. Why shouldn't the high key thing be treated as an unrelated piece of work?

Because it was only significant and relatively visible after getting
rid of the other full key compare operations, and it touches
essentially the same areas. Splitting them out in more patches would
be a hassle.

> I guess it's possible that it really should be structured that way,
> but even then it's your responsibility to make it clear why that is.

Sure. But I think I've made that clear upthread too.

> As things stand, this presentation is very confusing.

I'll take a look at improving the presentation.

> 2. Separately, why should dynamic prefix compression be tied to the
> specialization work? I also see no principled reason why it should be
> tied to the other two things.

My performance results show that insert performance degrades by 2-3%
for single-column indexes if only dynamic the prefix truncation patch
is applied [0]. The specialization patches fix that regression on my
machine (5950x) due to having optimized code for the use case. I can't
say for certain that other machines will see the same results, but I
think results will at least be similar.

> I didn't mind this sort of structure so much back when this work was
> very clearly exploratory -- I've certainly structured work in this
> area that way myself, in the past. But if you want this patch set to
> ever go beyond being an exploratory patch set, something has to
> change.

I think it's fairly complete, and mostly waiting for review.

> I don't have time to do a comprehensive (or even a fairly
> cursory) analysis of which parts of the patch are helping, and which
> are marginal or even add no value.

It is a shame that you don't have the time to review this patch.

> > > You'd have
> > > to compare the lower bound separator key from the parent (which might
> > > itself be the page-level low key for the parent) to the page low key.
> > > That's not a serious suggestion; I'm just pointing out that you need
> > > to be able to compare like with like for a canary condition like this
> > > one, and AFAICT there is no lightweight practical way of doing that
> > > that is 100% robust.
> >
> > True, if we had consistent LOWKEYs on pages, that'd make this job much
> > easier: the prefix could indeed be carried over in full. But that's
> > not currently the case for the nbtree code, and this is the next best
> > thing, as it also has the benefit of working with all currently
> > supported physical formats of btree indexes.
>
> I went over the low key thing again because I had to struggle to
> understand what your high key optimization had to do with dynamic
> prefix compression. I'm still struggling. I think that your commit
> message very much led me astray. Quoting it here:
>
> """
> Although this limits [...] relatively expensive _bt_compare.
> """
>
> You're directly tying the high key optimization to the dynamic prefix
> compression optimization. But why?

The value of skipping the _bt_compare call on the highkey is
relatively much higher in the prefix-skip case than it is on master,
as on master it's only one of the log(n) _bt_compare calls on the
page, while in the patch it's one of (on average) 3 full key
_bt_compare calls. This makes it much easier to prove the performance
gain, which made me integrate 

Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

2023-09-25 Thread Joe Conway

On 9/25/23 11:30, Robert Haas wrote:

I don't believe that people want to run their functions under a
sanitized search_path that only includes system schemas. That might
work for some people, but I think most people will define functions
that call other functions that they themselves defined, or access
tables that they themselves created. They will therefore need the
search_path to include the schemas in which they created those
objects.
Without diving into all the detailed nuances of this discussion, this 
particular paragraph made me wonder if at least part of the problem here 
is that the same search_path is used to find "things that I want to 
execute" (functions and operators) and "things I want to access" 
(tables, etc).


I think many folks would be well served by only having system schemas in 
the search_path for the former (augmented by explicit schema qualifying 
of one's own functions), but agree that almost no one wants that for the 
latter (needing to schema qualify every table reference).
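
For illustration, a minimal sketch of the status quo being described; the
app schema and its objects are made up for this example:

CREATE SCHEMA app;
CREATE TABLE app.t (x int);
CREATE FUNCTION app.f(i int) RETURNS int LANGUAGE sql AS 'SELECT i + 1';

SET search_path = app, pg_catalog;
SELECT f(x) FROM t;          -- function and table are both found via search_path

SET search_path = pg_catalog;
SELECT app.f(x) FROM app.t;  -- with only system schemas, both must be qualified

With a separate "execution" search_path containing only system schemas, the
first SELECT would have to spell app.f(x), but t could still be found via
the normal search_path.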


Should there be a way to have a separate "execution" search_path?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Why need a lock?

2023-09-25 Thread Tom Lane
jacktby jacktby  writes:
> I find this in source code
> #define ShareLock 5   /* CREATE INDEX 
> (WITHOUT CONCURRENTLY) */
> I think if there is no CONCURRENTLY, why do we need a lock? I think that’s 
> unnecessary. What’s the point?

We need a lock precisely to prevent concurrent execution
(of table modifications that would confuse the index build).
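
A minimal two-session sketch of that conflict (table and index names are
arbitrary):

-- session 1
CREATE TABLE t (x int);
BEGIN;
CREATE INDEX t_x_idx ON t (x);   -- acquires ShareLock on t

-- session 2, while session 1's transaction is still open
INSERT INTO t VALUES (1);        -- needs RowExclusiveLock, which conflicts with
                                 -- ShareLock, so this blocks until session 1
                                 -- commits or rolls back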

regards, tom lane




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

2023-09-25 Thread Robert Haas
On Fri, Sep 22, 2023 at 4:05 PM Jeff Davis  wrote:
> On Thu, 2023-09-21 at 14:06 -0400, Robert Haas wrote:
> > Also, in a case like this, I don't think it's unreasonable to ask
> > whether, perhaps, Bob just needs to be a little more careful about
> > setting search_path.
>
> That's what this whole thread is about: I wish it was reasonable, but I
> don't think the tools we provide today make it reasonable. You expect
> Bob to do something like:
>
>   CREATE FUNCTION ... SET search_path = pg_catalog, pg_temp ...
>
> for all functions, not just SECURITY DEFINER functions, is that right?

Yes, I do. I think it's self-evident that a SQL function's behavior is
not guaranteed to be invariant under all possible values of
search_path. If you care about your function behaving the same way all
the time, you have to set the search_path.
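
A complete form of the pattern quoted above might look like this (the
function itself is just a made-up example):

CREATE FUNCTION add_one(i int) RETURNS int
LANGUAGE sql
SET search_path = pg_catalog, pg_temp
AS 'SELECT i + 1';

With that attached, the body resolves names the same way no matter what the
caller's search_path happens to be.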

TBH, I don't see any reasonable way around that requirement. We can
perhaps provide some safeguards that will make it less likely that you
will get completely hosed if your forget, and we could decide to make
SET search_path or some mostly-equivalent thing the default at the
price of pretty large compatibility break, but you can't have
functions that both resolve object references using the caller's
search path and also reliably do what the author intended.

> > You can, I think, be expected to
> > check that functions you define have SET search_path attached.
>
> We've already established that even postgres hackers are having
> difficulty keeping up with these nuances. Even though the SET clause
> has been there for a long time, our documentation on the subject is
> insufficient and misleading. And on top of that, it's extra typing and
> noise for every schema file. Until we make some changes I don't think
> we can expect users to do as you suggest.

Respectfully, I find this position unreasonable, to the point of
finding it difficult to take seriously. You said in another part of
your email that I didn't quote here that it's a problem that functions
and procedures are created with public execute access by default -- but
you can work around this by using a schema to
which other users don't have access, or by changing the default
permissions for functions on the schema where you are creating them,
or by adjusting permissions on the individual objects. If you don't do
any of that but don't trust the other users on your system then you at
least need to set search_path. If you neither understand how function
permissions work nor understand the importance of controlling
search_path, you cannot expect to have a secure system with multiple,
mutually untrusting users. That's just never going to work, regardless
of what the server behavior is.
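
As a rough sketch of those workarounds (the schema, function, and privilege
choices here are invented for illustration):

CREATE SCHEMA private_stuff;
REVOKE ALL ON SCHEMA private_stuff FROM PUBLIC;            -- keep other users out of the schema

ALTER DEFAULT PRIVILEGES IN SCHEMA private_stuff
  REVOKE EXECUTE ON FUNCTIONS FROM PUBLIC;                 -- change the default for new functions

CREATE FUNCTION private_stuff.f() RETURNS int LANGUAGE sql AS 'SELECT 1';
REVOKE EXECUTE ON FUNCTION private_stuff.f() FROM PUBLIC;  -- or adjust one object at a time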

I also disagree with the idea that setting the search_path should be
regarded as noise. I think it's quite the opposite. I don't believe
that people want to run their functions under a sanitized search_path
that only includes system schemas. That might work for some people,
but I think most people will define functions that call other
functions that they themselves defined, or access tables that they
themselves created. They will therefore need the search_path to
include the schemas in which they created those objects. There's no
way for the system to magically figure out what the user wants here.
*Perhaps* if the function is defined interactively the then-current
value could be captured, but in a pg_dump for example that won't work,
and the configured value, wherever it came from initially, is going to
have to be recorded so that it can be recreated when the dump is
restored.

Most of the problems that we're dealing with here have analogues in
the world of shell scripts. A sql or plpgsql function is like a shell
script. If it's setuid, i.e. SECURITY DEFINER, you have to worry about
the caller hijacking it by setting PATH or IFS or LD_something. Even
if it isn't, you have to either trust that the caller has set a
reasonable PATH, or set one yourself, else your script isn't always
going to work reliably. Nobody really expects to be able to make a
setuid shell script secure at all -- that typically requires a wrapper
executable -- but it definitely can't be done by someone who doesn't
understand the importance of setting their PATH and has no idea how to
use chmod.

One thing that is quite different between the shell script situation
and what we do inside PostgreSQL is that there's a lot more security
by default. Every user gets a home directory which by default is
accessible only to them, or at the very least writable only by them,
and system directories have tightly-controlled permissions. I think
UNIX had analogues of a lot of the problems we have today 40 years
ago, but they've tightened things up. We've started to move in that
direction by, for example, removing public execute access by default.
If we want to move further in the direction that UNIX has taken, we
should probably get rid of the public schema altogether, and
auto-c

Re: Confused about gram.y referencs in Makefile?

2023-09-25 Thread Tom Lane
Daniel Gustafsson  writes:
> On 25 Sep 2023, at 05:34, Japin Li  wrote:
>> How about "See gram.h target's comment in src/backend/parser/Makefile"
>> or just "See src/backend/parser/Makefile"?

> The latter seems more stable, if the Makefile is ever restructured it's almost
> guaranteed that this comment will be missed with the location info becoming
> stale.

I did it like this:

 # Note that while each script call produces two output files, to be
-# parallel-make safe we need to split this into two rules.  (See for
-# example gram.y for more explanation.)
+# parallel-make safe we need to split this into two rules.  (See notes
+# in src/backend/parser/Makefile about rules with multiple outputs.)
 #

There are a whole lot of other cross-references to that same comment,
and they all look like

# See notes in src/backend/parser/Makefile about the following two rules

I considered modifying all of those as well, but decided it wasn't
really worth the trouble.  The Makefiles' days are numbered I think.

regards, tom lane




Re: Remove MSVC scripts from the tree

2023-09-25 Thread Andrew Dunstan


On 2023-09-21 Th 21:12, Michael Paquier wrote:


As of today, I can see that the only buildfarm members relying on
these scripts are bowerbird and hamerkop, so these two would fail if
the patch attached were to be applied today.  I am adding Andrew
D. and the maintainers of hamerkop (SRA-OSS) in CC for comments.



Changing bowerbird to use meson should not be difficult, just needs some 
TUITs.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Doc: vcregress .bat commands list lacks "taptest"

2023-09-25 Thread Andrew Dunstan


On 2023-09-25 Mo 03:14, Daniel Gustafsson wrote:

On 25 Sep 2023, at 08:58, Michael Paquier  wrote:
I would say yes, but with a backpatch.

Agreed.



+1


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Regression in COPY FROM caused by 9f8377f7a2

2023-09-25 Thread Andrew Dunstan


On 2023-09-25 Mo 04:59, Laurenz Albe wrote:

On Mon, 2023-09-25 at 09:54 +0200, Laurenz Albe wrote:

In v16 and later, the following fails:

CREATE TABLE boom (t character varying(5) DEFAULT 'a long string');

COPY boom FROM STDIN;
ERROR:  value too long for type character varying(5)

In PostgreSQL v15 and earlier, the COPY statement succeeds.

The error is thrown in BeginCopyFrom in line 1578 (HEAD)

   defexpr = expression_planner(defexpr);

Bisecting shows that the regression was introduced by commit 9f8377f7a2,
which introduced DEFAULT values for COPY FROM.



Oops :-(



I suggest the attached fix, which evaluates default values only if
the DEFAULT option was specified or if the column does not appear in
the column list of COPY.



Patch looks reasonable, haven't tested yet.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Why need a lock?

2023-09-25 Thread jacktby jacktby
I find this in source code
#define ShareLock   5   /* CREATE INDEX 
(WITHOUT CONCURRENTLY) */
I think if there is no CONCURRENTLY, why do we need a lock? I think that’s 
unnecessary. What’s the point?



Re: [PATCH] Extend ALTER OPERATOR to support adding commutator, negator, hashes, and merges

2023-09-25 Thread jian he
in doc/src/sgml/ref/alter_operator.sgml

   
HASHES

 
 Indicates this operator can support a hash join. Can only be set
when the operator does not support a hash join.
 

   

   
MERGES

 
 Indicates this operator can support a merge join. Can only be set
when the operator does not support a merge join.
 

   

if the operator cannot support hash/merge join, it can't do ALTER
OPERATOR oprname(left_arg, right_arg) SET (HASHES/MERGES = false);

I think it means:
Can only be set when the operator does support a hash/merge join. Once
set to true, it cannot be reset to false.
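
For example, assuming the syntax proposed by this patch (the === operator
is hypothetical and stands for an operator with a usable hash opclass):

ALTER OPERATOR === (int4, int4) SET (HASHES = true);   -- allowed: enables hash joins
ALTER OPERATOR === (int4, int4) SET (HASHES = false);  -- rejected: the flag cannot be cleared again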




Re: Synchronizing slots from primary to standby

2023-09-25 Thread Drouvot, Bertrand

Hi,

On 9/25/23 10:44 AM, Drouvot, Bertrand wrote:

Hi,

On 9/23/23 3:38 AM, Amit Kapila wrote:

On Fri, Sep 22, 2023 at 6:01 PM Drouvot, Bertrand
 wrote:



There is a difference here that we also need to prevent removal of
rows required by sync_slots. That could be achieved by physical slot
(and hot_standby_feedback). So, having a requirement to have physical
slot doesn't sound too unreasonable to me. Otherwise, we need to
invent some new mechanism of having some sort of placeholder slot to
avoid removal of required rows. 


Thinking about it, I wonder if removal of required rows is even possible
given that:

- we don't allow to logical decode from a sync slot
- sync slot catalog_xmin <= its primary counter part catalog_xmin
- its primary counter part prevents rows removal thanks to its own catalog_xmin
- a sync slot is removed as soon as its primary counter part is removed

In that case I'm not sure how row removal on the primary could lead to removing
rows required by a sync slot. Am I missing something? Do you have a scenario in mind?


Please forget the above questions, it's in fact pretty easy to remove rows on 
the primary that
would be needed by a sync slot.

I do agree that having a requirement to have a physical slot does not sound 
unreasonable then.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: bug fix and documentation improvement about vacuumdb

2023-09-25 Thread Daniel Gustafsson
> On 24 Sep 2023, at 10:22, Kuwamura Masaki  
> wrote:
> 
> LGTM too!

I've applied this down to v16 now, thanks for the submission!

--
Daniel Gustafsson





RE: Synchronizing slots from primary to standby

2023-09-25 Thread Hayato Kuroda (Fujitsu)
Dear Ajin, Shveta,

Thank you for rebasing the patch set! Here are new comments for v19_2-0001.

01. WalSndWaitForStandbyNeeded()

```
if (SlotIsPhysical(MyReplicationSlot))
return false;
```

Is there a possibility that physical walsenders call this function? 
IIUC the following is a stacktrace for the function, so only logical walsenders 
use it.
If so, it should be Assert() instead of an if statement.

logical_read_xlog_page()
WalSndWaitForWal()
WalSndWaitForStandbyNeeded()

02. WalSndWaitForStandbyNeeded()

Can we set shouldwait in SlotSyncInitConfig()? synchronize_slot_names_list is
searched whenever the function is called, but it is not changed automatically.
If the slotname is compared with the list in SlotSyncInitConfig(), the
linear search can be reduced.

03. WalSndWaitForStandbyConfirmation()

We should add ProcessRepliesIfAny() during the loop, otherwise the walsender
overlooks the death of an apply worker.

04. WalSndWaitForStandbyConfirmation()

Not sure, but do we have to return early if walsenders got 
PROCSIG_WALSND_INIT_STOPPING
signal? I thought that if physical walsenders get stuck, logical walsenders wait
forever. At that time we cannot stop the primary server even if "pg_ctl stop"
is executed.

05. SlotSyncInitConfig()

Why don't we free the memory for rawname, old standby_slot_names_list, and 
synchronize_slot_names_list?
They seem to be overwritten.

06. SlotSyncInitConfig()

Both physical and logical walsenders call the function, but physical ones do not
use the lists, right? If so, can we add a quick exit for physical walsenders?
Or we should carefully remove the places where physical walsenders call it.

07. StartReplication()

I think we do not have to call SlotSyncInitConfig().
An alternative approach is described above.

08. the other

Also, I found unexpected behavior after both 0001 and 0002 were applied.
Is it normal or not? 

1. constructed below setup
(ensured that logical slot existed on secondary)
2. stopped the primary
3. promoted the secondary server
4. disabled a subscription once
5. changed the connection string for subscriber
6. Inserted data to new primary
7. enabled the subscription again
8. got an ERROR: replication slot "sub" does not exist

I expected that logical replication would be restarted, but it was not.
Is it a real issue or my fault? The error appears in secondary.log.

```
Setup:
primary--->secondary
   |
   |
subscriber
```
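
For clarity, steps 4, 5 and 7 above roughly correspond to the following
commands (the subscription name and connection string are placeholders):

ALTER SUBSCRIPTION sub DISABLE;
ALTER SUBSCRIPTION sub CONNECTION 'host=new_primary port=5432 dbname=postgres';
-- step 6: insert some rows on the new primary
ALTER SUBSCRIPTION sub ENABLE;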

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



test_0925.sh
Description: test_0925.sh


Re: Index range search optimization

2023-09-25 Thread Alexander Korotkov
On Mon, Sep 25, 2023 at 12:58 PM Pavel Borisov  wrote:
> One more doubt about naming. Calling function
> _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
> ScanDirection dir, bool *continuescan, bool requiredMatchedByPrecheck)
> as
> (void) _bt_checkkeys(scan, itup, indnatts, dir,
> &requiredMatchedByPrecheck, false);
> looks little bit misleading because of coincidence of names of 5 and 6
> arguments.

I've added the comment clarifying this argument usage.

--
Regards,
Alexander Korotkov


0001-Skip-checking-of-scan-keys-required-for-direction-v6.patch
Description: Binary data


Re: Avoid a possible out-of-bounds access (src/backend/optimizer/util/relnode.c)

2023-09-25 Thread Ranier Vilela
On Mon, 25 Sep 2023 at 08:23, Ashutosh Bapat <
ashutosh.bapat@gmail.com> wrote:

> On Sat, Sep 23, 2023 at 7:29 PM Ranier Vilela  wrote:
> >
> > Hi,
> >
> > Per Coverity.
> > CID 1518088 (#2 of 2): Improper use of negative value (NEGATIVE_RETURNS)
> >
> > The function bms_singleton_member can returns a negative number.
> >
> > /*
> > * Get a child rel for rel2 with the relids.  See above comments.
> > */
> > if (rel2_is_simple)
> > {
> > int varno = bms_singleton_member(child_relids2);
> >
> > child_rel2 = find_base_rel(root, varno);
> > }
> >
> > It turns out that in the get_matching_part_pairs function (joinrels.c),
> the return of bms_singleton_member is passed to the find_base_rel function,
> which cannot receive a negative value.
> >
> > find_base_rel is protected by an Assertion, which effectively indicates
> that the error does not occur in tests and in DEBUG mode.
> >
> > But this does not change the fact that bms_singleton_member can return a
> negative value, which may occur on some production servers.
> >
> > Fix by changing the Assertion into a real test, to protect the
> simple_rel_array array.
>
> Do you have a scenario where bms_singleton_member() actually returns a
> -ve number OR it's just a possibility.

Just a possibility.


> bms_make_singleton() has an
> assertion at the end: Assert(result >= 0);
> bms_make_singleton(), bms_add_member() explicitly forbids negative
> values. It looks like we have covered all the places which can add a
> negative value to a bitmapset. May be we are missing a place or two.
> It will be good to investigate it.
>
I tried to do the research, mostly with runtime compilation.
As previously stated, the error does not appear in the tests.
That said, although the Assert protects in most cases, that doesn't mean it
can't occur in a query running on a server in production mode.

Now, thinking about what you said about the Assertion in bms_make_singleton:
I think it's nonsense, no?
Why design a function that prohibits negative returns in DEBUG mode, but allows
them at runtime?
After all, why allow a negative return, if for all practical purposes it is
prohibited?
Regarding the find_base_rel function, it is nonsense to protect the array with
an Assertion alone.
After all, we have already protected the upper limit with a real test, so why
not also protect the lower limit?
The additional test is cheap and makes perfect sense, making the function more
robust in production mode.
As an added bonus, modern compilers will probably be able to remove the
additional test if they deem it unnecessary.
Furthermore, all the protections that were added to guard find_base_rel calls
can eventually be removed, since find_base_rel will handle negative values
itself.

best regards,
Ranier Vilela


Re: generic plans and "initial" pruning

2023-09-25 Thread Amit Langote
On Wed, Sep 6, 2023 at 11:20 PM Robert Haas  wrote:
> On Wed, Sep 6, 2023 at 5:12 AM Amit Langote  wrote:
> > Attached updated patches.  Thanks for the review.
>
> I think 0001 looks ready to commit. I'm not sure that the commit
> message needs to mention future patches here, since this code cleanup
> seems like a good idea regardless, but if you feel otherwise, fair
> enough.

OK, I will remove the mention of future patches.

> On 0002, some questions:
>
> - In ExecEndLockRows, is the call to EvalPlanQualEnd a concern? i.e.
> Does that function need any adjustment?

I think it does with the patch as it stands.  It needs to have an
early exit at the top if parentestate is NULL, which it would be if
EvalPlanQualInit() wasn't called from an ExecInit*() function.

Though, as I answer below your question as to whether there is
actually any need to interrupt all of the ExecInit*() routines,
nothing needs to change in ExecEndLockRows().

> - In ExecEndMemoize, should there be a null-test around
> MemoryContextDelete(node->tableContext) as we have in
> ExecEndRecursiveUnion, ExecEndSetOp, etc.?

Oops, you're right.  Added.

> I wonder how we feel about setting pointers to NULL after freeing the
> associated data structures. The existing code isn't consistent about
> doing that, and making it do so would be a fairly large change that
> would bloat this patch quite a bit. On the other hand, I think it's a
> good practice as a general matter, and we do do it in some ExecEnd
> functions.

I agree that it might be worthwhile to take the opportunity and make
the code more consistent in this regard.  So, I've included those
changes too in 0002.

> On 0003, I have some doubt about whether we really have all the right
> design decisions in detail here:
>
> - Why have this weird rule where sometimes we return NULL and other
> times the planstate? Is there any point to such a coding rule? Why not
> just always return the planstate?
>
> - Is there any point to all of these early exit cases? For example, in
> ExecInitBitmapAnd, why exit early if initialization fails? Why not
> just plunge ahead and if initialization failed the caller will notice
> that and when we ExecEndNode some of the child node pointers will be
> NULL but who cares? The obvious disadvantage of this approach is that
> we're doing a bunch of unnecessary initialization, but we're also
> speeding up the common case where we don't need to abort by avoiding a
> branch that will rarely be taken. I'm not quite sure what the right
> thing to do is here.
>
> - The cases where we call ExecGetRangeTableRelation or
> ExecOpenScanRelation are a bit subtler ... maybe initialization that
> we're going to do later is going to barf if the tuple descriptor of
> the relation isn't what we thought it was going to be. In that case it
> becomes important to exit early. But if that's not actually a problem,
> then we could apply the same principle here also -- don't pollute the
> code with early-exit cases, just let it do its thing and sort it out
> later. Do you know what the actual problems would be here if we didn't
> exit early in these cases?
>
> - Depending on the answers to the above points, one thing we could
> think of doing is put an early exit case into ExecInitNode itself: if
> (unlikely(!ExecPlanStillValid(whatever)) return NULL. Maybe Andres or
> someone is going to argue that that checks too often and is thus too
> expensive, but it would be a lot more maintainable than having similar
> checks strewn throughout the ExecInit* functions. Perhaps it deserves
> some thought/benchmarking. More generally, if there's anything we can
> do to centralize these checks in fewer places, I think that would be
> worth considering. The patch isn't terribly large as it stands, so I
> don't necessarily think that this is a critical issue, but I'm just
> wondering if we can do better. I'm not even sure that it would be too
> expensive to just initialize the whole plan always, and then just do
> one test at the end. That's not OK if the changed tuple descriptor (or
> something else) is going to crash or error out in a funny way or
> something before initialization is completed, but if it's just going
> to result in burning a few CPU cycles in a corner case, I don't know
> if we should really care.

I thought about this some and figured that adding the
is-CachedPlan-still-valid tests in the following places should suffice
after all:

1. In InitPlan() right after the top-level ExecInitNode() calls
2. In ExecInit*() functions of Scan nodes, right after
ExecOpenScanRelation() calls

CachedPlans can only become invalid because of concurrent changes to
the inheritance child tables referenced in the plan.  Only the
following schema modifications of child tables are possible to be
performed concurrently:

* Addition of a column (allowed only if traditional inheritance child)
* Addition of an index
* Addition of a non-index constraint
* Dropping of a child table (allowed only if traditional i

Re: On login trigger: take three

2023-09-25 Thread Daniel Gustafsson
> On 25 Sep 2023, at 11:13, Alexander Korotkov  wrote:

> I'd like to do a short summary of
> design issues on this thread.

Thanks for summarizing this long thread!

> the patch for the GUC option to disable
> all event triggers resides in a separate thread [4]. Apparently that
> patch should be committed first [5].

I have committed the prerequisite patch for temporarily disabling EVTs via a
GUC in 7750fefdb2.  We should absolutely avoid creating any more dependencies
on single-user mode (yes, I have changed my mind since the beginning of the
thread).
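
For the record, a quick sketch of how that escape hatch can be used,
assuming the GUC added there is the superuser-settable boolean
event_triggers:

SET event_triggers = off;   -- event triggers do not fire in this session
-- ... run the DDL that the misbehaving trigger would otherwise block ...
RESET event_triggers;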

> 3. Yet another question is connection-time overhead introduced by this
> patch. The overhead estimate varies from no measurable overhead [6] to
> 5% overhead [7]. In order to overcome that, [8] has introduced a
> database-level flag indicating whether there are connection triggers.
> Later this flag was removed [9] in a hope that the possible overhead
> is acceptable.

While I disliked the flag, I do think the overhead is problematic.  Last time I
profiled it I found it noticeable, and it seems expensive for such a niche
feature to impact such a hot path.  Maybe you can think of other ways to reduce
the cost here (if it indeed is an issue in the latest version of the patch,
which is not one I've benchmarked)?

> 5. It was also pointed out [11] that ^C in psql doesn't cancel
> long-running client connection triggers. That might be considered a
> psql problem though.

While it is a psql problem, it's exacerbated by a slow login EVT and it breaks
what I would guess is the mental model of many who press ^C in a stalled login.
At the very least we should probably document the risk?

--
Daniel Gustafsson





Re: GUC for temporarily disabling event triggers

2023-09-25 Thread Daniel Gustafsson
> On 25 Sep 2023, at 09:52, Daniel Gustafsson  wrote:
> 
>> On 25 Sep 2023, at 09:50, Michael Paquier  wrote:
>> 
>> On Mon, Sep 25, 2023 at 09:19:35AM +0200, Daniel Gustafsson wrote:
>>> Fair enough, although I used "fire" instead of "run" which is consistent 
>>> with
>>> the event trigger documentation.
>> 
>> Okay by me.
> 
> Great, I'll go ahead and apply this version then. Thanks for review!

And applied, closing the CF entry.

--
Daniel Gustafsson





Re: Various small doc improvements; plpgsql, schemas, permissions, oidvector

2023-09-25 Thread Daniel Gustafsson
> On 25 Sep 2023, at 14:00, Karl O. Pinc  wrote:

> Is there a preferred data format or should I send
> each patch in a separate attachment with description?

The easiest way would be to create a patchset off of master I think.  In a
branch, commit each change with an explanatory commit message.  Once done you
can do "git format-patch origin/master -v 1" which will generate a set of n
patches named v1-0001 through v1-000n.  You can then attach those to the
thread.  This will make it easier for a reviewer, and it's easy to apply them
in the right order in case one change depends on another earlier change.

--
Daniel Gustafsson





Re: Various small doc improvements; plpgsql, schemas, permissions, oidvector

2023-09-25 Thread Karl O. Pinc
On Mon, 25 Sep 2023 09:26:38 +0200
Daniel Gustafsson  wrote:

> > On 25 Sep 2023, at 00:57, Karl O. Pinc  wrote:  
> 
> > I have made various, mostly unrelated to each other,
> > small improvements to the documentation.  

> While I agree it's subjective, I don't think adding a new section or
> paragraph qualifies as short or small.  I would prefer if each
> (related) change is in a single commit with a commit message which
> describes the motivation for the change.  A reviewer can second-guess
> the rationale for the changes, but they shouldn't have to.

Will do.  Is there a preferred data format or should I send
each patch in a separate attachment with description?

> The resulting patchset can all be in the same thread though.

Thanks.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: Build the docs if there are changes in docs and don't run other tasks if the changes are only in docs

2023-09-25 Thread Nazir Bilal Yavuz
Hi,

I attached the second version of the patch.

On Mon, 11 Sept 2023 at 15:11, Daniel Gustafsson  wrote:
>
> > On 11 Sep 2023, at 13:03, Nazir Bilal Yavuz  wrote:
>
> >> Almost, but not entirely.  There are a set of scripts which generate 
> >> content
> >> for the docs based on files in src/, like 
> >> src/backend/catalog/sql_features.txt
> >> and src/include/parser/kwlist.h.  If those source files change, or their
> >> scripts, it would be helpful to build docs.
> >
> > Thanks! Are these the only files that are not in the doc subfolders
> > but effect docs?
>
> I believe so, the list of scripts and input files can be teased out of the
> doc/src/sgml/meson.build file.

Now the files mentioned in the doc/src/sgml/meson.build file will
trigger building the docs task. Also, if the changes are only in the
README files, CI will be skipped completely.

Any kind of feedback would be appreciated.

Regards,
Nazir Bilal Yavuz
Microsoft
From 843d0c132c0f95dfee50eaaf667a92fffe400eeb Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Mon, 11 Sep 2023 13:29:33 +0300
Subject: [PATCH v2 2/2] Skip CI if changes are only in docs or in the READMEs

When changes are only in docs or in the READMEs, skip all the tasks.

This skip is overriden in 'SanityCheck' and 'Build the Docs' tasks. So,
these two tasks might have not been skipped.
---
 .cirrus.tasks.yml | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index d7c45224af9..ec6bcd5af70 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -3,6 +3,11 @@
 # For instructions on how to enable the CI integration in a repository and
 # further details, see src/tools/ci/README
 
+# When changes are only in docs or in the READMEs, skip all the tasks.
+#
+# This skip is overriden in 'SanityCheck' and 'Build the Docs' tasks. So,
+# these two tasks might have not been skipped.
+skip: changesIncludeOnly('doc/**', '**/README')
 
 env:
   # The lower depth accelerates git clone. Use a bit of depth so that
@@ -54,12 +59,12 @@ on_failure_meson: &on_failure_meson
 # broken commits, have a minimal task that all others depend on.
 task:
   name: SanityCheck
-
-  # If a specific OS is requested, don't run the sanity check. This shortens
+  # If a specific OS is requested or if there are changes only in the docs
+  # or in the READMEs, don't run the sanity check. This shortens
   # push-wait-for-ci cycle time a bit when debugging operating system specific
   # failures. Uses skip instead of only_if, as cirrus otherwise warns about
   # only_if conditions not matching.
-  skip: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:.*'
+  skip: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:.*' || changesIncludeOnly('doc/**', '**/README')
 
   env:
 CPUS: 4
-- 
2.40.1

From 31fcd0c5b588e9c4ef968e7a92489cfb86740228 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Thu, 7 Sep 2023 17:58:06 +0300
Subject: [PATCH v2 1/2] Only built the docs if there are changes are in the
 docs

Building the docs triggered although there are no changes in the docs.
So, the new 'Building the Docs' task is created. This task only
run if a specific OS is not requested and if there are changes in docs
or in the CI files.
---
 .cirrus.tasks.yml | 77 ++-
 1 file changed, 62 insertions(+), 15 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index e137769850d..d7c45224af9 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -740,21 +740,6 @@ task:
   make -s -j${BUILD_JOBS} clean
   time make -s -j${BUILD_JOBS} world-bin
 
-  ###
-  # Verify docs can be built
-  ###
-  # XXX: Only do this if there have been changes in doc/ since last build
-  always:
-docs_build_script: |
-  time ./configure \
---cache gcc.cache \
-CC="ccache gcc" \
-CXX="ccache g++" \
-CLANG="ccache clang"
-  make -s -j${BUILD_JOBS} clean
-  time make -s -j${BUILD_JOBS} -C doc
-
-  ###
   # Verify headerscheck / cpluspluscheck succeed
   #
   # - Don't use ccache, the files are uncacheable, polluting ccache's
@@ -777,3 +762,65 @@ task:
 
   always:
 upload_caches: ccache
+
+
+task:
+  name: Build the Docs
+  depends_on: SanityCheck
+  # Only run if a specific OS is not requested and if there are changes in docs
+  # or in the CI files.
+  skip: >
+$CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:.*' ||
+!changesInclude('doc/**',
+'.cirrus.yml',
+'.cirrus.tasks.yml',
+'src/backend/catalog/sql_feature_packages.txt',
+'src/backend/catalog/sql_features.txt',
+'src/backend/utils/errcodes.txt',
+'src/backend/utils/activity/wait_event_names.txt',
+'src/backend/utils/activity/generate-wait_event_types.pl',
+'src/include/parser/kwlist.h')
+
+  env:
+CPUS: 4
+BUILD_JOBS: 4
+IMAGE_FAMILY: pg-ci-bullseye
+ 

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-25 Thread Amit Kapila
On Mon, Sep 25, 2023 at 1:23 PM Bharath Rupireddy
 wrote:
>
> On Mon, Sep 25, 2023 at 12:32 PM Dilip Kumar  wrote:
> >
> > > > Is there anything else that stops this patch from supporting migration
> > > > of logical replication slots from PG versions < 17?
> > >
> > > IMHO one of the main change we are doing in PG 17 is that on shutdown
> > > checkpoint we are ensuring that if the confirmed flush lsn is updated
> > > since the last checkpoint and that is not yet synched to the disk then
> > > we are doing so.  I think this is the most important change otherwise
> > > many slots for which we have already streamed all the WAL might give
> > > an error assuming that there are pending WAL from the slots which are
> > > not yet confirmed.
> > >
> >
> > You might need to refer to [1] for the change I am talking about
> >
> > [1] 
> > https://www.postgresql.org/message-id/CAA4eK1%2BLtWDKXvxS7gnJ562VX%2Bs3C6%2B0uQWamqu%3DUuD8hMfORg%40mail.gmail.com
>
> I see. IIUC, without that commit e0b2eed [1], it may happen that the
> slot's on-disk confirmed_flush LSN value can be higher than the WAL
> LSN that's flushed to disk, no?
>

No, without that commit, there is a very high possibility that even if
we have sent the WAL to the subscriber and got the acknowledgment of
the same, we would miss updating it before shutdown. This would lead
to upgrade failures because upgrades have no way to later identify
whether the remaining WAL records are sent to the subscriber.

> If so, can't it be detected if the WAL
> at confirmed_flush LSN is valid or not when reading WAL with
> xlogreader machinery?
>
> What if the commit e0b2eed [1] is treated to be fixing a bug with the
> reasoning [2] and backpatch? When done so, it's easy to support
> upgradation/migration of logical replication slots from PG versions <
> 17, no?
>

Yeah, we could try to make a case to backpatch it but when I raised
that point there was not much consensus on backpatching it. We are
aware and understand that if we could backpatch it then prior-version
slots could be upgraded, but the case for backpatching needs broader
consensus. For now, the idea is to get the core of the functionality
to be committed and then we can see if we get the consensus on
backpatching the commit you mentioned and probably changing the
version checks in this work.

-- 
With Regards,
Amit Kapila.




Re: Avoid a possible out-of-bounds access (src/backend/optimizer/util/relnode.c)

2023-09-25 Thread Ashutosh Bapat
On Sat, Sep 23, 2023 at 7:29 PM Ranier Vilela  wrote:
>
> Hi,
>
> Per Coverity.
> CID 1518088 (#2 of 2): Improper use of negative value (NEGATIVE_RETURNS)
>
> The function bms_singleton_member can returns a negative number.
>
> /*
> * Get a child rel for rel2 with the relids.  See above comments.
> */
> if (rel2_is_simple)
> {
> int varno = bms_singleton_member(child_relids2);
>
> child_rel2 = find_base_rel(root, varno);
> }
>
> It turns out that in the get_matching_part_pairs function (joinrels.c), the 
> return of bms_singleton_member is passed to the find_base_rel function, which 
> cannot receive a negative value.
>
> find_base_rel is protected by an Assertion, which effectively indicates that 
> the error does not occur in tests and in DEBUG mode.
>
> But this does not change the fact that bms_singleton_member can return a 
> negative value, which may occur on some production servers.
>
> Fix by changing the Assertion into a real test, to protect the 
> simple_rel_array array.

Do you have a scenario where bms_singleton_member() actually returns a
-ve number OR it's just a possibility. bms_make_singleton() has an
assertion at the end: Assert(result >= 0);
bms_make_singleton(), bms_add_member() explicitly forbids negative
values. It looks like we have covered all the places which can add a
negative value to a bitmapset. May be we are missing a place or two.
It will be good to investigate it.

What you are suggesting may be ok, as is as well.

-- 
Best Wishes,
Ashutosh Bapat




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-25 Thread Hayato Kuroda (Fujitsu)
Dear Bharath,

Thank you for the comments! Before addressing all of them,
I wanted to reply to some of them first.

> 4.
> +/*
> + * There is a possibility that following records may be generated
> + * during the upgrade.
> + */
> +is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID,
> XLOG_CHECKPOINT_SHUTDOWN) ||
> +is_xlog_record_type(rmid, info, RM_XLOG_ID,
> XLOG_CHECKPOINT_ONLINE) ||
> +is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH) ||
> +is_xlog_record_type(rmid, info, RM_XLOG_ID,
> XLOG_FPI_FOR_HINT) ||
> +is_xlog_record_type(rmid, info, RM_XLOG_ID,
> XLOG_PARAMETER_CHANGE) ||
> +is_xlog_record_type(rmid, info, RM_STANDBY_ID,
> XLOG_RUNNING_XACTS) ||
> +is_xlog_record_type(rmid, info, RM_HEAP2_ID,
> XLOG_HEAP2_PRUNE);
> 
> What if we missed to capture the WAL records that may be generated
> during upgrade?

If such records are generated before calling
binary_upgrade_validate_wal_logical_end(),
the upgrade would fail. Otherwise it would succeed. Anyway, we don't care about
such records because they aren't required to be replicated. The main thing we
want to detect is whether any record generated before the server shutdown was
missed.

> 
> What happens if a custom WAL resource manager generates table/index AM
> WAL records during upgrade?

If such records are found, we definitely cannot tell whether they are
acceptable.
We do not have a way to know the properties of custom WAL records. We didn't
worry about it because there are other problems with the approach if such a
facility is invoked.
Please see the similar discussion [1].

> 
> What happens if new WAL records are added that may be generated during
> the upgrade? Isn't keeping this code extensible and in sync with
> future changes a problem? 

Actually, others also pointed out a similar issue. Originally we just checked
confirmed_flush_lsn and the "latest checkpoint lsn" reported by pg_controldata,
but found an issue where the upgrade cannot pass if users run pg_upgrade --check
just before the actual upgrade. Then we discussed some ideas, but they had
disadvantages, so we settled on the current one. Here is a summary which
describes the current situation; it should be quite helpful [2]
(maybe you already know it).

> Or we'd better say that any custom WAL
> records are found after the slot's confirmed flush LSN, then the slot
> isn't upgraded?

Once we conclude how to ensure this, we can add the sentence accordingly.


> 
> 5. In continuation to the above comment:
> 
> Why can't this logic be something like - if there's any WAL record
> seen after a slot's confirmed flush LSN is of type generated by WAL
> resource manager having the rm_decode function defined, then the slot
> can't be upgraded.

Thank you for suggesting a new approach! We had not considered it before,
but at least the XLOG and HEAP2 rmgrs have a decode function, so
XLOG_CHECKPOINT_SHUTDOWN, XLOG_CHECKPOINT_ONLINE, and XLOG_HEAP2_PRUNE could not
be ignored with that approach; it seems not appropriate.
If you have another approach, I'd be very happy if you posted it.

[1]: https://www.postgresql.org/message-id/ZNZ4AxUMIrnMgRbo%40momjian.us
[2]: 
https://www.postgresql.org/message-id/CAA4eK1JVKZGRHLOEotWi%2Be%2B09jucNedqpkkc-Do4dh5FTAU%2B5w%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [PATCH] Extend ALTER OPERATOR to support adding commutator, negator, hashes, and merges

2023-09-25 Thread jian he
/*
 * AlterOperator
 * routine implementing ALTER OPERATOR  SET (option = ...).
 *
 * Currently, only RESTRICT and JOIN estimator functions can be changed.
 */
ObjectAddress
AlterOperator(AlterOperatorStmt *stmt)

The above comment needs to change. Other than that, it passed the
tests and works as expected.




Re: Make --help output fit within 80 columns per line

2023-09-25 Thread torikoshia

On 2023-09-25 15:27, torikoshia wrote:
On Tue, Sep 19, 2023 at 3:23 AM Greg Sabino Mullane 
 wrote:

Thanks for your investigation!

On Fri, Sep 15, 2023 at 11:11 AM torikoshia 
 wrote:
I do not intend to adhere to this rule(my terminals are usually 
bigger
than 80 chars per line), but wouldn't it be a not bad direction to 
use

80 characters for all commands?


Well, that's the question du jour, isn't it? The 80 character limit is 
based on punch cards, and really has no place in modern systems. While 
gnu systems are stuck in the past, many other ones have moved on to 
more sensible defaults:


$ wget --help | wc -L
110

$ gcloud --help | wc -L
122

$ yum --help | wc -L
122

git is an interesting one, as they force things through a pager for 
their help, but if you look at their raw help text files, they have 
plenty of times they go past 80 when needed:


$ wc -L git/Documentation/git-*.txt | sort -g | tail -20
109 git-filter-branch.txt
109 git-rebase.txt
116 git-diff-index.txt
116 git-http-fetch.txt
117 git-restore.txt
122 git-checkout.txt
122 git-ls-tree.txt
129 git-init-db.txt
131 git-push.txt
132 git-update-ref.txt
142 git-maintenance.txt
144 git-interpret-trailers.txt
146 git-cat-file.txt
148 git-repack.txt
161 git-config.txt
162 git-notes.txt
205 git-stash.txt
251 git-submodule.txt

So in summary, I think 80 is a decent soft limit, but let's not stress 
out about some lines going over that, and make a hard limit of perhaps 
120.


+1. It may be a good compromise.
For enforcing the hard limit, is it better to add a regression test
like patch 0001?


Ugh, regression tests failed and it appears to be due to reasons related 
to meson.

I'm going to investigate it.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: Index range search optimization

2023-09-25 Thread Pavel Borisov
Sorry, I attached the wrong version previously. The correct v5 is attached.

On Mon, 25 Sept 2023 at 13:58, Pavel Borisov  wrote:
>
> Hi, Alexander!
>
> I found and fixed a couple of naming issues that came to v4 from
> earlier patches.
> Also, I added initialization of requiredMatchedByPrecheck in case of first 
> page.
>
> Please see patch v5.
>
> One more doubt about naming. Calling function
> _bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
> ScanDirection dir, bool *continuescan, bool requiredMatchedByPrecheck)
> as
> (void) _bt_checkkeys(scan, itup, indnatts, dir,
> &requiredMatchedByPrecheck, false);
> looks little bit misleading because of coincidence of names of 5 and 6
> arguments.


v5-0001-PATCH-Skip-checking-of-scan-keys-required-for-dir.patch
Description: Binary data


Fix a wrong comment in setrefs.c

2023-09-25 Thread Richard Guo
I noticed a wrong comment in search_indexed_tlist_for_sortgroupref().

foreach(lc, itlist->tlist)
{
TargetEntry *tle = (TargetEntry *) lfirst(lc);

/* The equal() check should be redundant, but let's be paranoid */
if (tle->ressortgroupref == sortgroupref &&
equal(node, tle->expr))
{

It turns out that the equal() check is necessary, because the given
sort/group expression might be a FuncExpr which converts integer to
numeric.  In this case we need to modify its args, not the node itself,
to reference the matching subplan output expression.  As an example,
consider

explain (costs off, verbose)
SELECT 1.1 AS two UNION (SELECT 2 UNION ALL SELECT 2);
 QUERY PLAN
-
 HashAggregate
   Output: (1.1)
   Group Key: (1.1)
   ->  Append
 ->  Result
   Output: 1.1
 ->  Result
   Output: (2)
   ->  Append
 ->  Result
   Output: 2
 ->  Result
   Output: 2
(13 rows)

If we remove the equal() check, this query would cause crash in
execution.

I'm considering changing the comment as below.

-  /* The equal() check should be redundant, but let's be paranoid */
+  /*
+   * The equal() check is necessary, because expressions with the same
+   * sortgroupref might be different, e.g., the given sort/group
+   * expression can be type of FuncExpr which converts integer to
+   * numeric, and we need to modify its args not itself to reference the
+   * matching subplan output expression in this case.
+   */

Any thoughts?

Thanks
Richard


v1-0001-Fix-a-wrong-comment-in-setrefs.c.patch
Description: Binary data


Re: Is this a problem in GenericXLogFinish()?

2023-09-25 Thread Heikki Linnakangas

On 22/09/2023 23:52, Jeff Davis wrote:


src/backend/transam/README says:

   ...
   4. Mark the shared buffer(s) as dirty with MarkBufferDirty().  (This
   must happen before the WAL record is inserted; see notes in
   SyncOneBuffer().)
   ...

But GenericXLogFinish() does this:

   ...
   /* Insert xlog record */
   lsn = XLogInsert(RM_GENERIC_ID, 0);

   /* Set LSN and mark buffers dirty */
   for (i = 0; i < MAX_GENERIC_XLOG_PAGES; i++)
   {
   PageData   *pageData = &state->pages[i];

   if (BufferIsInvalid(pageData->buffer))
   continue;
   PageSetLSN(BufferGetPage(pageData->buffer), lsn);
   MarkBufferDirty(pageData->buffer);
   }
   END_CRIT_SECTION();

Am I missing something or is that a problem?


Yes, that's a problem.

I wish we had an assertion for that. XLogInsert() could assert that the 
page is already marked dirty, for example.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Index range search optimization

2023-09-25 Thread Pavel Borisov
Hi, Alexander!

I found and fixed a couple of naming issues that came to v4 from
earlier patches.
Also, I added initialization of requiredMatchedByPrecheck in case of first page.

Please see patch v5.

One more doubt about naming. Calling function
_bt_checkkeys(IndexScanDesc scan, IndexTuple tuple, int tupnatts,
ScanDirection dir, bool *continuescan, bool requiredMatchedByPrecheck)
as
(void) _bt_checkkeys(scan, itup, indnatts, dir,
&requiredMatchedByPrecheck, false);
looks little bit misleading because of coincidence of names of 5 and 6
arguments.


0001-Skip-checking-of-scan-keys-required-for-direction-v4.patch
Description: Binary data


Re: RFC: Logging plan of the running query

2023-09-25 Thread Andrey Lepikhov

On 25/9/2023 14:21, torikoshia wrote:

On 2023-09-20 14:39, Lepikhov Andrei wrote:
Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on 
every CFI using 
v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then 
ran the following query, but it did not cause any problems.


```
=# CREATE TABLE test();
=# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
   EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
   PERFORM pg_sleep(5);
END; $$ LANGUAGE plpgsql VOLATILE;
=# SELECT ddl();
```

Is this the case you're worrying about?


I didn't find a problem either. I just feel uncomfortable if, at the 
moment of interruption, we hold the descriptor of a query other than the 
one that has been executing and holding resources.


--
regards,
Andrey Lepikhov
Postgres Professional





Re: POC: GUC option for skipping shared buffers in core dumps

2023-09-25 Thread Andrey Lepikhov

Hi,

The current approach could be improved, because we want to use it on 
Windows/macOS and other systems. So I've tried to develop another 
strategy: detaching shmem and DSM blocks before executing the abort() 
routine.

As I can see, it helps and reduces the size of the core file.

--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/src/backend/bootstrap/bootstrap.c 
b/src/backend/bootstrap/bootstrap.c
index 5810f8825e..4d7bf2c0e4 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -325,7 +325,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
{
SetProcessingMode(NormalProcessing);
CheckerModeMain();
-   abort();
+   pg_abort();
}
 
/*
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index ed11e8be7f..34ac874ad0 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -197,7 +197,7 @@ main(int argc, char *argv[])
else
PostmasterMain(argc, argv);
/* the functions above should not return */
-   abort();
+   pg_abort();
 }
 
 
diff --git a/src/backend/postmaster/postmaster.c 
b/src/backend/postmaster/postmaster.c
index 54e9bfb8c4..fc32a6bb1b 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1469,7 +1469,7 @@ PostmasterMain(int argc, char *argv[])
 */
ExitPostmaster(status != STATUS_OK);
 
-   abort();/* not reached */
+   pg_abort(); /* not reached */
 }
 
 
diff --git a/src/backend/replication/walsender.c 
b/src/backend/replication/walsender.c
index e250b0567e..3123b388ab 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -385,7 +385,7 @@ WalSndShutdown(void)
whereToSendOutput = DestNone;
 
proc_exit(0);
-   abort();/* keep the compiler 
quiet */
+   pg_abort(); /* keep the compiler 
quiet */
 }
 
 /*
diff --git a/src/backend/utils/error/assert.c b/src/backend/utils/error/assert.c
index 719dd7b309..f422d42440 100644
--- a/src/backend/utils/error/assert.c
+++ b/src/backend/utils/error/assert.c
@@ -19,6 +19,32 @@
 #include 
 #endif
 
+#include 
+#include 
+
+int core_dump_no_shared_buffers = COREDUMP_INCLUDE_ALL;
+
+/*
+ * Remember, at the same time someone can work with shared memory, write them 
to
+ * disk and so on.
+ */
+void
+pg_abort(void)
+{
+   if (core_dump_no_shared_buffers != COREDUMP_INCLUDE_ALL)
+   {
+   if (core_dump_no_shared_buffers == COREDUMP_EXCLUDE_ALL ||
+   core_dump_no_shared_buffers == COREDUMP_EXCLUDE_DSM)
+   dsm_detach_all();
+
+   if (core_dump_no_shared_buffers == COREDUMP_EXCLUDE_ALL ||
+   core_dump_no_shared_buffers == COREDUMP_EXCLUDE_SHMEM)
+   PGSharedMemoryDetach();
+   }
+
+   abort();
+}
+
 /*
  * ExceptionalCondition - Handles the failure of an Assert()
  *
@@ -63,5 +89,5 @@ ExceptionalCondition(const char *conditionName,
sleep(100);
 #endif
 
-   abort();
+   pg_abort();
 }
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 8e1f3e8521..f6c863ca68 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -601,7 +601,7 @@ errfinish(const char *filename, int lineno, const char 
*funcname)
 * children...
 */
fflush(NULL);
-   abort();
+   pg_abort();
}
 
/*
diff --git a/src/backend/utils/misc/guc_tables.c 
b/src/backend/utils/misc/guc_tables.c
index bdb26e2b77..95e205e8d1 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -427,6 +427,14 @@ static const struct config_enum_entry 
debug_logical_replication_streaming_option
{NULL, 0, false}
 };
 
+static const struct config_enum_entry core_dump_no_shared_buffers_options[] = {
+   {"none", COREDUMP_INCLUDE_ALL, false},
+   {"shmem", COREDUMP_EXCLUDE_SHMEM, false},
+   {"dsm", COREDUMP_EXCLUDE_DSM, false},
+   {"all", COREDUMP_EXCLUDE_ALL, false},
+   {NULL, 0, false}
+};
+
 StaticAssertDecl(lengthof(ssl_protocol_versions_info) == (PG_TLS1_3_VERSION + 
2),
 "array length mismatch");
 
@@ -4971,6 +4979,17 @@ struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
 
+   {
+   {"core_dump_no_shared_buffers", PGC_POSTMASTER, 
DEVELOPER_OPTIONS,
+   NULL,
+   NULL,
+   GUC_NOT_IN_SAMPLE
+   },
+   &core_dump_no_shared_buffers,
+   COREDUMP_INCLUDE_ALL, core_dump_no_shared_buffers_options,
+   NULL, NULL

Re: On login trigger: take three

2023-09-25 Thread Alexander Korotkov
Hi!

On Wed, Jun 14, 2023 at 10:49 PM Mikhail Gribkov  wrote:
> The attached v40 patch is a fresh rebase on master branch to actualize the 
> state before the upcoming commitfest.
> Nothing has changed beyond rebasing.
>
> And just for convenience, here is a link to the exact message of the thread
> describing the current approach:
> https://www.postgresql.org/message-id/CAMEv5_vg4aJOoUC74XJm%2B5B7%2BTF1nT-Yhtg%2BawtBOESXG5Grfg%40mail.gmail.com

Thank you for the update.  I think the patch is interesting and
demanding.  The code, docs and tests seem to be quite polished
already.  Simultaneously, the thread is long and spread over a long
time.  So, it's easy to lose track.  I'd like to do a short summary of
design issues on this thread.

1. Initially the patch introduced "on login" triggers. It was unclear
if they need to rerun on RESET ALL or DISCARD ALL [1]. The new name is
"client connection" trigger, which seems fine [2].

2. Another question is how to deal with triggers that hang or fail
with an error [1]. One possible way to work around that is single-user
mode, which is already advised for working around errors in other event
triggers. However, in the long run single-user mode might be deleted.
Also, single-user mode is considered a worst-case recovery
tool, while it's very easy to block database connections with
client connection triggers. As an addition/alternative to single-user
mode, GUC options to disable all event triggers and/or client
connection triggers have been proposed. Finally, the patch for the GUC
option to disable all event triggers resides in a separate thread [4].
Apparently that patch should be committed first [5].

3. Yet another question is connection-time overhead introduced by this
patch. The overhead estimate varies from no measurable overhead [6] to
5% overhead [7]. In order to overcome that, [8] has introduced a
database-level flag indicating whether there are connection triggers.
Later this flag was removed [9] in a hope that the possible overhead
is acceptable.

4. [10] points out that there is no clean way to store information about
unsuccessful connections (declined by either authentication or
trigger). However, this is considered out-of-scope for the current
patch, and could be implemented later if needed.

5. It was also pointed out [11] that ^C in psql doesn't cancel
long-running client connection triggers. That might be considered a
psql problem though.

6. It has been also pointed out that [12] all triggers, which write
data to the database, must check pg_is_in_recovery() to work correctly
on standby. That seems to be currently reflected in the documentation.
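
A hypothetical sketch of point 6; the event name has varied across patch
versions and the login_log table is invented here:

CREATE TABLE login_log (who text, at timestamptz);

CREATE FUNCTION log_connection() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- skip the write when running on a standby
    IF NOT pg_is_in_recovery() THEN
        INSERT INTO login_log VALUES (current_user, now());
    END IF;
END;
$$;

-- CREATE EVENT TRIGGER log_connections ON client_connection
--     EXECUTE FUNCTION log_connection();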

So, for me the open issues seem to be 2, 3 and 5.  My plan to revive
this patch is to commit the GUC patch [4], recheck the overhead and
probably leave the "^C in psql" problem as a separate standalone issue.
Any thoughts?

Links.

1. https://www.postgresql.org/message-id/CAFj8pRBdqdqvkU3mVKzoOnO+jPz-6manRV47CDEa+1jD6x6LFg%40mail.gmail.com
2. https://www.postgresql.org/message-id/CAFj8pRCxdQgHy8Mynk3hz6pFsqQ9BN6Vfgy0MJLtQBAUhWDf3w%40mail.gmail.com
3. https://www.postgresql.org/message-id/E0D5DC61-C490-45BD-A984-E8D56493EC4F%40yesql.se
4. https://www.postgresql.org/message-id/9140106e-f9bf-4d85-8fc8-f2d3c094a...@yesql.se
5. https://www.postgresql.org/message-id/20230306221010.gszjoakt5jp7oqpd%40awork3.anarazel.de
6. https://www.postgresql.org/message-id/90760f2d-2f9c-12ab-d2c5-e8e6fb7d08de%40postgrespro.ru
7. https://www.postgresql.org/message-id/CAFj8pRChwu01VLx76nKBVyScHCsd1YnBGiKfDJ6h17g4CSnUBg%40mail.gmail.com
8. https://www.postgresql.org/message-id/4471d472-5dfc-f2b0-ad05-0ff8d0a3bb0c%40postgrespro.ru
9. https://www.postgresql.org/message-id/CAMEv5_vg4aJOoUC74XJm%2B5B7%2BTF1nT-Yhtg%2BawtBOESXG5Grfg%40mail.gmail.com
10. https://www.postgresql.org/message-id/9c897136-4755-dcfc-2d24-b12bcfe4467f%40sigaev.ru
11. https://www.postgresql.org/message-id/CA%2BTgmoZv9f1s797tihx-zXQN4AE4ZFBV5C0K%3DzngbgNu3xNNkg%40mail.gmail.com
12. https://www.postgresql.org/message-id/20220312024652.lvgehszwke4hhove%40alap3.anarazel.de

--
Regards,
Alexander Korotkov




Re: Regression in COPY FROM caused by 9f8377f7a2

2023-09-25 Thread Laurenz Albe
On Mon, 2023-09-25 at 09:54 +0200, Laurenz Albe wrote:
> In v16 and later, the following fails:
> 
> CREATE TABLE boom (t character varying(5) DEFAULT 'a long string');
> 
> COPY boom FROM STDIN;
> ERROR:  value too long for type character varying(5)
> 
> In PostgreSQL v15 and earlier, the COPY statement succeeds.
> 
> The error is thrown in BeginCopyFrom in line 1578 (HEAD)
> 
>   defexpr = expression_planner(defexpr);
> 
> Bisecting shows that the regression was introduced by commit 9f8377f7a2,
> which introduced DEFAULT values for COPY FROM.

I suggest the attached fix, which evaluates default values only if
the DEFAULT option was specified or if the column does not appear in
the column list of COPY.
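
For illustration, the behavior with the patch applied should be roughly
the following (untested sketch; '\D' is just an example DEFAULT marker
string):

CREATE TABLE boom (t character varying(5) DEFAULT 'a long string');

-- No column list and no DEFAULT option: the default can never be needed,
-- so it is not evaluated any more and the command starts as in v15.
COPY boom FROM STDIN;

-- With the DEFAULT option the default may be needed for matching input
-- fields, so it is still evaluated and the error is still raised in
-- BeginCopyFrom.
COPY boom FROM STDIN WITH (DEFAULT '\D');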

Yours,
Laurenz Albe
From 4af982c56df57a1a0f04340d394c297559fbabb5 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Mon, 25 Sep 2023 10:56:15 +0200
Subject: [PATCH] Evaluate defaults in COPY FROM only if necessary

Since commit 9f8377f7a2, we evaluate the column default values in
COPY FROM for all columns except generated ones, because they could
be needed if the input value matches the DEFAULT option.

This can cause a surprising regression:

  CREATE TABLE boom (t character varying(5) DEFAULT 'a long string');
  COPY boom FROM STDIN;
  ERROR:  value too long for type character varying(5)

This worked before 9f8377f7a2, since default values were only
evaluated for columns that were not specified in the column list.

To fix, fetch the default values only if the DEFAULT option was
specified or for columns not specified in the column list.
---
 src/backend/commands/copyfrom.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 70871ed819..320b764aa9 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1567,7 +1567,14 @@ BeginCopyFrom(ParseState *pstate,
 		/* Get default info if available */
 		defexprs[attnum - 1] = NULL;
 
-		if (!att->attgenerated)
+		/*
+		 * We need the default values only for columns that do not appear in
+		 * the column list or if the DEFAULT option was given.  We never need
+		 * them for generated columns.
+		 */
+		if ((!list_member_int(cstate->attnumlist, attnum) ||
+			 cstate->opts.default_print != NULL) &&
+			!att->attgenerated)
 		{
 			Expr	   *defexpr = (Expr *) build_column_default(cstate->rel,
 attnum);
-- 
2.41.0



Re: Failures on gombessa -- EIO?

2023-09-25 Thread Nikola Ivanov
Hi Thomas,

I will check it today.

Regards
Nikola

On Sat, 23 Sept 2023 at 04:54, Thomas Munro  wrote:

> I am not sure why REL_16_STABLE fails consistently as of ~4 days ago.
> It seems like bad storage or something?  Just now it happened also on
> HEAD.  I wonder why it would be sensitive to the branch.
>


Re: Synchronizing slots from primary to standby

2023-09-25 Thread Drouvot, Bertrand

Hi,

On 9/23/23 3:38 AM, Amit Kapila wrote:
> On Fri, Sep 22, 2023 at 6:01 PM Drouvot, Bertrand
>  wrote:
> >
> > Thanks for all the work that has been done on this feature, and sorry
> > to have been quiet on it for so long.
> >
> > On 9/18/23 12:22 PM, shveta malik wrote:
> > > On Wed, Sep 13, 2023 at 4:48 PM Hayato Kuroda (Fujitsu)
> > >  wrote:
> > > > Right, but I wanted to know why it is needed. One motivation seemed to know the
> > > > WAL location of physical standby, but I thought that struct WalSnd.apply could
> > > > be also used. Is it bad to assume that the physical walsender always exists?
> > >
> > > We do not plan to target this case where physical slot is not created
> > > between primary and physical-standby in the first draft.  In such a
> > > case, slot-synchronization will be skipped for the time being. We can
> > > extend this functionality (if needed) later.
> >
> > I do think it's needed to extend this functionality. Having physical slot
> > created sounds like a (too?) strong requirement as:
> >
> > - It has not been a requirement for Logical decoding on standby so that could
> >   sounds weird to require it for sync slot (while it's not allowed to logical
> >   decode from sync slots)
>
> There is a difference here that we also need to prevent removal of
> rows required by sync_slots. That could be achieved by physical slot
> (and hot_standby_feedback). So, having a requirement to have physical
> slot doesn't sound too unreasonable to me. Otherwise, we need to
> invent some new mechanism of having some sort of placeholder slot to
> avoid removal of required rows.


Thinking about it, I wonder if removal of required rows is even possible
given that:

- we don't allow to logical decode from a sync slot
- sync slot catalog_xmin <= its primary counter part catalog_xmin
- its primary counter part prevents rows removal thanks to its own catalog_xmin
- a sync slot is removed as soon as its primary counter part is removed

In that case I'm not sure how row removal on the primary could lead to
removing rows required by a sync slot. Am I missing something? Do you
have a scenario in mind?
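
As a concrete way to eyeball those invariants on a running pair, one
could compare something like the following on the standby and on the
primary (sketch only):

SELECT slot_name, slot_type, catalog_xmin, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';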

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-25 Thread Dilip Kumar
On Mon, Sep 25, 2023 at 1:23 PM Bharath Rupireddy
 wrote:
>
> On Mon, Sep 25, 2023 at 12:32 PM Dilip Kumar  wrote:
> >
> > > > Is there anything else that stops this patch from supporting migration
> > > > of logical replication slots from PG versions < 17?
> > >
> > > IMHO one of the main change we are doing in PG 17 is that on shutdown
> > > checkpoint we are ensuring that if the confirmed flush lsn is updated
> > > since the last checkpoint and that is not yet synched to the disk then
> > > we are doing so.  I think this is the most important change otherwise
> > > many slots for which we have already streamed all the WAL might give
> > > an error assuming that there are pending WAL from the slots which are
> > > not yet confirmed.
> > >
> >
> > You might need to refer to [1] for the change I am talking about
> >
> > [1] 
> > https://www.postgresql.org/message-id/CAA4eK1%2BLtWDKXvxS7gnJ562VX%2Bs3C6%2B0uQWamqu%3DUuD8hMfORg%40mail.gmail.com
>
> I see. IIUC, without that commit e0b2eed [1], it may happen that the
> slot's on-disk confirmed_flush LSN value can be higher than the WAL
> LSN that's flushed to disk, no? If so, can't it be detected if the WAL
> at confirmed_flush LSN is valid or not when reading WAL with
> xlogreader machinery?

Actually, without this commit the slot's "confirmed_flush LSN" value in
memory can be higher than the one on disk: if you look at
LogicalConfirmReceivedLocation(), when only the confirmed flush LSN
changes, the slot is not marked dirty, which means it will not be
persisted to disk on shutdown.  Logically this does not cause any
issue, so we cannot treat it as a bug; it may make us process some
extra records after the restart, but that is not really a problem.

> What if the commit e0b2eed [1] is treated to be fixing a bug with the
> reasoning [2] and backpatch? When done so, it's easy to support
> upgradation/migration of logical replication slots from PG versions <
> 17, no?

Maybe this could be backpatched in order to support this upgrade from
older versions, but not as a bug fix.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: pipe_read_line for reading arbitrary strings

2023-09-25 Thread Daniel Gustafsson
> On 4 Jul 2023, at 14:50, Daniel Gustafsson  wrote:
> 
>> On 4 Jul 2023, at 13:59, Heikki Linnakangas  wrote:
>> On 08/03/2023 00:05, Daniel Gustafsson wrote:
> 
>>> If we are going to continue using this for reading $stuff from pipes, maybe 
>>> we
>>> should think about presenting a nicer API which removes that risk?  
>>> Returning
>>> an allocated buffer which contains all the output along the lines of the 
>>> recent
>>> pg_get_line work seems a lot nicer and safer IMO.
>> 
>> +1
> 
> Thanks for review!
> 
>>> /*
>>> * Execute a command in a pipe and read the first line from it. The returned
>>> * string is allocated, the caller is responsible for freeing.
>>> */
>>> char *
>>> pipe_read_line(char *cmd)
>> 
>> I think it's worth being explicit here that it's palloc'd, or malloc'd in 
>> frontend programs, rather than just "allocated". Like in pg_get_line.
> 
> Good point, I'll make that happen before committing this.

Fixed, along with commit message wordsmithing in the attached.  Unless objected
to I'll go ahead with this version.

--
Daniel Gustafsson



v3-0001-Refactor-pipe_read_line-to-return-the-full-line.patch
Description: Binary data


Regression in COPY FROM caused by 9f8377f7a2

2023-09-25 Thread Laurenz Albe
In v16 and later, the following fails:

CREATE TABLE boom (t character varying(5) DEFAULT 'a long string');

COPY boom FROM STDIN;
ERROR:  value too long for type character varying(5)

In PostgreSQL v15 and earlier, the COPY statement succeeds.

The error is thrown in BeginCopyFrom in line 1578 (HEAD)

  defexpr = expression_planner(defexpr);

Bisecting shows that the regression was introduced by commit 9f8377f7a2,
which introduced DEFAULT values for COPY FROM.

The table definition is clearly silly, so I am not sure if that
regression is worth fixing.  On the other hand, it is not cool if
something that worked without an error in v15 starts to fail later on.
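
In v15, the too-long default only causes an error when it is actually
applied, for example (sketch):

INSERT INTO boom VALUES ('short');   -- fine
INSERT INTO boom DEFAULT VALUES;     -- ERROR:  value too long for type character varying(5)

so the table is usable as long as the default is never used.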

Yours,
Laurenz Albe




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-25 Thread Bharath Rupireddy
On Mon, Sep 25, 2023 at 12:32 PM Dilip Kumar  wrote:
>
> > > Is there anything else that stops this patch from supporting migration
> > > of logical replication slots from PG versions < 17?
> >
> > IMHO one of the main change we are doing in PG 17 is that on shutdown
> > checkpoint we are ensuring that if the confirmed flush lsn is updated
> > since the last checkpoint and that is not yet synched to the disk then
> > we are doing so.  I think this is the most important change otherwise
> > many slots for which we have already streamed all the WAL might give
> > an error assuming that there are pending WAL from the slots which are
> > not yet confirmed.
> >
>
> You might need to refer to [1] for the change I am talking about
>
> [1] 
> https://www.postgresql.org/message-id/CAA4eK1%2BLtWDKXvxS7gnJ562VX%2Bs3C6%2B0uQWamqu%3DUuD8hMfORg%40mail.gmail.com

I see. IIUC, without that commit e0b2eed [1], it may happen that the
slot's on-disk confirmed_flush LSN value can be higher than the WAL
LSN that's flushed to disk, no? If so, can't it be detected if the WAL
at confirmed_flush LSN is valid or not when reading WAL with
xlogreader machinery?

What if the commit e0b2eed [1] is treated to be fixing a bug with the
reasoning [2] and backpatch? When done so, it's easy to support
upgradation/migration of logical replication slots from PG versions <
17, no?

[1]
commit e0b2eed047df9045664da6f724cb42c10f8b12f0
Author: Amit Kapila 
Date:   Thu Sep 14 08:56:13 2023 +0530

Flush logical slots to disk during a shutdown checkpoint if required.

[2]
It can also help avoid processing the same transactions again in some
boundary cases after the clean shutdown and restart.  Say, we process
some transactions for which we didn't send anything downstream (the
changes got filtered) but the confirm_flush LSN is updated due to
keepalives.  As we don't flush the latest value of confirm_flush LSN, it
may lead to processing the same changes again without this patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: GUC for temporarily disabling event triggers

2023-09-25 Thread Daniel Gustafsson
> On 25 Sep 2023, at 09:50, Michael Paquier  wrote:
> 
> On Mon, Sep 25, 2023 at 09:19:35AM +0200, Daniel Gustafsson wrote:
>> Fair enough, although I used "fire" instead of "run" which is consistent with
>> the event trigger documentation.
> 
> Okay by me.

Great, I'll go ahead and apply this version then. Thanks for review!

--
Daniel Gustafsson





Re: GUC for temporarily disabling event triggers

2023-09-25 Thread Michael Paquier
On Mon, Sep 25, 2023 at 09:19:35AM +0200, Daniel Gustafsson wrote:
> Fair enough, although I used "fire" instead of "run" which is consistent with
> the event trigger documentation.

Okay by me.
--
Michael


signature.asc
Description: PGP signature


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-25 Thread Bharath Rupireddy
On Sat, Sep 23, 2023 at 10:18 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Again, thank you for reviewing! Here is a new version patch.

Here are some more comments/thoughts on the v44 patch:

1.
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+[

Add a test case to hit the fprintf(script, "The slot \"%s\" is invalid\n", ...)
code path as well?

2.
+'run of pg_upgrade where the new cluster has insufficient
max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+"pg_upgrade_output.d/ not removed after pg_upgrade failure");

+'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+"pg_upgrade_output.d/ not removed after pg_upgrade failure");

+'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+"pg_upgrade_output.d/ not removed after pg_upgrade failure");

How do these tests recognize the failures are the intended ones? I
mean, for instance when pg_upgrade fails for unused replication
slots/unconsumed WAL records, then just looking at the presence of
pg_upgrade_output.d might not be sufficient, no? Using
command_fails_like instead of command_fails and looking at the
contents of invalid_logical_replication_slots.txt might help make these
tests more focused.

3.
+pg_log(PG_REPORT, "fatal");
+pg_fatal("Your installation contains invalid logical
replication slots.\n"
+ "These slots can't be copied, so this cluster cannot
be upgraded.\n"
+ "Consider removing such slots or consuming the
pending WAL if any,\n"
+ "and then restart the upgrade.\n"
+ "A list of invalid logical replication slots is in
the file:\n"
+ "%s", output_path);

It's not just the invalid logical replication slots, but also the
slots with unconsumed WAL, which aren't invalid and can be upgraded if
the WAL is consumed first. So, a better wording would be:

    pg_fatal("Your installation contains logical replication slots that cannot be upgraded.\n"
             "These slots can't be copied, so this cluster cannot be upgraded.\n"
             "Consider removing invalid slots and/or consuming the pending WAL if any,\n"
             "and then restart the upgrade.\n"
             "A list of all such logical replication slots is in the file:\n"
             "%s", output_path);

4.
+/*
+ * There is a possibility that following records may be generated
+ * during the upgrade.
+ */
+is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_CHECKPOINT_SHUTDOWN) ||
+is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_CHECKPOINT_ONLINE) ||
+is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH) ||
+is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) ||
+is_xlog_record_type(rmid, info, RM_XLOG_ID,
XLOG_PARAMETER_CHANGE) ||
+is_xlog_record_type(rmid, info, RM_STANDBY_ID,
XLOG_RUNNING_XACTS) ||
+is_xlog_record_type(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE);

What if we fail to capture all the WAL record types that may be
generated during the upgrade?

What happens if a custom WAL resource manager generates table/index AM
WAL records during the upgrade?

What happens if new WAL record types are added that may be generated
during the upgrade? Isn't keeping this code extensible and in sync with
future changes a problem? Or should we rather say that if any custom WAL
records are found after the slot's confirmed_flush LSN, then the slot
isn't upgraded?

5. In continuation to the above comment:

Why can't this logic be something like: if any WAL record seen after a
slot's confirmed_flush LSN was generated by a WAL resource manager that
has the rm_decode function defined, then the slot can't be upgraded?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Invalidate the subscription worker in cases where a user loses their superuser status

2023-09-25 Thread vignesh C
On Mon, 25 Sept 2023 at 00:32, vignesh C  wrote:
>
> On Sat, 23 Sept 2023 at 11:28, Amit Kapila  wrote:
> >
> > On Sat, Sep 23, 2023 at 1:27 AM vignesh C  wrote:
> > >
> > >
> > > Fixed this issue by checking if the subscription owner has changed
> > > from superuser to non-superuser in case the pg_authid rows changes.
> > > The attached patch has the changes for the same.
> > >
> >
> > @@ -3952,7 +3953,9 @@ maybe_reread_subscription(void)
> >   newsub->passwordrequired != MySubscription->passwordrequired ||
> >   strcmp(newsub->origin, MySubscription->origin) != 0 ||
> >   newsub->owner != MySubscription->owner ||
> > - !equal(newsub->publications, MySubscription->publications))
> > + !equal(newsub->publications, MySubscription->publications) ||
> > + (!superuser_arg(MySubscription->owner) &&
> > + MySubscription->isownersuperuser))
> >   {
> >   if (am_parallel_apply_worker())
> >   ereport(LOG,
> > @@ -4605,6 +4608,13 @@ InitializeLogRepWorker(void)
> >   proc_exit(0);
> >   }
> >
> > + /*
> > + * Fetch subscription owner is a superuser. This value will be later
> > + * checked to see when there is any change with this role and the worker
> > + * will be restarted if required.
> > + */
> > + MySubscription->isownersuperuser = superuser_arg(MySubscription->owner);
> >
> > Why didn't you filled this parameter in GetSubscription() like other
> > parameters? If we do that then the comparison of first change in your
> > patch will look similar to all other comparisons.
>
> I felt this variable need not be added to the pg_subscription catalog
> table, instead we could save the state of subscription owner when the
> worker is started and compare this value during invalidations. As this
> information is added only to the memory Subscription structure and not
> added to the catalog FormData_pg_subscription, the checking is
> slightly different in this case. Also since this variable will be used
> only within the worker, I felt we need not add it to the catalog.

On further thought, I felt that fetching the superuser status can be
moved to GetSubscription(), which makes the code consistent with the
other checks and addresses Amit's comment. The attached version has
that change.
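
For reference, the scenario this guards against looks roughly like the
following (sketch only, object names invented):

CREATE ROLE sub_owner SUPERUSER LOGIN;
-- No password in the connection string; acceptable while the owner is a superuser.
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION mypub;
ALTER SUBSCRIPTION mysub OWNER TO sub_owner;

-- Later the owner loses superuser.  With the patch, the running apply
-- worker notices the change, restarts, and the connection string is
-- re-validated under the non-superuser rules (password_required,
-- password present in the conninfo).
ALTER ROLE sub_owner NOSUPERUSER;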

Regards,
Vignesh
From f5731c4bde69e4c7f1a8c10f795d49956d335694 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Fri, 22 Sep 2023 15:12:23 +0530
Subject: [PATCH v2] Restart the apply worker if the subscription owner has
 changed from superuser to non-superuser.

Restart the apply worker if the subscription owner has changed from
superuser to non-superuser. This is required so that the subscription
connection string gets revalidated to identify cases where the
password option is not specified as part of the connection string for
non-superuser.
---
 src/backend/catalog/pg_subscription.c  |  3 +++
 src/backend/replication/logical/worker.c   | 15 ---
 src/include/catalog/pg_subscription.h  |  1 +
 src/test/subscription/t/027_nosuperuser.pl | 21 +
 4 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index d07f88ce28..bc74b695c6 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -108,6 +108,9 @@ GetSubscription(Oid subid, bool missing_ok)
    Anum_pg_subscription_suborigin);
 	sub->origin = TextDatumGetCString(datum);
 
+	/* Record whether the subscription owner is a superuser */
+	sub->isownersuperuser = superuser_arg(sub->owner);
+
 	ReleaseSysCache(tup);
 
 	return sub;
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 597947410f..8dd41836b1 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3942,7 +3942,8 @@ maybe_reread_subscription(void)
 	 * The launcher will start a new worker but note that the parallel apply
 	 * worker won't restart if the streaming option's value is changed from
 	 * 'parallel' to any other value or the server decides not to stream the
-	 * in-progress transaction.
+	 * in-progress transaction. Exit if the owner of the subscription has
+	 * changed from superuser to a non-superuser.
 	 */
 	if (strcmp(newsub->conninfo, MySubscription->conninfo) != 0 ||
 		strcmp(newsub->name, MySubscription->name) != 0 ||
@@ -3952,7 +3953,8 @@ maybe_reread_subscription(void)
 		newsub->passwordrequired != MySubscription->passwordrequired ||
 		strcmp(newsub->origin, MySubscription->origin) != 0 ||
 		newsub->owner != MySubscription->owner ||
-		!equal(newsub->publications, MySubscription->publications))
+		!equal(newsub->publications, MySubscription->publications) ||
+		(!newsub->isownersuperuser && MySubscription->isownersuperuser))
 	{
 		if (am_parallel_apply_worker())
 			ereport(LOG,
@@ -4621,11 +4623,18 @@ InitializeLogRepWorker(void)
 	SetConfigOption("synchronous_commit", MySubscription->synccommit,
 	PGC_BACKEND, PGC_S_OVERRIDE);
 
-	/* Keep us in

Re: Confused about gram.y referencs in Makefile?

2023-09-25 Thread Daniel Gustafsson
> On 25 Sep 2023, at 05:34, Japin Li  wrote:

> Maybe be reference src/backend/parser/Makefile is better than current.
> 
> How about "See gram.h target's comment in src/backend/parser/Makefile"
> or just "See src/backend/parser/Makefile"?

The latter seems more stable; if the Makefile is ever restructured, it's almost
guaranteed that this comment will be missed and the location info will become
stale.

--
Daniel Gustafsson





Re: Various small doc improvements; plpgsql, schemas, permissions, oidvector

2023-09-25 Thread Daniel Gustafsson
> On 25 Sep 2023, at 00:57, Karl O. Pinc  wrote:

> I have made various, mostly unrelated to each other,
> small improvements to the documentation.  These
> are usually in the areas of plpgsql, schemas, and permissions.
> Most change 1 lines, but some supply short overviews.
> 
> "Short" is subjective, so if these need to be
> broken into different threads or different
> commitfest entries let me know.

While I agree it's subjective, I don't think adding a new section or paragraph
qualifies as short or small.  I would prefer if each (related) change is in a
single commit with a commit message which describes the motivation for the
change.  A reviewer can second-guess the rationale for the changes, but they
shouldn't have to.

The resulting patchset can all be in the same thread though.

--
Daniel Gustafsson





Re: RFC: Logging plan of the running query

2023-09-25 Thread torikoshia

On 2023-09-20 14:39, Lepikhov Andrei wrote:
> Thanks for your reply.
>
> On Tue, Sep 19, 2023, at 8:39 PM, torikoshia wrote:
> > On 2023-09-15 15:21, Lepikhov Andrei wrote:
> > > On Thu, Sep 7, 2023, at 1:09 PM, torikoshia wrote:
> > > I have explored this patch and, by and large, agree with the code. But
> > > some questions I want to discuss:
> > > 1. Maybe add a hook to give a chance for extensions, like
> > > pg_query_state, to do their job?
> >
> > Do you imagine adding a hook which enables adding custom interrupt codes
> > like below?
> > https://github.com/postgrespro/pg_query_state/blob/master/patches/custom_signals_15.0.patch
>
> No, I think around the hook, which would allow us to rewrite the
> pg_query_state extension without additional patches by using the
> functionality provided by your patch. I mean, an extension could
> provide console UI, not only server logging. And obtain from target
> backend so much information in the explain as the instrumentation
> level of the current query can give.
> It may make pg_query_state shorter and more stable.
>
> > > 2. In this implementation, ProcessInterrupts does a lot of work during
> > > the explain creation: memory allocations, pass through the plan,
> > > systables locks, syscache access, etc. I guess it can change the
> > > semantic meaning of 'safe place' where CHECK_FOR_INTERRUPTS can be
> > > called - I can imagine a SELECT query, which would call a stored
> > > procedure, which executes some DDL and acquires row exclusive lock at
> > > the time of interruption. So, in my mind, it is too unpredictable to
> > > make the explain in the place of interruption processing. Maybe it is
> > > better to think about some hook at the end of ExecProcNode, where a
> > > pending explain could be created?
> >
> > Yeah, I withdrew this patch once for that reason, but we are resuming
> > development in response to the results of a discussion by James and
> > others at this year's pgcon about the safety of this process in CFI:
> > https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com
>
> I can't track the logic path of the decision provided at this
> conference. But my anxiety related to specific place, where
> ActiveQueryDesc points top-level query and interruption comes during
> DDL, wrapped up in stored procedure. For example:
> CREATE TABLE test();
> CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
> BEGIN
>   EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
>   ...
> END; $$ LANGUAGE plpgsql VOLATILE;
> SELECT ddl(), ... FROM ...;


Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on
every CFI call using
v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then
ran the following query; it did not cause any problems.


```
=# CREATE TABLE test();
=# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
  EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
  PERFORM pg_sleep(5);
END; $$ LANGUAGE plpgsql VOLATILE;
=# SELECT ddl();
```

Is this the case you're worrying about?


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



