Re: Avoiding "wrong tuple length" errors at the end of VACUUM on pg_database update (Backpatch of 947789f to v12 and v13)

2023-01-09 Thread Tom Lane
Michael Paquier  writes:
> Any objections to getting 947789f applied to REL_13_STABLE and
> REL_12_STABLE so that this issue is completely gone from all the
> supported versions?

No objections to back-patching the fix, but I wonder if we can find
some mechanical way to prevent this sort of error in future.  It's
surely far from obvious that we need to apply heap_inplace_update
to a raw tuple rather than a syscache entry.

Perhaps a partial fix could be to verify that the supplied tuple
is the same length as what we see on-disk?  It's partial because
it'd only trigger if there had actually been a toasted-field
expansion, so it'd most likely not catch such coding errors
during developer testing.
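
(For context, the "wrong tuple length" failure in $subject comes from
an existing cross-check in heap_inplace_update(), roughly this fragment
from src/backend/access/heap/heapam.c; a detoasted field in a
syscache-sourced tuple changes newlen and trips it:

    oldlen = ItemIdGetLength(lp) - htup->t_hoff;
    newlen = tuple->t_len - tuple->t_data->t_hoff;
    if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
        elog(ERROR, "wrong tuple length");

Any new mechanical safeguard would presumably have to catch the mismatch
earlier, at the point where the tuple is handed over.)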

regards, tom lane




Re: typos

2023-01-09 Thread Michael Paquier
On Tue, Jan 10, 2023 at 12:24:40PM +0530, Amit Kapila wrote:
> Thanks for noticing this. I'll take care of this and some other typo
> patches together.

Does this include 0010?  I was just looking at the whole set and this
one looked like a cleanup worthwhile on its own, so I was going to
handle it, until I saw your update.  If you are also looking at that,
I won't stand in your way, of course :)
--
Michael




Avoiding "wrong tuple length" errors at the end of VACUUM on pg_database update (Backpatch of 947789f to v12 and v13)

2023-01-09 Thread Michael Paquier
Hi all,

The problem mentioned in $subject has been discussed here:
https://www.postgresql.org/message-id/dm5pr0501mb38800d9e4605bca72dd35557cc...@dm5pr0501mb3880.namprd05.prod.outlook.com

This issue has been fixed by 947789f, without a backpatch to v12 (as
per 96cdeae), as the risk seemed rather limited from here back when
the problem was discussed.  Unfortunately, I have seen customer
deployments on v12 and v13 playing with pg_database entries large
enough that they would have toast entries and would be able to trigger
the problem fixed in v14 at the end of a vacuum.

Any objections to getting 947789f applied to REL_13_STABLE and
REL_12_STABLE so that this issue is completely gone from all the
supported versions?

Thanks,
--
Michael




Re: split TOAST support out of postgres.h

2023-01-09 Thread Noah Misch
On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote:
> On 30.12.22 17:50, Tom Lane wrote:
> >Peter Eisentraut  writes:
> >>On 28.12.22 16:07, Tom Lane wrote:
> >>>I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
> >>>so widely, I doubt it is buying very much in terms of reducing header
> >>>footprint.  How bad is it to do #2?
> >
> >>See this incremental patch set.
> >
> >Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
> >I think that bears out my feeling that fmgr.h wasn't a great location:
> >I count 117 #includes of that, many of which are in .h files themselves
> >so that many more .c files would be required to read them.
> 
> committed

SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension
breakage en masse.  I would revert this.
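
(For illustration, the mechanical fix on the extension side is small --
a sketch, assuming the extension currently relies on postgres.h to pull
in the varlena macros:

    #include "postgres.h"
    #include "varatt.h"     /* now provides SET_VARSIZE, VARDATA, etc. */

Still, every such distribution would need something like a
version-conditional include to stay buildable against older branches.)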




Re: allowing for control over SET ROLE

2023-01-09 Thread Jeff Davis
On Fri, 2022-12-30 at 22:16 -0800, Noah Misch wrote:
> create user unpriv;
> grant pg_maintain to unpriv with set false;
> create schema maint authorization pg_maintain
>   create table t (c int);
> create or replace function maint.f() returns int language sql as
> 'select 1';
> alter function maint.f() owner to pg_maintain;
> set session authorization unpriv;
> create or replace function maint.f() returns int language sql
> security definer as 'select 1';
> create index on maint.t(c);

I dug into this case, as well as some mirror-image risks associated
with SECURITY INVOKER. This goes on a bit of a tangent and I'm sure I'm
retreading what others already know.

The risks of SECURITY DEFINER are about ownership: by owning something
with code attached, you're responsible for making sure the code is safe
and can only be run by the right users. Additionally, there are a
number of ways someone might come to own some code other than by
defining it themselves. Robert addressed the SET ROLE, CREATE ... OWNER,
and OWNER TO paths; but that still leaves the replace-function and
create-index paths that you illustrated. As I said earlier, I'm
not 100% satisfied with SET ROLE as a privilege; but I'm more
comfortable that it has a defined scope: the SET ROLE privilege should
control paths that can "gift" code to that user.

The risks of SECURITY INVOKER are more serious. It inherently means
that one user is writing code, and another is executing it. And in the
SQL world of triggers, views, expression indexes and logical
replication, the invoker often doesn't know what they are invoking.
There are search path risks, risks associated with resolving the right
function/operator/cast, risks of concurrent DDL (i.e. changing a
function definition right before a superuser executes it), etc. It
severely limits the kinds of trust models you can use in logical
replication. And SECURITY INVOKER weirdly inverts the trust
relationship of a GRANT: if A grants to B, then B must *completely*
trust A in order to exercise that new privilege because A can inject
arbitrary SECURITY INVOKER code in front of the object.

UNIX basically operates on a SECURITY INVOKER model, so I guess that
means that it can work. But then again, grepping a file doesn't execute
arbitrary code from inside that file (though there are bugs
sometimes... see [1]). It just seems like the wrong model for SQL.

[ Aside: that probably explains why the SQL spec defaults to SECURITY
DEFINER. ]

Brainstorming, I think we can do more to mitigate the risks of SECURITY
INVOKER:

* If running a command that would invoke a SECURITY INVOKER function
that is not owned by superuser or a member of the invoker's role, throw
an error instead. We could control this with a GUC for compatibility.

* Have SECURITY PUBLIC which executes with minimal privileges, which
would be good for convenience functions that might be used in an index
expression or view.

* Another idea is to separate out read privileges -- a SECURITY INVOKER
function that is read-only sounds less dangerous (though not without
some risk).

* Prevent extension scripts from running SECURITY INVOKER functions.


[1]
https://lcamtuf.blogspot.com/2014/10/psa-dont-run-strings-on-untrusted-files.html


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: FYI: 2022-10 thorntail failures from coreutils FICLONE

2023-01-09 Thread Noah Misch
On Mon, Jan 09, 2023 at 10:49:26PM -0500, Tom Lane wrote:
> Noah Misch  writes:
> > thorntail failed some recovery tests in 2022-10:
> 
> Speaking of which ... thorntail hasn't reported in for nearly
> three weeks.  Is it stuck?

Its machine has been unresponsive to ssh for those weeks.




Re: typos

2023-01-09 Thread Amit Kapila
On Tue, Jan 10, 2023 at 10:27 AM Justin Pryzby  wrote:
>
> On Tue, Jan 03, 2023 at 03:39:22PM -0600, Justin Pryzby wrote:
> > On Tue, Jan 03, 2023 at 04:28:29PM +0900, Michael Paquier wrote:
> > > On Fri, Dec 30, 2022 at 05:12:57PM -0600, Justin Pryzby wrote:
> > >
> > >  # Use larger ccache cache, as this task compiles with multiple 
> > > compilers /
> > >  # flag combinations
> > > -CCACHE_MAXSIZE: "1GB"
> > > +CCACHE_MAXSIZE: "1G"
> > >
> > > In 0006, I am not sure how much this matters.  Perhaps somebody more
> > > fluent with Cirrus, though, has a different opinion..
> >
> > It's got almost nothing to do with cirrus.  It's an environment
> > variable, and we're using a suffix other than what's
> > supported/documented by ccache, which only happens to work.
> >
> > > 0014 and 0013 do not reduce the translation workload, as the messages
> > > include some stuff specific to the GUC names accessed to, or some
> > > specific details about the code paths triggered.
> >
> > It seems to matter because otherwise the translators sometimes re-type
> > the view name, which (not surprisingly) can get messed up, which is how
> > I came to notice this in the first place.
> >
> > On Tue, Jan 03, 2023 at 05:41:58PM +0900, Michael Paquier wrote:
> > > On Tue, Jan 03, 2023 at 01:03:01PM +0530, Amit Kapila wrote:
> > > > One minor comment:
> > > > -spoken in Belgium (BE), with a UTF-8
> > > > character set
> > > > +spoken in Belgium (BE), with a UTF-8
> > > > character set
> > > >
> > > > Shouldn't this be UTF8 as we are using in
> > > > func.sgml?
> > >
> > > Yeah, I was wondering as well why this change is not worse, which is
> > > why I left it out of 33ab0a2.  There is an acronym for UTF in
> > > acronym.sgml, which makes sense to me, but that's the only place where
> > > this is used.  To add more on top of that, the docs basically need
> > > only UTF8, and we have three references to UTF-16, none of them using
> > > the  markup.
> >
> > I changed it for consistency, as it's the only thing that says <>UTF-8<>
> > anywhere, and charset.sgml already says <>UTF<>-8 elsewhere.
> >
> > Alternatively, I suggest changing charset to say <>UTF8<> in both places.
>
> As attached.
> This also fixes "specualtive" in Amit's recent commit.
>

Thanks for noticing this. I'll take care of this and some other typo
patches together.

-- 
With Regards,
Amit Kapila.




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-09 Thread Amit Kapila
On Tue, Jan 10, 2023 at 11:16 AM Kyotaro Horiguchi
 wrote:
>
> At Mon, 9 Jan 2023 14:21:03 +0530, Amit Kapila  
> wrote in
> > Pushed the first (0001) patch.
>
> It added the following error message.
>
> +   seg = dsm_attach(handle);
> +   if (!seg)
> +   ereport(ERROR,
> +   
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +errmsg("unable to map dynamic shared memory 
> segment")));
>
> On the other hand we already have the following one in parallel.c
> (another in pg_prewarm)
>
> seg = dsm_attach(DatumGetUInt32(main_arg));
> if (seg == NULL)
> ereport(ERROR,
> 
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>  errmsg("could not map dynamic shared memory 
> segment")));
>
> Although I don't see a technical difference between the two, all the
> other occurrences, including the one just above (except test_shm_mq),
> use "could not".  A faint memory in my non-durable memory tells me
> that we have a policy of using "can/could not" rather than "unable".
>

Right, it is mentioned in the docs [1] (see the section "Tricky Words to Avoid").

> (Mmm. I find ones in StartBackgroundWorker and sepgsql_client_auth.)
>
> Shouldn't we use the latter rather than the former?  If so, it seems
> to me that test_shm_mq also needs the same amendment to avoid the same
> mistake in the future.
>
> =
> index 2e5914d5d9..a2d7474ed4 100644
> --- a/src/backend/replication/logical/applyparallelworker.c
> +++ b/src/backend/replication/logical/applyparallelworker.c
> @@ -891,7 +891,7 @@ ParallelApplyWorkerMain(Datum main_arg)
> if (!seg)
> ereport(ERROR,
> 
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> -errmsg("unable to map dynamic shared memory 
> segment")));
> +errmsg("could not map dynamic shared memory 
> segment")));
>
> toc = shm_toc_attach(PG_LOGICAL_APPLY_SHM_MAGIC, 
> dsm_segment_address(seg));
> if (!toc)
> diff --git a/src/test/modules/test_shm_mq/worker.c 
> b/src/test/modules/test_shm_mq/worker.c
> index 8807727337..005b56023b 100644
> --- a/src/test/modules/test_shm_mq/worker.c
> +++ b/src/test/modules/test_shm_mq/worker.c
> @@ -81,7 +81,7 @@ test_shm_mq_main(Datum main_arg)
> if (seg == NULL)
> ereport(ERROR,
> 
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> -errmsg("unable to map dynamic shared memory 
> segment")));
> +errmsg("could not map dynamic shared memory 
> segment")));
> toc = shm_toc_attach(PG_TEST_SHM_MQ_MAGIC, dsm_segment_address(seg));
> if (toc == NULL)
> ereport(ERROR,
> =
>

Can you please start a new thread and post these changes there, as we
are proposing to change an existing message as well?


[1] - https://www.postgresql.org/docs/devel/error-style-guide.html

-- 
With Regards,
Amit Kapila.




Re: Fixing a couple of buglets in how VACUUM sets visibility map bits

2023-01-09 Thread Peter Geoghegan
On Mon, Jan 9, 2023 at 12:58 PM Peter Geoghegan  wrote:
> I didn't realize that affected visibilitymap_set() calls could
> generate useless set-VM WAL records until you pointed it out. That's
> far more likely to happen than the race condition that I described --
> it has nothing at all to do with concurrency. That's what clinches it
> for me.

I didn't spend as much time on this as I'd like to so far, but I think
that this concern about visibilitymap_set() actually turns out to not
apply. The visibilitymap_set() call in question is gated by a
"!VM_ALL_FROZEN()", which is enough to avoid the problem with writing
useless VM set records.

That doesn't make me doubt my original concern about races where the
all-frozen bit can be set, without setting the all-visible bit, and
without accounting for the fact that it changed underneath us. That
scenario will have !VM_ALL_FROZEN(), so that won't save us. (And we
won't test VM_ALL_VISIBLE() or PD_ALL_VISIBLE in a way that is
sufficient to realize that all_visible_according_to_vm is stale.
prunestate.all_visible being set doesn't reliably indicate that it's not stale,
but lazy_scan_heap incorrectly believes that it really does work that way.)


--
Peter Geoghegan




Re: Add BufFileRead variants with short read and EOF detection

2023-01-09 Thread Amit Kapila
On Fri, Jan 6, 2023 at 6:18 PM Peter Eisentraut
 wrote:
>
> On 02.01.23 13:13, Amit Kapila wrote:
> > On Wed, Dec 28, 2022 at 4:17 PM Peter Eisentraut
> >  wrote:
> >>
> >> Most callers of BufFileRead() want to check whether they read the full
> >> specified length.  Checking this at every call site is very tedious.
> >> This patch provides additional variants BufFileReadExact() and
> >> BufFileReadMaybeEOF() that include the length checks.
> >>
> >> I considered changing BufFileRead() itself, but this function is also
> >> used in extensions, and so changing the behavior like this would create
> >> a lot of problems there.  The new names are analogous to the existing
> >> LogicalTapeReadExact().
> >>
> >
> > +1 for the new APIs. I have noticed that some of the existing places
> > use %m and the file path in messages which are not used in the new
> > common function.
>
> The existing uses of %m are wrong.  This was already fixed once in
> 7897e3bb902c557412645b82120f4d95f7474906, but the affected areas of code
> were apparently developed at around the same time and didn't get the
> fix.  So I have attached a separate patch to fix this first, which could
> be backpatched.
>

+1.  Note that the patch no longer applies due to a recent commit.

> The original patch is then rebased on top of that.  I have adjusted the
> error message to include the file set name if available.
>
> What this doesn't keep is the purpose of the temp file in some cases,
> like "hash-join temporary file".  We could maybe make this an additional
> argument or an error context, but it seems cumbersome in any case.
>

Yeah, we can do that, but I am not sure it is worth it, because there
are already other places that don't use the exact context.
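
For reference, a sketch of a typical call site before and after the
proposed API, using the signatures as posted; the short-read check and
error reporting move into the common function:

    /* before: every caller checks for a short read itself */
    if (BufFileRead(file, &hdr, sizeof(hdr)) != sizeof(hdr))
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not read from temporary file")));

    /* after: the Exact variant performs the length check and errors out */
    BufFileReadExact(file, &hdr, sizeof(hdr));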

-- 
With Regards,
Amit Kapila.




Re: doc: add missing "id" attributes to extension packaging page

2023-01-09 Thread Brar Piening

On 10.01.2023 at 06:28, Brar Piening wrote:


I'll repost a rebased version of the styling patch in a minute.


After checking that there's no need for rebasing I'm reposting the
original patch here, to make cfbot pick it up as the latest one in a
somewhat screwed up thread mixing two patches (sorry for that - won't
happen again).

Although the patch is pretty compact, you probably need some understanding
of both XSLT and CSS to understand and judge the changes it introduces.

It pretty much does two things:

1. Make html ids discoverable in the browser by adding a link with a
   hover effect to items sections and varlistentries that have an id.
   Hover one of the psql options and click on the hash mark in [1] to
   see the behavior.
2. Emit a warning to the command line and a comment to the HTML output
   when the docs build runs into a section without id or a varlistentry
   without id where at least one entry in the varlist already has an id.

The original mail for the patch is at [2], the commitfest entry is at
[3] and the initial discussion leading to this patch starts at [4].

Regards,

Brar

[1] https://pgdocs.piening.info/app-psql.html#APP-PSQL-OPTION-PORT

[2]
https://www.postgresql.org/message-id/d6695820-af71-5e84-58b0-ff9f1c189603%40gmx.de

[3] https://commitfest.postgresql.org/41/4042/

[4]
https://www.postgresql.org/message-id/4364ab38-a475-a1fc-b104-ecd6c72010d0%40enterprisedb.com

diff --git a/doc/src/sgml/stylesheet-html-common.xsl 
b/doc/src/sgml/stylesheet-html-common.xsl
index 9df2782ce4..111b03d6fc 100644
--- a/doc/src/sgml/stylesheet-html-common.xsl
+++ b/doc/src/sgml/stylesheet-html-common.xsl
@@ -301,4 +301,115 @@ set   toc,title
   
 
 
[The 111 added XSLT lines did not survive archiving: the element markup
was stripped, leaving only attribute values and literal text.  From what
remains, the hunk adds templates that render a hidden "#" anchor link
(class "id_link", shown on hover) after section titles and varlistentry
terms that carry an id, and that emit a build warning plus an HTML
comment when an id is missing: "element without id. Please add an id to
make it usable as stable anchor in the public HTML documentation." /
"Please add an id to the ... element in the sgml source code."]
 
diff --git a/doc/src/sgml/stylesheet.css b/doc/src/sgml/stylesheet.css
index 6410a47958..e4174e0613 100644
--- a/doc/src/sgml/stylesheet.css
+++ b/doc/src/sgml/stylesheet.css
@@ -163,3 +163,13 @@ acronym{ font-style: inherit; }
 width: 75%;
   }
 }
+
+/* Links to ids of headers and definition terms */
+a.id_link {
+color: inherit;
+visibility: hidden;
+}
+
+*:hover > a.id_link {
+visibility: visible;
+}


Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-09 Thread Kyotaro Horiguchi
Hello.

At Mon, 9 Jan 2023 14:21:03 +0530, Amit Kapila  wrote 
in 
> Pushed the first (0001) patch.

It added the following error message.

+   seg = dsm_attach(handle);
+   if (!seg)
+   ereport(ERROR,
+   
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+errmsg("unable to map dynamic shared memory 
segment")));

On the other hand we already have the following one in parallel.c
(another in pg_prewarm)

seg = dsm_attach(DatumGetUInt32(main_arg));
if (seg == NULL)
ereport(ERROR,

(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg("could not map dynamic shared memory 
segment")));

Although I don't see a technical difference between the two, all the
other occurrences, including the one just above (except test_shm_mq),
use "could not".  A faint memory in my non-durable memory tells me
that we have a policy of using "can/could not" rather than "unable".

(Mmm. I find ones in StartBackgroundWorker and sepgsql_client_auth.)

Shouldn't we use the latter rather than the former?  If so, it seems
to me that test_shm_mq also needs the same amendment to avoid the same
mistake in the future.

=
index 2e5914d5d9..a2d7474ed4 100644
--- a/src/backend/replication/logical/applyparallelworker.c
+++ b/src/backend/replication/logical/applyparallelworker.c
@@ -891,7 +891,7 @@ ParallelApplyWorkerMain(Datum main_arg)
if (!seg)
ereport(ERROR,

(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-errmsg("unable to map dynamic shared memory 
segment")));
+errmsg("could not map dynamic shared memory 
segment")));
 
toc = shm_toc_attach(PG_LOGICAL_APPLY_SHM_MAGIC, 
dsm_segment_address(seg));
if (!toc)
diff --git a/src/test/modules/test_shm_mq/worker.c 
b/src/test/modules/test_shm_mq/worker.c
index 8807727337..005b56023b 100644
--- a/src/test/modules/test_shm_mq/worker.c
+++ b/src/test/modules/test_shm_mq/worker.c
@@ -81,7 +81,7 @@ test_shm_mq_main(Datum main_arg)
if (seg == NULL)
ereport(ERROR,

(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-errmsg("unable to map dynamic shared memory 
segment")));
+errmsg("could not map dynamic shared memory 
segment")));
toc = shm_toc_attach(PG_TEST_SHM_MQ_MAGIC, dsm_segment_address(seg));
if (toc == NULL)
ereport(ERROR,
=

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-09 Thread Tom Lane
David Rowley  writes:
> Ideally, our sort costing would just be better, but I think that
> raises the bar a little too high to start thinking of making
> improvements to that for this patch.

It's trickier than it looks, cf f4c7c410e.  But if you just want
to add a small correction based on the number of columns being sorted
by, that seems within reach.  See the comment for cost_sort though.
Also, I suppose for incremental sorts we'd want to consider only
the number of newly-sorted columns, but I'm not sure if that info
is readily at hand either.
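
For concreteness, cost_sort() currently charges a flat default of

    comparison_cost = 2.0 * cpu_operator_cost;

per comparison; the kind of small correction meant here would be
something like scaling that by the number of sort columns (hypothetical,
and subject to the caveats in cost_sort's comment):

    comparison_cost = 2.0 * cpu_operator_cost * list_length(pathkeys);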

regards, tom lane




Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-09 Thread Amit Kapila
On Sat, Jan 7, 2023 at 6:15 AM Nathan Bossart  wrote:
>
> On Fri, Jan 06, 2023 at 05:31:26PM -0500, Tom Lane wrote:
>
> > Attached is a rebased 0003, just to keep the cfbot happy.
> > I'm kind of wondering whether 0003 is worth the complexity TBH,
> > but in any case I ran out of time to look at it closely today.
>
> Yeah.  It's not as bad as I was expecting, but it does add a bit more
> complexity than is probably warranted.
>

Personally, I think it is not as complex as we were initially thinking
and does the job accurately unless we are missing something. So, +1 to
proceed with this approach.

I haven't looked in detail, but wouldn't it be better to explain
somewhere in the comments that this rate-limits the restart of workers
in case of error while still allowing them to restart immediately when
a subscription parameter changes?

Another minor point: Don't we need to set the launcher's latch after
removing the entry from the hash table to avoid the launcher waiting
on the latch for a bit longer?
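
Something like this, right after dropping the rate-limiting entry
(hypothetical sketch; names follow the patch's hash table and the
launcher's existing wakeup helper):

    /* subscription parameter changed: forget the last start time ... */
    hash_search(last_start_times, &subid, HASH_REMOVE, NULL);
    /* ... and wake the launcher so it doesn't sleep out the interval */
    ApplyLauncherWakeup();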

-- 
With Regards,
Amit Kapila.




Re: doc: add missing "id" attributes to extension packaging page

2023-01-09 Thread Brar Piening

On 09.01.2023 at 23:28, Karl O. Pinc wrote:
> On Mon, 09 Jan 2023 15:18:18 -0500
> Tom Lane  wrote:
>> It's probably going to be necessary to have follow-on patches,
>> because I'm sure there is stuff in the pipeline that adds more
>> ID-less tags.  Or do we have a way to create warnings about that?
>
> I am unclear on how to make warnings with xslt.  You can make
> a listing of problems, but who would read it if the build
> completed successfully?  You can make errors and abort.


You can emit warnings to the command line or you can abort with an
error. I've opted for warnings + comments in the output in the styling
patch.

The biggest issue about errors and warnings is the fact that xslt does
not process files in a line-based way which makes it pretty much
impossible to give hints where the problem causing the warning is
located. Since everything is bound together via XML entities, you can't
even tell the source file.

I've worked around this by also emitting an HTML comment to the output
so that I can find a somewhat unique string next to it and then grep
the documentation sources for that string.  It's a bit ugly, but the
best I could come up with.

I'll repost a rebased version of the styling patch in a minute.

Regards,

Brar





Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-09 Thread David Rowley
On Tue, 10 Jan 2023 at 06:15, Ankit Kumar Pandey  wrote:
> Do we have any pending items for this patch now?

I'm just wondering if not trying this when the query has a DISTINCT
clause is a copout.  What I wanted to avoid was doing additional
sorting work for WindowAgg just to have it destroyed by Hash
Aggregate.  I'm now wondering about adding both the original
slightly-less-sorted path and the new slightly-more-sorted path; then,
if DISTINCT decides to Hash Aggregate, it'll still be able to pick
the cheapest input path to do that on.  Unfortunately, our sort
costing just does not seem to be advanced enough to know that sorting
by fewer columns might be cheaper, so adding the additional path is
likely just going to result in add_path() ditching the old
slightly-less-sorted path due to the new slightly-more-sorted path
having better pathkeys. So, we'd probably be wasting our time if we
added both paths with the current sort costing code.

# explain analyze select * from pg_Class order by relkind,relname;
  QUERY PLAN
---
 Sort  (cost=36.01..37.04 rows=412 width=273) (actual
time=0.544..0.567 rows=412 loops=1)
   Sort Key: relkind, relname
   Sort Method: quicksort  Memory: 109kB
   ->  Seq Scan on pg_class  (cost=0.00..18.12 rows=412 width=273)
(actual time=0.014..0.083 rows=412 loops=1)
 Planning Time: 0.152 ms
 Execution Time: 0.629 ms
(6 rows)


# explain analyze select * from pg_Class order by relkind;
  QUERY PLAN
---
 Sort  (cost=36.01..37.04 rows=412 width=273) (actual
time=0.194..0.218 rows=412 loops=1)
   Sort Key: relkind
   Sort Method: quicksort  Memory: 109kB
   ->  Seq Scan on pg_class  (cost=0.00..18.12 rows=412 width=273)
(actual time=0.014..0.083 rows=412 loops=1)
 Planning Time: 0.143 ms
 Execution Time: 0.278 ms
(6 rows)

the total cost is the same for both of these, but the execution time
seems to vary quite a bit.

Maybe we should try and do this for DISTINCT queries if the
distinct_pathkeys match the orderby_pathkeys. That seems a little less
copout-ish. If the ORDER BY is the same as the DISTINCT then it seems
likely that the ORDER BY might opt to use the Unique path for DISTINCT
since it'll already have the correct pathkeys.  However, if the ORDER
BY has fewer columns then it might be cheaper to Hash Aggregate and
then sort all over again, especially so when the DISTINCT removes a
large proportion of the rows.

Ideally, our sort costing would just be better, but I think that
raises the bar a little too high to start thinking of making
improvements to that for this patch.

David




Re: doc: add missing "id" attributes to extension packaging page

2023-01-09 Thread Tom Lane
Brar Piening  writes:
> On 09.01.2023 at 21:18, Tom Lane wrote:
>> * I got rid of a couple of "-et-al" additions, because it did not
>> seem like a good precedent.  That would tempt people to modify
>> existing ID tags when adding variables to an entry, which'd defeat
>> the purpose I think.

> I tried to use it sparsely, mostly where a varlistentry had multiple
> child items and I had to arbitrarily pick one of them. It's not
> important, though. I'm curious how you solved this.

I just removed "-et-al", I didn't question your choice of the principal
variable name.  As you say, it didn't seem to matter that much.

regards, tom lane




Re: doc: add missing "id" attributes to extension packaging page

2023-01-09 Thread Brar Piening

On 09.01.2023 at 21:18, Tom Lane wrote:
> It's not great to have multiple CF entries pointing at the same email
> thread --- it confuses both people and bots.  Next time please split
> off a thread for each distinct patch.

I agree. I had overestimated the cfbot's ability to handle branched
threads. I'll create separate threads next time.

> * AFAIK our practice is to use "-" never "_" in XML ID attributes.
> You weren't very consistent about that even within this patch, and
> the overall effect would have been to have no standard about that
> at all, which doesn't seem great.  I changed them all to "-".

Noted. Maybe it's worth writing a short paragraph about Ids and their
style somewhere in the docs (e.g. Appendix J.5).

> * I got rid of a couple of "-et-al" additions, because it did not
> seem like a good precedent.  That would tempt people to modify
> existing ID tags when adding variables to an entry, which'd defeat
> the purpose I think.

I tried to use it sparsely, mostly where a varlistentry had multiple
child items and I had to arbitrarily pick one of them. It's not
important, though. I'm curious how you solved this.

> * I fixed a couple of things that looked like typos or unnecessary
> inconsistencies.  I have to admit that my eyes glazed over after
> awhile, so there might be remaining infelicities.

I'm all for consistency. The only places where I intentionally refrained
from being consistent were where I felt Ids would get too long or where
there were already ids in place that didn't match my naming scheme.

> It's probably going to be necessary to have follow-on patches,
> because I'm sure there is stuff in the pipeline that adds more
> ID-less tags.  Or do we have a way to create warnings about that?

Agreed. And yes, we do have a limited way to create warnings (that's
part of the other patch).

> I'm unqualified to review CSS stuff, so you'll need to get somebody
> else to review that patch.  But I'd suggest reposting it, else
> the cfbot is going to start whining that the patch-of-record in
> this thread no longer applies.

I will do that. Thanks for your feedback!

Regards,

Brar





Re: split TOAST support out of postgres.h

2023-01-09 Thread Peter Eisentraut

On 30.12.22 17:50, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 28.12.22 16:07, Tom Lane wrote:
>>> I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
>>> so widely, I doubt it is buying very much in terms of reducing header
>>> footprint.  How bad is it to do #2?
>
>> See this incremental patch set.
>
> Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
> I think that bears out my feeling that fmgr.h wasn't a great location:
> I count 117 #includes of that, many of which are in .h files themselves
> so that many more .c files would be required to read them.

committed





Re: typos

2023-01-09 Thread Justin Pryzby
On Tue, Jan 03, 2023 at 03:39:22PM -0600, Justin Pryzby wrote:
> On Tue, Jan 03, 2023 at 04:28:29PM +0900, Michael Paquier wrote:
> > On Fri, Dec 30, 2022 at 05:12:57PM -0600, Justin Pryzby wrote:
> > 
> >  # Use larger ccache cache, as this task compiles with multiple 
> > compilers /
> >  # flag combinations
> > -CCACHE_MAXSIZE: "1GB"
> > +CCACHE_MAXSIZE: "1G"
> > 
> > In 0006, I am not sure how much this matters.  Perhaps somebody more
> > fluent with Cirrus, though, has a different opinion..
> 
> It's got almost nothing to do with cirrus.  It's an environment
> variable, and we're using a suffix other than what's
> supported/documented by ccache, which only happens to work.
> 
> > 0014 and 0013 do not reduce the translation workload, as the messages
> > include some stuff specific to the GUC names accessed to, or some
> > specific details about the code paths triggered.
> 
> It seems to matter because otherwise the translators sometimes re-type
> the view name, which (not surprisingly) can get messed up, which is how
> I came to notice this in the first place.
> 
> On Tue, Jan 03, 2023 at 05:41:58PM +0900, Michael Paquier wrote:
> > On Tue, Jan 03, 2023 at 01:03:01PM +0530, Amit Kapila wrote:
> > > One minor comment:
> > > -spoken in Belgium (BE), with a UTF-8
> > > character set
> > > +spoken in Belgium (BE), with a UTF-8
> > > character set
> > > 
> > > Shouldn't this be UTF8 as we are using in
> > > func.sgml?
> > 
> > Yeah, I was wondering as well why this change is not worse, which is
> > why I left it out of 33ab0a2.  There is an acronym for UTF in
> > acronym.sgml, which makes sense to me, but that's the only place where 
> > this is used.  To add more on top of that, the docs basically need
> > only UTF8, and we have three references to UTF-16, none of them using
> > the  markup.
> 
> I changed it for consistency, as it's the only thing that says <>UTF-8<>
> anywhere, and charset.sgml already says <>UTF<>-8 elsewhere.
> 
> Alternatively, I suggest changing charset to say <>UTF8<> in both places.

As attached.
This also fixes "specualtive" in Amit's recent commit.

-- 
Justin
>From 8b56d6007e13e3b42bc75da3b9174ecab8a8dd70 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 25 Sep 2022 18:40:36 -0500
Subject: [PATCH 1/9] typos

---
 .cirrus.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 69837bcd5ad..048a004e309 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -655,7 +655,7 @@ task:
 
 # Use larger ccache cache, as this task compiles with multiple compilers /
 # flag combinations
-CCACHE_MAXSIZE: "1GB"
+CCACHE_MAXSIZE: "1G"
 CCACHE_DIR: "/tmp/ccache_dir"
 
 LINUX_CONFIGURE_FEATURES: *LINUX_CONFIGURE_FEATURES
-- 
2.25.1

>From c082768a857ebd7a0a534ee497761dca0c621f64 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 26 Sep 2021 11:13:27 -0500
Subject: [PATCH 2/9] comments grammar: extended (and other) stats

See:
202109272104.7t253iw236fb@alvherre.pgsql
070d2e19e40897d857f570f24888fc30727ed9c0
609b0652af00374b89411ea2613fd5bb92bca92c
a4d75c86bf15220df22de0a92c819ecef9db3849
7300a699502fe5432b05fbc75baca534b080bebb
ccaa3569f58796868303629bc2d63b599b38
---
 src/backend/commands/statscmds.c | 4 ++--
 src/backend/statistics/README| 2 +-
 src/backend/statistics/dependencies.c| 8 
 src/backend/statistics/extended_stats.c  | 4 ++--
 src/backend/utils/adt/pgstatfuncs.c  | 4 ++--
 src/backend/utils/adt/ruleutils.c| 2 +-
 src/include/statistics/extended_stats_internal.h | 2 +-
 7 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 26ebd0819d6..2e41745646b 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -377,7 +377,7 @@ CreateStatistics(CreateStatsStmt *stmt)
 
 	/*
 	 * If no statistic type was specified, build them all (but only when the
-	 * statistics is defined on more than one column/expression).
+	 * statistics object is defined on more than one column/expression).
 	 */
 	if ((!requested_type) && (numcols >= 2))
 	{
@@ -432,7 +432,7 @@ CreateStatistics(CreateStatsStmt *stmt)
 	 * similar to why we don't bother with extracting columns from
 	 * expressions. It's either expensive or very easy to defeat for
 	 * determined user, and there's no risk if we allow such statistics (the
-	 * statistics is useless, but harmless).
+	 * statistic is useless, but harmless).
 	 */
 	foreach(cell, stxexprs)
 	{
diff --git a/src/backend/statistics/README b/src/backend/statistics/README
index 13a97a35662..b87ca4734b2 100644
--- a/src/backend/statistics/README
+++ b/src/backend/statistics/README
@@ -24,7 +24,7 @@ There are currently several kinds of extended statistics:
 Compatible clause types
 ---
 
-Each type of statistics may be used to estimate some subset of clause types.

RE: Perform streaming logical transactions by background workers and parallel apply

2023-01-09 Thread houzj.f...@fujitsu.com
On Monday, January 9, 2023 4:51 PM Amit Kapila  wrote:
> 
> On Sun, Jan 8, 2023 at 11:32 AM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Sunday, January 8, 2023 11:59 AM houzj.f...@fujitsu.com
>  wrote:
> > > Attach the updated patch set.
> >
> > Sorry, the commit message of 0001 was accidentally deleted, just
> > attach the same patch set again with commit message.
> >
> 
> Pushed the first (0001) patch.

Thanks for pushing; here are the remaining patches.
I reordered the patch numbers to put the patches that are easier to
commit ahead of the others.

Best regards,
Hou zj



v78-0004-Retry-to-apply-streaming-xact-only-in-apply-work.patch
Description:  v78-0004-Retry-to-apply-streaming-xact-only-in-apply-work.patch


v78-0001-Add-a-main_worker_pid-to-pg_stat_subscription.patch
Description:  v78-0001-Add-a-main_worker_pid-to-pg_stat_subscription.patch


v78-0002-Stop-extra-worker-if-GUC-was-changed.patch
Description: v78-0002-Stop-extra-worker-if-GUC-was-changed.patch


v78-0003-Add-GUC-stream_serialize_threshold-and-test-seri.patch
Description:  v78-0003-Add-GUC-stream_serialize_threshold-and-test-seri.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2023-01-09 Thread Tom Lane
"Hayato Kuroda (Fujitsu)"  writes:
> Thanks for reporting. PSA rebased version.
> These can be applied work well on my HEAD(bd8d45).

I think that it's a really bad idea to require postgres_fdw.sql
to have two expected-files: that will be a maintenance nightmare.
Please put whatever it is that needs a variant expected-file
into its own, hopefully very small and seldom-changed, test script.
Or rethink whether you really need a test case that has
platform-dependent output.

regards, tom lane




Re: Strengthen pg_waldump's --save-fullpage tests

2023-01-09 Thread Bharath Rupireddy
On Tue, Jan 10, 2023 at 6:52 AM Michael Paquier  wrote:
>
> On Mon, Jan 09, 2023 at 08:30:00AM +0530, Bharath Rupireddy wrote:
> > A recent commit [1] added --save-fullpage option to pg_waldump to
> > extract full page images (FPI) from WAL records and save them into
> > files (one file per FPI) under a specified directory. While it added
> > tests to check the LSN from the FPI file name and the FPI file
> > contents, it missed further checking the FPI contents, such as the
> > tuples on the page.  I'm attaching a patch that basically reads the FPI file
> > (saved by pg_waldump) contents and raw page from the table file (using
> > pageinspect extension) and compares the tuples from both of them. This
> > test ensures that the pg_waldump outputs the correct FPI. This idea is
> > also discussed elsewhere [2].
> >
> > Thoughts?
>
> I am not sure that it is necessary to expand this set of tests to have
> dependencies on heap and pageinspect (if we do so, what of index AMs)
> and spend more cycles on that, while we already have something in
> place to cross-check ReadRecPtr with what's stored in the page header
> written on top of the block size.

While checking for a page LSN is enough here, there's no harm in
verifying the whole FPI fetched from the WAL record against the raw
page data. Also, this test illustrates how one can make use of the
fetched FPI - like reading the contents using pg_read_binary_file()
(of course, one can also use the COPY command to load the FPI data
into postgres) and using pageinspect functions to make sense of the
raw data.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Show various offset arrays for heap WAL records

2023-01-09 Thread Peter Geoghegan
On Mon, Jan 9, 2023 at 1:58 PM Andres Freund  wrote:
> A couple times when investigating data corruption issues, the last time just
> yesterday in [1], I needed to see the offsets affected by PRUNE and VACUUM
> records. As that's probably not just me, I think we should make that change
> in-tree.

I remember how useful this was when we were investigating that early
bug in 14, that turned out to be in parallel VACUUM. So I'm all in
favor of it.

> The attached patch adds details to XLOG_HEAP2_PRUNE, XLOG_HEAP2_VACUUM,
> XLOG_HEAP2_FREEZE_PAGE.

I'm bound to end up doing the same in index access methods. Might make
sense for the utility routines to live somewhere more centralized, at
least when code reuse is likely. Practically every index AM has WAL
records that include a sorted page offset number array, just like
these ones. It's a very standard thing, obviously.
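
A minimal sketch of the kind of shared utility routine that could serve
both heapam and the index AMs (name and location invented here):

    /* append a sorted page offset number array to a desc buffer */
    static void
    offset_array_desc(StringInfo buf, const OffsetNumber *offsets, int count)
    {
        appendStringInfo(buf, ", %d offsets:", count);
        for (int i = 0; i < count; i++)
            appendStringInfo(buf, " %u", offsets[i]);
    }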

> I chose to include infomask[2] for the different freeze plans mainly because
> it looks odd to see different plans without a visible reason. But I'm not sure
> that's the right choice.

I don't think that it is particularly necessary to do so in order for
the output to make sense -- pg_waldump is inherently a tool for
experts. What it comes down to for me is whether or not this
information is sufficiently useful to display, and/or can be (or needs
to be) controlled via some kind of verbosity knob.

I think that it easily could be useful, and I also think that it
easily could be a bit annoying. How hard would it be to invent a
general mechanism to control the verbosity of what we'll show for each
WAL record?

-- 
Peter Geoghegan




Re: FYI: 2022-10 thorntail failures from coreutils FICLONE

2023-01-09 Thread Tom Lane
Noah Misch  writes:
> thorntail failed some recovery tests in 2022-10:

Speaking of which ... thorntail hasn't reported in for nearly
three weeks.  Is it stuck?

regards, tom lane




Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-09 Thread Mark Dilger



> On Jan 9, 2023, at 2:07 PM, Andres Freund  wrote:
> 
> The tests encounter the issue today. If you add
> Assert(TransactionIdIsValid(ctx->next_xid));
> Assert(FullTransactionIdIsValid(ctx->next_fxid));
> early in FullTransactionIdFromXidAndCtx() it'll be hit in the
> amcheck/pg_amcheck tests.

Ok, I can confirm that.  I find the assertion

Assert(epoch != (uint32)-1);

a bit simpler to reason about, but either way, I agree it is a bug.  Thanks for 
finding this.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







RE: Fix pg_publication_tables to exclude generated columns

2023-01-09 Thread shiy.f...@fujitsu.com
On Mon, Jan 9, 2023 11:06 PM Tom Lane  wrote:
> 
> Amit Kapila  writes:
> > On Mon, Jan 9, 2023 at 5:29 PM shiy.f...@fujitsu.com
> >  wrote:
> >> I think one way to fix it is to modify pg_publication_tables query to 
> >> exclude
> >> generated columns. But in this way, we need to bump catalog version when
> fixing
> >> it in back-branch. Another way is to modify function
> >> pg_get_publication_tables()'s return value to contain all supported columns
> if
> >> no column list is specified, and we don't need to change system view.
> 
> > That sounds like a reasonable approach to fix the issue.
> 
> We could just not fix it in the back branches.  I'd argue that this is
> as much a definition change as a bug fix, so it doesn't really feel
> like something to back-patch anyway.
> 

If this is not fixed in the back-branches, in some cases we will get an
error when creating/refreshing a subscription, because we query
pg_publication_tables in the column list check.

e.g.

-- publisher
CREATE TABLE test_mix_4 (a int PRIMARY KEY, b int, c int, d int GENERATED 
ALWAYS AS (a + 1) STORED);
CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 (a, b, c);
CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4;

-- subscriber
CREATE TABLE test_mix_4 (a int PRIMARY KEY, b int, c int, d int);

postgres=# CREATE SUBSCRIPTION sub1 CONNECTION 'port=5432' PUBLICATION 
pub_mix_7, pub_mix_8;
ERROR:  cannot use different column lists for table "public.test_mix_4" in 
different publications

I think it might be better to fix it in the back-branches.  And if we
fix it by modifying pg_get_publication_tables(), we don't need to bump
the catalog version in the back-branches, which seems acceptable.

Regards,
Shi yu




Re: Allow DISTINCT to use Incremental Sort

2023-01-09 Thread Richard Guo
On Tue, Jan 10, 2023 at 10:14 AM David Rowley  wrote:

> > /* For explicit-sort case, always use the more rigorous clause */
> > if (list_length(root->distinct_pathkeys) <
> > list_length(root->sort_pathkeys))
> > {
> > needed_pathkeys = root->sort_pathkeys;
> > /* Assert checks that parser didn't mess up... */
> > Assert(pathkeys_contained_in(root->distinct_pathkeys,
> >  needed_pathkeys));
> > }
> > else
> > needed_pathkeys = root->distinct_pathkeys;
> >
> > I'm not sure if this is necessary, as AFAIU the parser should have
> > ensured that the sortClause is always a prefix of distinctClause.
>
> I think that's true for standard DISTINCT, but it's not for DISTINCT ON.
>
> > In the patch this code has been removed.  I think we should also remove
> > the related comments in create_final_distinct_paths.
> >
> >* When we have DISTINCT ON, we must sort by the more rigorous of
> >* DISTINCT and ORDER BY, else it won't have the desired behavior.
> > -  * Also, if we do have to do an explicit sort, we might as well use
> > -  * the more rigorous ordering to avoid a second sort later.  (Note
> > -  * that the parser will have ensured that one clause is a prefix of
> > -  * the other.)
>
> I'm not quite following what you think has changed here.


Sorry I didn't make myself clear.  I mean currently on HEAD in planner.c
from line 4847 to line 4857, we have the code to make sure we always use
the more rigorous clause for explicit-sort case.  I think this code is
not necessary, because we have already done that in line 4791 to 4796,
for both DISTINCT ON and standard DISTINCT.

In this patch that code (line 4847 to line 4857) has been removed, which
I agree with.  I just wondered if the related comment should be removed
too.  But now after a second thought, I think it's OK to keep that
comment, as it still holds true in the new patch.


> I've attached an updated patch


The patch looks good to me.

Thanks
Richard


RE: [PATCH] Support % wildcard in extension upgrade filenames

2023-01-09 Thread Regina Obe
> I continue to think that this is a fundamentally bad idea.  It creates
> all sorts of uncertainties about what is a valid update path and what
> is not.  Restrictions like
>
> + Such wildcard update
> + scripts will only be used when no explicit path is found from
> + old to target version.
>
> are just band-aids to try to cover up the worst problems.
>
> Have you considered the idea of instead inventing a "\include" facility
> for extension scripts?  Then, if you want to use one-monster-script to
> handle different upgrade cases, you still need one script file for each
> supported upgrade step, but those can be one-liners including the
> common script file.  Plus, such a facility could be of use to people
> who want intermediate factorization solutions (that is, some sharing
> of code without buying all the way into one-monster-script).
>
>   regards, tom lane

The problem with an include is that it does nothing for us. We still need to
ship a ton of script files.  We've already dealt with the file size issue.

So our PostGIS 3.4.0+ (not yet released) already kind of simulates
include using blank script files that have nothing in them but force
the extension machinery to upgrade the version to ANY to get to the
required installed version 3.4.0+.

So for example postgis--3.3.0--ANY.sql

Would contain this:
-- Just tag extension postgis version as "ANY"
-- Installed by postgis 3.4.0dev
-- Built on ...

And the file has the upgrade steps:
postgis--ANY--3.4.0dev.sql

So that when you are on 3.3.0 and want to upgrade to 3.4.0dev, it takes
3.3.0 -> ANY -> 3.4.0dev.

The other option I had proposed was getting rid of the micro version,
but I got shot down on that argument -- with the PostGIS PSC complaining
about people not being able to upgrade a micro if perchance we have some
security patch released in a micro.

https://lists.osgeo.org/pipermail/postgis-devel/2022-June/029673.html

https://lists.osgeo.org/pipermail/postgis-devel/2022-July/029713.html


Thanks,
Regina





Re: Handle infinite recursion in logical replication setup

2023-01-09 Thread Jonathan S. Katz

On 9/12/22 1:23 AM, vignesh C wrote:
> On Fri, 9 Sept 2022 at 11:12, Amit Kapila  wrote:
>> On Thu, Sep 8, 2022 at 9:32 AM vignesh C  wrote:
>>>
>>> The attached patch has the changes to handle the same.
>>
>> Pushed. I am not completely sure whether we want the remaining
>> documentation patch in this thread in its current form or by modifying
>> it. Johnathan has shown some interest in it. I feel you can start a
>> separate thread for it to see if there is any interest in the same and
>> close the CF entry for this work.
>
> Thanks for pushing the patch. I have closed this entry in commitfest.
> I will wait for some more time and see the response regarding the
> documentation patch and then start a new thread if required.


I've been testing this patch in advance of working on the
documentation and came across a behavior I wanted to note. Specifically,
I am hitting a deadlock while trying to synchronously replicate between
the two instances at any `synchronous_commit` level above `local`.


Here is my set up. I have two instances, "A" and "B".

On A and B, run:

  CREATE TABLE sync (id int PRIMARY KEY, info float);
  CREATE PUBLICATION sync FOR TABLE sync;

On A, run:

  CREATE SUBSCRIPTION sync
  CONNECTION 'connstr-to-B'
  PUBLICATION sync
  WITH (
streaming=true, copy_data=false,
origin=none, synchronous_commit='on');

On B, run:

  CREATE SUBSCRIPTION sync
  CONNECTION 'connstr-to-A'
  PUBLICATION sync
  WITH (
streaming=true, copy_data=false,
origin=none, synchronous_commit='on');

On A and B, run:

  ALTER SYSTEM SET synchronous_standby_names TO 'sync';
  SELECT pg_reload_conf();

Verify on A and B that pg_stat_replication.sync_state is set to "sync"

  SELECT application_name, sync_state = 'sync' AS is_sync
  FROM pg_stat_replication
  WHERE application_name = 'sync';

The next two commands should be run simultaneously on A and B:

-- run this on A
INSERT INTO sync
SELECT x, random() FROM generate_series(1,200, 2) x;

-- run this on B
INSERT INTO sync
SELECT x, random() FROM generate_series(2,200, 2) x;

This consistently created the deadlock in my testing.

Discussing with Masahiko off-list, this is due to a deadlock from 4 
processes: the walsenders on A and B, and the apply workers on A and B. 
The walsenders are waiting for feedback from the apply workers, and the 
apply workers are waiting for the walsenders to synchronize (I may be 
oversimplifying).


He suggested I try the above example instead with `synchronous_commit` 
set to `local`. In this case, I verified that there is no more deadlock, 
but he informed me that we would not be able to use cascading 
synchronous replication when "origin=none".


If we decide that this is a documentation issue, I'd suggest we improve 
the guidance around using `synchronous_commit`[1] on the CREATE 
SUBSCRIPTION page, as the GUC page[2] warns against using `local`:


"The setting local causes commits to wait for local flush to disk, but 
not for replication. This is usually not desirable when synchronous 
replication is in use, but is provided for completeness."


Thanks,

Jonathan

[1] https://www.postgresql.org/docs/devel/sql-createsubscription.html
[2] 
https://www.postgresql.org/docs/devel/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT




Re: ATTACH PARTITION seems to ignore column generation status

2023-01-09 Thread Tom Lane
Amit Langote  writes:
> Thanks for the patch.  It looks good, though I guess you said that we
> should also change the error message that CREATE TABLE ... PARTITION
> OF emits to match the other cases while we're here.  I've attached a
> delta patch.

Thanks.  I hadn't touched that issue because I wasn't entirely sure
which error message(s) you were unhappy with.  These changes look
fine offhand.

regards, tom lane




Re: Add the ability to limit the amount of memory that can be allocated to backends.

2023-01-09 Thread Andres Freund
Hi,

On 2023-01-05 13:44:20 -0500, Reid Thompson wrote:
> From 0a6b152e0559a25033bd7d43eb0959432e0d Mon Sep 17 00:00:00 2001
> From: Reid Thompson 
> Date: Thu, 11 Aug 2022 12:01:25 -0400
> Subject: [PATCH 1/2] Add tracking of backend memory allocated to
>  pg_stat_activity
> 
> This new field displays the current bytes of memory allocated to the
> backend process. It is updated as memory for the process is
> malloc'd/free'd. Memory allocated to items on the freelist is included in
> the displayed value.

It doesn't actually malloc/free. It tracks palloc/pfree.


> Dynamic shared memory allocations are included only in the value displayed
> for the backend that created them, they are not included in the value for
> backends that are attached to them to avoid double counting.

As mentioned before, I don't think accounting DSM this way makes sense.


> --- a/src/backend/postmaster/autovacuum.c
> +++ b/src/backend/postmaster/autovacuum.c
> @@ -407,6 +407,9 @@ StartAutoVacLauncher(void)
>  
>  #ifndef EXEC_BACKEND
>   case 0:
> + /* Zero allocated bytes to avoid double counting parent 
> allocation */
> + pgstat_zero_my_allocated_bytes();
> +
>   /* in postmaster child ... */
>   InitPostmasterChild();



> @@ -1485,6 +1488,9 @@ StartAutoVacWorker(void)
>  
>  #ifndef EXEC_BACKEND
>   case 0:
> + /* Zero allocated bytes to avoid double counting parent 
> allocation */
> + pgstat_zero_my_allocated_bytes();
> +
>   /* in postmaster child ... */
>   InitPostmasterChild();
>  
> diff --git a/src/backend/postmaster/postmaster.c 
> b/src/backend/postmaster/postmaster.c
> index eac3450774..24278e5c18 100644
> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -4102,6 +4102,9 @@ BackendStartup(Port *port)
>   {
>   free(bn);
>  
> + /* Zero allocated bytes to avoid double counting parent 
> allocation */
> + pgstat_zero_my_allocated_bytes();
> +
>   /* Detangle from postmaster */
>   InitPostmasterChild();


It doesn't at all seem right to call pgstat_zero_my_allocated_bytes() here,
before even InitPostmasterChild() is called. Nor does it seem right to add the
call to so many places.

Note that this is before we even delete postmaster's memory, see e.g.:
/*
 * If the PostmasterContext is still around, recycle the space; we don't
 * need it anymore after InitPostgres completes.  Note this does not 
trash
 * *MyProcPort, because ConnCreate() allocated that space with malloc()
 * ... else we'd need to copy the Port data first.  Also, subsidiary 
data
 * such as the username isn't lost either; see ProcessStartupPacket().
 */
if (PostmasterContext)
{
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
}

calling pgstat_zero_my_allocated_bytes() before we do this will lead to
undercounting memory usage, afaict.


> +/* Enum helper for reporting memory allocated bytes */
> +enum allocation_direction
> +{
> + PG_ALLOC_DECREASE = -1,
> + PG_ALLOC_IGNORE,
> + PG_ALLOC_INCREASE,
> +};

What's the point of this?


> +/* --
> + * pgstat_report_allocated_bytes() -
> + *
> + *  Called to report change in memory allocated for this backend.
> + *
> + * my_allocated_bytes initially points to local memory, making it safe to 
> call
> + * this before pgstats has been initialized. allocation_direction is a
> + * positive/negative multiplier enum defined above.
> + * --
> + */
> +static inline void
> +pgstat_report_allocated_bytes(int64 allocated_bytes, int 
> allocation_direction)

I don't think this should take allocation_direction as a parameter, I'd make
it two different functions.
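
Something like this, say (hypothetical names; the zero-clamp below also
avoids the overflow dance and the ereport criticized further down):

    static inline void
    pgstat_report_allocated_bytes_increase(uint64 nbytes)
    {
        *my_allocated_bytes += nbytes;
    }

    static inline void
    pgstat_report_allocated_bytes_decrease(uint64 nbytes)
    {
        /* clamp at zero rather than underflowing the unsigned counter */
        if (nbytes > *my_allocated_bytes)
            *my_allocated_bytes = 0;
        else
            *my_allocated_bytes -= nbytes;
    }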


> +{
> + uint64  temp;
> +
> + /*
> +  * Avoid *my_allocated_bytes unsigned integer overflow on
> +  * PG_ALLOC_DECREASE
> +  */
> + if (allocation_direction == PG_ALLOC_DECREASE &&
> + pg_sub_u64_overflow(*my_allocated_bytes, allocated_bytes, 
> ))
> + {
> + *my_allocated_bytes = 0;
> + ereport(LOG,
> + errmsg("Backend %d deallocated %lld bytes, 
> exceeding the %llu bytes it is currently reporting allocated. Setting 
> reported to 0.",
> +MyProcPid, (long long) 
> allocated_bytes,
> +(unsigned long long) 
> *my_allocated_bytes));

We certainly shouldn't have an ereport in here. This stuff really needs to be
cheap.


> + }
> + else
> + *my_allocated_bytes += (allocated_bytes) * allocation_direction;

Superfluous parens?



> +/* --
> + * pgstat_get_all_memory_allocated() -
> + *
> + *   Return a uint64 

Re: ATTACH PARTITION seems to ignore column generation status

2023-01-09 Thread Amit Langote
On Tue, Jan 10, 2023 at 6:41 AM Tom Lane  wrote:
> I wrote:
> > After thinking about this awhile, I feel that we ought to disallow
> > it in the traditional-inheritance case as well.  The reason is that
> > there are semantic prohibitions on inserting or updating a generated
> > column, eg
>
> > regression=# create table t (f1 int, f2 int generated always as (f1+1) 
> > stored);
> > CREATE TABLE
> > regression=# update t set f2=42;
> > ERROR:  column "f2" can only be updated to DEFAULT
> > DETAIL:  Column "f2" is a generated column.
>
> > It's not very reasonable to have to recheck that for child tables,
> > and we don't.  But if one does this:
>
> > regression=# create table pp (f1 int, f2 int);
> > CREATE TABLE
> > regression=# create table cc (f1 int, f2 int generated always as (f1+1) 
> > stored) inherits(pp);
> > NOTICE:  merging column "f1" with inherited definition
> > NOTICE:  merging column "f2" with inherited definition
> > CREATE TABLE
> > regression=# insert into cc values(1);
> > INSERT 0 1
> > regression=# update pp set f2 = 99 where f1 = 1;
> > UPDATE 1
> > regression=# table cc;
> >  f1 | f2
> > +
> >   1 | 99
> > (1 row)
>
> > That is surely just as broken as the partition-based case.

Agree that it looks bad.

> So what we need is about like this.  This is definitely not something
> to back-patch, since it's taking away what had been a documented
> behavior.  You could imagine trying to prevent such UPDATEs instead,
> but I judge it not worth the trouble.  If anyone were actually using
> this capability we'd have heard bug reports.

Thanks for the patch.  It looks good, though I guess you said that we
should also change the error message that CREATE TABLE ... PARTITION
OF emits to match the other cases while we're here.  I've attached a
delta patch.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


Attachment: partition-generated-cols-different-error.patch


RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-09 Thread Takamichi Osumi (Fujitsu)
On Tuesday, December 27, 2022 6:29 PM, I wrote:
> Thanks for reviewing our patch! PSA new version patch set.
Now, the patches fail to apply to HEAD,
because of recent commits (c6e1f62e2c and 216a784829c) as reported in [1].

I'll rebase the patch with other changes when I post a new version.


[1] - http://cfbot.cputube.org/patch_41_3581.log


Best Regards,
Takamichi Osumi



Re: Schema variables - new implementation for Postgres 15 (typo)

2023-01-09 Thread Julien Rouhaud
Hi,

On Sat, Jan 07, 2023 at 08:09:27AM +0100, Pavel Stehule wrote:
> >
> > Hmm, how safe is it for third-party code to access the stored data directly
> > rather than a copy?  If it makes extension fragile if they're not careful
> > enough with cache invalidation, or even give them a way to mess up with the
> > data directly, it's probably not a good idea to provide such an API.
> >
>
> ok, I removed it

Another new behavior I see is the new rowtype_only parameter for
LookupVariable.  Has this been discussed?

I can see how it can be annoying to get a "variable isn't composite" type of
error when you already know that only a composite object can be used (and
nothing else would work), but it looks really scary to entirely ignore some
objects that should be found in your search_path just because of their
datatype.

And if we ignore something like "a.b" when "a" isn't a variable of composite
type, why wouldn't we apply the same "just ignore it" rule if it's indeed a
composite type but doesn't have any "b" field?  Your application could also
start to use a different object if you drop, say, a json variable and create
a composite variable instead.

It seems to be in contradiction with how the rest of the system works and looks
wrong to me.  Note also that LookupVariable can be quite expensive since you
may have to do a lookup for every schema found in the search_path, so the
sooner it stops the better.

> > > updated patches attached

I forgot to mention it last time, but you should bump the copyright year for
all new files added when you send a new version of the patchset.




Re: Allow DISTINCT to use Incremental Sort

2023-01-09 Thread David Rowley
Thanks for having a look at this.

On Tue, 10 Jan 2023 at 02:28, Richard Guo  wrote:
> +1 for the changes.  A minor comment is that previously on HEAD for
> SELECT DISTINCT case, if we have to do an explicit full sort atop the
> cheapest path, we try to make sure to always use the more rigorous
> ordering.

I'm not quite sure I follow what's changed here.  As far as I see it
the code still does what it did and uses the more rigorous sort.

postgres=# explain (costs off) select distinct on (relname) * from
pg_Class order by relname, oid;
QUERY PLAN
--
 Unique
   ->  Sort
 Sort Key: relname, oid
 ->  Seq Scan on pg_class

If it didn't, then there'd have been a Sort atop of the Unique to
ORDER BY relname,oid in the above.

Maybe you were looking at create_partial_distinct_paths()? We don't do
anything there for DISTINCT ON, although perhaps we could. Just not
for this patch.

>
> /* For explicit-sort case, always use the more rigorous clause */
> if (list_length(root->distinct_pathkeys) <
> list_length(root->sort_pathkeys))
> {
> needed_pathkeys = root->sort_pathkeys;
> /* Assert checks that parser didn't mess up... */
> Assert(pathkeys_contained_in(root->distinct_pathkeys,
>  needed_pathkeys));
> }
> else
> needed_pathkeys = root->distinct_pathkeys;
>
> I'm not sure if this is necessary, as AFAIU the parser should have
> ensured that the sortClause is always a prefix of distinctClause.

I think that's true for standard DISTINCT, but it's not for DISTINCT ON.

> In the patch this code has been removed.  I think we should also remove
> the related comments in create_final_distinct_paths.
>
>* When we have DISTINCT ON, we must sort by the more rigorous of
>* DISTINCT and ORDER BY, else it won't have the desired behavior.
> -  * Also, if we do have to do an explicit sort, we might as well use
> -  * the more rigorous ordering to avoid a second sort later.  (Note
> -  * that the parser will have ensured that one clause is a prefix of
> -  * the other.)

I'm not quite following what you think has changed here.

> Also, the comment just above this one is outdated too.
>
>   * First, if we have any adequately-presorted paths, just stick a
>   * Unique node on those.  Then consider doing an explicit sort of the
>   * cheapest input path and Unique'ing that.
>
> The two-step workflow is what is the case on HEAD but not any more in
> the patch.  And I think we should mention incremental sort on any paths
> with presorted keys.

Yeah, I agree, that comment should mention incremental sort.

I've attached an updated patch

David
diff --git a/src/backend/optimizer/plan/planner.c 
b/src/backend/optimizer/plan/planner.c
index 000d757bdd..9ee3a019ec 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4654,22 +4654,63 @@ create_partial_distinct_paths(PlannerInfo *root, 
RelOptInfo *input_rel,

  cheapest_partial_path->rows,

  NULL, NULL);
 
-   /* first try adding unique paths atop of sorted paths */
+	/*
+	 * Try sorting the cheapest path and incrementally sorting any paths with
+	 * presorted keys, and put unique paths atop of those.
+	 */
if (grouping_is_sortable(parse->distinctClause))
{
foreach(lc, input_rel->partial_pathlist)
{
-			Path	   *path = (Path *) lfirst(lc);
+			Path	   *input_path = (Path *) lfirst(lc);
+			Path	   *sorted_path;
+			bool		is_sorted;
+			int			presorted_keys;
 
-			if (pathkeys_contained_in(root->distinct_pathkeys, path->pathkeys))
+			is_sorted = pathkeys_count_contained_in(root->distinct_pathkeys,
+													input_path->pathkeys,
+													&presorted_keys);
+
+			if (is_sorted)
+				sorted_path = input_path;
+			else
 			{
-				add_partial_path(partial_distinct_rel, (Path *)
-								 create_upper_unique_path(root,
-														  partial_distinct_rel,
Re: refactoring relation extension and BufferAlloc(), faster COPY

2023-01-09 Thread Andres Freund
Hi,

On 2023-01-06 11:52:04 +0530, vignesh C wrote:
> On Sat, 29 Oct 2022 at 08:24, Andres Freund  wrote:
> >
> > The patches here aren't fully polished (as will be evident). But they should
> > be more than good enough to discuss whether this is a sane direction.
> 
> The patch does not apply on top of HEAD as in [1], please post a rebased
> patch.

Thanks for letting me know. Updated version attached.

Greetings,

Andres Freund
>From dc67e1ff43e550a2ff6a0181995f2f12bbb2a423 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 1 Jul 2020 19:06:45 -0700
Subject: [PATCH v2 01/14] aio: Add some error checking around pinning.

---
 src/include/storage/bufmgr.h|  1 +
 src/backend/storage/buffer/bufmgr.c | 42 -
 2 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 33eadbc1291..3becf32a3c0 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -129,6 +129,7 @@ extern void ReleaseBuffer(Buffer buffer);
 extern void UnlockReleaseBuffer(Buffer buffer);
 extern void MarkBufferDirty(Buffer buffer);
 extern void IncrBufferRefCount(Buffer buffer);
+extern void BufferCheckOneLocalPin(Buffer buffer);
 extern Buffer ReleaseAndReadBuffer(Buffer buffer, Relation relation,
    BlockNumber blockNum);
 
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 3fb38a25cfa..bfaf141edd7 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1707,6 +1707,8 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 	bool		result;
 	PrivateRefCountEntry *ref;
 
+	Assert(!BufferIsLocal(b));
+
 	ref = GetPrivateRefCountEntry(b, true);
 
 	if (ref == NULL)
@@ -1852,6 +1854,8 @@ UnpinBuffer(BufferDesc *buf)
 	PrivateRefCountEntry *ref;
 	Buffer		b = BufferDescriptorGetBuffer(buf);
 
+	Assert(!BufferIsLocal(b));
+
 	/* not moving as we're likely deleting it soon anyway */
 	ref = GetPrivateRefCountEntry(b, false);
 	Assert(ref != NULL);
@@ -4209,6 +4213,25 @@ ConditionalLockBuffer(Buffer buffer)
 	LW_EXCLUSIVE);
 }
 
+void
+BufferCheckOneLocalPin(Buffer buffer)
+{
+	if (BufferIsLocal(buffer))
+	{
+		/* There should be exactly one pin */
+		if (LocalRefCount[-buffer - 1] != 1)
+			elog(ERROR, "incorrect local pin count: %d",
+ LocalRefCount[-buffer - 1]);
+	}
+	else
+	{
+		/* There should be exactly one local pin */
+		if (GetPrivateRefCount(buffer) != 1)
+			elog(ERROR, "incorrect local pin count: %d",
+ GetPrivateRefCount(buffer));
+	}
+}
+
 /*
  * LockBufferForCleanup - lock a buffer in preparation for deleting items
  *
@@ -4236,20 +4259,11 @@ LockBufferForCleanup(Buffer buffer)
 	Assert(BufferIsPinned(buffer));
 	Assert(PinCountWaitBuf == NULL);
 
-	if (BufferIsLocal(buffer))
-	{
-		/* There should be exactly one pin */
-		if (LocalRefCount[-buffer - 1] != 1)
-			elog(ERROR, "incorrect local pin count: %d",
- LocalRefCount[-buffer - 1]);
-		/* Nobody else to wait for */
-		return;
-	}
+	BufferCheckOneLocalPin(buffer);
 
-	/* There should be exactly one local pin */
-	if (GetPrivateRefCount(buffer) != 1)
-		elog(ERROR, "incorrect local pin count: %d",
-			 GetPrivateRefCount(buffer));
+	/* Nobody else to wait for */
+	if (BufferIsLocal(buffer))
+		return;
 
 	bufHdr = GetBufferDescriptor(buffer - 1);
 
@@ -4757,6 +4771,8 @@ LockBufHdr(BufferDesc *desc)
 	SpinDelayStatus delayStatus;
 	uint32		old_buf_state;
 
+	Assert(!BufferIsLocal(BufferDescriptorGetBuffer(desc)));
+
 	init_local_spin_delay();
 
 	while (true)
-- 
2.38.0

>From bb6a65580687d8bb932e2dc26c32e72025d34354 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 24 Oct 2022 12:28:06 -0700
Subject: [PATCH v2 02/14] hio: Release extension lock before initializing page
 / pinning VM

PageInit() while holding the extension lock is unnecessary after 0d1fe9f74e3
started to use RBM_ZERO_AND_LOCK - nobody can look at the new page before we
release the page lock. PageInit() zeroes the page, which isn't that cheap, so
deferring it until after the extension lock is released seems like a good idea.

Doing visibilitymap_pin() while holding the extension lock, introduced in
7db0cd2145f2, looks like an accident. Due to the restrictions on
HEAP_INSERT_FROZEN it's unlikely to be a performance issue, but it still seems
better to move it out.
---
 src/backend/access/heap/hio.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index e152807d2dc..7479212d4e0 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -623,6 +623,13 @@ loop:
 	 */
 	buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);
 
+	/*
+	 * Release the file-extension lock; it's now OK for someone else to extend
+	 * the relation some more.
+	 */
+	if (needLock)
+		UnlockRelationForExtension(relation, ExclusiveLock);
+
 	/*
 	 * We need to 

Re: postgres_fdw uninterruptible during connection establishment / ProcSignalBarrier

2023-01-09 Thread Andres Freund
Hi,

On 2023-01-03 12:05:20 -0800, Andres Freund wrote:
> The attached patch adds libpq-be-fe-helpers.h and uses it in libpqwalreceiver,
> dblink, postgres_fdw.

> As I made libpq-be-fe-helpers.h handle reserving external fds,
> libpqwalreceiver now does so. I briefly looked through its users without
> seeing cases of leaking in case of errors - which would already have been bad,
> since we'd already have leaked a libpq connection/socket.
> 
> 
> Given the lack of field complaints and the size of the required changes, I
> don't think we should backpatch this, even though it's pretty clearly buggy
> as-is.

Any comments on this? Otherwise I think I'll go with this approach.
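
For reference, the pattern such a helper implements is roughly the following
(a sketch only, not the actual contents of libpq-be-fe-helpers.h; the
function name is made up):

static PGconn *
connect_without_uninterruptible_waits(const char *conninfo)
{
	PGconn	   *conn = PQconnectStart(conninfo);
	PostgresPollingStatusType status = PGRES_POLLING_WRITING;

	while (conn != NULL &&
		   status != PGRES_POLLING_OK &&
		   status != PGRES_POLLING_FAILED)
	{
		int			events = (status == PGRES_POLLING_READING) ?
			WL_SOCKET_READABLE : WL_SOCKET_WRITEABLE;

		/* wait interruptibly here, instead of inside a blocking libpq call */
		(void) WaitLatchOrSocket(MyLatch,
								 events | WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
								 PQsocket(conn), -1L, PG_WAIT_EXTENSION);
		ResetLatch(MyLatch);
		CHECK_FOR_INTERRUPTS();

		status = PQconnectPoll(conn);
	}
	return conn;
}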

Greetings,

Andres Freund




Re: Add index scan progress to pg_stat_progress_vacuum

2023-01-09 Thread Andres Freund
Hi,

On 2023-01-07 01:59:40 +, Imseih (AWS), Sami wrote:
> --- a/src/backend/access/nbtree/nbtree.c
> +++ b/src/backend/access/nbtree/nbtree.c
> @@ -998,6 +998,8 @@ btvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
> 			if (info->report_progress)
> 				pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
> 											 scanblkno);
> +			if (info->report_parallel_progress &&
> +				(scanblkno % REPORT_PARALLEL_VACUUM_EVERY_PAGES) == 0)
> +				parallel_vacuum_update_progress(info->parallel_vacuum_state);
> 		}
> 	}

I still think it's wrong to need multiple pgstat_progress_*() calls within one
scan. If we really need it, it should be abstracted into a helper function
that wraps all the vacuum progress stuff needed for an index.
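
A minimal sketch of such a helper, assembled from the calls quoted above
(hypothetical name, not part of the posted patch):

static inline void
vacuum_report_index_scan_progress(IndexVacuumInfo *info, BlockNumber scanblkno)
{
	/* serial vacuum: report our own progress directly */
	if (info->report_progress)
		pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE, scanblkno);

	/* parallel vacuum: periodically refresh the leader's view */
	if (info->report_parallel_progress &&
		(scanblkno % REPORT_PARALLEL_VACUUM_EVERY_PAGES) == 0)
		parallel_vacuum_update_progress(info->parallel_vacuum_state);
}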


> @@ -688,7 +703,13 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState 
> *pvs, int num_index_scan
>*/
>   if (nworkers > 0)
>   {
> - /* Wait for all vacuum workers to finish */
> + /*
> +  * Wait for all indexes to be vacuumed while
> +  * updating the parallel vacuum index progress,
> +  * and then wait for all workers to finish.
> +  */
> + parallel_vacuum_wait_to_finish(pvs);
> +
>   WaitForParallelWorkersToFinish(pvs->pcxt);
>  
>   for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)

I don't think it's a good idea to have two different routines that wait for
workers to exit.  And certainly not when one of them basically just polls at a
regular interval, as parallel_vacuum_wait_to_finish() does.


I think the problem here is that you're basically trying to work around the
lack of an asynchronous state update mechanism between leader and workers. The
workaround is to add a lot of different places that poll whether there has
been any progress. And you're not doing that by integrating with the existing
machinery for interrupt processing (i.e. CHECK_FOR_INTERRUPTS()), but by
developing a new mechanism.

I think your best bet would be to integrate with HandleParallelMessages().

Greetings,

Andres Freund




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-09 Thread Peter Geoghegan
On Mon, Jan 9, 2023 at 5:22 PM Andres Freund  wrote:
> I've also seen the inverse, with recent versions of postgres: Autovacuum can
> only ever make progress if it's an anti-wraparound vacuum, because it'll
> always get cancelled otherwise.  I'm worried that substantially increasing the
> time until an anti-wraparound autovacuum happens will lead to more users
> running into out-of-xid shutdowns.
>
> I don't think it's safe to just increase the time at which anti-wrap vacuums
> happen to a hardcoded 1 billion.

That's not what the patch does. It doubles the time that the anti-wrap
no-autocancellation behaviors kick in, up to a maximum of 1 billion
XIDs/MXIDs. So it goes from autovacuum_freeze_max_age to
autovacuum_freeze_max_age x 2, without changing the basic fact that we
initially launch autovacuums that advance relfrozenxid/relminmxid when
the autovacuum_freeze_max_age threshold is first crossed.
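
In rough code terms, the heuristic is this (illustrative names only, not
taken from the patch):

	bool		table_age_av;
	bool		antiwraparound_av;
	int64		no_cancel_age;

	/* "table age" autovacuums still launch at autovacuum_freeze_max_age */
	table_age_av = (relfrozenxid_age > autovacuum_freeze_max_age);

	/* ... but the no-auto-cancel protections only kick in later */
	no_cancel_age = Min((int64) autovacuum_freeze_max_age * 2,
						(int64) 1000000000);
	antiwraparound_av = (relfrozenxid_age > no_cancel_age);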

These heuristics are totally negotiable -- and likely should be
thought out in more detail. It's likely that most of the benefit of
the patch comes from simply trying to advance relfrozenxid without the
special auto-cancellation behavior one single time. The main problem
right now is that the threshold that launches most antiwraparound
autovacuums is exactly the same as the threshold that activates the
auto-cancellation protections. Even doing the latter very slightly
later than the former could easily make things much better, while
adding essentially no risk of the kind you're concerned about.

> I'm also doubtful that it's ok to just make all autovacuums on relations with
> an age > 1 billion anti-wraparound ones. For people that use a large
> autovacuum_freeze_max_age that will be a rude awakening.

Actually, users that have autovacuum_freeze_max_age set to over 1
billion will get exactly the same behavior as before (except that the
instrumentation of autovacuum will be better). It'll be identical.

If you set autovacuum_freeze_max_age to 2 billion, and a "standard"
autovacuum is launched on a table whose relfrozenxid age is 1.5
billion, it'll just be a regular dead tuples/inserted tuples
autovacuum, with the same old familiar locking characteristics as
today.

-- 
Peter Geoghegan




Re: Strengthen pg_waldump's --save-fullpage tests

2023-01-09 Thread Michael Paquier
On Mon, Jan 09, 2023 at 08:30:00AM +0530, Bharath Rupireddy wrote:
> A recent commit [1] added --save-fullpage option to pg_waldump to
> extract full page images (FPI) from WAL records and save them into
> files (one file per FPI) under a specified directory. While it added
> tests to check the LSN from the FPI file name and the FPI file
> contents, it missed to further check the FPI contents like the tuples
> on the page. I'm attaching a patch that basically reads the FPI file
> (saved by pg_waldump) contents and raw page from the table file (using
> pageinspect extension) and compares the tuples from both of them. This
> test ensures that the pg_waldump outputs the correct FPI. This idea is
> also discussed elsewhere [2].
> 
> Thoughts?

I am not sure that it is necessary to expand this set of tests to have
dependencies on heap and pageinspect (if we do so, what of index AMs)
and spend more cycles on that, while we already have something in
place to cross-check ReadRecPtr with what's stored in the page header
written on top of the block size.
--
Michael




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-09 Thread Andres Freund
Hi,

On 2023-01-08 17:49:20 -0800, Peter Geoghegan wrote:
> Teach autovacuum.c to launch "table age" autovacuums at the same point
> that it previously triggered antiwraparound autovacuums.  Antiwraparound
> autovacuums are retained, but are only used as a true option of last
> resort, when regular autovacuum has presumably tried and failed to
> advance relfrozenxid (likely because the auto-cancel behavior kept
> cancelling regular autovacuums triggered based on table age).

I've also seen the inverse, with recent versions of postgres: Autovacuum can
only ever make progress if it's an anti-wraparound vacuum, because it'll
always get cancelled otherwise.  I'm worried that substantially increasing the
time until an anti-wraparound autovacuum happens will lead to more users
running into out-of-xid shutdowns.

I don't think it's safe to just increase the time at which anti-wrap vacuums
happen to a hardcoded 1 billion.

I'm also doubtful that it's ok to just make all autovacuums on relations with
an age > 1 billion anti-wraparound ones. For people that use a large
autovacuum_freeze_max_age that will be a rude awakening.


I am all in favor for adding logic to trigger autovacuum based on the table
age, without needing to reach autovacuum_freeze_max_age. It never made sense
to me that we get to the "emergency mode" in entirely normal operation. But
I'm not in favor of just entirely reinterpreting existing GUCs and adding
important thresholds as hardcoded numbers.

Greetings,

Andres Freund




Re: Allow +group in pg_ident.conf

2023-01-09 Thread Michael Paquier
On Mon, Jan 09, 2023 at 05:33:14PM -0500, Andrew Dunstan wrote:
> I've adapted a sentence from the pg_hba.conf documentation so we stay
> consistent.

+  
+   If the database-username begins with a
+   + character, then the operating system user can log in as
+   any user belonging to that role, similarly to how user names beginning with
+   + are treated in pg_hba.conf.
+   Thus, a + mark means match any of the roles that
+   are directly or indirectly members of this role, while a name
+   without a + mark matches only that specific role.
+  

Should this also mention that the behavior is enforced even in cases
where we expect a case-sensitive match?

> It's not really relevant. We're not comparing role names here; rather we
> look up two roles and then ask if one is a member of the other. I've
> added a comment.
> 
> Thanks for the review (I take it you're generally in favor).

-   if (case_insensitive)
+   if (regexp_pgrole[0] == '+')
+   {
+   /*
+* Since we're not comparing role names here, use of case
+* insensitive matching doesn't really apply.
+*/
+   Oid roleid = get_role_oid(pg_role, true);
+   Assert(false);
+   if (is_member(roleid, regexp_pgrole +1))
+   *found_p = true;
+   }
+   else if (case_insensitive)

Hmm.  As check_ident_usermap() is now coded, it means that the case of
a syntax substitution could be enforced to use a group with the user
name given by the client.  For example, take this ident entry:
mymap   /^(.*)@mydomain\.com$  \1

Then, if we attempt to access Postgres with "+testr...@mydomain.com",
we would get a substitution to "+testrole", which would be enforced to
check on is_member() with this expected role name.  I am not sure what
should be the correct behavior here.
--
Michael




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-09 Thread Andres Freund
Hi,

I'm a bit worried that this is optimizing the rare case while hurting the
common case. See e.g. my point below about creating additional slots in the
happy path.

It's also not clear that the change is right directionally. If we want to avoid
re-fetching the "original" row version, why don't we provide that
functionality via table_tuple_lock()?


On 2023-01-09 13:29:18 +0300, Alexander Korotkov wrote:
> @@ -53,6 +53,12 @@ static bool SampleHeapTupleVisible(TableScanDesc scan, 
> Buffer buffer,
>  HeapTuple 
> tuple,
>  OffsetNumber 
> tupoffset);
>
> +static TM_Result heapam_tuple_lock_internal(Relation relation, ItemPointer tid,
> +											Snapshot snapshot, TupleTableSlot *slot,
> +											CommandId cid, LockTupleMode mode,
> +											LockWaitPolicy wait_policy, uint8 flags,
> +											TM_FailureData *tmfd, bool updated);
> +
>  static BlockNumber heapam_scan_get_blocks_done(HeapScanDesc hscan);
>
>  static const TableAmRoutine heapam_methods;
> @@ -299,14 +305,39 @@ heapam_tuple_complete_speculative(Relation relation, TupleTableSlot *slot,
>  static TM_Result
>  heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
>  					Snapshot snapshot, Snapshot crosscheck, bool wait,
> -					TM_FailureData *tmfd, bool changingPart)
> +					TM_FailureData *tmfd, bool changingPart,
> +					TupleTableSlot *lockedSlot)
>  {
> + TM_Result   result;
> +
>   /*
>* Currently Deleting of index tuples are handled at vacuum, in case if
>* the storage itself is cleaning the dead tuples by itself, it is the
>* time to call the index tuple deletion also.
>*/
> -	return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> +	result = heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> +
> + /*
> +  * If the tuple has been concurrently updated, get lock already so that on
> +  * retry it will succeed, provided that the caller asked to do this by
> +  * providing a lockedSlot.
> +  */
> + if (result == TM_Updated && lockedSlot != NULL)
> + {
> +		result = heapam_tuple_lock_internal(relation, tid, snapshot,
> +											lockedSlot, cid, LockTupleExclusive,
> +											LockWaitBlock,
> +											TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
> +											tmfd, true);

You're ignoring the 'wait' parameter here, no? I think the modification to
heapam_tuple_update() has the same issue.


> + if (result == TM_Ok)
> + {
> + tmfd->traversed = true;
> + return TM_Updated;
> + }
> + }
> +
> + return result;

Doesn't this mean that the caller can't easily distinguish between
heapam_tuple_delete() and heapam_tuple_lock_internal() returning a failure
state?


> @@ -350,213 +402,8 @@ heapam_tuple_lock(Relation relation, ItemPointer tid, 
> Snapshot snapshot,
> LockWaitPolicy wait_policy, uint8 flags,
> TM_FailureData *tmfd)
>  {

Moving the entire body of the function around makes it harder to review
this change, because the code movement is intermingled with "actual" changes.


> +/*
> + * This routine does the work for heapam_tuple_lock(), but also supports
> + * `updated` to re-use the work done by heapam_tuple_update() or
> + * heapam_tuple_delete() on fetching the tuple and checking its visibility.
> + */
> +static TM_Result
> +heapam_tuple_lock_internal(Relation relation, ItemPointer tid, Snapshot snapshot,
> +						   TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
> +						   LockWaitPolicy wait_policy, uint8 flags,
> +						   TM_FailureData *tmfd, bool updated)
> +{
> + BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
> + TM_Result   result;
> + Buffer  buffer = InvalidBuffer;
> + HeapTuple   tuple = &bslot->base.tupdata;
> + boolfollow_updates;
> +
> + follow_updates 

Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-01-09 Thread Andres Freund
Hi,

On 2023-01-09 13:46:50 +0300, Alexander Korotkov wrote:
> I'm going to push v9.

Could you hold off for a bit? I'd like to look at this; I'm not sure I like
the direction.

Greetings,

Andres Freund




Re: [PATCH] random_normal function

2023-01-09 Thread Tom Lane
Dean Rasheed  writes:
> On Mon, 9 Jan 2023 at 18:52, Tom Lane  wrote:
>> (We could probably go further
>> than this, like trying to verify distribution properties.  But
>> it's been too long since college statistics for me to want to
>> write such tests myself, and I'm not real sure we need them.)

> I played around with the Kolmogorov–Smirnov test, to test random() for
> uniformity. The following passes roughly 99.9% of the time:

Ah, cool, thanks for this code.

> With a one-in-a-thousand chance of failing, if you wanted something
> with around a one-in-a-billion chance of failing, you could just try
> it 3 times:
> SELECT ks_test_uniform_random() OR
>ks_test_uniform_random() OR
>ks_test_uniform_random();
> but it feels pretty hacky, and probably not really necessary.

That seems like a good way, because I'm not satisfied with
one-in-a-thousand odds if we want to remove the "ignore" marker.
It's still plenty fast enough: on my machine, the v2 patch below
takes about 19ms, versus 22ms for the script as it stands in HEAD.

> Rigorous tests for other distributions are harder, but also probably
> not necessary if we have confidence that the underlying PRNG is
> uniform.

Agreed.

>> BTW, if this does bring the probability of failure down to the
>> one-in-a-billion range, I think we could also nuke the whole
>> "ignore:" business, simplifying pg_regress and allowing the
>> random test to be run in parallel with others.

> I didn't check the one-in-a-billion claim, but +1 for that.

I realized that we do already run random in a parallel group;
the "ignore: random" line in parallel_schedule just marks it
as failure-ignorable, it doesn't schedule it.  (The comment is a
bit misleading about this, but I want to remove that not rewrite it.)
Nonetheless, nuking the whole ignore-failures mechanism seems like
good cleanup to me.

Also, I tried this on some 32-bit big-endian hardware (NetBSD on macppc)
to verify my thesis that the results of random() are now machine
independent.  That part works, but the random_normal() tests didn't;
I saw low-order-bit differences from the results on x86_64 Linux.
Presumably, that's because one or more of sqrt()/log()/sin() are
rounding off a bit differently there.  v2 attached deals with this by
backing off to "extra_float_digits = 0" for that test.  Once it hits the
buildfarm we might find we have to reduce extra_float_digits some more,
but that was enough to make NetBSD/macppc happy.

regards, tom lane

diff --git a/src/test/regress/expected/random.out b/src/test/regress/expected/random.out
index 30bd866138..8785c88ad2 100644
--- a/src/test/regress/expected/random.out
+++ b/src/test/regress/expected/random.out
@@ -1,81 +1,146 @@
 --
 -- RANDOM
--- Test the random function
+-- Test random() and allies
 --
--- count the number of tuples originally, should be 1000
-SELECT count(*) FROM onek;
- count 
--------
-  1000
-(1 row)
-
--- pick three random rows, they shouldn't match
-(SELECT unique1 AS random
-  FROM onek ORDER BY random() LIMIT 1)
-INTERSECT
-(SELECT unique1 AS random
-  FROM onek ORDER BY random() LIMIT 1)
-INTERSECT
-(SELECT unique1 AS random
-  FROM onek ORDER BY random() LIMIT 1);
- random 
---------
-(0 rows)
-
+-- Tests in this file may have a small probability of failure,
+-- since we are dealing with randomness.  Try to keep the failure
+-- risk for any one test case under 1e-9.
+--
+-- There should be no duplicates in 1000 random() values.
+-- (Assuming 52 random bits in the float8 results, we could
+-- take as many as 3000 values and still have less than 1e-9 chance
+-- of failure, per https://en.wikipedia.org/wiki/Birthday_problem)
+SELECT r, count(*)
+FROM (SELECT random() r FROM generate_series(1, 1000)) ss
+GROUP BY r HAVING count(*) > 1;
+ r | count 
+---+-------
 (0 rows)
 
--- count roughly 1/10 of the tuples
-CREATE TABLE RANDOM_TBL AS
-  SELECT count(*) AS random
-  FROM onek WHERE random() < 1.0/10;
--- select again, the count should be different
-INSERT INTO RANDOM_TBL (random)
-  SELECT count(*)
-  FROM onek WHERE random() < 1.0/10;
--- select again, the count should be different
-INSERT INTO RANDOM_TBL (random)
-  SELECT count(*)
-  FROM onek WHERE random() < 1.0/10;
--- select again, the count should be different
-INSERT INTO RANDOM_TBL (random)
-  SELECT count(*)
-  FROM onek WHERE random() < 1.0/10;
--- now test that they are different counts
-SELECT random, count(random) FROM RANDOM_TBL
-  GROUP BY random HAVING count(random) > 3;
- random | count 
---------+-------
-(0 rows)
+-- The range should be [0, 1).  We can expect that at least one out of 2000
+-- random values is in the lowest or highest 1% of the range with failure
+-- probability less than about 1e-9.
+SELECT count(*) FILTER (WHERE r < 0 OR r >= 1) AS out_of_range,
+   (count(*) FILTER (WHERE r < 0.01)) > 0 AS has_small,
+   (count(*) FILTER (WHERE r > 0.99)) > 0 AS has_large
+FROM (SELECT random() r FROM generate_series(1, 2000)) ss;
+ out_of_range 

Re: Allow +group in pg_ident.conf

2023-01-09 Thread Jelte Fennema
This seems very much related to my patch here:
https://commitfest.postgresql.org/41/4081/ (yes, somehow the thread
got split. I blame outlook)

I'll try to review this one soonish.




Re: Fixing a couple of buglets in how VACUUM sets visibility map bits

2023-01-09 Thread Peter Geoghegan
On Mon, Jan 9, 2023 at 12:51 PM Robert Haas  wrote:
> I feel that you should at least have a reproducer for these problems
> posted to the thread, and ideally a regression test, before committing
> things. I think it's very hard to understand what the problems are
> right now.

Hard to understand relative to what, exactly? We're talking about a
very subtle race condition here.

I'll try to come up with a reproducer, but I *utterly* reject your
assertion that it's a hard requirement, sight unseen. Why should those
be the parameters of the discussion?

For one thing I'm quite confident that I'm right, with or without a
reproducer. And my argument isn't all that hard to follow, if you have
relevant expertise, and actually take the time. But even this is
unlikely to matter much. Even if I somehow turn out to have been
completely wrong about the race condition, it is still self-evident
that the problem of uselessly WAL logging non-changes to the VM
exists. That doesn't require any concurrent access at all. It's a
natural consequence of calling visibilitymap_set() with
VISIBILITYMAP_ALL_FROZEN-only flags. You need only look at the code
for 2 minutes to see it.

> I don't particularly have a problem with the idea of 0001, because if
> we use InvalidTransactionId to mean that there cannot be any
> conflicts, we do not need FrozenTransactionId to mean the same thing.
> Picking one or the other makes sense.

We've already picked one, many years ago -- InvalidTransactionId. This
is a long established convention, common to all REDO routines that are
capable of creating granular conflicts.

I already committed 0001 over a week ago. We were calling
ResolveRecoveryConflictWithSnapshot with FrozenTransactionId arguments
before now, which was 100% guaranteed to be a waste of cycles. I saw
no need to wait more than a few days for a +1, given that this
particular issue was so completely clear cut.

--
Peter Geoghegan




Re: allow granting CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX

2023-01-09 Thread Nathan Bossart
On Tue, Jan 03, 2023 at 03:45:49PM -0800, Nathan Bossart wrote:
> I'd like to get this fixed, but I have yet to hear thoughts on the
> suggested approach.  I'll proceed with adjusting the tests and
> documentation shortly unless someone objects.

As promised, here is a new version of the patch with adjusted tests and
documentation.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From e46033659ec3fd2bfeac7d816364b72eccb55410 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 9 Jan 2023 14:29:24 -0800
Subject: [PATCH v4 1/1] fix MAINTAIN privs

---
 doc/src/sgml/ref/analyze.sgml |  5 ++-
 doc/src/sgml/ref/cluster.sgml | 20 +++--
 doc/src/sgml/ref/lock.sgml|  5 ++-
 doc/src/sgml/ref/reindex.sgml |  9 +++-
 doc/src/sgml/ref/vacuum.sgml  |  8 +++-
 src/backend/catalog/partition.c   | 22 ++
 src/backend/catalog/toasting.c| 32 +++
 src/backend/commands/cluster.c| 35 +++-
 src/backend/commands/indexcmds.c  | 10 ++---
 src/backend/commands/lockcmds.c   |  9 
 src/backend/commands/tablecmds.c  | 41 ++-
 src/backend/commands/vacuum.c |  7 ++--
 src/include/catalog/partition.h   |  1 +
 src/include/catalog/pg_class.h|  1 +
 src/include/catalog/toasting.h|  1 +
 src/include/commands/tablecmds.h  |  1 +
 .../expected/cluster-conflict-partition.out   |  6 ++-
 .../specs/cluster-conflict-partition.spec |  5 +--
 src/test/regress/expected/cluster.out |  4 +-
 src/test/regress/expected/vacuum.out  | 18 
 src/test/regress/sql/cluster.sql  |  2 +
 21 files changed, 189 insertions(+), 53 deletions(-)

diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index a26834da4f..2f94e89cb0 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -156,7 +156,10 @@ ANALYZE [ VERBOSE ] [ table_and_columns
    (ANALYZE can only be performed by superusers and roles
-   with privileges of pg_maintain.)
+   with privileges of pg_maintain.)  If a role has
+   permission to ANALYZE a partitioned table, it is also
+   permitted to ANALYZE each of its partitions, regardless
+   of whether the role has the aforementioned privileges on the partition.
ANALYZE will skip over any tables that the calling user
does not have permission to analyze.
   
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 145101e6a5..5616293eef 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -69,10 +69,7 @@ CLUSTER [VERBOSE]
   
CLUSTER without any parameter reclusters all the
previously-clustered tables in the current database that the calling user
-   owns or has the MAINTAIN privilege for, or all such tables
-   if called by a superuser or a role with privileges of the
-   pg_maintain
-   role.  This form of CLUSTER cannot be
+   has privileges for.  This form of CLUSTER cannot be
executed inside a transaction block.
   
 
@@ -134,6 +131,21 @@ CLUSTER [VERBOSE]
  
   Notes
 
+   
+To cluster a table, one must have the MAINTAIN privilege
+on the table or be the table's owner, a superuser, or a role with
+privileges of the
+pg_maintain
+role.  If a role has permission to CLUSTER a partitioned
+table, it is also permitted to CLUSTER each of its
+partitions, regardless of whether the role has the aforementioned
+privileges on the partition.  Similarly, if a role has permission to
+CLUSTER the main relation of a TOAST
+table, it is also permitted to CLUSTER its
+TOAST table.  CLUSTER will skip over
+any tables that the calling user does not have permission to cluster.
+   
+

 In cases where you are accessing single rows randomly
 within a table, the actual order of the data in the
diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml
index d9c5bf9a1d..21a9f88c70 100644
--- a/doc/src/sgml/ref/lock.sgml
+++ b/doc/src/sgml/ref/lock.sgml
@@ -176,7 +176,10 @@ LOCK [ TABLE ] [ ONLY ] name [ * ]
 or TRUNCATE privileges on the target table. All other
 forms of LOCK are allowed with
 table-level UPDATE, DELETE,
-or TRUNCATE privileges.
+or TRUNCATE privileges.  If a role has permission to
+lock a partitioned table, it is also permitted to lock each of its
+partitions, regardless of whether the role has the aforementioned
+privileges on the partition.

 

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 192513f34e..aa95c8743e 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -306,7 +306,14 @@ REINDEX [ ( option [, ...] ) ] { DA
indexes on shared catalogs will be skipped unless the user owns the
catalog (which typically won't be 

Re: [PATCH] Support % wildcard in extension upgrade filenames

2023-01-09 Thread Tom Lane
I continue to think that this is a fundamentally bad idea.  It creates
all sorts of uncertainties about what is a valid update path and what
is not.  Restrictions like

+ Such wildcard update
+ scripts will only be used when no explicit path is found from
+ old to target version.

are just band-aids to try to cover up the worst problems.

Have you considered the idea of instead inventing a "\include" facility
for extension scripts?  Then, if you want to use one-monster-script
to handle different upgrade cases, you still need one script file for
each supported upgrade step, but those can be one-liners including the
common script file.  Plus, such a facility could be of use to people
who want intermediate factorization solutions (that is, some sharing
of code without buying all the way into one-monster-script).

regards, tom lane




Re: [PATCH] random_normal function

2023-01-09 Thread Dean Rasheed
On Mon, 9 Jan 2023 at 18:52, Tom Lane  wrote:
>
> I pushed Paul's patch with the previously-discussed tests, but
> the more I look at random.sql the less I like it.  I propose
> that we nuke the existing tests from orbit and replace with
> something more or less as attached.

Looks like a definite improvement.

>  (We could probably go further
> than this, like trying to verify distribution properties.  But
> it's been too long since college statistics for me to want to
> write such tests myself, and I'm not real sure we need them.)

I played around with the Kolmogorov–Smirnov test, to test random() for
uniformity. The following passes roughly 99.9% of the time:

CREATE OR REPLACE FUNCTION ks_test_uniform_random()
RETURNS boolean AS
$$
DECLARE
  n int := 1000;-- Number of samples
  c float8 := 1.94947;  -- Critical value for 99.9% confidence
/*  c float8 := 1.62762;  -- Critical value for 99% confidence */
/*  c float8 := 1.22385;  -- Critical value for 90% confidence */
  ok boolean;
BEGIN
  ok := (
WITH samples AS (
  SELECT random() r FROM generate_series(1, n) ORDER BY 1
), indexed_samples AS (
  SELECT (row_number() OVER())-1.0 i, r FROM samples
)
SELECT max(abs(i/n-r)) < c / sqrt(n) FROM indexed_samples
  );
  RETURN ok;
END
$$
LANGUAGE plpgsql;

and is very fast. So that gives decent confidence that random() is
indeed uniform.

With a one-in-a-thousand chance of failing, if you wanted something
with around a one-in-a-billion chance of failing, you could just try
it 3 times:

SELECT ks_test_uniform_random() OR
   ks_test_uniform_random() OR
   ks_test_uniform_random();

but it feels pretty hacky, and probably not really necessary.

Rigorous tests for other distributions are harder, but also probably
not necessary if we have confidence that the underlying PRNG is
uniform.

> BTW, if this does bring the probability of failure down to the
> one-in-a-billion range, I think we could also nuke the whole
> "ignore:" business, simplifying pg_regress and allowing the
> random test to be run in parallel with others.

I didn't check the one-in-a-billion claim, but +1 for that.

Regards,
Dean




Re: Allow +group in pg_ident.conf

2023-01-09 Thread Andrew Dunstan

On 2023-01-09 Mo 13:24, Nathan Bossart wrote:
> On Mon, Jan 09, 2023 at 08:00:26AM -0500, Andrew Dunstan wrote:
>> +   If the database-username begins with a
>> +   + character, then the operating system user can login 
>> as
>> +   any user belonging to that role, similarly to how user names beginning 
>> with
>> +   + are treated in pg_hba.conf.
> I would suggest making it clear that this means role membership and not
> privileges via INHERIT.


I've adapted a sentence from the pg_hba.conf documentation so we stay
consistent.


>> -if (case_insensitive)
>> +if (regexp_pgrole[0] == '+')
>> +{
>> +Oid roleid = get_role_oid(pg_role, true);
>> +if (is_member(roleid, regexp_pgrole +1))
>> +*found_p = true;
>> +}
>> +else if (case_insensitive)
> It looks like the is_member() check will always be case-sensitive.  Should
> it respect the value of case_insensitive?  If not, I think there should be
> a brief comment explaining why.


It's not really relevant. We're not comparing role names here; rather we
look up two roles and then ask if one is a member of the other. I've
added a comment.

Thanks for the review (I take it you're generally in favor).


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index cc8c59206c..a1c9a2eefc 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -961,6 +961,15 @@ mymap   /^(.*)@otherdomain\.com$   guest
@mydomain.com, and allow any user whose system name ends with
@otherdomain.com to log in as guest.
   
+  
+   If the database-username begins with a
+   + character, then the operating system user can log in as
+   any user belonging to that role, similarly to how user names beginning with
+   + are treated in pg_hba.conf.
+   Thus, a + mark means match any of the roles that
+   are directly or indirectly members of this role, while a name
+   without a + mark matches only that specific role.
+  
 
   

diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index f5a2cc53be..2d186639b1 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -2900,9 +2900,20 @@ check_ident_usermap(IdentLine *identLine, const char *usermap_name,
 
 		/*
 		 * now check if the username actually matched what the user is trying
-		 * to connect as
+		 * to connect as. First check if it's a member of a specified group
+		 * role.
 		 */
-		if (case_insensitive)
+		if (regexp_pgrole[0] == '+')
+		{
+			/*
+			 * Since we're not comparing role names here, use of case
+			 * insensitive matching doesn't really apply.
+			 */
+			Oid roleid = get_role_oid(pg_role, true);
+			if (is_member(roleid, regexp_pgrole +1))
+*found_p = true;
+		}
+		else if (case_insensitive)
 		{
 			if (pg_strcasecmp(regexp_pgrole, pg_role) == 0)
 *found_p = true;
@@ -2919,16 +2930,29 @@ check_ident_usermap(IdentLine *identLine, const char *usermap_name,
 	else
 	{
 		/* Not regular expression, so make complete match */
-		if (case_insensitive)
+
+		char *map_role = identLine->pg_role;
+
+		if (case_insensitive ?
+			(pg_strcasecmp(identLine->token->string, ident_user) != 0) :
+			(strcmp(identLine->token->string, ident_user) != 0))
+			return;
+
+		if (map_role[0] == '+')
 		{
-			if (pg_strcasecmp(identLine->pg_role, pg_role) == 0 &&
-pg_strcasecmp(identLine->token->string, ident_user) == 0)
+			/* see comment above re case insensitive matching in this case */
+			Oid roleid = get_role_oid(pg_role, true);
+			if (is_member(roleid, ++map_role))
+*found_p = true;
+		}
+		else if (case_insensitive)
+		{
+			if (pg_strcasecmp(map_role, pg_role) == 0)
 *found_p = true;
 		}
 		else
 		{
-			if (strcmp(identLine->pg_role, pg_role) == 0 &&
-strcmp(identLine->token->string, ident_user) == 0)
+			if (strcmp(map_role, pg_role) == 0)
 *found_p = true;
 		}
 	}
diff --git a/src/test/authentication/t/003_peer.pl b/src/test/authentication/t/003_peer.pl
index 24cefd14e0..93ef2b8f68 100644
--- a/src/test/authentication/t/003_peer.pl
+++ b/src/test/authentication/t/003_peer.pl
@@ -100,6 +100,8 @@ if (find_in_log(
 
 # Add a database role, to use for the user name map.
 $node->safe_psql('postgres', qq{CREATE ROLE testmapuser LOGIN});
+$node->safe_psql('postgres',"CREATE ROLE testmapgroup NOLOGIN");
+$node->safe_psql('postgres', "GRANT testmapgroup TO testmapuser");
 
 # Extract as well the system user for the user name map.
 my $system_user =
@@ -142,17 +144,59 @@ test_role(
 
 
 # Concatenate system_user to system_user.
-$regex_test_string = $system_user . $system_user;
+my $bad_regex_test_string = $system_user . $system_user;
 
 # Failure as the regular expression does not match.
-reset_pg_ident($node, 'mypeermap', qq{/^.*$regex_test_string\$},
+reset_pg_ident($node, 'mypeermap', qq{/^.*$bad_regex_test_string\$},
 	'testmapuser');
 

Re: doc: add missing "id" attributes to extension packaging page

2023-01-09 Thread Tom Lane
"Karl O. Pinc"  writes:
> I think there's more to comment on regards the xslt in the
> other patch, but I'll wait for the new thread for that patch.
> That might be where there should be warnings raised, etc.

We can continue using this thread, now that the other commit is in.

regards, tom lane




Re: doc: add missing "id" attributes to extension packaging page

2023-01-09 Thread Karl O. Pinc
On Mon, 09 Jan 2023 15:18:18 -0500
Tom Lane  wrote:

> Brar Piening  writes:
> > On 09.01.2023 at 03:38, vignesh C wrote:  
> >> There are couple of commitfest entries for this:
> >> https://commitfest.postgresql.org/41/4041/
> >> https://commitfest.postgresql.org/41/4042/ Can one of them be
> >> closed?  
> 
> > I've split the initial patch into two parts upon Álvaro's request
> > in [1] so that we can discuss them separately  

> I pushed the ID-addition patch, with a few fixes:

> It's probably going to be necessary to have follow-on patches,
> because I'm sure there is stuff in the pipeline that adds more
> ID-less tags.  Or do we have a way to create warnings about that?

I am unclear on how to make warnings with xslt.  You can make
a listing of problems, but who would read it if the build
completed successfully?  You can make errors and abort.

But my xslt and docbook and pg-docs-fu are a bit stale.

I think there's more to comment on regards the xslt in the
other patch, but I'll wait for the new thread for that patch.
That might be where there should be warnings raised, etc.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: WAL Insertion Lock Improvements (was: Re: Avoid LWLockWaitForVar() for currently held WAL insertion lock in WaitXLogInsertionsToFinish())

2023-01-09 Thread Andres Freund
Hi,

On 2022-12-08 12:29:54 +0530, Bharath Rupireddy wrote:
> On Tue, Dec 6, 2022 at 12:00 AM Andres Freund  wrote:
> > I think it'd be safe to optimize LWLockConflictsWithVar(), due to some
> > pre-existing, quite crufty, code. LWLockConflictsWithVar() says:
> >
> >  * Test first to see if it the slot is free right now.
> >  *
> >  * XXX: the caller uses a spinlock before this, so we don't need a 
> > memory
> >  * barrier here as far as the current usage is concerned.  But that 
> > might
> >  * not be safe in general.
> >
> > which happens to be true in the single, indirect, caller:
> >
> > /* Read the current insert position */
> > SpinLockAcquire(&Insert->insertpos_lck);
> > bytepos = Insert->CurrBytePos;
> > SpinLockRelease(&Insert->insertpos_lck);
> > reservedUpto = XLogBytePosToEndRecPtr(bytepos);
> >
> > I think at the very least we ought to have a comment in
> > WaitXLogInsertionsToFinish() highlighting this.
> 
> So, using a spinlock ensures no memory ordering occurs while reading
> lock->state in LWLockConflictsWithVar()?

No, a spinlock *does* imply ordering. But your patch does remove several of
the spinlock acquisitions (via LWLockWaitListLock()), and it moved the
assignment in LWLockUpdateVar() out from under the spinlock.

If you remove spinlock operations (or other barrier primitives), you need to
make sure that such modifications don't break the required memory ordering.


> How does spinlock that gets acquired and released in the caller
> WaitXLogInsertionsToFinish() itself and the memory barrier in the called
> function LWLockConflictsWithVar() relate here? Can you please help me
> understand this a bit?

The caller's barrier means that we'll see values that are at least as "up to
date" as at the time of the barrier (it's a bit more complicated than that, a
barrier always needs to be paired with another barrier).


> > It's not at all clear to me that the proposed fast-path for 
> > LWLockUpdateVar()
> > is safe. I think at the very least we could end up missing waiters that we
> > should have woken up.
> >
> > I think it ought to be safe to do something like
> >
> > pg_atomic_exchange_u64()..
> > if (!(pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS))
> >   return;
> 
> pg_atomic_exchange_u64(&lock->state, exchange_with_what_?. Exchange will
> change the value no?

Not lock->state, but the atomic passed to LWLockUpdateVar(), which we do want
to update. A pg_atomic_exchange_u64() includes a memory barrier.
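
Put together, the suggested fast path would look roughly like this (a
sketch; it assumes the patch has already converted valptr to a
pg_atomic_uint64):

void
LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
{
	/* full memory barrier, paired with the barrier on the waiter's side */
	pg_atomic_exchange_u64(valptr, val);

	/* if nobody is waiting, skip taking the wait list lock entirely */
	if (!(pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS))
		return;

	/* ... existing slow path: lock the wait list and wake up waiters ... */
}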


> > because the pg_atomic_exchange_u64() will provide the necessary memory
> > barrier.
> 
> I'm reading some comments [1], are these also true for 64-bit atomic
> CAS?

Yes. See
/* ----
 * The 64 bit operations have the same semantics as their 32bit counterparts
 * if they are available. Check the corresponding 32bit function for
 * documentation.
 * ----
 */


> Does it mean that an atomic CAS operation inherently provides a
> memory barrier?

Yes.


> Can you please point me if it's described better somewhere else?

I'm not sure what you'd like to have described more extensively, tbh.

Greetings,

Andres Freund




Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans

2023-01-09 Thread Peter Geoghegan
On Mon, Jan 9, 2023 at 1:43 PM Andres Freund  wrote:
> ISTM that some of the page level freezing functions are misnamed. In heapam.c
> the heap_xlog* routines are for replay, afaict. However
> heap_xlog_freeze_plan() is used to WAL log the freeze
> plan. heap_xlog_freeze_page() is used to replay that WAL record. Probably your
> brain is too used to nbtree/ :).

Sometimes I wonder why other people stubbornly insist on not starting
every function name with an underscore.   :-)

> I think s/heap_xlog_freeze/heap_log_freeze/ would mostly do the trick, except
> that heap_xlog_new_freeze_plan() doesn't quite fit in the scheme.

> The routines then also should be moved a bit up, because right now they're
> inbetween other routines doing WAL replay, adding to the confusion.

I believe that I used this scheme because of the fact that the new
functions were conceptually related to REDO routines, even though they
run during original execution. I'm quite happy to revise the code
based on your suggestions, though.

> The memcpy in heap_xlog_freeze_page() seems a tad odd. I assume that the
> alignment guarantees for xl_heap_freeze_plan are too weak?

They're not too weak. I'm not sure why the memcpy() was used. I see
your point; it makes you wonder if it must be necessary, which then
seems to call into question why it's okay to access the main array as
an array. I can change this detail, too.
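
For reference, the memcpy-free style would look something like this (a
sketch, assuming the alignment guarantees do in fact hold):

	for (int p = 0; p < nplans; p++)
	{
		xl_heap_freeze_plan *plan = &plans[p];	/* no intermediate copy */

		/* ... use plan->xmax, plan->t_infomask etc. directly ... */
	}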

I'll try to get back to it this week.

--
Peter Geoghegan




Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-09 Thread Andres Freund
Hi,

On 2023-01-09 13:55:02 -0800, Mark Dilger wrote:
> > On Jan 9, 2023, at 11:34 AM, Andres Freund  wrote:
> > 
> > 1) Because ctx->next_xid is set after the XidFromFullTransactionId() call in
> > update_cached_xid_range(), we end up using the xid 0 (or an outdated value 
> > in
> > subsequent calls) to determine whether epoch needs to be reduced.
> 
> Can you say a bit more about your analysis here, preferably with pointers to
> the lines of code you are analyzing?  Does the problem exist in amcheck as
> currently committed, or are you thinking about a problem that arises only
> after applying your patch?  I'm a bit fuzzy on where xid 0 gets used.

The problems exist in the code as currently committed. I'm not sure what
exactly the consequences are, the result is that oldest_fxid will be, at least
temporarily, bogus.

Consider the first call to update_cached_xid_range():

/*
 * Update our cached range of valid transaction IDs.
 */
static void
update_cached_xid_range(HeapCheckContext *ctx)
{
/* Make cached copies */
LWLockAcquire(XidGenLock, LW_SHARED);
ctx->next_fxid = ShmemVariableCache->nextXid;
ctx->oldest_xid = ShmemVariableCache->oldestXid;
LWLockRelease(XidGenLock);

/* And compute alternate versions of the same */
ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
}

The problem is that the call to FullTransactionIdFromXidAndCtx() happens
before ctx->next_xid is assigned, even though FullTransactionIdFromXidAndCtx()
uses ctx->next_xid.

static FullTransactionId
FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
{
uint32  epoch;

if (!TransactionIdIsNormal(xid))
return FullTransactionIdFromEpochAndXid(0, xid);
epoch = EpochFromFullTransactionId(ctx->next_fxid);
if (xid > ctx->next_xid)
epoch--;
return FullTransactionIdFromEpochAndXid(epoch, xid);
}

Because ctx->next_xid is 0, due to not having been set yet,
"xid > ctx->next_xid" will always be true, leading to epoch being reduced by
one.

In the common case of there never having been an xid wraparound, we'll thus
underflow epoch, generating an xid far into the future.
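
The minimal fix implied by the above is to swap the last two assignments, so
that next_xid is set before FullTransactionIdFromXidAndCtx() reads it
(sketch):

static void
update_cached_xid_range(HeapCheckContext *ctx)
{
	/* Make cached copies */
	LWLockAcquire(XidGenLock, LW_SHARED);
	ctx->next_fxid = ShmemVariableCache->nextXid;
	ctx->oldest_xid = ShmemVariableCache->oldestXid;
	LWLockRelease(XidGenLock);

	/* And compute alternate versions of the same */
	ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
	ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
}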


The tests encounter the issue today. If you add
Assert(TransactionIdIsValid(ctx->next_xid));
Assert(FullTransactionIdIsValid(ctx->next_fxid));
early in FullTransactionIdFromXidAndCtx() it'll be hit in the
amcheck/pg_amcheck tests.

Greetings,

Andres Freund




Show various offset arrays for heap WAL records

2023-01-09 Thread Andres Freund
Hi,

A couple times when investigating data corruption issues, the last time just
yesterday in [1], I needed to see the offsets affected by PRUNE and VACUUM
records. As that's probably not just me, I think we should make that change
in-tree.

The attached patch adds details to XLOG_HEAP2_PRUNE, XLOG_HEAP2_VACUUM,
XLOG_HEAP2_FREEZE_PAGE.

The biggest issue I have with the patch is that it's very hard to figure out
what punctuation to use where ;). The existing code is very inconsistent.

I chose to include infomask[2] for the different freeze plans mainly because
it looks odd to see different plans without a visible reason. But I'm not sure
that's the right choice.

Greetings,

Andres Freund

[1] 
https://postgr.es/m/CANtu0ojby3eBdMXfs4QmS%2BK1avBc7NcRq_Ot5bnzrbwM%2BuQ55w%40mail.gmail.com
>From 2668f69edc5345a45b850eb4f33321f8bc18cb6e Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 9 Jan 2023 13:49:09 -0800
Subject: [PATCH v3] heapdesc: Include information about offset arrays

---
 src/backend/access/rmgrdesc/heapdesc.c | 94 ++
 1 file changed, 94 insertions(+)

diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 628f7e82156..f1c4a0e4c64 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -114,6 +114,37 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "off %u", xlrec->offnum);
 	}
 }
+
+static void
+print_offset_array(StringInfo buf, const char *label, OffsetNumber *offsets, int count)
+{
+	if (count == 0)
+		return;
+	appendStringInfo(buf, "%s [", label);
+	for (int i = 0; i < count; i++)
+	{
+		if (i > 0)
+			appendStringInfoString(buf, ", ");
+		appendStringInfo(buf, "%u", offsets[i]);
+	}
+	appendStringInfoString(buf, "]");
+}
+
+static void
+print_redirect_array(StringInfo buf, const char *label, OffsetNumber *offsets, int count)
+{
+	if (count == 0)
+		return;
+	appendStringInfo(buf, "%s [", label);
+	for (int i = 0; i < count; i++)
+	{
+		if (i > 0)
+			appendStringInfoString(buf, ", ");
+		appendStringInfo(buf, "%u->%u", offsets[i * 2], offsets[i * 2 + 1]);
+	}
+	appendStringInfoString(buf, "]");
+}
+
 void
 heap2_desc(StringInfo buf, XLogReaderState *record)
 {
@@ -129,12 +160,48 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 		 xlrec->snapshotConflictHorizon,
 		 xlrec->nredirected,
 		 xlrec->ndead);
+
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			OffsetNumber *end;
+			OffsetNumber *redirected;
+			OffsetNumber *nowdead;
+			OffsetNumber *nowunused;
+			int			nredirected;
+			int			nunused;
+			Size		datalen;
+
+			redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
+
+			nredirected = xlrec->nredirected;
+			end = (OffsetNumber *) ((char *) redirected + datalen);
+			nowdead = redirected + (nredirected * 2);
+			nowunused = nowdead + xlrec->ndead;
+			nunused = (end - nowunused);
+			Assert(nunused >= 0);
+
+			appendStringInfo(buf, " nunused %u", nunused);
+			print_offset_array(buf, ", unused:", nowunused, nunused);
+			print_redirect_array(buf, ", redirected:", redirected, nredirected);
+			print_offset_array(buf, ", dead:", nowdead, xlrec->ndead);
+		}
 	}
 	else if (info == XLOG_HEAP2_VACUUM)
 	{
 		xl_heap_vacuum *xlrec = (xl_heap_vacuum *) rec;
 
 		appendStringInfo(buf, "nunused %u", xlrec->nunused);
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			OffsetNumber *nowunused;
+			Size		datalen;
+
+			nowunused = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
+
+			print_offset_array(buf, " unused:", nowunused, xlrec->nunused);
+		}
 	}
 	else if (info == XLOG_HEAP2_FREEZE_PAGE)
 	{
@@ -142,6 +209,30 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 
 		appendStringInfo(buf, "snapshotConflictHorizon %u nplans %u",
 		 xlrec->snapshotConflictHorizon, xlrec->nplans);
+
+		if (!XLogRecHasBlockImage(record, 0))
+		{
+			OffsetNumber *offsets;
+			xl_heap_freeze_plan *plans;
+			int			curoff = 0;
+
+			plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
+			offsets = (OffsetNumber *) ((char *) plans +
+		(xlrec->nplans *
+		 sizeof(xl_heap_freeze_plan)));
+
+			for (int p = 0; p < xlrec->nplans; p++)
+			{
+xl_heap_freeze_plan plan;
+
+memcpy(&plan, &plans[p], sizeof(xl_heap_freeze_plan));
+
+appendStringInfo(buf, "; plan #%d: xmax %u infomask %u infomask2 %u",
+ p, plan.xmax, plan.t_infomask, plan.t_infomask2);
+print_offset_array(buf, ", offsets", offsets + curoff, plan.ntuples);
+curoff += plan.ntuples;
+			}
+		}
 	}
 	else if (info == XLOG_HEAP2_VISIBLE)
 	{
@@ -153,9 +244,12 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
+		bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
 
 		appendStringInfo(buf, "%d tuples flags 0x%02X", xlrec->ntuples,
 		 xlrec->flags);
+		if 

Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-09 Thread Mark Dilger



> On Jan 9, 2023, at 11:34 AM, Andres Freund  wrote:
> 
> 1) Because ctx->next_xid is set after the XidFromFullTransactionId() call in
> update_cached_xid_range(), we end up using the xid 0 (or an outdated value in
> subsequent calls) to determine whether epoch needs to be reduced.

Can you say a bit more about your analysis here, preferably with pointers to 
the lines of code you are analyzing?  Does the problem exist in amcheck as 
currently committed, or are you thinking about a problem that arises only after 
applying your patch?  I'm a bit fuzzy on where xid 0 gets used.

Thanks

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans

2023-01-09 Thread Andres Freund
Hi,

On 2022-11-15 10:26:05 -0800, Peter Geoghegan wrote:
> Pushed something like this earlier today, though without any changes
> to VISIBLE records.

While updating a patch to log various offsets in pg_waldump, I noticed a few
minor issues in this patch:

ISTM that some of the page level freezing functions are misnamed. In heapam.c
the heap_xlog* routines are for replay, afaict. However
heap_xlog_freeze_plan() is used to WAL log the freeze
plan. heap_xlog_freeze_page() is used to replay that WAL record. Probably your
brain is too used to nbtree/ :).

I think s/heap_xlog_freeze/heap_log_freeze/ would mostly do the trick, except
that heap_xlog_new_freeze_plan() doesn't quite fit in the scheme.

The routines should then also be moved up a bit, because right now they're
in between other routines doing WAL replay, adding to the confusion.


The memcpy in heap_xlog_freeze_page() seems a tad odd. I assume that the
alignment guarantees for xl_heap_freeze_plan are too weak? But imo it's
failure prone (and I'm not sure strictly speaking legal from an undefined
behaviour POV) to form a pointer to a misaligned array. Even if we then later
just memcpy() from those pointers.
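
For the archives, the alignment-safe idiom at issue looks roughly like this
(demo_plan stands in for xl_heap_freeze_plan; a sketch, not the committed
code):

#include <string.h>
#include <stdint.h>

typedef struct demo_plan
{
	uint32_t	xmax;
	uint16_t	t_infomask;
	uint16_t	t_infomask2;
	uint16_t	ntuples;
} demo_plan;

static demo_plan
read_plan(const char *buf, int i)
{
	demo_plan	plan;

	/*
	 * Copy byte-wise instead of dereferencing a (demo_plan *) into the
	 * buffer: if the address is misaligned, even forming such a pointer is
	 * dubious under the C standard, and dereferencing it can trap on
	 * strict-alignment hardware.
	 */
	memcpy(&plan, buf + (size_t) i * sizeof(demo_plan), sizeof(plan));
	return plan;
}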

Greetings,

Andres Freund




Re: ATTACH PARTITION seems to ignore column generation status

2023-01-09 Thread Tom Lane
I wrote:
> After thinking about this awhile, I feel that we ought to disallow
> it in the traditional-inheritance case as well.  The reason is that
> there are semantic prohibitions on inserting or updating a generated
> column, eg

> regression=# create table t (f1 int, f2 int generated always as (f1+1) stored);
> CREATE TABLE
> regression=# update t set f2=42;
> ERROR:  column "f2" can only be updated to DEFAULT
> DETAIL:  Column "f2" is a generated column.

> It's not very reasonable to have to recheck that for child tables,
> and we don't.  But if one does this:

> regression=# create table pp (f1 int, f2 int);
> CREATE TABLE
> regression=# create table cc (f1 int, f2 int generated always as (f1+1) stored) inherits(pp);
> NOTICE:  merging column "f1" with inherited definition
> NOTICE:  merging column "f2" with inherited definition
> CREATE TABLE
> regression=# insert into cc values(1);
> INSERT 0 1
> regression=# update pp set f2 = 99 where f1 = 1;
> UPDATE 1
> regression=# table cc;
>  f1 | f2 
> +
>   1 | 99
> (1 row)

> That is surely just as broken as the partition-based case.

So what we need is about like this.  This is definitely not something
to back-patch, since it's taking away what had been a documented
behavior.  You could imagine trying to prevent such UPDATEs instead,
but I judge it not worth the trouble.  If anyone were actually using
this capability we'd have heard bug reports.
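
For concreteness, with this patch the inheritance example upthread now fails
up front; per the patch's new messages the expected output is:

regression=# create table cc (f1 int, f2 int generated always as (f1+1) stored) inherits(pp);
NOTICE:  merging column "f1" with inherited definition
NOTICE:  merging column "f2" with inherited definition
ERROR:  child column "f2" specifies generation expression
HINT:  A child table column cannot be generated unless its parent column is.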

regards, tom lane

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index d91a781479..6b60cd80ae 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -344,8 +344,8 @@ CREATE TABLE people (
   
   

-If a parent column is not a generated column, a child column may be
-defined to be a generated column or not.
+If a parent column is not a generated column, a child column must
+not be generated either.

   
  
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1db3bd9e2e..72ad6507d7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2931,6 +2931,11 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
  * also check that the child column doesn't specify a default
  * value or identity, which matches the rules for a single
  * column in parse_util.c.
+ *
+ * Conversely, if the parent column is not generated, the
+ * child column can't be either.  (We used to allow that, but
+ * it results in being able to override the generation
+ * expression via UPDATEs through the parent.)
  */
 if (def->generated)
 {
@@ -2951,15 +2956,14 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
  errmsg("column \"%s\" inherits from generated column but specifies identity",
 		def->colname)));
 }
-
-/*
- * If the parent column is not generated, then take whatever
- * the child column definition says.
- */
 else
 {
 	if (newdef->generated)
-		def->generated = newdef->generated;
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_COLUMN_DEFINITION),
+ errmsg("child column \"%s\" specifies generation expression",
+		def->colname),
+ errhint("A child table column cannot be generated unless its parent column is.")));
 }
 
 /* If new def has a default, override previous default */
@@ -15038,13 +15042,18 @@ MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel)
 attributeName)));
 
 			/*
-			 * If parent column is generated, child column must be, too.
+			 * Child column must be generated if and only if parent column is.
 			 */
 			if (attribute->attgenerated && !childatt->attgenerated)
 ereport(ERROR,
 		(errcode(ERRCODE_DATATYPE_MISMATCH),
 		 errmsg("column \"%s\" in child table must be a generated column",
 attributeName)));
+			if (childatt->attgenerated && !attribute->attgenerated)
+ereport(ERROR,
+		(errcode(ERRCODE_DATATYPE_MISMATCH),
+		 errmsg("column \"%s\" in child table must not be a generated column",
+attributeName)));
 
 			/*
 			 * Check that both generation expressions match.
diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out
index 1db5f9ed47..3c10dabf6d 100644
--- a/src/test/regress/expected/generated.out
+++ b/src/test/regress/expected/generated.out
@@ -268,38 +268,17 @@ SELECT * FROM gtest1;
  4 | 8
 (2 rows)
 
+-- can't have generated column that is a child of normal column
 CREATE TABLE gtest_normal (a int, b int);
-CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS (a * 2) STORED) INHERITS (gtest_normal);
+CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS (a * 2) STORED) INHERITS (gtest_normal);  -- error
 NOTICE:  merging column "a" with inherited definition
 NOTICE:  merging column "b" with inherited definition
-\d 

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-01-09 Thread Melanie Plageman
Attached is v45 of the patchset. I've done some additional code cleanup
and changes. The most significant change, however, is the docs. I've
separated the docs into its own patch for ease of review.

The docs patch here was edited and co-authored by Samay Sharma.
I'm not sure if the order of pg_stat_io in the docs is correct.

The significant changes are removal of all "correspondence" or
"equivalence"-related sections (those explaining how other IO stats were
the same or different from pg_stat_io columns).

I've tried to remove references to "strategies" and "Buffer Access
Strategy" as much as possible.

I've moved the advice and interpretation section to the bottom --
outside of the table of definitions. Since this page is primarily a
reference page, I agree with Samay that incorporating interpretation
into the column definitions adds clutter and confusion.
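
As one concrete example of the kind of query an interpretation section could
anchor on (illustrative column names only -- see the patch for the actual
view definition):

SELECT backend_type, io_object, io_context, writes
FROM pg_stat_io
WHERE io_context = 'vacuum' AND writes > 0;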

I think the best course would be to have an "Interpreting Statistics"
section.

I suggest a structure like the following for this section:
- Statistics Collection Configuration
- Viewing Statistics
- Statistics Views Reference
- Statistics Functions Reference
- Interpreting Statistics

As an aside, this section of the docs has some other structural issues
as well.

For example, I'm not sure it makes sense to have the dynamic statistics
views as sub-sections under 28.2, which is titled "The Cumulative
Statistics System."

In fact the docs say this under Section 28.2
https://www.postgresql.org/docs/current/monitoring-stats.html

"PostgreSQL also supports reporting dynamic information about exactly
what is going on in the system right now, such as the exact command
currently being executed by other server processes, and which other
connections exist in the system. This facility is independent of the
cumulative statistics system."

So, it is a bit weird that they are defined under the section titled
"The Cumulative Statistics System".

In this version of the patchset, I have not attempted a new structure
but instead moved the advice/interpretation for pg_stat_io to below the
table containing the column definitions.

- Melanie
From e87831a0ffe94af54b91285630dd6f1c497c368a Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 4 Jan 2023 17:20:41 -0500
Subject: [PATCH v45 2/5] pgstat: Infrastructure to track IO operations

Introduce "IOOp", an IO operation done by a backend, "IOObject", the
target object of the IO, and "IOContext", the context or location of the
IO operations on that object. For example, the checkpointer may write a
shared buffer out. This would be considered an IOOp "written" on an
IOObject IOOBJECT_RELATION in IOContext IOCONTEXT_NORMAL by BackendType
"checkpointer".

Each IOOp (evict, extend, fsync, read, reuse, and write) can be counted
per IOObject (relation, temp relation) per IOContext (normal, bulkread,
bulkwrite, or vacuum) through a call to pgstat_count_io_op().

Note that this commit introduces the infrastructure to count IO
Operation statistics. A subsequent commit will add calls to
pgstat_count_io_op() in the appropriate locations.

IOContext IOCONTEXT_NORMAL concerns operations on local and shared
buffers, while IOCONTEXT_BULKREAD, IOCONTEXT_BULKWRITE, and
IOCONTEXT_VACUUM IOContexts concern IO operations on buffers as part of
a BufferAccessStrategy.

IOObject IOOBJECT_TEMP_RELATION concerns IO Operations on buffers
containing temporary table data, while IOObject IOOBJECT_RELATION
concerns IO Operations on buffers containing permanent relation data.

Stats on IOOps on all IOObjects in all IOContexts for a given backend
are first counted in a backend's local memory and then flushed to shared
memory and accumulated with those from all other backends, exited and
live.

Some BackendTypes will not flush their pending statistics at regular
intervals and explicitly call pgstat_flush_io_ops() during the course of
normal operations to flush their backend-local IO operation statistics
to shared memory in a timely manner.

Because not all BackendType, IOOp, IOObject, IOContext combinations are
valid, the validity of the stats is checked before flushing pending
stats and before reading in the existing stats file to shared memory.

The aggregated stats in shared memory could be extended in the future
with per-backend stats -- useful for per connection IO statistics and
monitoring.

Author: Melanie Plageman 
Reviewed-by: Andres Freund 
Reviewed-by: Justin Pryzby 
Reviewed-by: Kyotaro Horiguchi 
Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
---
 doc/src/sgml/monitoring.sgml  |   2 +
 src/backend/utils/activity/Makefile   |   1 +
 src/backend/utils/activity/meson.build|   1 +
 src/backend/utils/activity/pgstat.c   |  26 ++
 src/backend/utils/activity/pgstat_bgwriter.c  |   7 +-
 .../utils/activity/pgstat_checkpointer.c  |   7 +-
 src/backend/utils/activity/pgstat_io.c| 400 ++
 

Re: Fixing a couple of buglets in how VACUUM sets visibility map bits

2023-01-09 Thread Peter Geoghegan
On Mon, Jan 9, 2023 at 11:57 AM Andres Freund  wrote:
> Afaict we'll need to backpatch this all the way?

I thought that we probably wouldn't need to, at first. But I now think
that we really have to.

I didn't realize that affected visibilitymap_set() calls could
generate useless set-VM WAL records until you pointed it out. That's
far more likely to happen than the race condition that I described --
it has nothing at all to do with concurrency. That's what clinches it
for me.

> > One of the calls to visibilitymap_set() during VACUUM's initial heap
> > pass could unset a page's all-visible bit during the process of setting
> > the same page's all-frozen bit.
>
> As just mentioned upthread, this just seems wrong.

I don't know why this sentence ever made sense to me. Anyway, it's not
important now.

> Do you have a reproducer for this?

No, but I'm quite certain that the race can happen.

If it's important to have a reproducer then I can probably come up
with one. I could likely figure out a way to write an isolation test
that reliably triggers the issue. It would have to work by playing
games with cleanup lock/buffer pin waits, since that's the only thing
that the test can hook into to make things happen in just the
right/wrong order.

> >   elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
> >        vacrel->relname, blkno);
>
> Hm. The message gets a bit less accurate with the change. Perhaps OK? OTOH, it
> might be useful to know what bit was wrong when debugging problems.

Theoretically it might change again, if we call
visibilitymap_get_status() again. Maybe I should just broaden the
error message a bit instead?

> I still think such changes are inappropriate for a bugfix, particularly one
> that needs to be backpatched.

I'll remove the changes that are inessential in the next revision. I
wouldn't have done it if I'd fully understood the seriousness of the
issue from the start.

If you're really concerned about diff size then I should point out
that the changes to lazy_vacuum_heap_rel() aren't strictly necessary,
and probably shouldn't be backpatched. I deemed that in scope because
it's part of the same overall problem of updating the visibility map
based on potentially stale information. It makes zero sense to check
with the visibility map before updating it when we already know that
the page is all-visible. I mean, are we trying to avoid the work of
needlessly updating the visibility map in cases where its state was
corrupt, but then became uncorrupt (relative to the heap page) by
mistake?

--
Peter Geoghegan




Re: Fixing a couple of buglets in how VACUUM sets visibility map bits

2023-01-09 Thread Robert Haas
On Mon, Jan 2, 2023 at 1:32 PM Peter Geoghegan  wrote:
> On Sat, Dec 31, 2022 at 4:53 PM Peter Geoghegan  wrote:
> > The first patch makes sure that the snapshotConflictHorizon cutoff
> > (XID cutoff for recovery conflicts) is never a special XID, unless
> > that XID is InvalidTransactionId, which is interpreted as a
> > snapshotConflictHorizon value that will never need a recovery conflict
> > (per the general convention for snapshotConflictHorizon values
> > explained above ResolveRecoveryConflictWithSnapshot).
>
> Pushed this just now.
>
> Attached is another very simple refactoring patch for vacuumlazy.c. It
> makes vacuumlazy.c save the result of get_database_name() in vacrel,
> which matches what we already do with things like
> get_namespace_name().
>
> Would be helpful if I could get a +1 on
> v1-0002-Never-just-set-the-all-frozen-bit-in-VM.patch, which is
> somewhat more substantial than the others.

I feel that you should at least have a reproducer for these problems
posted to the thread, and ideally a regression test, before committing
things. I think it's very hard to understand what the problems are
right now.

I don't particularly have a problem with the idea of 0001, because if
we use InvalidTransactionId to mean that there cannot be any
conflicts, we do not need FrozenTransactionId to mean the same thing.
Picking one or the other makes sense. Perhaps we would need two values
if we both needed a value that meant "conflict with nothing" and also
a value that meant "conflict with everything," but in that case I
suppose we would want FrozenTransactionId to be the one that meant
conflict with nothing, since it logically precedes all other XIDs, and
conflicts are with XIDs that precede the value in the record. However,
I don't find the patch very clear, either. It doesn't update any
comments, not even this one:

 /*
  * It's possible that we froze tuples and made the page's XID cutoff
  * (for recovery conflict purposes) FrozenTransactionId.  This is okay
  * because visibility_cutoff_xid will be logged by our caller in a
  * moment.
  */
-Assert(cutoff == FrozenTransactionId ||
+Assert(!TransactionIdIsValid(cutoff) ||
cutoff == prunestate->visibility_cutoff_xid);

Isn't the comment now incorrect as a direct result of the changes in the patch?

As for 0002, I agree that it's bad if we can get into a state where
the all-frozen bit is set and the all-visible bit is not. I'm not
completely sure what concrete harm that will cause, but it does not
seem good. But I also *entirely* agree with Andres that patches should
run around adjusting nearby code - e.g. variable names - in ways that
aren't truly necessary. That just makes life harder, not only for
anyone who wants to review the patch now, but also for future readers
who may need to understand what the patch changed and why.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: doc: add missing "id" attributes to extension packaging page

2023-01-09 Thread Karl O. Pinc
On Mon, 09 Jan 2023 15:18:18 -0500
Tom Lane  wrote:

> I pushed the ID-addition patch, with a few fixes:
> 
> * AFAIK our practice is to use "-" never "_" in XML ID attributes.
> You weren't very consistent about that even within this patch, and
> the overall effect would have been to have no standard about that
> at all, which doesn't seem great.  I changed them all to "-".

Apologies for not catching this.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-01-09 Thread Tom Lane
Andres Freund  writes:
> On 2023-01-07 13:50:04 -0500, Tom Lane wrote:
>> Also ... maybe I am missing something, but is REPLICA IDENTITY FULL
>> sanely defined in the first place?  It looks to me that
>> RelationFindReplTupleSeq assumes without proof that there is a unique
>> full-tuple match, but that is only reasonable to assume if there is at
>> least one unique index (and maybe not even then, if nulls are involved).

> If the table definition match between publisher and standby, it doesn't matter
> which tuple is updated, if all columns are used to match. Since there's
> nothing distinguishing two rows with all columns being equal, it doesn't
> matter which we update.

Yeah, but the point here is precisely that they might *not* match;
for example there could be extra columns in the subscriber's table.
This may be largely a documentation problem, though --- I think my
beef is mainly that there's nothing in our docs explaining the
semantic pitfalls of FULL, we only say "it's slow".
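
A concrete sketch of that pitfall (hypothetical tables):

-- publisher
CREATE TABLE t (a int, b int);
ALTER TABLE t REPLICA IDENTITY FULL;

-- subscriber: same name, but an extra column, plus two rows that are
-- indistinguishable in the replicated columns
CREATE TABLE t (a int, b int, note text);
INSERT INTO t VALUES (1, 1, 'keep me'), (1, 1, 'no, me');

-- An UPDATE of the (1,1) row on the publisher matches both subscriber
-- rows, and RelationFindReplTupleSeq will change whichever it finds first.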

Anyway, to get back to the point at hand: if we do have a REPLICA IDENTITY
FULL situation then we can make use of any unique index over a subset of
the transmitted columns, and if there's more than one candidate index
it's unlikely to matter which one we pick.  Given your comment I guess
we have to also compare the non-indexed columns, so we can't completely
convert the FULL case to the straight index case.  But still it doesn't
seem to me to be appropriate to use the planner to find a suitable index.

regards, tom lane




Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-09 Thread Thomas Munro
On Tue, Jan 10, 2023 at 8:34 AM Andres Freund  wrote:
> A different approach would be to represent fxids as *signed* 64bit
> integers. That'd of course loose more range, but could represent everything
> accurately, and would have a compatible on-disk representation on two's
> complement platforms (all our platforms). I think the only place that'd need
> special treatment is U64FromFullTransactionId() / its callers.  I think this
> might be the most robust approach.

It does sound like an interesting approach; it means you are free to
retreat arbitrarily without ever thinking about it, and by the
arguments given (LSN space required to consume fxids) it's still
'enough'.  Essentially all these bugs are places where the author
already believed it worked that way.

(Two's complement is required in the C23 draft.)
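
A tiny illustration of that property (illustrative type, not a proposed
API):

#include <assert.h>
#include <stdint.h>

typedef int64_t SignedFullXid;	/* hypothetical signed representation */

int
main(void)
{
	SignedFullXid horizon = 100;

	/* Retreating below the first fxid just goes negative... */
	SignedFullXid deferred = horizon - 1000000;

	/* ...and ordinary signed comparison still orders it correctly. */
	assert(deferred < horizon);
	return 0;
}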




Re: doc: add missing "id" attributes to extension packaging page

2023-01-09 Thread Tom Lane
Brar Piening  writes:
> On 09.01.2023 at 03:38, vignesh C wrote:
>> There are couple of commitfest entries for this:
>> https://commitfest.postgresql.org/41/4041/
>> https://commitfest.postgresql.org/41/4042/ Can one of them be closed?

> I've split the initial patch into two parts upon Álvaro's request in [1]
> so that we can discuss them separately

It's not great to have multiple CF entries pointing at the same email
thread --- it confuses both people and bots.  Next time please split
off a thread for each distinct patch.

I pushed the ID-addition patch, with a few fixes:

* AFAIK our practice is to use "-" never "_" in XML ID attributes.
You weren't very consistent about that even within this patch, and
the overall effect would have been to have no standard about that
at all, which doesn't seem great.  I changed them all to "-".

* I got rid of a couple of "-et-al" additions, because it did not
seem like a good precedent.  That would tempt people to modify
existing ID tags when adding variables to an entry, which'd defeat
the purpose I think.

* I fixed a couple of things that looked like typos or unnecessary
inconsistencies.  I have to admit that my eyes glazed over after
awhile, so there might be remaining infelicities.

It's probably going to be necessary to have follow-on patches,
because I'm sure there is stuff in the pipeline that adds more
ID-less tags.  Or do we have a way to create warnings about that?

I'm unqualified to review CSS stuff, so you'll need to get somebody
else to review that patch.  But I'd suggest reposting it, else
the cfbot is going to start whining that the patch-of-record in
this thread no longer applies.

regards, tom lane




Re: Split index and table statistics into different types of stats

2023-01-09 Thread Andres Freund
Hi,

On 2023-01-09 17:08:42 +0530, Nitin Jadhav wrote:
> +1 to keep common functions/code between table and index stats. Only
> the data structure should be different as the goal is to shrink the
> current memory usage.

I don't think the goal is solely to shrink memory usage - it's also to make it
possible to add more stats that are specific to just indexes or just
tables. Of course that's related to memory usage...

Greetings,

Andres Freund




Re: Fixing a couple of buglets in how VACUUM sets visibility map bits

2023-01-09 Thread Peter Geoghegan
On Mon, Jan 9, 2023 at 11:44 AM Andres Freund  wrote:
> I think that's just an imprecise formulation though - the problem is that we
> can call visibilitymap_set() with just VISIBILITYMAP_ALL_FROZEN, even though
> VISIBILITYMAP_ALL_VISIBLE was concurrently unset.

That's correct.

You're right that my description of the problem from the commit
message was confusing. But we're on the same page about the problem
now.

> ISTM that we ought to update all_visible_according_to_vm from
> PageIsAllVisible() once we've locked the page. Even if we avoid this specific
> case, it seems a recipe for future bugs to have a potentially outdated
> variable around.

I basically agree, but some of the details are tricky.

As I mentioned already, my work on visibility map snapshots just gets
rid of all_visible_according_to_vm, which is my preferred long term
approach. We will very likely need to keep all_visible_according_to_vm
as a cache for performance reasons for as long as we have
SKIP_PAGES_THRESHOLD.

Can we just update all_visible_according_to_vm using
PageIsAllVisible(), without making all_visible_according_to_vm
significantly less useful as a cache? Maybe. Not sure offhand.

-- 
Peter Geoghegan




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-01-09 Thread Andres Freund
Hi,

On 2023-01-07 13:50:04 -0500, Tom Lane wrote:
> I think we should either live within the constraints set by this
> overarching design, or else nuke execReplication.c from orbit and
> start using the real planner and executor.  Perhaps the foreign
> key enforcement mechanisms could be a model --- although if you
> don't want to buy into using SPI as well, you probably should look
> at Amit L's work at [1].

I don't think using the full executor for every change is feasible from an
overhead perspective. But it might make sense to bail out to using the full
executor in a bunch of non-fastpath paths.

I think this is basically similar to COPY not using the full executor.

But that doesn't mean that all of this has to be open coded in
execReplication.c. Abstracting pieces so that COPY, logical rep and perhaps
even nodeModifyTable.c can share code makes sense.


> Also ... maybe I am missing something, but is REPLICA IDENTITY FULL
> sanely defined in the first place?  It looks to me that
> RelationFindReplTupleSeq assumes without proof that there is a unique
> full-tuple match, but that is only reasonable to assume if there is at
> least one unique index (and maybe not even then, if nulls are involved).

If the table definition match between publisher and standby, it doesn't matter
which tuple is updated, if all columns are used to match. Since there's
nothing distinguishing two rows with all columns being equal, it doesn't
matter which we update.

Greetings,

Andres Freund




Re: Fixing a couple of buglets in how VACUUM sets visibility map bits

2023-01-09 Thread Andres Freund
Hi,

On 2023-01-09 10:16:03 -0800, Peter Geoghegan wrote:
> Attached is v3, which explicitly checks the need to set the PD_ALL_VISIBLE
> flag at the relevant visibilitymap_set() call site. It also has improved
> comments.

Afaict we'll need to backpatch this all the way?


> From e7788ebdb589fb7c6f866cf53658cc369f9858b5 Mon Sep 17 00:00:00 2001
> From: Peter Geoghegan 
> Date: Sat, 31 Dec 2022 15:13:01 -0800
> Subject: [PATCH v3] Don't accidentally unset all-visible bit in VM.
> 
> One of the calls to visibilitymap_set() during VACUUM's initial heap
> pass could unset a page's all-visible bit during the process of setting
> the same page's all-frozen bit.

As just mentioned upthread, this just seems wrong.


> This could happen in the event of a
> concurrent HOT update from a transaction that aborts soon after.  Since
> the all_visible_according_to_vm local variable that lazy_scan_heap works
> off of when setting the VM doesn't reflect the current state of the VM,
> and since visibilitymap_set() just requested that the all-frozen bit get
> set in one case, there was a race condition.  Heap pages could initially
> be all-visible just as all_visible_according_to_vm is established, then
> not be all-visible after the update, and then become eligible to be set
> all-visible once more following pruning by VACUUM.  There is no reason
> why VACUUM can't remove a concurrently aborted heap-only tuple right
> away, and so no reason why such a page won't be able to reach the
> relevant visibilitymap_set() call site.

Do you have a reproducer for this?


> @@ -1120,8 +1123,8 @@ lazy_scan_heap(LVRelState *vacrel)
>    * got cleared after lazy_scan_skip() was called, so we must recheck
>    * with buffer lock before concluding that the VM is corrupt.
>    */
> - else if (all_visible_according_to_vm && !PageIsAllVisible(page)
> -  && VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer))
> + else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
> +  visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
>   {
>   elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
>    vacrel->relname, blkno);

Hm. The message gets a bit less accurate with the change. Perhaps OK? OTOH, it
might be useful to know what bit was wrong when debugging problems.


> @@ -1164,12 +1167,34 @@ lazy_scan_heap(LVRelState *vacrel)
>    !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
>   {
>   /*
> -  * We can pass InvalidTransactionId as the cutoff XID here,
> -  * because setting the all-frozen bit doesn't cause recovery
> -  * conflicts.
> +  * Avoid relying on all_visible_according_to_vm as a proxy for the
> +  * page-level PD_ALL_VISIBLE bit being set, since it might have
> +  * become stale -- even when all_visible is set in prunestate.
> +  *
> +  * Consider the example of a page that starts out all-visible and
> +  * then has a tuple concurrently deleted by an xact that aborts.
> +  * The page will be all_visible_according_to_vm, and will have
> +  * all_visible set in prunestate.  It will nevertheless not have
> +  * PD_ALL_VISIBLE set by here (plus neither VM bit will be set).
> +  * And so we must check if PD_ALL_VISIBLE needs to be set.
>    */
> + if (!PageIsAllVisible(page))
> + {
> + PageSetAllVisible(page);
> + MarkBufferDirty(buf);
> + }
> +
> + /*
> +  * Set the page all-frozen (and all-visible) in the VM.
> +  *
> +  * We can pass InvalidTransactionId as our visibility_cutoff_xid,
> +  * since a snapshotConflictHorizon sufficient to make everything
> +  * safe for REDO was logged when the page's tuples were frozen.
> +  */
> + Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
>   visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
>     vmbuffer, InvalidTransactionId,
> +   VISIBILITYMAP_ALL_VISIBLE |
>     VISIBILITYMAP_ALL_FROZEN);
>   }
>
> @@ -1311,7 +1336,11 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber

Re: Fixing a couple of buglets in how VACUUM sets visibility map bits

2023-01-09 Thread Andres Freund
Hi,

On 2023-01-08 16:27:59 -0800, Peter Geoghegan wrote:
> On Sun, Jan 8, 2023 at 3:53 PM Andres Freund  wrote:
> > How?
> 
> See the commit message for the scenario I have in mind, which involves
> a concurrent HOT update that aborts.

I looked at it. I specifically was wondering about this part of it:
> One of the calls to visibilitymap_set() during VACUUM's initial heap
> pass could unset a page's all-visible bit during the process of setting
> the same page's all-frozen bit.

Which I just don't see as possible, due to visibilitymap_set() simply never
unsetting bits.

I think that's just an imprecise formulation though - the problem is that we
can call visibilitymap_set() with just VISIBILITYMAP_ALL_FROZEN, even though
VISIBILITYMAP_ALL_VISIBLE was concurrently unset.

ISTM that we ought to update all_visible_according_to_vm from
PageIsAllVisible() once we've locked the page. Even if we avoid this specific
case, it seems a recipe for future bugs to have a potentially outdated
variable around.

Greetings,

Andres Freund




Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-09 Thread Andres Freund
Hi,

Robert, Mark, CCing you because of the amcheck aspect (see below).

On 2023-01-09 17:50:10 +0100, Matthias van de Meent wrote:
> On Sun, 8 Jan 2023 at 04:09, Andres Freund  wrote:
> > > For a bit I was thinking that we should introduce the notion that a
> > > FullTransactionId can be from the past. Specifically that when the upper
> > > 32bit are all set, we treat the lower 32bit as a value from before xid 0
> > > using the normal 32bit xid arithmetic.  But it sucks to add overhead for
> > > that everywhere.
> > >
> > > It might be a bit more palatable to designate one individual value,
> > > e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far
> > > before the start of the universe an xid point to...
> >
> > On IM Thomas suggested we could reserve the 2^32-1 epoch for invalid
> > values. I hacked up a patch that converts various fxid functions to
> > inline functions
> > with such assertions, and it indeed quickly catches the problem this thread
> > reported, close to the source of the use.
> 
> Wouldn't it be enough to only fix the constructions in
> FullXidRelativeTo() and widen_snapshot_xid() (as attached, $topic does
> not occur with the patch), and (optionally) bump the first XID
> available for any cluster to (FirstNormalXid + 1) to retain the 'older
> than any running transaction' property?

It's not too hard to fix in individual places, but I suspect that we'll
introduce the bug in future places without some more fundamental protection.

Locally I fixed it by clamping vacuum_defer_cleanup_age to a reasonable value
in ComputeXidHorizons() and GetSnapshotData().
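
The shape of that clamping is roughly the following (assumed names; the
attached patch is authoritative):

#include <stdint.h>

#define FirstNormalTransactionId	((uint64_t) 3)

/*
 * Subtract vacuum_defer_cleanup_age from a horizon xid, but never step
 * back past the xids that can actually exist: the full xid tells us how
 * many normal xids really precede the horizon.  (The real code also has
 * to skip the special xids 0-2 when crossing an epoch boundary, which is
 * omitted here.)
 */
static uint32_t
clamped_horizon(uint64_t full_xid, uint32_t defer_age)
{
	uint64_t	preceding = full_xid - FirstNormalTransactionId;

	if (defer_age > preceding)
		defer_age = (uint32_t) preceding;
	return (uint32_t) (full_xid - defer_age);
}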

Fixing it in FullXidRelativeTo() doesn't seem quite right; while it's OK to
just return FirstNormalTransactionId in the case of vacuum_defer_cleanup_age,
it doesn't seem necessarily correct for other cases.


> The change only fixes the issue for FullTransactionId, which IMO is OK
> - I don't think we need to keep xid->xid8->xid symmetric in cases of
> xid8 wraparound.

I think we should keep that symmetric, it just gets too confusing / easy to
miss bugs otherwise.


> > One issue with that is that it'd reduce what can be input for the xid8
> > type. But it's hard to believe that'd be a real issue?
> 
> Yes, it's unlikely anyone would ever hit that with our current WAL
> format - we use 24 bytes /xid just to log its use, so we'd use at
> most epoch 0x1000_ in unrealistic scenarios. In addition;
> technically, we already have (3*2^32 - 3) "invalid" xid8 values that
> can never be produced in FullXidRelativeTo - those few extra invalid
> values don't matter much to me except "even more special casing".

Yep. The attached 0002 is a first implementation of this.

The new assertions found at least one bug in amcheck, and one further example
of the problem of representing past 32 xids in 64bit:

1) Because ctx->next_xid is set after the XidFromFullTransactionId() call in
update_cached_xid_range(), we end up using the xid 0 (or an outdated value in
subsequent calls) to determine whether epoch needs to be reduced.

2) One test includes an xid from the future (4026531839), which
causes epoch to wrap around (via the epoch--) in
FullTransactionIdFromXidAndCtx(). I've hackily fixed that by just representing
it as an xid from the future instead. But not sure that's a good answer.


A different approach would be to represent fxids as *signed* 64bit
integers. That'd of course loose more range, but could represent everything
accurately, and would have a compatible on-disk representation on two's
complement platforms (all our platforms). I think the only place that'd need
special treatment is U64FromFullTransactionId() / its callers.  I think this
might be the most robust approach.

Greetings,

Andres Freund
>From 13450d14a1e064dd958b516ffa36b7a24bb49d7b Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 9 Jan 2023 10:23:10 -0800
Subject: [PATCH v2 1/3] Fix corruption due to vacuum_defer_cleanup_age
 underflowing 64bit xids

Author:
Reviewed-by:
Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63...@awork3.anarazel.de
Backpatch:
---
 src/backend/storage/ipc/procarray.c | 73 -
 1 file changed, 61 insertions(+), 12 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4340bf96416..d7d18ba8c12 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -367,6 +367,7 @@ static inline void ProcArrayEndTransactionInternal(PGPROC *proc, TransactionId l
 static void ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid);
 static void MaintainLatestCompletedXid(TransactionId latestXid);
 static void MaintainLatestCompletedXidRecovery(TransactionId latestXid);
+static int ClampVacuumDeferCleanupAge(FullTransactionId rel, TransactionId xid);
 
 static inline FullTransactionId FullXidRelativeTo(FullTransactionId rel,
   TransactionId xid);
@@ -1888,17 +1889,32 @@ 

Re: [PATCH] random_normal function

2023-01-09 Thread Tom Lane
I wrote:
> Hmm ... it occurred to me to try the same check on the existing
> random() tests (attached), and darn if they don't fail even more
> often, usually within 50K iterations.  So maybe we should rethink
> that whole thing.

I pushed Paul's patch with the previously-discussed tests, but
the more I look at random.sql the less I like it.  I propose
that we nuke the existing tests from orbit and replace with
something more or less as attached.  This is faster than what
we have, removes the unnecessary dependency on the "onek" table,
and I believe it to be a substantially more thorough test of the
random functions' properties.  (We could probably go further
than this, like trying to verify distribution properties.  But
it's been too long since college statistics for me to want to
write such tests myself, and I'm not real sure we need them.)
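
As a quick sanity check of the one-in-a-billion budget, the birthday-problem
bound for the 1000-value duplicate test can be eyeballed directly in SQL:

-- P(any collision) ~= n*(n-1)/2 / 2^52 for n draws from 2^52 values
SELECT (1000.0 * 999 / 2) / 2 ^ 52 AS approx_collision_prob;
-- ~1.1e-10, comfortably under 1e-9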

BTW, if this does bring the probability of failure down to the
one-in-a-billion range, I think we could also nuke the whole
"ignore:" business, simplifying pg_regress and allowing the
random test to be run in parallel with others.

regards, tom lane

diff --git a/src/test/regress/expected/random.out b/src/test/regress/expected/random.out
index 30bd866138..2a9249f3cb 100644
--- a/src/test/regress/expected/random.out
+++ b/src/test/regress/expected/random.out
@@ -1,81 +1,110 @@
 --
 -- RANDOM
--- Test the random function
+-- Test random() and allies
 --
--- count the number of tuples originally, should be 1000
-SELECT count(*) FROM onek;
- count 
--------
-  1000
+-- Tests in this file may have a small probability of failure,
+-- since we are dealing with randomness.  Try to keep the failure
+-- risk for any one test case under 1e-9.
+--
+-- There should be no duplicates in 1000 random() values.
+-- (Assuming 52 random bits in the float8 results, we could
+-- take as many as 3000 values and still have less than 1e-9 chance
+-- of failure, per https://en.wikipedia.org/wiki/Birthday_problem)
+SELECT r, count(*)
+FROM (SELECT random() r FROM generate_series(1, 1000)) ss
+GROUP BY r HAVING count(*) > 1;
+ r | count 
+---+---
+(0 rows)
+
+-- The range should be [0, 1).  We can expect that at least one out of 2000
+-- random values is in the lowest or highest 1% of the range with failure
+-- probability less than about 1e-9.
+SELECT count(*) FILTER (WHERE r < 0 OR r >= 1) AS out_of_range,
+   (count(*) FILTER (WHERE r < 0.01)) > 0 AS has_small,
+   (count(*) FILTER (WHERE r > 0.99)) > 0 AS has_large
+FROM (SELECT random() r FROM generate_series(1, 2000)) ss;
+ out_of_range | has_small | has_large 
+--+---+---
+0 | t | t
 (1 row)
 
--- pick three random rows, they shouldn't match
-(SELECT unique1 AS random
-  FROM onek ORDER BY random() LIMIT 1)
-INTERSECT
-(SELECT unique1 AS random
-  FROM onek ORDER BY random() LIMIT 1)
-INTERSECT
-(SELECT unique1 AS random
-  FROM onek ORDER BY random() LIMIT 1);
- random 
---------
-(0 rows)
-
--- count roughly 1/10 of the tuples
-CREATE TABLE RANDOM_TBL AS
-  SELECT count(*) AS random
-  FROM onek WHERE random() < 1.0/10;
--- select again, the count should be different
-INSERT INTO RANDOM_TBL (random)
-  SELECT count(*)
-  FROM onek WHERE random() < 1.0/10;
--- select again, the count should be different
-INSERT INTO RANDOM_TBL (random)
-  SELECT count(*)
-  FROM onek WHERE random() < 1.0/10;
--- select again, the count should be different
-INSERT INTO RANDOM_TBL (random)
-  SELECT count(*)
-  FROM onek WHERE random() < 1.0/10;
--- now test that they are different counts
-SELECT random, count(random) FROM RANDOM_TBL
-  GROUP BY random HAVING count(random) > 3;
- random | count 
---------+-------
-(0 rows)
-
-SELECT AVG(random) FROM RANDOM_TBL
-  HAVING AVG(random) NOT BETWEEN 80 AND 120;
- avg 
------
-(0 rows)
-
 -- now test random_normal()
-TRUNCATE random_tbl;
-INSERT INTO random_tbl (random)
-  SELECT count(*)
-  FROM onek WHERE random_normal(0, 1) < 0;
-INSERT INTO random_tbl (random)
-  SELECT count(*)
-  FROM onek WHERE random_normal(0) < 0;
-INSERT INTO random_tbl (random)
-  SELECT count(*)
-  FROM onek WHERE random_normal() < 0;
-INSERT INTO random_tbl (random)
-  SELECT count(*)
-  FROM onek WHERE random_normal(stddev => 1, mean => 0) < 0;
--- expect similar, but not identical values
-SELECT random, count(random) FROM random_tbl
-  GROUP BY random HAVING count(random) > 3;
- random | count 
---------+-------
+-- As above, there should be no duplicates in 1000 random_normal() values.
+SELECT r, count(*)
+FROM (SELECT random_normal() r FROM generate_series(1, 1000)) ss
+GROUP BY r HAVING count(*) > 1;
+ r | count 
+---+---
 (0 rows)
 
--- approximately check expected distribution
-SELECT AVG(random) FROM random_tbl
-  HAVING AVG(random) NOT BETWEEN 400 AND 600;
- avg 
------
-(0 rows)
+-- ... unless we force the range (standard deviation) to zero.
+-- This is a good place to check that the mean input does something, too.
+SELECT r, count(*)

Re: Add SHELL_EXIT_CODE to psql

2023-01-09 Thread Corey Huinker
On Mon, Jan 9, 2023 at 10:01 AM Maxim Orlov  wrote:

> Hi!
>
> In overall, I think we move in the right direction. But we could make code
> better, should we?
>
> +   /* Capture exit code for SHELL_EXIT_CODE */
> +   close_exit_code = pclose(fd);
> +   if (close_exit_code == -1)
> +   {
> +   pg_log_error("%s: %m", cmd);
> +   error = true;
> +   }
> +   if (WIFEXITED(close_exit_code))
> +   exit_code=WEXITSTATUS(close_exit_code);
> +   else if(WIFSIGNALED(close_exit_code))
> +   exit_code=WTERMSIG(close_exit_code);
> +   else if(WIFSTOPPED(close_exit_code))
> +   exit_code=WSTOPSIG(close_exit_code);
> +   if (exit_code)
> +   error = true;
> I think it's better to add spaces around the middle if block. It will be
> easier to read.
> Also, consider adding spaces around the assignments in this block.
>

Noted and will implement.
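
For reference, the quoted block with the requested whitespace applied would
read (same logic, just reformatted):

	/* Capture exit code for SHELL_EXIT_CODE */
	close_exit_code = pclose(fd);
	if (close_exit_code == -1)
	{
		pg_log_error("%s: %m", cmd);
		error = true;
	}

	if (WIFEXITED(close_exit_code))
		exit_code = WEXITSTATUS(close_exit_code);
	else if (WIFSIGNALED(close_exit_code))
		exit_code = WTERMSIG(close_exit_code);
	else if (WIFSTOPPED(close_exit_code))
		exit_code = WSTOPSIG(close_exit_code);

	if (exit_code)
		error = true;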


> +   /*
> +   snprintf(exit_code_buf, sizeof(exit_code_buf), "%d",
> WEXITSTATUS(exit_code));
> +   */
> Probably, this is not needed.
>

Heh. Oops.


> > 1. pg_regress now creates an environment variable called PG_OS_TARGET
> Maybe, we can use env "OS"? I do not know much about Windows, but I think
> this is kind of standard environment variable there.
>

I chose a name that would avoid collisions with anything a user might
potentially throw into their environment, so the fact that the var "OS" is
fairly standard is a reason to avoid using it.
allows us to stay in perfect synchronization with the build's #ifdef WIN32
... and whatever #ifdefs may come in the future for new OSes. If there is
already an environment variable that does that for us, I would rather use
that, but I haven't found it.

The 0001 patch is unchanged from last time (aside from anything rebasing
might have done).

>
From cb2ac7393cf0ce0a7c336dfc749ff91142a88b09 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Tue, 3 Jan 2023 22:16:20 -0500
Subject: [PATCH v4 1/2] Add PG_OS_TARGET environment variable to enable
 OS-specific changes to regression tests.

---
 src/test/regress/pg_regress.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 40e6c231a3..56c59e0b10 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -595,6 +595,17 @@ initialize_environment(void)
 #endif
 	}
 
+		/*
+		 * Regression tests that invoke OS commands must occasionally use
+		 * OS-specific syntax (example: NUL vs /dev/null). This env var allows
+		 * those tests to adjust commands accordingly.
+		 */
+#if defined(WIN32)
+		setenv("PG_OS_TARGET", "WIN32", 1);
+#else
+		setenv("PG_OS_TARGET", "OTHER", 1);
+#endif
+
 	/*
 	 * Set translation-related settings to English; otherwise psql will
 	 * produce translated messages and produce diffs.  (XXX If we ever support
-- 
2.39.0

From 9d4bec9e046a80f89ce484e2dc4f25c8f1d99e9c Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Mon, 9 Jan 2023 13:25:08 -0500
Subject: [PATCH v4 2/2] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE
 which report the success or failure of the most recent psql OS-command
 executed via the \! command or `backticks`. These variables are roughly
 analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL
 commands.

---
 doc/src/sgml/ref/psql-ref.sgml | 21 +
 src/bin/psql/command.c |  8 +++
 src/bin/psql/help.c|  4 
 src/bin/psql/psqlscanslash.l   | 38 ++
 src/test/regress/expected/psql.out | 25 
 src/test/regress/sql/psql.sql  | 21 +
 6 files changed, 112 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 3f994a3592..711a6d24fd 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4264,6 +4264,27 @@ bar
 
   
 
+  
+   SHELL_ERROR
+   
+
+ true if the last shell command failed, false if
+ it succeeded. This applies to shell commands invoked via either \!
+ or `. See also SHELL_EXIT_CODE.
+
+   
+  
+
+  
+   SHELL_EXIT_CODE
+   
+
+ The exit code returned by the last shell command, invoked via either \! or `.
+ 0 means no error. See also SHELL_ERROR.
+
+   
+  
+
   
 SHOW_ALL_RESULTS
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b5201edf55..5d15c2143b 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -5005,6 +5005,7 @@ static bool
 do_shell(const char *command)
 {
 	int			result;
+	char		exit_code_buf[32];
 
 	fflush(NULL);
 	if (!command)
@@ -5032,6 +5033,13 @@ do_shell(const char *command)
 

Re: Allow +group in pg_ident.conf

2023-01-09 Thread Nathan Bossart
On Mon, Jan 09, 2023 at 08:00:26AM -0500, Andrew Dunstan wrote:
> +   If the database-username begins with a
> +   + character, then the operating system user can log in as
> +   any user belonging to that role, similarly to how user names beginning
> +   with + are treated in pg_hba.conf.

I would suggest making it clear that this means role membership and not
privileges via INHERIT.
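
For example (hypothetical map name), a line like this would let OS user
alice log in as any role that is a member of admins:

# MAPNAME       SYSTEM-USERNAME         PG-USERNAME
mymap           alice                   +admins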

> - if (case_insensitive)
> + if (regexp_pgrole[0] == '+')
> + {
> + Oid roleid = get_role_oid(pg_role, true);
> + if (is_member(roleid, regexp_pgrole +1))
> + *found_p = true;
> + }
> + else if (case_insensitive)

It looks like the is_member() check will always be case-sensitive.  Should
it respect the value of case_insensitive?  If not, I think there should be
a brief comment explaining why.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-01-09 Thread Önder Kalacı
Hi,

Thank you for the useful comments!


> I took a very brief look through this.  I'm not too pleased with
> this whole line of development TBH.  It seems to me that the core
> design of execReplication.c and related code is "let's build our
> own half-baked executor and much-less-than-half-baked planner,
> because XXX".  (I'm not too sure what XXX was, really, but apparently
> somebody managed to convince people that that is a sane and
> maintainable design.)


This provided me with a broad perspective on the whole of execReplication.c.
Before your comment, I had not thought about why there is specific logic
for the execution of logical replication.

I tried to read the initial commit that added execReplication.c
(665d1fad99e7b11678b0d5fa24d2898424243cd6)
and the main relevant mail thread (PostgreSQL: Re: Logical Replication WIP),
but I couldn't find any references to this decision. Maybe I'm missing
something?

Regarding the planner: as far as I can tell, before my patch there was
probably no need for any planner infrastructure. The reason seems to be
that logical replication needs either a sequential scan for REPLICA
IDENTITY FULL or an index scan for the primary key / unique index. I'm not
suggesting that we shouldn't use the planner at all, just trying to
understand the design choices that were made earlier.


> Now this patch has decided that it *will*
> use the real planner, or at least portions of it in some cases.
> If we're going to do that ISTM we ought to replace all the existing
> not-really-a-planner logic, but this has not done that; instead
> we have a large net addition to the already very duplicative
> replication code, with weird restrictions because it doesn't want
> to make changes to the half-baked executor.
>

That sounds like one good perspective on the restrictions that this patch
adds.
From my perspective, I wanted to fit into the existing execReplication.c,
which only works for primary keys / unique keys. And, if you look closely,
the restrictions I suggest are actually the same as (or similar to) the
restrictions for REPLICA IDENTITY ... USING INDEX.
I hope/assume this is no surprise for you and not too hard to explain to
the users.


>
> I think we should either live within the constraints set by this
> overarching design, or else nuke execReplication.c from orbit and
> start using the real planner and executor.  Perhaps the foreign
> key enforcement mechanisms could be a model --- although if you
> don't want to buy into using SPI as well, you probably should look
> at Amit L's work at [1].
>

This sounds like a good long-term plan to me. Are you also suggesting we do
that before this patch?

I think that such a change is a non-trivial / XL project, which I could
likely not achieve by myself in a reasonable time frame.


>
> Also ... maybe I am missing something, but is REPLICA IDENTITY FULL
> sanely defined in the first place?  It looks to me that
> RelationFindReplTupleSeq assumes without proof that there is a unique
> full-tuple match, but that is only reasonable to assume if there is at
> least one unique index (and maybe not even then, if nulls are involved).
>

In general, RelationFindReplTupleSeq is OK with not matching any tuples.
So, I'm not sure that uniqueness is required.

Even if there are multiple matches, RelationFindReplTupleSeq makes only one
change at a time. My understanding is that if there are multiple matches on
the source, they are generated as different messages, and each message
triggers RelationFindReplTupleSeq.



> If there is a unique index, why can't that be chosen as replica identity?
> If there isn't, what semantics are we actually providing?
>

I'm not sure I can fully follow this question. In this patch, I'm trying to
allow non-unique indexes to be used in the subscription. And the target
could have multiple indexes.

So, the semantics are that we automatically allow users to use non-unique
indexes on the subscription side even if the replica identity is full on
the source.

The reasons we (a) use the planner and (b) don't ask users which index to
use are that it'd be very inconvenient for any user to pick an index among
multiple indexes on the subscription.

If there is a unique index, the expectation is that the user would pick
REPLICA IDENTITY .. USING INDEX or just make it the primary key.
In those cases, this patch would not interfere with the existing logic.


> What I'm thinking about is that maybe REPLICA IDENTITY FULL should be
> defined as "the subscriber can pick any unique index to match on,
> and is allowed to fail if the table has none".  Or if "fail" is a bridge
> too far for you, we could fall back to the existing seqscan logic.
> But thumbing through the existing indexes to find a non-expression unique
> index wouldn't require invoking the full planner.  Any candidate index
> would result in a plan 

Re: Fixing a couple of buglets in how VACUUM sets visibility map bits

2023-01-09 Thread Peter Geoghegan
On Sun, Jan 8, 2023 at 6:43 PM Peter Geoghegan  wrote:
> On further reflection even v2 won't repair the page-level
> PD_ALL_VISIBLE flag in passing in this scenario. ISTM that on HEAD we
> might actually leave the all-frozen bit set in the VM, while both the
> all-visible bit and the page-level PD_ALL_VISIBLE bit remain unset.
> Again, all due to the approach we take with
> all_visible_according_to_vm, which can go stale independently of both
> the VM bit being unset and the PD_ALL_VISIBLE bit being unset (in my
> example problem scenario).

Attached is v3, which explicitly checks the need to set the
PD_ALL_VISIBLE flag at the relevant visibilitymap_set() call site. It
also has improved comments.

-- 
Peter Geoghegan


v3-0001-Don-t-accidentally-unset-all-visible-bit-in-VM.patch
Description: Binary data


Re: Common function for percent placeholder replacement

2023-01-09 Thread Nathan Bossart
On Mon, Jan 09, 2023 at 09:36:12AM +0100, Peter Eisentraut wrote:
> On 04.01.23 01:37, Nathan Bossart wrote:
>> On Tue, Dec 20, 2022 at 06:30:40AM +0100, Peter Eisentraut wrote:
>> > + * A value may be NULL.  If the corresponding placeholder is found in the
>> > + * input string, the whole function returns NULL.
>> 
>> This appears to be carried over from BuildRestoreCommand(), and AFAICT it
>> is only necessary because pg_rewind doesn't support %r in restore_command.
>> IMHO this behavior is counterintuitive and possibly error-prone and should
>> result in an ERROR instead.  Since pg_rewind is the only special case, it
>> could independently check for %r before building the command.
> 
> Yeah, this annoyed me, too.  I have now changed it so that a NULL "value" is
> the same as an unsupported placeholder.  This preserves the existing
> behavior.
> 
> (Having pg_rewind check for %r itself would probably require replicating
> most of the string processing logic (consider something like "%%r"), so it
> didn't seem appealing.)

Sounds good to me.
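
The "%%r" subtlety, as a standalone sketch (illustrative helper, not the
patch's API):

#include <stdbool.h>

/*
 * Does "s" contain the placeholder %p?  A plain substring search is not
 * enough, because "%%" is an escaped literal percent.
 */
static bool
has_placeholder(const char *s, char p)
{
	for (; *s; s++)
	{
		if (*s != '%')
			continue;
		if (s[1] == '%')
			s++;				/* skip escaped percent */
		else if (s[1] == p)
			return true;
	}
	return false;
}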

> + nativePath = pstrdup(path);
> + make_native_path(nativePath);

> + nativePath = pstrdup(xlogpath);
> + make_native_path(nativePath);

Should these be freed?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: add \dpS to psql

2023-01-09 Thread Nathan Bossart
On Sat, Jan 07, 2023 at 11:18:59AM +, Dean Rasheed wrote:
> It might be true that temp tables aren't usually interesting from a
> permissions point of view, but it's not hard to imagine situations
> where interesting things do happen. It's also probably the case that
> most users won't have many temp tables, so I don't think including
> them by default will be particularly intrusive.
> 
> Also, from a user perspective, I think it would be something of a POLA
> violation for \dp[S] and \dt[S] to include different sets of tables,
> though I appreciate that we do that now. There's nothing in the docs
> to indicate that that's the case.

Agreed.

> Anyway, I've pushed the v2 patch as-is. If anyone feels strongly
> enough that we should change its behaviour for temp tables, then we
> can still discuss that.

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: MERGE ... RETURNING

2023-01-09 Thread Dean Rasheed
On Mon, 9 Jan 2023 at 16:23, Vik Fearing  wrote:
>
> Bikeshedding here.  Instead of Yet Another WITH Clause, could we perhaps
> make a MERGING() function analogous to the GROUPING() function that goes
> with grouping sets?
>
> MERGE ...
> RETURNING *, MERGING('clause'), MERGING('action');
>

Hmm, possibly, but I think that would complicate the implementation quite a bit.

GROUPING() is not really a function (in the sense that there is no
pg_proc entry for it, you can't do "\df grouping", and it isn't
executed with its arguments like a normal function). Rather, it
requires special-case handling in the parser, through to the executor,
and I think MERGING() would be similar.
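For comparison, an ordinary GROUPING() usage that relies on that
parser-level special-casing (illustrative only, not from the patch):

    SELECT a, b, GROUPING(a, b), count(*)
    FROM t
    GROUP BY GROUPING SETS ((a), (b));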

Also, it masks any user function with the same name, and would
probably require MERGING to be some level of reserved keyword.

I'm not sure that's worth it, just to have a more standard-looking
RETURNING list, without a WITH clause.

Regards,
Dean




Re: suppressing useless wakeups in logical/worker.c

2023-01-09 Thread Nathan Bossart
rebased for cfbot

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 2466001a3ae6f94aac8eff45b488765e619bea1b Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 1 Dec 2022 20:50:21 -0800
Subject: [PATCH v2 1/1] suppress unnecessary wakeups in logical/worker.c

---
 src/backend/replication/logical/tablesync.c |  20 +++
 src/backend/replication/logical/worker.c| 156 +++-
 src/include/replication/worker_internal.h   |   3 +
 3 files changed, 142 insertions(+), 37 deletions(-)

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 38dfce7129..88218e1fed 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -419,6 +419,13 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
 
 	Assert(!IsTransactionState());
 
+	/*
+	 * If we've made it past our previously-stored special wakeup time, reset
+	 * it so that it can be recalculated as needed.
+	 */
+	if (next_sync_start <= GetCurrentTimestamp())
+		next_sync_start = PG_INT64_MAX;
+
 	/* We need up-to-date sync state info for subscription tables here. */
	FetchTableStates(&started_tx);
 
@@ -592,6 +599,19 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
  DSM_HANDLE_INVALID);
 		hentry->last_start_time = now;
 	}
+	else if (found)
+	{
+		TimestampTz retry_time = hentry->last_start_time +
+ (wal_retrieve_retry_interval *
+  INT64CONST(1000));
+
+		/*
+		 * Store when we can start the sync worker so that we
+		 * know how long to sleep.
+		 */
+		if (retry_time < next_sync_start)
+			next_sync_start = retry_time;
+	}
 }
 			}
 		}
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 79cda39445..284f11428c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -207,8 +207,6 @@
 #include "utils/syscache.h"
 #include "utils/timeout.h"
 
-#define NAPTIME_PER_CYCLE 1000	/* max sleep time between cycles (1s) */
-
 typedef struct FlushPosition
 {
 	dlist_node	node;
@@ -348,6 +346,26 @@ static XLogRecPtr skip_xact_finish_lsn = InvalidXLogRecPtr;
 /* BufFile handle of the current streaming file */
 static BufFile *stream_fd = NULL;
 
+/*
+ * Reasons to wake up and perform periodic tasks.
+ */
+typedef enum LogRepWorkerWakeupReason
+{
+	LRW_WAKEUP_TERMINATE,
+	LRW_WAKEUP_PING,
+	LRW_WAKEUP_STATUS,
+	NUM_LRW_WAKEUPS
+} LogRepWorkerWakeupReason;
+
+/*
+ * Wake up times for periodic tasks.
+ */
+static TimestampTz wakeup[NUM_LRW_WAKEUPS];
+TimestampTz next_sync_start;
+
+static void LogRepWorkerComputeNextWakeup(LogRepWorkerWakeupReason reason,
+		  TimestampTz now);
+
 typedef struct SubXactInfo
 {
 	TransactionId xid;			/* XID of the subxact */
@@ -3446,10 +3464,9 @@ UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
 static void
 LogicalRepApplyLoop(XLogRecPtr last_received)
 {
-	TimestampTz last_recv_timestamp = GetCurrentTimestamp();
-	bool		ping_sent = false;
 	TimeLineID	tli;
 	ErrorContextCallback errcallback;
+	TimestampTz now;
 
 	/*
 	 * Init the ApplyMessageContext which we clean up after each replication
@@ -3479,6 +3496,12 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
	error_context_stack = &errcallback;
 	apply_error_context_stack = error_context_stack;
 
+	/* Initialize nap wakeup times. */
+	now = GetCurrentTimestamp();
+	for (int i = 0; i < NUM_LRW_WAKEUPS; i++)
+		LogRepWorkerComputeNextWakeup(i, now);
+	next_sync_start = PG_INT64_MAX;
+
 	/* This outer loop iterates once per wait. */
 	for (;;)
 	{
@@ -3495,6 +3518,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 
		len = walrcv_receive(LogRepWorkerWalRcvConn, &buf, &fd);
 
+		now = GetCurrentTimestamp();
 		if (len != 0)
 		{
 			/* Loop to process all available data (without blocking). */
@@ -3518,9 +3542,9 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 	int			c;
 	StringInfoData s;
 
-	/* Reset timeout. */
-	last_recv_timestamp = GetCurrentTimestamp();
-	ping_sent = false;
+	/* Adjust the ping and terminate wakeup times. */
+	LogRepWorkerComputeNextWakeup(LRW_WAKEUP_TERMINATE, now);
+	LogRepWorkerComputeNextWakeup(LRW_WAKEUP_PING, now);
 
 	/* Ensure we are reading the data into our memory context. */
 	MemoryContextSwitchTo(ApplyMessageContext);
@@ -3574,6 +3598,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 }
 
len = walrcv_receive(LogRepWorkerWalRcvConn, &buf, &fd);
+now = GetCurrentTimestamp();
 			}
 		}
 
@@ -3612,14 +3637,43 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 		if (!dlist_is_empty(_mapping))
 			wait_time = WalWriterDelay;
 		else
-			wait_time = NAPTIME_PER_CYCLE;
+		{
+			TimestampTz nextWakeup = PG_INT64_MAX;
+
+			/* Find soonest wakeup time, to limit our nap. */
+			now = GetCurrentTimestamp();
+			for (int i = 0; i < NUM_LRW_WAKEUPS; i++)
+nextWakeup = 

Re: pub/sub - specifying optional parameters without values.

2023-01-09 Thread Zheng Li
Hi,

This documentation change looks good to me. I verified in testing and in code
that the value for boolean parameters in PUB/SUB commands can be omitted, which
is equivalent to specifying TRUE. For example,

CREATE PUBLICATION mypub FOR ALL TABLES WITH (publish_via_partition_root);
is equivalent to
CREATE PUBLICATION mypub FOR ALL TABLES WITH (publish_via_partition_root =
TRUE);

The behavior is due to the following code
https://github.com/postgres/postgres/blob/master/src/backend/commands/define.c#L113

Marking this as ready for committer.

The new status of this patch is: Ready for Committer


Re: wake up logical workers after ALTER SUBSCRIPTION

2023-01-09 Thread Nathan Bossart
rebased for cfbot

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index cf220c3bcb..7661c0c86e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1996,6 +1996,16 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   Waiting to read or update information
about heavyweight locks.
  
+ 
+  LogicalRepLauncherDSA
+  Waiting for logical replication launcher dynamic shared memory
+  allocator access
+ 
+ 
+  LogicalRepLauncherHash
+  Waiting for logical replication launcher shared memory hash table
+  access
+ 
  
   LogicalRepWorker
   Waiting to read or update the state of logical replication
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index baff00dd74..468e5e0bf1 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1504,6 +1504,14 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
 	}
 	list_free(subworkers);
 
+	/*
+	 * Clear the last-start time for the apply worker to free up space.  If
+	 * this transaction rolls back, the launcher might restart the apply worker
+	 * before wal_retrieve_retry_interval milliseconds have elapsed, but that's
+	 * probably okay.
+	 */
+	logicalrep_launcher_delete_last_start_time(subid);
+
 	/*
 	 * Cleanup of tablesync replication origins.
 	 *
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index afb7acddaa..466917569a 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_subscription.h"
 #include "catalog/pg_subscription_rel.h"
 #include "funcapi.h"
+#include "lib/dshash.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -57,6 +58,25 @@ int			max_logical_replication_workers = 4;
 int			max_sync_workers_per_subscription = 2;
 int			max_parallel_apply_workers_per_subscription = 2;
 
+/* an entry in the last-start times hash table */
+typedef struct LauncherLastStartTimesEntry
+{
+	Oid		subid;
+	TimestampTz last_start_time;
+} LauncherLastStartTimesEntry;
+
+/* parameters for the last-start times hash table */
+static const dshash_parameters dsh_params = {
+	sizeof(Oid),
+	sizeof(LauncherLastStartTimesEntry),
+	dshash_memcmp,
+	dshash_memhash,
+	LWTRANCHE_LAUNCHER_HASH
+};
+
+static dsa_area *last_start_times_dsa = NULL;
+static dshash_table *last_start_times = NULL;
+
 LogicalRepWorker *MyLogicalRepWorker = NULL;
 
 typedef struct LogicalRepCtxStruct
@@ -64,6 +84,10 @@ typedef struct LogicalRepCtxStruct
 	/* Supervisor process. */
 	pid_t		launcher_pid;
 
+	/* hash table for last-start times */
+	dsa_handle	last_start_dsa;
+	dshash_table_handle last_start_dsh;
+
 	/* Background workers. */
 	LogicalRepWorker workers[FLEXIBLE_ARRAY_MEMBER];
 } LogicalRepCtxStruct;
@@ -76,6 +100,9 @@ static void logicalrep_worker_onexit(int code, Datum arg);
 static void logicalrep_worker_detach(void);
 static void logicalrep_worker_cleanup(LogicalRepWorker *worker);
 static int	logicalrep_pa_worker_count(Oid subid);
+static void logicalrep_launcher_attach_dshmem(void);
+static void logicalrep_launcher_set_last_start_time(Oid subid, TimestampTz start_time);
+static TimestampTz logicalrep_launcher_get_last_start_time(Oid subid);
 
 static bool on_commit_launcher_wakeup = false;
 
@@ -902,6 +929,9 @@ ApplyLauncherShmemInit(void)
 			memset(worker, 0, sizeof(LogicalRepWorker));
			SpinLockInit(&worker->relmutex);
 		}
+
+		LogicalRepCtx->last_start_dsa = DSM_HANDLE_INVALID;
+		LogicalRepCtx->last_start_dsh = DSM_HANDLE_INVALID;
 	}
 }
 
@@ -947,8 +977,6 @@ ApplyLauncherWakeup(void)
 void
 ApplyLauncherMain(Datum main_arg)
 {
-	TimestampTz last_start_time = 0;
-
 	ereport(DEBUG1,
 			(errmsg_internal("logical replication launcher started")));
 
@@ -983,58 +1011,56 @@ ApplyLauncherMain(Datum main_arg)
 
 		now = GetCurrentTimestamp();
 
-		/* Limit the start retry to once a wal_retrieve_retry_interval */
-		if (TimestampDifferenceExceeds(last_start_time, now,
-	   wal_retrieve_retry_interval))
-		{
-			/* Use temporary context for the database list and worker info. */
-			subctx = AllocSetContextCreate(TopMemoryContext,
-		   "Logical Replication Launcher sublist",
-		   ALLOCSET_DEFAULT_SIZES);
-			oldctx = MemoryContextSwitchTo(subctx);
-
-			/* search for subscriptions to start or stop. */
-			sublist = get_subscription_list();
-
-			/* Start the missing workers for enabled subscriptions. */
-			foreach(lc, sublist)
-			{
-Subscription *sub = (Subscription *) lfirst(lc);
-LogicalRepWorker *w;
+		/* Use temporary context for the database list and worker info. */
+		subctx = AllocSetContextCreate(TopMemoryContext,
+	   "Logical Replication Launcher sublist",
+	 

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-09 Thread Ankit Kumar Pandey



On 09/01/23 17:53, David Rowley wrote:

> We need to keep looping backwards until we find the first WindowClause
> which does not contain the pathkeys of the ORDER BY.

I found the cause of confusion: *first* WindowClause means first from the
forward direction. Since we are looping backwards, I took it as first from
the last.

For example:

select count(*) over (order by a), count(*) over (order by a,b),
count(*) over (order by a,b,c) from abcd order by a,b,c,d;

Here the first window clause is count(*) over (order by a), which we are
using for the order-by pushdown, as sketched below.
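An illustrative plan shape under the pushdown (hand-drawn, not actual
EXPLAIN output):

    WindowAgg  (ORDER BY a, b, c)
      -> WindowAgg  (ORDER BY a, b)
         -> WindowAgg  (ORDER BY a)
            -> Sort  (a, b, c, d)   -- one sort serves all three
                                    -- windows and the final ORDER BY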



This is in sync with the implementation as well.

> I couldn't quite understand why the foreach() loop's
> condition couldn't just be "if (foreach_current_index(l) ==
> sort_pushdown_idx)", but I see that if we don't check if the path is
> already correctly sorted that we could end up pushing the sort down
> onto the path that's already correctly sorted.  We decided we didn't
> want to move the sort around if it does not reduce the amount of
> sorting.

Yes, this was the reason; the current patch handles this without is_sorted
now, which is great.

> All the tests you added still pass. Although, I didn't
> really study the tests yet to see if everything we talked about is
> covered.

It covers general cases and exceptions. Also, I did a few additional
tests. Looked good.

> It turned out the sort_pushdown_idx = foreach_current_index(l) - 1;
> break; didn't work as if all the WindowClauses have pathkeys contained
> in the order by pathkeys then we don't ever set sort_pushdown_idx. I
> adjusted it to do:
>
> if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
>    sort_pushdown_idx = foreach_current_index(l);
> else
>    break;

Yes, that would have been problematic. I have verified this case, and on
a related note, I have added a test case that ensures the order-by
pushdown doesn't happen if the window function's order by is a superset
of the query's order by.

> I also fixed up the outdated comments and changed it so we only set
> orderby_pathkeys once instead of once per loop in the
> foreach_reverse() loop.

Thanks, the code looks a lot neater now (is_sorted is gone and handled in
a better way).

> I gave some thought to whether doing foreach_delete_current() is safe
> within a foreach_reverse() loop. I didn't try it, but I couldn't see
> any reason why not.  It is pretty late here and I'd need to test that
> to be certain. If it turns out not to be safe then we need to document
> that fact in the comments of the foreach_reverse() macro and the
> foreach_delete_current() macro.

I tested foreach_delete_current inside a foreach_reverse loop.
It worked fine.

I have attached the patch with one extra test case (as mentioned above).
The rest of the changes are looking fine.

Ran pgbench again, and the optimized version still had a lead (168 tps vs
135 tps) in performance.

Do we have any pending items for this patch now?

--
Regards,
Ankit Kumar Pandey
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 000d757bdd..cf59ebfe87 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4421,8 +4421,14 @@ create_one_window_path(PlannerInfo *root,
 	   List *activeWindows)
 {
 	PathTarget *window_target;
+	Query	   *parse = root->parse;
 	ListCell   *l;
 	List	   *topqual = NIL;
+	List	   *window_pathkeys;
+	List	   *orderby_pathkeys = NIL;
+	int			sort_pushdown_idx = -1;
+	int			presorted_keys;
+	bool		is_sorted;
 
 	/*
 	 * Since each window clause could require a different sort order, we stack
@@ -4441,17 +4447,99 @@ create_one_window_path(PlannerInfo *root,
 	 */
 	window_target = input_target;
 
+	/*
+	 * Here we attempt to minimize the number of sorts which must be performed
+	 * for queries with an ORDER BY clause.
+	 *
+	 * It's possible due to select_active_windows() putting the WindowClauses
+	 * with the lowest tleSortGroupRef last in the activeWindows list that the
+	 * final WindowClause has a subset of the sort requirements that the
+	 * query's ORDER BY clause has.  Below we try to detect this case and if
+	 * we find this is true, we consider adjusting the sort that's done for
+	 * WindowAggs and include the additional clauses so that no additional
+	 * sorting is required for the query's ORDER BY clause.
+	 *
+	 * We don't try this optimization for the following cases:
+	 *
+	 * 1.	If the query has a DISTINCT clause.  We needn't waste any
+	 *		additional effort for the more strict sort order as if DISTINCT
+	 *		is done via Hash Aggregate then that's going to undo this work.
+	 *
+	 * 2.	If the query has a LIMIT clause.  The top-level sort will be
+	 *		able to use a top-n sort which might be more efficient than
+	 *		sorting	by the additional columns.  If the LIMIT does not reduce
+	 *		the	number of rows significantly then this might not be true, but
+	 *		we don't try to consider that here.
+	 *
+	 * 3.	If the top-level WindowClause has a runCondition then this can
+	 *		filter out tuples and make the final sort 

Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-01-09 Thread Matthias van de Meent
On Sun, 8 Jan 2023 at 04:09, Andres Freund  wrote:
>
> Hi,
>
> On 2023-01-07 16:29:23 -0800, Andres Freund wrote:
> > It's probably not too hard to fix specifically in this one place - we could
> > just clamp vacuum_defer_cleanup_age to be smaller than latest_completed, but
> > it strikes me as as a somewhat larger issue for the 64it xid 
> > infrastructure. I
> > suspect this might not be the only place running into problems with such
> > "before the universe" xids.
>
> I haven't found other problematic places in HEAD, but did end up find a less
> serious version of this bug in < 14: GetFullRecentGlobalXmin(). I did verify
> that with vacuum_defer_cleanup_age set GetFullRecentGlobalXmin() returns
> values that look likely to cause problems. Its "just" used in gist luckily.
>
> It's hard to find places that do this kind of arithmetic, we traditionally
> haven't had a helper for it. So it's open-coded in various ways.
> [...]
>
> The currently existing places I found, other than the ones in procarray.c,
> luckily don't seem to convert the xids to 64bit xids.

That's good to know.

> > For a bit I was thinking that we should introduce the notion that a
> > FullTransactionId can be from the past. Specifically that when the upper 
> > 32bit
> > are all set, we treat the lower 32bit as a value from before xid 0 using the
> > normal 32bit xid arithmetic.  But it sucks to add overhead for that
> > everywhere.
> >
> > It might be a bit more palatable to designate one individual value,
> > e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far
> > before the start of the universe an xid point to...
>
> On IM Thomas suggested we could reserve the 2^32-1 epoch for invalid values. I
> hacked up a patch that converts various fxid functions to inline functions
> with such assertions, and it indeed quickly catches the problem this thread
> reported, close to the source of the use.

Wouldn't it be enough to only fix the constructions in
FullXidRelativeTo() and widen_snapshot_xid() (as attached, $topic does
not occur with the patch), and (optionally) bump the first XID
available for any cluster to (FirstNormalXid + 1) to retain the 'older
than any running transaction' property?

The change only fixes the issue for FullTransactionId, which IMO is OK
- I don't think we need to keep xid->xid8->xid symmetric in cases of
xid8 wraparound.
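Roughly, the idea for FullXidRelativeTo() is a clamp like this (my sketch
of the approach only; the attached patch has the exact form):

    TransactionId rel_xid = XidFromFullTransactionId(rel);
    int32       diff = (int32) (xid - rel_xid);
    uint64      rel_u64 = U64FromFullTransactionId(rel);

    /* clamp instead of wrapping around into "before the universe" */
    if (diff < 0 &&
        rel_u64 < FirstNormalTransactionId + (uint64) (-(int64) diff))
        return FullTransactionIdFromU64(FirstNormalTransactionId);

    return FullTransactionIdFromU64(rel_u64 + (int64) diff);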

> One issue with that is is that it'd reduce what can be input for the xid8
> type. But it's hard to believe that'd be a real issue?

Yes, it's unlikely anyone would ever hit that with our current WAL
format - we use 24 bytes/xid just to log its use, so we'd use at
most epoch 0x1000_ in unrealistic scenarios. In addition, technically,
we already have (3*2^32 - 3) "invalid" xid8 values that can never be
produced in FullXidRelativeTo - those few extra invalid values don't
matter much to me except "even more special casing".

Kind regards,

Matthias van de Meent.


v1-0001-Prevent-underflow-of-xid8-epoch.patch
Description: Binary data


Re: Make EXPLAIN generate a generic plan for a parameterized query

2023-01-09 Thread Laurenz Albe
On Tue, 2022-12-27 at 14:37 -0800, Michel Pelletier wrote:
> I built and tested this patch for review and it works well, although I got 
> the following warning when building:
> 
> analyze.c: In function 'transformStmt':
> analyze.c:2919:35: warning: 'generic_plan' may be used uninitialized in this 
> function [-Wmaybe-uninitialized]
>  2919 |         pstate->p_generic_explain = generic_plan;
>       |         ~~^~
> analyze.c:2909:25: note: 'generic_plan' was declared here
>  2909 |         bool            generic_plan;
>       |                         ^~~~

Thanks for checking.  The variable should indeed be initialized, although
my compiler didn't complain.

Attached is a fixed version.
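For reference, a minimal usage sketch of the new option (hypothetical
table and column names):

    EXPLAIN (GENERIC_PLAN) SELECT * FROM tab WHERE col = $1;

Plain EXPLAIN would reject the untyped $1 parameter, and the patch
forbids combining GENERIC_PLAN with ANALYZE, since a statement with
unknown parameters cannot actually be executed.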

Yours,
Laurenz Albe
From baf60d9480d8022866d1ed77b00c7b8506f97f70 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Mon, 9 Jan 2023 17:37:40 +0100
Subject: [PATCH] Add EXPLAIN option GENERIC_PLAN

This allows EXPLAIN to generate generic plans for parameterized
statements (that have parameter placeholders like $1 in the
statement text).

Author: Laurenz Albe
Discussion: https://postgr.es/m/0a29b954b10b57f0d135fe12aa0909bd41883eb0.camel%40cybertec.at
---
 doc/src/sgml/ref/explain.sgml | 15 +++
 src/backend/commands/explain.c|  9 +
 src/backend/parser/analyze.c  | 13 +
 src/backend/parser/parse_coerce.c | 15 +++
 src/backend/parser/parse_expr.c   | 16 
 src/include/commands/explain.h|  1 +
 src/include/parser/parse_node.h   |  2 ++
 src/test/regress/expected/explain.out | 14 ++
 src/test/regress/sql/explain.sql  |  5 +
 9 files changed, 90 insertions(+)

diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index d4895b9d7d..221f905a59 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -40,6 +40,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] statementboolean ]
 COSTS [ boolean ]
 SETTINGS [ boolean ]
+GENERIC_PLAN [ boolean ]
 BUFFERS [ boolean ]
 WAL [ boolean ]
 TIMING [ boolean ]
@@ -167,6 +168,20 @@ ROLLBACK;
 

 
+   
+GENERIC_PLAN
+
+ 
+  Generate a generic plan for the statement (see 
+  for details about generic plans).  The statement can contain parameter
+  placeholders like $1 and must be a statement that can
+  use parameters.  This option cannot be used together with
+  ANALYZE, since a statement with unknown parameters
+  cannot be executed.
+ 
+
+   
+

 BUFFERS
 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e4621ef8d6..7ee3d24da2 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -190,6 +190,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 			es->wal = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "settings") == 0)
 			es->settings = defGetBoolean(opt);
+		else if (strcmp(opt->defname, "generic_plan") == 0)
+			es->generic = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "timing") == 0)
 		{
 			timing_set = true;
@@ -227,6 +229,13 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	 parser_errposition(pstate, opt->location)));
 	}
 
+	/* check that GENERIC_PLAN is not used with EXPLAIN ANALYZE */
+	if (es->generic && es->analyze)
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("EXPLAIN ANALYZE cannot be used with GENERIC_PLAN")));
+
+	/* check that WAL is used with EXPLAIN ANALYZE */
 	if (es->wal && !es->analyze)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 5b90974e83..8b56eadf7e 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -27,6 +27,7 @@
 #include "access/sysattr.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
+#include "commands/defrem.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -2905,6 +2906,18 @@ static Query *
 transformExplainStmt(ParseState *pstate, ExplainStmt *stmt)
 {
 	Query	   *result;
+	ListCell   *lc;
+	bool		generic_plan = false;
+
+	foreach(lc, stmt->options)
+	{
+		DefElem*opt = (DefElem *) lfirst(lc);
+
+		if (strcmp(opt->defname, "generic_plan") == 0)
+			generic_plan = defGetBoolean(opt);
+		/* don't "break", as we want the last value */
+	}
+	pstate->p_generic_explain = generic_plan;
 
 	/* transform contained query, allowing SELECT INTO */
 	stmt->query = (Node *) transformOptionalSelectInto(pstate, stmt->query);
diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c
index 34757da0fa..b9c8dc1a25 100644
--- a/src/backend/parser/parse_coerce.c
+++ b/src/backend/parser/parse_coerce.c
@@ -369,6 +369,21 @@ coerce_type(ParseState *pstate, Node *node,
 
 		return result;
 	}
+	/*
+	 * If we are to generate a generic plan for EXPLAIN, simply 

Re: MERGE ... RETURNING

2023-01-09 Thread Vik Fearing

On 1/9/23 13:29, Dean Rasheed wrote:

On Sun, 8 Jan 2023 at 20:09, Isaac Morland  wrote:


Would it be useful to have just the action? Perhaps "WITH ACTION"? My idea is that this 
would return an enum of INSERT, UPDATE, DELETE (so is "action" the right word?). It seems 
to me in many situations I would be more likely to care about which of these 3 happened rather than 
the exact clause that applied. This isn't necessarily meant to be instead of your suggestion 
because I can imagine wanting to know the exact clause, just an alternative that might suffice in 
many situations. Using it would also avoid problems arising from editing the query in a way which 
changes the numbers of the clauses.



Hmm, perhaps that's something that can be added as well. Both use
cases seem useful.


Bikeshedding here.  Instead of Yet Another WITH Clause, could we perhaps 
make a MERGING() function analogous to the GROUPING() function that goes 
with grouping sets?


MERGE ...
RETURNING *, MERGING('clause'), MERGING('action');

Or something.
--
Vik Fearing





Re: [PATCH] random_normal function

2023-01-09 Thread Dean Rasheed
On Mon, 9 Jan 2023 at 15:26, Tom Lane  wrote:
>
> Dean Rasheed  writes:
> > So IMO all pseudorandom functions should share the same PRNG state and
> > seed-setting functions. That would mean they should all be in the same
> > (new) C file, so that the PRNG state can be kept private to that file.
>
> I agree with this in principle, but I feel no need to actually reshuffle
> the code before we accept a proposal for such a function that wouldn't
> logically belong in float.c.
>
> > I think it would also make sense to add a new "Random Functions"
> > section to the docs, and move the descriptions of random(),
> > random_normal() and setseed() there.
>
> Likewise, this'll just add confusion in the short run.  A <sect2>
> with only three functions in it is going to look mighty odd.
>

OK, that's fair enough, while we're just adding random_normal().

BTW, "UUID Functions" only has 1 function in it :-)

Regards,
Dean




psql: Add role's membership options to the \du+ command

2023-01-09 Thread Pavel Luzanov

When you include one role in another, you can specify three options:
ADMIN, INHERIT (added in e3ce2de0) and SET (3d14e171).

For example.

CREATE ROLE alice LOGIN;

GRANT pg_read_all_settings TO alice WITH ADMIN TRUE, INHERIT TRUE, SET TRUE;
GRANT pg_stat_scan_tables TO alice WITH ADMIN FALSE, INHERIT FALSE, SET 
FALSE;

GRANT pg_read_all_stats TO alice WITH ADMIN FALSE, INHERIT TRUE, SET FALSE;

For information about the options, you need to look in the pg_auth_members catalog:

SELECT roleid::regrole, admin_option, inherit_option, set_option
FROM pg_auth_members
WHERE member = 'alice'::regrole;
    roleid    | admin_option | inherit_option | set_option
--+--++
 pg_read_all_settings | t    | t  | t
 pg_stat_scan_tables  | f    | f  | f
 pg_read_all_stats    | f    | t  | f
(3 rows)

I think it would be useful to be able to get this information with a
psql command like \du (and \dg). With the proposed patch, the \du command
still only lists the roles of which alice is a member:

\du alice
 List of roles
 Role name | Attributes |  Member of
---++--
 alice |    | 
{pg_read_all_settings,pg_read_all_stats,pg_stat_scan_tables}


But the \du+ command adds information about the selected ADMIN, INHERIT
and SET options:

\du+ alice
    List of roles
 Role name | Attributes |   Member of   
| Description

---++---+-
 alice |    | pg_read_all_settings WITH ADMIN, INHERIT, SET+|
   |    | pg_read_all_stats WITH INHERIT   +|
   |    | pg_stat_scan_tables   |

One more change. The roles in the "Member of" column are sorted for both
\du+ and \du for consistent output.

Any comments are welcome.

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 8a5285da9a..ef3e87fa32 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1724,9 +1724,8 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
 S modifier to include system roles.
 If pattern is specified,
 only those roles whose names match the pattern are listed.
-If the form \dg+ is used, additional information
-is shown about each role; currently this adds the comment for each
-role.
+If the form \dg+ is used, each role is listed
+with its associated description and options for each role's membership.
 
 
   
@@ -1964,9 +1963,8 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
 S modifier to include system roles.
 If pattern is specified,
 only those roles whose names match the pattern are listed.
-If the form \du+ is used, additional information
-is shown about each role; currently this adds the comment for each
-role.
+If the form \du+ is used, each role is listed
+with its associated description and options for each role's membership.
 
 
   
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index df166365e8..c7b10f96f3 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3624,11 +3624,29 @@ describeRoles(const char *pattern, bool verbose, bool showSystem)
	printfPQExpBuffer(&buf,
 	  "SELECT r.rolname, r.rolsuper, r.rolinherit,\n"
 	  "  r.rolcreaterole, r.rolcreatedb, r.rolcanlogin,\n"
-	  "  r.rolconnlimit, r.rolvaliduntil,\n"
-	  "  ARRAY(SELECT b.rolname\n"
-	  "FROM pg_catalog.pg_auth_members m\n"
-	  "JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid)\n"
-	  "WHERE m.member = r.oid) as memberof");
+	  "  r.rolconnlimit, r.rolvaliduntil,\n");
+
+	if (verbose)
+	{
		appendPQExpBufferStr(&buf,
+			 "  (SELECT string_agg (quote_ident(b.rolname) || regexp_replace(\n"
+			 " CASE WHEN m.admin_option THEN ', ADMIN' ELSE '' END ||\n"
+			 " CASE WHEN m.inherit_option THEN ', INHERIT' ELSE '' END ||\n"
+			 " CASE WHEN m.set_option THEN ', SET' ELSE '' END, \n"
+			 " '^,', ' WITH', 1), E'\\n' ORDER BY b.rolname)\n"
+			 "  FROM pg_catalog.pg_auth_members m\n"
+			 "  JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid)\n"
+			 "  WHERE m.member = r.oid) as memberof");
+	}
+	else
+	{
		appendPQExpBufferStr(&buf,
+			 "  ARRAY(SELECT b.rolname\n"
+			 "FROM pg_catalog.pg_auth_members m\n"
+			 "JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid)\n"
+			 "WHERE m.member = r.oid\n"
+			 "ORDER 

Re: doc: add missing "id" attributes to extension packaging page

2023-01-09 Thread Karl O. Pinc
On Mon, 9 Jan 2023 08:09:02 +0100
Brar Piening  wrote:

> On 09.01.2023 at 03:31, vignesh C wrote:
> > The patch does not apply on top of HEAD as in [1], please post a
> > rebased patch:  

> This one applies on top of 3c569049b7b502bb4952483d19ce622ff0af5fd6
> and the documentation build succeeds. Beyond rebasing I've added a
> few more ids (to make the other patch
> (make_html_ids_discoverable.patch) build without warnings again) but
> nothing that would justify another review.

Agreed.  I believe that as long as your system has xmllint installed
and the documentation builds, there's not a lot that can go wrong.
This patch only adds lots of id attributes.

> We probably have to move quickly with this patch since it touches
> pretty much any file in the documentation and will be outdated in a
> minute.

+1

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: PL/pgSQL cursors should get generated portal names by default

2023-01-09 Thread Pavel Stehule
Hi

I wrote a new check in plpgsql_check that tries to identify explicit use
of the referenced portal's name.

create or replace function foo01()
returns refcursor as $$#option dump
declare
  c cursor for select 1;
  r refcursor;
begin
  open c;
  r := 'c';
  return r;
end;
$$ language plpgsql;
CREATE FUNCTION
(2023-01-09 16:49:10) postgres=# select * from
plpgsql_check_function('foo01', compatibility_warnings => true);
┌───┐
│  plpgsql_check_function
│
╞═══╡
│ compatibility:0:7:assignment:obsolete setting of refcursor or cursor
variable │
│ Detail: Internal name of cursor should not be specified by users.
│
│ Context: at assignment to variable "r" declared on line 4
│
│ warning extra:0:3:DECLARE:never read variable "c"
│
└───┘
(4 rows)

Regards

Pavel


Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-01-09 Thread Andrew Dunstan


On 2023-01-09 Mo 10:07, Jelte Fennema wrote:
> Thanks for clarifying your reasoning. I now agree that ssrootcert=system
> is now the best option.


Cool, that looks like a consensus.


>
>>> 2. Should we allow the same approach with ssl_ca_file on the server side, 
>>> for client cert validation?
>> I don't know enough about this use case to implement it safely. We'd
>> have to make sure the HBA entry is checking the hostname (so that we
>> do the reverse DNS dance), and I guess we'd need to introduce a new
>> clientcert verify-* mode? Also, it seems like server operators are
>> more likely to know exactly which roots they need, at least compared
>> to clients. I agree the feature is useful, but I'm not excited about
>> attaching it to this patchset.


I'm confused. A client cert might not have a hostname at all, and isn't
used to verify the connecting address, but to verify the username. It
needs to have a CN/DN equal to the user name of the connection, or that
maps to that name via pg_ident.conf.
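A sketch of that mapping, for concreteness (hypothetical names, not from
the thread):

    # pg_hba.conf: require a client certificate and apply a user-name map
    hostssl all all 0.0.0.0/0 cert map=certmap

    # pg_ident.conf: certificate CN "jdoe" may connect as role "joe"
    # MAPNAME  SYSTEM-USERNAME  PG-USERNAME
    certmap    jdoe             joe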


cheers


andrew

-- 

Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] psql: Add tab-complete for optional view parameters

2023-01-09 Thread Jim Jones

Hi Christoph,

Thanks for the patch! I just tested it and I could reproduce the 
expected behaviour in these cases:


postgres=# CREATE VIEW w
AS  WITH (

postgres=# CREATE OR REPLACE VIEW w
AS  WITH (

postgres=# CREATE VIEW w WITH (
CHECK_OPTION  SECURITY_BARRIER  SECURITY_INVOKER

postgres=# CREATE OR REPLACE VIEW w WITH (
CHECK_OPTION  SECURITY_BARRIER  SECURITY_INVOKER

postgres=# ALTER VIEW w
ALTER COLUMN  OWNER TO  RENAME    RESET (   SET ( 
SET SCHEMA


postgres=# ALTER VIEW w RESET (
CHECK_OPTION  SECURITY_BARRIER  SECURITY_INVOKER

postgres=# ALTER VIEW w SET (check_option =
CASCADED  LOCAL

However, an "ALTER TABLE  S" does not complete the open 
parenthesis "(" from "SET (", as suggested in "ALTER VIEW  ".


postgres=# ALTER VIEW w SET
Display all 187 possibilities? (y or n)

Is it intended to behave like this? If so, an "ALTER VIEW <name>
RES" does complete the open parenthesis -> "RESET (".


Best,
Jim

On 09.12.22 11:31, Christoph Heiss wrote:

Thanks for the review!

On 12/8/22 12:19, Melih Mutlu wrote:

Hi Christoph,

I just took a quick look at your patch.
Some suggestions:

    +   else if (Matches("ALTER", "VIEW", MatchAny, "SET", "("))
    +       COMPLETE_WITH_LIST(view_optional_parameters);
    +   /* ALTER VIEW xxx RESET ( yyy , ... ) */
    +   else if (Matches("ALTER", "VIEW", MatchAny, "RESET", "("))
    +       COMPLETE_WITH_LIST(view_optional_parameters);


What about combining these two cases into one like Matches("ALTER", 
"VIEW", MatchAny, "SET|RESET", "(") ?

Good thinking, incorporated that into v2.
While at it, I also added completion for the values of the SET 
parameters, since that is useful as well.




     /* ALTER VIEW <name> */
     else if (Matches("ALTER", "VIEW", MatchAny))
         COMPLETE_WITH("ALTER COLUMN", "OWNER TO", "RENAME",
                       "SET SCHEMA");


Also seems like SET and RESET don't get auto-completed for "ALTER
VIEW <name>".

I think it would be nice to include those missing words.

That is already contained in the patch:

@@ -2139,7 +2146,7 @@ psql_completion(const char *text, int start, int 
end)

 /* ALTER VIEW <name> */
 else if (Matches("ALTER", "VIEW", MatchAny))
 COMPLETE_WITH("ALTER COLUMN", "OWNER TO", "RENAME",
-  "SET SCHEMA");
+  "SET SCHEMA", "SET (", "RESET (");

Thanks,
Christoph


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH] random_normal function

2023-01-09 Thread Tom Lane
Dean Rasheed  writes:
> So IMO all pseudorandom functions should share the same PRNG state and
> seed-setting functions. That would mean they should all be in the same
> (new) C file, so that the PRNG state can be kept private to that file.

I agree with this in principle, but I feel no need to actually reshuffle
the code before we accept a proposal for such a function that wouldn't
logically belong in float.c.

> I think it would also make sense to add a new "Random Functions"
> section to the docs, and move the descriptions of random(),
> random_normal() and setseed() there.

Likewise, this'll just add confusion in the short run.  A <sect2>
with only three functions in it is going to look mighty odd.

regards, tom lane




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-01-09 Thread Jelte Fennema
Thanks for clarifying your reasoning. I now agree that sslrootcert=system
is the best option.

> > 2. Should we allow the same approach with ssl_ca_file on the server side, 
> > for client cert validation?
>
> I don't know enough about this use case to implement it safely. We'd
> have to make sure the HBA entry is checking the hostname (so that we
> do the reverse DNS dance), and I guess we'd need to introduce a new
> clientcert verify-* mode? Also, it seems like server operators are
> more likely to know exactly which roots they need, at least compared
> to clients. I agree the feature is useful, but I'm not excited about
> attaching it to this patchset.

The main thing would be to have ssl_ca_file=system check against
the certs from the system CA store. And probably we would want
to disallow clientcert=verify-ca when ssl_ca_file is set to system.
Other than that I don't think anything is necessary. I definitely agree
that this patch should not be blocked on this. But it seems simple
enough to implement and imho it would be a bit confusing if ssl_ca_file
does not support the "system" value, but sslrootcert does.

I also took a closer look at the code, and the only comment I have is:

> appendPQExpBuffer(&conn->errorMessage,

These calls can all be replaced by the recently added libpq_append_conn_error
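For example (message text made up for illustration):

    /* before */
    appendPQExpBuffer(&conn->errorMessage,
                      libpq_gettext("something failed: %s\n"), detail);

    /* after: translation and the trailing newline are handled for you */
    libpq_append_conn_error(conn, "something failed: %s", detail);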

Finally I tested this against a Postgres server I created on Azure and
the new value works as expected. The only thing that I think would be
good to change is the error message when sslmode=verify-full, and
sslrootcert is not provided, but ~/.postgresql/root.crt is also not available.
I think it would be good for the error to mention sslrootcert=system

> psql: error: connection to server at "xxx.postgres.database.azure.com" 
> (123.456.789.123), port 5432 failed: root certificate file 
> "/home/jelte/.postgresql/root.crt" does not exist
> Either provide the file or change sslmode to disable server certificate 
> verification.

Changing that last line to something like (maybe removing the part
about disabling server certificate verification):
> Either provide the file using the sslrootcert parameter, or use 
> sslrootcert=system to use the OS certificate store, or change sslmode to 
> disable server certificate verification.

On Fri, 6 Jan 2023 at 19:20, Jacob Champion  wrote:
>
> On Fri, Jan 6, 2023 at 2:18 AM Jelte Fennema  wrote:
> >
> > Huge +1 from me. On Azure we're already using public CAs to sign 
> > certificates for our managed postgres offerings[1][2]. Right now, our 
> > customers have to go to the hassle of downloading a specific root cert or 
> > finding their OS default location. Neither of these allow us to give users 
> > a simple copy-pastable connection string that uses secure settings. This 
> > would change this and make it much easier for our customers to use secure 
> > connections to their database.
>
> Thanks! Timescale Cloud is in the same boat.
>
> > I have two main questions:
> > 1. From the rest of the thread it's not entirely clear to me why this patch 
> > goes for the sslrootcert=system approach, instead of changing what 
> > sslrootcert='' means when using verify-full. Like Tom Lane suggested, we 
> > could change it to try ~/.postgresql/root.crt and if that doesn't exist 
> > make it try the system store, instead of erroring out like it does now when 
> > ~/.postgresql/root.crt doesn't exist.
>
> I mentioned it briefly upthread, but to expand: For something this
> critical to security, I don't like solutions that don't say exactly
> what they do. What does the following connection string mean?
>
> $ psql 'host=example.org sslmode=verify-full'
>
> If it sometimes means to use root.crt (if one exists) and sometimes to
> use the system store, then
> 1) it's hard to audit the actual behavior without knowing the state of
> the filesystem, and
> 2) if I want to connect to a server using the system store, and *only*
> the system store, then I have to make sure that the default root.crt
> never exists. But what if other software on my system relies on it?
>
> It also provides a bigger target for exploit chains, because I can
> remove somebody's root.crt file and their connections may try to
> continue with an effectively weaker verification level instead of
> erroring out. I realize that for many people this is a nonissue ("if
> you can delete the root cert, you can probably do much worse") but IME
> arbitrary file deletion vulnerabilities are treated with less concern
> than arbitrary file writes.
>
> By contrast,
>
> $ psql 'host=example.org sslrootcert=system sslmode=verify-full'
>
> has a clear meaning. We'll never use a root.crt.
>
> (A downside of reusing sslrootcert like this is the cross-version
> hazard. The connection string 'host=example.org sslrootcert=system'
> means something strong with this patchset, but something very weak to
> libpq 15 and prior. So clients should probably continue to specify
> sslmode=verify-full explicitly for the foreseeable future.)
>
> > 

Re: Fix pg_publication_tables to exclude generated columns

2023-01-09 Thread Tom Lane
Amit Kapila  writes:
> On Mon, Jan 9, 2023 at 5:29 PM shiy.f...@fujitsu.com
>  wrote:
>> I think one way to fix it is to modify pg_publication_tables query to exclude
>> generated columns. But in this way, we need to bump catalog version when 
>> fixing
>> it in back-branch. Another way is to modify function
>> pg_get_publication_tables()'s return value to contain all supported columns 
>> if
>> no column list is specified, and we don't need to change system view.

> That sounds like a reasonable approach to fix the issue.

We could just not fix it in the back branches.  I'd argue that this is
as much a definition change as a bug fix, so it doesn't really feel
like something to back-patch anyway.

regards, tom lane




Re: Add SHELL_EXIT_CODE to psql

2023-01-09 Thread Maxim Orlov
Hi!

Overall, I think we are moving in the right direction. But we could make
the code a bit better, couldn't we?

+   /* Capture exit code for SHELL_EXIT_CODE */
+   close_exit_code = pclose(fd);
+   if (close_exit_code == -1)
+   {
+   pg_log_error("%s: %m", cmd);
+   error = true;
+   }
+   if (WIFEXITED(close_exit_code))
+   exit_code=WEXITSTATUS(close_exit_code);
+   else if(WIFSIGNALED(close_exit_code))
+   exit_code=WTERMSIG(close_exit_code);
+   else if(WIFSTOPPED(close_exit_code))
+   exit_code=WSTOPSIG(close_exit_code);
+   if (exit_code)
+   error = true;
I think it's better to add blank lines around the middle if block; it will
be easier to read. Also, consider adding spaces around the assignments in
this block, as in the sketch below.
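Something along these lines (my reformatting of the quoted block, behavior
unchanged):

    /* Capture exit code for SHELL_EXIT_CODE */
    close_exit_code = pclose(fd);
    if (close_exit_code == -1)
    {
        pg_log_error("%s: %m", cmd);
        error = true;
    }

    if (WIFEXITED(close_exit_code))
        exit_code = WEXITSTATUS(close_exit_code);
    else if (WIFSIGNALED(close_exit_code))
        exit_code = WTERMSIG(close_exit_code);
    else if (WIFSTOPPED(close_exit_code))
        exit_code = WSTOPSIG(close_exit_code);

    if (exit_code)
        error = true;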

+   /*
+   snprintf(exit_code_buf, sizeof(exit_code_buf), "%d",
WEXITSTATUS(exit_code));
+   */
Probably, this is not needed.


> 1. pg_regress now creates an environment variable called PG_OS_TARGET
Maybe we can use the env var "OS"? I do not know much about Windows, but I
think this is a standard environment variable there.

-- 
Best regards,
Maxim Orlov.


Re: Timeout when changes are filtered out by the core during logical replication

2023-01-09 Thread Ashutosh Bapat
On Fri, Dec 23, 2022 at 2:45 PM Amit Kapila  wrote:

> On Thu, Dec 22, 2022 at 6:58 PM Ashutosh Bapat
>  wrote:
> >
> > Hi All,
> > A customer ran a script dropping a few dozens of users in a transaction.
> Before dropping a user they change the ownership of the tables owned by
> that user to another user and revoking all the accesses from that user in
> the same transaction. There were a few thousand tables whose privileges and
> ownership was changed by this transaction. Since all of these changes were
> in catalog table, those changes were filtered out in
> ReorderBufferProcessTXN()
> > by the following code
> >if (!RelationIsLogicallyLogged(relation))
> > goto change_done;
> >
> > I tried to reproduce a similar situation through the attached TAP test.
> For 500 users and 1000 tables, we see that the transaction takes
> significant time but logical decoding does not take much time. So with the
> default 1 min WAL sender and receiver timeout I could not reproduce the
> timeout. Beyond that our TAp test itself times out.
> >
> > But I think there's a possibility that the logical receiver will time
> out this way when decoding a sufficiently large transaction which takes
> more than the timeout amount of time to decode. So I think we need to call
> OutputPluginUpdateProgress() after a regular interval (in terms of time or
> number of changes) to consume any feedback from the subscriber or send a
> keep-alive message.
> >
>
> I don't think it will be a good idea to directly call
> OutputPluginUpdateProgress() from reorderbuffer.c. There is already a
> patch to discuss this problem [1].
>

Yeah, I don't mean to use OutputPluginUpdateProgress() directly. The patch
just showed that calling it there helps in some way. Thanks for pointing out
the other thread. I have reviewed the patch on that thread and will continue
the discussion there.


>
> > Following commit
> > ```
> > commit 87c1dd246af8ace926645900f02886905b889718
> > Author: Amit Kapila 
> > Date:   Wed May 11 10:12:23 2022 +0530
> >
> > Fix the logical replication timeout during large transactions.
> >
> >  ```
> > fixed a similar problem when the changes were filtered by an output
> plugin, but in this case the changes are not being handed over to the
> output plugin as well. If we fix it in the core we may not need to handle
> it in the output plugin as that commit does. The commit does not have a
> test case which I could run to reproduce the timeout.
> >
>
> It is not evident how to write a stable test for this because
> estimating how many changes are enough for the configured
> wal_receiver_timeout to
> pass on all the buildfarm machines is tricky. If you have good ideas
> then feel free to propose a test patch.
>

Will continue this on the other thread.

-- 
Best Wishes,
Ashutosh

