Re: False "pg_serial": apparent wraparound” in logs

2023-10-16 Thread Michael Paquier
On Mon, Oct 16, 2023 at 04:58:31PM +0900, Michael Paquier wrote:
> On Sat, Oct 14, 2023 at 07:29:54PM +0000, Imseih (AWS), Sami wrote:
>> After looking at this a bit more, I don't think the previous rev is correct.
>> We should not fall through to the "The SLRU is no longer needed" case, which
>> also sets the headPage to invalid.  We should only truncate up to the
>> head page.
> 
> Seems correct to me.  Otherwise this would count as if the SLRU were not
> in use, when it is actually being used.

So, I've spent more time on that and applied the simplification today,
doing as you suggested: use the head page rather than the tail page
when the tail XID is ahead of the head XID, but without disabling the
whole SLRU.  While on it, I've also simplified the code and the
comments a bit (some renames and a slight refactoring of tailPage, for
example).
--
Michael




Re: Add support for AT LOCAL

2023-10-16 Thread Tom Lane
Noah Misch  writes:
> On Mon, Oct 16, 2023 at 01:54:23AM -0400, Tom Lane wrote:
>> Ugh.  So if the failure is robust enough to persist across
>> several major xlc versions, why couldn't Thomas reproduce it?

> Beats me.  hornet wipes its starting environment down to OBJECT_MODE=32_64
> PERL5LIB=/home/nm/sw/cpan64/lib/perl5 SPECIES=xlc64 PATH=/usr/bin, then
> applies all the environment settings seen in buildfarm logs.

I was able to reproduce the failure on cfarm111 after adopting
these settings from hornet's configuration:

export OBJECT_MODE=64
export CC='xlc_r -D_LARGE_FILES=1 '
export CFLAGS='-O2 -qmaxmem=33554432 -qsuppress=1500-010:1506-995 -qsuppress=1506-010:1506-416:1506-450:1506-480:1506-481:1506-492:1506-944:1506-1264 -qinfo=all:nocnd:noeff:noext:nogot:noini:noord:nopar:noppc:norea:nouni:nouse -qsuppress=1506-374:1506-419:1506-434:1506-438:1506-451:1506-452:1506-453:1506-495:1506-786'

and doing

./configure --enable-cassert --enable-debug --without-icu --prefix=/home/tgl/testversion

etc etc.

It is absolutely, gold-platedly, a compiler bug, because inserting
a debug printf into the loop

while (result->time >= USECS_PER_DAY)
result->time -= USECS_PER_DAY;

makes the failure go away.  Unfortunately, I've not yet found another
way to make it go away :-(.  My upthread idea of using a local variable
instead of result->time is no help, and some other random code
alterations didn't change the results either.
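
(For concreteness, the "local variable" idea mentioned above amounts to
something like the following sketch; this illustrates the approach that
was tested, it is not the committed code:)

/*
 * Sketch: accumulate in a local variable instead of writing through
 * result->time on each iteration; per the report above, this did not
 * dodge the xlc miscompilation.
 */
TimeADT		time = result->time;

while (time >= USECS_PER_DAY)
	time -= USECS_PER_DAY;
result->time = time;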

Not sure where we go from here.  While I don't have much hesitation
about blowing off xlc_r 12.1, it would be sad if their latest
toolchain doesn't work either.  (I didn't try permuting the code
while using the newer compiler, though.)

regards, tom lane




Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]

2023-10-16 Thread Pavel Stehule
On Tue, Oct 17, 2023 at 3:30, Quan Zongliang wrote:

>
>
> On 2023/10/16 20:05, Pavel Stehule wrote:
> >
> >
> > On Mon, Oct 16, 2023 at 13:56, Daniel Gustafsson wrote:
> >
> >  > On 16 Oct 2023, at 12:15, Quan Zongliang wrote:
> >
> >  > Implement TODO item:
> >  > PL/pgSQL
> >  > Incomplete item Allow handling of %TYPE arrays, e.g.
> tab.col%TYPE[]
> >
> > Cool!  While I haven't looked at the patch yet, I've wanted this
> > myself many
> > times in the past when working with large plpgsql codebases.
> >
> >  > As a first step, deal only with [], such as
> >  > xxx.yyy%TYPE[]
> >  > xxx%TYPE[]
> >  >
> >  > It can be extended to support multi-dimensional and complex
> > syntax in the future.
> >
> > Was this omitted due to complexity of implementation or for some
> > other reason?
> >
> Because of complexity.
>
> >
> > There is no reason to describe this enhancement.  The size and dimensions
> > of PostgreSQL arrays are dynamic; they depend on the value, not on the
> > declaration.  Currently this information is ignored, and it could be a
> > compatibility break to check and enforce it.
> >
> Yes. I don't think it's necessary.
> If anyone needs it, we can continue to enhance it in the future.
>

I don't think it is possible to do that.  But there is another piece of
missing functionality, if I remember correctly: there is no way to
declare variables for the elements of an array.

I propose syntax xxx.yyy%ELEMENTTYPE and xxx%ELEMENTTYPE

What do you think about it?

Regards

Pavel


> >
> > --
> > Daniel Gustafsson
> >
> >
> >
>
>


Re: remaining sql/json patches

2023-10-16 Thread Amit Langote
On Mon, Oct 16, 2023 at 10:44 PM Anton A. Melnikov wrote:
> On 16.10.2023 15:49, jian he wrote:
> > adding the following code in ExecEvalJsonExprCoercion, after the
> > if (!InputFunctionCallSafe(...)) check, works, but seems like a hack.
> >
> > if (!val_string)
> > {
> > *op->resnull = true;
> > *op->resvalue = (Datum) 0;
> > }
>
> It seems the constraint should work here:
>
> After
>
> CREATE TABLE test (
> js text,
> i int,
> x jsonb DEFAULT JSON_QUERY(jsonb '[1,2]', '$[*]' WITH WRAPPER)
> CONSTRAINT test_constraint
> CHECK (JSON_QUERY(js::jsonb, '$.a' RETURNING char(5) OMIT 
> QUOTES EMPTY ARRAY ON EMPTY) > 'a')
> );
>
> INSERT INTO test VALUES ('[]');
>
> one expected to see an error like that:
>
> ERROR:  new row for relation "test" violates check constraint 
> "test_constraint"
> DETAIL:  Failing row contains ([], null, [1, 2]).
>
> not "INSERT 0 1"

Yes, the correct thing here is for the constraint to fail.

One thing jian he missed during the debugging is that
ExecEvalJsonExprCoercion() receives the EMPTY ARRAY value via
*op->resvalue/resnull, set by ExecEvalJsonExprBehavior(), because
that's the ON EMPTY behavior specified in the constraint.  The bug was
that the code in ExecEvalJsonExprCoercion() failed to set val_string
to that value ("[]") before passing it to InputFunctionCallSafe(), so
the latter would assume the input is NULL.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: UniqueKey v2

2023-10-16 Thread zhihuifan1213

jian he  writes:

Hi jian,

> hi.
> After `git am`, I still cannot build.
>
> ../../Desktop/pg_sources/main/postgres/src/backend/optimizer/path/uniquekey.c:125:45:
> error: variable ‘var’ set but not used
> [-Werror=unused-but-set-variable]
>   125 |     Var        *var;
>       |                ^~~

Thanks for this report; it looks like clang 11 can't catch this error.
I have switched to clang 17, which would have reported this issue in
the first place.

>
> You also need to change src/backend/optimizer/path/meson.build.

Great thanks.

>
> git apply failed.
>
> git am warning:
> Applying: uniquekey on base relation and used it for mark-distinct-as-op.
> .git/rebase-apply/patch:876: new blank line at EOF.
> +
> warning: 1 line adds whitespace errors.
>
> I think you can use `git diff --check`
> (https://git-scm.com/docs/git-diff) to check for whitespace related
> errors.

thanks for the really good suggestion.  Here is the newer version:

From c2c752a3f66353d7d391976b0152c353a239ce60 Mon Sep 17 00:00:00 2001
From: Andy Fan 
Date: Tue, 17 Oct 2023 11:06:53 +0800
Subject: [PATCH v2 1/1] uniquekey on base relation and used it for
 mark-distinct-as-noop.

---
 src/backend/nodes/list.c|  17 ++
 src/backend/optimizer/path/Makefile |   3 +-
 src/backend/optimizer/path/allpaths.c   |   2 +
 src/backend/optimizer/path/equivclass.c |  48 +++
 src/backend/optimizer/path/uniquekey.c  | 384 
 src/backend/optimizer/plan/initsplan.c  |   8 +
 src/backend/optimizer/plan/planner.c|  33 ++
 src/backend/optimizer/util/plancat.c|  10 +
 src/backend/optimizer/util/relnode.c|   2 +
 src/include/nodes/pathnodes.h   |  19 ++
 src/include/nodes/pg_list.h |   2 +
 src/include/optimizer/paths.h   |  10 +
 src/test/regress/expected/uniquekey.out |  69 +
 src/test/regress/parallel_schedule  |   2 +-
 src/test/regress/sql/uniquekey.sql  |  28 ++
 15 files changed, 635 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekey.c
 create mode 100644 src/test/regress/expected/uniquekey.out
 create mode 100644 src/test/regress/sql/uniquekey.sql

diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 750ee5a7e55..eaeaa19ad2d 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -694,6 +694,23 @@ list_member_ptr(const List *list, const void *datum)
 	return false;
 }
 
+/*
+ * Return the index of the datum in the list if found, otherwise return -1.
+ */
+int
+list_member_ptr_pos(const List *list, const void *datum)
+{
+	ListCell   *lc;
+
+	foreach(lc, list)
+	{
+		if (lfirst(lc) == datum)
+			return foreach_current_index(lc);
+	}
+
+	return -1;
+}
+
 /*
  * Return true iff the integer 'datum' is a member of the list.
  */
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f7..63cc1505d96 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekey.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 3cda88e..815a74a9dfa 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -458,6 +458,8 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		}
 	}
 
+	populate_baserel_uniquekeys(root, rel);
+
 	/*
 	 * We insist that all non-dummy rels have a nonzero rowcount estimate.
 	 */
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 7fa502d6e25..29edc8d1b5a 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -744,6 +744,54 @@ get_eclass_for_sort_expr(PlannerInfo *root,
 	return newec;
 }
 
+/*
+ * find_ec_position_matching_expr
+ *		Locate the position of the EquivalenceClass that has a member
+ *		matching the given expr, if any; return -1 if no match.
+ */
+int
+find_ec_position_matching_expr(PlannerInfo *root,
+			   Expr *expr,
+			   RelOptInfo *baserel)
+{
+	int			i = -1;
+
+	while ((i = bms_next_member(baserel->eclass_indexes, i)) >= 0)
+	{
+		EquivalenceClass *ec = list_nth(root->eq_classes, i);
+
+		if (find_ec_member_matching_expr(ec, expr, baserel->relids))
+			return i;
+	}
+	return -1;
+}
+
+/*
+ * build_ec_positions_for_exprs
+ *
+ *		Given a list of exprs, find the related EC position for each of
+ *		them.  If any expr has no related EC, return NULL.
+ */
+Bitmapset *
+build_ec_positions_for_exprs(PlannerInfo *root, List *exprs, RelOptInfo *rel)
+{
+	ListCell   *lc;
+	Bitmapset  *res = NULL;
+
+	foreach(lc, exprs)
+	{
+		int			pos = find_ec_position_matching_expr(root, lfirst(lc), rel);
+
+		if (pos < 0)
+		{
+			bms_free(res);
+			return NULL;
+		}
+		res = bms_add_member(res, pos);
+	}
+	return res;
+}
+
 /*
  * 

Re: broken master regress tests

2023-10-16 Thread Jeff Davis
On Tue, 2023-10-10 at 17:54 -0700, Andres Freund wrote:
> Yea. I wonder if the better fix would have been to copy
> setenv("LC_MESSAGES", "C", 1);
> to the initdb template creation. That afaict also fixes the issue,
> with a
> smaller blast radius?

Sounds good to me. Is there anything else we should do to notice that
tests are unexpectedly skipped, or is this a rare enough problem to not
worry about other cases?

Regards,
Jeff Davis





Re: run pgindent on a regular basis / scripted manner

2023-10-16 Thread Michael Paquier
On Mon, Oct 16, 2023 at 08:45:00PM -0400, Tom Lane wrote:
> 2. We could raise awareness of this issue by adding indent verification
> to CI testing.  I hesitate to suggest that, though, for a couple of
> reasons:
>2a. It seems fairly expensive, though I might be misjudging.
>2b. It's often pretty handy to submit patches that aren't fully
>indent-clean; I have such a patch in flight right now at [1].
> 
> 2b could be ameliorated by making the indent check be a separate
> test process that doesn't obscure the results of other testing.

I see an extra reason for not doing that: it increases the difficulty
of sending and maintaining patches on the lists, and newcomers would
need to learn more tooling.  I don't think that we should make things
more complicated for code-formatting reasons.
--
Michael




Re: Patch: Improve Boolean Predicate JSON Path Docs

2023-10-16 Thread David E. Wheeler
On Oct 16, 2023, at 18:07, Erik Wienhold  wrote:

>> Okay, added, let’s just put all our cards on the table. :-)
> 
> I'll have a look but the attached v3 is not a patch but some applefile.

Weird, should be no different from previous attachments. I believe Apple Mail 
always uses application/octet-stream for attachments it doesn’t recognize, 
which includes .patch and .diff files, sadly.

> One of the added <sect4> is invalid by the looks of it.  Maybe <title>
> is missing because it says "got (para para )" at the end.

Oh, I thought it would report issues from the files they were found in. You’re 
right, I forgot a title. Fixed in v4.

David



v4-0001-Improve-boolean-predicate-JSON-Path-docs.patch
Description: Binary data





Re: CREATE DATABASE with filesystem cloning

2023-10-16 Thread Dan Langille
On Sat, Oct 7, 2023, at 1:51 AM, Thomas Munro wrote:

> I tested on bleeding edge FreeBSD/ZFS, where you need to set sysctl
> vfs.zfs.bclone_enabled=1 to enable the optimisation, as it's still a
> very new feature that is being rolled out.  The system call
> succeeds either way, but that controls whether the new database
> initially shares blocks on disk, or get new copies.  I also tested on
> a Mac.  In both cases I could clone large databases in a fraction of a
> second.
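
(Context for the above: on Linux and FreeBSD the natural entry point for
this kind of cloning is copy_file_range(), which a supporting filesystem
can satisfy by sharing blocks rather than copying them; macOS has
clonefile()/copyfile() instead.  The following is a sketch under that
assumption, not the structure of the actual patch:)

#define _GNU_SOURCE				/* for copy_file_range() on Linux */
#include <unistd.h>

/*
 * Copy, or let the filesystem clone, len bytes from src_fd to dst_fd.
 * On ZFS with vfs.zfs.bclone_enabled=1 this can share blocks instead
 * of duplicating them.
 */
static int
clone_or_copy(int src_fd, int dst_fd, size_t len)
{
	while (len > 0)
	{
		ssize_t		n = copy_file_range(src_fd, NULL, dst_fd, NULL, len, 0);

		if (n <= 0)
			return -1;			/* caller can fall back to read()/write() */
		len -= (size_t) n;
	}
	return 0;
}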

WOOT! Looking forward to this. It would greatly improve times needed
when deploying test environments.

Thank you.

-- 
  Dan Langille
  d...@langille.org




Re: Add support for AT LOCAL

2023-10-16 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Oct 16, 2023 at 09:54:41AM -0400, Tom Lane wrote:
>> I'm mighty tempted though to (a) add coverage for timetz_izone
>> to HEAD, and (b) backpatch the new tests, sans the AT LOCAL case,
>> to the back branches (maybe not v11).

> I see that you've already done (a) with 2f04720307.  I'd be curious to
> see what happens for (b), as well, once (a) is processed on hornet
> once..

Sure enough, timetz_izone has exactly the same behavior [1].

I'd kind of decided that back-patching wouldn't be worth the trouble;
do you foresee that it'd teach us anything new?

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2023-10-17%2000%3A07%3A36




Re: Fix output of zero privileges in psql

2023-10-16 Thread David G. Johnston
On Mon, Oct 16, 2023 at 6:19 PM Laurenz Albe wrote:

> On Mon, 2023-10-16 at 23:51 +0200, Erik Wienhold wrote:
> > What's the process for the CommitFest now since we settled on your
> > patch?  This is my first time being involved in this, so still learning.
> > I see that you've withdrawn your initial patch [1] but this thread is
> > still attached to my patch [2].  Should I edit my CF entry (or withdraw
> > it) and you reactivate yours?
>
> I don't think it makes sense to have two competing commitfest entries,
> and we should leave it on this thread.  If you are concerned about
> authorship, we could both be mentioned as co-authors, if the patch ever
> gets committed.
>
> I'd still like to wait for feedback from David before I change anything.
>
>
Reading both threads I'm not seeing any specific rejection of the solution
that we simply represent empty privileges as "(none)".

I see an apparent consensus that if we do continue to represent it as NULL
that the printout should respect \pset null.

Tom commented in favor of (none) on the other thread with some commentary
regarding how extremely rare it is; then turns around and is "fairly
concerned" about changing the current blank presentation of its value which
is going to happen for some people regardless of which approach is chosen.
(idk...maybe in favor of (none))

Peter's comment doesn't strike me as recognizing that (none) is even an
option on the table...(n/a)

Stuart, the original user complainant, seems to favor (none)

Erik seems to favor (none)

I favor (none)

It's unclear to me whether you, Laurenz, are against (none) or just trying
to go with the group consensus, which doesn't actually seem to be against
(none).

I'm in favor of iterating on v1 on this thread (haven't read it yet) and
presenting it as the proposed solution.  If that ends up getting shot down
we can revive v5 (still need to review as well).

We should probably post on that thread that this one exists and post a link
to it.

David J.


Re: run pgindent on a regular basis / scripted manner

2023-10-16 Thread Peter Geoghegan
On Mon, Oct 16, 2023 at 6:32 PM Tom Lane  wrote:
> But it's *not* a hard rule --- we explicitly rejected mechanisms
> that would make it so (such as a precommit hook).  I view "koel
> is unhappy" as something that you ought to clean up, but if you
> don't get to it for a day or three there's not much harm done.

It's hard to square that with what you said about needing greater peer
pressure on committers.

> Right now I think we just need to raise
> committers' awareness of this enough that they routinely run
> pgindent on the files they're touching.  In the problem cases
> so far, they very clearly didn't.  I don't see much point in
> worrying about second-order problems until that first-order
> problem is tamped down.

Realistically, if you're the committer that broke koel, you are at
least the subject of mild disapproval -- you have likely
inconvenienced others. I always try to avoid that -- it pretty much
rounds up to "hard rule" in my thinking. Babysitting koel really does
seem like it could cut into my dinner plans or what have you.

-- 
Peter Geoghegan




Re: Move bki file pre-processing from initdb to bootstrap

2023-10-16 Thread Krishnakumar R
> The version comparison has been moved from initdb to bootstrap. This
> created some compatibility problems with windows tests. For now I kept
> the version check to not have \n added, which worked fine and serves
> the purpose. However hoping to have something better in v3 in addition
> to addressing any other comments.

With help from Thomas, I figured out that on Windows, fopen() uses
binary mode in the backend, which causes issues with EOL handling.
Please find the attached patch updated with a fix for this.
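
(An illustrative sketch of the EOL handling that binary-mode reads force;
the helper below is hypothetical and not the exact code in the attached
patch:)

#include <stdio.h>
#include <string.h>

/*
 * Strip the trailing end-of-line from a line read with fgets().
 * With a binary-mode fopen() on Windows the "\r" is not translated
 * away, so both "\n" and "\r\n" must be handled.
 */
static void
chomp_line(char *line)
{
	size_t		len = strlen(line);

	if (len > 0 && line[len - 1] == '\n')
		line[--len] = '\0';
	if (len > 0 && line[len - 1] == '\r')	/* binary-mode read */
		line[--len] = '\0';
}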

--
Thanks and Regards,
Krishnakumar (KK).
[Microsoft]


v3-0001-Remove-BKI-file-token-pre-processing-logic-from-i.patch
Description: Binary data


Re: run pgindent on a regular basis / scripted manner

2023-10-16 Thread Tom Lane
Peter Geoghegan  writes:
> My main objection to the new policy is that it's not quite clear what
> process I should go through in order to be 100% confident that koel
> won't start whining (short of waiting around for koel to whine). I
> know how to run pgindent, of course, and haven't had any problems so
> far...but it still seems quite haphazard. If we're going to make this
> a hard rule, enforced on every commit, it should be dead easy to
> comply with the rule.

But it's *not* a hard rule --- we explicitly rejected mechanisms
that would make it so (such as a precommit hook).  I view "koel
is unhappy" as something that you ought to clean up, but if you
don't get to it for a day or three there's not much harm done.

In theory koel might complain even if you'd locally gotten
clean results from pgindent (as a consequence of skew in the
typedef lists being used, for example).  We've not seen cases
of that so far though.  Right now I think we just need to raise
committers' awareness of this enough that they routinely run
pgindent on the files they're touching.  In the problem cases
so far, they very clearly didn't.  I don't see much point in
worrying about second-order problems until that first-order
problem is tamped down.

regards, tom lane




Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]

2023-10-16 Thread Quan Zongliang




On 2023/10/16 20:05, Pavel Stehule wrote:



On Mon, Oct 16, 2023 at 13:56, Daniel Gustafsson wrote:


 > On 16 Oct 2023, at 12:15, Quan Zongliang wrote:

 > Implement TODO item:
 > PL/pgSQL
 > Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]

Cool!  While I haven't looked at the patch yet, I've wanted this
myself many
times in the past when working with large plpgsql codebases.

 > As a first step, deal only with [], such as
 > xxx.yyy%TYPE[]
 > xxx%TYPE[]
 >
 > It can be extended to support multi-dimensional and complex
syntax in the future.

Was this omitted due to complexity of implementation or for some
other reason?


Because of complexity.



There is no reason to describe this enhancement.  The size and dimensions
of PostgreSQL arrays are dynamic; they depend on the value, not on the
declaration.  Currently this information is ignored, and it could be a
compatibility break to check and enforce it.



Yes. I don't think it's necessary.
If anyone needs it, we can continue to enhance it in the future.



--
Daniel Gustafsson









Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]

2023-10-16 Thread Quan Zongliang



The error message still seems ambiguous:
   do not support multi-dimensional arrays in PL/pgSQL

Isn't this better?
   do not support multi-dimensional %TYPE arrays in PL/pgSQL


On 2023/10/17 09:19, Quan Zongliang wrote:


Attached is a new patch with more explicit error messages based on type.


On 2023/10/16 18:15, Quan Zongliang wrote:



Implement TODO item:
PL/pgSQL
Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]

As a first step, deal only with [], such as
xxx.yyy%TYPE[]
xxx%TYPE[]

It can be extended to support multi-dimensional and complex syntax in 
the future.



--
Quan Zongliang






Re: run pgindent on a regular basis / scripted manner

2023-10-16 Thread Peter Geoghegan
On Mon, Oct 16, 2023 at 5:45 PM Tom Lane  wrote:
> Two thoughts about that:
>
> 1. We should not commit indent fixups on behalf of somebody else's
> misindented commit.  Peer pressure on committers who aren't being
> careful about this is the only way to improve matters; so complain
> to the person at fault until they fix it.

Thomas Munro's recent commit 01529c704008 was added to
.git-blame-ignore-revs by Michael Paquier, despite the fact that
Munro's commit technically isn't just a pure indentation fix (it also
fixed some typos). It's hard to judge Michael too harshly for this,
since in general it's harder to commit things when koel is already
complaining about existing misindentation -- I get why he'd prefer to
take care of that first.

> 2. We could raise awareness of this issue by adding indent verification
> to CI testing.  I hesitate to suggest that, though, for a couple of
> reasons:
>2a. It seems fairly expensive, though I might be misjudging.
>2b. It's often pretty handy to submit patches that aren't fully
>indent-clean; I have such a patch in flight right now at [1].

It's also often handy to make a minor change to a comment or something
at the last minute, without necessarily having the comment indented
perfectly.

> 2b could be ameliorated by making the indent check be a separate
> test process that doesn't obscure the results of other testing.

I was hoping that "go back to the old status quo" would also appear as
an option.

My main objection to the new policy is that it's not quite clear what
process I should go through in order to be 100% confident that koel
won't start whining (short of waiting around for koel to whine). I
know how to run pgindent, of course, and haven't had any problems so
far...but it still seems quite haphazard. If we're going to make this
a hard rule, enforced on every commit, it should be dead easy to
comply with the rule.

-- 
Peter Geoghegan




Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]

2023-10-16 Thread Quan Zongliang


Attached is a new patch with more explicit error messages based on type.


On 2023/10/16 18:15, Quan Zongliang wrote:



Implement TODO item:
PL/pgSQL
Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]

As a first step, deal only with [], such as
xxx.yyy%TYPE[]
xxx%TYPE[]

It can be extended to support multi-dimensional and complex syntax in 
the future.



--
Quan Zongliang

diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 6a09bfdd67..c6377dcf2c 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -23,6 +23,7 @@
 #include "parser/scanner.h"
 #include "parser/scansup.h"
 #include "utils/builtins.h"
+#include "utils/syscache.h"
 
 #include "plpgsql.h"
 
@@ -76,6 +77,7 @@ static PLpgSQL_expr *read_sql_expression2(int until, int until2,
 														  int *endtoken);
 static PLpgSQL_expr *read_sql_stmt(void);
 static PLpgSQL_type *read_datatype(int tok);
+static PLpgSQL_type *read_datatype_array(PLpgSQL_type *elem_typ);
 static PLpgSQL_stmt *make_execsql_stmt(int firsttoken, int location);
 static PLpgSQL_stmt_fetch *read_fetch_direction(void);
 static void complete_direction(PLpgSQL_stmt_fetch *fetch,
@@ -2783,6 +2785,74 @@ read_sql_construct(int until,
 	return expr;
 }
 
+static PLpgSQL_type *
+read_datatype_array(PLpgSQL_type *elem_typ)
+{
+	int			tok;
+	HeapTuple	type_tup = NULL;
+	Form_pg_type type_frm;
+
+	Assert(elem_typ);
+	if (!OidIsValid(elem_typ->typoid))
+		return elem_typ;
+
+	tok = yylex();
+	/* Next token is not a square bracket. */
+	if (tok != '[')
+	{
+		plpgsql_push_back_token(tok);
+
+		return elem_typ;
+	}
+
+	tok = yylex();
+	/* For now, deal only with []. */
+	if (tok != ']')
+	{
+		plpgsql_push_back_token(tok);
+		plpgsql_push_back_token('[');
+
+		return elem_typ;
+	}
+
+	type_tup = SearchSysCache1(TYPEOID,
+							   ObjectIdGetDatum(elem_typ->typoid));
+
+	/* Should not happen. */
+	if (!HeapTupleIsValid(type_tup))
+		elog(ERROR, "cache lookup failed for type %u", elem_typ->typoid);
+
+	type_frm = (Form_pg_type) GETSTRUCT(type_tup);
+
+	if (OidIsValid(type_frm->typarray))
+	{
+		Oid			arrtyp_oid = type_frm->typarray;
+
+		ReleaseSysCache(type_tup);
+
+		return plpgsql_build_datatype(arrtyp_oid,
+									  elem_typ->atttypmod,
+									  elem_typ->collation,
+									  NULL);
+	}
+
+	if (type_frm->typcategory != TYPCATEGORY_ARRAY)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("type \"%s[]\" does not exist",
+						NameStr(type_frm->typname)),
+				 parser_errposition(yylloc)));
+
+	ereport(ERROR,
+			(errcode(ERRCODE_UNDEFINED_OBJECT),
+			 errmsg("do not support multi-dimensional arrays in PL/pgSQL"),
+			 parser_errposition(yylloc)));
+
+	ReleaseSysCache(type_tup);
+
+	return elem_typ;
+}
+
 static PLpgSQL_type *
 read_datatype(int tok)
 {
@@ -2818,7 +2888,9 @@ read_datatype(int tok)
 		{
 			result = plpgsql_parse_wordtype(dtname);
 			if (result)
-				return result;
+			{
+				return read_datatype_array(result);
+			}
 		}
 		else if (tok_is_keyword(tok, &yylval,
 								K_ROWTYPE, "rowtype"))
@@ -2842,7 +2914,9 @@ read_datatype(int tok)
 		{
 			result = plpgsql_parse_wordtype(dtname);
 			if (result)
-				return result;
+			{
+				return read_datatype_array(result);
+			}
 		}
 		else if (tok_is_keyword(tok, &yylval,
 								K_ROWTYPE, "rowtype"))
@@ -2866,7 +2940,9 @@ read_datatype(int tok)
 		{
 			result = plpgsql_parse_cwordtype(dtnames);
 			if (result)
-

Re: Fix output of zero privileges in psql

2023-10-16 Thread Laurenz Albe
On Mon, 2023-10-16 at 23:51 +0200, Erik Wienhold wrote:
> What's the process for the CommitFest now since we settled on your
> patch?  This is my first time being involved in this, so still learning.
> I see that you've withdrawn your initial patch [1] but this thread is
> still attached to my patch [2].  Should I edit my CF entry (or withdraw
> it) and you reactivate yours?

I don't think it makes sense to have two competing commitfest entries,
and we should leave it on this thread.  If you are concerned about
authorship, we could both be mentioned as co-authors, if the patch ever
gets committed.

I'd still like to wait for feedback from David before I change anything.

Yours,
Laurenz Albe




Re: run pgindent on a regular basis / scripted manner

2023-10-16 Thread Tom Lane
Peter Geoghegan  writes:
> There were two independent fixup commits to address complaints from
> koel just today (from two different committers). Plus there was a
> third issue (involving a third committer) this past Wednesday.

> This policy isn't working.

Two thoughts about that:

1. We should not commit indent fixups on behalf of somebody else's
misindented commit.  Peer pressure on committers who aren't being
careful about this is the only way to improve matters; so complain
to the person at fault until they fix it.

2. We could raise awareness of this issue by adding indent verification
to CI testing.  I hesitate to suggest that, though, for a couple of
reasons:
   2a. It seems fairly expensive, though I might be misjudging.
   2b. It's often pretty handy to submit patches that aren't fully
   indent-clean; I have such a patch in flight right now at [1].

2b could be ameliorated by making the indent check be a separate
test process that doesn't obscure the results of other testing.

regards, tom lane

[1] https://www.postgresql.org/message-id/2617358.1697501956%40sss.pgh.pa.us




Allow ALTER SYSTEM SET on unrecognized custom GUCs

2023-10-16 Thread Tom Lane
Currently we have this odd behavior (for a superuser):

regression=# ALTER SYSTEM SET foo.bar TO 'baz';
ERROR:  unrecognized configuration parameter "foo.bar"
regression=# SET foo.bar TO 'baz';
SET
regression=# ALTER SYSTEM SET foo.bar TO 'baz';
ALTER SYSTEM

That is, you can't ALTER SYSTEM SET a random custom GUC unless there
is already a placeholder GUC for it, because the find_option call in
AlterSystemSetConfigFile fails.  This is surely pretty inconsistent.
Either the first ALTER SYSTEM SET ought to succeed, or the second one
ought to fail too, because we don't have any more knowledge about the
custom GUC than we did before.

In the original discussion about this [1], I initially leaned towards
"they should both fail", but I reconsidered: there doesn't seem to be
any harm in allowing ALTER SYSTEM SET to succeed for any custom GUC
name, as long as you're superuser.

Hence, attached is a patch for that.  Much of it is refactoring to
avoid duplicating the code that checks for a reserved GUC name, which
I think should still be done here --- otherwise, we're losing a lot of
the typo detection that that check was intended to provide.  (That is,
if you have loaded an extension that defines "foo" as a prefix, we
should honor the extension's opinion about whether "foo.bar" is
valid.)  I also fixed the code for GRANT ON PARAMETER so that it
follows the same rules and throws the same errors for invalid cases.

There's a chunk of AlterSystemSetConfigFile that now needs indenting
one more tab stop, but I didn't do that yet for ease of review.

Thoughts?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/169746329791.169914.16613647309012285391%40wrigleys.postgresql.org

diff --git a/src/backend/catalog/pg_parameter_acl.c b/src/backend/catalog/pg_parameter_acl.c
index 073392e2c4..f4bc10bafe 100644
--- a/src/backend/catalog/pg_parameter_acl.c
+++ b/src/backend/catalog/pg_parameter_acl.c
@@ -82,11 +82,7 @@ ParameterAclCreate(const char *parameter)
 	 * To prevent cluttering pg_parameter_acl with useless entries, insist
 	 * that the name be valid.
 	 */
-	if (!check_GUC_name_for_parameter_acl(parameter))
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_NAME),
-				 errmsg("invalid parameter name \"%s\"",
-						parameter)));
+	check_GUC_name_for_parameter_acl(parameter);
 
 	/* Convert name to the form it should have in pg_parameter_acl. */
 	parname = convert_GUC_name_for_parameter_acl(parameter);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index c25c697a06..e1ea5561d7 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -250,6 +250,8 @@ static void write_auto_conf_file(int fd, const char *filename, ConfigVariable *h
 static void replace_auto_config_value(ConfigVariable **head_p, ConfigVariable **tail_p,
 	  const char *name, const char *value);
 static bool valid_custom_variable_name(const char *name);
+static bool assignable_custom_variable_name(const char *name, bool skip_errors,
+			int elevel);
 static void do_serialize(char **destptr, Size *maxbytes,
 		 const char *fmt,...) pg_attribute_printf(3, 4);
 static bool call_bool_check_hook(struct config_bool *conf, bool *newval,
@@ -1063,7 +1065,7 @@ add_guc_variable(struct config_generic *var, int elevel)
  *
  * It must be two or more identifiers separated by dots, where the rules
  * for what is an identifier agree with scan.l.  (If you change this rule,
- * adjust the errdetail in find_option().)
+ * adjust the errdetail in assignable_custom_variable_name().)
  */
 static bool
 valid_custom_variable_name(const char *name)
@@ -1098,6 +1100,71 @@ valid_custom_variable_name(const char *name)
 	return saw_sep;
 }
 
+/*
+ * Decide whether an unrecognized variable name is allowed to be SET.
+ *
+ * It must pass the syntactic rules of valid_custom_variable_name(),
+ * and it must not be in any namespace already reserved by an extension.
+ * (We make this separate from valid_custom_variable_name() because we don't
+ * apply the reserved-namespace test when reading configuration files.)
+ *
+ * If valid, return true.  Otherwise, return false if skip_errors is true,
+ * else throw a suitable error at the specified elevel (and return false
+ * if that's less than ERROR).
+ */
+static bool
+assignable_custom_variable_name(const char *name, bool skip_errors, int elevel)
+{
+	/* If there's no separator, it can't be a custom variable */
+	const char *sep = strchr(name, GUC_QUALIFIER_SEPARATOR);
+
+	if (sep != NULL)
+	{
+		size_t		classLen = sep - name;
+		ListCell   *lc;
+
+		/* The name must be syntactically acceptable ... */
+		if (!valid_custom_variable_name(name))
+		{
+			if (!skip_errors)
+				ereport(elevel,
+						(errcode(ERRCODE_INVALID_NAME),
+						 errmsg("invalid configuration parameter name \"%s\"",
+								name),
+						 errdetail("Custom parameter names must be two or more simple identifiers separated by dots.")));
+			return false;
+		}
+		/* 

Re: The danger of deleting backup_label

2023-10-16 Thread David Steele

On 10/16/23 15:06, Robert Haas wrote:

On Mon, Oct 16, 2023 at 1:00 PM David Steele  wrote:

After some agonizing (we hope) they decide to delete backup_label and,
wow, it just works! So now they merrily go on their way with a corrupted
cluster. They also remember for the next time that deleting backup_label
is definitely a good procedure.

The idea behind this patch is that deleting backup_label would produce a
hard error because pg_control would be missing as well (if the backup
software did its job). If both pg_control and backup_label are present
(but pg_control has not been loaded with the contents of backup_label,
i.e. it is the running copy from the backup cluster) we can also error.


I mean, I think we're just going in circles, here. I did and do
understand, but I didn't and don't agree. You're hypothesizing a user
who is willing to do ONE thing that they shouldn't do during backup
restoration (namely, remove backup_label) but who won't be willing to
do a SECOND thing that they shouldn't do during backup restoration
(namely, run pg_resetwal). 


In my experience the first case is much more likely than the second. 
Your experience may vary.


Anyway, I think they are pretty different. Deleting backup label appears 
to give a perfectly valid restore. Running pg_resetwal is more clearly 
(I think) the nuclear solution.



I understand that you feel differently, and that's fine, but I don't
think our disagreement here stems from me being confused. I just ...
don't agree.


Fair enough, we don't agree.

Regards,
-David




Re: The danger of deleting backup_label

2023-10-16 Thread Michael Paquier
On Mon, Oct 16, 2023 at 12:25:59PM -0400, Robert Haas wrote:
> On Mon, Oct 16, 2023 at 11:45 AM David Steele  wrote:
> > If you start from the last checkpoint (which is what will generally be
> > stored in pg_control) then the effect is pretty similar.
> 
> If the backup didn't span a checkpoint, then restoring from the one in
> pg_control actually works fine. Not that I'm encouraging that. But if
> you replay WAL from the control file, you at least get the last
> checkpoint's worth of WAL; if you use pg_resetwal, you get nothing.

There's no guarantee that the backend didn't spawn an extra checkpoint
while a base backup was taken, either, because we don't fail hard at
the end of the BASE_BACKUP code paths if the redo LSN has been updated
since the beginning of a BASE_BACKUP.  So, really: *never* do it,
unless you like silent corruption.

> I don't really want to get hung up on this though. My main point here
> is that I have trouble believing that an error after you've already
> screwed up your backup helps much. I think what we need is to make it
> less likely that you will screw up your backup in the first place.

Yeah..  Now what's the best user experience?  Is it better for a base
backup to fail and have the user retry?  Or is it better to have the
backend-side backup logic do what we think is safer?  The former
(likely with a REDO check or similar) will likely never work on large
instances, while users will likely always find ways to screw up base
backups taken by the latter method.  A third approach is to put more
careful checks at restore time, and that helps a lot here.
--
Michael




Re: Is this a problem in GenericXLogFinish()?

2023-10-16 Thread Jeff Davis
On Wed, 2023-10-11 at 14:53 +0300, Heikki Linnakangas wrote:
> 
> The comment suggests that you don't need to hold an exclusive lock
> when 
> you call this, but there's an assertion that you do.

Reworded.

> Is it a new requirement that you must hold an exclusive lock on the 
> buffer when you call XLogRegisterBuffer()? I think that's reasonable.

Agreed.

> 
> I'd suggest a comment here to explain what's wrong if someone hits
> this 
> assertion. Something like "The buffer must be marked as dirty before 
> registering, unless REGBUF_CLEAN_OK is set to indicate that the
> buffer 
> is not being modified".

Done, with different wording.

> 
> > If we commit this, ideally someone would look into the places where
> > REGBUF_CLEAN_OK is used and make sure that it's not a bug, and
> > perhaps
> > refactor so that it would benefit from the Assert. But refactoring
> > those places is outside of the scope of this patch.

I don't think it makes sense to register a buffer that is perhaps clean
and perhaps not. After registering a buffer and writing WAL, you need
to PageSetLSN(), and that should only be done after MarkBufferDirty(),
right?

So if you need to condition PageSetLSN() on whether MarkBufferDirty()
has happened, it would be trivial to use the same condition for
XLogRegisterBuffer(). Therefore I changed the name back to
REGBUF_NO_CHANGE, as you suggested originally.
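
(For reference, the rule from transam/README that the patch enforces looks
schematically like this at a typical WAL-logging call site; RM_FOO_ID and
XLOG_FOO_OP are placeholders for a real resource manager and record type,
and this is a sketch, not code from the patch:)

static void
log_buffer_change(Buffer buf)
{
	XLogRecPtr	recptr;

	START_CRIT_SECTION();

	MarkBufferDirty(buf);		/* must precede XLogRegisterBuffer() */

	XLogBeginInsert();
	XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
	recptr = XLogInsert(RM_FOO_ID, XLOG_FOO_OP);

	/* set the page LSN only once the buffer is dirty and WAL is written */
	PageSetLSN(BufferGetPage(buf), recptr);

	END_CRIT_SECTION();
}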

The hash indexing code knows it's not modifying the bucket buffer,
doesn't mark it dirty, and doesn't set the LSN. I presume that's safe.

However, there are quite a few places where
XLogRegisterBuffer()+WAL+PageSetLSN() all happen, but it's not
immediately obvious that all code paths to get there also
MarkBufferDirty(). For instance:

   lazy_scan_heap() -- conditionally marks heapbuf as dirty
   visibilitymap_set() -- conditionally calls log_heap_visible
   log_heap_visible()   
   XLogRegisterBuffer(1, heap_buffer, flags)

if those conditions don't match up exactly, it seems we could get into
a situation where we update the LSN of a clean page, which seems bad.

There are a few other places in the hash code and spgist code where I
have similar concerns. Not many, though, I looked at all of the call
sites (at least briefly) and most of them are fine.

> About that: you added the flag to a couple of XLogRegisterBuffer()
> calls 
> in GIN, because they call MarkBufferDirty() only after 
> XLogRegisterBuffer(). Those would be easy to refactor so I'd suggest 
> doing that now.

Done.

> > It sounds like we have no intention to change the hash index code,
> > so
> > are we OK if this flag just lasts forever? Do you still think it
> > offers
> > a useful check?
> 
> Yeah, I think this is a useful assertion. It might catch some bugs in
> extensions too.

I was asking more about the REGBUF_NO_CHANGE flag. It feels very
specific to the hash indexes, and I'm not sure we want to encourage
extensions to use it.

Though maybe the locking protocol used by hash indexes is more
generally applicable, and other indexing strategies might want to use
it, too?

Another option might be to just change the hash indexing code to follow
the correct protocol, locking and calling MarkBufferDirty() in those 3
call sites. Marking the buffer dirty is easy, but making sure that it's
locked might require some refactoring. Robert, would following the
right protocol here affect performance?

Regards,
Jeff Davis

From f0c2d489dd8f2268803f4d2cc8642e29cb0d3576 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Thu, 28 Sep 2023 11:19:06 -0700
Subject: [PATCH v4] Assert that buffers are marked dirty before
 XLogRegisterBuffer().

Enforce the rule from transam/README in XLogRegisterBuffer(), and
update callers to follow the rule.

Hash indexes sometimes register clean pages as a part of the locking
protocol, so provide a REGBUF_NO_CHANGE flag to support that use.

Discussion: https://postgr.es/m/c84114f8-c7f1-5b57-f85a-3adc31e1a...@iki.fi
---
 src/backend/access/gin/ginbtree.c   | 14 +++---
 src/backend/access/gin/gindatapage.c|  6 +++
 src/backend/access/gin/ginentrypage.c   |  3 ++
 src/backend/access/gin/ginfast.c|  5 ++-
 src/backend/access/hash/hash.c  | 11 +++--
 src/backend/access/hash/hashovfl.c  | 21 ++---
 src/backend/access/transam/xloginsert.c | 16 ++-
 src/backend/storage/buffer/bufmgr.c | 59 +
 src/include/access/xloginsert.h |  1 +
 src/include/storage/bufmgr.h|  2 +
 10 files changed, 118 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 06e97a7fbb..fc694b40f1 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -387,24 +387,22 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 		START_CRIT_SECTION();
 
 		if (RelationNeedsWAL(btree->index) && !btree->isBuild)
-		{
 			XLogBeginInsert();
-			XLogRegisterBuffer(0, stack->buffer, REGBUF_STANDARD);
-			if 

Re: Add support for AT LOCAL

2023-10-16 Thread Michael Paquier
On Mon, Oct 16, 2023 at 09:54:41AM -0400, Tom Lane wrote:
> I'm mighty tempted though to (a) add coverage for timetz_izone
> to HEAD, and (b) backpatch the new tests, sans the AT LOCAL case,
> to the back branches (maybe not v11).

I see that you've already done (a) with 2f04720307.  I'd be curious to
see what happens for (b), as well, once (a) is processed on hornet
once..
--
Michael




Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

2023-10-16 Thread Michael Paquier
On Mon, Oct 16, 2023 at 05:48:43PM +0200, Laurenz Albe wrote:
> I don't have strong feelings either way.  If you have backup_label
> but no signal file, starting PostgreSQL may succeed (if the WAL
> with the checkpoint happens to be in pg_wal) or it may fail with
> an error message.  There is no danger of causing damage unless you
> remove backup_label, right?

A bit more happens currently if you have a backup_label with no signal
files, unfortunately, because this causes some startup states to not
be initialized.  See around here:
https://www.postgresql.org/message-id/Y/Q/17rpys7yg...@paquier.xyz
https://www.postgresql.org/message-id/Y/v0c+3W89NBT/i...@paquier.xyz

> I cannot think of a use case where you use such a configuration on
> purpose, and the current error message is more cryptic than a plain
> "you must have a signal file to start from a backup", so perhaps
> your patch is a good idea.

I hope so.
--
Michael




Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2023-10-16 Thread Thomas Munro
I pushed the retry-loop-in-frontend-executables patch and the
missing-locking-in-SQL-functions patch yesterday.  That leaves the
backup ones, which I've rebased and attached, no change.  It sounds
like we need some more healthy debate about that backup label idea
that would mean we don't need these (two birds with one stone), so
I'll just leave these here to keep the cfbot happy in the meantime.
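
(For anyone skimming: the rename-based approach in the first patch is the
classic atomic-replace pattern, roughly as sketched below.  Error handling
and the further durability steps of the real update_controlfile() are
omitted; this is an illustration, not the patch itself.)

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write a new version of a file and swap it into place with rename(), so
 * that concurrent readers see either the old contents or the new contents
 * in full, never a torn mix of the two.
 */
static void
write_file_atomically(const char *path, const char *tmppath,
					  const void *buf, size_t len)
{
	int			fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	write(fd, buf, len);
	fsync(fd);					/* make the data durable before the swap */
	close(fd);
	rename(tmppath, path);		/* atomic replacement on POSIX filesystems */
}
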
From 3ab478ba3bb1712e6fc529f4bdaf48b58a6d72c5 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 26 Jul 2023 08:46:19 +1200
Subject: [PATCH v7 1/2] Update control file atomically during backups.

If there is a backup in progress, switch to a rename-based approach when
writing out the control file to avoid problems with torn reads that can
occur on some systems (at least ext4, ntfs).  We don't want to make
UpdateControlFile() slower in general, because it's called frequently
during recovery, so we only do it while runningBackups > 0.  In order to
activate the new behavior without races, we now also acquire
ControlFileLock when backups begin and end.

Back-patch to all supported releases.

Reviewed-by: Anton A. Melnikov 
Reviewed-by: David Steele 
Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
---
 src/backend/access/transam/xlog.c  | 33 +++--
 src/bin/pg_checksums/pg_checksums.c|  2 +-
 src/bin/pg_resetwal/pg_resetwal.c  |  2 +-
 src/bin/pg_rewind/pg_rewind.c  |  2 +-
 src/common/controldata_utils.c | 41 --
 src/include/common/controldata_utils.h |  4 ++-
 6 files changed, 74 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c0e4ca5089..c82e5c6ad4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -442,7 +442,9 @@ typedef struct XLogCtlInsert
 	/*
 	 * runningBackups is a counter indicating the number of backups currently
 	 * in progress. lastBackupStart is the latest checkpoint redo location
-	 * used as a starting point for an online backup.
+	 * used as a starting point for an online backup.  runningBackups is
+	 * protected by both ControlFileLock and WAL insertion lock (in that
+	 * order), because it affects the behavior of UpdateControlFile().
 	 */
 	int			runningBackups;
 	XLogRecPtr	lastBackupStart;
@@ -4208,7 +4210,13 @@ ReadControlFile(void)
 static void
 UpdateControlFile(void)
 {
-	update_controlfile(DataDir, ControlFile, true);
+	XLogCtlInsert *Insert = &XLogCtl->Insert;
+	bool		atomic;
+
+	Assert(LWLockHeldByMeInMode(ControlFileLock, LW_EXCLUSIVE));
+	atomic = Insert->runningBackups > 0;
+
+	update_controlfile(DataDir, ControlFile, atomic, true);
 }
 
 /*
@@ -5344,9 +5352,12 @@ StartupXLOG(void)
 		 * pg_control with any minimum recovery stop point obtained from a
 		 * backup history file.
 		 *
-		 * No need to hold ControlFileLock yet, we aren't up far enough.
+		 * ControlFileLock not really needed yet, we aren't up far enough, but
+		 * makes assertions simpler.
 		 */
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
 
 		/*
 		 * If there was a backup label file, it's done its job and the info
@@ -8335,6 +8346,12 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 
 	memcpy(state->name, backupidstr, strlen(backupidstr));
 
+	/*
+	 * UpdateControlFile() behaves differently while backups are running, and
+	 * we need to avoid races when the backups start.
+	 */
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
 	/*
 	 * Mark backup active in shared memory.  We must do full-page WAL writes
 	 * during an on-line backup even if not doing so at other times, because
@@ -8360,6 +8377,8 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 	XLogCtl->Insert.runningBackups++;
 	WALInsertLockRelease();
 
+	LWLockRelease(ControlFileLock);
+
 	/*
 	 * Ensure we decrement runningBackups if we fail below. NB -- for this to
 	 * work correctly, it is critical that sessionBackupState is only updated
@@ -8650,6 +8669,12 @@ do_pg_backup_stop(BackupState *state, bool waitforarchive)
  errmsg("WAL level not sufficient for making an online backup"),
  errhint("wal_level must be set to \"replica\" or \"logical\" at server start.")));
 
+	/*
+	 * Interlocking with UpdateControlFile(), so that it can start using atomic
+	 * mode while backups are running.
+	 */
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
 	/*
 	 * OK to update backup counter and session-level lock.
 	 *
@@ -8679,6 +8704,8 @@ do_pg_backup_stop(BackupState *state, bool waitforarchive)
 
 	WALInsertLockRelease();
 
+	LWLockRelease(ControlFileLock);
+
 	/*
 	 * If we are taking an online backup from the standby, we confirm that the
 	 * standby has not been promoted during the backup.
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index e009ba5e0b..4dab08b4a0 100644
--- 

Re: Postgres Architecture

2023-10-16 Thread Tom Lane
Timothy Nelson  writes:
> Great!  I'm not surprised it's been around a long time -- I didn't think I
> could be the only one to think of it.
> Thanks for the heads-up on Postgres-XL -- I'd missed that one somehow.

FWIW, we also have some in-core history with passing plans around,
for parallel-query workers.  The things I'd take away from that
are:

1. It's expensive.  In the parallel-query case it's hard to tease apart
the cost of passing across a plan from the cost of starting a worker,
but it's certainly high.  You would need a way of only invoking this
mechanism for expensive-anyway queries, which puts a hole in the idea
you seemed to have of having a hard separation between parse/plan
processes and execute processes.

2. Constant-folding at plan time is another reason you can't have
a hard separation: the planner might run arbitrary user-defined
code.

3. Locking is a pain.  In the Postgres architecture, table locks
acquired during parse/plan have to be held through to execution,
or concurrent DDL might invalidate your plan out from under you.
We finesse that in the parallel-query case by expecting the leader
process to keep hold of all the needed locks, and then having some
kluges that allow child workers to acquire the same locks without
blocking.  (The workers perhaps don't really need those locks,
but acquiring them avoids the need to poke holes in various
you-must-have-a-lock-to-do-this sanity checks.)  I fear this area
might be a great deal harder if you're trying to pass plans from a
parse/plan process to an arms-length execute process.

4. Sharing execute workers between sessions (which I think was an
implicit part of your idea) is hard; hard enough that we haven't
even tried.  There's too much context-sensitive state in a backend
and too little way of isolating which things depend on the current
user, current database etc.  Probably this could be cleaned up
with enough work, but it'd not be a small project.

regards, tom lane




Re: Patch: Improve Boolean Predicate JSON Path Docs

2023-10-16 Thread Erik Wienhold
On 2023-10-16 21:59 +0200, David E. Wheeler wrote:
> On Oct 15, 2023, at 23:03, Erik Wienhold  wrote:
> 
> > Your call but I'm not against including it in this patch because it
> > already touches the modes section.
> 
> Okay, added, let’s just put all our cards on the table. :-)

I'll have a look but the attached v3 is not a patch but some applefile.

> Thanks, got it down to one:
> 
> postgres.sgml:112: element sect4: validity error : Element sect4 content does 
> not follow the DTD, expecting (sect4info? , (title , subtitle? , 
> titleabbrev?) , (toc | lot | index | glossary | bibliography)* , 
> (((calloutlist | glosslist | bibliolist | itemizedlist | orderedlist | 
> segmentedlist | simplelist | variablelist | caution | important | note | tip 
> | warning | literallayout | programlisting | programlistingco | screen | 
> screenco | screenshot | synopsis | cmdsynopsis | funcsynopsis | classsynopsis 
> | fieldsynopsis | constructorsynopsis | destructorsynopsis | methodsynopsis | 
> formalpara | para | simpara | address | blockquote | graphic | graphicco | 
> mediaobject | mediaobjectco | informalequation | informalexample | 
> informalfigure | informaltable | equation | example | figure | table | msgset 
> | procedure | sidebar | qandaset | task | anchor | bridgehead | remark | 
> highlights | abstract | authorblurb | epigraph | indexterm | beginpage)+ , 
> (refentry* | sect5* | simplesect*)) | refentry+ | sect5+ | simplesect+) , 
> (toc | lot | index | glossary | bibliography)*), got (para para )
>   

One of the added <sect4> is invalid by the looks of it.  Maybe <title>
is missing because it says "got (para para )" at the end.

-- 
Erik




Re: Fix output of zero privileges in psql

2023-10-16 Thread Erik Wienhold
On 2023-10-16 17:56 +0200, Laurenz Albe wrote:
> David, how do you feel about this?  I am wondering whether to set this patch
> "ready for committer" or not.
> 
> There is Tom wondering upthread whether changing psql's behavior like that
> is too much of a compatibility break or not, but I guess it is alright to
> leave that final verdict to a committer.

What's the process for the CommitFest now since we settled on your
patch?  This is my first time being involved in this, so still learning.
I see that you've withdrawn your initial patch [1] but this thread is
still attached to my patch [2].  Should I edit my CF entry (or withdraw
it) and you reactivate yours?

[1] https://commitfest.postgresql.org/45/4603/
[2] https://commitfest.postgresql.org/45/4593/

-- 
Erik




Re: Postgres Architecture

2023-10-16 Thread Timothy Nelson
Great!  I'm not surprised it's been around a long time -- I didn't think I
could be the only one to think of it.

Thanks for the heads-up on Postgres-XL -- I'd missed that one somehow.

I'm going to include the words "architecture" and "replication" so that
people searching the archives in the future have more chance of finding
this conversation.

Thanks!

On Tue, 17 Oct 2023 at 02:07, Jonah H. Harris wrote:

> On Mon, Oct 16, 2023 at 6:42 AM Timothy Nelson wrote:
>
>> I'm expecting that people will pick the idea apart, and wanted to know
>> what people think of it.
>>
>
> Thanks for the proposal. This is actually a model that's been around for a
> very long time. And, in fact, variations of it (e.g. parsing done in one
> place and generated plan fragments shipped to remote execution nodes where
> the data resides) are already used by things like Postgres-XL. There have
> also been a number of academic implementations where parsing is done
> locally and raw parse trees are sent to the server as well. While these
> things do reduce CPU, there are a number of negative aspects to deal with
> that make such an architecture more difficult to manage.
>
> --
> Jonah H. Harris
>
>


Re: Improving Physical Backup/Restore within the Low Level API

2023-10-16 Thread David G. Johnston
On Mon, Oct 16, 2023 at 12:36 PM David G. Johnston <david.g.johns...@gmail.com> wrote:

> On Mon, Oct 16, 2023 at 12:09 PM Laurenz Albe wrote:
>
>> I think it won't meet with favor if there are cases that require manual
>> intervention
>> for starting the server.  That was the main argument for getting rid of
>> the exclusive
>> backup API, which had a similar problem.
>>
>
> In the rare case of a crash of the source database while one or more
> backups are in progress.
>

Or even more simply, just document that if the process executing
pg_backup_start, and eventually pg_backup_end, notices its session die
out from under it, it should just add crash.signal to the data directory
(there can probably be a bit more intelligence involved in case the
session crash was isolated).  A normal server shutdown should remove any
crash.signal files it sees (and ensure in_backup="false"...).  A
non-normal shutdown is going to end up in crash recovery anyway, so
having the signal file there won't harm anything even if pg_control is
showing "in_backup=false".

In short, I probably don't know the details well enough to code the
solution but this seems solvable for those users that need automatic reboot
and crash recovery during an incomplete backup.  But no, by default, and
probably so far as pg_basebackup is concerned, a server crash during backup
results in requiring outside intervention in order to get the server to
restart.  It specifically requires creation of crash.signal, the specific
method being unimportant and its contents being fixed - whether empty or
otherwise.

David J.


Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-16 Thread Robert Haas
On Mon, Oct 16, 2023 at 3:46 PM Peter Geoghegan  wrote:
> On Mon, Oct 16, 2023 at 11:06 AM Robert Haas  wrote:
> > I propose to commit these changes only to master. I have included a
> > fairly long paragraph about that in the commit message for 0002.
>
> LGTM, except for one small detail: I noticed that you misspelled
> "translations" in the commit message.

Oops. Fixed locally.

> Thanks for getting this over the line

Sure thing. I'm glad we're finally doing something about it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Patch: Improve Boolean Predicate JSON Path Docs

2023-10-16 Thread David E. Wheeler
On Oct 15, 2023, at 23:03, Erik Wienhold  wrote:

> Your call but I'm not against including it in this patch because it
> already touches the modes section.

Okay, added, let’s just put all our cards on the table. :-)

>> Agreed. Would be good if we could teach these functions and operators
>> to reject path expressions they don’t support.
> 
> Right, you mentioned that idea in [1] (separate types).  Not sure what
> the best strategy here is but it's likely to break existing queries.
> Maybe deprecating unsupported path expressions in the next major release
> and changing that to an error in the major release after that.

Well, if the functions have a JsonPathItem struct, they can check its type 
attribute: @? can reject path expressions whose root is a predicate, and @@ can 
reject those whose root is not a predicate. Example of checking the type here:

https://github.com/postgres/postgres/blob/54b208f90963cb8b48b9794a5392b2fae4b40a98/src/backend/utils/adt/jsonpath_exec.c#L622
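
For illustration, here's the sort of mismatch such a check would reject. Both 
forms are accepted today; under stricter checking they would raise errors 
instead (a sketch of proposed behavior, nothing settled):

-- predicate-only path used with @?; currently returns true
SELECT '{"a": 1}'::jsonb @? '$.a == 1';
-- non-predicate path used with @@; currently returns NULL
SELECT '{"a": 1}'::jsonb @@ '$.a';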

>>> This can be checked with `make -C doc/src/sgml check`.
>> 
>> Thanks. That produces a bunch of warnings for postgres.sgml and
>> legal.sgml (and a failure to load the docbook DTD), but func.sgml is
>> clean now.
> 
> Hmm... I get no warnings on 1f89b73c4e.  Did you install all tools as
> described in [2]?  The DTD needs to be installed as well.

Thanks, got it down to one:

postgres.sgml:112: element sect4: validity error : Element sect4 content does 
not follow the DTD, expecting (sect4info? , (title , subtitle? , titleabbrev?) 
, (toc | lot | index | glossary | bibliography)* , (((calloutlist | glosslist | 
bibliolist | itemizedlist | orderedlist | segmentedlist | simplelist | 
variablelist | caution | important | note | tip | warning | literallayout | 
programlisting | programlistingco | screen | screenco | screenshot | synopsis | 
cmdsynopsis | funcsynopsis | classsynopsis | fieldsynopsis | 
constructorsynopsis | destructorsynopsis | methodsynopsis | formalpara | para | 
simpara | address | blockquote | graphic | graphicco | mediaobject | 
mediaobjectco | informalequation | informalexample | informalfigure | 
informaltable | equation | example | figure | table | msgset | procedure | 
sidebar | qandaset | task | anchor | bridgehead | remark | highlights | 
abstract | authorblurb | epigraph | indexterm | beginpage)+ , (refentry* | 
sect5* | simplesect*)) | refentry+ | sect5+ | simplesect+) , (toc | lot | index 
| glossary | bibliography)*), got (para para )
  

David



v3-0001-Improve-boolean-predicate-JSON-Path-docs.patch
Description: application/applefile




Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-16 Thread Peter Geoghegan
On Mon, Oct 16, 2023 at 11:06 AM Robert Haas  wrote:
> I propose to commit these changes only to master. I have included a
> fairly long paragraph about that in the commit message for 0002.

LGTM, except for one small detail: I noticed that you misspelled
"translations" in the commit message.

Thanks for getting this over the line
-- 
Peter Geoghegan




Re: Improving Physical Backup/Restore within the Low Level API

2023-10-16 Thread David G. Johnston
On Mon, Oct 16, 2023 at 12:09 PM Laurenz Albe 
wrote:

> I think it won't meet with favor if there are cases that require manual
> intervention
> for starting the server.  That was the main argument for getting rid of
> the exclusive
> backup API, which had a similar problem.
>

Only in the rare case of a crash of the source database while one or more
backups are in progress.  Restoring the backup requires manual
intervention with signal files today.

I get the desire for the live production server to not need intervention to
recover from a crash, but I can't help but feel that this requirement plus
the goal of making this as non-interventionist as possible during recovery
are incompatible.  But I haven't given it a great amount of thought, as I
felt the limited scope and situation were an acceptable cost for keeping
the process straightforward (i.e., starting up a backup mode instance
requires a signal file that dictates the kind of recovery to perform).  We
can either make the live backup contents invalid until something happens
after pg_backup_stop ends that makes them valid, or we have to make the
current system being backed up invalid so long as it's in backup mode.  The
latter seemed easier and doesn't require actions outside of our control.


> Also, how do you envision two concurrent backups with your setup?
>

I don't know if I understand the question - if ensuring that "in backup" is
turned on when the first backup starts and is turned off when the last
backup ends isn't sufficient for concurrent usage, I don't know what else I
need to deal with.  Apparently concurrent backups already work today, and
I'm not seeing how, aside from the process ids for the metadata directories
(i.e., the user needs to remove all but their own process subdirectory from
pg_backup_metadata) and the state flag, they wouldn't continue to work as-is.

David J.


Re: Improving Physical Backup/Restore within the Low Level API

2023-10-16 Thread Laurenz Albe
On Mon, 2023-10-16 at 11:18 -0700, David G. Johnston wrote:
> > I see a couple of problems and/or things that need clarification with that 
> > idea:
> > 
> > - Two backups can run concurrently.  How do you reconcile that with the "in 
> > backup"
> >   flag and crash.signal?
> > - I guess crash.signal is created during pg_start_backup().  So that file 
> > will be
> >   included in the backup.  How do you handle that during recovery?  Ignore 
> > it if
> >   another signal file is present?  And if the user forgets to create a 
> > signal file
> >   for recovery, how do you prevent PostgreSQL from performing crash 
> > recovery?
> > 
> 
> crash.signal is created in the pg_backup_metadata directory, not the root 
> directory.
> Should the server crash while any backup is in progress pg_control would be 
> aware
> of that fact (in_backup=true would still be there, instead of in_backup=false 
> which
> only comes back after all backups have completed) and the server will not 
> restart
> without user intervention - specifically, moving the crash.signal file from 
> (one of)
> the pg_backup_metadata subdirectories to the root directory.  As there is 
> nothing
> special about the crash.signal files in the pg_backup_metadata subdirectories
> "touch crash.signal" could be used.

I see - I missed the part with the pg_backup_metadata directory.

I think it won't meet with favor if there are cases that require manual 
intervention
for starting the server.  That was the main argument for getting rid of the 
exclusive
backup API, which had a similar problem.


Also, how do you envision two concurrent backups with your setup?

Yours,
Laurenz Albe




Re: The danger of deleting backup_label

2023-10-16 Thread Robert Haas
On Mon, Oct 16, 2023 at 1:00 PM David Steele  wrote:
> After some agonizing (we hope) they decide to delete backup_label and,
> wow, it just works! So now they merrily go on their way with a corrupted
> cluster. They also remember for the next time that deleting backup_label
> is definitely a good procedure.
>
> The idea behind this patch is that deleting backup_label would produce a
> hard error because pg_control would be missing as well (if the backup
> software did its job). If both pg_control and backup_label are present
> (but pg_control has not been loaded with the contents of backup_label,
> i.e. it is the running copy from the backup cluster) we can also error.

I mean, I think we're just going in circles, here. I did and do
understand, but I didn't and don't agree. You're hypothesizing a user
who is willing to do ONE thing that they shouldn't do during backup
restoration (namely, remove backup_label) but who won't be willing to
do a SECOND thing that they shouldn't do during backup restoration
(namely, run pg_resetwal). In my experience, users who are willing to
corrupt their database don't typically limit themselves to one bad
decision, and therefore I doubt that this proposal delivers enough
value to justify the complexity.

I understand that you feel differently, and that's fine, but I don't
think our disagreement here stems from me being confused. I just ...
don't agree.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-10-16 Thread Robert Haas
On Fri, Oct 6, 2023 at 9:14 AM Peter Eisentraut  wrote:
> > Should we treat it the same fashion as ALTER COLUMN ... TYPE which
> > rewrites the column values? Of course that rewrites the whole table,
> > but logically they are comparable.
>
> I don't know.  What are the semantics of that command with respect to
> triggers and logical decoding?

ALTER COLUMN ... TYPE doesn't fire triggers, and I don't think logical
decoding will do anything with it, either. As Amul also suggested, I
tend to think that this command should behave like that command
instead of inventing some new behavior. Sure, this is kind of like an
UPDATE, but it's also not actually an UPDATE: it's DDL. Consider this
example:

rhaas=# create table foo (a int, b text);
CREATE TABLE
rhaas=# create function nozero () returns trigger as $$begin if (new.b
= '0') then raise 'zero is bad'; end if; return new; end$$ language
plpgsql;
CREATE FUNCTION
rhaas=# create trigger fnz before insert or update or delete on foo
for each row execute function nozero();
CREATE TRIGGER
rhaas=# insert into foo values (1, '0');
ERROR:  zero is bad
CONTEXT:  PL/pgSQL function nozero() line 1 at RAISE
rhaas=# insert into foo values (1, '00');
INSERT 0 1
rhaas=# alter table foo alter column b set data type integer using b::integer;
ALTER TABLE
rhaas=# select * from foo;
 a | b
---+---
 1 | 0
(1 row)

rhaas=# insert into foo values (2, '0');
ERROR:  type of parameter 14 (integer) does not match that when
preparing the plan (text)
CONTEXT:  PL/pgSQL function nozero() line 1 at IF
rhaas=# \c
You are now connected to database "rhaas" as user "rhaas".
rhaas=# insert into foo values (2, '0');
ERROR:  zero is bad
CONTEXT:  PL/pgSQL function nozero() line 1 at RAISE
rhaas=# insert into foo values (2, '00');
ERROR:  zero is bad
CONTEXT:  PL/pgSQL function nozero() line 1 at RAISE

The trigger here is supposed to prevent me from inserting 0 into
column b, but I've ended up with one anyway, because when the column
was of type text, I could insert 00, and when I changed the column to
type integer, the value got smashed down to just 0, and the trigger
wasn't fired to prevent that. You could certainly argue with that
behavior, but I think it's pretty reasonable, and it seems like if
this command behaved that way too, that would also be pretty
reasonable.  In fact, I'm inclined to think it would be preferable,
both for consistency and because it would be less work.
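
For reference, a sketch of the command under discussion (syntax assumed from
the subject line; nothing here is committed). Under the ALTER ... TYPE-like
behavior described above, the rewrite would not fire row triggers:

CREATE TABLE bar (a int, b int GENERATED ALWAYS AS (a + 1) STORED);
INSERT INTO bar (a) VALUES (1);
-- rewrites stored values of b for existing rows, firing no row triggers
ALTER TABLE bar ALTER COLUMN b SET EXPRESSION AS (a * 2);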

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Improving Physical Backup/Restore within the Low Level API

2023-10-16 Thread David G. Johnston
On Mon, Oct 16, 2023 at 10:26 AM Laurenz Albe 
wrote:

> On Mon, 2023-10-16 at 09:26 -0700, David G. Johnston wrote:
> > This email is a first pass at a user-visible design for how our backup
> and restore
> > process, as enabled by the Low Level API, can be modified to make it
> more mistake-proof.
> > In short, it requires pg_start_backup to further expand upon what it
> means for the
> > system to be in the midst of a backup, pg_stop_backup to reverse those
> things,
> > and modifying the startup process to deal with the server having crashed
> while the
> > system is in that backup state.  Notes at the end extend the design to
> handle concurrent backups.
> >
> > The core functional changes are:
> > 1) pg_backup_start modifies a newly added "in backup" state flag in
> pg_control to on.
> > 2) pg_backup_stop modifies that flag back to off.
> > 3) postmaster will refuse to start if that flag is on, unless one of:
> >   a) crash.signal exists in the data directory
> >   b) recovery.signal exists in the data directory
> >   c) standby.signal exists in the data directory
> > 4) Signal file processing causes the in-backup flag in pg_control to be
> set to off
> >
> > The newly added crash.signal file is required to handle the case where
> the server
> > crashes after pg_backup_start and before pg_backup_stop.  It initiates a
> crash recovery
> > of the instance just as is done today but with the added change of
> flipping the flag
> > to off when recovery is complete just before going live.
>
> I see a couple of problems and/or things that need clarification with that
> idea:
>
> - Two backups can run concurrently.  How do you reconcile that with the
> "in backup"
>   flag and crash.signal?
> - I guess crash.signal is created during pg_start_backup().  So that file
> will be
>   included in the backup.  How do you handle that during recovery?  Ignore
> it if
>   another signal file is present?  And if the user forgets to create a
> signal file
>   for recovery, how do you prevent PostgreSQL from performing crash
> recovery?
>
>
crash.signal is created in the pg_backup_metadata directory, not the root
directory.  Should the server crash while any backup is in progress,
pg_control would be aware of that fact (in_backup=true would still be
there, instead of in_backup=false, which only comes back after all backups
have completed) and the server will not restart without user intervention -
specifically, moving the crash.signal file from (one of) the
pg_backup_metadata subdirectories to the root directory.  As there is
nothing special about the crash.signal files in the pg_backup_metadata
subdirectories, "touch crash.signal" could be used.

The backed up pg_control file will have in_backup=true (I haven't pondered
the torn reads dynamic of this - I'm supposing that placing a copy of
pg_control into the pg_backup_metadata directory might be part of solving
that problem).

David J.


Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-16 Thread Robert Haas
On Fri, Oct 13, 2023 at 5:03 AM Aleksander Alekseev
 wrote:
> > Thanks for working on this. I will be relieved once this is finally
> > taken care of.
>
> +1, and many thanks for your attention to the patchset and all the details!

Cool. I committed that and back-patched to v14, and here's the rest.
0001 makes the terminology change that I proposed earlier, and then
0002 is the remainder of what was in the previous patch set that
wasn't covered by what I committed already, with a few adjustments.

In particular, I preferred to stick with "avoid" rather than changing
to "prevent," and I thought it was clearer to refer to "failures"
plural rather than "failure" collective. These are arguable decisions,
though.

I propose to commit these changes only to master. I have included a
fairly long paragraph about that in the commit message for 0002.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v11-0002-Reword-messages-about-impending-M-XID-exhaustion.patch
Description: Binary data


v11-0001-Talk-about-assigning-rather-than-generating-new-.patch
Description: Binary data


Re: Rename backup_label to recovery_control

2023-10-16 Thread Laurenz Albe
On Mon, 2023-10-16 at 12:12 -0400, Robert Haas wrote:
> On Mon, Oct 16, 2023 at 12:06 PM Michael Banck  wrote:
> > Not sure what to do about this, but as people/companies start moving to
> > 15, I am afraid we will get people complaining about this. I think
> > having exclusive mode still be the default for pg_start_backup() (albeit
> > deprecated) in one release and then dropping it in the next was too
> > fast.
> 
> I completely agree, and I said so at the time, but got shouted down. I
> think the argument that exclusive backups were breaking anything at
> all was very weak. Nobody was being forced to use them, and they broke
> nothing for people who didn't.

+1

Yours,
Laurenz Albe




Re: Improving Physical Backup/Restore within the Low Level API

2023-10-16 Thread Laurenz Albe
On Mon, 2023-10-16 at 09:26 -0700, David G. Johnston wrote:
> This email is a first pass at a user-visible design for how our backup and 
> restore
> process, as enabled by the Low Level API, can be modified to make it more 
> mistake-proof.
> In short, it requires pg_start_backup to further expand upon what it means 
> for the
> system to be in the midst of a backup, pg_stop_backup to reverse those things,
> and modifying the startup process to deal with the server having crashed 
> while the
> system is in that backup state.  Notes at the end extend the design to handle 
> concurrent backups.
> 
> The core functional changes are:
> 1) pg_backup_start modifies a newly added "in backup" state flag in 
> pg_control to on.
> 2) pg_backup_stop modifies that flag back to off.
> 3) postmaster will refuse to start if that flag is on, unless one of:
>   a) crash.signal exists in the data directory
>   b) recovery.signal exists in the data directory
>   c) standby.signal exists in the data directory
> 4) Signal file processing causes the in-backup flag in pg_control to be set 
> to off
> 
> The newly added crash.signal file is required to handle the case where the 
> server
> crashes after pg_backup_start and before pg_backup_stop.  It initiates a 
> crash recovery
> of the instance just as is done today but with the added change of flipping 
> the flag
> to off when recovery is complete just before going live.

I see a couple of problems and/or things that need clarification with that idea:

- Two backups can run concurrently.  How do you reconcile that with the "in 
backup"
  flag and crash.signal?
- I guess crash.signal is created during pg_start_backup().  So that file will 
be
  included in the backup.  How do you handle that during recovery?  Ignore it if
  another signal file is present?  And if the user forgets to create a signal 
file
  for recovery, how do you prevent PostgreSQL from performing crash recovery?

Yours,
Laurenz Albe  




Re: Rename backup_label to recovery_control

2023-10-16 Thread David Steele

On 10/16/23 12:12, Robert Haas wrote:

On Mon, Oct 16, 2023 at 12:06 PM Michael Banck  wrote:

Not sure what to do about this, but as people/companies start moving to
15, I am afraid we will get people complaining about this. I think
having exclusive mode still be the default for pg_start_backup() (albeit
deprecated) in one release and then dropping it in the next was too
fast.


I completely agree, and I said so at the time, but got shouted down. I
think the argument that exclusive backups were breaking anything at
all was very weak. Nobody was being forced to use them, and they broke
nothing for people who didn't.


My argument then (and now) is that exclusive backup prevented us from 
making material improvements in backup and recovery. It was complicated, 
duplicative (in code and docs), and entirely untested.


So you are correct that it was only dangerous to the people who were 
using it (even if they did not know they were), but it was also a 
barrier to progress.


Regards,
-David




Re: Rename backup_label to recovery_control

2023-10-16 Thread David Steele




On 10/16/23 12:06, Michael Banck wrote:

On Mon, Oct 16, 2023 at 11:15:53AM -0400, David Steele wrote:

On 10/16/23 10:19, Robert Haas wrote:

We got rid of exclusive backup mode. We replaced pg_start_backup
with pg_backup_start.


I do think this was an improvement. For example it allows us to do
[1], which I believe is a better overall solution to the problem of
torn reads of pg_control. With exclusive backup we would not have this
option.


Well maybe, but it also seems to mean that any other 3rd party (i.e. not
Postgres-specific) backup tool only supports Postgres up till
version 14, as they cannot deal with non-exclusive mode - they are used
to a simple pre/post-script approach.


I'd be curious to know what enterprise solutions currently depend on 
this method. At the very least they'd need to manage a WAL archive since 
copying pg_wal is not a safe thing to do (without a snapshot), so it's 
not just a matter of using start/stop scripts. And you'd probably want 
PITR, etc.



Not sure what to do about this, but as people/companies start moving to
15, I am afraid we will get people complaining about this. I think
having exclusive mode still be the default for pg_start_backup() (albeit
deprecated) in one release and then dropping it in the next was too
fast.


But lots of companies are on PG15 and lots of hosting providers support 
it, apparently with no issues. Perhaps the companies you are referring 
to are lagging in adoption (a pretty common scenario) but I still see no 
evidence that there is a big problem looming.


Exclusive backup was deprecated for six releases, which should have been 
ample time to switch over. All the backup solutions I am familiar with 
have supported non-exclusive backup for years.



Or is somebody helping those "enterprise" backup solutions along in
implementing non-exclusive Postgres backups?


I couldn't say, but there are many examples in open source projects of 
how to do this. Somebody (Laurenz, I believe) also wrote a shell script 
to simulate exclusive backup behavior for those that want to continue 
using it. Not what I would recommend, but he showed that it was possible.


Regards,
-David




Re: The danger of deleting backup_label

2023-10-16 Thread David Steele

On 10/16/23 12:25, Robert Haas wrote:

On Mon, Oct 16, 2023 at 11:45 AM David Steele  wrote:

Hmmm, the reason to back patch this is that it would fix [1], which sure
looks like a problem to me even if it is not a "bug". We can certainly
require backup software to retry pg_control until the checksum is valid
but that seems like a pretty big ask, even considering how complicated
backup is.


That seems like a problem with pg_control not being written atomically
when the standby server is updating it during recovery, rather than a
problem with backup_label not being used at the start of recovery.
Unless I'm confused.


You are not confused. But the fact that it can, in practice, be read as 
torn at all on the standby is a function of how rapidly it is being 
written to update the minimum recovery point. Writing it atomically via a 
temp file may affect performance in this area, but only during a backup, 
i.e., recovery might run more slowly while a backup is in progress.


I don't have proof of this because I don't have any hosts large enough 
to test the theory.



Right now the user can remove backup_label and get a "successful"
restore and not realize that they have just corrupted their cluster.
This is independent of the backup/restore tool doing all the right things.


I don't think it's independent of that at all.


I think it is. Imagine the user does backup/recovery using their 
favorite tool and everything is fine. But due to some misconfiguration 
or a problem in the WAL archive, they get this error:


FATAL:  could not locate required checkpoint record
2023-10-16 16:42:50.132 UTC HINT:  If you are restoring from a backup, 
touch "/home/dev/test/data/recovery.signal" or 
"/home/dev/test/data/standby.signal" and add required recovery options.
If you are not restoring from a backup, try removing the file 
"/home/dev/test/data/backup_label".
Be careful: removing "/home/dev/test/data/backup_label" will 
result in a corrupt cluster if restoring from a backup.


I did this by setting restore_command=false, but it could just as easily 
be the actual restore command that returns false due to a variety of 
reasons. The user has no idea what "could not locate required checkpoint 
record" means and if there is enough automation they may not even 
realize they just restored from a backup.


After some agonizing (we hope) they decide to delete backup_label and, 
wow, it just works! So now they merrily go on their way with a corrupted 
cluster. They also remember for the next time that deleting backup_label 
is definitely a good procedure.


The idea behind this patch is that deleting backup_label would produce a 
hard error because pg_control would be missing as well (if the backup 
software did its job). If both pg_control and backup_label are present 
(but pg_control has not been loaded with the contents of backup_label, 
i.e. it is the running copy from the backup cluster) we can also error.


It's not perfect, because they could back up (or restore) pg_control but 
not backup_label, but we are narrowing the cases where something can go 
wrong and end up as silent corruption, especially if their 
backup/restore software follows the directions.


Regards,
-David




Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE

2023-10-16 Thread Michael Christofides
> EXPLAIN ANALYZE for parallel Bitmap Heap Scans currently only reports
> the number of heap blocks processed by the leader. It's missing the
> per-worker stats.


Hi David,

According to the docs[1]: "In a parallel bitmap heap scan, one process is
chosen as the leader. That process performs a scan of one or more indexes
and builds a bitmap indicating which table blocks need to be visited. These
blocks are then divided among the cooperating processes as in a parallel
sequential scan."

My understanding is that the "Heap Blocks" statistic is only reporting
blocks for the bitmap (i.e. not the subsequent scan). As such, I think it
is correct that the workers do not report additional exact heap blocks.


> explain (analyze, costs off, timing off) select * from foo where col0 >
> 900 or col1 = 1;
>

In your example, if you add the buffers and verbose parameters, do the
worker reported buffers numbers report what you are looking for?

i.e. explain (analyze, buffers, verbose, costs off, timing off) select *
from foo where col0 > 900 or col1 = 1;

—
Michael Christofides
Founder, pgMustard 

[1]:
https://www.postgresql.org/docs/current/parallel-plans.html#PARALLEL-SCANS


Improving Physical Backup/Restore within the Low Level API

2023-10-16 Thread David G. Johnston
Hi!

This email is a first pass at a user-visible design for how our backup and
restore process, as enabled by the Low Level API, can be modified to make
it more mistake-proof.  In short, it requires pg_start_backup to further
expand upon what it means for the system to be in the midst of a backup,
pg_stop_backup to reverse those things, and modifying the startup process
to deal with the server having crashed while the system is in that backup
state.  Notes at the end extend the design to handle concurrent backups.

The core functional changes are:
1) pg_backup_start modifies a newly added "in backup" state flag in
pg_control to on.
2) pg_backup_stop modifies that flag back to off.
3) postmaster will refuse to start if that flag is on, unless one of:
  a) crash.signal exists in the data directory
  b) recovery.signal exists in the data directory
  c) standby.signal exists in the data directory
4) Signal file processing causes the in-backup flag in pg_control to be set
to off

The newly added crash.signal file is required to handle the case where the
server crashes after pg_backup_start and before pg_backup_stop.  It
initiates a crash recovery of the instance just as is done today but with
the added change of flipping the flag to off when recovery is complete just
before going live.

The error message for the failed startup while in backup will tell the DBA
that one of the three signal files must exist.
When processing recovery.signal or standby.signal, the presence of the
backup_label and tablespace_map files is mandatory and the system will
also fail to start should they be missing.

For non-functional changes I would also suggest doing the following:
pg_backup_start will create a "pg_backup_metadata" directory if there is
not already one, or will empty it if there is.
pg_backup_start will create a crash.signal file in that directory
pg_backup_stop  will create files within pg_backup_metadata upon its
completion:
backup_label
tablespace_map
recovery.signal
standby.signal

All of the instructions regarding what to place in those files should be
removed and instead the system should write them - no copy-paste.

The instructions would be modified to say "copy the backup_label and
tablespace_map files to the root of the backup directory and the recovery
and standby signal files to the pg_backup_metadata directory in the
backup".  Additionally, we document crash recovery by saying "move
crash.signal from pg_backup_metadata to the root of the data directory".
We should explicitly advise excluding or removing
pg_backup_metadata/crash.signal from the backup as well.
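
To make the proposed flow concrete, here is a minimal sketch of a backup
session under this design (every behavior noted in the comments is proposed,
not current):

SELECT pg_backup_start('mybackup');  -- proposed: also sets in_backup=true in
                                     -- pg_control and writes
                                     -- pg_backup_metadata/crash.signal
-- ... copy the data directory, excluding pg_backup_metadata/crash.signal ...
SELECT pg_backup_stop();             -- proposed: writes backup_label,
                                     -- tablespace_map, and the signal files
                                     -- into pg_backup_metadata, then sets
                                     -- in_backup=false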

Extending the above to handle concurrent backup, for pg_control we'd still
use the on/off flag, but we would have to have a shared in-memory session
lock on something so that only the last surviving process actually changes
it to off, while also dealing with sessions that terminate without issuing
pg_backup_stop and without the server itself crashing. (I'm unfamiliar with
how this is handled today but I presume a mechanism exists already that
just needs to be extended).

For the non-functional stuff, pg_backup_start returns a process id, and
subdirectories under pg_backup_metadata are created named after it.  Add a
pg_backup_cleanup() function that executes while not in backup mode to
clean up those subdirectories.  Any subdirectory in the backup that isn't
named for the process id returned by pg_backup_start should be
excluded/removed.

David J.


Re: The danger of deleting backup_label

2023-10-16 Thread Robert Haas
On Mon, Oct 16, 2023 at 11:45 AM David Steele  wrote:
> Hmmm, the reason to back patch this is that it would fix [1], which sure
> looks like a problem to me even if it is not a "bug". We can certainly
> require backup software to retry pg_control until the checksum is valid
> but that seems like a pretty big ask, even considering how complicated
> backup is.

That seems like a problem with pg_control not being written atomically
when the standby server is updating it during recovery, rather than a
problem with backup_label not being used at the start of recovery.
Unless I'm confused.

> If you start from the last checkpoint (which is what will generally be
> stored in pg_control) then the effect is pretty similar.

If the backup didn't span a checkpoint, then restoring from the one in
pg_control actually works fine. Not that I'm encouraging that. But if
you replay WAL from the control file, you at least get the last
checkpoint's worth of WAL; if you use pg_resetwal, you get nothing.

I don't really want to get hung up on this though. My main point here
is that I have trouble believing that an error after you've already
screwed up your backup helps much. I think what we need is to make it
less likely that you will screw up your backup in the first place.

> Right now the user can remove backup_label and get a "successful"
> restore and not realize that they have just corrupted their cluster.
> This is independent of the backup/restore tool doing all the right things.

I don't think it's independent of that at all.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Asymmetric partition-wise JOIN

2023-10-16 Thread Ashutosh Bapat
On Mon, Oct 16, 2023 at 10:24 AM Andrei Lepikhov
 wrote:
>
> >
> > Great!  I'm looking forward to the revised patch
> Before preparing a new patch, it would be better to find the common
> ground in the next issue:
> So far, this optimization stays aside, proposing an alternative path for
> a join RelOptInfo if we have an underlying append path in the outer.
> My back burner is redesigning the approach: asymmetric join doesn't
> change the partitioning scheme and bounds of the partitioned side. So,
> it looks consecutive to make it a part of partitionwise_join machinery
> and implement it as a part of the try_partitionwise_join /
> generate_partitionwise_join_paths routines.
>

I think we need an example where such a join will be faster than a
non-partitioned join when both sides are local. It might be
possible to come up with such an example without writing any code. The
idea would be to rewrite the SQL as a union of joins, as sketched below.
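
For instance, a hand-written sketch of that rewrite (the table names t and p
and the partitions p_1 and p_2 are made up for illustration):

-- join of non-partitioned t against partitioned p ...
SELECT * FROM t JOIN p ON t.id = p.id;
-- ... rewritten by hand as a union of per-partition joins:
SELECT * FROM t JOIN p_1 ON t.id = p_1.id
UNION ALL
SELECT * FROM t JOIN p_2 ON t.id = p_2.id;

Timing the two forms against each other would tell us whether the asymmetric
join can win at all.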

Whenever I have visited this idea, I have hit one issue prominently - how
would we differentiate the different scans of the non-partitioned
relation?  Normally we do that using different Relids, but in this case
we wouldn't know the number of such relations involved in the query until
we start planning such a join, and by then it's too late to add new base
relations and assign them new Relids. Of course, I haven't thought hard
about it, and I haven't looked at the patch to see whether this problem
is solved and how.

-- 
Best Wishes,
Ashutosh Bapat




Re: Rename backup_label to recovery_control

2023-10-16 Thread Robert Haas
On Mon, Oct 16, 2023 at 12:06 PM Michael Banck  wrote:
> Not sure what to do about this, but as people/companies start moving to
> 15, I am afraid we will get people complaining about this. I think
> having exclusive mode still be the default for pg_start_backup() (albeit
> deprecated) in one release and then dropping it in the next was too
> fast.

I completely agree, and I said so at the time, but got shouted down. I
think the argument that exclusive backups were breaking anything at
all was very weak. Nobody was being forced to use them, and they broke
nothing for people who didn't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Rename backup_label to recovery_control

2023-10-16 Thread Michael Banck
On Mon, Oct 16, 2023 at 11:15:53AM -0400, David Steele wrote:
> On 10/16/23 10:19, Robert Haas wrote:
> > We got rid of exclusive backup mode. We replaced pg_start_backup
> > with pg_backup_start.
> 
> I do think this was an improvement. For example it allows us to do
> [1], which I believe is a better overall solution to the problem of
> torn reads of pg_control. With exclusive backup we would not have this
> option.

Well maybe, but it also seems to mean that any other 3rd party (i.e. not
Postgres-specific) backup tool only supports Postgres up till
version 14, as they cannot deal with non-exclusive mode - they are used
to a simple pre/post-script approach.

Not sure what to do about this, but as people/companies start moving to
15, I am afraid we will get people complaining about this. I think
having exclusive mode still be the default for pg_start_backup() (albeit
deprecated) in one release and then dropping it in the next was too
fast.

Or is somebody helping those "enterprise" backup solutions along in
implementing non-exclusive Postgres backups?


Michael




Re: Fix output of zero privileges in psql

2023-10-16 Thread Laurenz Albe
On Sat, 2023-10-14 at 02:45 +0200, Erik Wienhold wrote:
> On 2023-10-09 09:54 +0200, Laurenz Albe write:
> > 
> > I tinkered a bit with your documentation.  For example, the suggestion to
> > "\pset null" seemed to be in an inappropriate place.  Tell me what you 
> > think.
> 
> +1  You're right to put that sentence right after the explanation of
> empty privileges.

Thanks for looking.

David, how do you feel about this?  I am wondering whether to set this patch
"ready for committer" or not.

There is Tom wondering upthread whether changing psql's behavior like that
is too much of a compatibility break or not, but I guess it is alright to
leave that final verdict to a committer.

Yours,
Laurenz Albe




Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

2023-10-16 Thread Laurenz Albe
On Mon, 2023-10-16 at 14:54 +0900, Michael Paquier wrote:
> Thanks for the review.  Yes, I am wondering if other people would
> chime in here.  It doesn't feel like this has gathered enough
> opinions.

I don't have strong feelings either way.  If you have backup_label
but no signal file, starting PostgreSQL may succeed (if the WAL
with the checkpoint happens to be in pg_wal) or it may fail with
an error message.  There is no danger of causing damage unless you
remove backup_label, right?

I cannot think of a use case where you use such a configuration on
purpose, and the current error message is more cryptic than a plain
"you must have a signal file to start from a backup", so perhaps
your patch is a good idea.

Yours,
Laurenz Albe




Re: The danger of deleting backup_label

2023-10-16 Thread David Steele

On 10/16/23 10:55, Robert Haas wrote:

On Sat, Oct 14, 2023 at 11:33 AM David Steele  wrote:

All of this is fixable in HEAD, but seems incredibly dangerous to back
patch. Even so, I have attached the patch in case somebody sees an
opportunity that I do not.


I really do not think we should be even thinking about back-patching
something like this. It's clearly not a bug fix, although I'm sure
that someone can try to characterize it that way, if they want to make
the well-worn argument that any behavior they don't like is a bug. But
that's a pretty lame argument. Usage errors on the part of users are
not bugs, even if we've coded the software in such a way as to make
those errors more likely.


Hmmm, the reason to back patch this is that it would fix [1], which sure 
looks like a problem to me even if it is not a "bug". We can certainly 
require backup software to retry pg_control until the checksum is valid 
but that seems like a pretty big ask, even considering how complicated 
backup is.


I investigated this as a solution to [1] because it seemed like a better 
solution than what we have in [1]. But once I got into the weeds it was 
obvious that this wasn't going to be something we could back patch.



I think what we ought to be talking about is whether a change like
this is a good idea even in master. I don't think it's a terrible
idea, but I'm also not sure that it's a good idea. The problem is that
if you're doing the right thing with your backup_label, then this is
unnecessary, and if you're doing the wrong thing, then why should you
do the right thing about this? 


First and foremost, this solves the problem of pg_control being torn 
when read by the backup software. It can't be torn if it is not there.


There are also other advantages -- we can massage pg_control before 
writing it back out. This already happens, but before that happens there 
is a copy of the (prior) running pg_control there that can mess up the 
process.



I mean, admittedly you can't just
ignore a fatal error, but I think people will just run pg_resetwal,
which is even worse than starting from the wrong checkpoint. 


If you start from the last checkpoint (which is what will generally be 
stored in pg_control) then the effect is pretty similar.



I feel
like in cases where a customer I'm working with has a bad backup,
their entire focus is on doing something to that backup to get a
running system back, whatever it takes. It's already too late at that
point to fix the backup procedure - they only have the backups they
have. You could hope people would do test restores before disaster
strikes, but people who are that prepared are probably running a real
backup tool and will never have this problem in the first place.


Right now the user can remove backup_label and get a "successful" 
restore and not realize that they have just corrupted their cluster. 
This is independent of the backup/restore tool doing all the right things.


My goal here is to narrow the options to try and make it so there is 
*one* valid procedure that will work. For this patch the idea is that 
they *must* start Postgres to get a valid pg_control from the 
backup_label. Any other action leads to a fatal error.


Note that the current patch is very WIP and does not actually do 
everything I'm talking about here. I was just trying to see if it could 
be used to solve the problem in [1]. It can't.



Perhaps that's all too pessimistic. I don't know. Certainly, other
people can have experiences that are different than mine. But I feel
like I struggle to think of a case where this would have prevented a
bad outcome, and that makes me wonder whether it's really a good idea
to complicate the system.


I'm specifically addressing cases like those that came up (twice!) in 
[2]. This is the main place I see people stumbling these days. If even 
hackers can make this mistake then we should do better.


Regards,
-David

[1] 
https://www.postgresql.org/message-id/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
[2] 
https://www.postgresql.org/message-id/flat/CAM_vCudkSjr7NsNKSdjwtfAm9dbzepY6beZ5DP177POKy8%3D2aw%40mail.gmail.com#746e492bfcd2667635634f1477a61288





event trigger sgml touch-up

2023-10-16 Thread Erik Rijkers

Some small (grammatical) changes in event-trigger.sgml

(also one deletion of 'community-we', which I think is just confusing for 
readers outside the PostgreSQL community).



Erik

--- doc/src/sgml/event-trigger.sgml.orig	2023-10-16 17:16:00.017452340 +0200
+++ doc/src/sgml/event-trigger.sgml	2023-10-16 17:22:32.965450992 +0200
@@ -40,7 +40,7 @@
  The login event occurs when an authenticated user logs
  into the system. Any bug in a trigger procedure for this event may
  prevent successful login to the system. Such bugs may be fixed by
- setting  is set to false
+ setting  to false
  either in a connection string or configuration file. Alternative is
  restarting the system in single-user mode (as event triggers are
  disabled in this mode). See the  reference
@@ -49,8 +49,8 @@
  To prevent servers from becoming inaccessible, such triggers must avoid
  writing anything to the database when running on a standby.
  Also, it's recommended to avoid long-running queries in
- login event triggers.  Notes that, for instance,
- cancelling connection in psql wouldn't cancel
+ login event triggers.  Note that, for instance,
+ cancelling a connection in psql wouldn't cancel
  the in-progress login trigger.

 
@@ -1359,7 +1359,7 @@
 END IF;
 
 -- The checks below cannot be performed on standby servers so
--- ensure the database is not in recovery before we perform any
+-- ensure the database is not in recovery before performing any
 -- operations.
 SELECT pg_is_in_recovery() INTO rec;
 IF rec THEN


Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-16 Thread Ashutosh Bapat
On Mon, Oct 16, 2023 at 7:33 PM Tomas Vondra
 wrote:
>
> On 10/16/23 11:25, Ashutosh Bapat wrote:
> > Thanks Tomas for bringing this discussion to hackers.
> >
> >
> > On Fri, Oct 13, 2023 at 8:58 PM Dean Rasheed  
> > wrote:
> >>
> >> On Fri, 13 Oct 2023 at 13:17, Tomas Vondra
> >>  wrote:
> >>>
> >>> I do plan to backpatch this, yes. I don't think there are many people
> >>> affected by this (few people are using infinite dates/timestamps, but
> >>> maybe the overflow could be more common).
> >>>
> >
> > The example you gave is missing the CREATE INDEX command. Is it "create
> > index test_idx_a on test using brin(a);"
>
> Ah, you're right - apologies. FWIW when testing I usually use 1-page
> ranges, to possibly hit more combinations / problems. More importantly,
> it needs to specify the opclass, otherwise it'll use the default minmax
> opclass which does not use distance at all:
>
> create index test_idx_a on test
>  using brin (a timestamptz_minmax_multi_ops) with (pages_per_range=1);
>

Thanks.

> >
> > Do already created indexes have this issue? Do they need to be rebuilt
> > after upgrading?
> >
>
> Yes, existing indexes will have inefficient ranges. I'm not sure we want
> to push people to reindex everything, the issue seems somewhat unlikely
> in practice.
>

They need to rebuild the index only if the column has infinity values.
Such users may notice this bug fix in the release notes and decide to
rebuild the index themselves.

> >> I think this should be rewritten to compute delta from ia and ib
> >> without going via an intermediate Interval value. I.e., instead of
> >> computing "result", just do something like
> >>
> >> dayfraction = (ib->time % USECS_PER_DAY) - (ia->time % USECS_PER_DAY);
> >> days = (ib->time / USECS_PER_DAY) - (ia->time / USECS_PER_DAY);
> >> days += (int64) ib->day - (int64) ia->day;
> >> days += ((int64) ib->month - (int64) ia->month) * INT64CONST(30);
> >>
> >> then convert to double precision as it does now:
> >>
> >> delta = (double) days + dayfraction / (double) USECS_PER_DAY;
> >>
> >
> > Given Tomas's explanation of how these functions are supposed to work,
> > I think your suggestion is better.
> >
> > I was worried that the above calculations may not produce the same result
> > as the current code when there is no error because modulo and integer
> > division are not distributive over subtraction. But it looks like
> > together they behave as normal division which is distributive over
> > subtraction. I couldn't find an example where that is not true.
> >
> > Tomas, you may want to incorporate this in your patch. If not, I will
> > incorporate it in my infinite interval patchset in [1].
> >
>
> I'd rather keep it as a separate patch, although maybe let's deal with it
> separately from the larger patches. It's a bug, and having it in a patch
> set that adds a feature does not seem like a good idea (or maybe I don't
> understand what the other thread does, I haven't looked very closely).
>

If you incorporate these changes, I will need to remove 0009, which
mostly rewrites that function, from my patchset. If you don't, my
patch rewrites it anyway. Either way is fine with me.

-- 
Best Wishes,
Ashutosh Bapat




Re: Rename backup_label to recovery_control

2023-10-16 Thread David Steele

On 10/16/23 10:19, Robert Haas wrote:

On Sat, Oct 14, 2023 at 2:22 PM David Steele  wrote:

I was recently discussing the complexities of dealing with pg_control
and backup_label with some hackers at PGConf NYC, when David Christensen
commented that backup_label was not a very good name since it gives the
impression of being informational and therefore something the user can
delete. In fact, we see this happen quite a lot, and there have been
some other discussions about it recently, see [1] and [2]. I bounced the
idea of a rename off various hackers at the conference and in general
people seemed to think it was a good idea.


Personally, I feel like this is an area where we keep moving the parts
around but I'm not sure we're really getting to anything better. We
got rid of recovery.conf. 


I agree that this was not an improvement. I was fine with bringing the 
recovery options into the GUC fold but never really liked forcing them 
into postgresql.auto.conf. But I lost that argument.



We got rid of exclusive backup mode. We
replaced pg_start_backup with pg_backup_start. 


I do think this was an improvement. For example it allows us to do [1], 
which I believe is a better overall solution to the problem of torn 
reads of pg_control. With exclusive backup we would not have this option.



It feels like every
other release or so we whack something around here, but I'm not
convinced that any of it is really making much of an impact. If
there's been any decrease in people screwing up their backups, I
haven't noticed it.


It's pretty subjective, but I feel much the same way. However, I think 
the *areas* that people are messing up are changing as we remove 
obstacles and I feel like we should address them. backup_label has 
always been a bit of a problem -- basically, deciding whether it should 
be deleted.


With the removal of exclusive backup we removed the only valid use case 
(I think) for removing backup_label manually. Now, it should probably 
never be removed manually, so we need to make adjustments to make that 
clearer to the user, also see [1].


Better messaging may also help, and I am also thinking about that.


To be fair, I will grant that renaming pg_clog to pg_xact_status and
pg_xlog to pg_wal does seem to have reduced the incidence of people
nuking those directories, at least IME. So maybe this change would
help too, for similar reasons. But I'm still concerned that we're
doing too much superficial tinkering in this area. Breaking
compatibility is not without cost.


True enough, but ISTM that we have gotten few (if any) actual complaints 
outside of hackers speculating that there will be complaints. For the 
various maintainers of backup software this is just business as usual. 
The changes to pg_basebackup are also pretty trivial.



I also do wonder whether recovery_control is really a better name. Maybe
I just have backup_label too firmly stuck in my head, but is what that
file does really best described as recovery control? I'm not so sure
about that.


The thing it does that describes it as "recovery control" in my view is 
that it contains the LSN where Postgres must start recovery (plus TLI, 
backup method, etc.). There is some other informational stuff in there, 
but the important fields are all about ensuring consistent recovery.


At the end of the day the entire point of backup *is* recovery and users 
will interact with this file primarily in recovery scenarios.


Regards,
-David

---

[1] 
https://www.postgresql.org/message-id/1330cb48-4e47-03ca-f2fb-b144b49514d8%40pgmasters.net





Re: Postgres Architecture

2023-10-16 Thread Jonah H. Harris
On Mon, Oct 16, 2023 at 6:42 AM Timothy Nelson 
wrote:

> I'm expecting that people will pick the idea apart, and wanted to know
> what people think of it.
>

Thanks for the proposal. This is actually a model that's been around for a
very long time. And, in fact, variations of it (e.g. parsing done in one
place and generated plan fragments shipped to remote execution nodes where
the data resides) are already used by things like Postgres-XL. There have
also been a number of academic implementations where parsing is done
locally and raw parse trees are sent to the server as well. While these
things do reduce CPU, there are a number of negative aspects to deal with
that make such an architecture more difficult to manage.

-- 
Jonah H. Harris


Re: Server crash on RHEL 9/s390x platform against PG16

2023-10-16 Thread Robert Haas
On Sun, Oct 8, 2023 at 10:55 PM Suraj Kharage <
suraj.khar...@enterprisedb.com> wrote:

> It looks like an issue with JIT. If I disable the JIT then the above query
> runs successfully.
>
> postgres=# set jit to off;
>
> SET
>
> postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON
> rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON
> rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
>
>  pkey | val  | pkey |  label  | hidden | pkey | val | pkey
>
> --+--+--+-++--+-+--
>
> 1 | row1 |1 | hidden  | t  |1 |   1 |
>
> 1 | row1 |1 | hidden  | t  |2 |   1 |
>
> 2 | row2 |2 | visible | f  |1 |   1 |
>
> 2 | row2 |2 | visible | f  |2 |   1 |
>
> (4 rows)
>
> Any idea on this?
>

No, but I found a few previous threads complaining about JIT not working on
s390x.

https://www.postgresql.org/message-id/4106722.1616177...@sss.pgh.pa.us
https://www.postgresql.org/message-id/3ba50664-56a2-bcf4-2b24-05a3e0a75...@enterprisedb.com
https://www.postgresql.org/message-id/20200715091509.GA3354074%40msg.df7cb.de

The most interesting email I found in those threads was this one:

http://postgr.es/m/3358505.1594912...@sss.pgh.pa.us

The backtrace there is different from the one you posted here in
significant ways, but it seems like both that case and this one involve a
null pointer showing up for a non-null pass-by-reference datum. That
doesn't seem like a whole lot to go on, but maybe somebody who understands
the JIT stuff better than I do will have an idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


Re: The danger of deleting backup_label

2023-10-16 Thread Robert Haas
On Sat, Oct 14, 2023 at 11:33 AM David Steele  wrote:
> All of this is fixable in HEAD, but seems incredibly dangerous to back
> patch. Even so, I have attached the patch in case somebody sees an
> opportunity that I do not.

I really do not think we should be even thinking about back-patching
something like this. It's clearly not a bug fix, although I'm sure
that someone can try to characterize it that way, if they want to make
the well-worn argument that any behavior they don't like is a bug. But
that's a pretty lame argument. Usage errors on the part of users are
not bugs, even if we've coded the software in such a way as to make
those errors more likely.

I think what we ought to be talking about is whether a change like
this is a good idea even in master. I don't think it's a terrible
idea, but I'm also not sure that it's a good idea. The problem is that
if you're doing the right thing with your backup_label, then this is
unnecessary, and if you're doing the wrong thing, then why should you
do the right thing about this? I mean, admittedly you can't just
ignore a fatal error, but I think people will just run pg_resetwal,
which is even worse than starting from the wrong checkpoint. I feel
like in cases where a customer I'm working with has a bad backup,
their entire focus is on doing something to that backup to get a
running system back, whatever it takes. It's already too late at that
point to fix the backup procedure - they only have the backups they
have. You could hope people would do test restores before disaster
strikes, but people who are that prepared are probably running a real
backup tool and will never have this problem in the first place.

Perhaps that's all too pessimistic. I don't know. Certainly, other
people can have experiences that are different than mine. But I feel
like I struggle to think of a case where this would have prevented a
bad outcome, and that makes me wonder whether it's really a good idea
to complicate the system.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Rename backup_label to recovery_control

2023-10-16 Thread Robert Haas
On Sat, Oct 14, 2023 at 2:22 PM David Steele  wrote:
> I was recently discussing the complexities of dealing with pg_control
> and backup_label with some hackers at PGConf NYC, when David Christensen
> commented that backup_label was not a very good name since it gives the
> impression of being informational and therefore something the user can
> delete. In fact, we see this happen quite a lot, and there have been
> some other discussions about it recently, see [1] and [2]. I bounced the
> idea of a rename off various hackers at the conference and in general
> people seemed to think it was a good idea.

Personally, I feel like this is an area where we keep moving the parts
around but I'm not sure we're really getting to anything better. We
got rid of recovery.conf. We got rid of exclusive backup mode. We
replaced pg_start_backup with pg_backup_start. It feels like every
other release or so we whack something around here, but I'm not
convinced that any of it is really making much of an impact. If
there's been any decrease in people screwing up their backups, I
haven't noticed it.

To be fair, I will grant that renaming pg_clog to pg_xact_status and
pg_xlog to pg_wal does seem to have reduced the incidence of people
nuking those directories, at least IME. So maybe this change would
help too, for similar reasons. But I'm still concerned that we're
doing too much superficial tinkering in this area. Breaking
compatibility is not without cost.

I also do wonder whether recovery_control is really a better name. Maybe
I just have backup_label too firmly stuck in my head, but is what that
file does really best described as recovery control? I'm not so sure
about that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-16 Thread Tomas Vondra
On 10/16/23 11:25, Ashutosh Bapat wrote:
> Thanks Tomas for bringing this discussion to hackers.
> 
> 
> On Fri, Oct 13, 2023 at 8:58 PM Dean Rasheed  wrote:
>>
>> On Fri, 13 Oct 2023 at 13:17, Tomas Vondra
>>  wrote:
>>>
>>> I do plan to backpatch this, yes. I don't think there are many people
>>> affected by this (few people are using infinite dates/timestamps, but
>>> maybe the overflow could be more common).
>>>
> 
> The example you gave is missing the CREATE INDEX command. Is it "create
> index test_idx_a on test using brin(a);"

Ah, you're right - apologies. FWIW when testing I usually use 1-page
ranges, to possibly hit more combinations / problems. More importantly,
it needs to specify the opclass, otherwise it'll use the default minmax
opclass which does not use distance at all:

create index test_idx_a on test
 using brin (a timestamptz_minmax_multi_ops) with (pages_per_range=1);

> 
> Do already created indexes have this issue? Do they need to be rebuilt
> after upgrading?
> 

Yes, existing indexes will have inefficient ranges. I'm not sure we want
to push people to reindex everything; the issue seems somewhat unlikely
in practice.
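
For anyone who does want the tighter ranges on an existing index, a plain
reindex should be enough (a sketch, using the index name from the example
above):

  REINDEX INDEX test_idx_a;
  -- or, to avoid blocking writes:
  REINDEX INDEX CONCURRENTLY test_idx_a;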

>>
>> OK, though I doubt that such values are common in practice.
>>
>> There's also an overflow problem in
>> brin_minmax_multi_distance_interval() though, and I think that's worse
>> because overflows there throw "interval out of range" errors, which
>> can prevent index creation or inserts.
>>
>> There's a patch (0009 in [1]) as part of the infinite interval work,
>> but that just uses interval_mi(), and so doesn't fix the
>> interval-out-of-range errors, except for infinite intervals, which are
>> treated as special cases, which I don't think is really necessary.
>>
> 
> Right. I used interval_mi() to preserve the finite value behaviour as
> is. But ...
> 
>> I think this should be rewritten to compute delta from ia and ib
>> without going via an intermediate Interval value. I.e., instead of
>> computing "result", just do something like
>>
>> dayfraction = (ib->time % USECS_PER_DAY) - (ia->time % USECS_PER_DAY);
>> days = (ib->time / USECS_PER_DAY) - (ia->time / USECS_PER_DAY);
>> days += (int64) ib->day - (int64) ia->day;
>> days += ((int64) ib->month - (int64) ia->month) * INT64CONST(30);
>>
>> then convert to double precision as it does now:
>>
>> delta = (double) days + dayfraction / (double) USECS_PER_DAY;
>>
> 
> Given Tomas's explanation of how these functions are supposed to work,
> I think your suggestion is better.
> 
> I was worried that the above calculations may not produce the same result
> as the current code when there is no error, because modulo and integer
> division are not distributive over subtraction. But it looks like
> together they behave as exact division, which is distributive over
> subtraction. I couldn't find an example where that is not true.
> 
> Tomas, you may want to incorporate this in your patch. If not, I will
> incorporate it in my infinite interval patchset in [1].
> 

I'd rather keep it as a separate patch, and deal with it separately from
the larger patches. It's a bug, and having it in a patch set that adds a
feature does not seem like a good idea (or maybe I don't understand what
the other thread does; I haven't looked very closely).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add support for AT LOCAL

2023-10-16 Thread Tom Lane
Michael Paquier  writes:
> Perhaps that's a stupid question..  But a server running under this
> environment fails the two following queries even for older branches,
> right?
> select timezone('UTC', '23:59:59.99-07'::timetz);
> select timezone('UTC', '23:59:00-07'::timetz);

One would expect so, since the AT LOCAL syntax is just sugar for that.
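
Concretely (a sketch; with the session TimeZone set to UTC, the two forms
should line up):

  SET TimeZone = 'UTC';
  SELECT '23:59:59.99-07'::timetz AT LOCAL;          -- the sugared form
  SELECT timezone('UTC', '23:59:59.99-07'::timetz);  -- the underlying call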

I'm mighty tempted though to (a) add coverage for timetz_izone
to HEAD, and (b) backpatch the new tests, sans the AT LOCAL case,
to the back branches (maybe not v11).

regards, tom lane




Re: list of acknowledgments for PG16

2023-10-16 Thread Alvaro Herrera
On 2023-Aug-27, Peter Eisentraut wrote:

> On 22.08.23 15:48, Vik Fearing wrote:

> > I think these might be the same person:
> > 
> >      Zhihong Yu
> >      Zihong Yu
> > 
> > I did not spot any others.
> 
> Fixed.

Hm, I noticed we also list Ted Yu, but that's the same person as Zhihong Yu.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: remaining sql/json patches

2023-10-16 Thread Anton A. Melnikov

Hello!

On 16.10.2023 15:49, jian he wrote:

adding the following code in ExecEvalJsonExprCoercion, after the
if (!InputFunctionCallSafe(...)) check, works, but seems like a hack.

if (!val_string)
{
*op->resnull = true;
*op->resvalue = (Datum) 0;
}


It seems the constraint should work here:

After

CREATE TABLE test (
js text,
i int,
x jsonb DEFAULT JSON_QUERY(jsonb '[1,2]', '$[*]' WITH WRAPPER)
CONSTRAINT test_constraint
CHECK (JSON_QUERY(js::jsonb, '$.a' RETURNING char(5) OMIT QUOTES 
EMPTY ARRAY ON EMPTY) > 'a')
);

INSERT INTO test VALUES ('[]');

one would expect to see an error like this:

ERROR:  new row for relation "test" violates check constraint "test_constraint"
DETAIL:  Failing row contains ([], null, [1, 2]).

not "INSERT 0 1".

With best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Rename backup_label to recovery_control

2023-10-16 Thread David Steele

On 10/16/23 00:26, Kyotaro Horiguchi wrote:

At Mon, 16 Oct 2023 13:16:42 +0900 (JST), Kyotaro Horiguchi 
 wrote in

Just an idea in a slightly different direction, but I'm wondering if
we can simply merge the content of backup_label into the control file.
The file is 8192 bytes long, yet only 256 bytes are used. As a result,
we anticipate no overhead. Such a configuration would forcibly prevent
users from removing the backup information.


On second thought, that would break the case of file-system-level
backups, which require backup information separately from the control
data.


Exactly -- but we do have a proposal to do the opposite and embed 
pg_control into backup_label [1] (or hopefully recovery_control).


Regards,
-David

[1] 
https://www.postgresql.org/message-id/1330cb48-4e47-03ca-f2fb-b144b49514d8%40pgmasters.net





Re: remaining sql/json patches

2023-10-16 Thread jian he
On Mon, Oct 16, 2023 at 5:47 PM Amit Langote  wrote:
>
> > We're currently looking into this case.
>
> Thanks for the report.  I think I've figured out the problem --
> ExecEvalJsonExprCoercion() mishandles the EMPTY ARRAY ON EMPTY case.
>
> I'm reading the other 2 patches...
>
> --
> Thanks, Amit Langote
> EDB: http://www.enterprisedb.com

query: select JSON_QUERY('[]'::jsonb, '$.mm' RETURNING text OMIT
QUOTES EMPTY ON EMPTY);

Breakpoint 2, ExecEvalJsonExpr (state=0x55e47ad685c0,
op=0x55e47ad68818, econtext=0x55e47ad682e8) at
../../Desktop/pg_sources/main/postgres/src/backend/executor/execExprInterp.c:4188
4188JsonExprState *jsestate = op->d.jsonexpr.jsestate;
(gdb) fin
Run till exit from #0  ExecEvalJsonExpr (state=0x55e47ad685c0,
op=0x55e47ad68818, econtext=0x55e47ad682e8)
at 
../../Desktop/pg_sources/main/postgres/src/backend/executor/execExprInterp.c:4188
ExecInterpExpr (state=0x55e47ad685c0, econtext=0x55e47ad682e8,
isnull=0x7ffe63659e2f) at
../../Desktop/pg_sources/main/postgres/src/backend/executor/execExprInterp.c:1556
1556EEO_NEXT();
(gdb) p *op->resnull
$1 = true
(gdb) cont
Continuing.

Breakpoint 1, ExecEvalJsonExprCoercion (state=0x55e47ad685c0,
op=0x55e47ad68998, econtext=0x55e47ad682e8, res=94439801785192,
resnull=false) at
../../Desktop/pg_sources/main/postgres/src/backend/executor/execExprInterp.c:4453
4453{
(gdb) i args
state = 0x55e47ad685c0
op = 0x55e47ad68998
econtext = 0x55e47ad682e8
res = 94439801785192
resnull = false
(gdb) p *op->resnull
$2 = false
---
In ExecEvalJsonExpr, *op->resnull is true;
then in ExecEvalJsonExprCoercion, *op->resnull is false.
I am not sure why, or when, the *op->resnull value changes.
---
In ExecEvalJsonExprCoercion, if resnull is true then jb is null, but
there seems to be no code to handle that case.
-
Adding the following code in ExecEvalJsonExprCoercion, after the
if (!InputFunctionCallSafe(...)) check, works, but seems like a hack.

if (!val_string)
{
*op->resnull = true;
*op->resvalue = (Datum) 0;
}




Re: Pro et contra of preserving pg_proc oids during pg_upgrade

2023-10-16 Thread Nikita Malakhov
Hi,

Thank you very much, I'll check it out. It looks like the
getObjectIdentity() used in pg_identify_object() could do the job.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]

2023-10-16 Thread Pavel Stehule
On Mon, Oct 16, 2023 at 13:56, Daniel Gustafsson 
wrote:

> > On 16 Oct 2023, at 12:15, Quan Zongliang  wrote:
>
> > Implement TODO item:
> > PL/pgSQL
> > Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]
>
> Cool!  While I haven't looked at the patch yet, I've wanted this myself
> many
> times in the past when working with large plpgsql codebases.
>
> > As a first step, deal only with [], such as
> > xxx.yyy%TYPE[]
> > xxx%TYPE[]
> >
> > It can be extended to support multi-dimensional and complex syntax in
> the future.
>
> Was this omitted due to complexity of implementation or for some other
> reason?
>

There is not much reason for that enhancement. The size and dimensions of
PostgreSQL arrays are dynamic; they depend on the value, not on the
declaration. Today this information is ignored, and it could be a
compatibility break to start checking and enforcing it.
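
A quick illustration of that point (a sketch):

  CREATE TABLE t (a int[3]);
  INSERT INTO t VALUES ('{1,2,3,4,5}');  -- accepted: the declared size is not enforced
  SELECT a[5] FROM t;                    -- returns 5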


> --
> Daniel Gustafsson
>
>
>
>


Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]

2023-10-16 Thread Daniel Gustafsson
> On 16 Oct 2023, at 12:15, Quan Zongliang  wrote:

> Implement TODO item:
> PL/pgSQL
> Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]

Cool!  While I haven't looked at the patch yet, I've wanted this myself many
times in the past when working with large plpgsql codebases.

> As a first step, deal only with [], such as
> xxx.yyy%TYPE[]
> xxx%TYPE[]
> 
> It can be extended to support multi-dimensional and complex syntax in the 
> future.

Was this omitted due to complexity of implementation or for some other reason?

--
Daniel Gustafsson





Re: Pro et contra of preserving pg_proc oids during pg_upgrade

2023-10-16 Thread Alvaro Herrera
On 2023-Oct-13, Nikita Malakhov wrote:

> Textual representation requires a long text field because it could
> contain schema, arguments, it is difficult and not effective to be
> saved as part of the data, and must be parsed to retrieve function
> oid.

It is worse than that: the regproc textual representation depends on
search_path.  If you store the text now, the meaning could change later,
depending on the search_path that applies at read time.

Of course, the storage for OID is much shorter and not subject to this
problem; but it is subject to the problem that it breaks if you drop and
replace the function, which could happen for instance in an extensions
upgrade script.

I think a better way to store a function's identity is to store the
'identity' column from pg_identify_object().  It is fully qualified and
you can cast to regprocedure with no ambiguity (which gives you an OID,
if you need one).  And it should upgrade cleanly.
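
For instance (a sketch using a built-in function; any pg_proc OID works
the same way):

  SELECT identity
    FROM pg_identify_object('pg_proc'::regclass,
                            'pg_catalog.lower(text)'::regprocedure, 0);
  -- => pg_catalog.lower(text)

  SELECT 'pg_catalog.lower(text)'::regprocedure::oid;  -- back to an OID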

If you have a regproc column that you want to upgrade, maybe it would
work to do 'ALTER TABLE .. SET TYPE TEXT USING' and turn the value into
pg_identify_object().identity.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Postgres Architecture

2023-10-16 Thread Timothy Nelson
Hi all!

I'm a DevOps Manager/Engineer by trade (though the place I work is not,
unfortunately, using Postgres).  I've been thinking quite a bit about what
our ideal architecture at work will look like and what scaling looks like,
both for work and for home projects (where I *am* looking at using Postgres
:) ).

Tonight, (unrelated to work) I took the time to draw up a diagram of an
architecture that I think would help Postgres move one step towards both
more scalability, and better ease of use.

Since I'm not so hot at drawing ASCII art diagrams, I thought maybe the way
to go would be to drop it into a Google Presentation and make that public.

It's a couple of diagrams (existing and proposed architecture), and a list
of what I think the advantages and disadvantages are.

https://docs.google.com/presentation/d/1ew31STf8qC2keded5GfQiSUwb3fukmO0PFnZw12yAs8/edit?usp=sharing

To keep it short, the proposal is that the stages from Parse through Plan
be done in a separate process (and potentially on a separate server) from
the Execute stage.  The idea is:
- The Parse/Plan servers don't care whether they're read or write
- The Parse/Plan servers know which Execute server is the writer (and which
the readers), and forward to the correct server for execution

I even wonder if this might not mean that the Parse/Plan servers can be
deployed as K8s containers, with the Execute server being the external
non-k8s server.

Note that in this e-mail, I've referred to:
- The Parse/Plan server (which my diagram calls the Postgres SQL server)
- The Execute server (which my diagram calls the Storage server)

I'm not sure what naming makes sense, but I intentionally used a couple of
different names in hopes that one of them would get the idea across --
please disregard whichever names don't make sense, and feel free to suggest
new ones.

I'm expecting that people will pick the idea apart, and wanted to know what
people think of it.

Thanks!


PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]

2023-10-16 Thread Quan Zongliang



Implement TODO item:
PL/pgSQL
Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]

As a first step, deal only with [], such as
xxx.yyy%TYPE[]
xxx%TYPE[]

It can be extended to support multi-dimensional and complex syntax in 
the future.
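
A minimal usage sketch of what the patch enables (the table, column, and
function names here are made up):

  CREATE TABLE tab (col int);

  CREATE FUNCTION f() RETURNS void AS $$
  DECLARE
      vals tab.col%TYPE[];   -- with this patch, resolves to int[]
  BEGIN
      vals := ARRAY[1, 2, 3];
      RAISE NOTICE '%', vals;
  END;
  $$ LANGUAGE plpgsql;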



--
Quan Zongliang

diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 6a09bfdd67..3ff12b7af9 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -23,6 +23,7 @@
 #include "parser/scanner.h"
 #include "parser/scansup.h"
 #include "utils/builtins.h"
+#include "utils/syscache.h"
 
 #include "plpgsql.h"
 
@@ -76,6 +77,7 @@ static PLpgSQL_expr *read_sql_expression2(int until, int until2,
                                                       int *endtoken);
 static PLpgSQL_expr *read_sql_stmt(void);
 static PLpgSQL_type *read_datatype(int tok);
+static PLpgSQL_type *read_datatype_array(PLpgSQL_type *elem_typ);
 static PLpgSQL_stmt *make_execsql_stmt(int firsttoken, int location);
 static PLpgSQL_stmt_fetch *read_fetch_direction(void);
 static void complete_direction(PLpgSQL_stmt_fetch *fetch,
@@ -2783,6 +2785,55 @@ read_sql_construct(int until,
return expr;
 }
 
+static PLpgSQL_type *
+read_datatype_array(PLpgSQL_type *elem_typ)
+{
+   int tok;
+   HeapTuple   type_tup = NULL;
+   Form_pg_type    type_frm;
+   Oid arrtyp_oid;
+
+   Assert(elem_typ);
+   if (!OidIsValid(elem_typ->typoid))
+   return elem_typ;
+
+   tok = yylex();
+   /* Next token is not square bracket. */
+   if (tok != '[')
+   {
+   plpgsql_push_back_token(tok);
+
+   return elem_typ;
+   }
+
+   tok = yylex();
+   /* For now, deal only with []. */
+   if (tok != ']')
+   {
+   plpgsql_push_back_token('[');
+   plpgsql_push_back_token(tok);
+
+   return elem_typ;
+   }
+
+   type_tup = SearchSysCache1(TYPEOID,
+                              ObjectIdGetDatum(elem_typ->typoid));
+   if (!HeapTupleIsValid(type_tup))
+   return elem_typ;
+
+   type_frm = (Form_pg_type) GETSTRUCT(type_tup);
+   arrtyp_oid = type_frm->typarray;
+   ReleaseSysCache(type_tup);
+
+   if (OidIsValid(arrtyp_oid))
+   return plpgsql_build_datatype(arrtyp_oid,
+   elem_typ->atttypmod,
+   elem_typ->collation,
+   NULL);
+   else
+   return elem_typ;
+}
+
 static PLpgSQL_type *
 read_datatype(int tok)
 {
@@ -2818,7 +2869,9 @@ read_datatype(int tok)
{
result = plpgsql_parse_wordtype(dtname);
if (result)
-   return result;
+   {
+   return read_datatype_array(result);
+   }
}
 		else if (tok_is_keyword(tok, &yylval,
 								K_ROWTYPE, "rowtype"))
@@ -2842,7 +2895,9 @@ read_datatype(int tok)
{
result = plpgsql_parse_wordtype(dtname);
if (result)
-   return result;
+   {
+   return read_datatype_array(result);
+   }
}
 		else if (tok_is_keyword(tok, &yylval,
 								K_ROWTYPE, "rowtype"))
@@ -2866,7 +2921,9 @@ read_datatype(int tok)
{
result = plpgsql_parse_cwordtype(dtnames);
if (result)
-   return result;
+   {
+   return read_datatype_array(result);
+   }
}
 		else if (tok_is_keyword(tok, &yylval,
 								K_ROWTYPE, "rowtype"))
diff --git a/src/test/regress/expected/plpgsql.out 
b/src/test/regress/expected/plpgsql.out
index 272f5d2111..8db28c1122 100644
--- a/src/test/regress/expected/plpgsql.out
+++ b/src/test/regress/expected/plpgsql.out
@@ -5814,6 +5814,31 @@ SELECT * FROM list_partitioned_table() AS t;
  2
 (2 rows)
 
+CREATE OR REPLACE FUNCTION array_partitioned_table()
+RETURNS SETOF partitioned_table.a%TYPE AS $$
+DECLARE
+i int;
+row partitioned_table%ROWTYPE;
+a_val 

Re: pgBufferUsage.blk_{read|write}_time are zero although there are pgBufferUsage.local_blks_{read|written}

2023-10-16 Thread Nazir Bilal Yavuz
Hi,

On Tue, 10 Oct 2023 at 03:54, Michael Paquier  wrote:
>
> In ~14, as far as I can see blk_write_time is only incremented for
> shared buffers.  FWIW, I agree that we should improve these stats for
> local buffers but I am not on board with a solution where we'd use the
> same counter for local and shared buffers while we've historically
> only counted the former, because that could confuse existing
> monitoring queries.  It seems to me that the right solution is to do
> the same separation as temp blocks with two separate counters, without
> a backpatch.  I'd like to go as far as renaming blk_read_time and
> blk_write_time to respectively shared_blk_read_time and
> shared_blk_write_time to know exactly what the type of block dealt
> with is when querying this data, particularly for pg_stat_statements's
> sake.

Yes, that could be a better solution. Also, having more detailed stats
for shared and local buffers is helpful. I updated the patches in line
with that:

0001: Count extends the same way as writes.
0002: Rename blk_{read|write}_time to shared_blk_{read|write}_time.
0003: Add new local_blk_{read|write}_time variables.
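
Once 0002 and 0003 are in, a monitoring query would look roughly like this
(the column names are taken from the patch descriptions above, so treat
this as a sketch):

  SELECT query,
         shared_blk_read_time, shared_blk_write_time,
         local_blk_read_time,  local_blk_write_time
    FROM pg_stat_statements
   ORDER BY shared_blk_read_time DESC
   LIMIT 5;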

Regards,
Nazir Bilal Yavuz
Microsoft
From acf5a2781760f284aef770da0f64acf8685b6445 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Fri, 13 Oct 2023 17:35:00 +0300
Subject: [PATCH v3 1/3] Include IOOp IOOP_EXTENDs while calculating block
 write times

Extends are counted as writes in block context, so include IOOP_EXTENDs
while calculating block write times.
---
 src/backend/utils/activity/pgstat_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index eb7d35d422..8ec8670199 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -119,7 +119,7 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 		INSTR_TIME_SET_CURRENT(io_time);
 		INSTR_TIME_SUBTRACT(io_time, start_time);
 
-		if (io_op == IOOP_WRITE)
+		if (io_op == IOOP_WRITE || io_op == IOOP_EXTEND)
 		{
 			pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));
 			if (io_object == IOOBJECT_RELATION)
-- 
2.42.0

From 1f0933c7057e20c074f6b9355f6d91a9277a8c09 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Thu, 12 Oct 2023 14:40:23 +0300
Subject: [PATCH v3 3/3] Add local_blk_{read|write}_time I/O timing statistics

There were no I/O timing statistics for local blocks. So, add
local_blk_{read|write}_time variables for counting these. These
counters are exposed in EXPLAIN and pg_stat_statements.
pg_stat_statements is bumped to 1.12.
---
 contrib/pg_stat_statements/Makefile   |  1 +
 .../expected/oldextversions.out   | 60 
 contrib/pg_stat_statements/meson.build|  1 +
 .../pg_stat_statements--1.11--1.12.sql| 71 +++
 .../pg_stat_statements/pg_stat_statements.c   | 33 -
 .../pg_stat_statements.control|  2 +-
 .../pg_stat_statements/sql/oldextversions.sql |  5 ++
 doc/src/sgml/pgstatstatements.sgml| 20 ++
 src/backend/commands/explain.c| 23 +-
 src/backend/executor/instrument.c |  6 ++
 src/backend/utils/activity/pgstat_io.c|  4 ++
 src/include/executor/instrument.h |  2 +
 src/test/regress/expected/explain.out |  4 ++
 13 files changed, 228 insertions(+), 4 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.11--1.12.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index eba4a95d91..b7d12bc7fb 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -7,6 +7,7 @@ OBJS = \
 
 EXTENSION = pg_stat_statements
 DATA = pg_stat_statements--1.4.sql \
+	pg_stat_statements--1.11--1.12.sql \
 	pg_stat_statements--1.10--1.11.sql \
 	pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \
 	pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
diff --git a/contrib/pg_stat_statements/expected/oldextversions.out b/contrib/pg_stat_statements/expected/oldextversions.out
index 9e18fe2e47..b84e0c8484 100644
--- a/contrib/pg_stat_statements/expected/oldextversions.out
+++ b/contrib/pg_stat_statements/expected/oldextversions.out
@@ -308,4 +308,64 @@ SELECT count(*) > 0 AS has_data FROM pg_stat_statements;
  t
 (1 row)
 
+-- New views for pg_stat_statements in 1.12
+AlTER EXTENSION pg_stat_statements UPDATE TO '1.12';
+\d pg_stat_statements
+  View "public.pg_stat_statements"
+ Column |   Type   | Collation | Nullable | Default 
++--+---+--+-
+ userid | oid  |   |  | 
+ dbid   | oid  |   |  | 
+ toplevel   | boolean  |   | 

Re: remaining sql/json patches

2023-10-16 Thread Nikita Malakhov
Hi,

Sorry, forgot to mention above - patches from our patch set should be
applied
onto SQL/JSON part 3 - v22-0003-SQL-JSON-query-functions.patch, thus
they are numbered as v23-0003-1 and -2.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: remaining sql/json patches

2023-10-16 Thread Amit Langote
Hi,

On Mon, Oct 16, 2023 at 5:34 PM Nikita Malakhov  wrote:
>
> Hi,
>
> Also FYI - the following case results in segmentation fault:
>
> postgres@postgres=# CREATE TABLE test_jsonb_constraints (
> js text,
> i int,
> x jsonb DEFAULT JSON_QUERY(jsonb '[1,2]', '$[*]' WITH WRAPPER)
> CONSTRAINT test_jsonb_constraint1
> CHECK (js IS JSON)
> CONSTRAINT test_jsonb_constraint5
> CHECK (JSON_QUERY(js::jsonb, '$.mm' RETURNING char(5) OMIT 
> QUOTES EMPTY ARRAY ON EMPTY) >  'a' COLLATE "C")
> CONSTRAINT test_jsonb_constraint6
> CHECK (JSON_EXISTS(js::jsonb, 'strict $.a' RETURNING int TRUE 
> ON ERROR) < 2)
> );
> CREATE TABLE
> Time: 13.518 ms
> postgres@postgres=# INSERT INTO test_jsonb_constraints VALUES ('[]');
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> The connection to the server was lost. Attempting reset: Failed.
> Time: 6.858 ms
> @!>
>
> We're currently looking into this case.

Thanks for the report.  I think I've figured out the problem --
ExecEvalJsonExprCoercion() mishandles the EMPTY ARRAY ON EMPTY case.

I'm reading the other 2 patches...

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: RFC: Logging plan of the running query

2023-10-16 Thread Ashutosh Bapat
On Thu, Oct 12, 2023 at 6:51 PM torikoshia  wrote:
>
> On 2023-10-11 16:22, Ashutosh Bapat wrote:
> >
> > Considering the similarity with auto_explain I wondered whether this
> > function should be part of auto_explain contrib module itself? If we
> > do that users will need to load auto_explain extension and thus
> > install executor hooks when this function doesn't need those. So may
> > not be such a good idea. I didn't see any discussion on this.
>
> I once thought about adding this to auto_explain, but I left it asis for
> below reasons:
>
> - One of the typical use case of pg_log_query_plan() would be analyzing
> slow query on customer environments. On such environments, We cannot
> always control what extensions to install.

The same argument applies to auto_explain functionality as well. But
it's not part of the core.

>Of course auto_explain is a major extension and it is quite possible
> that they installed auto_explain, but it is also possible they do
> not.
> - It seems a bit counter-intuitive that pg_log_query_plan() is in an
> extension called auto_explain, since it `manually` logs plans
>

pg_log_query_plan() may not fit auto_explain, but
pg_explain_backend_query() does. What we are logging is more than just
the plan of the query; it might expand to be closer to full EXPLAIN
output. While the 'auto' in auto_explain refers to automatically logging
explain output, the module could provide an additional function offering
similar functionality that is triggered manually.

But we can defer this to a committer, if you want.

I am more interested in avoiding the duplication of code, esp. the
first comment in my reply
>> There is a lot of similarity between what this feature does and what
>> auto explain does. I see the code is also duplicated. There is some
>> merit in avoiding this duplication
>> 1. we will get all the features of auto_explain automatically like
>> choosing a format (this was expressed by somebody earlier in this
>> thread), settings etc.
>> 2. avoid bugs. E.g. your code switches context after ExplainState has
>> been allocated. These states may leak depending upon when this
>> function gets called.
>> 3. Building features on top as James envisions will be easier.

>
>=# select pg_log_query_plan(pid), application_name, backend_type from
> pg_stat_activity where backend_type = 'autovacuum launcher';
>WARNING:  PID 63323 is not a PostgreSQL client backend process
> pg_log_query_plan | application_name |backend_type
>---+--+-
> f |  | autovacuum launcher
>
>
> > I am also wondering whether it's better to report the WARNING as
> > status column in the output. E.g. instead of
> > #select pg_log_query_plan(100);
> > WARNING:  PID 100 is not a PostgreSQL backend process
> >  pg_log_query_plan
> > ---
> >  f
> > (1 row)
> > we output
> > #select pg_log_query_plan(100);
> >  pg_log_query_plan |   status
> > ---+-
> >  f | PID 100 is not a PostgreSQL backend process
> > (1 row)
> >
> > That looks neater and can easily be handled by scripts, applications
> > and such. But it will be inconsistent with other functions like
> > pg_terminate_backend() and pg_log_backend_memory_contexts().
>
It seems neater, but it might be inconvenient because we can no longer
use it in a select list, as in the following query you wrote:
>
>#select pg_log_query_plan(pid), application_name, backend_type from
>pg_stat_activity where backend_type = 'autovacuum launcher';

Why is that?

-- 
Best Wishes,
Ashutosh Bapat




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-16 Thread Ashutosh Bapat
Thanks Tomas for bringing this discussion to hackers.


On Fri, Oct 13, 2023 at 8:58 PM Dean Rasheed  wrote:
>
> On Fri, 13 Oct 2023 at 13:17, Tomas Vondra
>  wrote:
> >
> > I do plan to backpatch this, yes. I don't think there are many people
> > affected by this (few people are using infinite dates/timestamps, but
> > maybe the overflow could be more common).
> >

The example you gave is missing the CREATE INDEX command. Is it "create
index test_idx_a on test using brin(a);"?

Do already created indexes have this issue? Do they need to be rebuilt
after upgrading?

>
> OK, though I doubt that such values are common in practice.
>
> There's also an overflow problem in
> brin_minmax_multi_distance_interval() though, and I think that's worse
> because overflows there throw "interval out of range" errors, which
> can prevent index creation or inserts.
>
> There's a patch (0009 in [1]) as part of the infinite interval work,
> but that just uses interval_mi(), and so doesn't fix the
> interval-out-of-range errors, except for infinite intervals, which are
> treated as special cases, which I don't think is really necessary.
>

Right. I used interval_mi() to preserve the finite value behaviour as
is. But ...

> I think this should be rewritten to compute delta from ia and ib
> without going via an intermediate Interval value. I.e., instead of
> computing "result", just do something like
>
> dayfraction = (ib->time % USECS_PER_DAY) - (ia->time % USECS_PER_DAY);
> days = (ib->time / USECS_PER_DAY) - (ia->time / USECS_PER_DAY);
> days += (int64) ib->day - (int64) ia->day;
> days += ((int64) ib->month - (int64) ia->month) * INT64CONST(30);
>
> then convert to double precision as it does now:
>
> delta = (double) days + dayfraction / (double) USECS_PER_DAY;
>

Given Tomas's explanation of how these functions are supposed to work,
I think your suggestion is better.

I was worried that the above calculations may not produce the same result
as the current code when there is no error, because modulo and integer
division are not individually distributive over subtraction. But together
they behave as exact division, which is distributive over subtraction,
since x = d*(x/d) + x%d holds for every x. I couldn't find an example
where that is not true.
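
A quick SQL sanity check of that identity (the values are arbitrary;
86400000000 is USECS_PER_DAY):

  SELECT (ib/d - ia/d)*d + (ib%d - ia%d) = ib - ia AS exact
    FROM (VALUES (123456789012::bigint, -98765432::bigint,
                  86400000000::bigint)) AS v(ib, ia, d);
  -- => t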

Tomas, you may want to incorporate this in your patch. If not, I will
incorporate it in my infinite interval patchset in [1].

[1] 
https://www.postgresql.org/message-id/CAExHW5u1JE7dxK=wlzqhcszntoxqzjdiermhrepw6r8w6kc...@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-16 Thread Amit Kapila
On Sat, Oct 14, 2023 at 10:45 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Here is a new patch.
>
> Previously I wrote:
> > Based on the above idea, I made a new version of the patch in which some
> > functionality was exported from pg_resetwal. In this approach, pg_upgrade
> > itself removes WAL and then creates logical slots; pg_resetwal is then
> > called with a new option --no-switch, which avoids switching to a new WAL
> > segment file. The option is only used for upgrade purposes, so it is not
> > mentioned in the docs or in usage(). This option is not required if
> > pg_resetwal -o does not discard WAL records. Please see the fork thread [1].
>
> But for now, these changes were reverted, because changing the pg_resetwal
> -o behavior may be a bit risky. It has been in place for more than ten
> years, so we should be more careful about modifying it.
> Also, I cannot come up with any problems if slots are created after
> pg_resetwal. Background processes would not generate decodable changes
> (listed in [1]), and BGworkers started by extensions can be ignored [2].
> Based on the discussion in the forked thread [3], and if it is accepted,
> we will apply them again.
>

Yeah, I think introducing additional complexity unless it is really
required sounds a bit scary to me as well. BTW, please find attached
some cosmetic changes.

One minor additional comment:
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');

Why do we need to set wal_level as logical for subscribers?

-- 
With Regards,
Amit Kapila.
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 4144a43afd..cfa955a679 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -618,9 +618,9 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
return;
 
/*
-	 * We can also skip decoding when in 'fast_forward' mode. This check must
-	 * be last because we don't want to set that processing_required flag
-	 * unnecessarily.
+* We also skip decoding in 'fast_forward' mode. This check must be last
+* because we don't want to set the processing_required flag unless
+* we have a decodable message.
 */
if (ctx->fast_forward)
{
@@ -1307,8 +1307,8 @@ DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
return true;
 
/*
-* We can also skip decoding when in 'fast_forward' mode. In passing set
-	 * the 'processing_required' flag to indicate, were it not for this mode,
+* We also skip decoding in 'fast_forward' mode. In passing set the
+* 'processing_required' flag to indicate, were it not for this mode,
 * processing *would* have been required.
 */
if (ctx->fast_forward)
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 32869a75ab..e02cd0fa44 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1953,9 +1953,9 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 }
 
 /*
- * Read to end of WAL starting from the decoding slot's restart_lsn. Return
- * true if any meaningful/decodable WAL records are encountered, otherwise
- * false.
+ * Read up to the end of WAL starting from the decoding slot's restart_lsn.
+ * Return true if any meaningful/decodable WAL records are encountered,
+ * otherwise false.
  *
  * Although this function is currently used only during pg_upgrade, there are
  * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 2a831bc397..a3a8ade405 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -274,8 +274,8 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
  * Returns true if there are no decodable WAL records after the
  * confirmed_flush_lsn. Otherwise false.
  *
- * This is a special purpose function to ensure the given slot can be upgraded
- * without data loss.
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.
  */
 Datum
 binary_upgrade_slot_has_pending_wal(PG_FUNCTION_ARGS)
@@ -294,16 +294,10 @@ binary_upgrade_slot_has_pending_wal(PG_FUNCTION_ARGS)
 
slot_name = PG_GETARG_NAME(0);
 
-   /*
-	 * Acquire the given slot. There should be no error because the caller has
-	 * already checked the slot exists.
-	 */
+   /* Acquire the given slot. */
ReplicationSlotAcquire(NameStr(*slot_name), true);
 
-   /*
-	 * It's caller's responsibility to check the health of the slot.  Upcoming
-	 * functions assume the restart_lsn points to a valid record.
-	 */
+   /* 

Re: remaining sql/json patches

2023-10-16 Thread Nikita Malakhov
Hi,

Also FYI - the following case results in a segmentation fault:

postgres@postgres=# CREATE TABLE test_jsonb_constraints (
js text,
i int,
x jsonb DEFAULT JSON_QUERY(jsonb '[1,2]', '$[*]' WITH WRAPPER)
CONSTRAINT test_jsonb_constraint1
CHECK (js IS JSON)
CONSTRAINT test_jsonb_constraint5
CHECK (JSON_QUERY(js::jsonb, '$.mm' RETURNING char(5) OMIT
QUOTES EMPTY ARRAY ON EMPTY) >  'a' COLLATE "C")
CONSTRAINT test_jsonb_constraint6
CHECK (JSON_EXISTS(js::jsonb, 'strict $.a' RETURNING int
TRUE ON ERROR) < 2)
);
CREATE TABLE
Time: 13.518 ms
postgres@postgres=# INSERT INTO test_jsonb_constraints VALUES ('[]');
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
Time: 6.858 ms
@!>

We're currently looking into this case.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: LLVM 16 (opaque pointers)

2023-10-16 Thread Ronan Dunklau
On Friday, October 13, 2023 at 22:32:13 CEST, Thomas Munro wrote:
> On Wed, Oct 11, 2023 at 10:31 PM Ronan Dunklau  wrote:
> > On Wednesday, October 11, 2023 at 10:59:50 CEST, Thomas Munro wrote:
> > > The back-patch to 12 was a little trickier than anticipated, but after
> > > taking a break and trying again I now have PG 12...17 patches that
> > > I've tested against LLVM 10...18 (that's 54 combinations), in every
> > > case only with the clang corresponding to LLVM.
> > 
> > Thank you Thomas for those patches, and the extensive testing, I will run
> > my own and let you know.
> 
> Thanks!  No news is good news, I hope?  I'm hoping to commit this today.
> 
> > > I've attached only the patches for master, but the 12-16 versions are
> > > available at https://github.com/macdice/postgres/tree/llvm16-$N in
> > > case anyone has comments on those.
> > 
> > For PG13 and PG12, it looks like the ExecEvalBoolSubroutineTemplate is not
> > used anywhere, as ExecEvalBoolSubroutine was introduced in PG14 if I'm not
> > mistaken.
> 
> Right, looks like I can remove that in those branches.

Oh sorry, I thought I had followed up. I ran the same stress testing,
involving several hours of sqlsmith with all JIT costs set to zero, and
didn't notice anything with LLVM 16.
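
For reference, "all JIT costs set to zero" means something like this
(a sketch of the session settings):

  SET jit = on;
  SET jit_above_cost = 0;
  SET jit_inline_above_cost = 0;
  SET jit_optimize_above_cost = 0;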

Thank you !

--
Ronan Dunklau







Re: Removing unneeded self joins

2023-10-16 Thread Andrei Lepikhov

On 12/10/2023 18:32, Alexander Korotkov wrote:

On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
 wrote:

On 4/10/2023 14:34, Alexander Korotkov wrote:

  > Relid replacement machinery is the most contradictory code here. We used
  > a utilitarian approach and implemented a simplistic variant.

  > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
  > > they are not necessary.  [1] points that infrastructure from [2] might
  > > be useful.  The patchset from [2] seems committed mow.  However, I
  > > can't see it is directly helpful in this matter.  Could we just skip
  > > adding IS NOT NULL clause for the columns, that have
  > > pg_attribute.attnotnull set?
  > Thanks for the links, I will look into that case.

To be more precise, in the attachment, you can find a diff to the main
patch, which shows the volume of changes to achieve the desired behaviour.
Some explains in regression tests shifted. So, I've made additional tests:

DROP TABLE test CASCADE;
CREATE TABLE test (a int, b int not null);
CREATE UNIQUE INDEX abc ON test(b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
CREATE UNIQUE INDEX abc1 ON test(a,b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
DROP INDEX abc1;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);

We have almost the results we wanted. But in the last explain you can
see that nothing happened with the OR clause. We should use the
expression mutator instead of the walker to handle such clauses, but it
doesn't process the RestrictInfo node... I'm inclined to put a solution
to this issue off for a while.


OK.  I don't think it's worth eliminating IS NULL quals at this level of
complexity (at least at this stage of the work).

I made some improvements to the code: mostly new comments, grammar
corrections in existing comments, and small refactoring.

Also, I found that the suggestion from David Rowley [1] to qsort the
array of relations to find duplicates faster was still unaddressed.
I've implemented it.  That helps avoid quadratic complexity with a
large number of relations.

Also I've incorporated the improvements from Alena Rybakina, except the
one for skipping SJ removal when no SJ quals are found.  It's not yet
clear to me whether that check fixes any cases, but at least the
optimization got skipped in some useful cases (as you can see in the
regression tests).


I would like to propose one more minor improvement (see attachment).
The idea here is that after removing a self-join and changing clauses we
should re-probe the set of relids with the same Oid, because we may find
more removable self-joins (see the demo test in join.sql).


--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 7b8dc7a2b7..f7ccda5231 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -2298,6 +2298,7 @@ remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
{
			/* Create group of relation indexes with the same oid */
Relids  group = NULL;
+   Relids  removed;
 
while (i < j)
{
@@ -2306,8 +2307,21 @@ remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
}
 
relids = bms_del_members(relids, group);
-			toRemove = bms_add_members(toRemove,
-									   remove_self_joins_one_group(root, group));
+
+			/*
+			 * Try to remove self-joins from a group of identical entries.
+			 * Make next attempt iteratively - if something is deleted from
+			 * a group, changes in clauses and equivalence classes can give
+			 * us a chance to find more candidates.
+			 */
+			do {
+				Assert(!bms_overlap(group, toRemove));
+				removed = remove_self_joins_one_group(root, group);
+				toRemove = bms_add_members(toRemove, removed);
+				group = bms_del_members(group, removed);
+			} while (!bms_is_empty(removed) &&
+					 bms_membership(group) == BMS_MULTIPLE);
+   bms_free(removed);
bms_free(group);
   

Re: remaining sql/json patches

2023-10-16 Thread Nikita Malakhov
Hi!

With the latest set of patches we encountered failure with the following
query:

postgres@postgres=# SELECT JSON_QUERY(jsonpath '"aaa"', '$' RETURNING text);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
Time: 11.165 ms

A colleague of mine, Anton Melnikov, proposed the following changes,
which slightly alter the coercion functions to process this kind of
error correctly.

Please check attached patch set.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


v23-0003-1-transformJsonExprCommon-fixup.patch
Description: Binary data


v23-0003-2-json-query-coercion-override.patch
Description: Binary data


Re: Some performance degradation in REL_16 vs REL_15

2023-10-16 Thread Anton A. Melnikov

On 13.10.2023 05:05, Andres Freund wrote:

Could you provide a bit more details about how you ran the benchmark?  The
reason I am asking is that ~330 TPS is pretty slow for -c20.  Even on spinning
rust and using the default settings, I get considerably higher results.

Oh - I do get results closer to yours if I use pgbench scale 1, causing a lot
of row level contention. What scale did you use?



I use the default scale of 1 and run the command sequence:
$pgbench -i bench
$sleep 1
$pgbench -c20 -T10 -j8
in a loop to get similar initial conditions for every "pgbench -c20 -T10 -j8" 
run.

Thanks for your interest!

With the best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: False "pg_serial": apparent wraparound” in logs

2023-10-16 Thread Michael Paquier
On Sat, Oct 14, 2023 at 07:29:54PM +, Imseih (AWS), Sami wrote:
>> Anyway, it looks like you're right, we don't really need the SLRU once
>> the tail is ahead of the tail because the SLRU has wrapped around due
>> to the effect of transactions aging out, so making the truncation a
>> bit smarter should be OK.
> 
> I assume you meant " the tail is ahead of the head".

Damn fingers on a keyboard who don't know how to type.

>> Hmm. This doesn't seem enough. Shouldn't we explain at least in
>> which scenarios the tail can get ahead of the head (aka at least
>> with long running transactions that make the SLRU wrap-around)?
>> Except if I am missing something, there is no explanation of that in
>> predicate.c.
> 
> After looking at this a bit more, I don't think the previous rev is correct.
> We should not fall through to the " The SLRU is no longer needed." Which
> also sets the headPage to invalid. We should only truncate up to the
> head page.

Seems correct to me.  Otherwise this would count as if the SLRU were not
in use, even though it is being used.
--
Michael


signature.asc
Description: PGP signature


Re: Add support for AT LOCAL

2023-10-16 Thread Michael Paquier
On Sun, Oct 15, 2023 at 11:05:10PM -0700, Noah Misch wrote:
> On Mon, Oct 16, 2023 at 01:54:23AM -0400, Tom Lane wrote:
>> Ugh.  So if the failure is robust enough to persist across
>> several major xlc versions, why couldn't Thomas reproduce it?
> 
> Beats me.  hornet wipes its starting environment down to OBJECT_MODE=32_64
> PERL5LIB=/home/nm/sw/cpan64/lib/perl5 SPECIES=xlc64 PATH=/usr/bin, then
> applies all the environment settings seen in buildfarm logs.

Perhaps that's a stupid question...  But a server running under this
environment fails the two following queries even on older branches,
right?
select timezone('UTC', '23:59:59.99-07'::timetz);
select timezone('UTC', '23:59:00-07'::timetz);
--
Michael


signature.asc
Description: PGP signature


Re: pg_logical_emit_message() misses a XLogFlush()

2023-10-16 Thread Michael Paquier
On Fri, Oct 13, 2023 at 03:20:30PM +0530, Amit Kapila wrote:
> I would prefer to associate the new parameter 'flush' with
> non-transactional messages as per the proposed patch.

Check.

> Is there a reason to make the functions strict now when they were not earlier?

These two are already STRICT on HEAD:
=# select proname, provolatile, proisstrict from pg_proc
 where proname ~ 'message';
 proname | provolatile | proisstrict
-+-+-
 pg_logical_emit_message | v   | t
 pg_logical_emit_message | v   | t
(2 rows)

> 2.
> +The flush parameter (default set to
> +false) controls if the message is immediately
> +flushed to WAL or not. flush has no effect
> +with transactional, as the message's WAL
> +record is flushed when its transaction is committed.
> 
> The last part of the message sounds a bit too specific (".. as the
> message's WAL record is flushed when its transaction is committed.")
> because sometimes the WAL could be flushed by walwriter even before
> the commit. Can we say something along the lines: ".. as the message's
> WAL record is flushed along with its transaction."?

Fine by me.

> 3.
> + /*
> + * Make sure that the message hits disk before leaving if not emitting a
> + * transactional message, if flush is requested.
> + */
> + if (!transactional && flush)
> 
> Two ifs in the above comment sound a bit odd but if we want to keep it
> like that then adding 'and' before the second if may slightly improve
> it.

Sure, I've improved this comment.

An updated version is attached.  How does it look?
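
For reference, with v5 a forced flush on a non-transactional message is
requested like this (a sketch):

  -- args: (transactional, prefix, message, flush)
  SELECT pg_logical_emit_message(false, 'my_prefix', 'payload', true);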
--
Michael
From c74c97dd33958e6a4a3adc2b3f5bd5bfe786d256 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 16 Oct 2023 16:12:05 +0900
Subject: [PATCH v5] Add flush argument to pg_logical_emit_message()

The default is false, to not flush messages.  If set to true, the
message is flushed before returning back to the caller.

Note: Bump catalog version.
---
 src/include/catalog/pg_proc.dat   |  4 ++--
 src/include/replication/message.h |  3 ++-
 src/backend/catalog/system_functions.sql  | 20 +++
 .../replication/logical/logicalfuncs.c|  3 ++-
 src/backend/replication/logical/message.c | 13 ++--
 doc/src/sgml/func.sgml|  9 +++--
 contrib/test_decoding/expected/messages.out   |  5 +++--
 contrib/test_decoding/sql/messages.sql|  5 +++--
 8 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 72ea4aa8b8..c92d0631a0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11167,11 +11167,11 @@
   prosrc => 'pg_replication_slot_advance' },
 { oid => '3577', descr => 'emit a textual logical decoding message',
   proname => 'pg_logical_emit_message', provolatile => 'v', proparallel => 'u',
-  prorettype => 'pg_lsn', proargtypes => 'bool text text',
+  prorettype => 'pg_lsn', proargtypes => 'bool text text bool',
   prosrc => 'pg_logical_emit_message_text' },
 { oid => '3578', descr => 'emit a binary logical decoding message',
   proname => 'pg_logical_emit_message', provolatile => 'v', proparallel => 'u',
-  prorettype => 'pg_lsn', proargtypes => 'bool text bytea',
+  prorettype => 'pg_lsn', proargtypes => 'bool text bytea bool',
   prosrc => 'pg_logical_emit_message_bytea' },
 
 # event triggers
diff --git a/src/include/replication/message.h b/src/include/replication/message.h
index 6ce7f2038b..0f168d572c 100644
--- a/src/include/replication/message.h
+++ b/src/include/replication/message.h
@@ -30,7 +30,8 @@ typedef struct xl_logical_message
 #define SizeOfLogicalMessage	(offsetof(xl_logical_message, message))
 
 extern XLogRecPtr LogLogicalMessage(const char *prefix, const char *message,
-	size_t size, bool transactional);
+	size_t size, bool transactional,
+	bool flush);
 
 /* RMGR API */
 #define XLOG_LOGICAL_MESSAGE	0x00
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 07c0d89c4f..714f5da941 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -446,6 +446,26 @@ LANGUAGE INTERNAL
 VOLATILE ROWS 1000 COST 1000
 AS 'pg_logical_slot_peek_binary_changes';
 
+CREATE OR REPLACE FUNCTION pg_catalog.pg_logical_emit_message(
+transactional boolean,
+prefix text,
+message text,
+flush boolean DEFAULT false)
+RETURNS pg_lsn
+LANGUAGE INTERNAL
+VOLATILE STRICT
+AS 'pg_logical_emit_message_text';
+
+CREATE OR REPLACE FUNCTION pg_catalog.pg_logical_emit_message(
+transactional boolean,
+prefix text,
+message bytea,
+flush boolean DEFAULT false)
+RETURNS pg_lsn
+LANGUAGE INTERNAL
+VOLATILE STRICT
+AS 'pg_logical_emit_message_bytea';
+
 CREATE OR REPLACE FUNCTION pg_create_physical_replication_slot(
 IN slot_name name, IN 

Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-16 Thread Michael Paquier
On Sun, Oct 15, 2023 at 10:47:22PM -0400, Tom Lane wrote:
> I agree that that probably is the root cause, and we should fix it
> by bumping up max_worker_processes in this test.

Thanks.  I've fixed this one now.  Let's see if mamba is OK after
that.

> If there's actually a defensible reason for the C code to act
> like that, then all the call sites need to have checks for
> a null result.

We're just talking about a test module, and an ERROR in the same
fashion as autoprewarm makes things more predictable for the TAP
script, IMO.
--
Michael


signature.asc
Description: PGP signature


Re: Can concurrent create index concurrently block each other?

2023-10-16 Thread Konstantin Knizhnik



On 15/10/2023 10:59 pm, Tom Lane wrote:

Konstantin Knizhnik  writes:

One of our customers complained that he spawned two `create index
concurrently` commands for two different tables and both got stuck
"waiting for old snapshots".
I wonder if two CIC can really block each other in `WaitForOlderSnapshots`?

Since v14, we won't wait for another CIC unless it is processing a
partial or expressional index.  (According to the comments for
WaitForOlderSnapshots, anyway.)  What PG version is this, and what
kind of indexes are being rebuilt?

In any case, if they were blocking each other that would be reported
as a deadlock, since they'd use VirtualXactLock() which relies on
the heavyweight lock manager.  What seems more likely is that your
customer had some other old transaction sitting idle and blocking both
of them.  Looking into pg_locks would provide more definitive evidence
about what they are waiting for.


Sorry for the false alarm. We found a long-running truncation that was
actually blocking CIC in this case.
I asked this question because the customer had written that there were
no other long-lived active transactions, but that was not true.
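
For the record, the pg_locks-style check suggested above boils down to
something like this (a sketch):

  SELECT pid, pg_blocking_pids(pid) AS blocked_by,
         wait_event_type, wait_event, query
    FROM pg_stat_activity
   WHERE cardinality(pg_blocking_pids(pid)) > 0;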






Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

2023-10-16 Thread Bowen Shi
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

It looks good to me.

I have reviewed the code and tested the patch with a basic check-world test and a
pgbench test (mentioned in
https://www.postgresql.org/message-id/flat/ZQtzcH2lvo8leXEr%40paquier.xyz#cc5ed83e0edc0b9a1c1305f08ff7a335).

Another reviewer has also approved it, so I have changed the status to RFC.

The new status of this patch is: Ready for Committer


Re: Add support for AT LOCAL

2023-10-16 Thread Noah Misch
On Mon, Oct 16, 2023 at 01:54:23AM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Sun, Oct 15, 2023 at 09:58:04PM -0700, Noah Misch wrote:
> >> Works for me.  I've started a test run with the xlc version change.
> 
> > It failed similarly:
> 
> > +  23:59:00-07| 4294966103:4294967295:00+00 | 
> > 4294966103:4294967295:00+00 | 4294966103:4294967295:00+00
> > +  23:59:59.99-07 | 4294966103:00:00.01+00  | 4294966103:00:00.01+00
> >   | 4294966103:00:00.01+00
> 
> Ugh.  So if the failure is robust enough to persist across
> several major xlc versions, why couldn't Thomas reproduce it?

Beats me.  hornet wipes its starting environment down to OBJECT_MODE=32_64
PERL5LIB=/home/nm/sw/cpan64/lib/perl5 SPECIES=xlc64 PATH=/usr/bin, then
applies all the environment settings seen in buildfarm logs.