RE: subscription/015_stream sometimes breaks

2023-08-22 Thread Zhijie Hou (Fujitsu)


> -Original Message-
> From: Zhijie Hou (Fujitsu) 
> Sent: Wednesday, August 23, 2023 10:27 AM
> To: Thomas Munro 
> Cc: Amit Kapila ; pgsql-hackers
> 
> Subject: RE: subscription/015_stream sometimes breaks
> 
> On Wednesday, August 23, 2023 4:55 AM Thomas Munro
>  wrote:
> >
> > On Wed, Aug 23, 2023 at 8:21 AM Thomas Munro
> 
> > wrote:
> > > I didn't study it closely but it looks like there might be a second
> > > deadlock, after the one that is expected by the test?  Examples from
> > > the past couple of weeks:
> >
> > I should add, it's not correlated with the patches that cfbot is
> > testing, and it's the most frequent failure for which that is the case.
> >
> >     suite     |    name    | distinct_patches | errors
> > --------------+------------+------------------+--------
> >  subscription | 015_stream |               47 |     61
> >
> 
> Thanks for reporting!
> I am researching the failure and will share my analysis.

Hi,

After an off-list discussion with Amit, we figured out the reason.
From the crash log, I can see the apply worker crashed when accessing
worker->proc, so I think it's because worker->proc had already been released.

   577: /* Now terminate the worker ... */
>  578: kill(worker->proc->pid, signo);
   579: 
   580: /* ... and wait for it to die. */

Normally, this should not happen because we take a lock on LogicalRepWorkerLock
when shutting down all the parallel workers [1], which prevents concurrent
workers from freeing the worker info. But in logicalrep_worker_stop_internal(),
when stopping parallel worker #1, we briefly release the lock, and at that
moment it's possible that another parallel worker #2, which reported an ERROR,
shuts down by itself and frees its worker->proc. So when we try to stop that
parallel worker #2 in the next round, we don't realize it has already exited,
and thus access invalid memory (worker->proc).

[1]--
LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);

workers = logicalrep_workers_find(MyLogicalRepWorker->subid, 
true);
foreach(lc, workers)
{
LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);

**  if (isParallelApplyWorker(w))
logicalrep_worker_stop_internal(w, SIGTERM);
}
--

The bug appeared after commit 2a8b40e, where isParallelApplyWorker() started
to use worker->type for the check, but we forgot to reset the worker type at
worker exit time. So, even after worker #2 has shut down, the worker type is
still valid and we try to stop it again.

Previously, isParallelApplyWorker() used worker->leader_pid, which is reset
when the worker exits, so the "if (isParallelApplyWorker(w))" test would not
pass in this case and we would not try to stop worker #2.

To fix it, I think we need to reset the worker type at exit as well.
Attached is a patch that does so. I am also testing it locally
to see if there are other issues here.
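
To illustrate, here is a minimal sketch of the kind of change involved (not
the attached patch itself; it assumes the reset lands in the existing shared
slot cleanup routine, with field names following the LogicalRepWorker struct
and WORKERTYPE_UNKNOWN being the reset value that comes with the worker-type
field):

--
/* Sketch: clear the shared slot on exit so that a stale type can no
 * longer satisfy isParallelApplyWorker(); assumed to be called with
 * LogicalRepWorkerLock held exclusively. */
static void
logicalrep_worker_cleanup(LogicalRepWorker *worker)
{
    worker->type = WORKERTYPE_UNKNOWN;  /* the missing reset */
    worker->in_use = false;
    worker->proc = NULL;
    worker->dbid = InvalidOid;
    worker->userid = InvalidOid;
    worker->subid = InvalidOid;
    worker->relid = InvalidOid;
    worker->leader_pid = InvalidPid;
}
--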

Best Regards,
Hou Zhijie



0001-Reset-worker-type-at-exit-time.patch
Description: 0001-Reset-worker-type-at-exit-time.patch


Re: Synchronizing slots from primary to standby

2023-08-22 Thread shveta malik
On Thu, Aug 17, 2023 at 11:55 AM shveta malik  wrote:
>
> On Thu, Aug 17, 2023 at 11:44 AM Drouvot, Bertrand
>  wrote:
> >
> > Hi,
> >
> > On 8/14/23 11:52 AM, shveta malik wrote:
> >
> > >
> > > We (myself and Ajin) performed the tests to compute the lag in standby
> > > slots as compared to primary slots with different number of slot-sync
> > > workers configured.
> > >
> >
> > Thanks!
> >
> > > 3 DBs were created, each with 30 tables and each table having one
> > > logical-pub/sub configured. So this made a total of 90 logical
> > > replication slots to be synced. Then the workload was run for approx 10
> > > mins. During this workload, at regular intervals, primary and standby
> > > slots' lsns were captured (from pg_replication_slots) and compared. At
> > > each capture, the intent was to know how much is each standby's slot
> > > lagging behind corresponding primary's slot by taking the distance
> > > between confirmed_flush_lsn of primary and standby slot. Then we took
> > > the average (integer value) of this distance over the span of 10 min
> > > workload
> >
> > Thanks for the explanations, make sense to me.
> >
> > > and this is what we got:
> > >
> > > With max_slot_sync_workers=1, average-lag =  42290.3563
> > > With max_slot_sync_workers=2, average-lag =  24585.1421
> > > With max_slot_sync_workers=3, average-lag =  14964.9215
> > >
> > > This shows that more workers have better chances to keep logical
> > > replication slots in sync for this case.
> > >
> >
> > Agree.
> >
> > > Another statistics if it interests you is, we ran a frequency test as
> > > well (this by changing code, unit test sort of) to figure out the
> > > 'total number of times synchronization done' with different number of
> > > sync-slots workers configured. Same 3 DBs setup with each DB having 30
> > > logical replication slots. With 'max_slot_sync_workers' set at 1, 2
> > > and 3; total number of times synchronization done was 15874, 20205 and
> > > 23414 respectively. Note: this is not on the same machine where we
> > > captured lsn-gap data, it is on  a little less efficient machine but
> > > gives almost the same picture
> > >
> > > Next we are planning to capture this data for a lesser number of slots
> > > like 10,30,50 etc. It may happen that the benefit of multi-workers
> > > over single workers in such cases could be less, but let's have the
> > > data to verify that.
> > >
> >
> > Thanks a lot for those numbers and for the testing!
> >
> > Do you think it would make sense to also get the numbers using
> > the pg_failover_slots module? (and compare the pg_failover_slots numbers
> > with the "one worker" case here). The idea is to check if the patch
> > introduces some overhead as compared to pg_failover_slots.
> >
>
> Yes, definitely. We will work on that and share the numbers soon.
>

Here are the numbers for the pg_failover_slots extension. Thank you, Ajin, for
performing all the tests and providing the data offline.

--
pg_failover_slots extension:

40 slots:
default nap (60 sec):   12742133.96
10ms nap:   19984.34


90 slots:
default nap (60 sec):  10063342.72
10ms nap:   34483.82

--
slot-sync-workers  case (default 10ms nap for each test):
-
40 slots:
1 worker:20566.09
3 worker:7885.80

90 slots:
1 worker: 36706.84
3 worker: 10236.63

Observations:

1) The worker=1 case is slightly behind the pg_failover_slots extension
(for the same naptime of 10ms). This is due to the support for the
multi-worker design, where locks and DSM come into play. I will review
this case for optimization.
2) The multi-worker case seems way better in all tests.

A few points we observed while performing the tests on the pg_failover_slots
extension:

1) It has a naptime of 60 sec, which is on the higher side, and thus we
see a huge lag in slots being synchronized. Please see the default-nap
readings above. The extension's default data is not comparable to our
default case, and thus for an apples-to-apples comparison we changed the
naptime to 10ms for pg_failover_slots.

2) It takes a lot of time while creating slots. Every slot creation
needs workload to be run on the primary, i.e. if after, say, the 4th slot
creation there is no activity on the primary, it waits and does not
proceed to create the rest of the slots, and thus we had to make sure to
perform some activity on the primary in parallel to each slot creation on
the standby. This happens because after each slot creation it checks
whether 'remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn'
and, if so, waits for the primary to catch up. The restart_lsn for a
newly created slot is set at the XLOG-replay position, so when the
standby is up to date in terms of data (i.e. all xlog streams are
received and replayed) and no activity is happening on the primary, the
restart_lsn on the standby for a newly created slot stays ahead of the
primary slot's restart_lsn, and the wait continues until fresh activity
on the primary advances it.
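
In code form, the behavior described above roughly corresponds to this
hedged sketch (the helper name wait_for_primary_slot_catchup and the exact
shape are illustrative, not the extension's actual code):

--
/* After creating a slot on the standby, block while the primary's slot
 * is still behind the freshly created local slot. */
if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn)
{
    /* Assumed helper: returns only once new activity on the primary
     * has advanced remote_slot->restart_lsn past ours. */
    wait_for_primary_slot_catchup(conn, remote_slot);
}
--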

Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-08-22 Thread Richard Guo
On Tue, Aug 22, 2023 at 10:39 PM Tom Lane  wrote:

> Richard Guo  writes:
> > I'm wondering if we can instead adjust the 'inner_req_outer' in
> > create_nestloop_path before we perform the check to work around this
> > issue for the back branches, something like
> > +   /*
> > +    * Adjust the parameterization information, which refers to the
> > +    * topmost parent.
> > +    */
> > +   inner_req_outer =
> > +       adjust_child_relids_multilevel(root, inner_req_outer,
> > +                                      outer_path->parent->relids,
> > +                                      outer_path->parent->top_parent_relids);
>
> Mmm ... at the very least you'd need to not do that when top_parent_relids
> is unset.


Hmm, adjust_child_relids_multilevel would just return the given relids
unchanged when top_parent_relids is unset, so I think it should be safe
to call it even for a non-"other" rel.


> ...  But I think the risk/reward ratio for messing with this in the
> back branches is unattractive in any case: to fix a corner case that
> apparently nobody uses in the field, we risk breaking any number of
> mainstream parameterized-path cases.  I'm content to commit the v5 patch
> (or a successor) into HEAD, but at this point I'm not sure I even want
> to risk it in v16, let alone perform delicate surgery to get it to work
> in older branches.  I think we ought to go with the "tablesample scans
> can't be reparameterized" approach in v16 and before.


If we go with the "tablesample scans can't be reparameterized" approach
in the back branches, I'm a little concerned about what happens if we find
more cases in the future where we need to modify RTEs for
reparameterization. So I spent some time searching and have managed to
find one: there might be lateral references in a scan path's restriction
clauses, and currently reparameterize_path_by_child fails to adjust them.

regression=# explain (verbose, costs off)
select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.a;
                         QUERY PLAN
-----------------------------------------------------------
 Aggregate
   Output: count(*)
   ->  Append
 ->  Nested Loop
   ->  Seq Scan on public.prt1_p1 t1_1
 Output: t1_1.a, t1_1.b
   ->  Index Scan using iprt2_p1_b on public.prt2_p1 t2_1
 Output: t2_1.b
 Index Cond: (t2_1.b = t1_1.a)
 Filter: (t1.b = t2_1.a)
 ->  Nested Loop
   ->  Seq Scan on public.prt1_p2 t1_2
 Output: t1_2.a, t1_2.b
   ->  Index Scan using iprt2_p2_b on public.prt2_p2 t2_2
 Output: t2_2.b
 Index Cond: (t2_2.b = t1_2.a)
 Filter: (t1.b = t2_2.a)
 ->  Nested Loop
   ->  Seq Scan on public.prt1_p3 t1_3
 Output: t1_3.a, t1_3.b
   ->  Index Scan using iprt2_p3_b on public.prt2_p3 t2_3
 Output: t2_3.b
 Index Cond: (t2_3.b = t1_3.a)
 Filter: (t1.b = t2_3.a)
(24 rows)

The clauses in 'Filter' are not adjusted correctly.  Var 't1.b' still
refers to the top parent.

For this query it would just give wrong results.

regression=# set enable_partitionwise_join to off;
SET
regression=# select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.a;
 count
---
   100
(1 row)

regression=# set enable_partitionwise_join to on;
SET
regression=# select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.a;
 count
---
 5
(1 row)


For another query it would error out during planning.

regression=# explain (verbose, costs off)
select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.b;
ERROR:  variable not found in subplan target list

To fix it we may need to modify RelOptInfos for Path, BitmapHeapPath,
ForeignPath and CustomPath, and modify IndexOptInfos for IndexPath.  It
seems that that is not easily done without postponing reparameterization
of paths until createplan.c.

Attached is a patch which is v5 + fix for this new issue.

Thanks
Richard


v7-0001-Postpone-reparameterization-of-paths-until-when-creating-plans.patch
Description: Binary data


Re: persist logical slots to disk during shutdown checkpoint

2023-08-22 Thread vignesh C
On Tue, 22 Aug 2023 at 15:42, Amit Kapila  wrote:
>
> On Tue, Aug 22, 2023 at 2:56 PM Ashutosh Bapat
>  wrote:
> >
> > On Tue, Aug 22, 2023 at 9:48 AM Amit Kapila  wrote:
> > > >
> > > > Another idea is to record the confirm_flush_lsn at the time of
> > > > persisting the slot. We can use it in two different ways 1. to mark a
> > > > slot dirty and persist if the last confirm_flush_lsn when slot was
> > > > persisted was too far from the current confirm_flush_lsn of the slot.
> > > > 2. at shutdown checkpoint, persist all the slots which have these two
> > > > confirm_flush_lsns different.
> > > >
> > >
> > > I think using it in the second (2) way sounds advantageous as compared
> > > to storing another dirty flag because this requires us to update
> > > last_persisted_confirm_flush_lsn only while writing the slot info.
> > > OTOH, having a flag dirty_for_shutdown_checkpoint will require us to
> > > update it each time we update confirm_flush_lsn under spinlock at
> > > multiple places. But, I don't see the need of doing what you proposed
> > > in (1) as the use case for it is very minor, basically this may
> > > sometimes help us to avoid decoding after crash recovery.
> >
> > Once we have last_persisted_confirm_flush_lsn, (1) is just an
> > optimization on top of that. With that we take the opportunity to
> > persist confirmed_flush_lsn which is much farther than the current
> > persisted value and thus improving chances of updating restart_lsn and
> > catalog_xmin faster after a WAL sender restart. We need to keep that
> > in mind when implementing (2). The problem is if we don't implement
> > (1) right now, we might just forget to do that small incremental
> > change in future. My preference is 1. Do both (1) and (2) together 2.
> > Do (2) first and then (1) as a separate commit. 3. Just implement (2)
> > if we don't have time at all for first two options.
> >
>
> I prefer one of (2) or (3). Anyway, it is better to do that
> optimization (persist confirm_flush_lsn at a regular interval) as a
> separate patch as we need to test and prove its value separately.

Here is a patch to persist to disk logical slots during a shutdown
checkpoint if the updated confirmed_flush_lsn has not yet been
persisted.
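
For reference, a minimal sketch of the idea (not the attached patch; the
field name last_saved_confirmed_flush is hypothetical): remember the
confirmed_flush value that was last written to disk, and during a shutdown
checkpoint mark the slot dirty if it has moved since.

--
/* Sketch, in the checkpoint path that saves replication slots: */
if (is_shutdown && SlotIsLogical(s))
{
    SpinLockAcquire(&s->mutex);
    if (s->data.confirmed_flush != s->last_saved_confirmed_flush)
    {
        s->just_dirtied = true;
        s->dirty = true;
    }
    SpinLockRelease(&s->mutex);
}
--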

Regards,
Vignesh
From f99c7da6f1d65a1324dce4e280040dcb29a912db Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v3] Persist to disk logical slots during a shutdown checkpoint
 if the updated confirmed_flush_lsn has not yet been persisted.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.

Author: Julien Rouhaud
Reviewed-by: Wang Wei, Peter Smith, Masahiko Sawada
---
 contrib/test_decoding/meson.build |  1 +
 contrib/test_decoding/t/002_always_persist.pl | 74 +++
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c| 33 ++---
 src/include/replication/slot.h|  5 +-
 5 files changed, 103 insertions(+), 12 deletions(-)
 create mode 100644 contrib/test_decoding/t/002_always_persist.pl

diff --git a/contrib/test_decoding/meson.build b/contrib/test_decoding/meson.build
index 7b05cc25a3..12afb9ea8c 100644
--- a/contrib/test_decoding/meson.build
+++ b/contrib/test_decoding/meson.build
@@ -72,6 +72,7 @@ tests += {
   'tap': {
 'tests': [
   't/001_repl_stats.pl',
+  't/002_always_persist.pl',
 ],
   },
 }
diff --git a/contrib/test_decoding/t/002_always_persist.pl b/contrib/test_decoding/t/002_always_persist.pl
new file mode 100644
index 00..47bf64bb09
--- /dev/null
+++ b/contrib/test_decoding/t/002_always_persist.pl
@@ -0,0 +1,74 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persisted to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Test set-up
+my $node = PostgreSQL::Test::Cluster->new('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', q{
+autovacuum = off
+checkpoint_timeout = 1h
+});
+
+$node->start;
+
+# Create table
+$node->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
+
+# Create replication slot
+$node->safe_psql('postgres',
+	"SELECT pg_create_logical_replication_slot('regression_slot1', 'test_decoding');"
+);
+
+# Insert some data
+$node->safe_psql('postgres',
+	"INSERT INTO test_tbl VALUES (generate_series(1, 5));");
+
+# Consume WAL records
+$node->safe_psql('postgres',
+"SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL);"
+);
+
+# Stop the node to cause a 

RE: persist logical slots to disk during shutdown checkpoint

2023-08-22 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

Thanks for forking the thread! I think we will end up choosing another design,
but I wanted to post the updated version once with the current approach. All
comments came from the parent thread [1].

> 1. GENERAL -- git apply
>
> The patch fails to apply cleanly. There are whitespace warnings.
>
> [postgres@CentOS7-x64 oss_postgres_misc]$ git apply
> ../patches_misc/v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch
> ../patches_misc/v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch:102:
> trailing whitespace.
> # SHUTDOWN_CHECKPOINT record.
> warning: 1 line adds whitespace errors.

There was an extra blank, removed.

> 2. GENERAL -- which patch is the real one and which is the copy?
>
> IMO this patch has become muddled.
>
> Amit recently created a new thread [1] "persist logical slots to disk
> during shutdown checkpoint", which I thought was dedicated to the
> discussion/implementation of this 0001 patch. Therefore, I expected any
> 0001 patch changes to would be made only in that new thread from now on,
> (and maybe you would mirror them here in this thread).
>
> But now I see there are v23-0001 patch changes here again. So, now the same
> patch is in 2 places and they are different. It is no longer clear to me
> which 0001 ("Always persist...") patch is the definitive one, and which one
> is the copy.

The one attached in the other thread is just a copy to make cfbot happy; it
can be ignored.

> contrib/test_decoding/t/002_always_persist.pl
>
> 3.
> +
> +# Copyright (c) 2023, PostgreSQL Global Development Group
> +
> +# Test logical replication slots are always persist to disk during a
> shutdown
> +# checkpoint.
> +
> +use strict;
> +use warnings;
> +
> +use PostgreSQL::Test::Cluster;
> +use PostgreSQL::Test::Utils;
> +use Test::More;
>
> /always persist/always persisted/

Fixed.

> 4.
> +
> +# Test set-up
> my $node = PostgreSQL::Test::Cluster->new('test');
> $node->init(allows_streaming => 'logical');
> $node->append_conf('postgresql.conf', q{
> autovacuum = off
> checkpoint_timeout = 1h
> });
>
> $node->start;
>
> # Create table
> $node->safe_psql('postgres', "CREATE TABLE test (id int)");
>
> Maybe it is better to call the table something different instead of the
> same name as the cluster. e.g. 'test_tbl' would be better.

Changed to 'test_tbl'.

> 5.
> +# Shutdown the node once to do shutdown checkpoint
> $node->stop();
>
> SUGGESTION
> # Stop the node to cause a shutdown checkpoint

Fixed.

> 6.
> +# Fetch checkPoint from the control file itself
> my ($stdout, $stderr) = run_command([ 'pg_controldata', $node->data_dir ]);
> my @control_data = split("\n", $stdout);
> my $latest_checkpoint = undef;
> foreach (@control_data)
> {
>  if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
>  {
>  $latest_checkpoint = $1;
>  last;
>  }
> }
> die "No checkPoint in control file found\n"
>   unless defined($latest_checkpoint);
>
> 6a.
> /checkPoint/checkpoint/  (2x)
>
> 6b.
> +die "No checkPoint in control file found\n"
>
> SUGGESTION
> "No checkpoint found in control file\n"

Hmm, these notations followed the test recovery/t/016_min_consistency.pl,
which uses the word "minRecoveryPoint". So I preferred the current wording.

[1]: 
https://www.postgresql.org/message-id/CAHut%2BPtb%3DZYTM_awoLy3sJ5m9Oxe%3DJYn6Gve5rSW9cUdThpsVA%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v2-0001-Always-persist-to-disk-logical-slots-during-a-shu.patch
Description:  v2-0001-Always-persist-to-disk-logical-slots-during-a-shu.patch


Re: Direct I/O

2023-08-22 Thread Thomas Munro
On Wed, Aug 23, 2023 at 12:15 AM Peter Eisentraut  wrote:
> I suggest to also rename the hook functions (check and assign), like in
> the attached patch.  Mainly because utils/guc_hooks.h says to order the
> functions by GUC variable name, which was already wrong under the old
> name, but it would be pretty confusing to sort the functions by their
> GUC name that doesn't match the function names.

OK.  I'll push this tomorrow unless you do it while I'm asleep.  Thanks!




Re: Extending SMgrRelation lifetimes

2023-08-22 Thread Thomas Munro
On Fri, Aug 18, 2023 at 2:30 AM Robert Haas  wrote:
> I think this direction makes a lot of sense. The lack of a defined
> lifetime for SMgrRelation objects makes correct programming difficult,
> makes efficient programming difficult, and doesn't really have any
> advantages.

Thanks for looking!

> I know this is just a WIP patch but for the final version
> I think it would make sense to try to do a bit more work on the
> comments. For instance:
>
> - In src/include/utils/rel.h, instead of just deleting that comment,
> how about documenting the new object lifetime? Or maybe that comment
> belongs elsewhere, but I think it would definitely good to spell it
> out very explicitly at some suitable place.

Right, let's find one good place.  I think smgropen() would be best.

> - When we change smgrcloseall() to smgrdestroyall(), maybe it's worth
> spelling out why destroying is needed and not just closing. For
> example, the second hunk in bgwriter.c includes a comment that says
> "After any checkpoint, close all smgr files. This is so we won't hang
> onto smgr references to deleted files indefinitely." But maybe it
> should say something like "After any checkpoint, close all smgr files
> and destroy the associated smgr objects. This guarantees that we close
> the actual file descriptors, that we close the File objects as managed
> by fd.c, and that we also destroy the smgr objects. We don't want any
> of these resources to stick around indefinitely after a relation file
> has been deleted."

There are several similar comments.  I think they can be divided into
two categories:

1.  The error-path ones, that we should now just delete along with the
code they describe, because the "various strange errors" should have
been fixed comprehensively by PROCSIGNAL_BARRIER_SMGRRELEASE.  Here is
a separate patch to do that.

2.  The per-checkpoint ones that still make sense to avoid unbounded
resource usage.  Here is a new attempt at explaining:

             /*
-             * After any checkpoint, close all smgr files.  This is so we
-             * won't hang onto smgr references to deleted files indefinitely.
+             * After any checkpoint, free all smgr objects.  Otherwise we
+             * would never do so for dropped relations, as the checkpointer
+             * does not process shared invalidation messages or call
+             * AtEOXact_SMgr().
              */
-            smgrcloseall();
+            smgrdestroyall();

> - Maybe it's worth adding comments around some of the smgrclose[all]
> calls to mentioning that in those cases we want the file descriptors
> (and File objects?) to get closed but don't want to invalidate
> pointers. But I'm less sure that this is necessary. I don't want to
> have a million comments everywhere, just enough that someone looking
> at this stuff in the future can orient themselves about what's going
> on without too much difficulty.

I covered that with the following comment for smgrclose():

+ * The object remains valid, but is moved to the unknown list where it will
+ * be destroyed by AtEOXact_SMgr().  It may be re-owned if it is accessed by a
+ * relation before then.
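
As a hedged usage sketch of those lifetime rules (rlocator is an assumed
RelFileLocator variable; the calls follow the smgr API):

--
SMgrRelation reln = smgropen(rlocator, InvalidBackendId);

smgrnblocks(reln, MAIN_FORKNUM);    /* use it */
smgrclose(reln);                    /* fds closed; pointer stays valid */
smgrnblocks(reln, MAIN_FORKNUM);    /* safe: re-opened on demand */
/* reln is invalidated only by AtEOXact_SMgr() or smgrdestroyall() */
--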
From fa3beb19e6fafccb0d75d2e3e60d75412757b326 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 23 Aug 2023 14:38:38 +1200
Subject: [PATCH v3 1/2] Remove some obsolete smgrcloseall() calls.

Before the advent of PROCSIGNAL_BARRIER_SMGRRELEASE, we didn't have a
comprehensive way to deal with Windows file handles that get in the way
of unlinking directories.  We had smgrcloseall() calls in a few places
to try to mitigate.

It's still a good idea for bgwriter and checkpointer to do that once per
checkpoint so they don't accumulate unbounded SMgrRelation objects, but
there is no longer any reason to close them at other random places such
as the error path, and the explanation as given in the comments is now
obsolete.

Discussion: https://postgr.es/m/message-id/CA%2BhUKGJ8NTvqLHz6dqbQnt2c8XCki4r2QvXjBQcXpVwxTY_pvA%40mail.gmail.com
---
 src/backend/postmaster/bgwriter.c | 7 ---
 src/backend/postmaster/checkpointer.c | 7 ---
 src/backend/postmaster/walwriter.c| 7 ---
 3 files changed, 21 deletions(-)

diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index caad642ec9..093cd034ea 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -197,13 +197,6 @@ BackgroundWriterMain(void)
 		 */
 		pg_usleep(100L);
 
-		/*
-		 * Close all open files after any error.  This is helpful on Windows,
-		 * where holding deleted files open causes various strange errors.
-		 * It's not clear we need it elsewhere, but shouldn't hurt.
-		 */
-		smgrcloseall();
-
 		/* Report wait end here, when there is no further possibility of wait */
 		pgstat_report_wait_end();
 	}
diff --git a/src/backend/postmaster/checkpointer.c 

Re: Doc limitation update proposal: include out-of-line OID usage per TOAST-ed columns

2023-08-22 Thread John Naylor
On Mon, Aug 21, 2023 at 1:33 PM Gurjeet Singh  wrote:
>
> Please see attached the proposed patch, which attempts to make that
> language newcomer-friendly. The patch adds one link for TOAST, and
> replaces Postgres-specific terms with generic ones.

This is off-topic for this thread (which has a CF entry), and overall I
don't find the changes to be an improvement. (It wouldn't hurt to link to
the TOAST section, though.)

--
John Naylor
EDB: http://www.enterprisedb.com


Re: subscription/015_stream sometimes breaks

2023-08-22 Thread vignesh C
On Wed, 23 Aug 2023 at 02:25, Thomas Munro  wrote:
>
> On Wed, Aug 23, 2023 at 8:21 AM Thomas Munro  wrote:
> > I didn't study it closely but it looks like there might be a second
> > deadlock, after the one that is expected by the test?  Examples from
> > the past couple of weeks:
>
> I should add, it's not correlated with the patches that cfbot is
> testing, and it's the most frequent failure for which that is the
> case.
>
>     suite     |    name    | distinct_patches | errors
> --------------+------------+------------------+--------
>  subscription | 015_stream |               47 |     61

I had noticed that it is failing because of a segmentation fault:
2023-08-22 19:07:22.403 UTC [3823023][logical replication parallel
worker][4/44:767] FATAL:  terminating logical replication worker due
to administrator command
2023-08-22 19:07:22.403 UTC [3823023][logical replication parallel
worker][4/44:767] CONTEXT:  processing remote data for replication
origin "pg_16397" during message type "STREAM STOP" in transaction 748
2023-08-22 19:07:22.404 UTC [3819892][postmaster][:0] DEBUG:
unregistering background worker "logical replication parallel apply
worker for subscription 16397"
2023-08-22 19:07:22.404 UTC [3819892][postmaster][:0] LOG:  background
worker "logical replication parallel worker" (PID 3823455) exited with
exit code 1
2023-08-22 19:07:22.404 UTC [3819892][postmaster][:0] DEBUG:
unregistering background worker "logical replication parallel apply
worker for subscription 16397"
2023-08-22 19:07:22.404 UTC [3819892][postmaster][:0] LOG:  background
worker "logical replication parallel worker" (PID 3823023) exited with
exit code 1
2023-08-22 19:07:22.419 UTC [3819892][postmaster][:0] LOG:  background
worker "logical replication apply worker" (PID 3822876) was terminated
by signal 11: Segmentation fault

The stack trace for the same generated at [1] is:
Core was generated by `postgres: subscriber: logical replication apply
worker for subscription 16397 '.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Section `.reg-xstate/3822876' in core file too small.
#0  0x007b461e in logicalrep_worker_stop_internal
(worker=, signo=) at
/home/bf/bf-build/dragonet/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/launcher.c:583
583 kill(worker->proc->pid, signo);
#0  0x007b461e in logicalrep_worker_stop_internal
(worker=, signo=) at
/home/bf/bf-build/dragonet/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/launcher.c:583
#1  0x007b565a in logicalrep_worker_detach () at
/home/bf/bf-build/dragonet/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/launcher.c:774
#2  0x007b49ff in logicalrep_worker_onexit (code=, arg=) at
/home/bf/bf-build/dragonet/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/launcher.c:829
#3  0x008034c5 in shmem_exit (code=) at
/home/bf/bf-build/dragonet/HEAD/pgsql.build/../pgsql/src/backend/storage/ipc/ipc.c:239
#4  0x008033dc in proc_exit_prepare (code=1) at
/home/bf/bf-build/dragonet/HEAD/pgsql.build/../pgsql/src/backend/storage/ipc/ipc.c:194
#5  0x0080333d in proc_exit (code=1) at
/home/bf/bf-build/dragonet/HEAD/pgsql.build/../pgsql/src/backend/storage/ipc/ipc.c:107
#6  0x00797068 in StartBackgroundWorker () at
/home/bf/bf-build/dragonet/HEAD/pgsql.build/../pgsql/src/backend/postmaster/bgworker.c:827
#7  0x0079f257 in do_start_bgworker (rw=0x284e750) at
/home/bf/bf-build/dragonet/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:5734
#8  0x0079b541 in maybe_start_bgworkers () at
/home/bf/bf-build/dragonet/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:5958
#9  0x0079cb51 in process_pm_pmsignal () at
/home/bf/bf-build/dragonet/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:5121
#10 0x0079b6bb in ServerLoop () at
/home/bf/bf-build/dragonet/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:1769
#11 0x0079aaa5 in PostmasterMain (argc=4, argv=) at 
/home/bf/bf-build/dragonet/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:1462
#12 0x006d82a0 in main (argc=4, argv=0x27e3fd0) at
/home/bf/bf-build/dragonet/HEAD/pgsql.build/../pgsql/src/backend/main/main.c:198
$1 = {si_signo = 11, si_errno = 0, si_code = 1, _sifields = {_pad =
{64, 0 }, _kill = {si_pid = 64, si_uid = 0}, _timer
= {si_tid = 64, si_overrun = 0, si_sigval = {sival_int = 0, sival_ptr
= 0x0}}, _rt = {si_pid = 64, si_uid = 0, si_sigval = {sival_int = 0,
sival_ptr = 0x0}}, _sigchld = {si_pid = 64, si_uid = 0, si_status = 0,
si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0x40, _addr_lsb =
0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}, _sigpoll = {si_band =
64, si_fd = 0}, _sigsys = {_call_addr = 0x40, _syscall = 0, _arch =
0}}}

[1] - 
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=dragonet&dt=2023-08-22%2018%3A56%3A04&stg=subscription-check

Regards,
Vignesh




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-22 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for the comments! A new version is available in [1].

>
1.
+#include "access/xlogdefs.h"
 #include "catalog/pg_authid_d.h"
 
Was this #include needed here? I noticed you've already included the same in 
the "pg_upgrade.h".
>

It was needed because the macro LSN_FORMAT_ARGS() is used in the file.
I prefer that all needed headers be included directly, even if a header
already includes them, so the #include was written here.

>
2. check_for_lost_slots

+ /* Check there are no logical replication slots with a 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status = 'lost' AND "
+ "temporary IS FALSE;");

I can't quite describe my doubts about this, but something seems a bit strange. 
Didn't we already iterate every single slot in all DBs in the earlier function 
get_logical_slot_infos_per_db()? There we were only looking for wal_status <> 
'lost', but we could have got *every* wal_status and also detected these 'lost' 
ones at the same time up-front, instead of having this extra function with more 
SQL to do pretty much the same SELECT.

Perhaps coding it the current way gives a clear separation of the fetching
code and the checking code, and that might be the best approach, but it
somehow seems a shame/waste to fetch almost the same slot data with nearly
the same SQL twice, so I wondered if there is a better way to arrange this.
>

Hmm, but you did not want to do additional checks in get_logical_slot_infos(),
right? The two cannot go together. In the case of check_new_cluster(),
information about relations is extracted in get_db_and_rel_infos() and then
checked for emptiness in check_new_cluster_is_empty(), so the phases are
separated there as well.

>
src/bin/pg_upgrade/info.c

3. get_logical_slot_infos

+
+ /* Do additional checks if slots are found */
+ if (slot_count)
+ {
+ check_for_lost_slots(cluster);
+
+ if (!live_check)
+ check_for_confirmed_flush_lsn(cluster);
+ }

Aren't these checks only intended for checking the 'old_cluster'? But AFAICT 
they are not guarded here so they will be executed by both sides. Previously 
(in my review of v22-0003) I suggested these calls maybe belonged in the 
calling function check_and_dump_old_cluster(). I still think that.
>

Moved to check_and_dump_old_cluster().

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866DD3348B5224E0A1BFC3EF51CA%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-22 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for giving comments! PSA the new version.

>
==
1. GENERAL

Please try to run a spell/grammar check on all the text like commit message and 
docs changes before posting (e.g. cut/paste the rendered text into some tool 
like MSWord or Grammarly or ChatGPT or whatever tool you like and cross-check). 
There are lots of small typos etc but one up-front check could avoid long 
cycles of reviewing/reporting/fixing/re-posting/confirming...
>

I checked all of the sentences with Grammarly. Sorry for the poor English.

>
==
Commit message

2.
Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to ths restriction, the timing of restoring replication slots is
different from other objects.

~

/ths/this/
>

Fixed.

>
doc/src/sgml/ref/pgupgrade.sgml

3.
+
+ Before you start upgrading the publisher cluster, ensure that the
+ subscription is temporarily disabled, by executing
+ ALTER SUBSCRIPTION ... 
DISABLE.
+ After the upgrade is complete, then re-enable the subscription. Note that
+ if the new cluser uses different port number from old one,
+ ALTER SUBSCRIPTION ... 
CONNECTION
+ command must be also executed on subscriber.
+

3a.
BEFORE
After the upgrade is complete, then re-enable the subscription.

SUGGESTION
Re-enable the subscription after the upgrade.
>

Fixed.


>
3b.
/cluser/cluster/

~

3c.
Note that
+ if the new cluser uses different port number from old one,
+ ALTER SUBSCRIPTION ... 
CONNECTION
+ command must be also executed on subscriber.

SUGGESTION
Note that if the new cluster uses a different port number ALTER SUBSCRIPTION 
... CONNECTION command must be also executed on the subscriber.
>

The part was removed.

>
4.
+ 
+  
+   confirmed_flush_lsn (see )
+   of all slots on old cluster must be same as latest checkpoint location.
+  
+ 

4a.
/on old cluster/on the old cluster/
>

Fixed.

>
4b.
/as latest/as the latest/
>
Fixed.

>
5.
+ 
+  
+   The output plugins referenced by the slots on the old cluster must be
+   installed on the new PostgreSQL executable directory.
+  
+ 

/installed on/installed in/ ??
>

"installed in" is better, fixed.

>
6.
+ 
+  
+   The new cluster must have
+   max_replication_slots
+   configured to value larger than the existing slots on the old cluster.
+  
+ 

BEFORE
...to value larger than the existing slots on the old cluster.

SUGGESTION
...to a value greater than or equal to the number of slots present on the old 
cluster.
>

Fixed.

>
src/bin/pg_upgrade/check.c  

7. GENERAL - check_for_logical_replication_slots

AFAICT this function is called *only* for the new_cluster, yet there is no 
Assert and no checking inside this function to ensure that is the case or not.  
It seems strange that the *cluster is passed as an argument but then the whole 
function body and messages assume it can only be a new cluster anyway.

IMO it would be better to rename this function to something like 
check_new_cluster_logical_replication_slots() and DO NOT pass any parameter but 
just use the global new_cluster within the function body.
>

Hmm, I followed other functions; e.g., check_for_composite_data_type_usage()
is called only for the old cluster but still takes a *cluster argument. What
is the difference between them? Moreover, how about check_for_lost_slots()
and check_for_confirmed_flush_lsn()? Fixed for the moment.

>
8. check_for_logical_replication_slots

+ /* logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) < 1700)
+ return;

Start comment with uppercase for consistency.
>

The part was removed.

>
9. check_for_logical_replication_slots

+ res = executeQueryOrDie(conn, "SELECT slot_name "
+  "FROM pg_catalog.pg_replication_slots "
+  "WHERE slot_type = 'logical' AND "
+  "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slot, but found 
\"%s\"",
+ PQgetvalue(res, 0, 0));

/replication slot/replication slots/
>

Fixed.

>
10. check_for_logical_replication_slots

+ /*
+ * Do additional checks when the logical replication slots have on the old
+ * cluster.
+ */
+ if (nslots)

SUGGESTION
Do additional checks when there are logical replication slots on the old 
cluster.
>

Per suggestion from Amit, the part was removed.

>
11.
+ if (nslots > max_replication_slots)
+ pg_fatal("max_replication_slots must be greater than or equal to existing 
logical "
+ "replication slots on old cluster.");

11a.
SUGGESTION
max_replication_slots (%d) must be greater than or equal to the number of 
logical replication slots (%d) on the old cluster.

11b.
I think it would be helpful for the current values to be displayed in the fatal 
message so the user will know more about what value to set. Notice that my 
above 

RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-22 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thanks for the comment. A new version will be available in the upcoming post.

> + res = executeQueryOrDie(conn, "SELECT slot_name "
> +   "FROM pg_catalog.pg_replication_slots "
> +   "WHERE slot_type = 'logical' AND "
> +   "temporary IS FALSE;");
> +
> + if (PQntuples(res))
> + pg_fatal("New cluster must not have logical replication slot, but
> found \"%s\"",
> + PQgetvalue(res, 0, 0));
> +
> + PQclear(res);
> +
> + nslots = count_logical_slots(&old_cluster);
> +
> + /*
> + * Do additional checks when the logical replication slots have on the old
> + * cluster.
> + */
> + if (nslots)
> 
> Shouldn't these checks be reversed? I mean it would be better to test
> the presence of slots on the new cluster if there is any slot present
> on the old cluster.

Hmm, I think the latter part is meaningful only when the old cluster has
logical slots. To sum up, any checking should be done only when
count_logical_slots(&old_cluster) > 0, right? Fixed like that.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-22 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thanks for the comment! The next version will be available in the upcoming post.

> > > + /* logical replication slots can be migrated since PG17. */
> > > + if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
> > > + return;
> > >
> > > IMO the code matches the comment better if you say < 1700 instead of <=
> 1600.
> >
> > Changed.
> >
> 
> I think it is better to be consistent with the existing code. There
> are a few other checks in pg_upgrade.c that uses <=, so it is better
> to use it in the same way here.

OK, reverted.

> Another minor comment:
> Note that
> + if the new cluser uses different port number from old one,
> + ALTER
> SUBSCRIPTION ... CONNECTION
> + command must be also executed on subscriber.
> 
> I think this is true in general as well and not specific to
> pg_upgrade. So, we can avoid adding anything about connection change
> here.

Removed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-22 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

> Here are some review comments for v23-0001

Thanks for the comment! But I did not update 0001 patch in this thread.
It will be managed in the forked one...

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-22 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for the comments! A new version will be available
in the upcoming post.

>
1. check_for_lost_slots

+ /* logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

1a
Maybe the comment should start uppercase for consistency with others.
>

Seems right, but I revisited check_and_dump_old_cluster() and found that
some version-specific checks are done outside the checking functions.
I followed that style, so this part has been moved to
check_and_dump_old_cluster(). The version check for the new cluster has
likewise been moved to check_new_cluster(). Is that OK with you?

>
1b.
IMO if you check < 1700 instead of <= 1600 it will be a better match with the 
comment.
>

Per suggestion from Amit, I used < 1700. Some other changes in 0002 were 
reverted.

>
2. check_for_lost_slots
+ for (i = 0; i < ntups; i++)
+ {
+ pg_log(PG_WARNING,
+   "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+   PQgetvalue(res, i, i_slotname));
+ }
+
+

The braces {} are not needed anymore
>

Fixed. 

>
3. check_for_confirmed_flush_lsn

+ /* logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

3a.
Maybe the comment should start uppercase for consistency with others.
>

Per the reply to comment 1, this part is no longer needed.

>
3b.
IMO if you check < 1700 instead of <= 1600 it will be a better match with the 
comment.
>

Per suggestion from Amit, I used < 1700.

>
4. check_for_confirmed_flush_lsn
+ for (i = 0; i < ntups; i++)
+ {
+ pg_log(PG_WARNING,
+ "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+ PQgetvalue(res, i, i_slotname));
+ }
+

The braces {} are not needed anymore
>

Fixed.

>
5. get_control_data
+ /*
+ * Gather latest checkpoint location if the cluster is newer or
+ * equal to 17. This is used for upgrading logical replication
+ * slots.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 17)

5a.
/newer or equal to 17/PG17 or later/
>

Fixed.

>
5b.
>= 17 should be >= 1700
>

Per suggestion from Amit, I used < 1700.

>
6. get_control_data
+ {
+ char *slash = NULL;
+ uint64 upper_lsn, lower_lsn;
+
+ p = strchr(p, ':');
+
+ if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+ p++; /* remove ':' char */
+
+ p = strpbrk(p, "01234567890ABCDEF");
+
+ /*
+ * Upper and lower part of LSN must be read separately
+ * because it is reported as %X/%X format.
+ */
+ upper_lsn = strtoul(p, &slash, 16);
+ lower_lsn = strtoul(++slash, NULL, 16);
+
+ /* And combine them */
+ cluster->controldata.chkpnt_latest =
+ (upper_lsn << 32) | lower_lsn;
+ }

Should 'upper_lsn' and 'lower_lsn' be declared as uint32? That seems a better 
mirror for LSN_FORMAT_ARGS.
>

Changed the definition to uint32, and a cast was added.
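
So the parsing block now looks roughly like this (a sketch of the agreed
adjustment, reusing the variable names from the quoted hunk):

--
char       *slash = NULL;
uint32      upper_lsn;
uint32      lower_lsn;

/* Read the two 32-bit halves of the %X/%X value separately ... */
upper_lsn = strtoul(p, &slash, 16);
lower_lsn = strtoul(++slash, NULL, 16);

/* ... and widen explicitly before combining them. */
cluster->controldata.chkpnt_latest =
    ((uint64) upper_lsn << 32) | lower_lsn;
--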

>
7. get_logical_slot_infos
+
+ /*
+ * Do additional checks if slots are found on the old node. If something is
+ * found on the new node, a subsequent function
+ * check_new_cluster_is_empty() would report the name of slots and raise a
+ * fatal error.
+ */
+ if (cluster == &old_cluster && slot_count)
+ {
+ check_for_lost_slots(cluster);
+
+ if (!live_check)
+ check_for_confirmed_flush_lsn(cluster);
+ }

It somehow doesn't feel right for these extra checks to be jammed into this 
function, just because you conveniently have the slot_count available.

On the NEW cluster side, there was extra checking in the check_new_cluster() 
function.

For consistency, I think this OLD cluster checking should be done in the 
check_and_dump_old_cluster() function -- see the "Check for various failure 
cases" comment -- IMO this new fragment belongs there with the other checks.
>

All the checks were moved to check_and_dump_old_cluster(), and a check for
its major version was added.

>
8.
  bool date_is_int;
  bool float8_pass_by_value;
  uint32 data_checksum_version;
+
+ XLogRecPtr chkpnt_latest;
 } ControlData;

I don't think the new field is so different from all the others that
it needs a blank line separator.
>

I removed the blank line. Actually, I wondered where the field should go, but
kept it at the end.

>
9.
 # Initialize old node
 my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
 $old_publisher->init(allows_streaming => 'logical');
-$old_publisher->start;
 
 # Initialize new node
 my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
 $new_publisher->init(allows_streaming => 'replica');
 
-my $bindir = $new_publisher->config_data('--bindir');
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
 
-$old_publisher->stop;
+my $bindir = $new_publisher->config_data('--bindir');

~

Are those removals of the old_publisher start/stop changes that actually
should be done in the 0002 patch?
>

Yes, it should be removed from 0002.

>
10.
 $old_publisher->safe_psql(
  'postgres', qq[
  SELECT 

RE: subscription/015_stream sometimes breaks

2023-08-22 Thread Zhijie Hou (Fujitsu)
On Wednesday, August 23, 2023 4:55 AM Thomas Munro  
wrote:
> 
> On Wed, Aug 23, 2023 at 8:21 AM Thomas Munro
>  wrote:
> > I didn't study it closely but it looks like there might be a second
> > deadlock, after the one that is expected by the test?  Examples from
> > the past couple of weeks:
> 
> I should add, it's not correlated with the patches that cfbot is testing,
> and it's the most frequent failure for which that is the case.
> 
>     suite     |    name    | distinct_patches | errors
> --------------+------------+------------------+--------
>  subscription | 015_stream |               47 |     61
> 

Thanks for reporting!
I am researching the failure and will share my analysis.

Best Regards,
Hou zj


Re: Using defines for protocol characters

2023-08-22 Thread Nathan Bossart
On Thu, Aug 17, 2023 at 09:52:03AM -0700, Nathan Bossart wrote:
> On Thu, Aug 17, 2023 at 12:22:24PM -0400, Robert Haas wrote:
>> I think this is going to be a major improvement in terms of readability!
>> 
>> I'm pretty happy with this version, too. We may be running out of
>> things to argue about -- and no, I don't want to argue about
>> underscores more!
> 
> Awesome.  If we are indeed out of things to argue about, I'll plan on
> committing this sometime next week.

Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: PG 16 draft release notes ready

2023-08-22 Thread Bruce Momjian
On Tue, Aug 22, 2023 at 01:42:41PM -0700, Jeff Davis wrote:
> On Thu, 2023-05-18 at 16:49 -0400, Bruce Momjian wrote:
> > I have completed the first draft of the PG 16 release notes.  You can
> > see the output here:
> 
> 
> https://www.postgresql.org/docs/16/release-16.html#RELEASE-16-LOCALIZATION
> 
> I notice that this item is still listed:
> 
>  * Determine the ICU default locale from the environment (Jeff Davis)
> 
> But that was reverted as part of 2535c74b1a.

The original commit is:

Author: Jeff Davis 
2023-03-10 [c45dc7ffb] initdb: derive encoding from locale for ICU; 
similar to

and I don't see that reverted by 2535c74b1a.  Is that a problem?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Support run-time partition pruning for hash join

2023-08-22 Thread Andy Fan
>
> > fwiw, the current master totally ignores the cost reduction for run-time
> > partition prune, even for init partition prune.  So in some real cases,
> > pg chooses a hash join just because the cost of nest loop join is
> > highly over estimated.
>
> This is true about the existing code. It's a very tricky thing to cost
> given that the parameter values are always unknown to the planner.
> The best we have for these today is the various hardcoded constants in
> selfuncs.h. While I do agree that it's not great that the costing code
> knows nothing about run-time pruning, I also think that run-time
> pruning during execution with parameterised nested loops is much more
> likely to be able to prune partitions and save actual work than the
> equivalent with Hash Joins.  It's more common for the planner to
> choose to Nested Loop when there are fewer outer rows, so the pruning
> code is likely to be called fewer times with Nested Loop than with
> Hash Join.
>

Yes, I agree with this.  In my 4 years with PostgreSQL, I have run into
only 2 cases of this issue, and 1 of them was joining 12+ tables with
run-time partition pruning for every join.  But this situation causes more
issues than generating a wrong plan; for example, for a simple SELECT *
FROM p WHERE partkey = $1, the generic plan will never win, so we have to
pay the expensive planning cost for the partitioned table every time.

If we don't require very accurate costing for every case, like when we only
care about the '=' operator (the most common case), it should be easier than
the case here, since we just need to know whether only 1 partition will
survive after pruning, without caring about which one it is.  I'd like to
discuss that in another thread, and leave this thread for Richard's patch
only.

-- 
Best Regards
Andy Fan


pg_upgrade - a function parameter shadows global 'new_cluster'

2023-08-22 Thread Peter Smith
Hi hackers.

During a recent review of nearby code I noticed that there was a shadowing
of the 'new_cluster' global variable by a function parameter:

Here:
static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);

~~~

It looks like it has been like this for a couple of years. I guess this
might have been found/fixed earlier had the code been compiled differently:

check.c: In function ‘check_for_new_tablespace_dir’:
check.c:381:43: warning: declaration of ‘new_cluster’ shadows a global
declaration [-Wshadow]
 check_for_new_tablespace_dir(ClusterInfo *new_cluster)
   ^
In file included from check.c:16:0:
pg_upgrade.h:337:4: warning: shadowed declaration is here [-Wshadow]
new_cluster;
^

~~~

PSA a small patch to remove the unnecessary parameter, and so eliminate
this shadowing.
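
For anyone unfamiliar with -Wshadow, here is a self-contained illustration
of the warning class (nothing here is from pg_upgrade itself):

--
#include <stdio.h>

static int new_cluster = 1;         /* file-scope "global" */

static void
check(int new_cluster)              /* -Wshadow: shadows the global */
{
    printf("%d\n", new_cluster);    /* refers to the parameter */
}

int
main(void)
{
    check(2);                       /* prints 2, not 1 */
    return 0;
}
--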

Thoughts?

--
Kind Regards,
Peter Smith.
Fujitsu Australia.


v1-0001-Remove-the-shadowing-of-new_cluster-global.patch
Description: Binary data


Re: Support run-time partition pruning for hash join

2023-08-22 Thread Andy Fan
On Tue, Aug 22, 2023 at 5:43 PM Richard Guo  wrote:

>
> On Mon, Aug 21, 2023 at 8:34 PM Andy Fan  wrote:
>
>> This feature looks good, but is it possible to know if we can prune
>> any subnodes before we pay the extra effort (building the Hash
>> table, for each row... stuff)?
>>
>
> It might be possible if we take partition pruning into
> consideration when estimating costs.  But it seems not easy to calculate
> the costs accurately.
>

This is the one area where I am worried about the future of this patch.
Personally, I do like this patch, but I am not sure what happens if this
issue can't be fixed to everyone's satisfaction, and fixing it perfectly
looks hopeless to me.  However, let's see what happens.


>
>
>> Maybe at least,  if we have found no subnodes can be skipped
>> during the hashing, we can stop doing such work anymore.
>>
>
> Yeah, this is what we can do.
>

cool.


>
>
>> In my current knowledge, we have to build the inner table first for this
>> optimization?  so hash join and sort merge should be OK, but nestloop
>> should
>> be impossible unless I missed something.
>>
>
> For nestloop and mergejoin, we'd always execute the outer side first.
> So the Append/MergeAppend nodes need to be on the inner side for the
> join partition pruning to take effect.  For a mergejoin that will
> explicitly sort the outer side, the sort node would process all the
> outer rows before scanning the inner side, so we can do the join
> partition pruning with that.  For a nestloop, if we have a Material
> node on the outer side, we can do that too, but I wonder if we'd have
> such a plan in the real world, because we only add Material to the inner
> side of a nestloop.
>

This is more interesting than I expected, thanks for the explanation.

-- 
Best Regards
Andy Fan


Re: initdb caching during tests

2023-08-22 Thread Andres Freund
Hi,

On 2023-08-22 23:47:24 +0200, Daniel Gustafsson wrote:
> I had a look at this today and have been running a lot of tests with it 
> without
> finding anything that breaks.

Thanks!


> The duplicated code is unfortunate, but after playing around with some
> options I agree that it's likely the best option.

Good and bad to hear :)


> While looking I did venture down the rabbithole of making it support extra
> params as well, but I don't think moving the goalposts there is doing us any
> favors, it's clearly chasing diminishing returns.

Agreed. I also went down that rabbithole, but it quickly gets a lot more code
and complexity - and there just aren't that many tests using non-default
options.


> My only small gripe is that I keep thinking about template databases for 
> CREATE
> DATABASE when reading the error messages in this patch, which is clearly not
> related to what this does.
> 
> +   note("initializing database system by copying initdb template");
> 
> I personally would've used cache instead of template in the user facing parts
> to keep concepts separated, but thats personal taste.

I am going back and forth on that one (as one can notice with $subject). It
doesn't quite seem like a cache, as it's not "created" on demand and is only
usable when exactly the same parameters are used repeatedly. But "template"
is overloaded, as you say...


> All in all, I think this is committable as is.

Cool. Planning to do that tomorrow. We can easily extend / adjust this later,
it just affects testing infrastructure.

Greetings,

Andres Freund




False "pg_serial": apparent wraparound” in logs

2023-08-22 Thread Imseih (AWS), Sami
Hi,

I recently encountered a situation in the field in which the message
"could not truncate directory "pg_serial": apparent wraparound"
was logged even though there was no danger of wraparound. This
was on a brand new cluster, and it only took a few minutes to see
the message in the logs.

Reading up on the history of this error message, it appears that there
was work done to improve SLRU truncation and the associated wraparound
log messages [1]. The attached repro on master shows that this message
can still be logged incorrectly.

The repro runs updates with 90 threads in serializable mode and kicks
off a “long running” select on the same table in serializable mode.

As soon as the long running select commits, the next checkpoint fails
to truncate the SLRU and logs the error message.

Besides the confusing log message, there may also be a risk of
pg_serial getting unnecessarily bloated and depleting disk space.

Is this a bug?

[1] 
https://www.postgresql.org/message-id/flat/20190202083822.GC32531%40gust.leadboat.com

Regards,

Sami Imseih
Amazon Web Services (AWS)





### create the pgbench script
echo "\set idv random(1, 1)" > /tmp/pgbench.sql
echo "BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;" >> /tmp/pgbench.sql
echo "UPDATE tab1 SET id2 = id2 WHERE id = :idv;" >> /tmp/pgbench.sql
echo "COMMIT;" >> /tmp/pgbench.sql

### create the driver script
cat < /tmp/repro.sh
psql<

Re: Make --help output fit within 80 columns per line

2023-08-22 Thread Masahiro Ikeda

Hi,

On 2023-08-22 22:57, torikoshia wrote:

On 2023-08-21 13:08, Masahiro Ikeda wrote:

(2)

Is there any reason that only src/bin commands are targeted? I found that
we also need to fix vacuumlo with the above test. I think it's better to
fix it because it's a contrib module.

$ vacuumlo --help | while IFS='' read line; do echo $((`echo $line |
wc -m` - 1)) $line; done | sort -n -r  | head -n 2
84   -n, --dry-run don't remove large objects, just show
what would be done
74   -l, --limit=LIMIT commit after removing each LIMIT large 
objects


This is because I wasn't sure making all --help outputs fit into 80
columns per line is the right thing to do, as described below:

| If this is the way to go, I'll do same things for contrib commands.

If there are no objections, I'm going to make other commands fit within
80 columns per line, including (4).


OK. Sorry, I missed the sentence above.
I'd like to hear what others think too.


(3)

Is deleting '/mnt/server' intended?  I thought it better to leave it as
is, since the archive_cleanup_command example uses the absolute path.

-"  pg_archivecleanup /mnt/server/archiverdir
00010010.0020.backup\n"));
+"  pg_archivecleanup archiverdir
00010010.0020.backup\n"));

I confirmed that the --help texts are not changed and only
the line breaks are changed.  But currently, the above change
breaks that.


Yes, it is intended as described in the thread.

https://www.postgresql.org/message-id/20230615.152036.1556630042388070221.horikyota.ntt%40gmail.com

| We could shorten it by removing the "/mnt/server" portion, but
I'm not sure if it's worth doing.

However, I feel it is acceptable to make an exception and exceed 80
characters for this line.


Thanks for sharing the thread; understood.

It seems that the directory name should be consistent with the example
of archive_cleanup_command. However, it does not seem appropriate to
make archive_cleanup_command use a relative path.

```

pg_archivecleanup --help

(snip)
e.g.
  archive_cleanup_command = 'pg_archivecleanup /mnt/server/archiverdir %r'


Or for use as a standalone archive cleaner:
e.g.
  pg_archivecleanup /mnt/server/archiverdir 00010010.0020.backup

```

IMHO, is simply breaking the line acceptable?

```
Or for use as a standalone archive cleaner:
e.g.
  pg_archivecleanup /mnt/server/archiverdir
  00010010.0020.backup
```

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: Adding a LogicalRepWorker type field

2023-08-22 Thread Peter Smith
On Tue, Aug 22, 2023 at 6:24 PM Amit Kapila  wrote:

> On Mon, Aug 21, 2023 at 3:48 PM Amit Kapila 
> wrote:
> >
> > On Mon, Aug 21, 2023 at 5:34 AM Peter Smith 
> wrote:
> > >
> > > PSA v9
> > >
> >
> > LGTM. I'll push this tomorrow unless there are any more comments.
> >
>
> Pushed.
>

Thanks for pushing this. The CF entry has been updated [1]

--
[1] https://commitfest.postgresql.org/44/4472/

Kind Regards,
Peter Smith.
Fujitsu Australia


Re: Prevent psql \watch from running queries that return no rows

2023-08-22 Thread Daniel Gustafsson
> On 22 Aug 2023, at 23:23, Greg Sabino Mullane  wrote:
> 
> Thank you for the feedback, everyone. Attached is version 4 of the patch, 
> featuring a few tests and minor rewordings.

Thanks, the changes seem good from a quick skim.  I'll take a better look
tomorrow to hopefully close this one.

--
Daniel Gustafsson





Re: initdb caching during tests

2023-08-22 Thread Daniel Gustafsson
> On 5 Aug 2023, at 21:56, Andres Freund  wrote:

> We have some issues with CI on macos and windows being too expensive (more on
> that soon in a separate email), which reminded me of this thread (with
> original title: [1])
> 
> I've attached a somewhat cleaned up version of the patch to cache initdb
> across runs.  The results are still fairly impressive in my opinion.
> 
> One thing I do not like, but don't have a good idea for how to improve, is
> that there's a bunch of duplicated logic in pg_regress.c and Cluster.pm. I've
> tried to move that into initdb.c itself, but that ends up pretty ugly, because
> we need to be a lot more careful about checking whether options are compatible
> etc. I've also thought about just putting this into a separate perl script,
> but right now we still allow basic regression tests without perl being
> available.  So I concluded that for now just having the copies is the best
> answer.

I had a look at this today and have been running a lot of tests with it without
finding anything that breaks.  The duplicated code is unfortunate, but after
playing around with some options I agree that it's likely the best option.

While looking I did venture down the rabbithole of making it support extra
params as well, but I don't think moving the goalposts there is doing us any
favors, it's clearly chasing diminishing returns.

My only small gripe is that I keep thinking about template databases for CREATE
DATABASE when reading the error messages in this patch, which is clearly not
related to what this does.

+   note("initializing database system by copying initdb template");

I personally would've used cache instead of template in the user facing parts
to keep concepts separated, but that's personal taste.

All in all, I think this is committable as is.

--
Daniel Gustafsson





Re: Faster "SET search_path"

2023-08-22 Thread Jeff Davis
On Wed, 2023-08-16 at 15:09 -0700, Jeff Davis wrote:
> To bring the overhead closer to zero we need to somehow avoid
> repeating
> so much work in guc.c, though. If we don't go around it, another
> approach would be to separate GUC setting into two phases: one that
> does the checks, and one that performs the final change. All the real
> work (hash lookup, guc_strdup, and check_search_path) is done in the
> "check" phase.
> 
> It's a bit painful to reorganize the code in guc.c, though, so that
> might need to happen in a few parts and will depend on how great the
> demand is.

I looked into this, and one problem is that there are two different
memory allocators at work. In guc.c, the new value is guc_strdup'd, and
needs to be saved until it replaces the old value. But if the caller
(in fmgr.c) saves the state in fmgr_security_definer_cache, it would be
saving a guc_strdup'd value, and we'd need to be careful about freeing
that if (and only if) the final set step of the GUC doesn't happen. Or,
we'd need to somehow use pstrdup() instead, and somehow tell guc.c not
to guc_free() it.

Those are obviously solvable, but I don't see a clean way to do so.
Perhaps it's still better than circumventing guc.c entirely (because
that approach probably suffers from the same problem, in addition to
the problem you raised), but I'm not sure if it's worth it to take on
this level of complexity. Suggestions welcome.
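
To illustrate the shape of that idea, something like the following --
entirely hypothetical names, nothing like this exists in guc.c today:

/* Hypothetical two-phase GUC API -- a sketch, not actual guc.c code. */
typedef struct GucPreparedSet GucPreparedSet;	/* opaque to callers */

/* Phase 1: hash lookup, guc_strdup() of the value, check hook. */
extern GucPreparedSet *prepare_set_config_option(const char *name,
												 const char *value,
												 GucContext context,
												 GucSource source);

/* Phase 2: install the pre-validated value; cheap enough to repeat. */
extern void commit_prepared_set(GucPreparedSet *prep);

/* Must run if the set is abandoned, so the guc_strdup'd value gets
 * guc_free'd rather than leaked -- the ownership wrinkle above. */
extern void abandon_prepared_set(GucPreparedSet *prep);

fmgr.c could then cache the GucPreparedSet in
fmgr_security_definer_cache, but it would still have to be careful
about which side owns (and frees) the prepared value.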

Right now I'm still intending to get the cache in place, which doesn't
have these downsides and provides a nice speedup.

Regards,
Jeff Davis





Re: [17] collation provider "builtin"

2023-08-22 Thread Jeff Davis
On Wed, 2023-06-14 at 15:55 -0700, Jeff Davis wrote:
> The locale "C" (and equivalently, "POSIX") is not really a libc
> locale;
> it's implemented internally with memcmp for collation and
> pg_ascii_tolower, etc., for ctype.
> 
> The attached patch implements a new collation provider, "builtin",
> which only supports "C" and "POSIX".

Rebased patch attached.

I got some generally positive comments, but it needs some more feedback
on the specifics to be committable.

This might be a good time to summarize my thoughts on collation after
my work in v16:

* Picking a database default collation other than UCS_BASIC (a.k.a.
"C", a.k.a. memcmp(), a.k.a. provider=builtin) is something that should
be done intentionally. It's an impactful choice that affects semantics,
performance, and upgrades/deployment. Beyond that, our implementation
still lacks a good way to manage versions of collation provider
libraries and track object dependencies in a safe way to prevent index
corruption, so the safest choice is really just to use stable memcmp()
semantics.

* The defaults for initdb seem bad in a number of ways, but it's too
hard to change that default now (I tried in v16 and reverted it). So
the job of reasonable choices is left for higher-level tools and
documentation.

* We can handle the collation and character classification
independently. The main use case is to set the collation to memcmp()
semantics (for stability and performance) and set the character
classification to something interesting (on the grounds that it's more
likely to be stable and less likely to be used in an index than a
collation). Right now the only way to do that is to use the libc
provider and set the collation to C and the ctype to a libc locale. But
there is also a use case for having ICU as the provider for character
classification. One option is to have separate datcolprovider=b
(builtin provider) and datctypeprovider=i, so that the collation would
be handled with memcmp and the character classification daticulocale.
It feels like we're growing the fields in pg_database a little too
much, but the use case seems valid, and perhaps we can reorganize the
catalog representation a bit.
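
For reference, the only way to spell that split today is via libc (a
sketch; the ctype locale name is a platform-dependent assumption), and
with the attached patch the collation half could come from the builtin
provider instead:

-- today: memcmp() ordering with libc character classification
CREATE COLLATION c_collate_libc_ctype (
    provider = libc,
    lc_collate = 'C',
    lc_ctype = 'en_US.utf8'     -- hypothetical libc locale name
);

-- with the attached patch: provider "builtin", locale "C" only
CREATE COLLATION c_builtin (provider = builtin, locale = 'C');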


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From ec35f2c1e31c3d793f5eb4982cb562b701ce71fd Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Mon, 1 May 2023 15:38:29 -0700
Subject: [PATCH v12] Introduce collation provider "builtin".

Only supports locale "C" (or equivalently, "POSIX"). Provides
locale-unaware semantics that are implemented as fast byte operations
in Postgres, independent of the operating system or any provider
libraries.

Equivalent (in semantics and implementation) to the libc provider with
locale "C", except that LC_COLLATE and LC_CTYPE can be set
independently.

Use provider "builtin" for built-in collation "ucs_basic" instead of
libc.

Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798...@joeconway.com
---
 doc/src/sgml/charset.sgml  |  89 +
 doc/src/sgml/ref/create_collation.sgml |  11 ++-
 doc/src/sgml/ref/create_database.sgml  |   8 +-
 doc/src/sgml/ref/createdb.sgml |   2 +-
 doc/src/sgml/ref/initdb.sgml   |   7 +-
 src/backend/catalog/pg_collation.c |   7 +-
 src/backend/commands/collationcmds.c   | 103 +
 src/backend/commands/dbcommands.c  |  69 ++---
 src/backend/utils/adt/pg_locale.c  |  27 ++-
 src/backend/utils/init/postinit.c  |  10 ++-
 src/bin/initdb/initdb.c|  24 --
 src/bin/initdb/t/001_initdb.pl |  49 +++-
 src/bin/pg_dump/pg_dump.c  |  15 +++-
 src/bin/pg_upgrade/t/002_pg_upgrade.pl |  35 +++--
 src/bin/psql/describe.c|   4 +-
 src/bin/scripts/createdb.c |   2 +-
 src/bin/scripts/t/020_createdb.pl  |  56 ++
 src/include/catalog/pg_collation.dat   |   3 +-
 src/include/catalog/pg_collation.h |   3 +
 src/test/icu/t/010_database.pl |  18 ++---
 src/test/regress/expected/collate.out  |  25 +-
 src/test/regress/sql/collate.sql   |  10 +++
 22 files changed, 493 insertions(+), 84 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index ed84465996..b38cf82f83 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -342,22 +342,14 @@ initdb --locale=sv_SE
Locale Providers
 

-PostgreSQL supports multiple locale
-providers.  This specifies which library supplies the locale
-data.  One standard provider name is libc, which uses
-the locales provided by the operating system C library.  These are the
-locales used by most tools provided by the operating system.  Another
-provider is icu, which uses the external
-ICUICU library.  ICU locales can
-only be used if support for ICU was configured when PostgreSQL was built.
+A locale provider specifies which library defines the locale behavior for
+collations 

Re: BUG #18059: Unexpected error 25001 in stored procedure

2023-08-22 Thread Tom Lane
Robert Haas  writes:
> On Sat, Aug 19, 2023 at 1:19 PM Tom Lane  wrote:
>> What I'm inclined to propose, therefore, is that we make revalidation
>> be a no-op for every statement type for which transformStmt() reaches
>> its default: case.  (When it does so, the resulting CMD_UTILITY Query
>> will not get any processing from the rewriter or planner either.)

> That sounds like the right thing. It is perhaps unfortunate that we
> don't have a proper parse analysis/execution distinction for other
> types of statements, but if that ever changes then this can be
> revisited.

I started to code this, and immediately noticed that transformStmt()
already has a companion function analyze_requires_snapshot() that
returns "true" in the cases of interest ... except that it does
not return true for T_CallStmt.  Perhaps that was intentional to
begin with, but it is very hard to believe that it isn't a bug now,
since transformCallStmt can invoke nearly arbitrary processing via
transformExpr().  What semantic anomalies, if any, do we risk if CALL
processing forces a transaction start?  (I rather imagine it does
already, somewhere later on...)

Anyway, I'm now of two minds whether to use analyze_requires_snapshot()
as-is for plancache.c's invalidation test, or duplicate it under a
different name, or have two names but one is just an alias for the
other.  It still seems like "analyze requires snapshot" isn't
necessarily the exact inverse condition of "analyze is a no-op", but
it is today (assuming we agree that CALL needs a snapshot), and maybe
maintaining two duplicate functions is silly.  Thoughts?
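
For concreteness, the CALL part would amount to roughly this in
analyze_requires_snapshot() -- a sketch, with the existing cases
abbreviated:

bool
analyze_requires_snapshot(RawStmt *parseTree)
{
	switch (nodeTag(parseTree->stmt))
	{
		case T_InsertStmt:
		case T_DeleteStmt:
		case T_UpdateStmt:
		case T_SelectStmt:
		case T_CallStmt:	/* proposed: transformCallStmt() can reach
							 * transformExpr() */
			return true;
		default:
			return false;
	}
}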

regards, tom lane




Re: Prevent psql \watch from running queries that return no rows

2023-08-22 Thread Greg Sabino Mullane
Thank you for the feedback, everyone. Attached is version 4 of the patch,
featuring a few tests and minor rewordings.

Cheers,
Greg


psql_watch_exit_on_zero_rows_v4.patch
Description: Binary data


Re: subscription/015_stream sometimes breaks

2023-08-22 Thread Thomas Munro
On Wed, Aug 23, 2023 at 8:21 AM Thomas Munro  wrote:
> I didn't study it closely but it looks like there might be a second
> deadlock, after the one that is expected by the test?  Examples from
> the past couple of weeks:

I should add, it's not correlated with the patches that cfbot is
testing, and it's the most frequent failure for which that is the
case.

    suite     |    name    | distinct_patches | errors
--------------+------------+------------------+--------
 subscription | 015_stream |               47 |     61




Re: Direct I/O

2023-08-22 Thread Peter Geoghegan
On Wed, Apr 19, 2023 at 10:43 AM Andres Freund  wrote:
> I don't think the "evict on every buffer access" works quite that way - unless
> you have a completely even access pattern, buffer access frequency will
> increase usage count much more frequently on some buffers than others. And if
> you have a completely even access pattern, it's hard to come up with a good
> cache replacement algorithm...

My guess is that the most immediate problem in this area is the
problem of "correlated references" (to use the term from the LRU-K
paper). I gave an example of that here:

https://postgr.es/m/CAH2-Wzk7T9K3d9_NY+jEXp2qQGMYoP=gzmor8q1cv57sxaw...@mail.gmail.com

In other words, I think that the most immediate problem may in fact be
the tendency of usage_count to get incremented multiple times in
response to what is (for all intents and purposes) the same logical
page access. Even if it's not as important as I imagine, it still
seems likely that verifying that our input information isn't garbage
is the logical place to begin work in this general area. It's
difficult to be sure about that because it's so hard to look at just
one problem in isolation. I suspect that you were right to point out
that a larger shared buffers tends to look quite different to a
smaller shared buffers. That same factor is going to complicate any
analysis of the specific problem that I've highlighted (to say nothing
of the way that contention complicates the picture).
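
(For readers without bufmgr.c paged in, the mechanics in question are
roughly these -- a simplified model, not the actual code:)

#define BM_MAX_USAGE_COUNT	5	/* same cap as the real implementation */

typedef struct { int refcount; int usage_count; } Buffer;
static Buffer buffers[128];
static int clock_hand;

/* On every pin.  The "correlated references" problem: this can run
 * several times for what is logically one page access, inflating
 * usage_count. */
static void
pin_buffer(Buffer *buf)
{
	buf->refcount++;
	if (buf->usage_count < BM_MAX_USAGE_COUNT)
		buf->usage_count++;
}

/* Clock sweep: a buffer survives one extra lap per usage_count point. */
static Buffer *
get_victim(void)
{
	for (;;)
	{
		Buffer	   *buf = &buffers[clock_hand];

		clock_hand = (clock_hand + 1) % 128;
		if (buf->refcount == 0)
		{
			if (buf->usage_count > 0)
				buf->usage_count--;
			else
				return buf;		/* evict this one */
		}
	}
}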

There is an interesting paper that compared the hit rates seen for
TPC-C to TPC-E on relatively modern hardware:

https://www.cs.cmu.edu/~chensm/papers/TPCE-sigmod-record10.pdf

It concludes that the buffer misses for each workload look rather
similar, past a certain point (past a certain buffer pool size): both
workloads have cache misses that seem totally random. The access
patterns may be very different, but that doesn't necessarily have any
visible effect on buffer misses. At least provided that you make
certain modest assumptions about buffer pool size, relative to working
set size.

The most sophisticated cache management algorithms (like ARC) work by
maintaining metadata about recently evicted buffers, which is used to
decide whether to favor recency over frequency. If you work backwards
then it follows that having cache misses that look completely random
is desirable, and perhaps even something to work towards. What you
really don't want is a situation where the same small minority of
pages keep getting ping-ponged into and out of the buffer pool,
without ever settling, even though the buffer cache is large enough
that that's possible in principle. That pathological profile is the
furthest possible thing from random.

With smaller shared_buffers, it's perhaps inevitable that buffer cache
misses are random, and so I'd expect that managing the problem of
contention will tend to matter most. With larger shared_buffers it
isn't inevitable at all, so the quality of the cache eviction scheme
is likely to matter quite a bit more.

-- 
Peter Geoghegan




Re: PG 16 draft release notes ready

2023-08-22 Thread Jeff Davis
On Thu, 2023-05-18 at 16:49 -0400, Bruce Momjian wrote:
> I have completed the first draft of the PG 16 release notes.  You can
> see the output here:


https://www.postgresql.org/docs/16/release-16.html#RELEASE-16-LOCALIZATION

I notice that this item is still listed:

 * Determine the ICU default locale from the environment (Jeff Davis)

But that was reverted as part of 2535c74b1a.

Regards,
Jeff Davis





subscription/015_stream sometimes breaks

2023-08-22 Thread Thomas Munro
Hi,

A couple of times a day, cfbot reports an error like this:

https://cirrus-ci.com/task/6424286882168832

I didn't study it closely but it looks like there might be a second
deadlock, after the one that is expected by the test?  Examples from
the past couple of weeks:

cfbot=> select test.task_id, task.task_name, task.created as time from
test join task using (task_id) where suite = 'subscription' and name =
'015_stream' and result = 'ERROR' and task.created > now() - interval
'14 days' order by task.created desc;
     task_id      |                   task_name
------------------+------------------------------------------------
 6600867550330880 | Windows - Server 2019, VS 2019 - Meson & ninja
 6222795470798848 | Windows - Server 2019, VS 2019 - Meson & ninja
 6162088322662400 | macOS - Ventura - Meson
 5014781862608896 | Windows - Server 2019, VS 2019 - Meson & ninja
 6424286882168832 | Windows - Server 2019, VS 2019 - Meson & ninja
 6683705222103040 | Windows - Server 2019, VS 2019 - Meson & ninja
 5881665076068352 | Windows - Server 2019, VS 2019 - Meson & ninja
 5865929054093312 | Windows - Server 2019, VS 2019 - Meson & ninja
 5855144995192832 | Windows - Server 2019, VS 2019 - Meson & ninja
 6071567994585088 | Windows - Server 2019, VS 2019 - Meson & ninja
 5311312343859200 | Windows - Server 2019, VS 2019 - Meson & ninja
 4986100071006208 | FreeBSD - 13 - Meson
 630230138880 | Windows - Server 2019, VS 2019 - Meson & ninja
 4554627119579136 | Windows - Server 2019, VS 2019 - Meson & ninja
 6106090807492608 | Windows - Server 2019, VS 2019 - Meson & ninja
 5190113534148608 | Windows - Server 2019, VS 2019 - Meson & ninja
 6452324697112576 | Windows - Server 2019, VS 2019 - Meson & ninja
 4610228927332352 | Windows - Server 2019, VS 2019 - Meson & ninja
 4928567608344576 | Windows - Server 2019, VS 2019 - Meson & ninja
 5705451157848064 | FreeBSD - 13 - Meson
 5952066133164032 | Windows - Server 2019, VS 2019 - Meson & ninja
 5341101565935616 | Windows - Server 2019, VS 2019 - Meson & ninja
 6751165837213696 | Windows - Server 2019, VS 2019 - Meson & ninja
 4624168109473792 | Windows - Server 2019, VS 2019 - Meson & ninja
 6730863963013120 | Windows - Server 2019, VS 2019 - Meson & ninja
 6174140269330432 | Windows - Server 2019, VS 2019 - Meson & ninja
 4637318561136640 | Windows - Server 2019, VS 2019 - Meson & ninja
 4535300303618048 | Windows - Server 2019, VS 2019 - Meson & ninja
 5672693542944768 | FreeBSD - 13 - Meson
 6087381225308160 | Windows - Server 2019, VS 2019 - Meson & ninja
 6098413217906688 | Windows - Server 2019, VS 2019 - Meson & ninja
 6130601380544512 | Windows - Server 2019, VS 2019 - Meson & ninja
 5546054284738560 | Windows - Server 2019, VS 2019 - Meson & ninja
 6674258676416512 | Windows - Server 2019, VS 2019 - Meson & ninja
 4571976740634624 | FreeBSD - 13 - Meson
 6566515328155648 | Windows - Server 2019, VS 2019 - Meson & ninja
 6576879084240896 | Windows - Server 2019, VS 2019 - Meson & ninja
 5295804827566080 | Windows - Server 2019, VS 2019 - Meson & ninja
 6426387188285440 | Windows - Server 2019, VS 2019 - Meson & ninja
 4763275859066880 | Windows - Server 2019, VS 2019 - Meson & ninja
 6137227240013824 | Windows - Server 2019, VS 2019 - Meson & ninja
 5185063273365504 | Windows - Server 2019, VS 2019 - Meson & ninja
 6542656449282048 | Windows - Server 2019, VS 2019 - Meson & ninja
 4874919171850240 | Windows - Server 2019, VS 2019 - Meson & ninja
 6531290556530688 | Windows - Server 2019, VS 2019 - Meson & ninja
 5377848232378368 | FreeBSD - 13 - Meson
 6436049925177344 | FreeBSD - 13 - Meson
 6057679748071424 | Windows - Server 2019, VS 2019 - Meson & ninja
 5534694867992576 | Windows - Server 2019, VS 2019 - Meson & ninja
 4831311429369856 | Windows - Server 2019, VS 2019 - Meson & ninja
 4704271531245568 | macOS - Ventura - Meson
 5297047549509632 | Windows - Server 2019, VS 2019 - Meson & ninja
 5836487120388096 | macOS - Ventura - Meson
 6527459915464704 | FreeBSD - 13 - Meson
 4985483743199232 | FreeBSD - 13 - Meson
 4583651082502144 | Linux - Debian Bullseye - Meson
 5498444756811776 | FreeBSD - 13 - Meson
 5146601035923456 | Windows - Server 2019, VS 2019 - Meson & ninja
 5709550989344768 | macOS - Ventura - Meson
 6357936616767488 | FreeBSD - 13 - Meson
(60 rows)




Re: list of acknowledgments for PG16

2023-08-22 Thread Bruce Momjian
On Tue, Aug 22, 2023 at 10:03:29AM -0400, Joe Conway wrote:
> On 8/22/23 09:44, Tom Lane wrote:
> > Alvaro Herrera  writes:
> > > On 2023-Aug-22, Peter Eisentraut wrote:
> > > > The list of acknowledgments for the PG16 release notes has been committed.
> > > > It should show up here sometime: .
> > 
> > > Hmm, I think these docs would only regenerate during the RC1 release, so
> > > it'll be a couple of weeks, unless we manually poke the doc builder.
> > 
> > Yeah.  I could produce a new set of tarballs from the v16 branch tip,
> > but I don't know the process (nor have the admin permissions) to
> > extract the HTML docs and put them on the website.
> 
> 
> These days the docs update is part of a scripted process for doing an entire
> release.
> 
> I'm sure we could figure out how to just release the updated docs, but with
> RC1 a week away, is it really worthwhile?

You can see the list in my automated build:


https://momjian.us/pgsql_docs/release-16.html#RELEASE-16-ACKNOWLEDGEMENTS

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: PG 16 draft release notes ready

2023-08-22 Thread Bruce Momjian
On Tue, Aug 22, 2023 at 10:02:16AM +0300, Pavel Luzanov wrote:
> On 22.08.2023 00:58, Bruce Momjian wrote:
> > Attached is an applied patch that moves the inherit item into
> > incompatibilities. clarifies it, and splits out the ADMIN syntax item.
> 
> > The role's default inheritance behavior can be overridden with the new
> GRANT ... WITH INHERIT clause.
> 
> The only question is about "can be". Why not "will be"? The inheritance
> behavior will be changed anyway.

I used "can be" to highlight you "can" override it, but don't need to.

> > Please let me know if I need any other changes.  Thanks.
> 
> * Allow psql's access privilege commands to show system objects (Nathan
> Bossart, Pavel Luzanov)
>     The options are \dpS, \zS, and \drg.
> 
> I think that this description is correct only for the \dpS and \zS commands.
> (By the way, unfortunately after reverting the MAINTAIN privilege these commands
> are not very useful in v16.)
> 
> But the \drg command is a different thing. This is a full featured
> replacement for "Member of" column of the \du, \dg commands.
> It shows not only members, but granted options (admin, inherit, set) and
> grantor.
> This is important information for membership usage and administration.
> IMO, removing the "Member of" column from the \du & \dg commands also
> requires attention in release notes.
> 
> So, I suggest new item in the psql section for \drg. Something like this:
> 
> * Add psql \drg command to display role grants and remove the "Member of"
> column from \du & \dg altogether.

I see your point.  Attached is an applied patch which fixes this by
splitting \drg into a separate item and adding text.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-16.sgml b/doc/src/sgml/release-16.sgml
index 1f98ccdc62..c464be5ee1 100644
--- a/doc/src/sgml/release-16.sgml
+++ b/doc/src/sgml/release-16.sgml
@@ -2176,11 +2176,28 @@ Author: Tom Lane 
 
 
 
-Allow psql's access privilege commands to show system objects (Nathan Bossart, Pavel Luzanov)
+Add psql command \drg to show role membership details (Pavel Luzanov)
 
 
 
-The options are \dpS, \zS, and \drg.
+The Member of output column has been removed from \du and \dg because this new command displays this information in more detail.
+
+
+
+
+
+
+
+Allow psql's access privilege commands to show system objects (Nathan Bossart)
+
+
+
+The options are \dpS and \zS.
 
 
 


Re: Convert encrypted SSL test keys to PKCS#8 format

2023-08-22 Thread Jacob Champion
On Tue, Aug 22, 2023 at 1:07 AM Peter Eisentraut  wrote:
> I have attached two patches, one to update the generation rules, and one
> where I have converted the existing test files.  (I didn't generate them
> from scratch, so for example
> src/test/modules/ssl_passphrase_callback/server.crt that corresponds to
> one of the keys does not need to be updated.)

Looks good from here. I don't have a FIPS setup right now, but the new
files pass tests on OpenSSL 1.0.2u, 1.1.1v, 3.0.2-0ubuntu1.10, and
LibreSSL 3.8. Tests continue to pass after a full clean and rebuild of
the sslfiles.

> It's also interesting that if you generate all private keys from scratch
> using the existing rules on a new OpenSSL version (3+), they will be
> generated in PKCS#8 format by default.  In those OpenSSL versions, the
> openssl-rsa command has a -traditional option to get the old format, but
> of course old OpenSSL versions don't have that.  As OpenSSL 3 gets more
> widespread, we might need to rethink these rules anyway to make sure we
> get consistent behavior.

Yeah. Looks like OpenSSL 3 also adds new v3 extensions to the
certificates... For now they look benign, but I assume someone's going
to run into weirdness at some point.
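
For anyone reproducing this, the format difference described above looks
roughly like this on an OpenSSL 3 install (a sketch; the file names are
made up):

# OpenSSL 3 emits PKCS#8 by default:
openssl genrsa -out server.key 2048
head -1 server.key              # -----BEGIN PRIVATE KEY-----

# -traditional (new in 3.0) restores the old PKCS#1 form:
openssl rsa -traditional -in server.key -out server-trad.key
head -1 server-trad.key         # -----BEGIN RSA PRIVATE KEY-----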

Thanks!
--Jacob




Re: Make all Perl warnings fatal

2023-08-22 Thread Andrew Dunstan


On 2023-08-22 Tu 09:20, Alvaro Herrera wrote:

On 2023-Aug-10, Peter Eisentraut wrote:


I wanted to figure out if we can catch these more reliably, in the style of
-Werror.  AFAICT, there is no way to automatically turn all warnings into
fatal errors.  But there is a way to do it per script, by replacing

 use warnings;

by

 use warnings FATAL => 'all';

See attached patch to try it out.

BTW in case we do find that there's some unforeseen problem and we want
to roll back, it would be great to have a way to disable this without
having to edit every single Perl file again later.  However, I didn't
find a way to do it -- I thought about creating a separate PgWarnings.pm
file that would do the "use warnings FATAL => 'all'" dance and which
every other Perl file would use or include; but couldn't make it work.
Maybe some Perl expert knows a good answer to this.

Maybe the BEGIN block of each file can `eval` a new PgWarnings.pm that
emits the "use warnings" line?



Once we try it, I doubt we would want to revoke it globally, and if we 
did I'd rather not be left with a wart like this. As I mentioned 
upthread, it is possible to override the setting locally. The manual 
page for the warnings pragma contains details.
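
For the archives, the local override looks roughly like this (a sketch
based on the warnings manual page; the category is just an example):

use warnings FATAL => 'all';

{
	# demote one category back to an ordinary warning in this scope only
	use warnings NONFATAL => 'uninitialized';
	my $x;
	print "x is $x\n";    # still warns, but no longer dies
}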



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: pgsql: Allow tailoring of ICU locales with custom rules

2023-08-22 Thread Jeff Davis
On Mon, 2023-08-14 at 10:34 +0200, Peter Eisentraut wrote:
> I have investigated this.  My assessment is that how PostgreSQL 
> interfaces with ICU is correct.  Whether what ICU does is correct
> might 
> be debatable.  I have filed a bug with ICU about this: 
> https://unicode-org.atlassian.net/browse/ICU-22456 , but there is no 
> response yet.

Is everything other than the language and region simply discarded when
a rules string is present, or are some attributes preserved, or is
there some other nuance?

> You can work around this by including the desired attributes in the 
> rules string, for example
> 
>  create collation c3 (provider=icu,
>    locale='und-u-ka-shifted-ks-level1',
>    rules='[alternate shifted][strength 1]',
>    deterministic=false);
> 
> So I don't think there is anything we need to do here for PostgreSQL
> 16.

Is there some way we can warn a user that some attributes will be
discarded, or improve the documentation? Letting the user figure this
out for themselves doesn't seem right.

Are you sure we want to allow rules for the database default collation
in 16, or should we start with just allowing them in CREATE COLLATION
and then expand to the database default collation later? I'm still a
bit concerned about users getting too fancy with daticurules, and
ending up not being able to connect to their database anymore.

Regards,
Jeff Davis





Re: should frontend tools use syncfs() ?

2023-08-22 Thread Nathan Bossart
On Tue, Aug 22, 2023 at 12:53:53PM +0900, Michael Paquier wrote:
> On Mon, Aug 21, 2023 at 07:06:32PM -0700, Nathan Bossart wrote:
>> This would look something like the attached patch.  I think this is nicer.
>> With this patch, we don't have to choose between including fd.h or
>> redefining the macros in the frontend code.
> 
> Yes, this one is moving the needle in the good direction.  +1.

Great.  Here is a new patch set that includes this change as well as the
suggested documentation updates.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 25357aefb8956e12b4555881346c5b4a73c1f4e5 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 21 Aug 2023 19:05:14 -0700
Subject: [PATCH v5 1/2] move PG_TEMP_FILE* macros to file_utils.h

---
 src/backend/backup/basebackup.c |  2 +-
 src/backend/postmaster/postmaster.c |  1 +
 src/backend/storage/file/fileset.c  |  1 +
 src/bin/pg_checksums/pg_checksums.c | 10 --
 src/bin/pg_rewind/filemap.c |  2 +-
 src/include/common/file_utils.h |  4 
 src/include/storage/fd.h|  4 
 7 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 45be21131c..5d66014499 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -25,6 +25,7 @@
 #include "commands/defrem.h"
 #include "common/compression.h"
 #include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "nodes/pg_list.h"
@@ -37,7 +38,6 @@
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/dsm_impl.h"
-#include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/reinit.h"
 #include "utils/builtins.h"
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9c8ec779f9..d375dcb795 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -94,6 +94,7 @@
 #include "access/xlogrecovery.h"
 #include "catalog/pg_control.h"
 #include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "common/ip.h"
 #include "common/pg_prng.h"
 #include "common/string.h"
diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c
index e9951b0269..84cae32548 100644
--- a/src/backend/storage/file/fileset.c
+++ b/src/backend/storage/file/fileset.c
@@ -25,6 +25,7 @@
 
 #include "catalog/pg_tablespace.h"
 #include "commands/tablespace.h"
+#include "common/file_utils.h"
 #include "common/hashfn.h"
 #include "miscadmin.h"
 #include "storage/ipc.h"
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 19eb67e485..9011a19b4e 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -52,16 +52,6 @@ typedef enum
 	PG_MODE_ENABLE
 } PgChecksumMode;
 
-/*
- * Filename components.
- *
- * XXX: fd.h is not declared here as frontend side code is not able to
- * interact with the backend-side definitions for the various fsync
- * wrappers.
- */
-#define PG_TEMP_FILES_DIR "pgsql_tmp"
-#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
-
 static PgChecksumMode mode = PG_MODE_CHECK;
 
 static const char *progname;
diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index bd5c598e20..58280d9abc 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -27,12 +27,12 @@
 #include 
 
 #include "catalog/pg_tablespace_d.h"
+#include "common/file_utils.h"
 #include "common/hashfn.h"
 #include "common/string.h"
 #include "datapagemap.h"
 #include "filemap.h"
 #include "pg_rewind.h"
-#include "storage/fd.h"
 
 /*
  * Define a hash table which we can use to store information about the files
diff --git a/src/include/common/file_utils.h b/src/include/common/file_utils.h
index b7efa1226d..dd1532bcb0 100644
--- a/src/include/common/file_utils.h
+++ b/src/include/common/file_utils.h
@@ -46,4 +46,8 @@ extern ssize_t pg_pwritev_with_retry(int fd,
 
 extern ssize_t pg_pwrite_zeros(int fd, size_t size, off_t offset);
 
+/* Filename components */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
 #endif			/* FILE_UTILS_H */
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index 6791a406fc..aea30c0622 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -195,8 +195,4 @@ extern int	durable_unlink(const char *fname, int elevel);
 extern void SyncDataDirectory(void);
 extern int	data_sync_elevel(int elevel);
 
-/* Filename components */
-#define PG_TEMP_FILES_DIR "pgsql_tmp"
-#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
-
 #endif			/* FD_H */
-- 
2.25.1

>From 4cc8395720a3af1916f3baada04826177783ec80 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 28 Jul 2023 15:56:24 -0700
Subject: [PATCH v5 2/2] allow syncfs in frontend utilities

---
 doc/src/sgml/config.sgml  | 12 +---
 doc/src/sgml/filelist.sgml|  1 +
 

Re: DecodeInterval fixes

2023-08-22 Thread Jacob Champion
On Mon, Aug 21, 2023 at 10:39 PM Michael Paquier  wrote:
> 0002 and 0003 make this stuff fail, but isn't there a risk that this
> breaks applications that relied on these accidental behaviors?
> Assuming that this is OK makes me nervous.

I wouldn't argue for backpatching, for sure, but I guess I saw this as
falling into the same vein as 5b3c5953 and bcc704b52 which were
already committed.

--Jacob




Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-08-22 Thread Ashutosh Bapat
Hi Tom,

On Tue, Aug 22, 2023 at 2:18 AM Tom Lane  wrote:

>
> (BTW, I did look at Ashutosh's idea of merging the
> reparameterize_path_by_child and path_is_reparameterizable_by_child
> functions, but I didn't think that would be an improvement,
> because we'd have to clutter reparameterize_path_by_child with
> a lot of should-I-skip-this-step tests.  Some of that could be
> hidden in the macros, but a lot would not be.  Another issue
> is that I do not think we can change reparameterize_path_by_child's
> API contract in the back branches, because we advertise it as
> something that FDWs and custom scan providers can use.)

Here are two patches
0001 same as your v5 patch
0002 implements what I had in mind - combining the assessment and
reparameterization in a single function.

I don't know whether you had something similar to 0002 when you
considered that approach. Hence implemented it quickly. The names of
the functions and arguments can be changed. But I think this version
will help us keep assessment and actual reparameterization in sync,
very tightly. Most of the conditionals have been pushed into macros,
except for two which need to be outside the macros and actually look
better to me. Let me know what you think of this.

FWIW I ran make check and all the tests pass.

I agree that we cannot consider this for back branches, but we are
thinking of disabling reparameterization for tablesample scans there
anyway.

--
Best Wishes,
Ashutosh Bapat
From 15735df5693a962b51e0a8e384154db802c7870d Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Tue, 22 Aug 2023 17:07:37 +0530
Subject: [PATCH 1/2] Delay reparameterization of paths till create_plan()

v5 patch from Tom Lane as is
---
 src/backend/optimizer/path/joinpath.c|  67 +++
 src/backend/optimizer/plan/createplan.c  |  17 ++
 src/backend/optimizer/util/pathnode.c| 195 ++-
 src/include/optimizer/pathnode.h |   2 +
 src/test/regress/expected/partition_join.out |  60 ++
 src/test/regress/sql/partition_join.sql  |  12 ++
 6 files changed, 310 insertions(+), 43 deletions(-)

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 821d282497..e2ecf5b14b 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -30,8 +30,9 @@
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
 
 /*
- * Paths parameterized by the parent can be considered to be parameterized by
- * any of its child.
+ * Paths parameterized by a parent rel can be considered to be parameterized
+ * by any of its children, when we are performing partitionwise joins.  These
+ * macros simplify checking for such cases.  Beware multiple eval of args.
  */
 #define PATH_PARAM_BY_PARENT(path, rel)	\
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path),	\
@@ -762,6 +763,20 @@ try_nestloop_path(PlannerInfo *root,
 	/* If we got past that, we shouldn't have any unsafe outer-join refs */
 	Assert(!have_unsafe_outer_join_ref(root, outerrelids, inner_paramrels));
 
+	/*
+	 * If the inner path is parameterized, it is parameterized by the topmost
+	 * parent of the outer rel, not the outer rel itself.  We will need to
+	 * translate the parameterization, if this path is chosen, during
+	 * create_plan().  Here we just check whether we will be able to perform
+	 * the translation, and if not avoid creating a nestloop path.
+	 */
+	if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent) &&
+		!path_is_reparameterizable_by_child(inner_path, outer_path->parent))
+	{
+		bms_free(required_outer);
+		return;
+	}
+
 	/*
 	 * Do a precheck to quickly eliminate obviously-inferior paths.  We
 	 * calculate a cheap lower bound on the path's cost and then use
@@ -778,27 +793,6 @@ try_nestloop_path(PlannerInfo *root,
 		  workspace.startup_cost, workspace.total_cost,
 		  pathkeys, required_outer))
 	{
-		/*
-		 * If the inner path is parameterized, it is parameterized by the
-		 * topmost parent of the outer rel, not the outer rel itself.  Fix
-		 * that.
-		 */
-		if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent))
-		{
-			inner_path = reparameterize_path_by_child(root, inner_path,
-	  outer_path->parent);
-
-			/*
-			 * If we could not translate the path, we can't create nest loop
-			 * path.
-			 */
-			if (!inner_path)
-			{
-bms_free(required_outer);
-return;
-			}
-		}
-
 		add_path(joinrel, (Path *)
  create_nestloop_path(root,
 	  joinrel,
@@ -861,6 +855,17 @@ try_partial_nestloop_path(PlannerInfo *root,
 			return;
 	}
 
+	/*
+	 * If the inner path is parameterized, it is parameterized by the topmost
+	 * parent of the outer rel, not the outer rel itself.  We will need to
+	 * translate the parameterization, if this path is chosen, during
+	 * create_plan().  Here we just check whether we will be able to perform
+	 * the translation, and if not avoid creating a nestloop path.
+	 */
+	if 

Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-08-22 Thread Tom Lane
I wrote:
> ... I think the risk/reward ratio for messing with this in the
> back branches is unattractive in any case: to fix a corner case that
> apparently nobody uses in the field, we risk breaking any number of
> mainstream parameterized-path cases.  I'm content to commit the v5 patch
> (or a successor) into HEAD, but at this point I'm not sure I even want
> to risk it in v16, let alone perform delicate surgery to get it to work
> in older branches.  I think we ought to go with the "tablesample scans
> can't be reparameterized" approach in v16 and before.

Concretely, about like this for v16, and similarly in older branches.

regards, tom lane

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 5f5596841c..c4ec6ed5e6 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -4087,6 +4087,28 @@ do { \
 	switch (nodeTag(path))
 	{
 		case T_Path:
+
+			/*
+			 * If it's a SampleScan with tablesample parameters referencing
+			 * the other relation, we can't reparameterize, because we must
+			 * not change the RTE's contents here.  (Doing so would break
+			 * things if we end up using a non-partitionwise join.)
+			 */
+			if (path->pathtype == T_SampleScan)
+			{
+Index		scan_relid = path->parent->relid;
+RangeTblEntry *rte;
+
+/* it should be a base rel with a tablesample clause... */
+Assert(scan_relid > 0);
+rte = planner_rt_fetch(scan_relid, root);
+Assert(rte->rtekind == RTE_RELATION);
+Assert(rte->tablesample != NULL);
+
+if (bms_overlap(pull_varnos(root, (Node *) rte->tablesample),
+child_rel->top_parent_relids))
+	return NULL;
+			}
 			FLAT_COPY_PATH(new_path, path, Path);
 			break;
 
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 6560fe2416..a69a8a70f3 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -505,6 +505,32 @@ SELECT t1.a, ss.t2a, ss.t2c FROM prt1 t1 LEFT JOIN LATERAL
  550 | | 
 (12 rows)
 
+-- lateral reference in sample scan
+SET max_parallel_workers_per_gather = 0;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1 t1 JOIN LATERAL
+			  (SELECT * FROM prt1 t2 TABLESAMPLE SYSTEM (t1.a) REPEATABLE(t1.b)) s
+			  ON t1.a = s.a;
+   QUERY PLAN
+-
+ Nested Loop
+   ->  Append
+ ->  Seq Scan on prt1_p1 t1_1
+ ->  Seq Scan on prt1_p2 t1_2
+ ->  Seq Scan on prt1_p3 t1_3
+   ->  Append
+ ->  Sample Scan on prt1_p1 t2_1
+   Sampling: system (t1.a) REPEATABLE (t1.b)
+   Filter: (t1.a = a)
+ ->  Sample Scan on prt1_p2 t2_2
+   Sampling: system (t1.a) REPEATABLE (t1.b)
+   Filter: (t1.a = a)
+ ->  Sample Scan on prt1_p3 t2_3
+   Sampling: system (t1.a) REPEATABLE (t1.b)
+   Filter: (t1.a = a)
+(15 rows)
+
+RESET max_parallel_workers_per_gather;
 -- bug with inadequate sort key representation
 SET enable_partitionwise_aggregate TO true;
 SET enable_hashjoin TO false;
@@ -1944,6 +1970,40 @@ SELECT * FROM prt1_l t1 LEFT JOIN LATERAL
  550 | 0 | 0002 | |  | | |  
 (12 rows)
 
+-- partitionwise join with lateral reference in sample scan
+SET max_parallel_workers_per_gather = 0;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_l t1 JOIN LATERAL
+			  (SELECT * FROM prt1_l t2 TABLESAMPLE SYSTEM (t1.a) REPEATABLE(t1.b)) s ON
+			  t1.a = s.a AND t1.b = s.b AND t1.c = s.c;
+QUERY PLAN
+--
+ Nested Loop
+   ->  Append
+ ->  Seq Scan on prt1_l_p1 t1_1
+ ->  Seq Scan on prt1_l_p2_p1 t1_2
+ ->  Seq Scan on prt1_l_p2_p2 t1_3
+ ->  Seq Scan on prt1_l_p3_p1 t1_4
+ ->  Seq Scan on prt1_l_p3_p2 t1_5
+   ->  Append
+ ->  Sample Scan on prt1_l_p1 t2_1
+   Sampling: system (t1.a) REPEATABLE (t1.b)
+   Filter: ((t1.a = a) AND (t1.b = b) AND ((t1.c)::text = (c)::text))
+ ->  Sample Scan on prt1_l_p2_p1 t2_2
+   Sampling: system (t1.a) REPEATABLE (t1.b)
+   Filter: ((t1.a = a) AND (t1.b = b) AND ((t1.c)::text = (c)::text))
+ ->  Sample Scan on prt1_l_p2_p2 t2_3
+   Sampling: system (t1.a) REPEATABLE (t1.b)
+   Filter: ((t1.a = a) AND (t1.b = b) AND ((t1.c)::text = (c)::text))
+ ->  Sample Scan on prt1_l_p3_p1 t2_4
+   Sampling: system (t1.a) REPEATABLE (t1.b)
+   Filter: ((t1.a = a) AND (t1.b = b) AND ((t1.c)::text = (c)::text))
+ ->  Sample Scan on prt1_l_p3_p2 t2_5
+   Sampling: system (t1.a) REPEATABLE (t1.b)
+   Filter: ((t1.a = a) AND (t1.b = b) AND ((t1.c)::text = 

Re: [PATCH] Add function to_oct

2023-08-22 Thread Peter Eisentraut

On 22.08.23 16:26, Nathan Bossart wrote:

I don't understand the reason for this handling of negative values.  I would
expect that, say, to_hex(-1234) would return '-' || to_hex(1234).

For this patch set, I was trying to retain the current behavior, which is
to return the two's complement representation.  I'm open to changing this
if there's agreement on what the output should be.


Ah, if there is existing behavior then we should probably keep it.




Re: persist logical slots to disk during shutdown checkpoint

2023-08-22 Thread Ashutosh Bapat
On Tue, Aug 22, 2023 at 3:42 PM Amit Kapila  wrote:
> >
> > Once we have last_persisted_confirm_flush_lsn, (1) is just an
> > optimization on top of that. With that we take the opportunity to
> > persist confirmed_flush_lsn which is much farther than the current
> > persisted value and thus improving chances of updating restart_lsn and
> > catalog_xmin faster after a WAL sender restart. We need to keep that
> > in mind when implementing (2). The problem is if we don't implement
> > (1) right now, we might just forget to do that small incremental
> > change in future. My preference is 1. Do both (1) and (2) together 2.
> > Do (2) first and then (1) as a separate commit. 3. Just implement (2)
> > if we don't have time at all for first two options.
> >
>
> I prefer one of (2) or (3). Anyway, it is better to do that
> optimization (persist confirm_flush_lsn at a regular interval) as a
> separate patch as we need to test and prove its value separately.

Fine with me.


-- 
Best Wishes,
Ashutosh Bapat




Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-08-22 Thread Tom Lane
Richard Guo  writes:
> On Tue, Aug 22, 2023 at 4:48 AM Tom Lane  wrote:
>> So that led me to the attached v5, which seemed committable to me so I
>> set about back-patching it ... and it fell over immediately in v15, as
>> shown in the attached regression diffs from v15.  It looks to me like
>> we are now failing to recognize that reparameterized quals are
>> redundant with not-reparameterized ones, so this patch is somehow
>> dependent on restructuring that happened during the v16 cycle.
>> I don't have time to dig deeper than that, and I'm not sure that that
>> is an area we'd want to mess with in a back-patched bug fix.

> I looked at this and I think the culprit is that in create_nestloop_path
> we are failing to recognize those restrict_clauses that have been moved
> into the inner path.  In v16, we have the new serial number stuff to
> help detect such clauses and that works very well.  In v15, we are still
> using join_clause_is_movable_into() for that purpose.  It does not work
> well with the patch because now in create_nestloop_path the inner path
> is still parameterized by top-level parents, while the restrict_clauses
> already have been adjusted to refer to the child rels.  So the check
> performed by join_clause_is_movable_into() would always fail.

Ah.

> I'm wondering if we can instead adjust the 'inner_req_outer' in
> create_nestloop_path before we perform the check to work around this
> issue for the back branches, something like
> +   /*
> +* Adjust the parameterization information, which refers to the topmost
> +* parent.
> +*/
> +   inner_req_outer =
> +   adjust_child_relids_multilevel(root, inner_req_outer,
> +  outer_path->parent->relids,
> +  outer_path->parent->top_parent_relids);

Mmm ... at the very least you'd need to not do that when top_parent_relids
is unset.  But I think the risk/reward ratio for messing with this in the
back branches is unattractive in any case: to fix a corner case that
apparently nobody uses in the field, we risk breaking any number of
mainstream parameterized-path cases.  I'm content to commit the v5 patch
(or a successor) into HEAD, but at this point I'm not sure I even want
to risk it in v16, let alone perform delicate surgery to get it to work
in older branches.  I think we ought to go with the "tablesample scans
can't be reparameterized" approach in v16 and before.

regards, tom lane




Re: Make error messages about WAL segment size more consistent

2023-08-22 Thread Aleksander Alekseev
Hi Peter,

> This started out as a small patch to make pg_controldata use the logging
> API instead of printf statements, and then it became a larger patch to
> adjust error and warning messages about invalid WAL segment sizes
> (IsValidWalSegSize()) across the board.

Thanks for working on this.

> I went through and made the
> primary messages more compact and made the detail messages uniform.  In
> initdb.c and pg_resetwal.c, I use the newish option_parse_int() to
> simplify some of the option parsing.  For the backend GUC
> wal_segment_size, I added a GUC check hook to do the verification
> instead of coding it in bootstrap.c.  This might be overkill, but that
> way the check is in the right place and it becomes more self-documenting.

I reviewed the code and tested it on Linux and MacOS with Autotools
and Meson. The patch LGTM.

--
Best regards,
Aleksander Alekseev




Re: [PATCH] Add function to_oct

2023-08-22 Thread Nathan Bossart
On Tue, Aug 22, 2023 at 04:20:02PM +0200, Peter Eisentraut wrote:
> On 20.08.23 17:25, Nathan Bossart wrote:
>> > Doing a quick test, shows that this changes the current behaviour,
>> > because all inputs are now treated as 64-bit:
>> > 
>> > HEAD:
>> > 
>> > select to_hex((-1234)::int);
>> >    to_hex
>> > ----------
>> >  fffffb2e
>> > 
>> > With patch:
>> > 
>> > select to_hex((-1234)::int);
>> >        to_hex
>> > ------------------
>> >  fffffffffffffb2e
>> Good catch.  In v8, I fixed this by first casting the input to uint32 for
>> the 32-bit versions of the functions.  This prevents the conversion to
>> uint64 from setting the rest of the bits.  AFAICT this behavior is pretty
>> well defined in the standard.
> 
> What standard?

C99

> I don't understand the reason for this handling of negative values.  I would
> expect that, say, to_hex(-1234) would return '-' || to_hex(1234).

For this patch set, I was trying to retain the current behavior, which is
to return the two's complement representation.  I'm open to changing this
if there's agreement on what the output should be.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: list of acknowledgments for PG16

2023-08-22 Thread Peter Eisentraut

On 22.08.23 15:29, Tom Lane wrote:

Alvaro Herrera  writes:

Yeah, I've been proposing this kind of thing for many years; the
problem, until not long ago, was that the tooling was unable to process
non-Latin1 characters in all the output formats that we use.  But
tooling has changed and the oldest platforms have disappeared, so maybe
it works now; do you want to inject some Chinese, Cyrillic, Japanese
names and give it a spin?  At least HTML and PDF need to work correctly.


I'm pretty sure the PDF toolchain still fails on non-Latin1 characters.
At least it does the way I have it installed; maybe adding some
non-default dependencies would help?


See here: 
https://www.postgresql.org/message-id/f58a0973-6e06-65de-8fb8-b3b93518b...@2ndquadrant.com






Re: [PATCH] Add function to_oct

2023-08-22 Thread Peter Eisentraut

On 20.08.23 17:25, Nathan Bossart wrote:

Doing a quick test, shows that this changes the current behaviour,
because all inputs are now treated as 64-bit:

HEAD:

select to_hex((-1234)::int);
   to_hex
----------
 fffffb2e

With patch:

select to_hex((-1234)::int);
       to_hex
------------------
 fffffffffffffb2e

Good catch.  In v8, I fixed this by first casting the input to uint32 for
the 32-bit versions of the functions.  This prevents the conversion to
uint64 from setting the rest of the bits.  AFAICT this behavior is pretty
well defined in the standard.


What standard?

I don't understand the reason for this handling of negative values.  I 
would expect that, say, to_hex(-1234) would return '-' || to_hex(1234).
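
That is, hypothetically (not the current behavior):

select to_hex(-1234);   -- would return '-4d2' instead of 'fffffb2e'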






Re: add timing information to pg_upgrade

2023-08-22 Thread Nathan Bossart
On Tue, Aug 22, 2023 at 11:49:33AM +0200, Peter Eisentraut wrote:
> Let's change MESSAGE_WIDTH to 62 in v16, and then pursue the larger
> restructuring leisurely.

Sounds good.  I plan to commit this within the next couple of days.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: list of acknowledgments for PG16

2023-08-22 Thread Joe Conway

On 8/22/23 09:44, Tom Lane wrote:

Alvaro Herrera  writes:

On 2023-Aug-22, Peter Eisentraut wrote:

The list of acknowledgments for the PG16 release notes has been committed.
It should show up here sometime: .



Hmm, I think these docs would only regenerate during the RC1 release, so
it'll be a couple of weeks, unless we manually poke the doc builder.


Yeah.  I could produce a new set of tarballs from the v16 branch tip,
but I don't know the process (nor have the admin permissions) to
extract the HTML docs and put them on the website.



These days the docs update is part of a scripted process for doing an 
entire release.


I'm sure we could figure out how to just release the updated docs, but 
with RC1 a week away, is it really worthwhile?


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Make --help output fit within 80 columns per line

2023-08-22 Thread torikoshia

On 2023-08-21 13:08, Masahiro Ikeda wrote:
Thanks for your review!


(1)

Why don't you add test for the purpose? It could be overkill...
I though the following function is the best place.

diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm
b/src/test/perl/PostgreSQL/Test/Utils.pm
index 617caa022f..1bdb81ac56 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -843,6 +843,10 @@ sub program_help_ok
ok($result, "$cmd --help exit code 0");
isnt($stdout, '', "$cmd --help goes to stdout");
is($stderr, '', "$cmd --help nothing to stderr");
+   foreach my $line (split /\n/, $stdout)
+   {
+   ok(length($line) <= 80, "$cmd --help output fit within 80 columns per line");
+   }
return;
 }


Agreed.


(2)

Is there any reason that only src/bin commands are targeted? I found that
we also need to fix vacuumlo with the above test. I think it's better to
fix it because it's a contrib module.

$ vacuumlo --help | while IFS='' read line; do echo $((`echo $line |
wc -m` - 1)) $line; done | sort -n -r  | head -n 2
84   -n, --dry-run don't remove large objects, just show
what would be done
74   -l, --limit=LIMIT commit after removing each LIMIT large 
objects


This is because I wasn't sure making all --help outputs fit into 80
columns per line is the right thing to do, as described below:


| If this is the way to go, I'll do same things for contrib commands.

If there are no objections, I'm going to make other commands fit within
80 columns per line, including (4).



(3)

Is deleting '/mnt/server' intended?  I thought it better to leave it as
is, since the archive_cleanup_command example uses the absolute path.

-"  pg_archivecleanup /mnt/server/archiverdir
00010010.0020.backup\n"));
+"  pg_archivecleanup archiverdir
00010010.0020.backup\n"));

I confirmed that the --help texts are not changed and only
the line breaks are changed.  But currently, the above change
breaks that.


Yes, it is intended as described in the thread.

https://www.postgresql.org/message-id/20230615.152036.1556630042388070221.horikyota.ntt%40gmail.com

| We could shorten it by removing the "/mnt/server" portion, but
I'm not sure if it's worth doing.

However, I feel it is acceptable to make an exception and exceed 80 
characters for this line.



(4)

I found that some binaries, for example ecpg, are not tested with
program_help_ok(). Is it better to add tests in the patch?


Agreed.


BTW, I check the difference with the following commands
# files include "--help"
$ find -name "*.c" | xargs -I {} sh -c 'if [ `grep -e --help {} | wc -l` -gt 0 ]; then echo {}; fi'

# programs which is tested with program_help_ok
$ find -name "*.pl" | xargs -I {} sh -c 'grep -e program_help_ok {}'


Thanks for sharing your procedure!

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: list of acknowledgments for PG16

2023-08-22 Thread Vik Fearing

On 8/22/23 11:33, Peter Eisentraut wrote:
The list of acknowledgments for the PG16 release notes has been
committed.  It should show up here sometime: .  As usual, please check
for problems such as wrong sorting, duplicate names in different
variants, or names in the wrong order etc.  (Our convention is given
name followed by surname.)


I think these might be the same person:

Zhihong Yu
Zihong Yu

I did not spot any others.
--
Vik Fearing





Re: list of acknowledgments for PG16

2023-08-22 Thread Tom Lane
Alvaro Herrera  writes:
> On 2023-Aug-22, Peter Eisentraut wrote:
>> The list of acknowledgments for the PG16 release notes has been committed.
>> It should show up here sometime: 
>> .

> Hmm, I think these docs would only regenerate during the RC1 release, so
> it'll be a couple of weeks, unless we manually poke the doc builder.

Yeah.  I could produce a new set of tarballs from the v16 branch tip,
but I don't know the process (nor have the admin permissions) to
extract the HTML docs and put them on the website.

regards, tom lane




Make error messages about WAL segment size more consistent

2023-08-22 Thread Peter Eisentraut
This started out as a small patch to make pg_controldata use the logging 
API instead of printf statements, and then it became a larger patch to 
adjust error and warning messages about invalid WAL segment sizes 
(IsValidWalSegSize()) across the board.  I went through and made the 
primary messages more compact and made the detail messages uniform.  In 
initdb.c and pg_resetwal.c, I use the newish option_parse_int() to 
simplify some of the option parsing.  For the backend GUC 
wal_segment_size, I added a GUC check hook to do the verification 
instead of coding it in bootstrap.c.  This might be overkill, but that 
way the check is in the right place and it becomes more self-documenting.

From f5a933aa4ea7980c3df6d74d845a95f2ce0d5153 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 22 Aug 2023 15:16:28 +0200
Subject: [PATCH] Make error messages about WAL segment size more consistent

In passing, make pg_controldata use the logging API for warning
messages.
---
 src/backend/access/transam/xlog.c | 19 +++---
 src/backend/bootstrap/bootstrap.c | 11 +---
 src/backend/utils/misc/guc_tables.c   |  2 +-
 src/bin/initdb/initdb.c   | 25 +--
 src/bin/pg_basebackup/streamutil.c|  5 ++--
 src/bin/pg_controldata/pg_controldata.c   | 23 -
 .../pg_controldata/t/001_pg_controldata.pl|  6 ++---
 src/bin/pg_resetwal/Makefile  |  2 ++
 src/bin/pg_resetwal/pg_resetwal.c | 18 +++--
 src/bin/pg_rewind/pg_rewind.c | 12 ++---
 src/bin/pg_waldump/pg_waldump.c   | 12 ++---
 src/include/utils/guc_hooks.h |  1 +
 12 files changed, 71 insertions(+), 65 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 60c0b7ec3a..f6f8adc72a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1995,6 +1995,18 @@ assign_checkpoint_completion_target(double newval, void *extra)
 	CalculateCheckpointSegments();
 }
 
+bool
+check_wal_segment_size(int *newval, void **extra, GucSource source)
+{
+	if (!IsValidWalSegSize(*newval))
+	{
+		GUC_check_errdetail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * At a checkpoint, how many WAL segments to recycle as preallocated future
  * XLOG segments? Returns the highest segment that should be preallocated.
@@ -4145,10 +4157,11 @@ ReadControlFile(void)
 
 	if (!IsValidWalSegSize(wal_segment_size))
 		ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						errmsg_plural("WAL segment size must be a power of two between 1 MB and 1 GB, but the control file specifies %d byte",
-									  "WAL segment size must be a power of two between 1 MB and 1 GB, but the control file specifies %d bytes",
+						errmsg_plural("invalid WAL segment size in control file (%d byte)",
+									  "invalid WAL segment size in control file (%d bytes)",
 									  wal_segment_size,
-									  wal_segment_size)));
+									  wal_segment_size),
+						errdetail("The WAL segment size must be a power of two between 1 MB and 1 GB.")));
 
 	snprintf(wal_segsz_str, sizeof(wal_segsz_str), "%d", wal_segment_size);
 	SetConfigOption("wal_segment_size", wal_segsz_str, PGC_INTERNAL,
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 49e956b2c5..c9ed649d0d 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -280,16 +280,7 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
 				strlcpy(OutputFileName, optarg, MAXPGPATH);
 				break;
 			case 'X':
-				{
-					int			WalSegSz = strtoul(optarg, NULL, 0);
-
-					if (!IsValidWalSegSize(WalSegSz))
-						ereport(ERROR,
-								(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-								 errmsg("-X requires a power of two value between 1 MB and 1 GB")));
-					SetConfigOption("wal_segment_size", optarg, PGC_INTERNAL,

Re: list of acknowledgments for PG16

2023-08-22 Thread Vik Fearing

On 8/22/23 15:29, Tom Lane wrote:

Alvaro Herrera  writes:

Yeah, I've been proposing this kind of thing for many years; the
problem, until not long ago, was that the tooling was unable to process
non-Latin1 characters in all the output formats that we use.  But
tooling has changed and the oldest platforms have disappeared, so maybe
it works now; do you want to inject some Chinese, Cyrillic, Japanese
names and give it a spin?  At least HTML and PDF need to work correctly.


I'm pretty sure the PDF toolchain still fails on non-Latin1 characters.
At least it does the way I have it installed; maybe adding some
non-default dependencies would help?


I am struggling to find documentation on how to build the pdfs with 
meson.  Any pointers?

--
Vik Fearing





Re: list of acknowledgments for PG16

2023-08-22 Thread Tom Lane
Alvaro Herrera  writes:
> Yeah, I've been proposing this kind of thing for many years; the
> problem, until not long ago, was that the tooling was unable to process
> non-Latin1 characters in all the output formats that we use.  But
> tooling has changed and the oldest platforms have disappeared, so maybe
> it works now; do you want to inject some Chinese, Cyrillic, Japanese
> names and give it a spin?  At least HTML and PDF need to work correctly.

I'm pretty sure the PDF toolchain still fails on non-Latin1 characters.
At least it does the way I have it installed; maybe adding some
non-default dependencies would help?

regards, tom lane




Re: Make all Perl warnings fatal

2023-08-22 Thread Alvaro Herrera
On 2023-Aug-10, Peter Eisentraut wrote:

> I wanted to figure put if we can catch these more reliably, in the style of
> -Werror.  AFAICT, there is no way to automatically turn all warnings into
> fatal errors.  But there is a way to do it per script, by replacing
> 
> use warnings;
> 
> by
> 
> use warnings FATAL => 'all';
> 
> See attached patch to try it out.

BTW in case we do find that there's some unforeseen problem and we want
to roll back, it would be great to have a way to disable this without
having to edit every single Perl file again later.  However, I didn't
find a way to do it -- I thought about creating a separate PgWarnings.pm
file that would do the "use warnings FATAL => 'all'" dance and which
every other Perl file would use or include; but couldn't make it work.
Maybe some Perl expert knows a good answer to this.

Maybe the BEGIN block of each file can `eval` a new PgWarnings.pm that
emits the "use warnings" line?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No renuncies a nada. No te aferres a nada."




Re: Make all Perl warnings fatal

2023-08-22 Thread Andrew Dunstan


On 2023-08-22 Tu 00:05, Michael Paquier wrote:

On Mon, Aug 21, 2023 at 11:51:24AM -0400, Andrew Dunstan wrote:

It's not really the same as -Werror, because many warnings can be generated
at runtime rather than compile-time.

Still, I guess that might not matter too much since apart from plperl we
only use perl for building / testing.

However, is it possible to trust the out-of-core perl modules posted
on CPAN, assuming that these will never produce warnings?  I've never
seen any issues with IPC::Run in these last years, so perhaps that's
OK in the long-run.



If we do find any such issues then warnings can be turned off locally. 
We already do that in several places.




Regarding the dangers mentioned, I guess we can undo it if it proves a
nuisance.

Yeah.  I am wondering what the buildfarm would say with this change.


+1 to getting rid of the unnecessary call to getprotobyname().

Looking around here..
https://perldoc.perl.org/perlipc#Sockets%3A-Client%2FServer-Communication

Hmm.  Are you sure that this is OK even in the case where the TAP
tests run on Windows without unix-domain socket support?  The CI runs
on Windows, but always with unix domain sockets around as far as I
know.



The socket call in question is for a PF_INET socket, so this has nothing 
at all to do with unix domain sockets. See the man page for socket() (2) 
for an explanation of why 0 is ok in this case. (There's only one 
protocol that matches the rest of the parameters).



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Fix the error message when failing to restore the snapshot

2023-08-22 Thread Amit Kapila
On Tue, Aug 22, 2023 at 6:09 PM Zhijie Hou (Fujitsu)
 wrote:
>
> While testing the logical snapshot restore functionality, I noticed that the
> data size reported in the error message seems incorrect.
>
> I think it's because we used a const value here:
>
> SnapBuildRestoreContents(int fd, char *dest, Size size, const char *path)
> ...
> readBytes = read(fd, dest, size);
> pgstat_report_wait_end();
> if (readBytes != size)
> ...
> ereport(ERROR,
> (errcode(ERRCODE_DATA_CORRUPTED),
>  errmsg("could not read file \"%s\": read %d of %zu",
> **  path, readBytes, *sizeof(SnapBuild)*)));
>
> I think we need to pass the size here.
>

Good catch. I'll take care of this.

-- 
With Regards,
Amit Kapila.




Re: list of acknowledgments for PG16

2023-08-22 Thread Alvaro Herrera
On 2023-Aug-22, Vik Fearing wrote:

> On 8/22/23 11:33, Peter Eisentraut wrote:
> > As usual, please check for problems such as wrong sorting, duplicate
> > names in different variants, or names in the wrong order etc.  (Our
> > convention is given name followed by surname.)
> 
> Not necessarily for this time around, but I would like to see this
> convention be a bit more inclusive of other cultures.  My proposed solution
> is to list them the same way we do now, but also have in parentheses or
> something their name in their native order and script.

Yeah, I've been proposing this kind of thing for many years; the
problem, until not long ago, was that the tooling was unable to process
non-Latin1 characters in all the output formats that we use.  But
tooling has changed and the oldest platforms have disappeared, so maybe
it works now; do you want to inject some Chinese, Cyrillic, Japanese
names and give it a spin?  At least HTML and PDF need to work correctly.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)




Fix the error message when failing to restore the snapshot

2023-08-22 Thread Zhijie Hou (Fujitsu)
Hi,

While testing the logical snapshot restore functionality, I noticed that the
data size reported in the error message seems incorrect.

I think it's because we used a const value here:

SnapBuildRestoreContents(int fd, char *dest, Size size, const char *path)
...
readBytes = read(fd, dest, size);
pgstat_report_wait_end();
if (readBytes != size)
...
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
 errmsg("could not read file \"%s\": read %d of %zu",
**  path, readBytes, *sizeof(SnapBuild)*)));

I think we need to pass the size here.
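
A minimal sketch of the corrected call, simply substituting the size that
was actually requested (SnapBuildRestoreContents() is also used to read
spans other than sizeof(SnapBuild)):

	ereport(ERROR,
			(errcode(ERRCODE_DATA_CORRUPTED),
			 errmsg("could not read file \"%s\": read %d of %zu",
					path, readBytes, size)));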

Attached is a small patch to fix this. BTW, the error message exists in HEAD ~ PG10.

Best Regards,
Hou zj



0001-Fix-the-error-message-when-failing-to-restore-the-sn.patch
Description:  0001-Fix-the-error-message-when-failing-to-restore-the-sn.patch


Re: Extract numeric filed in JSONB more effectively

2023-08-22 Thread Chapman Flack

On 2023-08-22 01:54, Andy Fan wrote:

After we label it, we will get an error like this:

select (a->'a')::int4 from m;
ERROR:  cannot display a value of type internal


Without looking in depth right now, I would double-check
what relabel node is being applied at the result. The idea,
of course, was to relabel the result as the expected result
type, not internal.

(Or, as in the restructuring suggested earlier, to use a
finish function whose return type is already as expected,
and needs no relabeling.)
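
For example, roughly (a sketch only, not the actual patch code; fexpr
stands for whatever function expression the support function builds, and
int4 for the expected result type):

	/* relabel the result as the expected type (e.g. int4), not internal */
	return (Node *) makeRelabelType((Expr *) fexpr,
									INT4OID, -1, InvalidOid,
									COERCE_IMPLICIT_CAST);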

Regards,
-Chap




Re: Direct I/O

2023-08-22 Thread Peter Eisentraut

On 01.05.23 04:47, Thomas Munro wrote:

On Mon, May 1, 2023 at 12:00 PM Tom Lane  wrote:

Justin Pryzby  writes:

On Sun, Apr 30, 2023 at 06:35:30PM +1200, Thomas Munro wrote:

What about a
warning message about that at startup if it's on?



Such a warning wouldn't be particularly likely to be seen by someone who
already didn't read/understand the docs for the not-feature that they
turned on.


Yeah, I doubt that that would be helpful at all.


Fair enough.


Since this is -currently- a developer-only feature, it seems reasonable
to rename the GUC to debug_direct_io, and (at such time as it's
considered to be helpful to users) later rename it to direct_io.


+1


Yeah, the future cross-version confusion thing is compelling.  OK,
here's a rename patch.  I left all the variable names and related
symbols as they were, so it's just the GUC that gains the prefix.  I
moved the documentation hunk up to be sorted alphabetically like
nearby entries, because that seemed to look nicer, even though the
list isn't globally sorted.


I suggest also renaming the hook functions (check and assign), like in 
the attached patch.  Mainly because utils/guc_hooks.h says to order the 
functions by GUC variable name, which was already wrong under the old 
name, but it would be pretty confusing to sort the functions by a 
GUC name that doesn't match the function names.

From b549b5972410f28a375325e09f022f703afc12ab Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 22 Aug 2023 14:12:45 +0200
Subject: [PATCH] Rename hook functions for debug_io_direct to match variable
 name

---
 src/backend/storage/file/fd.c   | 6 +++---
 src/backend/utils/misc/guc_tables.c | 6 +++---
 src/include/utils/guc_hooks.h   | 4 ++--
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 3c2a2fbef7..16b3e8f905 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3886,7 +3886,7 @@ data_sync_elevel(int elevel)
 }
 
 bool
-check_io_direct(char **newval, void **extra, GucSource source)
+check_debug_io_direct(char **newval, void **extra, GucSource source)
 {
 	bool		result = true;
 	int			flags;
@@ -3960,7 +3960,7 @@ check_io_direct(char **newval, void **extra, GucSource source)
 	if (!result)
 		return result;
 
-	/* Save the flags in *extra, for use by assign_io_direct */
+	/* Save the flags in *extra, for use by assign_debug_io_direct */
 	*extra = guc_malloc(ERROR, sizeof(int));
 	*((int *) *extra) = flags;
 
@@ -3968,7 +3968,7 @@ check_io_direct(char **newval, void **extra, GucSource source)
 }
 
 extern void
-assign_io_direct(const char *newval, void *extra)
+assign_debug_io_direct(const char *newval, void *extra)
 {
 	int		   *flags = (int *) extra;
 
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 15b51f2c5b..45b93fe0f9 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -563,7 +563,7 @@ static char *datestyle_string;
 static char *server_encoding_string;
 static char *server_version_string;
 static int	server_version_num;
-static char *io_direct_string;
+static char *debug_io_direct_string;
 
 #ifdef HAVE_SYSLOG
 #define	DEFAULT_SYSLOG_FACILITY LOG_LOCAL0
@@ -4544,9 +4544,9 @@ struct config_string ConfigureNamesString[] =
 			NULL,
 			GUC_LIST_INPUT | GUC_NOT_IN_SAMPLE
 		},
-		&io_direct_string,
+		&debug_io_direct_string,
 		"",
-		check_io_direct, assign_io_direct, NULL
+		check_debug_io_direct, assign_debug_io_direct, NULL
 	},
 
 	/* End-of-list marker */
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 2ecb9fc086..952293a1c3 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -49,6 +49,8 @@ extern bool check_cluster_name(char **newval, void **extra, GucSource source);
 extern const char *show_data_directory_mode(void);
 extern bool check_datestyle(char **newval, void **extra, GucSource source);
 extern void assign_datestyle(const char *newval, void *extra);
+extern bool check_debug_io_direct(char **newval, void **extra, GucSource source);
+extern void assign_debug_io_direct(const char *newval, void *extra);
 extern bool check_default_table_access_method(char **newval, void **extra,
 											  GucSource source);
 extern bool check_default_tablespace(char **newval, void **extra,
@@ -157,7 +159,5 @@ extern bool check_wal_consistency_checking(char **newval, void **extra,
 										   GucSource source);
 extern void assign_wal_consistency_checking(const char *newval, void *extra);
 extern void 

Re: list of acknowledgments for PG16

2023-08-22 Thread Vik Fearing

On 8/22/23 11:33, Peter Eisentraut wrote:

As usual, please check for problems such as wrong sorting, duplicate names in 
different variants, or names in the wrong order etc.  (Our convention is given 
name followed by surname.)


Not necessarily for this time around, but I would like to see this 
convention be a bit more inclusive of other cultures.  My proposed 
solution is to list them the same way we do now, but also have in 
parentheses or something their name in their native order and script.

--
Vik Fearing





Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-08-22 Thread Vik Fearing

On 8/2/23 12:35, Amul Sul wrote:

Hi,

Currently, we have an option to drop the expression of stored generated
columns
as:

ALTER [ COLUMN ] column_name DROP EXPRESSION [ IF EXISTS ]

But don't have support to update that expression. The attached patch
provides
that as:

ALTER [ COLUMN ] column_name SET EXPRESSION expression


I am surprised this is not in the standard already.  I will go work on that.
--
Vik Fearing





Re: list of acknowledgments for PG16

2023-08-22 Thread Alvaro Herrera
On 2023-Aug-22, Peter Eisentraut wrote:

> The list of acknowledgments for the PG16 release notes has been committed.
> It should show up here sometime: 
> .
> As usual, please check for problems such as wrong sorting, duplicate names
> in different variants, or names in the wrong order etc.  (Our convention is
> given name followed by surname.)

Hmm, I think these docs would only regenerate during the RC1 release, so
it'll be a couple of weeks, unless we manually poke the doc builder.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Here's a general engineering tip: if the non-fun part is too complex for you
to figure out, that might indicate the fun part is too ambitious." (John Naylor)
https://postgr.es/m/CAFBsxsG4OWHBbSDM%3DsSeXrQGOtkPiOEOuME4yD7Ce41NtaAD9g%40mail.gmail.com




Re: persist logical slots to disk during shutdown checkpoint

2023-08-22 Thread Amit Kapila
On Tue, Aug 22, 2023 at 2:56 PM Ashutosh Bapat
 wrote:
>
> On Tue, Aug 22, 2023 at 9:48 AM Amit Kapila  wrote:
> > >
> > > Another idea is to record the confirm_flush_lsn at the time of
> > > persisting the slot. We can use it in two different ways 1. to mark a
> > > slot dirty and persist if the last confirm_flush_lsn when slot was
> > > persisted was too far from the current confirm_flush_lsn of the slot.
> > > 2. at shutdown checkpoint, persist all the slots which have these two
> > > confirm_flush_lsns different.
> > >
> >
> > I think using it in the second (2) way sounds advantageous as compared
> > to storing another dirty flag because this requires us to update
> > last_persisted_confirm_flush_lsn only while writing the slot info.
> > OTOH, having a flag dirty_for_shutdown_checkpoint will require us to
> > update it each time we update confirm_flush_lsn under spinlock at
> > multiple places. But, I don't see the need of doing what you proposed
> > in (1) as the use case for it is very minor, basically this may
> > sometimes help us to avoid decoding after crash recovery.
>
> Once we have last_persisted_confirm_flush_lsn, (1) is just an
> optimization on top of that. With that we take the opportunity to
> persist confirmed_flush_lsn which is much farther than the current
> persisted value and thus improving chances of updating restart_lsn and
> catalog_xmin faster after a WAL sender restart. We need to keep that
> in mind when implementing (2). The problem is if we don't implement
> (1) right now, we might just forget to do that small incremental
> change in future. My preference is 1. Do both (1) and (2) together 2.
> Do (2) first and then (1) as a separate commit. 3. Just implement (2)
> if we don't have time at all for first two options.
>

I prefer one of (2) or (3). Anyway, it is better to do that
optimization (persist confirm_flush_lsn at a regular interval) as a
separate patch as we need to test and prove its value separately.
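
For (2), what I have in mind at shutdown checkpoint time is roughly the
following sketch (last_persisted_confirm_flush_lsn being the proposed new
field, not existing code, and s the slot being checkpointed):

	/* sketch: mark the slot dirty if confirmed_flush moved since last write */
	SpinLockAcquire(&s->mutex);
	if (s->data.confirmed_flush != s->last_persisted_confirm_flush_lsn)
		s->dirty = true;		/* force this slot to be written out */
	SpinLockRelease(&s->mutex);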

-- 
With Regards,
Amit Kapila.




Re: Support run-time partition pruning for hash join

2023-08-22 Thread Richard Guo
On Tue, Aug 22, 2023 at 2:38 PM David Rowley  wrote:

> With Hash Join, it seems to me that the pruning must take place for
> every row that makes it into the hash table.  There will maybe be
> cases where the unioned set of partitions simply yields every
> partition and all the work results in no savings. Pruning on a scalar
> value seems much more likely to be able to prune away unneeded
> Append/MergeAppend subnodes.


Yeah, you're right.  If we have 'pt HashJoin t', for a subnode of 'pt'
to be pruned, it needs every row of 't' to be able to prune that
subnode.  The situation may improve if we have more than 2-way hash
joins, because the final surviving subnodes would be the intersection of
matching subnodes in each Hash.
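
For illustration, a sketch (matched_by_hash and nhashes are hypothetical
names for the per-Hash sets of matched subnode indexes):

	/* sketch: intersect the partitions matched by each hash table */
	Bitmapset  *surviving = bms_copy(matched_by_hash[0]);

	for (int i = 1; i < nhashes; i++)
		surviving = bms_int_members(surviving, matched_by_hash[i]);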

With parameterized nestloop I agree that it's more likely to be able to
prune subnodes at rescan of Append/MergeAppend nodes based on scalar
values.

Sometimes we may just not generate a parameterized nestloop as the final
plan, such as when there are no indexes and no lateral references in the
Append/MergeAppend node.  In this case I think it would be great if we
can still do some partition pruning.  So I think this new 'join
partition pruning mechanism' (maybe this is not a proper name) should
be treated as a supplement to, not a substitute for, the current
run-time partition pruning based on parameterized nestloops, and it is
so implemented in the patch.


> Perhaps there can be something adaptive in Hash Join which stops
> trying to prune when all partitions must be visited.  On a quick
> glance of the patch, I don't see any code in ExecJoinPartitionPrune()
> which gives up trying to prune when the number of members in
> part_prune_result is equal to the prunable Append/MergeAppend
> subnodes.


Yeah, we can do that.


> But run-time pruning already works for Nested Loops... I must be
> missing something here.


Here I mean a nestloop with a non-parameterized inner path.  As I explained
upthread, we need to have a Material node on the outer side for that to
work, which seems not possible in the real world.

Thanks
Richard


Re: add timing information to pg_upgrade

2023-08-22 Thread Peter Eisentraut

On 01.08.23 17:45, Nathan Bossart wrote:

On Tue, Aug 01, 2023 at 09:46:02AM +0200, Peter Eisentraut wrote:

On 31.07.23 20:37, Nathan Bossart wrote:

-   prep_status("Checking for incompatible \"aclitem\" data type in user 
tables");
+   prep_status("Checking for \"aclitem\" data type in user tables");


Why these changes?  I think this is losing precision about what it's doing.


The message is too long, so there's no space between it and the "ok"
message:

Checking for incompatible "aclitem" data type in user tablesok

Instead of altering the messages, we could bump MESSAGE_WIDTH from 60 to
62 or 64.  Do you prefer that approach?  (BTW this probably needs to be
back-patched to v16.)


Let's change MESSAGE_WIDTH to 62 in v16, and then pursue the larger 
restructuring leisurely.
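
That presumably amounts to a one-line change of the constant in
src/bin/pg_upgrade/pg_upgrade.h:

	#define MESSAGE_WIDTH		62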






Re: Support run-time partition pruning for hash join

2023-08-22 Thread Richard Guo
On Mon, Aug 21, 2023 at 8:34 PM Andy Fan  wrote:

> This feature looks good, but is it possible to know if we can prune
> any subnodes before we pay the extra effort (building the Hash
> table, for each row... stuff)?
>

It might be possible if we take the partition pruning into
consideration when estimating costs.  But it seems not easy to calculate
the costs accurately.


> Maybe at least,  if we have found no subnodes can be skipped
> during the hashing, we can stop doing such work anymore.
>

Yeah, this is what we can do.


> In my current knowledge, we have to build the inner table first for this
> optimization?  so hash join and sort merge should be OK, but nestloop
> should
> be impossible unless I missed something.
>

For nestloop and mergejoin, we'd always execute the outer side first.
So the Append/MergeAppend nodes need to be on the inner side for the
join partition pruning to take effect.  For a mergejoin that will
explicitly sort the outer side, the sort node would process all the
outer rows before scanning the inner side, so we can do the join
partition pruning with that.  For a nestloop, if we have a Material
node on the outer side, we can do that too, but I wonder if we'd have
such a plan in the real world, because we only add Material to the inner
side of a nestloop.

Thanks
Richard


list of acknowledgments for PG16

2023-08-22 Thread Peter Eisentraut
The list of acknowledgments for the PG16 release notes has been 
committed.  It should show up here sometime: 
. 
 As usual, please check for problems such as wrong sorting, duplicate 
names in different variants, or names in the wrong order etc.  (Our 
convention is given name followed by surname.)





Re: persist logical slots to disk during shutdown checkpoint

2023-08-22 Thread Ashutosh Bapat
On Tue, Aug 22, 2023 at 9:48 AM Amit Kapila  wrote:
> >
> > Another idea is to record the confirm_flush_lsn at the time of
> > persisting the slot. We can use it in two different ways 1. to mark a
> > slot dirty and persist if the last confirm_flush_lsn when slot was
> > persisted was too far from the current confirm_flush_lsn of the slot.
> > 2. at shutdown checkpoint, persist all the slots which have these two
> > confirm_flush_lsns different.
> >
>
> I think using it in the second (2) way sounds advantageous as compared
> to storing another dirty flag because this requires us to update
> last_persisted_confirm_flush_lsn only while writing the slot info.
> OTOH, having a flag dirty_for_shutdown_checkpoint will require us to
> update it each time we update confirm_flush_lsn under spinlock at
> multiple places. But, I don't see the need of doing what you proposed
> in (1) as the use case for it is very minor, basically this may
> sometimes help us to avoid decoding after crash recovery.

Once we have last_persisted_confirm_flush_lsn, (1) is just an
optimization on top of that. With that we take the opportunity to
persist confirmed_flush_lsn which is much farther than the current
persisted value and thus improving chances of updating restart_lsn and
catalog_xmin faster after a WAL sender restart. We need to keep that
in mind when implementing (2). The problem is if we don't implement
(1) right now, we might just forget to do that small incremental
change in future. My preference is 1. Do both (1) and (2) together 2.
Do (2) first and then (1) as a separate commit. 3. Just implement (2)
if we don't have time at all for first two options.

-- 
Best Wishes,
Ashutosh Bapat




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-22 Thread Peter Smith
Hi Kuroda-san.

I already posted a review for v22-0003 earlier today, but v23-0003 was
already posted so those are not yet addressed.

Here are a few more review comments I noticed when looking at the latest
v23-0003.

==
src/bin/pg_upgrade/check.c

1.
+#include "access/xlogdefs.h"
 #include "catalog/pg_authid_d.h"

Was this #include needed here? I noticed you've already included the same
in the "pg_upgrade.h".

~~~

2. check_for_lost_slots

+ /* Check there are no logical replication slots with a 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status = 'lost' AND "
+ "temporary IS FALSE;");

I can't quite describe my doubts about this, but something seems a bit
strange. Didn't we already iterate every single slot in all DBs in the
earlier function get_logical_slot_infos_per_db()? There we were only
looking for wal_status <> 'lost', but we could have got *every* wal_status
and also detected these 'lost' ones at the same time up-front, instead of
having this extra function with more SQL to do pretty much the same SELECT.

Perhaps coding it the current way gives a clear separation of the fetching
code and the checking code, and that might be the best approach, but it
somehow seems a shame/waste to be fetching almost the same slot data with
the same SQL twice, so I wondered if there is a better way to arrange this.

==
src/bin/pg_upgrade/info.c

3. get_logical_slot_infos

+
+ /* Do additional checks if slots are found */
+ if (slot_count)
+ {
+ check_for_lost_slots(cluster);
+
+ if (!live_check)
+ check_for_confirmed_flush_lsn(cluster);
+ }

Aren't these checks only intended for checking the 'old_cluster'? But
AFAICT they are not guarded here so they will be executed by both sides.
Previously (in my review of v22-0003) I suggested these calls maybe
belonged in the calling function check_and_dump_old_cluster(). I still think that.
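
If they do stay here, something like this guard (just a sketch) is what I
mean:

	/* sketch: only the old cluster's slots need these additional checks */
	if (cluster == &old_cluster)
	{
		check_for_lost_slots(cluster);

		if (!live_check)
			check_for_confirmed_flush_lsn(cluster);
	}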

--
Kind Regards,
Peter Smith.
Fujitsu Australia


Re: Adding a LogicalRepWorker type field

2023-08-22 Thread Amit Kapila
On Mon, Aug 21, 2023 at 3:48 PM Amit Kapila  wrote:
>
> On Mon, Aug 21, 2023 at 5:34 AM Peter Smith  wrote:
> >
> > PSA v9
> >
>
> LGTM. I'll push this tomorrow unless there are any more comments.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Convert encrypted SSL test keys to PKCS#8 format

2023-08-22 Thread Peter Eisentraut
This is part of the larger project of allowing all test suites to pass 
in OpenSSL FIPS mode.  We had previously integrated several patches that 
avoid or isolate use of MD5 in various forms in the tests.  Now to 
another issue.


OpenSSL in FIPS mode rejects several encrypted private keys used in the 
test suites ssl and ssl_passphrase_callback.  The reason for this is 
explained in [0]:


> Technically you shouldn't use keys created outside FIPS mode in FIPS
> mode.
>
> In FIPS mode the "traditional" format is not supported because it used
> MD5 for key derivation. The more standard PKCS#8 mode using SHA1 for
> key derivation is use instead. You can convert keys using the pkcs8
> command outside FIPS mode but again technically you aren't supposed
> to...

[0]: 
https://groups.google.com/g/mailing.openssl.users/c/Sd5E8VY5O2s/m/QYGezoQeo84J


The affected files are

src/test/modules/ssl_passphrase_callback/server.key
src/test/ssl/ssl/client-encrypted-pem.key
src/test/ssl/ssl/server-password.key

A fix is to convert them from their existing PKCS#1 format to the PKCS#8 
format, like this:


openssl pkcs8 -topk8 -in src/test/modules/ssl_passphrase_callback/server.key \
  -passin pass:FooBaR1 \
  -out src/test/modules/ssl_passphrase_callback/server.key.new \
  -passout pass:FooBaR1
mv src/test/modules/ssl_passphrase_callback/server.key.new \
  src/test/modules/ssl_passphrase_callback/server.key


etc.

(Fun fact: The above command also doesn't work if your OpenSSL 
installation is in FIPS mode because it will refuse to read the old file.)


We should also update the generation rules to generate the newer format, 
like this:


-   $(OPENSSL) rsa -aes256 -in server.ckey -out server.key -passout pass:$(PASS)
+   $(OPENSSL) pkey -aes256 -in server.ckey -out server.key -passout pass:$(PASS)


I have attached two patches, one to update the generation rules, and one 
where I have converted the existing test files.  (I didn't generate them 
from scratch, so for example 
src/test/modules/ssl_passphrase_callback/server.crt that corresponds to 
one of the keys does not need to be updated.)


To check that these new files are backward compatible, I have 
successfully tested them on CentOS 7 with the included version 1.0.2k.


It's also interesting that if you generate all private keys from scratch 
using the existing rules on a new OpenSSL version (3+), they will be 
generated in PKCS#8 format by default.  In those OpenSSL versions, the 
openssl-rsa command has a -traditional option to get the old format, but 
of course old OpenSSL versions don't have that.  As OpenSSL 3 gets more 
widespread, we might need to rethink these rules anyway to make sure we 
get consistent behavior.

From 6da8ecf7cb4c5bce6c00ee7d85443ac082d6aaeb Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 22 Aug 2023 09:25:34 +0200
Subject: [PATCH 1/2] Generate encrypted SSL test keys in PKCS#8 format

---
 src/test/modules/ssl_passphrase_callback/Makefile    | 2 +-
 src/test/modules/ssl_passphrase_callback/meson.build | 2 +-
 src/test/ssl/sslfiles.mk                             | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/test/modules/ssl_passphrase_callback/Makefile b/src/test/modules/ssl_passphrase_callback/Makefile
index 922f0ee078..40ed38dc70 100644
--- a/src/test/modules/ssl_passphrase_callback/Makefile
+++ b/src/test/modules/ssl_passphrase_callback/Makefile
@@ -33,7 +33,7 @@ PASS = FooBaR1
 ssl-files:
 	$(OPENSSL) req -new -x509 -days 1 -nodes -out server.crt \
 		-keyout server.ckey -subj "/CN=localhost"
-	$(OPENSSL) rsa -aes256 -in server.ckey -out server.key -passout pass:$(PASS)
+	$(OPENSSL) pkey -aes256 -in server.ckey -out server.key -passout pass:$(PASS)
 	rm server.ckey
 
 ssl-files-clean:
diff --git a/src/test/modules/ssl_passphrase_callback/meson.build b/src/test/modules/ssl_passphrase_callback/meson.build
index c2a022b4f1..3e35f8cae0 100644
--- a/src/test/modules/ssl_passphrase_callback/meson.build
+++ b/src/test/modules/ssl_passphrase_callback/meson.build
@@ -40,7 +40,7 @@ if openssl.found()
   custom_target('server.key',
     input: [cert[1]],
     output: ['server.key'],
-    command: [openssl, 'rsa', '-aes256', '-in', '@INPUT0@', '-out', '@OUTPUT0@', '-passout', 'pass:@0@'.format(pass)]
+    command: [openssl, 'pkey', '-aes256', '-in', '@INPUT0@', '-out', '@OUTPUT0@', '-passout', 'pass:@0@'.format(pass)]
   )
 endif
 
diff --git a/src/test/ssl/sslfiles.mk b/src/test/ssl/sslfiles.mk
index f7ababe42c..569f1731cd 100644
--- a/src/test/ssl/sslfiles.mk
+++ b/src/test/ssl/sslfiles.mk
@@ -109,7 +109,7 @@ ssl/server-rsapss.crt: ssl/server-rsapss.key conf/server-rsapss.config
 
 # Password-protected version of server-cn-only.key
 ssl/server-password.key: ssl/server-cn-only.key
-	$(OPENSSL) rsa -aes256 -in $< -out $@ -passout 'pass:secret1'
+	$(OPENSSL) pkey -aes256 -in $< -out $@ -passout 'pass:secret1'
 
 # Key that uses the RSA-PSS algorithm
 

Re: Report planning memory in EXPLAIN ANALYZE

2023-08-22 Thread jian he
On Mon, Aug 14, 2023 at 3:13 PM Ashutosh Bapat
 wrote:
>
> On Mon, Aug 14, 2023 at 8:22 AM Andrey Lepikhov
>  wrote:
> >
> > Really, the current approach with the final value of consumed memory
> > smooths peaks of memory consumption. I recall examples like massive
> > million-sized arrays or reparameterization with many partitions where
> > the optimizer consumes much additional memory during planning.
> > Ideally, to dive into the planner issues, we should have something like
> > a report-in-progress in the vacuum, reporting on memory consumption at
> > each subquery and join level. But it looks too much for typical queries.
>
> Planner runs usually finish within a second. When partitioning is
> involved it might take a few dozen seconds, but it's still within a
> minute, and we are working to reduce that as well to a couple hundred
> milliseconds at max. Tracking memory usage during this small time may
> not be worth it. The tracking itself might make the planning
> inefficient and we might still miss the spikes in memory allocations,
> if they are very short lived. If the planner runs for more than a few
> minutes, maybe we could add some tracking.
>
> --
> Best Wishes,
> Ashutosh Bapat
>
>

Hi. I tested it.
I am not sure if the following is the desired behavior: first run with EXPLAIN,
then run with EXPLAIN (SUMMARY ON);
the second time, Planning Memory is 0 bytes.

regression=# PREPARE q4 AS SELECT 1 AS a;
explain EXECUTE q4;
QUERY PLAN
--
 Result  (cost=0.00..0.01 rows=1 width=4)
(1 row)

regression=# explain(summary on) EXECUTE q4;
QUERY PLAN
--
 Result  (cost=0.00..0.01 rows=1 width=4)
 Planning Time: 0.009 ms
 Planning Memory: 0 bytes
(3 rows)
-
Previously, if you wanted stats for a given memory context and its
children, you could only use MemoryContextStatsDetail,
but that only goes to stderr or LOG_SERVER_ONLY.
Now, MemoryContextMemUsed is being exposed. I can do something like:

mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
//do stuff.
mem_consumed = MemoryContextMemUsed(CurrentMemoryContext) - mem_consumed;

It will give me the NET memory consumed by doing stuff in between. Is
my understanding correct?




Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-08-22 Thread ajitpostgres awekar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi Amul,
The patch changes look fine.
Below are some of my observations as soon as we alter the default expression on a 
column:

1. A materialized view becomes stale and starts giving incorrect results. We need 
to refresh the materialized view to get correct results.
2. An index on the generated column needs to be reindexed in order to use the new 
expression.
3. Column stats become stale and plans may be impacted due to outdated stats.

These things also happen as soon as we delete the default expression or set a new 
default expression on a column.

Thanks & Best Regards,
Ajit

Re: PG 16 draft release notes ready

2023-08-22 Thread Pavel Luzanov

On 22.08.2023 00:58, Bruce Momjian wrote:

Attached is an applied patch that moves the inherit item into
incompatibilities. clarifies it, and splits out the ADMIN syntax item.


> The role's default inheritance behavior can be overridden with the 
new GRANT ... WITH INHERIT clause.


My only question is about "can be". Why not "will be"? The inheritance 
behavior will be changed anyway.



Please let me know if I need any other changes.  Thanks.


* Allow psql's access privilege commands to show system objects (Nathan 
Bossart, Pavel Luzanov)

    The options are \dpS, \zS, and \drg.

I think this description is correct only for the \dpS and \zS commands.
(By the way, unfortunately, after reverting the MAINTAIN privilege these 
commands are not very useful in v16.)


But the \drg command is a different thing. This is a full-featured 
replacement for the "Member of" column of the \du and \dg commands.
It shows not only the members, but also the granted options (admin, inherit, 
set) and the grantor.

This is important information for membership usage and administration.
IMO, removing the "Member of" column from the \du & \dg commands also 
requires attention in release notes.


So, I suggest a new item in the psql section for \drg. Something like this:

* Add psql \drg command to display role grants and remove the "Member 
of" column from \du & \dg altogether.


--
Pavel Luzanov
Postgres Professional: https://postgrespro.com





Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-22 Thread Amit Kapila
On Mon, Aug 21, 2023 at 6:32 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> > 2.
> > + /*
> > + * Checking for logical slots must be done before
> > + * check_new_cluster_is_empty() because the slot_arr attribute of the
> > + * new_cluster will be checked in that function.
> > + */
> > + if (count_logical_slots(&old_cluster))
> > + {
> > + get_logical_slot_infos(&new_cluster, false);
> > + check_for_logical_replication_slots(&new_cluster);
> > + }
> > +
> >   check_new_cluster_is_empty();
> >
> > Can't we simplify this checking by simply querying
> > pg_replication_slots for any usable slot something similar to what we
> > are doing in check_for_prepared_transactions()? We can add this check
> > in the function check_for_logical_replication_slots().
>
> Some checks were included to check_for_logical_replication_slots(), and
> get_logical_slot_infos() for new_cluster was removed as you said.
>

+ res = executeQueryOrDie(conn, "SELECT slot_name "
+   "FROM pg_catalog.pg_replication_slots "
+   "WHERE slot_type = 'logical' AND "
+   "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slot, but found \"%s\"",
+ PQgetvalue(res, 0, 0));
+
+ PQclear(res);
+
+ nslots = count_logical_slots(&old_cluster);
+
+ /*
+ * Do additional checks when the logical replication slots have on the old
+ * cluster.
+ */
+ if (nslots)

Shouldn't these checks be reversed? I mean it would be better to test
the presence of slots on the new cluster if there is any slot present
on the old cluster.
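
In other words, something like this ordering (a sketch):

	/* sketch: bail out early when the old cluster has no slots at all */
	nslots = count_logical_slots(&old_cluster);
	if (nslots == 0)
		return;

	/* only then look for pre-existing logical slots on the new cluster */
	res = executeQueryOrDie(conn, "SELECT slot_name "
							"FROM pg_catalog.pg_replication_slots "
							"WHERE slot_type = 'logical' AND "
							"temporary IS FALSE;");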

-- 
With Regards,
Amit Kapila.




Re: Support run-time partition pruning for hash join

2023-08-22 Thread David Rowley
On Tue, 22 Aug 2023 at 00:34, Andy Fan  wrote:
>
> On Mon, Aug 21, 2023 at 11:48 AM Richard Guo  wrote:
>> 1. All the join partition pruning decisions are made in createplan.c
>>    where the best path tree has been decided.  This is not great.  Maybe
>>    it's better to make it happen when we build up the path tree, so that
>>    we can take the partition pruning into consideration when estimating
>>    the costs.
>
>
> fwiw, the current master totally ignores the cost reduction for run-time
> partition prune, even for init partition prune.  So in some real cases,
> pg chooses a hash join just because the cost of nest loop join is
> highly overestimated.

This is true about the existing code. It's a very tricky thing to cost
given that the parameter values are always unknown to the planner.
The best we have for these today is the various hardcoded constants in
selfuncs.h. While I do agree that it's not great that the costing code
knows nothing about run-time pruning, I also think that run-time
pruning during execution with parameterised nested loops is much more
likely to be able to prune partitions and save actual work than the
equivalent with Hash Joins.  It's more common for the planner to
choose to Nested Loop when there are fewer outer rows, so the pruning
code is likely to be called fewer times with Nested Loop than with
Hash Join.

With Hash Join, it seems to me that the pruning must take place for
every row that makes it into the hash table.  There will maybe be
cases where the unioned set of partitions simply yields every
partition and all the work results in no savings. Pruning on a scalar
value seems much more likely to be able to prune away unneeded
Append/MergeAppend subnodes.

Perhaps there can be something adaptive in Hash Join which stops
trying to prune when all partitions must be visited.  On a quick
glance of the patch, I don't see any code in ExecJoinPartitionPrune()
which gives up trying to prune when the number of members in
part_prune_result is equal to the prunable Append/MergeAppend
subnodes.
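
Something along these lines at the end of ExecJoinPartitionPrune() is what
I mean (the names nprunable_subnodes and prune_enabled are hypothetical):

	/* sketch: stop pruning once every prunable subnode has been matched */
	if (bms_num_members(part_prune_result) == nprunable_subnodes)
		prunestate->prune_enabled = false;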

It would be good to see some performance comparisons of the worst case
to see how much overhead the pruning code adds to Hash Join.  It may
well be that we need to consider two Hash Join paths, one with and one
without run-time pruning. It's pretty difficult to meaningfully cost,
as I already mentioned, however.

>> 4. Is it possible and worthwhile to extend the join partition pruning
>>    mechanism to support nestloop and mergejoin also?
>
>
> In my current knowledge, we have to build the inner table first for this
> optimization?  so hash join and sort merge should be OK, but nestloop should
> be impossible unless I missed something.

But run-time pruning already works for Nested Loops... I must be
missing something here.

I imagine for Merge Joins a more generic approach would be better by
implementing parameterised Merge Joins (a.k.a zigzag merge joins).
The Append/MergeAppend node could then select the correct partition(s)
based on the current parameter value at rescan. I don't think any code
changes would be needed in node[Merge]Append.c for that to work.  This
could also speed up Merge Joins to non-partitioned tables when an
index is providing presorted input to the join.

David




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-22 Thread Peter Smith
Hi Kuroda-san,

Here are some review comments for patch v23-0002

==
1. GENERAL

Please try to run a spell/grammar check on all the text like commit message
and docs changes before posting (e.g. cut/paste the rendered text into some
tool like MSWord or Grammarly or ChatGPT or whatever tool you like and
cross-check). There are lots of small typos etc but one up-front check
could avoid long cycles of
reviewing/reporting/fixing/re-posting/confirming...

==
Commit message

2.
Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to ths restriction, the timing of restoring replication
slots is
different from other objects.

~

/ths/this/

==
doc/src/sgml/ref/pgupgrade.sgml

3.
+
+ Before you start upgrading the publisher cluster, ensure that the
+ subscription is temporarily disabled, by executing
+ ALTER SUBSCRIPTION ...
DISABLE.
+ After the upgrade is complete, then re-enable the subscription. Note
that
+ if the new cluser uses different port number from old one,
+ ALTER SUBSCRIPTION ...
CONNECTION
+ command must be also executed on subscriber.
+

3a.
BEFORE
After the upgrade is complete, then re-enable the subscription.

SUGGESTION
Re-enable the subscription after the upgrade.

~

3b.
/cluser/cluster/

~

3c.
Note that
+ if the new cluser uses different port number from old one,
+ ALTER SUBSCRIPTION ...
CONNECTION
+ command must be also executed on subscriber.

SUGGESTION
Note that if the new cluster uses a different port number ALTER
SUBSCRIPTION ... CONNECTION command must be also executed on the subscriber.

~~~

4.
+ 
+  
+   confirmed_flush_lsn (see )
+   of all slots on old cluster must be same as latest checkpoint
location.
+  
+ 

4a.
/on old cluster/on the old cluster/

~

4b.
/as latest/as the latest/
~~

5.
+ 
+  
+   The output plugins referenced by the slots on the old cluster must
be
+   installed on the new PostgreSQL executable directory.
+  
+ 

/installed on/installed in/ ??

~~

6.
+ 
+  
+   The new cluster must have
+   max_replication_slots
+   configured to value larger than the existing slots on the old
cluster.
+  
+ 

BEFORE
...to value larger than the existing slots on the old cluster.

SUGGESTION
...to a value greater than or equal to the number of slots present on the
old cluster.

==
src/bin/pg_upgrade/check.c

7. GENERAL - check_for_logical_replication_slots

AFAICT this function is called *only* for the new_cluster, yet there is no
Assert and no checking inside this function to ensure that is the case or
not.  It seems strange that the *cluster is passed as an argument but then
the whole function body and messages assume it can only be a new cluster
anyway.

IMO it would be better to rename this function to something like
check_new_cluster_logical_replication_slots() and DO NOT pass any parameter
but just use the global new_cluster within the function body.

~~~

8. check_for_logical_replication_slots

+ /* logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) < 1700)
+ return;

Start comment with uppercase for consistency.

~~~

9. check_for_logical_replication_slots

+ res = executeQueryOrDie(conn, "SELECT slot_name "
+  "FROM pg_catalog.pg_replication_slots "
+  "WHERE slot_type = 'logical' AND "
+  "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slot, but found
\"%s\"",
+ PQgetvalue(res, 0, 0));

/replication slot/replication slots/

~

10. check_for_logical_replication_slots

+ /*
+ * Do additional checks when the logical replication slots have on the old
+ * cluster.
+ */
+ if (nslots)

SUGGESTION
Do additional checks when there are logical replication slots on the old
cluster.

~~~

11.
+ if (nslots > max_replication_slots)
+ pg_fatal("max_replication_slots must be greater than or equal to existing
logical "
+ "replication slots on old cluster.");

11a.
SUGGESTION
max_replication_slots (%d) must be greater than or equal to the number of
logical replication slots (%d) on the old cluster.

~

11b.
I think it would be helpful for the current values to be displayed in the
fatal message so the user will know more about what value to set. Notice
that my above suggestion has some substitution markers.

==
src/bin/pg_upgrade/info.c

12.
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+   slot_info->slotname,
+   slot_info->plugin,
+   slot_info->two_phase);
+ }
+}

Better to have a blank line after the 'slot_info' declaration.

==
.../pg_upgrade/t/003_logical_replication_slots.pl

13.
+# 

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-22 Thread Amit Kapila
On Mon, Aug 21, 2023 at 6:35 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> > 9. check_for_logical_replication_slots
> >
> > + /* logical replication slots can be migrated since PG17. */
> > + if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
> > + return;
> >
> > IMO the code matches the comment better if you say < 1700 instead of <= 
> > 1600.
>
> Changed.
>

I think it is better to be consistent with the existing code. There
are a few other checks in pg_upgrade.c that use <=, so it is better
to use it in the same way here.

Another minor comment:
Note that
+ if the new cluser uses different port number from old one,
+ ALTER
SUBSCRIPTION ... CONNECTION
+ command must be also executed on subscriber.

I think this is true in general as well and not specific to
pg_upgrade. So, we can avoid adding anything about connection change
here.

-- 
With Regards,
Amit Kapila.