fix cost subqueryscan wrong parallel cost

2022-04-11 Thread bu...@sohu.com
The cost_subqueryscan function does not take into account whether the path is parallel.
In the regression tests, this changes the plan for:
-- Incremental sort vs. set operations with varno 0
set enable_hashagg to off;
explain (costs off) select * from t union select * from t order by 1,3;
QUERY PLAN
--------------------------------------------------------
 Incremental Sort
   Sort Key: t.a, t.c
   Presorted Key: t.a
   ->  Unique
         ->  Sort
               Sort Key: t.a, t.b, t.c
               ->  Append
                     ->  Gather
                           Workers Planned: 2
                           ->  Parallel Seq Scan on t
                     ->  Gather
                           Workers Planned: 2
                           ->  Parallel Seq Scan on t t_1
to
 Incremental Sort
   Sort Key: t.a, t.c
   Presorted Key: t.a
   ->  Unique
         ->  Sort
               Sort Key: t.a, t.b, t.c
               ->  Gather
                     Workers Planned: 2
                     ->  Parallel Append
                           ->  Parallel Seq Scan on t
                           ->  Parallel Seq Scan on t t_1
Obviously, the latter is less expensive.
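For context on why the two plans are costed differently: partial (parallel) paths are costed on a per-worker share of the rows, using the planner's parallel divisor. Below is a small Python model of get_parallel_divisor() from costsize.c (a sketch of the arithmetic, not the actual planner code); if cost_subqueryscan charges the subquery's full row count instead of this discounted share, the Parallel Append shape looks more expensive than it should.

```python
def get_parallel_divisor(workers, leader_participates=True):
    # Model of get_parallel_divisor() in src/backend/optimizer/path/costsize.c:
    # each worker counts as 1, and a participating leader contributes
    # progressively less useful work as more workers are planned.
    divisor = float(workers)
    if leader_participates:
        leader_contribution = 1.0 - 0.3 * workers
        if leader_contribution > 0:
            divisor += leader_contribution
    return divisor


def partial_path_rows(total_rows, workers):
    # Row estimate each cooperating process handles in a partial path.
    return total_rows / get_parallel_divisor(workers)
```

With 2 planned workers the divisor is 2.4, so a 10000-row parallel scan is costed at roughly 4167 rows per process rather than the full 10000.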



bu...@sohu.com


fix-cost_subqueryscan-get-wrong-parallel-cost.patch
Description: Binary data


Re: Handle infinite recursion in logical replication setup

2022-04-11 Thread vignesh C
On Tue, Apr 12, 2022 at 10:26 AM Peter Smith  wrote:
>
> On Thu, Apr 7, 2022 at 2:09 PM Peter Smith  wrote:
> >
> > FYI, here is a test script that is using the current patch (v6) to
> > demonstrate a way to share table data between different numbers of
> > nodes (up to 5 of them here).
> >
> > The script starts off with just 2-way sharing (nodes N1, N2),
> > then expands to 3-way sharing (nodes N1, N2, N3),
> > then 4-way sharing (nodes N1, N2, N3, N4),
> > then 5-way sharing (nodes N1, N2, N3, N4, N5).
> >
> > As an extra complication, for this test, all 5 nodes have different
> > initial table data, which gets replicated to the others whenever each
> > new node joins the existing share group.
> >
> > PSA.
> >
>
>
> Hi Vignesh. I had some problems getting the above test script working.
> It was OK up until I tried to join the 5th node (N5) to the existing 4
> nodes. The ERROR was manifesting itself strangely because it appeared
> that there was an index violation in the pg_subscription_rel catalog
> even though AFAIK the N5 did not have any entries in it.
>
>
> e.g.
> 2022-04-07 09:13:28.361 AEST [24237] ERROR: duplicate key value
> violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
> 2022-04-07 09:13:28.361 AEST [24237] DETAIL: Key (srrelid,
> srsubid)=(16384, 16393) already exists.
> 2022-04-07 09:13:28.361 AEST [24237] STATEMENT: create subscription
> sub51 connection 'port=7651' publication pub1 with
> (subscribe_local_only=true,copy_data=force);
> 2022-04-07 09:13:28.380 AEST [24237] ERROR: duplicate key value
> violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
> 2022-04-07 09:13:28.380 AEST [24237] DETAIL: Key (srrelid,
> srsubid)=(16384, 16394) already exists.
> 2022-04-07 09:13:28.380 AEST [24237] STATEMENT: create subscription
> sub52 connection 'port=7652' publication pub2 with
> (subscribe_local_only=true,copy_data=false);
> 2022-04-07 09:13:28.405 AEST [24237] ERROR: duplicate key value
> violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
> 2022-04-07 09:13:28.405 AEST [24237] DETAIL: Key (srrelid,
> srsubid)=(16384, 16395) already exists.
> 2022-04-07 09:13:28.405 AEST [24237] STATEMENT: create subscription
> sub53 connection 'port=7653' publication pub3 with
> (subscribe_local_only=true,copy_data=false);
> 2022-04-07 09:13:28.425 AEST [24237] ERROR: duplicate key value
> violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
> 2022-04-07 09:13:28.425 AEST [24237] DETAIL: Key (srrelid,
> srsubid)=(16384, 16396) already exists.
> 2022-04-07 09:13:28.425 AEST [24237] STATEMENT: create subscription
> sub54 connection 'port=7654' publication pub4 with
> (subscribe_local_only=true,copy_data=false);
> 2022-04-07 09:17:52.472 AEST [25852] ERROR: duplicate key value
> violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
> 2022-04-07 09:17:52.472 AEST [25852] DETAIL: Key (srrelid,
> srsubid)=(16384, 16397) already exists.
> 2022-04-07 09:17:52.472 AEST [25852] STATEMENT: create subscription
> sub51 connection 'port=7651' publication pub1;
>
> ~~~
>
> When I debugged this it seemed like each CREATE SUBSCRIPTION was
> trying to make a double entry, because fetch_tables (your patch
> v6-0002 modified its SQL) was returning the same table 2x.
>
> (gdb) bt
> #0 errfinish (filename=0xbc1057 "nbtinsert.c", lineno=671,
> funcname=0xbc25e0 <__func__.15798> "_bt_check_unique") at elog.c:510
> #1 0x00526d83 in _bt_check_unique (rel=0x7f654219c2a0,
> insertstate=0x7ffd9629ddd0, heapRel=0x7f65421b0e28,
> checkUnique=UNIQUE_CHECK_YES, is_unique=0x7ffd9629de01,
> speculativeToken=0x7ffd9629ddcc) at nbtinsert.c:664
> #2 0x00526157 in _bt_doinsert (rel=0x7f654219c2a0,
> itup=0x19ea8e8, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false,
> heapRel=0x7f65421b0e28) at nbtinsert.c:208
> #3 0x0053450e in btinsert (rel=0x7f654219c2a0,
> values=0x7ffd9629df10, isnull=0x7ffd9629def0, ht_ctid=0x19ea894,
> heapRel=0x7f65421b0e28, checkUnique=UNIQUE_CHECK_YES,
> indexUnchanged=false, indexInfo=0x19dea80) at nbtree.c:201
> #4 0x005213b6 in index_insert (indexRelation=0x7f654219c2a0,
> values=0x7ffd9629df10, isnull=0x7ffd9629def0, heap_t_ctid=0x19ea894,
> heapRelation=0x7f65421b0e28,  checkUnique=UNIQUE_CHECK_YES,
> indexUnchanged=false, indexInfo=0x19dea80) at indexam.c:193
> #5 0x005c81d5 in CatalogIndexInsert (indstate=0x19de540,
> heapTuple=0x19ea890) at indexing.c:158
> #6 0x005c8325 in CatalogTupleInsert (heapRel=0x7f65421b0e28,
> tup=0x19ea890) at indexing.c:231
> #7 0x005f0170 in AddSubscriptionRelState (subid=16400,
> relid=16384, state=105 'i', sublsn=0) at pg_subscription.c:315
> #8 0x006d6fa5 in CreateSubscription (pstate=0x1942dc0,
> stmt=0x191f6a0, isTopLevel=true) at subscriptioncmds.c:767
>
> ~~
>
> Aside: All this was happening when I did not have enough logical
> replication workers configured. (There were WARNINGS in the logfile
> that I had not noticed).

Re: Skipping schema changes in publication

2022-04-11 Thread Amit Kapila
On Tue, Apr 12, 2022 at 11:53 AM vignesh C  wrote:
>
> On Sat, Mar 26, 2022 at 7:37 PM vignesh C  wrote:
> >
> > On Tue, Mar 22, 2022 at 12:38 PM vignesh C  wrote:
> > >
> > > Hi,
> > >
> > > This feature adds an option to skip changes of all tables in specified
> > > schema while creating publication.
> > > This feature is helpful for use cases where the user wants to
> > > subscribe to all the changes except for the changes present in a few
> > > schemas.
> > > Ex:
> > > CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
> > > OR
> > > ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;
> > >
> > > A new column pnskip is added to table "pg_publication_namespace", to
> > > maintain the schemas that the user wants to skip publishing through
> > > the publication. Modified the output plugin (pgoutput) to skip
> > > publishing the changes if the relation is part of skip schema
> > > publication.
> > > As a continuation to this, I will work on implementing skipping tables
> > > from all tables in schema and skipping tables from all tables
> > > publication.
> > >
> > > Attached patch has the implementation for this.
> >
> > The patch was not applying on top of HEAD because of the recent
> > commits, attached patch is rebased on top of HEAD.
>
> The patch does not apply on top of HEAD because of the recent commit,
> attached patch is rebased on top of HEAD.
>
> I have also included the implementation for skipping a few tables from
> all tables publication, the 0002 patch has the implementation for the
> same.
> This feature is helpful for use cases where the user wants to
> subscribe to all the changes except for the changes present in a few
> tables.
> Ex:
> CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2;
> OR
> ALTER PUBLICATION pub1 ADD SKIP  TABLE t1,t2;
>

For the second syntax (Alter Publication ...), isn't it better to
avoid using ADD? It looks odd to me because we are not adding anything
to the publication with this syntax.


-- 
With Regards,
Amit Kapila.




Re: avoid multiple hard links to same WAL file after a crash

2022-04-11 Thread Kyotaro Horiguchi
At Mon, 11 Apr 2022 09:52:57 -0700, Nathan Bossart  
wrote in 
> On Mon, Apr 11, 2022 at 12:28:47PM -0400, Tom Lane wrote:
> > Robert Haas  writes:
> >> On Mon, Apr 11, 2022 at 5:12 AM Kyotaro Horiguchi
> >>  wrote:
> >>> If this diagnosis is correct, the comment is proved to be paranoid.
> > 
> >> It's sometimes difficult to understand what problems really old code
> >> comments are worrying about. For example, could they have been
> >> worrying about bugs in the code? Could they have been worrying about
> >> manual interference with the pg_wal directory? It's hard to know.
> > 
> > "git blame" can be helpful here, if you trace back to when the comment
> > was written and then try to find the associated mailing-list discussion.
> > (That leap can be difficult for commits pre-dating our current
> > convention of including links in the commit message, but it's usually
> > not *that* hard to locate contemporaneous discussion.)
> 
> I traced this back a while ago.  I believe the link() was first added in
> November 2000 as part of f0e37a8.  This even predates WAL recycling, which
> was added in July 2001 as part of 7d4d5c0.

f0e37a8 lacks discussion; it introduced the CHECKPOINT command from
somewhere outside the mailing list.  That patch changed XLogFileInit to
support using existing files, so that XLogWrite can use the new segment
provided by checkpoint while still allowing XLogWrite to create a new
segment by itself.

Just before the commit, calls to XLogFileInit were protected (or
serialized) by logwr_lck.  After the commit, calls to the same function
were still serialized, by ControlFileLockId.

I *guess* that Vadim faced/noticed a race condition when he added
checkpoint, and thus introduced the link+remove protocol, but it
finally became useless once the call to XLogFileInit was moved within
the ControlFileLockId section.  But, of course, all of this story is
mere guesswork.
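To make the guessed race concrete, here is a hedged Python sketch of the two protocols (filenames illustrative): link() refuses to overwrite an existing target, so two processes racing to install the same WAL segment get an error instead of silently losing a file — which is the only advantage over a plain rename, and it disappears once all callers hold the same lock.

```python
import os
import tempfile


def install_segment_link(src, dst):
    # The old link()+unlink() protocol: link() fails with EEXIST rather
    # than overwriting, so a concurrent creator of dst is detected.
    os.link(src, dst)      # raises FileExistsError if dst already exists
    os.unlink(src)


def install_segment_rename(src, dst):
    # rename() overwrites dst silently; this is only safe if callers are
    # serialized by a lock, which the surrounding code now guarantees.
    os.replace(src, dst)
```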

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: make MaxBackends available in _PG_init

2022-04-11 Thread Julien Rouhaud
Hi,

On Mon, Apr 11, 2022 at 02:14:35PM -0700, Nathan Bossart wrote:
> On Mon, Apr 11, 2022 at 01:44:42PM -0700, Nathan Bossart wrote:
> > On Mon, Apr 11, 2022 at 04:36:36PM -0400, Robert Haas wrote:
> >> If we throw an error while defining_custom_guc is true, how will it
> >> ever again become false?
> > 
> > Ah, I knew I was forgetting something this morning.
> > 
> > It won't, but the only place it is presently needed is when loading
> > shared_preload_libraries, so I believe startup will fail anyway.  However,
> > I can see defining_custom_guc being used elsewhere, so that is probably not
> > good enough.  Another approach could be to add a static
> > set_config_option_internal() function with a boolean argument to indicate
> > whether it is being used while defining a custom GUC.  I'll adjust 0003
> > with that approach unless a better idea prevails.
> 
> Here's a new patch set.  I've only changed 0003 as described above.  My
> apologies for the unnecessary round trip.

It looks sensible to me, although I think I would apply 0003 before 0002.

+   if (process_shared_preload_libraries_in_progress &&
+   !allow_when_loading_preload_libs)
+   ereport(ERROR,
+   (errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
+errmsg("cannot change parameters while loading "
+   "\"shared_preload_libraries\"")));
+

I think the error message should at least mention which GUC is being modified.
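To illustrate the suggestion, a hypothetical sketch (Python stand-in, names invented, not the actual guc.c code) of what including the offending parameter in the message would look like:

```python
def set_config_option(name, value, *, loading_preload_libs=False,
                      allow_when_loading_preload_libs=False):
    # Reject changes made while shared_preload_libraries is being loaded,
    # naming the parameter so the failure is actionable for the user.
    if loading_preload_libs and not allow_when_loading_preload_libs:
        raise RuntimeError(
            f'cannot change parameter "{name}" while loading '
            f'"shared_preload_libraries"')
    return (name, value)
```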




Re: Handle infinite recursion in logical replication setup

2022-04-11 Thread Peter Smith
On Thu, Apr 7, 2022 at 2:09 PM Peter Smith  wrote:
>
> FYI, here is a test script that is using the current patch (v6) to
> demonstrate a way to share table data between different numbers of
> nodes (up to 5 of them here).
>
> The script starts off with just 2-way sharing (nodes N1, N2),
> then expands to 3-way sharing (nodes N1, N2, N3),
> then 4-way sharing (nodes N1, N2, N3, N4),
> then 5-way sharing (nodes N1, N2, N3, N4, N5).
>
> As an extra complication, for this test, all 5 nodes have different
> initial table data, which gets replicated to the others whenever each
> new node joins the existing share group.
>
> PSA.
>


Hi Vignesh. I had some problems getting the above test script working.
It was OK up until I tried to join the 5th node (N5) to the existing 4
nodes. The ERROR was manifesting itself strangely because it appeared
that there was an index violation in the pg_subscription_rel catalog
even though AFAIK the N5 did not have any entries in it.


e.g.
2022-04-07 09:13:28.361 AEST [24237] ERROR: duplicate key value
violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
2022-04-07 09:13:28.361 AEST [24237] DETAIL: Key (srrelid,
srsubid)=(16384, 16393) already exists.
2022-04-07 09:13:28.361 AEST [24237] STATEMENT: create subscription
sub51 connection 'port=7651' publication pub1 with
(subscribe_local_only=true,copy_data=force);
2022-04-07 09:13:28.380 AEST [24237] ERROR: duplicate key value
violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
2022-04-07 09:13:28.380 AEST [24237] DETAIL: Key (srrelid,
srsubid)=(16384, 16394) already exists.
2022-04-07 09:13:28.380 AEST [24237] STATEMENT: create subscription
sub52 connection 'port=7652' publication pub2 with
(subscribe_local_only=true,copy_data=false);
2022-04-07 09:13:28.405 AEST [24237] ERROR: duplicate key value
violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
2022-04-07 09:13:28.405 AEST [24237] DETAIL: Key (srrelid,
srsubid)=(16384, 16395) already exists.
2022-04-07 09:13:28.405 AEST [24237] STATEMENT: create subscription
sub53 connection 'port=7653' publication pub3 with
(subscribe_local_only=true,copy_data=false);
2022-04-07 09:13:28.425 AEST [24237] ERROR: duplicate key value
violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
2022-04-07 09:13:28.425 AEST [24237] DETAIL: Key (srrelid,
srsubid)=(16384, 16396) already exists.
2022-04-07 09:13:28.425 AEST [24237] STATEMENT: create subscription
sub54 connection 'port=7654' publication pub4 with
(subscribe_local_only=true,copy_data=false);
2022-04-07 09:17:52.472 AEST [25852] ERROR: duplicate key value
violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
2022-04-07 09:17:52.472 AEST [25852] DETAIL: Key (srrelid,
srsubid)=(16384, 16397) already exists.
2022-04-07 09:17:52.472 AEST [25852] STATEMENT: create subscription
sub51 connection 'port=7651' publication pub1;

~~~

When I debugged this it seemed like each CREATE SUBSCRIPTION was
trying to make a double entry, because fetch_tables (your patch
v6-0002 modified its SQL) was returning the same table 2x.

(gdb) bt
#0 errfinish (filename=0xbc1057 "nbtinsert.c", lineno=671,
funcname=0xbc25e0 <__func__.15798> "_bt_check_unique") at elog.c:510
#1 0x00526d83 in _bt_check_unique (rel=0x7f654219c2a0,
insertstate=0x7ffd9629ddd0, heapRel=0x7f65421b0e28,
checkUnique=UNIQUE_CHECK_YES, is_unique=0x7ffd9629de01,
speculativeToken=0x7ffd9629ddcc) at nbtinsert.c:664
#2 0x00526157 in _bt_doinsert (rel=0x7f654219c2a0,
itup=0x19ea8e8, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false,
heapRel=0x7f65421b0e28) at nbtinsert.c:208
#3 0x0053450e in btinsert (rel=0x7f654219c2a0,
values=0x7ffd9629df10, isnull=0x7ffd9629def0, ht_ctid=0x19ea894,
heapRel=0x7f65421b0e28, checkUnique=UNIQUE_CHECK_YES,
indexUnchanged=false, indexInfo=0x19dea80) at nbtree.c:201
#4 0x005213b6 in index_insert (indexRelation=0x7f654219c2a0,
values=0x7ffd9629df10, isnull=0x7ffd9629def0, heap_t_ctid=0x19ea894,
heapRelation=0x7f65421b0e28,  checkUnique=UNIQUE_CHECK_YES,
indexUnchanged=false, indexInfo=0x19dea80) at indexam.c:193
#5 0x005c81d5 in CatalogIndexInsert (indstate=0x19de540,
heapTuple=0x19ea890) at indexing.c:158
#6 0x005c8325 in CatalogTupleInsert (heapRel=0x7f65421b0e28,
tup=0x19ea890) at indexing.c:231
#7 0x005f0170 in AddSubscriptionRelState (subid=16400,
relid=16384, state=105 'i', sublsn=0) at pg_subscription.c:315
#8 0x006d6fa5 in CreateSubscription (pstate=0x1942dc0,
stmt=0x191f6a0, isTopLevel=true) at subscriptioncmds.c:767

~~
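A toy model of the failure mode (Python, hypothetical names): pg_subscription_rel behaves like a map keyed on (srrelid, srsubid), so if fetch_tables() returns the same relation twice, the second insert hits the unique index. Deduplicating the fetched list (e.g. DISTINCT in the SQL) would avoid the error, though it wouldn't explain why the duplicate appears in the first place.

```python
def add_subscription_rel_states(fetched_relids, subid, catalog):
    # catalog models the unique index pg_subscription_rel_srrelid_srsubid_index.
    for relid in fetched_relids:
        key = (relid, subid)
        if key in catalog:
            raise KeyError(f"duplicate key value: (srrelid, srsubid)={key}")
        catalog[key] = "i"   # SUBREL_STATE_INIT


def add_deduplicated(fetched_relids, subid, catalog):
    # The same insert loop over a deduplicated list never trips the index.
    add_subscription_rel_states(dict.fromkeys(fetched_relids), subid, catalog)
```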

Aside: All this was happening when I did not have enough logical
replication workers configured. (There were WARNINGS in the logfile
that I had not noticed).
When I fixed the configuration, all these other problems went away!
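For anyone running the script: the relevant settings are the ones below (values are illustrative only, not what the test requires; the defaults of 4 logical replication workers and 8 worker processes are easy to exhaust with five nodes).

```
# postgresql.conf -- illustrative values only
max_logical_replication_workers = 16   # default 4; one apply worker per subscription
max_worker_processes = 24              # must cover the workers above, plus autovacuum etc.
max_sync_workers_per_subscription = 2  # default; tablesync workers per subscription
```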

~~

So to summarize, I'm not sure if fetch_tables still has some
potential problem lurking or not, but I feel that the SQL in that
function maybe needs a closer look

Documentation issue with pg_stat_recovery_prefetch

2022-04-11 Thread sirisha chamarthi
Hi,

I was going through pg_stat_recovery_prefetch documentation and saw an
issue with formatting. Attached a small patch to fix the issue. This is the
first time I am sending an email to hackers. Please educate me if I
missed something.

https://www.postgresql.org/docs/devel/monitoring-stats.html#PG-STAT-RECOVERY-PREFETCH-VIEW

Thanks,
Sirisha


0001-Fix-documentation-bug-for-pg_stat_recovery_prefetch.patch
Description: Binary data


Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-04-11 Thread Amit Kapila
On Mon, Apr 11, 2022 at 6:33 PM Bharath Rupireddy
 wrote:
>
> On Mon, Apr 11, 2022 at 4:21 PM Amit Kapila  wrote:
> >
> > On Thu, Apr 7, 2022 at 3:35 PM Bharath Rupireddy
> >  wrote:
> > >
> >
> > I am facing the below doc build failure on my machine due to this work:
> >
> > ./filelist.sgml:
> > Tabs appear in SGML/XML files
> > make: *** [check-tabs] Error 1
> >
> > The attached patch fixes this for me.
>
> Thanks. It looks like there's a TAB in between. Your patch LGTM.
>
> I'm wondering why this hasn't been caught by the buildfarm members
> (or it may have been found and I'm just failing to locate it).
>
> Can you please provide me with the doc build command to catch these
> kinds of errors?
>

Nothing special. In the doc/src/sgml, I did make clean followed by make check.


-- 
With Regards,
Amit Kapila.




Re: PG DOCS - logical replication filtering

2022-04-11 Thread Amit Kapila
On Mon, Apr 11, 2022 at 11:03 PM Euler Taveira  wrote:
>
> On Mon, Apr 11, 2022, at 7:40 AM, Amit Kapila wrote:
>
> Regarding the new examples (for partitioned tables), shouldn't we move the
> parent / child definitions to the beginning of the Examples section?
>

I think that will make examples less clear.

> It seems
> confusing to use the same code snippet to show repeated table definitions
> (publisher and subscriber). I checked quickly and after a few seconds realized
> that the example is not wrong, but the database names have a small difference
> (one letter, "s" vs. "p").
>

Can you be more specific? AFAICS, the dbname used (testpub) is the same.

> The publication and subscription definitions are fine there.
>
> I think reusing the same tables and publication introduces complexity.
> Shouldn't we just use different tables and publication to provide an "easy"
> example? It would avoid DROP PUBLICATION, ALTER SUBSCRIPTION and TRUNCATE.
>

I don't know. I find the current way understandable. I feel using
different names won't gain much and would make the example more
difficult to follow.

-- 
With Regards,
Amit Kapila.




Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-11 Thread Michael Paquier
On Mon, Apr 11, 2022 at 06:48:58PM +1200, Thomas Munro wrote:
> 1.  This test had some pre-existing bugs/races, which hadn't failed
> before due to scheduling, even under Valgrind.  The above changes
> appear to fix those problems.  To Michael for comment.

Yeah, there are two problems here.  From what I can see, ensuring the
execution of archive_cleanup_command on the standby needs the
checkpoint on the primary and the restart point on the standby.  So
pg_current_wal_lsn() should be located after the primary's checkpoint
and not before it, so that we are sure the checkpoint record finds
its way to the standby.  That's what Tom mentioned upthread.

The second problem is to make sure that $standby2 sees the promotion
of $standby and its history file, but we also want to recover
0002.history from some archives to create a RECOVERYHISTORY at
recovery for the purpose of the test.  Switching to a new segment as
proposed by Andres does not seem completely right to me because we are
not 100% sure of the order in which archiving is going to happen, no?
I think that the logic to create $standby2 from the initial backup of
the primary is right, because there is no 0002.history in it, but
we also need to be sure that 0002.history has been archived once
the promotion of $standby is done.  This can be validated thanks to
the logs, actually.

>> What is that second test really testing?
>>
>> # Check the presence of temporary files specifically generated during
>> # archive recovery.  To ensure the presence of the temporary history
>> # file, switch to a timeline large enough to allow a standby to recover
>> # a history file from an archive.  As this requires at least two timeline
>> # switches, promote the existing standby first.  Then create a second
>> # standby based on the promoted one.  Finally, the second standby is
>> # promoted.
>>
>> Note "Then create a second standby based on the promoted one." - but that's
>> not actually what's happening:
> 
> 2.  There may also be other problems with the test but those aren't
> relevant to skink's failure, which starts on the 5th test.  To Michael
> for comment.

This comes from df86e52, where we want to recover a history file that
would be created as RECOVERYHISTORY and make sure that the file gets
removed at the end of recovery.  So $standby2 should choose a new
timeline different from the one chosen by $standby.  Looking back
at what has been done, it seems to me that the comment is the
incorrect part:
https://www.postgresql.org/message-id/20190930080340.go2...@paquier.xyz

All that stuff leads me to the attached.  Thoughts?
--
Michael
diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl
index c8f5ffbaf0..45aafcb35c 100644
--- a/src/test/recovery/t/002_archiving.pl
+++ b/src/test/recovery/t/002_archiving.pl
@@ -24,6 +24,8 @@ $node_primary->backup($backup_name);
 
 # Initialize standby node from backup, fetching WAL from archives
 my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+# Note that this makes the standby archive its contents on the archives
+# of the primary.
 $node_standby->init_from_backup($node_primary, $backup_name,
 	has_restoring => 1);
 $node_standby->append_conf('postgresql.conf',
@@ -44,13 +46,16 @@ $node_standby->start;
 # Create some content on primary
 $node_primary->safe_psql('postgres',
 	"CREATE TABLE tab_int AS SELECT generate_series(1,1000) AS a");
-my $current_lsn =
-  $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
 
 # Note the presence of this checkpoint for the archive_cleanup_command
 # check done below, before switching to a new segment.
 $node_primary->safe_psql('postgres', "CHECKPOINT");
 
+# Done after the checkpoint to ensure that the checkpoint gets replayed
+# on the standby.
+my $current_lsn =
+  $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
+
 # Force archiving of WAL file to make it present on primary
 $node_primary->safe_psql('postgres', "SELECT pg_switch_wal()");
 
@@ -81,10 +86,34 @@ ok( !-f "$data_dir/$recovery_end_command_file",
 # file, switch to a timeline large enough to allow a standby to recover
 # a history file from an archive.  As this requires at least two timeline
 # switches, promote the existing standby first.  Then create a second
-# standby based on the promoted one.  Finally, the second standby is
-# promoted.
+# standby based on the primary, using its archives.  Finally, the second
+# standby is promoted.
 $node_standby->promote;
 
+# Wait until the history file has been archived on the archives of the
+# primary once the promotion of the standby completes.
+my $primary_archive = $node_primary->archive_dir;
+my $max_attempts = 10 * $PostgreSQL::Test::Utils::timeout_default;
+my $attempts = 0;
+while ($attempts < $max_attempts)
+{
+	last if (-e "$primary_archive/0002.history");
+
+	# Wait 0.1 second before retrying.
+	usleep(100_000);
+
+	$attempts++;
+}
+
+if ($attempts >= $max_attempts)
+{
+

Re: API stability

2022-04-11 Thread Kyotaro Horiguchi
At Mon, 11 Apr 2022 12:48:25 +0200, Matthias van de Meent 
 wrote in 
> On Mon, 11 Apr 2022 at 06:30, Kyotaro Horiguchi  
> wrote:
> I won't speak for Robert H., but this might be because of gmail not
> putting this mail in the right thread: Your mail client dropped the
> "[was: pgsql: ...]" tag, which Gmail subsequently displays as a
> different thread (that is, in my Gmail UI there's three "Re: API
> stability" threads, one of which has the [was: pgsql: ...]-tag, and
> two of which seem to be started by you as a reply on the original
> thread, but with the [was: pgsql: ...]-tag dropped and thus considered
> a new thread).

Mmm, d*** gmail..  My main mailer does that by default, but I think I
can *fix* that behavior.

Thanks!

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: row filtering for logical replication

2022-04-11 Thread Peter Smith
On Tue, Apr 12, 2022 at 11:31 AM houzj.f...@fujitsu.com
 wrote:
>
> On Tuesday, April 12, 2022 8:40 AM Peter Smith  wrote:
> >
> > FYI, I was playing with row filters and partitions recently, and while doing
> > something a bit unusual I received a cache leak warning.
> >
> > Below are the steps to reproduce it:
> >
> >
> > test_pub=# CREATE TABLE parent(a int primary key) PARTITION BY RANGE(a);
> > CREATE TABLE
> >
> > test_pub=# CREATE TABLE child PARTITION OF parent DEFAULT; CREATE TABLE
> >
> > test_pub=# CREATE PUBLICATION p4 FOR TABLE parent WHERE (a < 5), child
> > WHERE (a >= 5) WITH (publish_via_partition_root=true);
> > CREATE PUBLICATION
> >
> > test_pub=# ALTER PUBLICATION p4 SET TABLE parent, child WHERE (a >= 5);
> > ALTER PUBLICATION
> >
> > test_pub=# ALTER PUBLICATION p4 SET (publish_via_partition_root = false);
> > 2022-04-11 17:37:58.426 AEST [28152] WARNING:  cache reference leak:
> > cache pg_publication_rel (49), tuple 0/12 has count 1
> > WARNING:  cache reference leak: cache pg_publication_rel (49), tuple
> > 0/12 has count 1
> > ALTER PUBLICATION
>
> Thanks for reporting.
>
> I think the reason is that we didn't invoke ReleaseSysCache when rftuple is
> valid and no filter exists. We need to release the tuple whenever the
> rftuple is valid. Attached a patch which fixes this.
>

Thanks! Your patch could be applied cleanly, and the reported problem
now seems fixed.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




RE: row filtering for logical replication

2022-04-11 Thread houzj.f...@fujitsu.com
On Tuesday, April 12, 2022 8:40 AM Peter Smith  wrote:
> 
> FYI, I was playing with row filters and partitions recently, and while doing
> something a bit unusual I received a cache leak warning.
> 
> Below are the steps to reproduce it:
> 
> 
> test_pub=# CREATE TABLE parent(a int primary key) PARTITION BY RANGE(a);
> CREATE TABLE
> 
> test_pub=# CREATE TABLE child PARTITION OF parent DEFAULT; CREATE TABLE
> 
> test_pub=# CREATE PUBLICATION p4 FOR TABLE parent WHERE (a < 5), child
> WHERE (a >= 5) WITH (publish_via_partition_root=true);
> CREATE PUBLICATION
> 
> test_pub=# ALTER PUBLICATION p4 SET TABLE parent, child WHERE (a >= 5);
> ALTER PUBLICATION
> 
> test_pub=# ALTER PUBLICATION p4 SET (publish_via_partition_root = false);
> 2022-04-11 17:37:58.426 AEST [28152] WARNING:  cache reference leak:
> cache pg_publication_rel (49), tuple 0/12 has count 1
> WARNING:  cache reference leak: cache pg_publication_rel (49), tuple
> 0/12 has count 1
> ALTER PUBLICATION

Thanks for reporting.

I think the reason is that we didn't invoke ReleaseSysCache when rftuple is
valid and no filter exists. We need to release the tuple whenever the
rftuple is valid. Attached a patch which fixes this.
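A minimal Python model of the leak (names hypothetical, not the actual PostgreSQL code): every successful SearchSysCache() bumps a reference count that only ReleaseSysCache() drops, so an early return on the no-filter path leaves the count at 1 — exactly what the "cache reference leak ... has count 1" warning reports.

```python
class SysCacheTuple:
    def __init__(self):
        self.refcount = 0


def search_syscache(tup):
    tup.refcount += 1          # SearchSysCache() pins the tuple
    return tup


def release_syscache(tup):
    tup.refcount -= 1          # ReleaseSysCache() unpins it


def alter_publication_buggy(tup, has_row_filter):
    search_syscache(tup)
    if not has_row_filter:
        return                 # early return skips the release -> leak
    release_syscache(tup)


def alter_publication_fixed(tup, has_row_filter):
    search_syscache(tup)
    try:
        if not has_row_filter:
            return
    finally:
        release_syscache(tup)  # released on every path out of the function
```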

Best regards,
Hou zj


0001-Fix-missed-ReleaseSysCache-in-AlterPublicationOption.patch
Description:  0001-Fix-missed-ReleaseSysCache-in-AlterPublicationOption.patch


Re: Skip partition tuple routing with constant partition key

2022-04-11 Thread Masahiko Sawada
Hi,

On Thu, Apr 7, 2022 at 4:37 PM Andres Freund  wrote:
>
> Hi,
>
> On 2022-04-06 00:07:07 -0400, Tom Lane wrote:
> > Amit Langote  writes:
> > > On Sun, Apr 3, 2022 at 10:31 PM Greg Stark  wrote:
> > >> Is this a problem with the patch or its tests?
> > >> [18:14:20.798] Test Summary Report
> > >> [18:14:20.798] ---
> > >> [18:14:20.798] t/013_partition.pl (Wstat: 15360 Tests: 31 Failed: 0)
> >
> > > Hmm, make check-world passes for me after rebasing the patch (v10) to
> > > the latest HEAD (clean), nor do I see a failure on cfbot:
> > > http://cfbot.cputube.org/amit-langote.html
> >
> > 013_partition.pl has been failing regularly in the buildfarm,
> > most recently here:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2022-03-31%2000%3A49%3A45
>
> Just failed locally on my machine as well.
>
>
> > I don't think there's room to blame any uncommitted patches
> > for that.  Somebody broke it a short time before here:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-03-17%2016%3A08%3A19
>
> The obvious thing to point a finger at is
>
> commit c91f71b9dc91ef95e1d50d6d782f477258374fc6
> Author: Tomas Vondra 
> Date:   2022-03-16 16:42:47 +0100
>
> Fix publish_as_relid with multiple publications
>

I've not managed to reproduce this issue on my machine but while
reviewing the code and the server logs[1] I may have found possible
bugs:

2022-04-08 12:59:30.701 EDT [91997:1] LOG:  logical replication apply
worker for subscription "sub2" has started
2022-04-08 12:59:30.702 EDT [91998:3] 013_partition.pl LOG:
statement: ALTER SUBSCRIPTION sub2 SET PUBLICATION pub_lower_level,
pub_all
2022-04-08 12:59:30.733 EDT [91998:4] 013_partition.pl LOG:
disconnection: session time: 0:00:00.036 user=buildfarm
database=postgres host=[local]
2022-04-08 12:59:30.740 EDT [92001:1] LOG:  logical replication table
synchronization worker for subscription "sub2", table "tab4_1" has
started
2022-04-08 12:59:30.744 EDT [91997:2] LOG:  logical replication apply
worker for subscription "sub2" will restart because of a parameter
change
2022-04-08 12:59:30.750 EDT [92003:1] LOG:  logical replication table
synchronization worker for subscription "sub2", table "tab3" has
started

The logs say that the apply worker for "sub2" finished whereas the
tablesync workers for "tab4_1" and "tab3" started. After these logs,
there are no logs showing that these tablesync workers finished and
the apply worker for "sub2" restarted, until the timeout. While
reviewing the code, I realized that the tablesync workers can advance
their relstate even without the apply worker's intervention.

After a tablesync worker copies the table it sets
SUBREL_STATE_SYNCWAIT to its relstate, then it waits for the apply
worker to update the relstate to SUBREL_STATE_CATCHUP. If the apply
worker has already died, it breaks from the wait loop and returns
false:

wait_for_worker_state_change():

for (;;)
{
LogicalRepWorker *worker;

:

/*
 * Bail out if the apply worker has died, else signal it we're
 * waiting.
 */
LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
worker = logicalrep_worker_find(MyLogicalRepWorker->subid,
InvalidOid, false);
if (worker && worker->proc)
logicalrep_worker_wakeup_ptr(worker);
LWLockRelease(LogicalRepWorkerLock);
if (!worker)
break;

:
}

return false;

However, the caller doesn't check the return value at all:

/*
 * We are done with the initial data synchronization, update the state.
 */
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT;
MyLogicalRepWorker->relstate_lsn = *origin_startpos;
SpinLockRelease(&MyLogicalRepWorker->relmutex);

/*
 * Finally, wait until the main apply worker tells us to catch up and then
 * return to let LogicalRepApplyLoop do it.
 */
wait_for_worker_state_change(SUBREL_STATE_CATCHUP);
return slotname;

Therefore, the tablesync worker started logical replication while
keeping its relstate as SUBREL_STATE_SYNCWAIT.

Given the server logs, it's likely that both tablesync workers for
"tab4_1" and "tab3" were in this situation. That is, there were two
tablesync workers who were applying changes for the target relation
but the relstate was SUBREL_STATE_SYNCWAIT.

As for restarting the apply worker, that probably didn't happen
since there were already as many running tablesync workers as
max_sync_workers_per_subscription allows (2 by default):

logicalrep_worker_launch():

/*
 * If we reached the sync worker limit per subscription, just exit
 * silently as we might get here because of an otherwise harmless race
 * condition.
 */
if (nsyncworkers >= max_sync_workers_per_subscription)
{
LWLockRelease(LogicalRepWorkerLock);
return;
}

This scenar

Re: A qsort template

2022-04-11 Thread David Rowley
On Mon, 11 Apr 2022 at 22:11, John Naylor  wrote:
>
> On Mon, Apr 11, 2022 at 5:34 AM David Rowley  wrote:
>
> > With this particular test, v15 is about 15% *slower* than v14.  I
> > didn't know what to blame at first, so I tried commenting out the sort
> > specialisations and got the results in the red bars in the graph. This
> > made it about 7.5% *faster* than v14. So looks like this patch is to
> > blame.  I then hacked the comparator function that's used in the
> > specialisations for BIGINT to comment out the tiebreak to remove the
> > indirect function call, which happens to do nothing in this 1 column
> > sort case.  The aim here was to get an idea what the performance would
> > be if there was a specialisation for single column sorts. That's the
> > yellow bars, which show about 10% *faster* than master.
>
> Thanks for investigating! (I assume you meant 10% faster than v14?)

Yes, I did mean to say v14.   (I'm too used to comparing everything to master)

David




Re: row filtering for logical replication

2022-04-11 Thread Peter Smith
FYI, I was playing with row filters and partitions recently, and while
doing something a bit unusual I received a cache leak warning.

Below are the steps to reproduce it:


test_pub=# CREATE TABLE parent(a int primary key) PARTITION BY RANGE(a);
CREATE TABLE

test_pub=# CREATE TABLE child PARTITION OF parent DEFAULT;
CREATE TABLE

test_pub=# CREATE PUBLICATION p4 FOR TABLE parent WHERE (a < 5), child
WHERE (a >= 5) WITH (publish_via_partition_root=true);
CREATE PUBLICATION

test_pub=# ALTER PUBLICATION p4 SET TABLE parent, child WHERE (a >= 5);
ALTER PUBLICATION

test_pub=# ALTER PUBLICATION p4 SET (publish_via_partition_root = false);
2022-04-11 17:37:58.426 AEST [28152] WARNING:  cache reference leak:
cache pg_publication_rel (49), tuple 0/12 has count 1
WARNING:  cache reference leak: cache pg_publication_rel (49), tuple
0/12 has count 1
ALTER PUBLICATION

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: How about a psql backslash command to show GUCs?

2022-04-11 Thread Jonathan S. Katz

On 4/11/22 4:11 PM, Tom Lane wrote:

"Jonathan S. Katz"  writes:

My question is if we're only going to list out the settings that are
customized, are we going to:



1. Hide a setting if it matches a default value, even if a user set it
to be the default value? OR
2. Comment out all of the settings in a generated postgresql.conf file?


As committed, it prints anything that's shown as "source != 'default'"
in pg_settings, which means anything for which the value wasn't
taken from the wired-in default.  I suppose an alternative definition
could be "setting != boot_val".  Not really sure if that's better.

This idea does somewhat address my unhappiness upthread about printing
values with source = 'internal', but I see that it gets confused by
some GUCs with custom show hooks, like unix_socket_permissions.
Maybe it needs to be "source != 'default' AND setting != boot_val"?


Running through a few GUCs, that seems reasonable. Happy to test the 
patch out prior to commit to see if it renders better.


Jonathan




Re: random() function documentation

2022-04-11 Thread Dean Rasheed
On Mon, 11 Apr 2022 at 20:20, Tom Lane  wrote:
>
> >> How about we just say "uses a linear-feedback shift register algorithm"?

I think it'd be sufficient to just say that it's a deterministic
pseudorandom number generator. I don't see much value in documenting
the internal algorithm used.

> > Should we
> > perhaps also add a warning that the same seed is not guaranteed to
> > produce the same sequence across different (major?) versions?
>
> I wouldn't bother, on the grounds that then we'd need such disclaimers
> in a whole lot of places.  Others might see it differently though.

Agreed, though I think when the release notes are written, they ought
to warn that the sequence will change with this release.

Regards,
Dean




Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory

2022-04-11 Thread Michael Paquier
On Mon, Apr 11, 2022 at 01:21:23PM -0700, SATYANARAYANA NARLAPURAM wrote:
> Correct. The idea is to make sure the file is fully allocated before
> treating it as a current file.

Another problem comes to compression, as the pre-padding cannot be
applied in this case because zlib and lz4 don't know the size of the
compressed segment until we reach 16MB of data received, but you can
get a good estimate as long as you know how much space is left on a
device.  FWIW, I had to deal with this problem a couple of years ago
for the integration of an archiver in a certain thing, and the
requirement was that the WAL archiver service had to be maximally
self-aware and automated, which is what you wish to achieve here.  It
basically came down to measure how much WAL one wishes to keep in the
WAL archives for the sizing of the disk partition storing the
archives (aka how much back in time you want to go), in combination to
how much WAL would get produced on a rather-linear production load.

Another thing is that you never really want to stress your partition so
much that it gets filled to 100%, as there can be open files and the
like that consume more space than the actual amount of data stored;
you'd usually want to use only up to 70~90% of it.
end, we finished with:
- A dependency to statvfs(), which is not portable on WIN32, to find
out how much space was left on the partition (f_blocks*f_bsize for
the total size and f_bfree*f_bsize for the free size I guess, by
looking at its man page).
- Control the amount of WAL to keep around using a percentage rate of
maximum disk space allowed (or just a percentage of free disk space),
with pg_receivewal doing a cleanup of up to WalSegSz worth of data for
the oldest segments.  The segments of the oldest TLIs are removed
first.  For any compression algorithm, unlinking this much amount of
data is not necessary but that's fine as you usually just remove one
compressed or uncompressed segment per cycle, at it does not matter
with dozens of gigs worth of WAL archives, or even more.
--
Michael




Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

2022-04-11 Thread Andres Freund
Hi,

On 2022-04-12 10:33:28 +1200, Thomas Munro wrote:
> Huh.  I wouldn't have started a separate thread for this if I'd
> realised I was getting close to the cause of the CI failure...

:)


> Instead of bothering to create N different XXXPending variables for
> the different conflict "reasons", I used an array.  Other than that,
> it's much like existing examples.

It kind of bothers me that each pending conflict has its own external function
call. It doesn't actually cost anything, because it's quite unlikely that
there's more than one pending conflict.  Besides being aesthetically
displeasing, it also leads to an unnecessarily large amount of code, because the
calls to RecoveryConflictInterrupt() can't be merged...

But that's perhaps best fixed separately.


What might actually make more sense is to just have a bitmask or something?


> The existing use of the global variable RecoveryConflictReason seems a
> little woolly.  Doesn't it get clobbered every time a signal arrives,
> even if we determine that there is no conflict?  Not sure why that's
> OK, but anyway, this patch always sets it together with
> RecoveryConflictPending = true.

Yea. It's probably ok, kind of, because there shouldn't be multiple
outstanding conflicts with very few exceptions (deadlock and buffer pin). And
it doesn't matter that much which of those gets handled. And we'll retry
again. But brrr.


> +/*
> + * Check one recovery conflict reason.  This is called when the corresponding
> + * RecoveryConflictInterruptPending flags is set.  If we decide that a 
> conflict
> + * exists, then RecoveryConflictReason and RecoveryConflictPending will be 
> set,
> + * to be handled later in the same invocation of ProcessInterrupts().
> + */
> +static void
> +ProcessRecoveryConflictInterrupt(ProcSignalReason reason)
> +{
>   /*
>* Don't joggle the elbow of proc_exit
>*/
>   if (!proc_exit_inprogress)
>   {
> - RecoveryConflictReason = reason;
>   switch (reason)
>   {
>   case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
> @@ -3084,9 +3094,9 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
>   if (IsAbortedTransactionBlockState())
>   return;
>  
> + RecoveryConflictReason = reason;
>   RecoveryConflictPending = true;
>   QueryCancelPending = true;
> - InterruptPending = true;
>   break;
>   }
>  
> @@ -3094,9 +3104,9 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
>   /* FALLTHROUGH */
>  
>   case PROCSIG_RECOVERY_CONFLICT_DATABASE:
> + RecoveryConflictReason = reason;
>   RecoveryConflictPending = true;
>   ProcDiePending = true;
> - InterruptPending = true;
>   break;
>  
>   default:
> @@ -3115,15 +3125,6 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
>   if (reason == PROCSIG_RECOVERY_CONFLICT_DATABASE)
>   RecoveryConflictRetryable = false;
>   }

It's pretty weird that we set all this state here and then, a short while
later in ProcessInterrupts(), just check whether it's been set.

Seems like it'd make more sense to throw the error in
ProcessRecoveryConflictInterrupt(), now that it's not in a signal handler
anymore?


>  /*
> @@ -3147,6 +3148,22 @@ ProcessInterrupts(void)
>   return;
>   InterruptPending = false;
>  
> + /*
> +  * Have we been asked to check for a recovery conflict?  Processing 
> these
> +  * interrupts may result in RecoveryConflictPending and related 
> variables
> +  * being set, to be handled further down.
> +  */
> + for (int i = PROCSIG_RECOVERY_CONFLICT_FIRST;
> +  i <= PROCSIG_RECOVERY_CONFLICT_LAST;
> +  ++i)
> + {
> + if (RecoveryConflictInterruptPending[i])
> + {
> + RecoveryConflictInterruptPending[i] = false;
> + ProcessRecoveryConflictInterrupt(i);
> + }
> + }

Hm. This seems like it shouldn't be in ProcessInterrupts(). How about
calling a wrapper that does all this when RecoveryConflictPending is set?


Greetings,

Andres Freund




Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}

2022-04-11 Thread Michael Paquier
On Mon, Apr 11, 2022 at 11:15:46AM -0400, Robert Haas wrote:
> +1 for this in general, but I think that naming like
> "compression_algo" stinks. If you think "compression_algorithm" is too
> long, I think you should use "algorithm" or "compression" or
> "compression_method" or something.

Yes, I found "compression_algorithm" to be too long initially.  For
walmethods.c and pg_receivewal.c, it may be better to just stick to
"algorithm"  then, at least that's consistent with pg_basebackup.c.
--
Michael




Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

2022-04-11 Thread Thomas Munro
On Sun, Apr 10, 2022 at 11:00 AM Andres Freund  wrote:
> On 2022-04-09 14:39:16 -0700, Andres Freund wrote:
> > On 2022-04-09 17:00:41 -0400, Tom Lane wrote:
> > > Thomas Munro  writes:
> > > > Unlike most "procsignal" handler routines, RecoveryConflictInterrupt()
> > > > doesn't just set a sig_atomic_t flag and poke the latch.  Is the extra
> > > > stuff it does safe?  For example, is this call stack OK (to pick one
> > > > that jumps out, but not the only one)?
> > >
> > > > procsignal_sigusr1_handler
> > > > -> RecoveryConflictInterrupt
> > > >  -> HoldingBufferPinThatDelaysRecovery
> > > >   -> GetPrivateRefCount
> > > >-> GetPrivateRefCountEntry
> > > > -> hash_search(...hash table that might be in the middle of an 
> > > > update...)
> > >
> > > Ugh.  That one was safe before somebody decided we needed a hash table
> > > for buffer refcounts, but it's surely not safe now.
> >
> > Mea culpa. This is 4b4b680c3d6d - from 2014.
>
> Whoa. There's way worse: StandbyTimeoutHandler() calls
> SendRecoveryConflictWithBufferPin(), which calls CancelDBBackends(), which
> acquires lwlocks etc.
>
> Which very plausibly is the cause for the issue I'm investigating in
> https://www.postgresql.org/message-id/20220409220054.fqn5arvbeesmxdg5%40alap3.anarazel.de

Huh.  I wouldn't have started a separate thread for this if I'd
realised I was getting close to the cause of the CI failure... I
thought this was an incidental observation.  Anyway, I made a first
attempt at fixing this SIGUSR1 problem (I think Andres is looking at
the SIGALRM problem in the other thread).

Instead of bothering to create N different XXXPending variables for
the different conflict "reasons", I used an array.  Other than that,
it's much like existing examples.

The existing use of the global variable RecoveryConflictReason seems a
little woolly.  Doesn't it get clobbered every time a signal arrives,
even if we determine that there is no conflict?  Not sure why that's
OK, but anyway, this patch always sets it together with
RecoveryConflictPending = true.
From 1ba60808a23159b8d99cfec70111b6724a55e57b Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 12 Apr 2022 07:33:59 +1200
Subject: [PATCH] Fix recovery conflict SIGUSR1 handling.

We shouldn't be doing real work in a signal handler.  Move the check
into the next CFI.

(There's a related problem in the recovery conflict SIGALRM handling,
for another patch.)

Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 src/backend/storage/ipc/procsignal.c | 12 +++
 src/backend/tcop/postgres.c  | 53 ++--
 src/include/storage/procsignal.h |  4 ++-
 src/include/tcop/tcopprot.h  |  3 +-
 4 files changed, 45 insertions(+), 27 deletions(-)

diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index f41563a0a4..ce08580b5b 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -653,22 +653,22 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 		HandleLogMemoryContextInterrupt();
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_TABLESPACE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
 	SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 95dc2e2c83..a89066badb 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -172,6 +172,7 @@ static bool EchoQuery = false;	/* -E switch */
 static bool UseSemiNewlineNewline = false;	/* -j switch */
 
 /* whether or not, and why, we were canceled by conflict with recovery */
+volatile sig_atomic_t RecoveryConflictInterruptPending[NUM_PROCSIGNALS];
 static bool RecoveryConflictPending = false;
 static bool RecoveryConflictRetryable = true;
 static ProcSignalReason RecoveryConflictReason;
@@ -2993,22 +2994,31 @@ FloatExceptionHandler(SIGNA

Re: Fixing code that ignores failure of XLogRecGetBlockTag

2022-04-11 Thread Tom Lane
Thomas Munro  writes:
> On Tue, Apr 12, 2022 at 8:58 AM Robert Haas  wrote:
>> On Mon, Apr 11, 2022 at 2:20 PM Tom Lane  wrote:
>>> I think we should make this a little less fragile.  Since we
>>> already have XLogRecGetBlockTagExtended, I propose that callers
>>> that need to handle the case of no-such-block must use that,
>>> while XLogRecGetBlockTag throws an error.  The attached patch
>>> fixes that up, and also cleans up some random inconsistency
>>> about use of XLogRecHasBlockRef().

>> Looks reasonable.

> +1

Pushed, thanks for looking.

regards, tom lane




Re: Fixing code that ignores failure of XLogRecGetBlockTag

2022-04-11 Thread Thomas Munro
On Tue, Apr 12, 2022 at 8:58 AM Robert Haas  wrote:
> On Mon, Apr 11, 2022 at 2:20 PM Tom Lane  wrote:
> > Currently, XLogRecGetBlockTag has 41 callers, of which only four
> > bother to check the function's result.  The remainder take it on
> > faith that they got valid data back, and many of them will
> > misbehave in seriously nasty ways if they didn't.  (This point
> > was drawn to my attention by a Coverity complaint.)
> >
> > I think we should make this a little less fragile.  Since we
> > already have XLogRecGetBlockTagExtended, I propose that callers
> > that need to handle the case of no-such-block must use that,
> > while XLogRecGetBlockTag throws an error.  The attached patch
> > fixes that up, and also cleans up some random inconsistency
> > about use of XLogRecHasBlockRef().
>
> Looks reasonable.

+1




Re: make MaxBackends available in _PG_init

2022-04-11 Thread Nathan Bossart
On Mon, Apr 11, 2022 at 01:44:42PM -0700, Nathan Bossart wrote:
> On Mon, Apr 11, 2022 at 04:36:36PM -0400, Robert Haas wrote:
>> If we throw an error while defining_custom_guc is true, how will it
>> ever again become false?
> 
> Ah, I knew I was forgetting something this morning.
> 
> It won't, but the only place it is presently needed is when loading
> shared_preload_libraries, so I believe startup will fail anyway.  However,
> I can see defining_custom_guc being used elsewhere, so that is probably not
> good enough.  Another approach could be to add a static
> set_config_option_internal() function with a boolean argument to indicate
> whether it is being used while defining a custom GUC.  I'll adjust 0003
> with that approach unless a better idea prevails.

Here's a new patch set.  I've only changed 0003 as described above.  My
apologies for the unnecessary round trip.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 9ffd0af3215eb667a9df53df3ff43a063bfc2818 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Sat, 9 Apr 2022 15:51:07 -0700
Subject: [PATCH v2 1/3] Revert GetMaxBackends().

This reverts commits 0147fc7, 4567596, aa64f23, and 5ecd018.
---
 src/backend/access/nbtree/nbtutils.c|  4 +-
 src/backend/access/transam/multixact.c  | 31 +++-
 src/backend/access/transam/twophase.c   |  3 +-
 src/backend/commands/async.c| 12 ++-
 src/backend/libpq/pqcomm.c  |  3 +-
 src/backend/postmaster/auxprocess.c |  2 +-
 src/backend/postmaster/postmaster.c | 14 ++--
 src/backend/storage/ipc/dsm.c   |  2 +-
 src/backend/storage/ipc/procarray.c | 25 ++
 src/backend/storage/ipc/procsignal.c| 37 -
 src/backend/storage/ipc/sinvaladt.c |  4 +-
 src/backend/storage/lmgr/deadlock.c | 31 
 src/backend/storage/lmgr/lock.c | 23 +++---
 src/backend/storage/lmgr/predicate.c| 10 +--
 src/backend/storage/lmgr/proc.c | 17 ++--
 src/backend/utils/activity/backend_status.c | 88 ++---
 src/backend/utils/adt/lockfuncs.c   |  5 +-
 src/backend/utils/init/postinit.c   | 55 +++--
 src/include/miscadmin.h |  3 +-
 19 files changed, 142 insertions(+), 227 deletions(-)

diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 96c72fc432..fd1b53885c 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -2072,7 +2072,7 @@ BTreeShmemSize(void)
 	Size		size;
 
 	size = offsetof(BTVacInfo, vacuums);
-	size = add_size(size, mul_size(GetMaxBackends(), sizeof(BTOneVacInfo)));
+	size = add_size(size, mul_size(MaxBackends, sizeof(BTOneVacInfo)));
 	return size;
 }
 
@@ -2101,7 +2101,7 @@ BTreeShmemInit(void)
 		btvacinfo->cycle_ctr = (BTCycleId) time(NULL);
 
 		btvacinfo->num_vacuums = 0;
-		btvacinfo->max_vacuums = GetMaxBackends();
+		btvacinfo->max_vacuums = MaxBackends;
 	}
 	else
 		Assert(found);
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 45907d1b44..8f7d12950e 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -282,11 +282,12 @@ typedef struct MultiXactStateData
 } MultiXactStateData;
 
 /*
- * Pointers to the state data in shared memory
- *
- * The index of the last element of the OldestMemberMXactId and
- * OldestVisibleMXactId arrays can be obtained with GetMaxOldestSlot().
+ * Last element of OldestMemberMXactId and OldestVisibleMXactId arrays.
+ * Valid elements are (1..MaxOldestSlot); element 0 is never used.
  */
+#define MaxOldestSlot	(MaxBackends + max_prepared_xacts)
+
+/* Pointers to the state data in shared memory */
 static MultiXactStateData *MultiXactState;
 static MultiXactId *OldestMemberMXactId;
 static MultiXactId *OldestVisibleMXactId;
@@ -341,7 +342,6 @@ static void MultiXactIdSetOldestVisible(void);
 static void RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 			   int nmembers, MultiXactMember *members);
 static MultiXactId GetNewMultiXactId(int nmembers, MultiXactOffset *offset);
-static inline int GetMaxOldestSlot(void);
 
 /* MultiXact cache management */
 static int	mxactMemberComparator(const void *arg1, const void *arg2);
@@ -662,17 +662,6 @@ MultiXactIdSetOldestMember(void)
 	}
 }
 
-/*
- * Retrieve the index of the last element of the OldestMemberMXactId and
- * OldestVisibleMXactId arrays.  Valid elements are (1..MaxOldestSlot); element
- * 0 is never used.
- */
-static inline int
-GetMaxOldestSlot(void)
-{
-	return GetMaxBackends() + max_prepared_xacts;
-}
-
 /*
  * MultiXactIdSetOldestVisible
  *		Save the oldest MultiXactId this transaction considers possibly live.
@@ -695,7 +684,6 @@ MultiXactIdSetOldestVisible(void)
 	if (!MultiXactIdIsValid(OldestVisibleMXactId[MyBackendId]))
 	{
 		MultiXactId oldestMXact;
-		int			maxOldestSlot = GetMaxOldestSlot();
 		int		

Re: Fixing code that ignores failure of XLogRecGetBlockTag

2022-04-11 Thread Robert Haas
On Mon, Apr 11, 2022 at 2:20 PM Tom Lane  wrote:
> Currently, XLogRecGetBlockTag has 41 callers, of which only four
> bother to check the function's result.  The remainder take it on
> faith that they got valid data back, and many of them will
> misbehave in seriously nasty ways if they didn't.  (This point
> was drawn to my attention by a Coverity complaint.)
>
> I think we should make this a little less fragile.  Since we
> already have XLogRecGetBlockTagExtended, I propose that callers
> that need to handle the case of no-such-block must use that,
> while XLogRecGetBlockTag throws an error.  The attached patch
> fixes that up, and also cleans up some random inconsistency
> about use of XLogRecHasBlockRef().

Looks reasonable.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: make MaxBackends available in _PG_init

2022-04-11 Thread Nathan Bossart
On Mon, Apr 11, 2022 at 04:36:36PM -0400, Robert Haas wrote:
> On Mon, Apr 11, 2022 at 12:44 PM Nathan Bossart
>  wrote:
>> Here are some patches.  0001 reverts all the recent commits in this area,
>> 0002 is the patch I posted in August, and 0003 is an attempt at blocking
>> GUC changes in preloaded libraries' _PG_init() functions.
> 
> If we throw an error while defining_custom_guc is true, how will it
> ever again become false?

Ah, I knew I was forgetting something this morning.

It won't, but the only place it is presently needed is when loading
shared_preload_libraries, so I believe startup will fail anyway.  However,
I can see defining_custom_guc being used elsewhere, so that is probably not
good enough.  Another approach could be to add a static
set_config_option_internal() function with a boolean argument to indicate
whether it is being used while defining a custom GUC.  I'll adjust 0003
with that approach unless a better idea prevails.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: make MaxBackends available in _PG_init

2022-04-11 Thread Robert Haas
On Mon, Apr 11, 2022 at 12:44 PM Nathan Bossart
 wrote:
> Here are some patches.  0001 reverts all the recent commits in this area,
> 0002 is the patch I posted in August, and 0003 is an attempt at blocking
> GUC changes in preloaded libraries' _PG_init() functions.

If we throw an error while defining_custom_guc is true, how will it
ever again become false?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Temporary file access API

2022-04-11 Thread Robert Haas
On Mon, Apr 11, 2022 at 4:05 AM Antonin Houska  wrote:
> There are't really that many kinds of files to encrypt:
>
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
>
> (And pg_stat/* files should be removed from the list.)

This kind of gets into some theoretical questions. Like, do we think
that it's an information leak if people can look at how many
transactions are committing and aborting in pg_xact_status? In theory
it could be, but I know it's been argued that that's too much of a
side channel. I'm not sure I believe that, but it's arguable.
Similarly, the argument that global/pg_internal.init doesn't contain
user data relies on the theory that the only table data that will make
its way into the file is for system catalogs. I guess that's not user
data *exactly* but ... are we sure that's how we want to roll here?

I really don't know how you can argue that pg_dynshmem/mmap.NNN
doesn't contain user data - surely it can.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: How about a psql backslash command to show GUCs?

2022-04-11 Thread Tom Lane
Christoph Berg  writes:
> Plus maybe making initdb not set values to their default if the auto probing 
> ends up at that values.

Seems a bit fragile: we'd have to make sure that initdb knew what the
boot_val is.  IIRC, some of those are not necessarily immutable constants,
so there'd be room for error.

regards, tom lane




Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory

2022-04-11 Thread SATYANARAYANA NARLAPURAM
On Sun, Apr 10, 2022 at 11:16 PM Kyotaro Horiguchi 
wrote:

> Sorry for the terrible typos..
>
> At Sat, 9 Apr 2022 18:03:01 +0530, Bharath Rupireddy <
> bharath.rupireddyforpostg...@gmail.com> wrote in
> > On Tue, Jan 4, 2022 at 1:40 AM SATYANARAYANA NARLAPURAM
> >  wrote:
> > >
> > > On Sun, Jan 2, 2022 at 11:56 PM Michael Paquier 
> wrote:
> > >> Are you referring to the pre-padding when creating a new partial
> > >> segment, aka when we write chunks of XLOG_BLCKSZ full of zeros until
> > >> the file is fully created?  What kind of error did you see?  I guess
> > >> that a write() with ENOSPC would be more likely, but you got a
> > >> different problem?
> > >
> > > I see two cases, 1/ when no space  is left on the device and 2/ when
> the process is taken down forcibly (a VM/container crash)
> >
> > Yeah, these cases can occur leaving uninitialized .partial files which
> > can be a problem for both pg_receivewal and pg_basebackup that uses
> > dir_open_for_write (CreateWalDirectoryMethod).
> >
> > >>   I don't disagree with improving such cases, but we
> > >> should not do things so as there is a risk of leaving behind an
> > >> infinite set of segments in case of repeated errors
> > >
> > > Do you see a problem with the proposed patch that leaves the files
> behind, at least in my testing I don't see any files left behind?
>
> I guess that Michael took this patch as creating a temp file with a
> name such like "tmp.n" every time finding an incomplete file.
>
> > With the proposed patch, it doesn't leave the unpadded .partial files.
> > Also, the v2 patch always removes a leftover .partial.temp file before
> > it creates a new one.
> >
> > >> , and partial
> > >> segments are already a kind of temporary file.
>
> I'm not sure this is true for pg_receivewal case.  The .partial file
> is not a temporary file but the current working file for the tool.
>

Correct. The idea is to make sure the file is fully allocated before
treating it as a current file.


>
> > > if the .partial file exists with not zero-padded up to the wal segment
> size (WalSegSz), then open_walfile fails with the below error. I have two
> options here, 1/ to continue padding the existing partial file and let it
> zero up to WalSegSz , 2/create a temp file as I did in the patch. I thought
> the latter is safe because it can handle corrupt cases as described below.
> Thoughts?
>
> I think this patch shouldn't involve pg_basebackup.  I agree to Cary
> that deleting the erroring file should be fine.
>
> We already "skipping" (complete = non-.partial) WAL files with a wrong
> size in FindStreamingStart so we can error-out with suggesting a hint.
>
> $ pg_receivewal -D xlog -p 5432 -h /tmp
> pg_receivewal: error: segment file "0001002200F5.partial" has
> incorrect size 8404992
> hint: You can continue after removing the file.
>

The idea here is to make pg_receivewal self-sufficient and reduce
human/third-party tool interaction. The ideal case is running
pg_receivewal as a service for WAL archiving.


>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>


Re: How about a psql backslash command to show GUCs?

2022-04-11 Thread Christoph Berg
Plus maybe making initdb not set values to their defaults if the auto probing
ends up at those values.

Christoph

Re: How about a psql backslash command to show GUCs?

2022-04-11 Thread Tom Lane
"Jonathan S. Katz"  writes:
> My question is if we're only going to list out the settings that are 
> customized, are we going to:

> 1. Hide a setting if it matches a default value, even if a user set it 
> to be the default value? OR
> 2. Comment out all of the settings in a generated postgresql.conf file?

As committed, it prints anything that's shown as "source != 'default'"
in pg_settings, which means anything for which the value wasn't
taken from the wired-in default.  I suppose an alternative definition
could be "setting != boot_val".  Not really sure if that's better.

This idea does somewhat address my unhappiness upthread about printing
values with source = 'internal', but I see that it gets confused by
some GUCs with custom show hooks, like unix_socket_permissions.
Maybe it needs to be "source != 'default' AND setting != boot_val"?

regards, tom lane




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-11 Thread Markus Wanner
On Mon, 2022-04-11 at 15:21 -0400, Robert Haas wrote:
> ... before v13, the commit in question actually
> changed the size of PGXACT, which is really quite bad -- it needs to
> be 12 bytes for performance reasons. And there's no spare bytes
> available, so I think we should follow one of the suggestions that he
> had over in that email thread, and put delayChkptEnd in PGPROC even
> though delayChkpt is in PGXACT.

This makes sense to me.  Kudos to Kyotaro for considering this.

At first read, this sounded like a trade-off between compatibility and
performance for PG 12 and older.  But I realize leaving delayChkpt in
PGXACT and adding just delayChkptEnd to PGPROC is compatible and leaves
PGXACT at a size of 12 bytes.  So this sounds like a good approach to
me.

Best Regards

Markus





Re: How about a psql backslash command to show GUCs?

2022-04-11 Thread Jonathan S. Katz

On 4/11/22 3:12 PM, Tom Lane wrote:

"Jonathan S. Katz"  writes:

On 4/9/22 12:27 PM, Tom Lane wrote:

Sure, but then you do "\dconfig *".  With there being several hundred
GUCs (and no doubt more coming), I'm not sure that "show me every GUC"
is a common use-case at all, let alone so common as to deserve being
the default behavior.

One thing we could perhaps do to reduce confusion is to change the
table heading when doing this, say from "List of configuration parameters"
to "List of non-default configuration parameters".



Reasonable points. I don't have any objections to this proposal.


Hearing no further comments, done like that.


Thanks!

I have a usability comment based on my testing.

I ran "\dconfig" and one of the settings that came up was 
"max_connections" which was set to 100, or the default.


I had noticed that "max_connections" was uncommented in my 
postgresql.conf file, which I believe was from the autogenerated file 
provided by initdb.


Similarly, min_wal_size/max_wal_size were in the list and set to their 
default values. These were also uncommented in my postgresql.conf from 
the autogeneration.


My question is if we're only going to list out the settings that are 
customized, are we going to:


1. Hide a setting if it matches a default value, even if a user set it 
to be the default value? OR

2. Comment out all of the settings in a generated postgresql.conf file?

Thanks,

Jonathan




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-11 Thread Robert Haas
On Fri, Apr 8, 2022 at 11:50 AM Robert Haas  wrote:
> On Fri, Apr 8, 2022 at 4:47 AM Markus Wanner
>  wrote:
> > I agree with Michael, it would be nice to not duplicate the code, but
> > use a common underlying method.  A modified patch is attached.
>
> I don't think this is better, but I don't think it's worth arguing
> about, either, so I'll do it this way if nobody objects.
>
> Meanwhile, I've committed the patch for master to master.

Well, I've just realized that Kyotaro Horiguchi volunteered to fix
this on an email thread I did not see because of the way Gmail breaks
the thread if you change the subject line. And he developed a very
similar patch to what we have here. I'm going to use this one as the
basis for going forward because I've already studied it in detail and
it's less work for me to stick with what I know than to go study
something else. But, he also noticed something which we didn't notice
here, which is that before v13, the commit in question actually
changed the size of PGXACT, which is really quite bad -- it needs to
be 12 bytes for performance reasons. And there's no spare bytes
available, so I think we should follow one of the suggestions that he
had over in that email thread, and put delayChkptEnd in PGPROC even
though delayChkpt is in PGXACT.

Comments?

-- 
Robert Haas
EDB: http://www.enterprisedb.com
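For context, the 12-byte budget is something that can be pinned down at compile time. The toy layout below mirrors the general shape of the pre-v13 struct (field names are illustrative, not the real definition) and shows why one extra flag would spill past 12 bytes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* roughly the shape of the hot struct: two 4-byte XIDs plus four flag bytes */
typedef struct DemoTightXact
{
	uint32_t	xid;
	uint32_t	xmin;
	uint8_t		vacuumFlags;
	bool		overflowed;
	bool		delayChkpt;
	uint8_t		nxids;
} DemoTightXact;

/* adding one more flag pushes the size past 12 via alignment padding */
typedef struct DemoGrownXact
{
	uint32_t	xid;
	uint32_t	xmin;
	uint8_t		vacuumFlags;
	bool		overflowed;
	bool		delayChkpt;
	bool		delayChkptEnd;	/* the extra member under discussion */
	uint8_t		nxids;
} DemoGrownXact;

/* compile-time guard one could keep on such a performance-critical struct */
_Static_assert(sizeof(DemoTightXact) == 12, "hot struct must stay 12 bytes");
```

With 4-byte alignment, the grown variant's 13 payload bytes round up to 16, which is why the proposal puts the new flag in PGPROC instead.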




Re: random() function documentation

2022-04-11 Thread Tom Lane
Dagfinn Ilmari Mannsåker  writes:
> Tom Lane  writes:
>> How about we just say "uses a linear-feedback shift register algorithm"?

> That works for me.  Nice and simple, and not overly specific.  Should we
> perhaps also add a warning that the same seed is not guaranteed to
> produce the same sequence across different (major?) versions?

I wouldn't bother, on the grounds that then we'd need such disclaimers
in a whole lot of places.  Others might see it differently though.

regards, tom lane




Re: How about a psql backslash command to show GUCs?

2022-04-11 Thread Tom Lane
"Jonathan S. Katz"  writes:
> On 4/9/22 12:27 PM, Tom Lane wrote:
>> Sure, but then you do "\dconfig *".  With there being several hundred
>> GUCs (and no doubt more coming), I'm not sure that "show me every GUC"
>> is a common use-case at all, let alone so common as to deserve being
>> the default behavior.
>> 
>> One thing we could perhaps do to reduce confusion is to change the
>> table heading when doing this, say from "List of configuration parameters"
>> to "List of non-default configuration parameters".

> Reasonable points. I don't have any objections to this proposal.

Hearing no further comments, done like that.

regards, tom lane




Re: random() function documentation

2022-04-11 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> Dagfinn Ilmari Mannsåker  writes:
>> I just noticed that the since the random() rewrite¹, the documentation's
>> claim² that it "uses a simple linear congruential algorithm" is no
>> longer accurate (xoroshiro128** is an xorshift variant, which is a
>> linear-feedback shift register algorithm).
>
>> I don't have a suggestion for the exact wording, since I don't know
>> whether xoroshiro128** qualifies as "simple", or to what level of
>> specificity we want to document the algorithm.
>
> How about we just say "uses a linear-feedback shift register algorithm"?

That works for me.  Nice and simple, and not overly specific.  Should we
perhaps also add a warning that the same seed is not guaranteed to
produce the same sequence across different (major?) versions?

> "Simple" is in the eye of the beholder anyway.

Indeed.

>   regards, tom lane

- ilmari




Re: [COMMITTERS] pgsql: Allow time delayed standbys and recovery

2022-04-11 Thread Thom Brown
On Mon, 11 Apr 2022, 15:55 Robert Haas,  wrote:

> On Fri, Apr 8, 2022 at 11:10 AM Robert Haas  wrote:
> > On Fri, Apr 8, 2022 at 10:45 AM Thom Brown  wrote:
> > > Thanks. This doesn't include my self-correction:
> > >
> > > s/kept on standby/kept on the standby/
> >
> > Here is v2, endeavoring to rectify that oversight.
>
> Committed.
>

Much appreciated.

Thom

>


Re: random() function documentation

2022-04-11 Thread Tom Lane
Dagfinn Ilmari Mannsåker  writes:
> I just noticed that the since the random() rewrite¹, the documentation's
> claim² that it "uses a simple linear congruential algorithm" is no
> longer accurate (xoroshiro128** is an xorshift variant, which is a
> linear-feedback shift register algorithm).

> I don't have a suggestion for the exact wording, since I don't know
> whether xoroshiro128** qualifies as "simple", or to what level of
> specificity we want to document the algorithm.

How about we just say "uses a linear-feedback shift register algorithm"?
"Simple" is in the eye of the beholder anyway.

regards, tom lane




Re: PostgreSQL Program Application

2022-04-11 Thread Stephen Frost
Greetings,

* Qiongwen Liu (qiongwen7...@berkeley.edu) wrote:
> Hi, I am Qiongwen Liu, and I am studying at the University of California,
> Berkeley as an economics and computer science major. I am interested in the
> Develop Performance Farm Benchmarks and Website of PostgreSQL in
> this program, attached below is my application. Please check it out. Thank
> you.

Great, thanks!  Will respond off-list.

Stephen




Re: GSoC: pgBackRest port to Windows

2022-04-11 Thread Stephen Frost
Greetings,

* Samuel Bassaly (shksh...@gmail.com) wrote:
> My name is Samuel Bassaly, and I would like to submit my proposal for this
> year's GSoC.

> Your feedback is highly appreciated.

Great, thanks!  Will respond off-list.

Stephen




random() function documentation

2022-04-11 Thread Dagfinn Ilmari Mannsåker
Hi Hackers,

I just noticed that the since the random() rewrite¹, the documentation's
claim² that it "uses a simple linear congruential algorithm" is no
longer accurate (xoroshiro128** is an xorshift variant, which is a
linear-feedback shift register algorithm).

I don't have a suggestion for the exact wording, since I don't know
whether xoroshiro128** qualifies as "simple", or to what level of
specificity we want to document the algorithm.

- ilmari

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3804539e48e794781c6145c7f988f5d507418fa8
[2] 
https://www.postgresql.org/docs/devel/functions-math.html#FUNCTIONS-MATH-RANDOM-TABLE
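For reference, a minimal self-contained sketch of xoroshiro128** (following the published Blackman & Vigna reference algorithm; the seeding helper here is an assumption for demonstration, not what PostgreSQL's pg_prng.c actually does):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t xs[2];			/* 128 bits of generator state */

static uint64_t
rotl64(uint64_t x, int k)
{
	return (x << k) | (x >> (64 - k));
}

/* demo-only seeding; the server derives its initial state differently */
static void
xoro_seed(uint64_t a, uint64_t b)
{
	xs[0] = a;
	xs[1] = b | 1;				/* the state must not be all zeros */
}

/* xoroshiro128** 1.0: LFSR-style state transition plus a scrambler */
static uint64_t
xoro_next(void)
{
	uint64_t	s0 = xs[0];
	uint64_t	s1 = xs[1];
	uint64_t	result = rotl64(s0 * 5, 7) * 9;

	s1 ^= s0;
	xs[0] = rotl64(s0, 24) ^ s1 ^ (s1 << 16);
	xs[1] = rotl64(s1, 37);
	return result;
}

/* first output for a given seed, to show determinism */
static uint64_t
xoro_first(uint64_t a, uint64_t b)
{
	xoro_seed(a, b);
	return xoro_next();
}
```

Each output is a pure function of the 128-bit state, so identical seeds reproduce identical sequences within one build, while the sequence for a given seed remains free to change across versions.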




Re: [Proposal] vacuumdb --schema only

2022-04-11 Thread Nathan Bossart
On Fri, Apr 08, 2022 at 05:16:06PM +0200, Gilles Darold wrote:
> Attached v7 of the patch that should pass cfbot.

Thanks for the new patch!  Unfortunately, it looks like some recent changes
have broken it again.

> +enum trivalue schema_is_exclude = TRI_DEFAULT;
> +
> +/*
> + * The kind of object filter to use. '0': none, 'n': schema, 't': table
> + * these values correspond to the -n | -N and -t command line options.
> + */
> +char objfilter = '0';

I think these should be combined into a single enum for simplicity and
readability (e.g., OBJFILTER_NONE, OBJFILTER_INCLUDE_SCHEMA,
OBJFILTER_EXCLUDE_SCHEMA, OBJFILTER_TABLE).
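A sketch of what that combined enum could look like (the names come from the suggestion above; the helper and the typedef name are illustrative only, not part of the patch):

```c
#include <assert.h>

typedef enum
{
	OBJFILTER_NONE,				/* no -n/-N/-t option given */
	OBJFILTER_INCLUDE_SCHEMA,	/* -n */
	OBJFILTER_EXCLUDE_SCHEMA,	/* -N */
	OBJFILTER_TABLE				/* -t */
} VacObjFilter;

/* illustrative replacement for scattered checks like objfilter == 'n' */
static int
objfilter_is_schema(VacObjFilter f)
{
	return f == OBJFILTER_INCLUDE_SCHEMA || f == OBJFILTER_EXCLUDE_SCHEMA;
}
```

Collapsing the trivalue/char pair into one enum makes invalid combinations unrepresentable and keeps the option handling readable.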

>* Instead, let the server decide whether a given relation can be
>* processed in which case the user will know about it.
>*/
> - if (!tables_listed)
> + if (!objects_listed || objfilter == 'n')
>   {
>   appendPQExpBufferStr(&catalog_query, " WHERE c.relkind 
> OPERATOR(pg_catalog.=) ANY (array["
>
> CppAsString2(RELKIND_RELATION) ", "

I think this deserves a comment.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Fixing code that ignores failure of XLogRecGetBlockTag

2022-04-11 Thread Tom Lane
Currently, XLogRecGetBlockTag has 41 callers, of which only four
bother to check the function's result.  The remainder take it on
faith that they got valid data back, and many of them will
misbehave in seriously nasty ways if they didn't.  (This point
was drawn to my attention by a Coverity complaint.)

I think we should make this a little less fragile.  Since we
already have XLogRecGetBlockTagExtended, I propose that callers
that need to handle the case of no-such-block must use that,
while XLogRecGetBlockTag throws an error.  The attached patch
fixes that up, and also cleans up some random inconsistency
about use of XLogRecHasBlockRef().

regards, tom lane
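The shape of that API split, reduced to a toy (the record layout and names here are simplified stand-ins, not the real xlogreader structures):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_BLOCK_ID 3

/* toy stand-in for a decoded WAL record's block references */
typedef struct
{
	bool		in_use[MAX_BLOCK_ID + 1];
	int			blkno[MAX_BLOCK_ID + 1];
} DemoRecord;

static DemoRecord demo_rec = {{true, false, false, false}, {42, 0, 0, 0}};

/* "Extended" form: returns false when the reference doesn't exist */
static bool
demo_get_block_tag_extended(const DemoRecord *rec, int block_id, int *blkno)
{
	if (block_id > MAX_BLOCK_ID || !rec->in_use[block_id])
		return false;
	if (blkno)
		*blkno = rec->blkno[block_id];
	return true;
}

/* plain form: the reference must exist, otherwise it's a hard error */
static int
demo_get_block_tag(const DemoRecord *rec, int block_id)
{
	int			blkno;

	if (!demo_get_block_tag_extended(rec, block_id, &blkno))
	{
		fprintf(stderr, "failed to locate backup block with ID %d\n", block_id);
		abort();
	}
	return blkno;
}
```

Callers that can tolerate a missing reference use the Extended form and test the result; all others get an immediate failure instead of silently reading uninitialized data.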

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a03122df8d..ba11bcd99e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -9363,7 +9363,7 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 	oldtup.t_len = 0;
 
 	XLogRecGetBlockTag(record, 0, &rnode, NULL, &newblk);
-	if (XLogRecGetBlockTag(record, 1, NULL, NULL, &oldblk))
+	if (XLogRecGetBlockTagExtended(record, 1, NULL, NULL, &oldblk, NULL))
 	{
 		/* HOT updates are never done across pages */
 		Assert(!hot_update);
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index fba124b940..f9186ca233 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -267,7 +267,7 @@ btree_xlog_split(bool newitemonleft, XLogReaderState *record)
 
 	XLogRecGetBlockTag(record, 0, NULL, NULL, &origpagenumber);
 	XLogRecGetBlockTag(record, 1, NULL, NULL, &rightpagenumber);
-	if (!XLogRecGetBlockTag(record, 2, NULL, NULL, &spagenumber))
+	if (!XLogRecGetBlockTagExtended(record, 2, NULL, NULL, &spagenumber, NULL))
 		spagenumber = P_NONE;
 
 	/*
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index dff1e7685e..c0dfea40c7 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -219,15 +219,14 @@ XLogRecGetBlockRefInfo(XLogReaderState *record, bool pretty,
 
 	for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
 	{
-		RelFileNode rnode = {InvalidOid, InvalidOid, InvalidOid};
-		ForkNumber	forknum = InvalidForkNumber;
-		BlockNumber blk = InvalidBlockNumber;
+		RelFileNode rnode;
+		ForkNumber	forknum;
+		BlockNumber blk;
 
-		if (!XLogRecHasBlockRef(record, block_id))
+		if (!XLogRecGetBlockTagExtended(record, block_id,
+		&rnode, &forknum, &blk, NULL))
 			continue;
 
-		XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blk);
-
 		if (detailed_format)
 		{
 			/* Get block references in detailed format. */
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index b3e37003ac..cf5db23cb8 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -37,6 +37,8 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "utils/memutils.h"
+#else
+#include "common/logging.h"
 #endif
 
 static void report_invalid_record(XLogReaderState *state, const char *fmt,...)
@@ -1918,14 +1920,25 @@ err:
 
 /*
  * Returns information about the block that a block reference refers to.
- * See XLogRecGetBlockTagExtended().
+ *
+ * This is like XLogRecGetBlockTagExtended, except that the block reference
+ * must exist and there's no access to prefetch_buffer.
  */
-bool
+void
 XLogRecGetBlockTag(XLogReaderState *record, uint8 block_id,
    RelFileNode *rnode, ForkNumber *forknum, BlockNumber *blknum)
 {
-	return XLogRecGetBlockTagExtended(record, block_id, rnode, forknum, blknum,
-	  NULL);
+	if (!XLogRecGetBlockTagExtended(record, block_id, rnode, forknum, blknum,
+	NULL))
+	{
+#ifndef FRONTEND
+		elog(ERROR, "failed to locate backup block with ID %d in WAL record",
+			 block_id);
+#else
+		pg_fatal("failed to locate backup block with ID %d in WAL record",
+ block_id);
+#endif
+	}
 }
 
 /*
@@ -1944,8 +1957,7 @@ XLogRecGetBlockTagExtended(XLogReaderState *record, uint8 block_id,
 {
 	DecodedBkpBlock *bkpb;
 
-	if (block_id > record->record->max_block_id ||
-		!record->record->blocks[block_id].in_use)
+	if (!XLogRecHasBlockRef(record, block_id))
 		return false;
 
 	bkpb = &record->record->blocks[block_id];
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 4ee29182ac..26be94b3f1 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -2172,10 +2172,10 @@ xlog_block_info(StringInfo buf, XLogReaderState *record)
 		ForkNumber	forknum;
 		BlockNumber blk;
 
-		if (!XLogRecHasBlockRef(record, block_id))
+		if (!XLogRecGetBlockTagExtended(record, block_id,
+		&rnode, &forknum, &blk, NULL))
 			continue;
 
-		XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blk);
 		if (forknum != MAIN_FORKNUM)
 			appendStringInfo(buf, "; blkre

Re: API stability

2022-04-11 Thread Robert Haas
On Mon, Apr 11, 2022 at 6:48 AM Matthias van de Meent
 wrote:
> So, this might be the reason Robert overlooked your declaration to
> volunteer: he was looking for volunteers in the thread "Re: API
> Stability [was: pgsql: ...]" in the Gmail UI, which didn't show your
> messages there because of the different subject line.

Yes, that's what happened.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: typos

2022-04-11 Thread Robert Haas
On Mon, Apr 11, 2022 at 4:56 AM David Rowley  wrote:
> 0011 (Could do with input from Robert and Joe)

Seems like a reasonable change to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Support logical replication of DDLs

2022-04-11 Thread Zheng Li
> I'm planning to work on the initial DDL replication. I'll open a new thread as
> soon as I write a design for it. Just as an example, the pglogical approach is
> to use pg_dump behind the scenes to provide the schema [1]. It is a reasonable
> approach but an optimal solution should be an API to provide the initial DDL
> commands. I mean the main point of this feature is to have an API to create an
> object that the logical replication can use it for initial schema
> synchronization. This "DDL to create an object" was already discussed in the
> past [2].

Nice! I think automatic initial schema synchronization for replication
is a very useful feature.

Regards,
Zheng




Re: PG DOCS - logical replication filtering

2022-04-11 Thread Euler Taveira
On Mon, Apr 11, 2022, at 7:40 AM, Amit Kapila wrote:
> On Mon, Apr 11, 2022 at 12:39 PM Peter Smith  wrote:
> >
> > On Fri, Apr 8, 2022 at 4:12 PM Peter Smith  wrote:
> >
> > OK. Added in v7 [1]
> >
> 
> Thanks, this looks mostly good to me. I didn't like the new section
> added for partitioned tables examples, so I removed it and added some
> explanation of the tests. I have slightly changed a few other lines. I
> am planning to commit the attached tomorrow unless there are more
> comments.
I have a few comments.

> If a publication publishes UPDATE and/or DELETE operations ...
>

If we are talking about operations, use lowercase like I suggested in the
previous version. See the publish parameter [1]. If we are talking about
commands, the word "operations" should be removed or replaced by "commands".
The "and/or" isn't required; "or" implies the same. If you prefer "operations",
my suggestion is to adjust the last sentence that says "only INSERT" to "only
insert operation".

> If the old row satisfies the row filter expression (it was sent to the
> subscriber) but the new row doesn't, then from a data consistency perspective
> the old row should be removed from the subscriber.
>

I suggested small additions to this sentence. We should at least add a comma
after "then" and "perspective". The same applies to the next paragraph too.

Regarding the "Summary", it is redundant, as I said before. We already described
all 4 cases, so I vote to remove it. However, if we were to go with a table, I
suggest improving the formatting: add borders; make "old row" and "new row"
titles; represent "no match" and "match" with symbols ("" and "X", for
example); and remove the "Case X" column, because this extra column adds
nothing.

Regarding the "Partitioned Tables", I suggested removing the bullets. We
generally have paragraphs with a few sentences, and I tend to avoid short
paragraphs. I also didn't like the 2 bullets using almost the same words. In
the previous version, I suggested something like:
If the publication contains a partitioned table, the parameter
publish_via_partition_root determines which row filter expression is used. If
the parameter publish_via_partition_root is true, the row filter expression
associated with the partitioned table is used. Otherwise, the row filter
expression associated with the individual partition is used.

> will be copied. (see Section 31.3.6 for details).

There is an extra period after "copied" that should be removed. The other
option is to remove the parentheses and have another sentence for "See ...".

> those expressions get OR'ed together

I prefer plain English here. This part of the sentence is also redundant with
the rest of the sentence, so I suggested removing it in the previous version.

rows that satisfy any of the row filter expressions are replicated.

instead of

those expressions get OR'ed together, so that rows satisfying any of the
expressions will be replicated.

I also didn't use a different paragraph (like I suggested in the previous
version) because we are talking about the same thing.

The bullets in the example sound strange; that's why I suggested removing them.
We can even combine the 3 sentences into one paragraph.

> The PSQL command \dRp+ shows the row filter expressions (if defined) for each
> table of the publications.

Well, we don't use PSQL (uppercase) in the documentation. I suggested a
different sentence:

psql shows the row filter expressions (if defined) for each table.

> The PSQL command \d shows what publications the table is a member of, as well
> as that table's row filter expression (if defined) in those publications.

It is not logical replication's business to explain psql behavior. If, for
some reason, someone decided to change it, this section would contain obsolete
information. The psql output is fine; the explanation is not. That's why I
suggested this modification.

> Only the rows satisfying the t1 WHERE clause of publication p1 are
> replicated.

Again, no bullets. This sentence is useful *before* the psql output. We're
presenting the results. Let's follow the pattern, describe the action and show
the results.

> The UPDATE replicates the change as normal.

This sentence should be *before* the psql output (see my previous version).

Regarding the new examples (for partitioned tables), shouldn't we move the
parent / child definitions to the beginning of the Examples section? It seems
confusing to use the same code snippet to show repeated table definitions
(publisher and subscriber). I checked quickly, and after a few seconds I
realized that the example is not wrong, but the database names have a small
difference (a single letter, "s" vs. "p"). The publication and subscription
definitions are fine there.

I think reusing the same tables and publication introduces complexity.
Shouldn't we just use different tables and publications to provide an "easy"
example? It would avoid DROP PUBLICATION, ALTER SUBSCRIPTION and TRUNCATE.

> Do t

Re: Support logical replication of DDLs

2022-04-11 Thread Zheng Li
Hi Amit,

> Some initial comments:
> ===
> 1.
> +/*
> + * Write logical decoding DDL message into XLog.
> + */
> +XLogRecPtr
> +LogLogicalDDLMessage(const char *prefix, Oid roleoid, const char *message,
> + size_t size, bool transactional)
>
> I don't see anywhere the patch using a non-transactional message. Is
> it required for any future usage? The non-transactional behavior has
> some known consistency issues, see email [1].

The transactional flag is not required by the current usage. I thought
it might be useful if other logical decoding plugins want to log and
consume DDL messages in a non-transactional way. But I don't have a
specific use case yet.

> 2. For DDL replication, do we need to wait for a consistent point of
> snapshot? For DMLs, that point is a convenient point to initialize
> replication from, which is why we export a snapshot at that point,
> which is used to read normal data. Do we have any similar needs for
> DDL replication?

The current design requires manual schema initialization on the
subscriber before the logical replication setup.
As Euler Taveira pointed out, snapshot is needed in initial schema
synchronization. And that is a different topic.


> 3. The patch seems to be creating an entry in pg_subscription_rel for
> 'create' message. Do we need some handling on Drop, if not, why? I
> think adding some comments for this aspect would make it easier to
> follow.

It's already handled by existing logic in heap_drop_with_catalog:
https://github.com/zli236/postgres/blob/ddl_replication/src/backend/catalog/heap.c#L2005
I'll add a comment.

> 4. The handling related to partition tables seems missing because, on
> the subscriber-side, it always creates a relation entry in
> pg_subscription_rel which won't work. Check its interaction with
> publish_via_partition_root.

I will test it out.

>Even if this works, how will we make Alter Table statement work where
>it needs to rewrite the table? There also I think we can face a
>similar problem if we directly send the statement, once the table will
>be updated due to the DDL statement and then again due to table
>rewrite as that will have a separate WAL.

Yes, I think any DDL that can generate DML changes should be listed
out and handled properly or documented. Here is one extreme example
involving volatile functions:
ALTER TABLE nd_ddl ADD COLUMN t timestamp DEFAULT now().
Again, I think we need to somehow skip the data rewrite on the
subscriber when replicating such DDL and let DML replication handle
the rewrite.

>Another somewhat unrelated problem I see with this work is how to save
>recursion of the same command between nodes (when the involved nodes
>replicate DDLs). For DMLs, we can avoid that via replication origins
>as is being done in the patch proposed [1] but not sure how will we
>deal with that here?

I'll need to investigate "recursion of the same command between
nodes"; could you provide an example?

Regards,
Zheng




Re: avoid multiple hard links to same WAL file after a crash

2022-04-11 Thread Nathan Bossart
On Mon, Apr 11, 2022 at 12:28:47PM -0400, Tom Lane wrote:
> Robert Haas  writes:
>> On Mon, Apr 11, 2022 at 5:12 AM Kyotaro Horiguchi
>>  wrote:
>>> If this diagnosis is correct, the comment is proved to be paranoid.
> 
>> It's sometimes difficult to understand what problems really old code
>> comments are worrying about. For example, could they have been
>> worrying about bugs in the code? Could they have been worrying about
>> manual interference with the pg_wal directory? It's hard to know.
> 
> "git blame" can be helpful here, if you trace back to when the comment
> was written and then try to find the associated mailing-list discussion.
> (That leap can be difficult for commits pre-dating our current
> convention of including links in the commit message, but it's usually
> not *that* hard to locate contemporaneous discussion.)

I traced this back a while ago.  I believe the link() was first added in
November 2000 as part of f0e37a8.  This even predates WAL recycling, which
was added in July 2001 as part of 7d4d5c0.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: make MaxBackends available in _PG_init

2022-04-11 Thread Nathan Bossart
On Mon, Apr 11, 2022 at 10:47:17AM -0400, Robert Haas wrote:
> On Sat, Apr 9, 2022 at 9:24 AM Julien Rouhaud  wrote:
>> > FWIW I would be on board with reverting all the GetMaxBackends() stuff if
>> > we made the value available in _PG_init() and stopped supporting GUC
>> > overrides by extensions (e.g., ERROR-ing in SetConfigOption() when
>> > process_shared_preload_libraries_in_progress is true).
>>
>> Yeah I would prefer this approach too, although it couldn't prevent extension
>> from directly modifying the underlying variables so I don't know how 
>> effective
>> it would be.
> 
> I think I also prefer this approach. I am willing to be convinced
> that's the wrong idea, but right now I favor it.

Here are some patches.  0001 reverts all the recent commits in this area,
0002 is the patch I posted in August, and 0003 is an attempt at blocking
GUC changes in preloaded libraries' _PG_init() functions.
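A rough sketch of 0003's idea, rejecting GUC overrides while preload libraries run (the flag and function names mimic the server's, but this is a stand-alone mock, not the actual guc.c change):

```c
#include <assert.h>
#include <stdbool.h>

/* mock of the server flag that is true while _PG_init() functions run */
static bool process_shared_preload_libraries_in_progress = false;

/* mock SetConfigOption: 0 on success, -1 when the override is rejected */
static int
demo_set_config_option(const char *name, const char *value)
{
	(void) name;
	(void) value;
	if (process_shared_preload_libraries_in_progress)
		return -1;				/* the server would elog(ERROR) instead */
	return 0;					/* pretend the GUC was applied */
}

/* simulate an extension trying to override a GUC from _PG_init() */
static int
demo_try_during_preload(void)
{
	int			rc;

	process_shared_preload_libraries_in_progress = true;
	rc = demo_set_config_option("some_guc", "42");
	process_shared_preload_libraries_in_progress = false;
	return rc;
}
```

An extension calling this during _PG_init() would fail immediately rather than silently changing a setting the server has already used to size shared memory, though, as noted upthread, nothing stops an extension from poking the underlying C variables directly.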

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 286e9d0c4d3de9d40a0021c9e18e06baf50abb74 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Sat, 9 Apr 2022 15:51:07 -0700
Subject: [PATCH v1 1/3] Revert GetMaxBackends().

This reverts commits 0147fc7, 4567596, aa64f23, and 5ecd018.
---
 src/backend/access/nbtree/nbtutils.c|  4 +-
 src/backend/access/transam/multixact.c  | 31 +++-
 src/backend/access/transam/twophase.c   |  3 +-
 src/backend/commands/async.c| 12 ++-
 src/backend/libpq/pqcomm.c  |  3 +-
 src/backend/postmaster/auxprocess.c |  2 +-
 src/backend/postmaster/postmaster.c | 14 ++--
 src/backend/storage/ipc/dsm.c   |  2 +-
 src/backend/storage/ipc/procarray.c | 25 ++
 src/backend/storage/ipc/procsignal.c| 37 -
 src/backend/storage/ipc/sinvaladt.c |  4 +-
 src/backend/storage/lmgr/deadlock.c | 31 
 src/backend/storage/lmgr/lock.c | 23 +++---
 src/backend/storage/lmgr/predicate.c| 10 +--
 src/backend/storage/lmgr/proc.c | 17 ++--
 src/backend/utils/activity/backend_status.c | 88 ++---
 src/backend/utils/adt/lockfuncs.c   |  5 +-
 src/backend/utils/init/postinit.c   | 55 +++--
 src/include/miscadmin.h |  3 +-
 19 files changed, 142 insertions(+), 227 deletions(-)

diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 96c72fc432..fd1b53885c 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -2072,7 +2072,7 @@ BTreeShmemSize(void)
 	Size		size;
 
 	size = offsetof(BTVacInfo, vacuums);
-	size = add_size(size, mul_size(GetMaxBackends(), sizeof(BTOneVacInfo)));
+	size = add_size(size, mul_size(MaxBackends, sizeof(BTOneVacInfo)));
 	return size;
 }
 
@@ -2101,7 +2101,7 @@ BTreeShmemInit(void)
 		btvacinfo->cycle_ctr = (BTCycleId) time(NULL);
 
 		btvacinfo->num_vacuums = 0;
-		btvacinfo->max_vacuums = GetMaxBackends();
+		btvacinfo->max_vacuums = MaxBackends;
 	}
 	else
 		Assert(found);
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 45907d1b44..8f7d12950e 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -282,11 +282,12 @@ typedef struct MultiXactStateData
 } MultiXactStateData;
 
 /*
- * Pointers to the state data in shared memory
- *
- * The index of the last element of the OldestMemberMXactId and
- * OldestVisibleMXactId arrays can be obtained with GetMaxOldestSlot().
+ * Last element of OldestMemberMXactId and OldestVisibleMXactId arrays.
+ * Valid elements are (1..MaxOldestSlot); element 0 is never used.
  */
+#define MaxOldestSlot	(MaxBackends + max_prepared_xacts)
+
+/* Pointers to the state data in shared memory */
 static MultiXactStateData *MultiXactState;
 static MultiXactId *OldestMemberMXactId;
 static MultiXactId *OldestVisibleMXactId;
@@ -341,7 +342,6 @@ static void MultiXactIdSetOldestVisible(void);
 static void RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 			   int nmembers, MultiXactMember *members);
 static MultiXactId GetNewMultiXactId(int nmembers, MultiXactOffset *offset);
-static inline int GetMaxOldestSlot(void);
 
 /* MultiXact cache management */
 static int	mxactMemberComparator(const void *arg1, const void *arg2);
@@ -662,17 +662,6 @@ MultiXactIdSetOldestMember(void)
 	}
 }
 
-/*
- * Retrieve the index of the last element of the OldestMemberMXactId and
- * OldestVisibleMXactId arrays.  Valid elements are (1..MaxOldestSlot); element
- * 0 is never used.
- */
-static inline int
-GetMaxOldestSlot(void)
-{
-	return GetMaxBackends() + max_prepared_xacts;
-}
-
 /*
  * MultiXactIdSetOldestVisible
  *		Save the oldest MultiXactId this transaction considers possibly live.
@@ -695,7 +684,6 @@ MultiXactIdSetOldestVisible(void)
 	if (!MultiXactIdIsValid(OldestVisibleMXactId[MyBackendId]))
 	{
 		MultiXactId oldestMXact;
-		int			maxOldestSlot 

Re: Support logical replication of DDLs

2022-04-11 Thread Zheng Li
Hi,

> > Good catch. The reason for having isTopLevel in the condition is
> > because I haven't decided if a DDL statement inside a PL should
> > be replicated from the user point of view. For example, if I execute a
> > plpgsql function or a stored procedure which creates a table under the hood,
> > does it always make sense to replicate the DDL without running the same
> > function or stored procedure on the subscriber? It probably depends on
> > the specific
> > use case. Maybe we can consider making this behavior configurable by the 
> > user.
>
> But then this could be true for DML as well right?  Like after
> replicating the function to the subscriber if we are sending the DML
> done by function then what's the problem in DDL.  I mean if there is
> no design issue in implementing this then I don't think there is much
> point in blocking the same or even providing configuration for this.

Agreed. I'll unblock DDLs in functions/procedures and test it out. I
found that some DDLs in functions are replicated multiple times on the
subscriber when they should only be replicated once. I'm still trying
to figure out why.

>The patch no longer applies. The patch is a very good attempt, and I
>would also like to contribute if required.
>I have a few comments but will hold it till a rebased version is available.

Hi, Ajin. That'll be great! I'll rebase my github branch to master
head. In the meantime, this github branch still works
https://github.com/zli236/postgres/tree/ddl_replication

Regards,
Zheng




Re: avoid multiple hard links to same WAL file after a crash

2022-04-11 Thread Tom Lane
Robert Haas  writes:
> On Mon, Apr 11, 2022 at 5:12 AM Kyotaro Horiguchi
>  wrote:
>> If this diagnosis is correct, the comment is proved to be paranoid.

> It's sometimes difficult to understand what problems really old code
> comments are worrying about. For example, could they have been
> worrying about bugs in the code? Could they have been worrying about
> manual interference with the pg_wal directory? It's hard to know.

"git blame" can be helpful here, if you trace back to when the comment
was written and then try to find the associated mailing-list discussion.
(That leap can be difficult for commits pre-dating our current
convention of including links in the commit message, but it's usually
not *that* hard to locate contemporaneous discussion.)

regards, tom lane




Re: SQL/JSON: functions

2022-04-11 Thread Tom Lane
Justin Pryzby  writes:
> On Mon, Apr 11, 2022 at 11:54:11AM -0400, Andrew Dunstan wrote:
>> I will deal with the structural issues soon.

> I didn't actually intend this as a complaint.  The ability for a reference to
> be specific and granular is good.  So there may be reasons to change the
> structure, but my comment is only about a +0.1.

I was going to complain about it on the grounds that it looks
approximately nothing like other sections of func.sgml.  The layout
of the rest of that file is hard-won presentation knowledge and
should not be lightly ignored.

regards, tom lane




Re: avoid multiple hard links to same WAL file after a crash

2022-04-11 Thread Robert Haas
On Mon, Apr 11, 2022 at 5:12 AM Kyotaro Horiguchi
 wrote:
> So, the only thing we need to care is segment switch. Without it, the
> segment that InstallXLogFileSegment found by the stat loop is known to
> be safe to overwrite even if exists.
>
> When segment switch finds an existing file, it's no problem since the
> segment switch doesn't create a new segment.  Otherwise segment switch
> always calls InstallXLogFileSegment.  The section from searching for
> an empty segmetn slot until calling durable_rename_excl() is protected
> by ControlFileLock. Thus if a process is in the section, no other
> process can switch to a newly-created segment.
>
> If this diagnosis is correct, the comment is proved to be paranoid.

It's sometimes difficult to understand what problems really old code
comments are worrying about. For example, could they have been
worrying about bugs in the code? Could they have been worrying about
manual interference with the pg_wal directory? It's hard to know.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: SQL/JSON: functions

2022-04-11 Thread Justin Pryzby
On Mon, Apr 11, 2022 at 11:54:11AM -0400, Andrew Dunstan wrote:
> >> BTW, the documentation references look a little like OIDs...
> >> Does someone already have an SNMP-based doc browser ?
> >> | For details, see Section 9.16.3.4.2.
> >
> > I already had a couple of these items on my list but I ran out of time
> > before tiredness overcame me last night.
> >
> > I'm planning on removing some of that stuff that generates the last
> > complaint if I can do it without too much violence.
> 
> I will deal with the structural issues soon.

I didn't actually intend this as a complaint.  The ability for a reference to
be specific and granular is good.  So there may be reasons to change the
structure, but my comment is only about a +0.1.




Re: SQL/JSON: functions

2022-04-11 Thread Andrew Dunstan


On 2022-04-08 Fr 08:15, Andrew Dunstan wrote:
> On 4/8/22 08:02, Justin Pryzby wrote:
>> On Thu, Mar 31, 2022 at 04:25:58PM -0400, Andrew Dunstan wrote:
>>> No code chunks left, only a documentation patch which should land
>> Documentation review for a6baa4bad.
>>
>>> Construct a JSON the provided strings:
>> a JSON what ?
>> *from* the provided strings ?
>>
>>> Construct a JSON from the provided values various types:
>> should say "a JSON scalar" ?
>> *of* various types ?
>>
>>> Construct a JSON object from the provided key/value pairs of various types:
>> For comparison, that one looks ok.
>>
>> +  JSON_EXISTS function checks whether the provided
>> +   JSON_VALUE function extracts a value from the 
>> provided
>> +   JSON_QUERY function extracts an 
>> SQL/JSON
>> +  JSON_TABLE function queries 
>> JSON data
>> + JSON_TABLE uses the
>> +  JSON_SERIALIZE function transforms a SQL/JSON 
>> value
>>
>> I think all these should all begin with "THE >...< function ...", like the
>> others do.
>>
>> +To use other types, you must create the CAST from 
>> json for this type.
>> => create a cast from json to this type.
>>
>> +Values can be null, but not keys.
>> I think it's clearer to say "..but keys cannot."
>>
>> +  For any scalar other than a number or a Boolean the text
>>
>> Boolean COMMA the text
>>
>> + The path name must be unique and cannot coincide with column names.
>> Maybe say "name must be unique and distinct from the column names."
>>
>> +  ... If you specify a GROUP BY
>> +  or an ORDER BY clause, this function returns a 
>> separate JSON object
>> +  for each table row.
>>
>> "for each table row" sounds inaccurate or imprecise.  The SELECT docs say 
>> this:
>> | GROUP BY will condense into a single row all selected rows that share the 
>> same values for the grouped expressions
>>
>> BTW, the documentation references look a little like OIDs...
>> Does someone already have an SNMP-based doc browser ?
>> | For details, see Section 9.16.3.4.2.
>
>
> Many thanks, useful.
>
> I already had a couple of these items on my list but I ran out of time
> before tiredness overcame me last night.
>
> I'm planning on removing some of that stuff that generates the last
> complaint if I can do it without too much violence.
>
>

I have attended to most of these. I just removed the GROUP BY sentence,
I don't think it added anything useful. We already refer users to the
aggregates page.


I will deal with the structural issues soon.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: make MaxBackends available in _PG_init

2022-04-11 Thread Julien Rouhaud
Hi,

On Mon, Apr 11, 2022 at 10:47:17AM -0400, Robert Haas wrote:
> On Sat, Apr 9, 2022 at 9:24 AM Julien Rouhaud  wrote:
>
> > On the bright side, I see that citus is using SetConfigOption() to increase
> > max_prepared_transactions [1].  That's the only extension mentioned in that
> > thread that does modify some GUC, and this GUC was already marked as
> > PGDLLIMPORT since 2017 so it probably wasn't done that way for Windows
> > compatibility reason.
>
> I don't quite understand this, but I think if we want to support this
> kind of thing it needs a separate hook.

My point was that even if we say we don't support this, we have no way to
actually enforce it fully, as the variables are exported.  So it's good that
the only known case is using the GUC API, since we can enforce that one.
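
A minimal sketch of the kind of enforcement point being discussed — rejecting GUC changes while shared_preload_libraries are still being loaded. All names here are invented stand-ins for illustration, not the real guc.c machinery:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Invented stand-ins for the real GUC state; not PostgreSQL's actual code. */
static bool demo_shared_preload_in_progress = false;
static int  demo_max_prepared_transactions = 0;

/*
 * Refuses to apply a setting while shared_preload_libraries are loading,
 * which is the proposed behavior: extensions could no longer use the GUC
 * API to change settings from _PG_init().
 */
static bool
demo_set_config_option(const char *name, int value)
{
    if (demo_shared_preload_in_progress)
    {
        fprintf(stderr, "cannot set \"%s\" during shared_preload_libraries\n",
                name);
        return false;
    }
    if (strcmp(name, "max_prepared_transactions") == 0)
        demo_max_prepared_transactions = value;
    return true;
}
```

Directly assigning to the exported variable would of course bypass such a check, which is the point being made above.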




Re: Schema variables - new implementation for Postgres 15+1

2022-04-11 Thread Julien Rouhaud
Hi,

On Sun, Apr 10, 2022 at 03:43:33PM -0500, Justin Pryzby wrote:
> On Sun, Apr 10, 2022 at 08:30:39PM +0200, Pavel Stehule wrote:
> > I am sending fresh rebased patch + separation to more patches. This split
> > is initial, and can be changed later
> 
> The 0001 patch requires this, but it's not included until 0003.
> src/include/commands/session_variable.h
> 
> Each patch should compile and pass tests with the preceding patches, without
> the following patches.  I think the regression tests should be included with
> their corresponding patch.  Maybe it's ok to separate out the changes for
> pg_dump, docs, and psql - but they'd have to be merged together eventually.
> I realize some of this runs counter to Julien's suggestion to split patches.

Note that most of my suggestions were only to make the patch easier to review,
which was mostly trying to limit a bit the core of the new code.

Unfortunately, given the feature, we can't really split the patch into many
smaller parts and expect them to be completely self-contained, so I'm not
against splitting out smaller chunks like psql support and whatnot.  But I'm not
convinced that it will make it easier to review.




Re: Frontend error logging style

2022-04-11 Thread Tom Lane
Peter Eisentraut  writes:
> On 08.04.22 22:26, Tom Lane wrote:
>>> I think we should put a centralized level check
>>> into logging.c, and get rid of at least the "if (likely())"
>>> checks, because those are going to succeed approximately 100.0%
>>> of the time.  Maybe there's an argument for keeping the unlikely()
>>> ones.

> Yeah, that seems ok to change.  The previous coding style is more useful 
> if you have a lot of debug messages in a hot code path, but that usually 
> doesn't apply to where this is used.

The patch I presented keeps the unlikely() checks in the debug-level
macros.  Do you think we should drop those too?  I figured that avoiding
evaluating the arguments would be worth something.

regards, tom lane
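
The argument-evaluation point can be illustrated with a self-contained sketch (invented names; this is not the actual src/common/logging.h code): with the level test inside the macro, a suppressed debug call never evaluates its arguments, while relying only on a centralized check inside the function evaluates them at every call site.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#ifndef unlikely
#define unlikely(x) __builtin_expect((x) != 0, 0)
#endif

enum demo_log_level { DEMO_DEBUG, DEMO_INFO };

static enum demo_log_level demo_min_level = DEMO_INFO;
static int demo_eval_count = 0;     /* counts argument evaluations */

static void
demo_log(enum demo_log_level level, const char *fmt, ...)
{
    va_list ap;

    if (level < demo_min_level)     /* centralized level filter */
        return;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

/* Variant A: level test in the macro -- args skipped when suppressed. */
#define demo_log_debug_macro(...) \
    do { \
        if (unlikely(demo_min_level <= DEMO_DEBUG)) \
            demo_log(DEMO_DEBUG, __VA_ARGS__); \
    } while (0)

/* Variant B: centralized check only -- args always evaluated. */
#define demo_log_debug_plain(...) demo_log(DEMO_DEBUG, __VA_ARGS__)

static const char *
expensive_arg(void)
{
    demo_eval_count++;          /* stands in for an expensive computation */
    return "details";
}
```

With debug messages disabled, variant A leaves expensive_arg() uncalled, which is the benefit of keeping the unlikely() checks in the debug-level macros.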




Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}

2022-04-11 Thread Robert Haas
On Mon, Apr 11, 2022 at 2:52 AM Michael Paquier  wrote:
> Since ba5 and the introduction of LZ4, I have reworked the way
> compression is controlled for pg_receivewal, with two options:
> - --compress-method, settable to "gzip", "none" or "lz4".
> - --compress, to pass down a compression level, where the allowed
> range is 1-9.  If passing down 0, we'd get an error rather than
> implying no compression, contrary to what we did in ~14.
>
> I initially thought that this was fine as-is, but then Robert and
> others have worked on client/server compression for pg_basebackup,
> introducing a much better design with centralized APIs where one can
> use METHOD:DETAIL for as compression value, where DETAIL is a
> comma-separated list of keyword=value (keyword = "level" or
> "workers"), with centralized checks and an extensible design.
>
> This is something I think we had better fix before beta1, because now
> we have binaries that use an inconsistent set of options.  So,
> attached is a patch set aimed at rework this option set from the
> ground, taking advantage of the recent work done by Robert and others
> for pg_basebackup:

+1 for this in general, but I think that naming like
"compression_algo" stinks. If you think "compression_algorithm" is too
long, I think you should use "algorithm" or "compression" or
"compression_method" or something.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
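
A rough sketch of how a METHOD:DETAIL specification splits into an algorithm plus a raw detail string (invented names; the real centralized parser in the tree additionally validates the keyword=value list and reports errors):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum demo_compress_algorithm
{
    DEMO_COMPRESSION_NONE,
    DEMO_COMPRESSION_GZIP,
    DEMO_COMPRESSION_LZ4,
    DEMO_COMPRESSION_ZSTD
} demo_compress_algorithm;

/*
 * Split "METHOD" or "METHOD:DETAIL" into an algorithm and the raw detail
 * string (e.g. "level=5,workers=2").  Returns false on an unknown method.
 */
static bool
demo_parse_compress_spec(const char *spec,
                         demo_compress_algorithm *algorithm,
                         const char **detail)
{
    const char *colon = strchr(spec, ':');
    size_t      len = colon ? (size_t) (colon - spec) : strlen(spec);

    *detail = colon ? colon + 1 : NULL;

    if (len == 4 && strncmp(spec, "none", len) == 0)
        *algorithm = DEMO_COMPRESSION_NONE;
    else if (len == 4 && strncmp(spec, "gzip", len) == 0)
        *algorithm = DEMO_COMPRESSION_GZIP;
    else if (len == 3 && strncmp(spec, "lz4", len) == 0)
        *algorithm = DEMO_COMPRESSION_LZ4;
    else if (len == 4 && strncmp(spec, "zstd", len) == 0)
        *algorithm = DEMO_COMPRESSION_ZSTD;
    else
        return false;
    return true;
}
```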




Re: CLUSTER on partitioned index

2022-04-11 Thread Zhihong Yu
On Mon, Apr 11, 2022 at 7:06 AM Justin Pryzby  wrote:

> On Sat, Apr 02, 2022 at 07:21:11PM +0200, Alvaro Herrera wrote:
> > Small things here.
>
> > 1. in VACUUM FULL we only process partitions that are owned by the
> > invoking user.  We don't have this test in the new code.  I'm not sure
> > why do we do that there; is it worth doing the same here?
>
> That dates to a556549d7 (see also cbe24a6dd8 for an earlier commit in
> CLUSTER
> itself).  The reason was to avoid blocking if an unprivileged user runs
> VACUUM
> FULL which would try to lock things (including shared catalogs) before
> checking
> if they have permission to vacuum them.  That commit also initially checks
> the
> owner of the partitioned table, and then re-checking owner of partitions
> later
> on.
>
> A similar issue exists here.  But 1) catalog tables are not partitioned,
> and,
> 2) ownership of a partitioned table is checked immediately.  So the
> problem can
> only occur if a user who owns a partitioned table but doesn't own all its
> partitions tries to cluster it, and it blocks behind another session.
> Fixing
> this is probably a good idea, but seems improbable that it would avoid a
> DOS.
>
> > 2. We should silently skip a partition that's a foreign table, I
> > suppose.
>
> I think it's not needed, since the loop over index children doesn't see a
> child
> index on the foreign table.  ?
>
> > 3. We do mark the index on the partitions as indisclustered AFAICS (we
> > claim that the partitioned table's index is not marked, which is
> > accurate).  So users doing unadorned CLUSTER afterwards will get the
> > partitions clustered too, once they cluster the partitioned table.  If
> > they don't want this, they would have to ALTER TABLE to remove the
> > marking.  How likely is that this will be a problem?  Maybe documenting
> > this point is enough.
>
> It seems at least as likely that someone would *want* the partitions to be
> marked clustered as that someone would want them to be unchanged.
>
> The cluster mark accurately reflects having been clustered.  It seems
> unlikely
> that a user would want something else to be clustered later by "cluster;".
> Since clustering on a partitioned table wasn't supported before, nothing
> weird
> will happen to someone who upgrades to v15 unless they elect to use the new
> feature.  As this seems to be POLA, it doesn't even need to be
> documented.  ?
>
Hi,
For v13-0002-cluster-early-ownership-check-of-partitions.patch :

only for it to fails ownership check anyway

to fails -> to fail

Cheers


Re: Frontend error logging style

2022-04-11 Thread Peter Eisentraut

On 08.04.22 22:26, Tom Lane wrote:

I wrote:

One other loose end is bothering me: I stuck with logging.h's
original choice to put "if (likely())" or "if (unlikely())"
conditionals into the macros, but I rather suspect that that's
just a waste.  I think we should put a centralized level check
into logging.c, and get rid of at least the "if (likely())"
checks, because those are going to succeed approximately 100.0%
of the time.  Maybe there's an argument for keeping the unlikely()
ones.


Concretely, something like the attached.  As a simple check,
I looked at the compiled size of pg_dump.  It went from

   text    data     bss     dec     hex filename
 380298    4008    1384  385690   5e29a /home/postgres/testversion/bin/pg_dump

to

   text    data     bss     dec     hex filename
 374954    4008    1384  380346   5cdba src/bin/pg_dump/pg_dump

for a savings of about 5K or 1.5%.  Not a huge amount, but
not nothing either, especially considering that the existing
coding isn't buying us anything.


Yeah, that seems ok to change.  The previous coding style is more useful 
if you have a lot of debug messages in a hot code path, but that usually 
doesn't apply to where this is used.





Re: [COMMITTERS] pgsql: Allow time delayed standbys and recovery

2022-04-11 Thread Robert Haas
On Fri, Apr 8, 2022 at 11:10 AM Robert Haas  wrote:
> On Fri, Apr 8, 2022 at 10:45 AM Thom Brown  wrote:
> > Thanks. This doesn't include my self-correction:
> >
> > s/kept on standby/kept on the standby/
>
> Here is v2, endeavoring to rectify that oversight.

Committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: make MaxBackends available in _PG_init

2022-04-11 Thread Robert Haas
On Sat, Apr 9, 2022 at 9:24 AM Julien Rouhaud  wrote:
> > FWIW I would be on board with reverting all the GetMaxBackends() stuff if
> > we made the value available in _PG_init() and stopped supporting GUC
> > overrides by extensions (e.g., ERROR-ing in SetConfigOption() when
> > process_shared_preload_libraries_in_progress is true).
>
> Yeah I would prefer this approach too, although it couldn't prevent extension
> from directly modifying the underlying variables so I don't know how effective
> it would be.

I think I also prefer this approach. I am willing to be convinced
that's the wrong idea, but right now I favor it.

> On the bright side, I see that citus is using SetConfigOption() to increase
> max_prepared_transactions [1].  That's the only extension mentioned in that
> thread that does modify some GUC, and this GUC was already marked as
> PGDLLIMPORT since 2017 so it probably wasn't done that way for Windows
> compatibility reason.

I don't quite understand this, but I think if we want to support this
kind of thing it needs a separate hook.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: CLUSTER on partitioned index

2022-04-11 Thread Justin Pryzby
On Sat, Apr 02, 2022 at 07:21:11PM +0200, Alvaro Herrera wrote:
> Small things here.

> 1. in VACUUM FULL we only process partitions that are owned by the
> invoking user.  We don't have this test in the new code.  I'm not sure
> why do we do that there; is it worth doing the same here?

That dates to a556549d7 (see also cbe24a6dd8 for an earlier commit in CLUSTER
itself).  The reason was to avoid blocking if an unprivileged user runs VACUUM
FULL which would try to lock things (including shared catalogs) before checking
if they have permission to vacuum them.  That commit also initially checks the
owner of the partitioned table, and then re-checking owner of partitions later
on.

A similar issue exists here.  But 1) catalog tables are not partitioned, and,
2) ownership of a partitioned table is checked immediately.  So the problem can
only occur if a user who owns a partitioned table but doesn't own all its
partitions tries to cluster it, and it blocks behind another session.  Fixing
this is probably a good idea, but seems improbable that it would avoid a DOS.

> 2. We should silently skip a partition that's a foreign table, I
> suppose.

I think it's not needed, since the loop over index children doesn't see a child
index on the foreign table.  ?

> 3. We do mark the index on the partitions as indisclustered AFAICS (we
> claim that the partitioned table's index is not marked, which is
> accurate).  So users doing unadorned CLUSTER afterwards will get the
> partitions clustered too, once they cluster the partitioned table.  If
> they don't want this, they would have to ALTER TABLE to remove the
> marking.  How likely is that this will be a problem?  Maybe documenting
> this point is enough.

It seems at least as likely that someone would *want* the partitions to be
marked clustered as that someone would want them to be unchanged.

The cluster mark accurately reflects having been clustered.  It seems unlikely
that a user would want something else to be clustered later by "cluster;".
Since clustering on a partitioned table wasn't supported before, nothing weird
will happen to someone who upgrades to v15 unless they elect to use the new
feature.  As this seems to be POLA, it doesn't even need to be documented.  ?
From 19c02209dfbcbf73494e1e6d3ca0db50f64dc5fd Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 24 Sep 2021 14:18:52 -0500
Subject: [PATCH v13 1/2] Recheck arg is not needed since 2011

It was added at: a4dde3bff36dac1ac0b699becad6f103d861a874
And not used since: 7e2f906201c8bb95f7fb17e56b8740c38bda5441
---
 src/backend/commands/cluster.c   | 6 +++---
 src/backend/commands/tablecmds.c | 2 +-
 src/include/commands/cluster.h   | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 322d6bb2f18..0f0a6e9f018 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -232,7 +232,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 	if (rel != NULL)
 	{
 		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, true, AccessShareLock);
+		check_index_is_clusterable(rel, indexOid, AccessShareLock);
 		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
 
 		/* close relation, releasing lock on parent table */
@@ -434,7 +434,7 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
-		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
+		check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock);
 
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
@@ -480,7 +480,7 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
  * protection here.
  */
 void
-check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMODE lockmode)
+check_index_is_clusterable(Relation OldHeap, Oid indexOid, LOCKMODE lockmode)
 {
 	Relation	OldIndex;
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 90edd0bb97d..1d7db41d172 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14055,7 +14055,7 @@ ATExecClusterOn(Relation rel, const char *indexName, LOCKMODE lockmode)
 		indexName, RelationGetRelationName(rel))));
 
 	/* Check index is valid to cluster on */
-	check_index_is_clusterable(rel, indexOid, false, lockmode);
+	check_index_is_clusterable(rel, indexOid, lockmode);
 
 	/* And do the work */
 	mark_index_clustered(rel, indexOid, false);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 3c279f6210a..df8e73af409 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -34,7 +34,7 @@ typedef struct ClusterParams
 extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
extern void cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params);

Re: Outdated copyright year in parse_jsontable.c

2022-04-11 Thread Tom Lane
Julien Rouhaud  writes:
> On Mon, Apr 11, 2022 at 03:58:01PM +0900, Michael Paquier wrote:
>> On Mon, Apr 11, 2022 at 02:08:38PM +0800, Julien Rouhaud wrote:
>>> I just noticed that parse_jsontable.c was added with a wrong copyright year,
>>> trivial patch attached.

>> Thanks, I'll go fix that in a bit.  I am spotting four more, as of
>> src/backend/replication/basebackup_*.c that point to 2020.

> Ah right, I now realize that my command found them too but it was hidden with
> the .po files and I missed them.

FTR, we already have a tool for this: src/tools/copyright.pl.
I ran it to check, and it found the same five files.

(I'm slightly annoyed at it for having touched the mod dates on
every file when it only needed to change five ... will go fix that.)

regards, tom lane




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-04-11 Thread Bharath Rupireddy
On Mon, Apr 11, 2022 at 4:21 PM Amit Kapila  wrote:
>
> On Thu, Apr 7, 2022 at 3:35 PM Bharath Rupireddy
>  wrote:
> >
>
> I am facing the below doc build failure on my machine due to this work:
>
> ./filelist.sgml:
> Tabs appear in SGML/XML files
> make: *** [check-tabs] Error 1
>
> The attached patch fixes this for me.

Thanks. It looks like there's a TAB in between. Your patch LGTM.

I'm wondering why this hasn't been caught by the buildfarm members
(or it may have been caught and I'm just failing to locate it).

Can you please provide me with the doc build command to catch these
kinds of errors?

Regards,
Bharath Rupireddy.




Re: Support logical replication of DDLs

2022-04-11 Thread Euler Taveira
On Mon, Apr 11, 2022, at 2:00 AM, Amit Kapila wrote:
> On Thu, Apr 7, 2022 at 3:46 PM Amit Kapila  wrote:
> >
> > On Wed, Mar 23, 2022 at 10:39 AM Japin Li  wrote:
> >
> > 2. For DDL replication, do we need to wait for a consistent point of
> > snapshot? For DMLs, that point is a convenient point to initialize
> > replication from, which is why we export a snapshot at that point,
> > which is used to read normal data. Do we have any similar needs for
> > DDL replication?
> >
> 
> I have thought a bit more about this and I think we need to build the
> snapshot for DML replication as we need to read catalog tables to
> decode the corresponding WAL but it is not clear to me if we have a
> similar requirement for DDL replication. If the catalog access is
> required then it makes sense to follow the current snapshot model,
> otherwise, we may need to think differently for DDL replication.
> 
> One more related point is that for DML replication, we do ensure that
> we copy the entire data of the table (via initial sync) which exists
> even before the publication for that table exists, so do we want to do
> something similar for DDLs? How do we sync the schema of the table
> before the user has defined the publication? Say the table has been
> created before the publication is defined and after that, there are
> only Alter statements, so do we expect, users to create the table on
> the subscriber and then we can replicate the Alter statements? And
> even if we do that it won't be clear which Alter statements will be
> replicated after publication is defined especially if those Alters
> happened concurrently with defining publications?
The *initial* DDL replication is a different problem from ongoing DDL
replication. The former requires a snapshot to read the current catalog data
and build a CREATE command as part of the subscription process. Subsequent DDLs
on that object will be handled by the different approach that is being
discussed here.

I'm planning to work on the initial DDL replication. I'll open a new thread as
soon as I write a design for it. Just as an example, the pglogical approach is
to use pg_dump behind the scenes to provide the schema [1]. It is a reasonable
approach but an optimal solution should be an API to provide the initial DDL
commands. I mean, the main point of this feature is to have an API to create an
object that logical replication can use for initial schema
synchronization. This "DDL to create an object" was already discussed in the
past [2].


[1] 
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_sync.c#L942
[2] https://www.postgresql.org/message-id/4E69156E.5060509%40dunslane.net


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}

2022-04-11 Thread gkokolatos


On Monday, April 11th, 2022 at 8:52 AM, Michael Paquier 
wrote:

> This is something I think we had better fix before beta1, because now
> we have binaries that use an inconsistent set of options. So,
> attached is a patch set aimed at rework this option set from the
> ground, taking advantage of the recent work done by Robert and others
> for pg_basebackup:

Agreed. It is rather inconsistent now.

> - 0001 is a simple rename of backup_compression.{c,h} to
> compression.{c,h}, removing anything related to base backups from
> that. One extra reason behind this renaming is that I would like to
> use this infra for pg_dump, but that's material for 16~.

I agree with the design. If you permit me a couple of nitpicks regarding naming.

+typedef enum pg_compress_algorithm
+{
+   PG_COMPRESSION_NONE,
+   PG_COMPRESSION_GZIP,
+   PG_COMPRESSION_LZ4,
+   PG_COMPRESSION_ZSTD
+} pg_compress_algorithm;

Elsewhere in the codebase (e.g. create_table.sgml, alter_table.sgml,
brin_tuple.c, detoast.c, toast_compression.c, tupdesc.c, gist.c, to mention a
few), variations of the nomenclature "compression method" are used, like
'VARDATA_COMPRESSED_GET_COMPRESS_METHOD' or 'InvalidCompressionMethod', etc. I
feel that it would be nicer if we followed one naming rule for this, and I
recommend substituting "method" for "algorithm" throughout.

On a similar note, it would help readability to be able to distinguish at a
glance the type from the variable. Maybe uppercase or camelcase the type?

Last, even though it is not needed now, it will be helpful to have a
PG_COMPRESSION_INVALID in some scenarios. Though we can add it when we come to
it.

> - 0002 removes WalCompressionMethod, replacing it by
> pg_compress_algorithm as these are the same enums. Robert complained
> about the confusion that WalCompressionMethod could lead to as this
> could be used for the compression of data, and not only WAL. I have
> renamed some variables to be more consistent, while on it.

It looks good. If you choose to set aside my comment above regarding the use of
'method' over 'algorithm', can you please use the full word in the
variable, e.g. 'wal_compress_algorithm' instead of 'wal_compress_algo'? I can't
really explain it, but the latter reads a bit rude. Then again, that may be just
me.

> - 0003 is the actual option rework for pg_receivewal. This takes
> advantage of 0001, leading to the result of removing --compress-method
> and replacing it with --compress, taking care of the backward
> compatibility problems for an integer value, aka 0 implies no
> compression and val > 0 implies gzip. One bonus reason to switch to
> that is that this would make the addition of zstd for pg_receivewal
> easier in the future.

Looks good.

> I am going to add an open item for this stuff. Comments or thoughts?

I agree that it is better to not release pg_receivewal with the distinct set of
options.

Cheers,
//Georgios

> Thanks,
> --
> Michael
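
The backward-compatibility rule described above (a bare integer where 0 implies no compression and val > 0 implies gzip) could be sketched as below; the names are invented and the real pg_receivewal code differs in detail:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef enum
{
    LEGACY_NONE,        /* "0": no compression */
    LEGACY_GZIP,        /* "1".."9": gzip at that level */
    LEGACY_NOT_LEGACY,  /* not a bare integer: hand off to METHOD[:DETAIL] */
    LEGACY_BAD          /* an integer outside the accepted range */
} legacy_result;

/*
 * Interpret a --compress value the pre-15 way when it is a bare integer;
 * otherwise signal that the new METHOD[:DETAIL] parser should handle it.
 */
static legacy_result
demo_parse_legacy_compress(const char *value, int *level)
{
    size_t      i,
                len = strlen(value);

    if (len == 0)
        return LEGACY_BAD;
    for (i = 0; i < len; i++)
        if (!isdigit((unsigned char) value[i]))
            return LEGACY_NOT_LEGACY;   /* e.g. "gzip:level=5" */

    *level = atoi(value);
    if (*level == 0)
        return LEGACY_NONE;
    if (*level >= 1 && *level <= 9)
        return LEGACY_GZIP;
    return LEGACY_BAD;                  /* e.g. "42" */
}
```

This keeps old invocations like --compress=5 working while new invocations like --compress=gzip:level=5 fall through to the shared parser.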




Re: [PATCH] Add extra statistics to explain for Nested Loop

2022-04-11 Thread Justin Pryzby
On Tue, Apr 05, 2022 at 05:14:09PM -0400, Greg Stark wrote:
> This is not passing regression tests due to some details of the plan
> output - marking Waiting on Author:

It's unstable due to parallel workers.
I'm not sure what the usual workaround is here.
Maybe set parallel_leader_participation=no for this test.

> diff -w -U3 c:/cirrus/src/test/regress/expected/partition_prune.out
> c:/cirrus/src/test/recovery/tmp_check/results/partition_prune.out
> --- c:/cirrus/src/test/regress/expected/partition_prune.out 2022-04-05
> 17:00:25.433576100 +
> +++ c:/cirrus/src/test/recovery/tmp_check/results/partition_prune.out
> 2022-04-05 17:18:30.092203500 +
> @@ -2251,10 +2251,7 @@
>   Workers Planned: 2
>   Workers Launched: N
>   ->  Parallel Seq Scan on public.lprt_b (actual rows=N loops=N)
> -   Loop Min Rows: N  Max Rows: N  Total Rows: N
> Output: lprt_b.b
> -   Worker 0:  actual rows=N loops=N
> -   Worker 1:  actual rows=N loops=N
> ->  Materialize (actual rows=N loops=N)
>   Loop Min Rows: N  Max Rows: N  Total Rows: N
>   Output: lprt_a.a
> @@ -2263,10 +2260,8 @@
> Workers Planned: 1
> Workers Launched: N
> ->  Parallel Seq Scan on public.lprt_a (actual rows=N loops=N)
> - Loop Min Rows: N  Max Rows: N  Total Rows: N
>   Output: lprt_a.a
> - Worker 0:  actual rows=N loops=N
> -(24 rows)
> +(19 rows)




Re: WIN32 pg_import_system_collations

2022-04-11 Thread Juan José Santamaría Flecha
On Tue, Mar 22, 2022 at 2:00 AM Andres Freund  wrote:

>
> Currently fails to apply, please rebase:
> http://cfbot.cputube.org/patch_37_3450.log
>
> Marked as waiting-on-author.
>
Please find attached a rebased version; no other significant change.

Regards,

Juan José Santamaría Flecha


v5-0001-WIN32-pg_import_system_collations.patch
Description: Binary data


Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-04-11 Thread Amit Kapila
On Thu, Apr 7, 2022 at 3:35 PM Bharath Rupireddy
 wrote:
>

I am facing the below doc build failure on my machine due to this work:

./filelist.sgml:
Tabs appear in SGML/XML files
make: *** [check-tabs] Error 1

The attached patch fixes this for me.

-- 
With Regards,
Amit Kapila.


fix_tabs_1.patch
Description: Binary data


Re: API stability

2022-04-11 Thread Matthias van de Meent
On Mon, 11 Apr 2022 at 06:30, Kyotaro Horiguchi  wrote:
>
> (a bit off-topic)
>
> I'm not sure where I am..
>
> At Wed, 06 Apr 2022 10:36:30 +0900 (JST), Kyotaro Horiguchi 
>  wrote in
> me> > this if nobody else would like to do it, but let me ask whether
> me> > Kyotaro Horiguchi would like to propose a patch, since the original
> me> > patch did, and/or whether you would like to propose a patch, as the
> me> > person reporting the issue.
> me>
> me> I'd like to do that. Let me see.
>
> At Thu, 7 Apr 2022 10:04:20 -0400, Robert Haas  wrote 
> in
> > struct, which is what we now need to fix. Since I don't hear anyone
> > else volunteering to take care of that, I'll go work on it.
>
Just to confirm: did my message above not look like a declaration that
I'd like to volunteer?  If so, please teach me the correct way to
say that, since I don't want to repeat the same mistake.  Or are there
some other reasons?  (Sorry if this looks like blame, but I'm asking
plainly (really:).)

I won't speak for Robert H., but this might be because of gmail not
putting this mail in the right thread: Your mail client dropped the
"[was: pgsql: ...]" tag, which Gmail subsequently displays as a
different thread (that is, in my Gmail UI there's three "Re: API
stability" threads, one of which has the [was: pgsql: ...]-tag, and
two of which seem to be started by you as a reply on the original
thread, but with the [was: pgsql: ...]-tag dropped and thus considered
a new thread).

So, this might be the reason Robert overlooked your declaration to
volunteer: he was looking for volunteers in the thread "Re: API
Stability [was: pgsql: ...]" in the Gmail UI, which didn't show your
messages there because of the different subject line.

Kind regards,

Matthias van de Meent




Re: typos

2022-04-11 Thread Amit Kapila
On Mon, Apr 11, 2022 at 3:55 PM Masahiko Sawada  wrote:
>
> On Mon, Apr 11, 2022 at 7:10 PM Justin Pryzby  wrote:
> >
> > Amit or Masahiko may want to comment on 0012 (doc review: Add ALTER
> > SUBSCRIPTION ... SKIP).
>
> Thank you for the patch! I've looked at 0012 patch. Regarding the
> following part:
>
> pg_replication_origin_advance() function
> -   transaction.  Before using this function, the subscription needs
> to be disabled
> +   XXX? transaction.  Before using this function, the subscription
> needs to be disabled
> temporarily either by ALTER SUBSCRIPTION ...
> DISABLE or, the
>
> we can remove "transaction"; it seems to be a typo.
>

Right.

> The rest looks good to me.
>

+1. I'll take care of pushing this one tomorrow unless we have more
comments on this part.

-- 
With Regards,
Amit Kapila.




Re: PG DOCS - logical replication filtering

2022-04-11 Thread Amit Kapila
On Mon, Apr 11, 2022 at 12:39 PM Peter Smith  wrote:
>
> On Fri, Apr 8, 2022 at 4:12 PM Peter Smith  wrote:
>
> OK. Added in v7 [1]
>

Thanks, this looks mostly good to me. I didn't like the new section
added for partitioned tables examples, so I removed it and added some
explanation of the tests. I have slightly changed a few other lines. I
am planning to commit the attached tomorrow unless there are more
comments.

-- 
With Regards,
Amit Kapila.


v8-0001-Add-additional-documentation-for-row-filters.patch
Description: Binary data


Re: typos

2022-04-11 Thread Masahiko Sawada
On Mon, Apr 11, 2022 at 7:10 PM Justin Pryzby  wrote:
>
> On Mon, Apr 11, 2022 at 04:39:30PM +1200, David Rowley wrote:
> > I'm not entirely certain this is an improvement.  Your commit message
> > I'd say is not true going by git grep "compression algorithm". There
> > are 3 matches in the docs and take [1], for example. I'd say in that
> > one it's better to use "algorithm".  In that case, "method" could be
> > talking about client or server.
>
> I am not wedded to this change; but, for context, I wrote this patch before
> basebackup supported multiple compression ...  "things".  I didn't touch
> basebackup here, since Robert defended that choice of words in another thread
> (starting at 20220320194050.gx28...@telsasoft.com).
>
> This change is for pg_column_compression(), and the only other use of
> "compression algorithm" in the docs is in pgcrypto (which is in contrib).  
> That
> the docs consistently use "method" suggests continuing to use that rather than
> something else.  It could be described in some central place (like if we
> support common syntax between interfaces which expose compression).
>
> > 0010:
> > I don't understand this change.
>
> The commit message mentions 959f6d6a1, which makes newbindir optional.  But 
> the
> documentation wasn't updated, and seems to indicate that it's still required.
> https://www.postgresql.org/docs/devel/pgupgrade.html
>
> > 0011:
> > I can't quite parse the original. I might not have enough context
> > here.  Robert, Joe? (new to master)
>
> See the link in the commit message where someone else reported the same
> problem.
>
> > 0019:
> > -1. pgindent will fix these.
>
> But two of those are from 2016.
>
> Thanks for amending and pushing those.  There's some more less obvious ones
> attached.
>
> Amit or Masahiko may want to comment on 0012 (doc review: Add ALTER
> SUBSCRIPTION ... SKIP).

Thank you for the patch! I've looked at 0012 patch. Regarding the
following part:

pg_replication_origin_advance() function
-   transaction.  Before using this function, the subscription needs
to be disabled
+   XXX? transaction.  Before using this function, the subscription
needs to be disabled
temporarily either by ALTER SUBSCRIPTION ...
DISABLE or, the

we can remove "transaction"; it seems to be a typo. The rest looks good to me.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: typos

2022-04-11 Thread Justin Pryzby
On Mon, Apr 11, 2022 at 04:39:30PM +1200, David Rowley wrote:
> I'm not entirely certain this is an improvement.  Your commit message
> I'd say is not true going by git grep "compression algorithm". There
> are 3 matches in the docs and take [1], for example. I'd say in that
> one it's better to use "algorithm".  In that case, "method" could be
> talking about client or server.

I am not wedded to this change; but, for context, I wrote this patch before
basebackup supported multiple compression ...  "things".  I didn't touch
basebackup here, since Robert defended that choice of words in another thread
(starting at 20220320194050.gx28...@telsasoft.com).

This change is for pg_column_compression(), and the only other use of
"compression algorithm" in the docs is in pgcrypto (which is in contrib).  That
the docs consistently use "method" suggests continuing to use that rather than
something else.  It could be described in some central place (like if we
support common syntax between interfaces which expose compression).

> 0010:
> I don't understand this change.

The commit message mentions 959f6d6a1, which makes newbindir optional.  But the
documentation wasn't updated, and seems to indicate that it's still required.
https://www.postgresql.org/docs/devel/pgupgrade.html

> 0011:
> I can't quite parse the original. I might not have enough context
> here.  Robert, Joe? (new to master)

See the link in the commit message where someone else reported the same
problem.

> 0019:
> -1. pgindent will fix these.

But two of those are from 2016.

Thanks for amending and pushing those.  There's some more less obvious ones
attached.

Amit or Masahiko may want to comment on 0012 (doc review: Add ALTER
SUBSCRIPTION ... SKIP).
From 98778834b8c762a6cd76d8680191a7c866397d8b Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 16 Feb 2022 21:07:53 -0600
Subject: [PATCH 01/13] doc: Remove 'synchronized' from --no-sync

Since it would be more accurate to say "unsynchronized".

The corresponding change was made for pgupgrade.sgml in commit 410aa248
---
 doc/src/sgml/ref/pg_rewind.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index e808239aa5b..3231f67845a 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -210,7 +210,7 @@ PostgreSQL documentation
 to be written safely to disk.  This option causes
 pg_rewind to return without waiting, which is
 faster, but means that a subsequent operating system crash can leave
-the synchronized data directory corrupt.  Generally, this option is
+the data directory corrupt.  Generally, this option is
 useful for testing but should not be used on a production
 installation.

-- 
2.17.1

From d4bde4e9acf6375d8894af0a6d5a1557a29174b3 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 19 Feb 2022 11:47:35 -0600
Subject: [PATCH 02/13] doc: pg_column_compression(): we say method not
 algorithm everywhere else

could backpatch to v14
---
 doc/src/sgml/func.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2ecf0482d84..a34eb8b43e0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29285,7 +29285,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset((pg_backup_stop()).lsn);
 text


-Shows the compression algorithm that was used to compress
+Shows the compression method that was used to compress
 an individual variable-length value. Returns NULL
 if the value is not compressed.

-- 
2.17.1

From fb28ed4be7f82337a8e4aa92140bdafc78ee2278 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 20 Dec 2021 19:13:29 -0600
Subject: [PATCH 03/13] doc review: update synopsis: pg_upgrade optional
 newbindir

See also: 959f6d6a1821b7d9068244f500dd80953e768d16
---
 doc/src/sgml/ref/pgupgrade.sgml | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index f024c3ef259..8cda8d16d17 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -24,8 +24,7 @@ PostgreSQL documentation
pg_upgrade
-b
oldbindir
-   -B
-   newbindir
+   -B newbindir
-d
oldconfigdir
-D
-- 
2.17.1

From cc2480bbdf667369d920459bea0d497e7b7255dc Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 7 Apr 2022 09:04:23 -0500
Subject: [PATCH 04/13] doc review: basebackup_to_shell.required_role

c6306db24bd913375f99494e38ab315befe44e11

See also:
https://www.postgresql.org/message-id/CA%2BTgmoaBQ5idAh7OsQGAbLY166SVkj7KkKROkTyN5sOF6QDuww%40mail.gmail.com
---
 doc/src/sgml/basebackup-to-shell.sgml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/basebackup-to-shell.sgml b/doc/src/sgml/basebackup-to-shell.sgml
index 9f44071d502..1

Re: A qsort template

2022-04-11 Thread John Naylor
On Mon, Apr 11, 2022 at 5:34 AM David Rowley  wrote:

> With this particular test, v15 is about 15% *slower* than v14.  I
> didn't know what to blame at first, so I tried commenting out the sort
> specialisations and got the results in the red bars in the graph. This
> made it about 7.5% *faster* than v14. So looks like this patch is to
> blame.  I then hacked the comparator function that's used in the
> specialisations for BIGINT to comment out the tiebreak to remove the
> indirect function call, which happens to do nothing in this 1 column
> sort case.  The aim here was to get an idea what the performance would
> be if there was a specialisation for single column sorts. That's the
> yellow bars, which show about 10% *faster* than master.

Thanks for investigating! (I assume you meant 10% faster than v14?)

On Mon, Apr 11, 2022 at 4:55 AM Peter Geoghegan  wrote:

> The B&M quicksort implementation that we adopted is generally
> extremely fast for that case, since it uses 3 way partitioning (based
> on the Dutch National Flag algorithm). This essentially makes sorting
> large groups of duplicates take only linear time (not linearithmic
> time).

In the below thread, I wondered if it still counts as extremely fast
nowadays. I hope to give an answer to that during next cycle. Relevant
to the open item, the paper linked there has a variety of
low-cardinality cases. I'll incorporate them in a round of tests soon.

https://www.postgresql.org/message-id/cafbsxshanjtsx9dnjppxjxwg3bu+yq6pnmsfpm0uvyuafdw...@mail.gmail.com
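For reference, the 3-way ("Dutch National Flag") partitioning scheme mentioned above can be reduced to a short self-contained sketch. This is illustrative only — names and style are mine, not the B&M code PostgreSQL actually uses — but it shows why large groups of duplicates sort in roughly linear time: the equal band in the middle is never recursed into.

```c
#include <stddef.h>

/*
 * 3-way partition: after the call, a[0..lt-1] < pivot,
 * a[lt..gt-1] == pivot, and a[gt..n-1] > pivot.
 */
static void
partition3(int *a, size_t n, int pivot, size_t *lt_out, size_t *gt_out)
{
    size_t lt = 0, i = 0, gt = n;

    while (i < gt)
    {
        if (a[i] < pivot)
        {
            int t = a[lt];
            a[lt++] = a[i];
            a[i++] = t;
        }
        else if (a[i] > pivot)
        {
            int t = a[--gt];
            a[gt] = a[i];
            a[i] = t;
        }
        else
            i++;                    /* equal to pivot: stays in place */
    }
    *lt_out = lt;
    *gt_out = gt;
}

/* 3-way quicksort: only the < and > bands are recursed into. */
static void
quicksort3(int *a, size_t n)
{
    size_t lt, gt;

    if (n < 2)
        return;
    partition3(a, n, a[n / 2], &lt, &gt);
    quicksort3(a, lt);
    quicksort3(a + gt, n - gt);
}
```

On a low-cardinality input the middle band absorbs every duplicate of the pivot in one pass, which is the behavior Peter describes.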

On Mon, Apr 11, 2022 at 4:44 AM Thomas Munro  wrote:

> Upthread we were discussing which variations it'd be worth investing
> extra text segment space on to gain speedup and we put those hard
> decisions off for future work, but on reflection, we probably should
> tackle this particular point to avoid a regression.  I think something
> like the attached achieves that (draft, not tested much yet, could
> perhaps find a tidier way to code the decision tree).  In short:
> variants qsort_tuple_{int32,signed,unsigned}() no longer fall back,
> but new variants qsort_tuple_{int32,signed,unsigned}_tiebreak() do.

Looks good at a glance, I will get some numbers after modifying my test scripts.
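To make the shape of that concrete, here is a hedged sketch of the specialize-with-tiebreak pattern. The names and the two-int "tuple" are illustrative stand-ins, not the actual tuplesort symbols: the fast comparator never makes an indirect call, while the _tiebreak variant pays for one only when the leading keys compare equal.

```c
#include <stddef.h>

typedef struct SortTuple
{
    int key1;                   /* leading (specialized) sort key */
    int key2;                   /* stand-in for the remaining columns */
} SortTuple;

/*
 * Fast path: used when the caller knows there is only one sort key,
 * so no fallback comparator is ever invoked.
 */
static int
cmp_int32(const SortTuple *a, const SortTuple *b)
{
    return (a->key1 > b->key1) - (a->key1 < b->key1);
}

/* Example "remaining columns" comparator used as the tiebreak. */
static int
cmp_key2(const SortTuple *a, const SortTuple *b)
{
    return (a->key2 > b->key2) - (a->key2 < b->key2);
}

/*
 * _tiebreak variant: the indirect call comparing the remaining
 * columns is paid only when the leading keys compare equal.
 */
static int
cmp_int32_tiebreak(const SortTuple *a, const SortTuple *b,
                   int (*tiebreak) (const SortTuple *, const SortTuple *))
{
    int r = cmp_int32(a, b);

    if (r != 0)
        return r;
    return tiebreak(a, b);
}
```

The single-column case then binds to cmp_int32() directly, which is the regression the draft patch aims to avoid.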

> We should perhaps also reconsider the other XXX comment about finding
> a way to skip the retest of column 1 in the tiebreak comparator.
> Perhaps you'd just install a different comparetup function, eg
> comparetup_index_btree_tail (which would share code), so no need to
> multiply specialisations for that.

If we need to add these cases to avoid regression, it makes sense to
make them work as well as we reasonably can.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Outdated copyright year in parse_jsontable.c

2022-04-11 Thread Julien Rouhaud
On Mon, Apr 11, 2022 at 03:58:01PM +0900, Michael Paquier wrote:
> On Mon, Apr 11, 2022 at 02:08:38PM +0800, Julien Rouhaud wrote:
> > I just noticed that parse_jsontable.c was added with a wrong copyright year,
> > trivial patch attached.
> 
> Thanks, I'll go fix that in a bit.  I am spotting four more, as of
> src/backend/replication/basebackup_*.c that point to 2020.

Ah right, I now realize that my command found them too, but they were
hidden among the .po files and I missed them.

Thanks!




Re: avoid multiple hard links to same WAL file after a crash

2022-04-11 Thread Kyotaro Horiguchi
At Thu, 7 Apr 2022 11:29:54 -0700, Nathan Bossart  
wrote in 
> The attached patch prevents this problem by using durable_rename() instead
> of durable_rename_excl() for WAL recycling.  This removes the protection
> against accidentally overwriting an existing WAL file, but there shouldn't
> be one.

From another direction: if the new segment is the currently active
one, we just mustn't install it.  Otherwise we don't care.

So, the only thing we need to care about is a segment switch.  Without
one, the segment that InstallXLogFileSegment found via the stat loop is
known to be safe to overwrite even if it exists.

When a segment switch finds an existing file, there is no problem since
the segment switch doesn't create a new segment.  Otherwise a segment
switch always calls InstallXLogFileSegment.  The section from searching
for an empty segment slot until calling durable_rename_excl() is
protected by ControlFileLock.  Thus, while a process is in that section,
no other process can switch to a newly created segment.

If this diagnosis is correct, the comment is shown to be overly paranoid.


>* Perform the rename using link if available, paranoidly trying to 
> avoid
>* overwriting an existing file (there shouldn't be one).

As a result, I think Nathan's fix is correct: we can safely use
durable_rename() instead.

And I propose using renameat2() on Linux so that the regression tests
can detect the contradictory case, even if only on that platform.
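For illustration only, a crash-safe rename outside PostgreSQL boils down to something like the sketch below. The function name here is made up, error handling is minimal, and fd.c's durable_rename() remains the authoritative implementation:

```c
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Sketch of a crash-safe rename: flush the old file, rename it, then
 * flush the directory that now contains the new name, so the rename
 * itself survives an OS crash.
 */
static int
durable_rename_sketch(const char *oldname, const char *newname)
{
    char dirbuf[1024];
    int  fd;

    fd = open(oldname, O_RDWR);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    close(fd);

    if (rename(oldname, newname) != 0)
        return -1;

    /* dirname() may modify its argument, so work on a copy */
    strncpy(dirbuf, newname, sizeof(dirbuf) - 1);
    dirbuf[sizeof(dirbuf) - 1] = '\0';
    fd = open(dirname(dirbuf), O_RDONLY);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
```

Note that plain rename() silently replaces an existing target; Linux's renameat2() with RENAME_NOREPLACE would instead fail in that case, which is the kind of check the regression-test idea above relies on.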

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: pg_get_publication_tables() output duplicate relid

2022-04-11 Thread houzj.f...@fujitsu.com
On Tuesday, December 14, 2021 3:42 PM houzj.f...@fujitsu.com 
 wrote:
> 
> On Sat, Nov 20, 2021 7:31 PM Amit Kapila  wrote:
> > On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila 
> > wrote:
> > >
> > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote
> > > 
> > wrote:
> > > >
> > > > The problematic case is attaching the partition *after* the
> > > > subscriber has already marked the root parent as synced and/or
> > > > ready for replication.  Refreshing the subscription doesn't help
> > > > it discover the newly attached partition, because a
> > > > publish_via_partition_root only ever tells about the root parent,
> > > > which would be already synced, so the subscriber would think
> > > > there's nothing to copy.
> > > >
> > >
> > > Okay, I see this could be a problem but I haven't tried to reproduce it.
> >
> > One more thing you mentioned is that the initial sync won't work after
> > refresh but later changes will be replicated but I noticed that later
> > changes also don't get streamed till we restart the subscriber server.
> > I am not sure but we might not be invalidating apply workers cache due
> > to which it didn't notice the same.
> 
> I investigated this bug recently, and I think the reason is that when
> receiving a relcache invalidation message, the callback function[1] in
> the walsender only resets the schema-sent status; it doesn't reset the
> replicate_valid flag. So, it won't rebuild the publication actions of
> the relation.
> 
> [1]
> static void
> rel_sync_cache_relation_cb(Datum arg, Oid relid) ...
>   /*
>* Reset schema sent status as the relation definition may have
> changed.
>* Also free any objects that depended on the earlier definition.
>*/
>   if (entry != NULL)
>   {
>   entry->schema_sent = false;
>   list_free(entry->streamed_txns);
> ...
> 
> Also, when you DETACH a partition, the publication won't be rebuilt either,
> for the same reason.  This could cause unexpected behavior if we
> modify the detached table's data, and the bug happens regardless of whether
> pubviaroot is set or not.
> 
> For the fix:
> 
> I think that if we also reset the replicate_valid flag in
> rel_sync_cache_relation_cb, then
> the bug can be fixed. I have some hesitation about this approach, because it
> could increase the frequency of invalidating and rebuilding the publication
> actions, but I haven't come up with a better approach.
> 

I have confirmed that the bug of ATTACH PARTITION has been fixed due to recent
commit 7f481b8. Currently, we always invalidate the RelationSyncCache when
attaching a partition, so the pubactions of the newly attached partition will
be rebuilt correctly.
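For readers following along, the failure mode analyzed earlier in the thread — an invalidation callback that resets only part of a cached entry — can be shown with a small self-contained sketch. All names here are illustrative stand-ins, not the real RelationSyncEntry:

```c
#include <stdbool.h>

/* Toy model of a per-relation cache entry with derived state. */
typedef struct SyncEntry
{
    bool schema_sent;       /* reset by the callback */
    bool replicate_valid;   /* if left true, pubactions never rebuilt */
    int  pubactions;        /* derived state, rebuilt only when invalid */
} SyncEntry;

/* Mirrors the reported bug: only part of the entry is reset. */
static void
cache_relation_cb_buggy(SyncEntry *entry)
{
    entry->schema_sent = false;
    /* bug: entry->replicate_valid is left true */
}

/* Mirrors the proposed fix: force a rebuild of the derived state. */
static void
cache_relation_cb_fixed(SyncEntry *entry)
{
    entry->schema_sent = false;
    entry->replicate_valid = false;
}

/* Consumer: rebuilds the derived state only when it is invalid. */
static int
get_pubactions(SyncEntry *entry, int current_actions)
{
    if (!entry->replicate_valid)
    {
        entry->pubactions = current_actions;    /* rebuild */
        entry->replicate_valid = true;
    }
    return entry->pubactions;
}
```

With the buggy callback, the consumer keeps returning the stale pubactions even though an invalidation arrived; with the fixed one, the next lookup rebuilds them.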

Best regards,
Hou zj




Re: typos

2022-04-11 Thread David Rowley
On Mon, 11 Apr 2022 at 16:39, David Rowley  wrote:
> I will start pushing the less controversial of these, after a bit of 
> squashing.

I just committed 3 separate commits for the following:

Committed: 0001 + 0003 + 0004 + 0006 + 0007 (modified) + 0008 + 0009 +
0012 (doc parts)
Committed: 0012 (remainder) + 0013 + 0014 + 0018
Committed: 0015

I skipped:
0002 (skipped as we should backpatch)
0005 (unsure if the proposed patch is better)
0010 (I can't follow this change)
0011 (Could do with input from Robert and Joe)

and also skipped:
0016 (unsure if we should change these if pgindent is not touching them)
0017 (unsure if we should change these if pgindent is not touching them)
0019 (pgindent will get these when the time comes)

I'll wait for feedback on the ones I didn't use.

Are you able to rebase the remainder? Probably with the exception of 0019.

Thanks for finding all these!

David




Re: 回复:POC: Cleaning up orphaned files using undo logs

2022-04-11 Thread Antonin Houska
孔凡深(云梳)  wrote:

> Hi, Antonin. I am quite interested in zheap, and I have recently been
> reviewing the patch you submitted.
> When I used the pg_undodump tool to dump the undo page chunks, I found that
> some chunk headers were abnormal.
> After reading the relevant code in
> 0006-The-initial-implementation-of-the-pg_undodump-tool.patch,
> I believe there is a bug in the function parse_undo_page.

Thanks, I'll take a look if the project happens to continue. Currently it
seems that another approach is more likely to be taken:

https://www.postgresql.org/message-id/CA%2BTgmoa_VNzG4ZouZyQQ9h%3DoRiy%3DZQV5%2BxHQXxMWmep4Ygg8Dg%40mail.gmail.com

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




PostgreSQL Program Application

2022-04-11 Thread Qiongwen Liu
To Whom It May Concern,

Hi, I am Qiongwen Liu, and I am studying at the University of California,
Berkeley as an economics and computer science major. I am interested in the
"Develop Performance Farm Benchmarks and Website" project of PostgreSQL in
this program; attached below is my application. Please check it out. Thank
you.

Sincerely,
Qiongwen


The PostgreSQL  (1).pdf
Description: Adobe PDF document


Re: Temporary file access API

2022-04-11 Thread Antonin Houska
Robert Haas  wrote:

> On Fri, Apr 8, 2022 at 4:54 AM Antonin Houska  wrote:
> > Do you think that the use of a system call is a problem itself (e.g. because
> > the code looks less simple if read/write is used somewhere and fread/fwrite
> > elsewhere; of course of read/write is mandatory in special cases like WAL,
> > heap pages, etc.)  or is the problem that the system calls are used too
> > frequently? I suppose only the latter.
> 
> I'm not really super-interested in reducing the number of system
> calls. It's not a dumb thing in which to be interested and I know that
> for example Thomas Munro is very interested in it and has done a bunch
> of work in that direction just to improve performance. But for me the
> attraction of this is mostly whether it gets us closer to TDE.
> 
> And that's why I'm asking these questions about adopting it in
> different places. I kind of thought that your previous patches needed
> to encrypt, I don't know, 10 or 20 different kinds of files. So I was
> surprised to see this patch touching the handling of only 2 kinds of
> files. If we consolidate the handling of let's say 15 of 20 cases into
> a single mechanism, we've really moved the needle in the right
> direction -- but consolidating the handling of 2 of 20 cases, or
> whatever the real numbers are, isn't very exciting.

There aren't really that many kinds of files to encrypt:

https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data

(And pg_stat/* files should be removed from the list.)

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: simplifying foreign key/RI checks

2022-04-11 Thread Amit Langote
On Thu, Apr 7, 2022 at 10:05 AM Amit Langote  wrote:
> There were rebase conflicts with the recently committed
> execPartition.c/h changes.  While fixing them, I thought maybe
> find_leaf_part_for_key() doesn't quite match in style with its
> neighbors in execPartition.h, so changed it to
> ExecGetLeafPartitionForKey().

This one has been marked Returned with Feedback in the CF app, which
makes sense given the discussion on -committers [1].

I agree with the feedback given there: it would be better to address *all*
RI trigger check/action functions in the project of sidestepping SPI
when doing those checks/actions, not only RI_FKey_check_ins/upd() as
the current patch does.
bit harder about how to modularize the new implementation so that the
various trigger functions don't end up with their own bespoke
check/action implementations.

I'll think about that, also consider what Corey proposed in [2], and
try to reformulate this for v16.

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] 
https://www.postgresql.org/message-id/flat/E1ncXX2-000mFt-Pe%40gemulon.postgresql.org
[2] 
https://www.postgresql.org/message-id/flat/CADkLM%3DeZJddpx6RDop-oCrQ%2BJ9R-wfbf6MoLxUUGjbpwTkoUXQ%40mail.gmail.com




Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-11 Thread Michael Paquier
On Mon, Apr 11, 2022 at 06:48:58PM +1200, Thomas Munro wrote:
> Sorry for the delay... I got a bit confused about the different things
> going on in this thread but I hope I've got it now:
> 
> 1.  This test had some pre-existing bugs/races, which hadn't failed
> before due to scheduling, even under Valgrind.  The above changes
> appear to fix those problems.  To Michael for comment.

I have seen the thread, and there is a lot in it.  I will try to look
tomorrow at the parts I got involved in.
--
Michael


signature.asc
Description: PGP signature


RE: Logical replication timeout problem

2022-04-11 Thread wangw.f...@fujitsu.com
On Mon, Apr 11, 2022 at 2:39 PM I wrote:
> Attach the new patch.
I would also like to share the test results and details.

To check that the LSN information used for the calculation is what we expected,
I gathered some information by adding logging to the function LagTrackerRead.

Summary of test results:
- On current HEAD, and on current HEAD with the v14 patch, we could find in
  lag_tracker the information for the same LSN as received from the subscriber side.
- On current HEAD with the v13 patch, we could hardly ever find the information
  for the same LSN in lag_tracker.

Attach the details:
[The log by HEAD]
the lsn we received from subscriber  |  the lsn whose time we used to calculate in lag_tracker
382826584|  382826584
743884840|  743884840
1104943232   |  1104943232
1468949424   |  1468949424
1469521216   |  1469521216

[The log by HEAD with v14 patch]
the lsn we received from subscriber  |  the lsn whose time we used to calculate in lag_tracker
382826584|  382826584
743890672|  743890672
1105074264   |  1105074264
1469127040   |  1469127040
1830591240   |  1830591240

[The log by HEAD with v13 patch]
the lsn we received from subscriber  |  the lsn whose time we used to calculate in lag_tracker
382826584|  359848728 
743884840|  713808560 
1105010640   |  1073978544
1468517536   |  1447850160
1469516328   |  1469516328
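The mechanism these logs are probing can be reduced to a tiny self-contained sketch. This is illustrative only — the real code is LagTrackerWrite()/LagTrackerRead() in walsender.c, which maintain a circular buffer of samples — but it shows the lookup being tested: the sender buffers (LSN, send time) samples, and when the subscriber confirms an LSN we want the time of the newest sample whose LSN is at or before it.

```c
#include <stddef.h>
#include <stdint.h>

#define LAG_BUF 16

typedef struct
{
    uint64_t lsn;
    double   send_time;
} LagSample;

typedef struct
{
    LagSample buf[LAG_BUF];
    size_t    count;
} LagTracker;

/* Record a sample when a WAL position is sent (samples in LSN order). */
static void
lag_write(LagTracker *t, uint64_t lsn, double now)
{
    if (t->count < LAG_BUF)
        t->buf[t->count++] = (LagSample) {lsn, now};
}

/*
 * On confirmation from the subscriber, return the send time of the
 * newest sample with lsn <= confirmed_lsn, or -1 if there is none.
 */
static double
lag_read(LagTracker *t, uint64_t confirmed_lsn)
{
    double best = -1.0;

    for (size_t i = 0; i < t->count; i++)
        if (t->buf[i].lsn <= confirmed_lsn)
            best = t->buf[i].send_time;
    return best;
}
```

The v13 symptom above corresponds to the lookup landing on an older sample than the LSN actually received; HEAD and v14 land on the matching one.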

Regards,
Wang wei


Re: PG DOCS - logical replication filtering

2022-04-11 Thread Peter Smith
On Mon, Apr 11, 2022 at 1:27 PM Amit Kapila  wrote:
>
> On Fri, Apr 8, 2022 at 11:42 AM Peter Smith  wrote:
> >
> > On Wed, Apr 6, 2022 at 8:58 PM Amit Kapila  wrote:
> > >
> >
> > > 3.
> > > +   
> > > +Whenever an UPDATE is processed, the row filter
> > > +expression is evaluated for both the old and new row (i.e. before
> > > +and after the data is updated).
> > > +   
> > > +
> > > +   
> > > +If both evaluations are true, it replicates the
> > > +UPDATE change.
> > > +   
> > > +
> > > +   
> > > +If both evaluations are false, it doesn't 
> > > replicate
> > > +the change.
> > > +   
> > >
> > > I think we can combine these three separate paragraphs.
> >
> > The first sentence is the explanation, then there are the 3 separate
> > “If …” cases mentioned. It doesn’t seem quite right to group just the
> > first 2 “if…” cases into one paragraph, while leaving the 3rd one
> > separated. OTOH combining everything into one big paragraph seems
> > worse. Please confirm if you still want this changed.
> >
>
> Yeah, I think we should have something like what Euler's version had
> and maybe keep a summary section from the current patch.
>

Modified in v7 [1]. Removed the bullets. Structured the text
paragraphs the same way that Euler had it. The summary is kept as-is.


> > > 5.
> > > +   
> > > +
> > > + Publication publish operations are ignored
> > > when copying pre-existing table data.
> > > +
> > > +   
> > > +
> > > +   
> > > +
> > > + If the subscriber is in a release prior to 15, copy pre-existing 
> > > data
> > > + doesn't use row filters even if they are defined in the publication.
> > > + This is because old releases can only copy the entire table data.
> > > +
> > > +   
> > >
> > > I don't see the need for the first  here, the second one seems
> > > to convey it.
> >
> > Well, the 2nd note is only about compatibility with older versions
> > doing the subscribe. But the 1st note is not version-specific at all.
> > It is saying that the COPY does not take the “publish” option into
> > account. If you know of some other docs already mentioning this subtle
> > behaviour of the COPY then I can remove this note and just
> > cross-reference to the other place. But I did not know anywhere this
> > is already mentioned, so that is why I wrote the note about it.
> >
>
> I don't see the need to talk about the general initial sync (COPY) behavior
> here. It is already described at [1]. If we want to enhance it, we can do
> that as a separate patch making changes where the initial sync is
> explained. I am not sure that is required, though.
>

Did you miss providing the link URL? Anyway, I removed the note in v7
[1]. This information can be added as a separate patch one day (or not
at all).

> I feel in the initial "Row Filters" and "Row Filter Rules" sections,
> we don't need to have separate paragraphs. I think the same is pointed
> out by Alvaro as well.
>

Modified in v7 [1] those sections as suggested. I also assumed these
were the same sections that Alvaro was referring to.

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPt1X%3D3VaWRbx3yHByEMC-GPh4oeeMeJKJeTmOELDxZJHQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: PG DOCS - logical replication filtering

2022-04-11 Thread Peter Smith
On Fri, Apr 8, 2022 at 4:12 PM Peter Smith  wrote:
>
> On Wed, Apr 6, 2022 at 8:58 PM Amit Kapila  wrote:
> >
> > On Thu, Mar 24, 2022 at 11:48 AM Peter Smith  wrote:
> > >
> >
> > Review comments:
> > ===
[snip]
>
> > 9. I suggest adding an example for partition tables showing how the
> > row filter is used based on the 'publish_via_partition_root'
> > parameter.
>
> OK - I am working on this and will add it in a future patch version.
>

OK. Added in v7 [1]

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPt1X%3D3VaWRbx3yHByEMC-GPh4oeeMeJKJeTmOELDxZJHQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: REINDEX blocks virtually any queries but some prepared queries.

2022-04-11 Thread Frédéric Yhuel




On 4/11/22 02:57, Michael Paquier wrote:

On Fri, Apr 08, 2022 at 04:23:48PM +0200, Frédéric Yhuel wrote:

Thank you Michael.


And done as of 8ac700a.
--


Thank you, Michael!

For reference purposes, we can see in the code of get_relation_info(), 
in plancat.c, that every index of the table is indeed opened, and 
therefore locked, regardless of the query.


Best regards,
Frédéric




Re: PG DOCS - logical replication filtering

2022-04-11 Thread Peter Smith
PSA patch v7 which addresses all the remaining review comments (from
Amit [1a][1b], and from Alvaro [2]).

--
[1a] 
https://www.postgresql.org/message-id/CAHut%2BPvDFWGUOBugYMtcXhAiViZu%2BQ6P-kxw2%2BU83VOGx0Osdg%40mail.gmail.com
[1b] 
https://www.postgresql.org/message-id/CAA4eK1JPyVoc1dUjeqbPd9D0_uYxWyyx-8fcsrgiZ5Tpr9OAuw%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/202204091045.v2ei4yupxqso%40alvherre.pgsql

Kind Regards,
Peter Smith.
Fujitsu Australia


v7-0001-PG-DOCS-page-for-row-filters.patch
Description: Binary data