Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-14 Thread Bharath Rupireddy
On Sat, Jan 15, 2022 at 12:46 AM Bossart, Nathan  wrote:
>
> On 1/14/22, 3:43 AM, "Maxim Orlov"  wrote:
> > The code seems to be in good condition. All the tests are running ok with 
> > no errors.
>
> Thanks for your review.
>
> > I like the whole idea of shifting additional checkpointer jobs as much as 
> > possible to another worker. In my view, it is more appropriate to call this 
> > worker "bg cleaner" or "bg file cleaner" or smth.

I personally prefer "background cleaner" as the new process name in
line with "background writer" and "background worker".

> > It could be useful for systems with high load, which may deal with deleting 
> > many files at once, but I'm not sure about "small" installations. An extra bg 
> > worker needs more resources to do occasional deletion of small amounts of 
> > files. I really do not know how to do it better; maybe have two 
> > different code paths switched by a GUC?
>
> I'd personally like to avoid creating two code paths for the same
> thing.  Are there really cases when this one extra auxiliary process
> would be too many?  And if so, how would a user know when to adjust
> this GUC?  I understand the point that we should introduce new
> processes sparingly to avoid burdening low-end systems, but I don't
> think we should be afraid to add new ones when it is needed.

IMO, having a GUC for enabling/disabling this new worker and its
related code would be a better idea. The reason is that if the server
has no replication slots at all (which is quite possible in real
stand-alone production environments), or if the file enumeration
(directory traversal and file removal) is fast enough on the servers,
there's no point in having this new worker; the checkpointer itself
can take care of the work as it does today.
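
As a concrete illustration of that suggestion (the parameter name below
is purely hypothetical, nothing posted so far defines it, and starting
or stopping an auxiliary process would most likely need a restart
rather than a reload):

    -- hypothetical GUC name, only to sketch the enable/disable idea;
    -- the patch under discussion does not define such a parameter, so
    -- this statement would error until one exists
    ALTER SYSTEM SET custodian_enabled = off;  -- e.g. on small installations
    SELECT pg_reload_conf();                   -- a restart may actually be required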

> That being said, if making the extra worker optional addresses the
> concerns about resource usage, maybe we should consider it.  Justin
> suggested using something like max_parallel_maintenance_workers
> upthread [0].

I don't think this new process should be built on top of
max_parallel_maintenance_workers; instead, I prefer to have it as an
auxiliary process, much like "background writer", "wal writer" and so
on.

I think now is the time for us to run some use cases and get perf
reports to see how beneficial this new process is going to be in terms
of improving the checkpoint timings.

> > Should we also think about adding WAL pre-allocation into the custodian worker 
> > from the patch "Pre-allocating WAL files" [1] ?
>
> This was brought up in the pre-allocation thread [1].  I don't think
> the custodian process would be the right place for it, and I'm also
> not as concerned about it because it will generally be a small, fixed,
> and configurable amount of work.  In any case, I don't sense a ton of
> support for a new auxiliary process in this thread, so I'm hesitant to
> go down the same path for pre-allocation.
>
> [0] https://postgr.es/m/20211213171935.GX17618%40telsasoft.com
> [1] https://postgr.es/m/B2ACCC5A-F9F2-41D9-AC3B-251362A0A254%40amazon.com

I think the idea of moving every non-critical task to a common
background process is a good one, but let's not mix it up with the new
background cleaner process here for now, at least until we get some
numbers and prove that the idea proposed here will be beneficial.

Regards,
Bharath Rupireddy.




Re: MultiXact\SLRU buffers configuration

2022-01-14 Thread Andrey Borodin


> On 15 Jan 2022, at 03:20, Shawn Debnath wrote:
> 
> On Fri, Jan 14, 2022 at 05:28:38PM +0800, Julien Rouhaud wrote:
>>> PFA rebase of the patchset. Also I've added a patch to combine 
>>> page_number, page_status, and page_dirty together to touch less 
>>> cachelines.
>> 
>> The cfbot reports some errors on the latest version of the patch:
>> 
>> https://cirrus-ci.com/task/6121317215764480
>> [...]
>> Could you send a new version?  In the meantime I will switch the patch 
>> status
>> to Waiting on Author.
>> 
> 
> I was planning on running a set of stress tests on these patches. Could 
> we confirm which ones we plan to include in the commitfest?

Many thanks for your interest. Here's the latest version.

Best regards, Andrey Borodin.


v19-0003-Pack-SLRU-page_number-page_status-and-page_dirty.patch
Description: Binary data


v19-0002-Divide-SLRU-buffers-into-8-associative-banks.patch
Description: Binary data


v19-0001-Make-all-SLRU-buffer-sizes-configurable.patch
Description: Binary data


Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-14 Thread Justin Pryzby
I tried to pg_upgrade from a v13 instance like:
time make check -C src/bin/pg_upgrade oldsrc=`pwd`/13 
oldbindir=`pwd`/13/tmp_install/usr/local/pgsql/bin

I had compiled and installed v13 into `pwd`/13.

First, test.sh failed, because of an option in initdb which doesn't exist in
the old version: -x 210

I patched test.sh so the option is used only with the "new" version.

The tab_core_types table has an XID column, so pg_upgrade --check complains and
refuses to run.  If I drop it, then pg_upgrade runs, but then fails like this:

|Files /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/new_xids.txt and /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/old_xids.txt differ
|See /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/xids.diff
|
|--- /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/new_xids.txt    2022-01-15 00:14:23.035294414 -0600
|+++ /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/old_xids.txt    2022-01-15 00:13:59.634945012 -0600
|@@ -1,5 +1,5 @@
|  relfrozenxid | relminmxid 
| --------------+------------
|-            3 |          3
|+        15594 |          3
| (1 row)

Also, the patch needs to be rebased over Peter's vacuum changes.

Here's the changes I used for my test:

diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh
index c6361e3c085..5eae42192b6 100644
--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -24,7 +24,8 @@ standard_initdb() {
# without increasing test runtime, run these tests with a custom 
setting.
# Also, specify "-A trust" explicitly to suppress initdb's warning.
# --allow-group-access and --wal-segsize have been added in v11.
-   "$1" -N --wal-segsize 1 --allow-group-access -A trust -x 210
+   "$@" -N --wal-segsize 1 --allow-group-access -A trust
+
if [ -n "$TEMP_CONFIG" -a -r "$TEMP_CONFIG" ]
then
cat "$TEMP_CONFIG" >> "$PGDATA/postgresql.conf"
@@ -237,7 +238,7 @@ fi
 
 PGDATA="$BASE_PGDATA"
 
-standard_initdb 'initdb'
+standard_initdb 'initdb' -x 210
 
 pg_upgrade $PG_UPGRADE_OPTS --no-sync -d "${PGDATA}.old" -D "$PGDATA" -b 
"$oldbindir" -p "$PGPORT" -P "$PGPORT"
 
diff --git a/src/bin/pg_upgrade/upgrade_adapt.sql 
b/src/bin/pg_upgrade/upgrade_adapt.sql
index 27c4c7fd011..c5ce8bc95b2 100644
--- a/src/bin/pg_upgrade/upgrade_adapt.sql
+++ b/src/bin/pg_upgrade/upgrade_adapt.sql
@@ -89,3 +89,5 @@ DROP OPERATOR public.#%# (pg_catalog.int8, NONE);
 DROP OPERATOR public.!=- (pg_catalog.int8, NONE);
 DROP OPERATOR public.#@%# (pg_catalog.int8, NONE);
 \endif
+
+DROP TABLE IF EXISTS tab_core_types;




Re: Proposal: More structured logging

2022-01-14 Thread Julien Rouhaud
Hi,

On Tue, Jan 11, 2022 at 11:05:26AM +0100, Ronan Dunklau wrote:
> 
> Done, and I added another commit per your suggestion to add this comment.

The cfbot reports that the patchset doesn't apply anymore:

http://cfbot.cputube.org/patch_36_3293.log
=== Applying patches on top of PostgreSQL commit ID 
43c2175121c829c8591fc5117b725f1f22bfb670 ===
[...]
=== applying patch ./v4-0003-Output-error-tags-in-CSV-logs.patch
[...]
patching file src/backend/utils/error/elog.c
Hunk #2 FAILED at 3014.
1 out of 2 hunks FAILED -- saving rejects to file 
src/backend/utils/error/elog.c.rej

Could you send a rebased version?  In the meantime I will switch the cf entry
to Waiting on Author.




Re: TAP test to cover "EndOfLogTLI != replayTLI" case

2022-01-14 Thread Julien Rouhaud
Hi,

On Mon, Jan 10, 2022 at 09:46:23AM +0530, Amul Sul wrote:
> 
> Thanks for the note, I can see the same test is failing on my CentOS
> VM too with the latest master head (376ce3e404b).  The reason for the failure
> is the "recovery_target_inclusive = off" setting, which is unnecessary for
> this test; the attached patch removes it.

This version still fails on windows according to the cfbot:

https://cirrus-ci.com/task/5882621321281536

[18:14:02.639] c:\cirrus>call perl src/tools/msvc/vcregress.pl recoverycheck
[18:14:56.114]
[18:14:56.122] #   Failed test 'check standby content before timeline switch 
0/500FB30'
[18:14:56.122] #   at t/003_recovery_targets.pl line 234.
[18:14:56.122] #  got: '6000'
[18:14:56.122] # expected: '7000'

I'm switching the cf entry to Waiting on Author.




Re: SQL:2011 application time

2022-01-14 Thread Julien Rouhaud
Hi,

On Sat, Nov 20, 2021 at 05:51:16PM -0800, Paul A Jungwirth wrote:
> 
> Here are updated patches. They are rebased and clean up some of my TODOs.

The cfbot reports that the patchset doesn't apply anymore:
http://cfbot.cputube.org/patch_36_2048.log
=== Applying patches on top of PostgreSQL commit ID 
5513dc6a304d8bda114004a3b906cc6fde5d6274 ===
=== applying patch ./v10-0001-Add-PERIODs.patch
patching file src/backend/commands/tablecmds.c
Hunk #1 FAILED at 40.
[...]
1 out of 21 hunks FAILED -- saving rejects to file 
src/backend/commands/tablecmds.c.rej
patching file src/bin/pg_dump/pg_dump.c
Hunk #1 succeeded at 5906 with fuzz 2 (offset -454 lines).
Hunk #2 FAILED at 6425.
Hunk #3 succeeded at 6121 with fuzz 2 (offset -566 lines).
Hunk #4 succeeded at 6203 (offset -561 lines).
Hunk #5 succeeded at 8015 with fuzz 2 (offset -539 lines).
Hunk #6 FAILED at 8862.
Hunk #7 FAILED at 8875.
Hunk #8 FAILED at 8917.
[...]
4 out of 15 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.c.rej
patching file src/bin/pg_dump/pg_dump.h
Hunk #2 FAILED at 284.
Hunk #3 FAILED at 329.
Hunk #4 succeeded at 484 (offset 15 lines).
2 out of 4 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.h.rej

I also see that there were multiple reviews with unanswered comments, so I will
switch the cf entry to Waiting on Author.




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-01-14 Thread Julien Rouhaud
Hi,

On Tue, Jan 11, 2022 at 04:14:25PM +0900, Michael Paquier wrote:
> The CF bot is unhappy, so here is a rebase, with some typo fixes
> reported by Justin offlist.

The cfbot still complains about this patch on Windows:
https://cirrus-ci.com/task/6411385683836928
https://api.cirrus-ci.com/v1/artifact/task/6411385683836928/tap/src/bin/pg_upgrade/tmp_check/log/regress_log_002_pg_upgrade

# Running: pg_upgrade --no-sync -d 
c:/cirrus/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_old_node_data/pgdata -D 
c:/cirrus/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_new_node_data/pgdata -b 
C:/cirrus/tmp_install/bin -B C:/cirrus/tmp_install/bin -p 56296 -P 56297

libpq environment variable PGHOST has a non-local server value: 
C:/Users/ContainerAdministrator/AppData/Local/Temp/FhBIlsw6SV
Failure, exiting
not ok 3 - run of pg_upgrade for new instance

#   Failed test 'run of pg_upgrade for new instance'
#   at t/002_pg_upgrade.pl line 255.
### Starting node "new_node"
# Running: pg_ctl -w -D 
c:/cirrus/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_new_node_data/pgdata -l 
c:/cirrus/src/bin/pg_upgrade/tmp_check/log/002_pg_upgrade_new_node.log -o 
--cluster-name=new_node start
waiting for server to start done
server started
# Postmaster PID for node "new_node" is 5748
# Running: pg_dumpall --no-sync -d port=56297 
host=C:/Users/ContainerAdministrator/AppData/Local/Temp/FhBIlsw6SV 
dbname='postgres' -f 
C:\cirrus\src\bin\pg_upgrade\tmp_check\tmp_test_X4aZ/dump2.sql
# Running: diff -q 
C:\cirrus\src\bin\pg_upgrade\tmp_check\tmp_test_X4aZ/dump1.sql 
C:\cirrus\src\bin\pg_upgrade\tmp_check\tmp_test_X4aZ/dump2.sql
Files C:\cirrus\src\bin\pg_upgrade\tmp_check\tmp_test_X4aZ/dump1.sql and 
C:\cirrus\src\bin\pg_upgrade\tmp_check\tmp_test_X4aZ/dump2.sql differ
not ok 4 - old and new dump match after pg_upgrade

#   Failed test 'old and new dump match after pg_upgrade'




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-14 Thread Julien Rouhaud
Hi,

On Fri, Dec 17, 2021 at 01:03:06PM +0530, Shruthi Gowda wrote:
> 
> I have updated the DBOID preserve patch to handle this case and
> generated the latest patch on top of your v7-001-preserve-relfilenode
> patch.

The cfbot reports that the patch doesn't apply anymore:
http://cfbot.cputube.org/patch_36_3296.log
=== Applying patches on top of PostgreSQL commit ID 
5513dc6a304d8bda114004a3b906cc6fde5d6274 ===
=== applying patch ./v7-0002-Preserve-database-OIDs-in-pg_upgrade.patch
[...]
patching file src/bin/pg_upgrade/info.c
Hunk #1 FAILED at 190.
Hunk #2 succeeded at 351 (offset 27 lines).
1 out of 2 hunks FAILED -- saving rejects to file src/bin/pg_upgrade/info.c.rej
patching file src/bin/pg_upgrade/pg_upgrade.h
Hunk #1 FAILED at 145.
1 out of 1 hunk FAILED -- saving rejects to file 
src/bin/pg_upgrade/pg_upgrade.h.rej
patching file src/bin/pg_upgrade/relfilenode.c
Hunk #1 FAILED at 193.
1 out of 1 hunk FAILED -- saving rejects to file 
src/bin/pg_upgrade/relfilenode.c.rej

Could you send a rebased version?  In the meantime I will switch the cf entry
to Waiting on Author.




Re: Add sub-transaction overflow status in pg_stat_activity

2022-01-14 Thread Julien Rouhaud
On Sat, Jan 15, 2022 at 12:13:39AM -0500, Tom Lane wrote:
> 
> The discussion just upthread was envisioning not only that we'd add
> infrastructure for this, but then that other projects would build
> more infrastructure on top of that.  That's an awful lot of work
> that will become useless --- indeed maybe counterproductive --- once
> we find an actual fix.  I say "counterproductive" because I wonder
> what compatibility problems we'd have if the eventual fix results in
> fundamental changes in the way things work in this area.

I'm not sure what you're referring to.  If that's the hackish extension I
mentioned, its goal was to provide exactly what this thread is about, so I
wasn't advocating for additional tooling.  If that's about pgBadger, no extra
work would be needed: there's already a report about any WARNING/ERROR and such
found in the logs, so the information would be immediately visible.

> Since it's worked the same way for a lot of years, I'm not impressed
> by arguments that we need to push something into v15.

Well, people have also been struggling with it for a lot of years, even if they
don't always come here to complain about it.  And apparently at least 2 people
already had to code something similar to be able to find the problematic
transactions, so I still think that at least some monitoring improvement would
be welcome in v15 if none of the other approaches gets committed.




Re: Add sub-transaction overflow status in pg_stat_activity

2022-01-14 Thread Tom Lane
Julien Rouhaud  writes:
> On Fri, Jan 14, 2022 at 07:42:29PM +, Bossart, Nathan wrote:
>> On 1/14/22, 8:26 AM, "Tom Lane"  wrote:
>>> It feels to me like far too much effort is being invested in fundamentally
>>> the wrong direction here.

> Agreed, it would be better but if that leads to significant work that doesn't
> land in pg15, it would be nice to at least get more monitoring possibilities
> in pg15 to help locate problems in applications.

The discussion just upthread was envisioning not only that we'd add
infrastructure for this, but then that other projects would build
more infrastructure on top of that.  That's an awful lot of work
that will become useless --- indeed maybe counterproductive --- once
we find an actual fix.  I say "counterproductive" because I wonder
what compatibility problems we'd have if the eventual fix results in
fundamental changes in the way things work in this area.

Since it's worked the same way for a lot of years, I'm not impressed
by arguments that we need to push something into v15.

>> An easy first step might be to increase PGPROC_MAX_CACHED_SUBXIDS and
>> NUM_SUBTRANS_BUFFERS.

I don't think that's an avenue to a fix.  We need some more-fundamental
rethinking about how this should work.  (No, I don't have any ideas
at the moment.)

> There's also something proposed at
> https://www.postgresql.org/message-id/003201d79d7b$189141f0$49b3c5d0$@tju.edu.cn,
> which seems to reach some nice improvement without a major redesign of the
> subtransaction system, but I now realize that apparently no one added it to 
> the
> commitfest :(

Hmm ... that could win if we're looking up the same subtransaction's
parent over and over, but I wonder if it wouldn't degrade a lot of
workloads too.

regards, tom lane




Re: sequences vs. synchronous replication

2022-01-14 Thread Fujii Masao




On 2022/01/12 1:07, Tomas Vondra wrote:

I explored the idea of using page LSN a bit


Many thanks!



The patch from 22/12 simply checks if the change should/would wait for
sync replica, and if yes it WAL-logs the sequence increment. There's a
couple problems with this, unfortunately:


Yes, you're right.



So I think this approach is not really an improvement over WAL-logging
every increment. But there's a better way, I think - we don't need to
generate WAL, we just need to ensure we wait for it to be flushed at
transaction end in RecordTransactionCommit().

That is, instead of generating more WAL, simply update XactLastRecEnd
and then ensure RecordTransactionCommit flushes/waits etc. Attached is a
patch doing that - the changes in sequence.c are trivial, changes in
RecordTransactionCommit simply ensure we flush/wait even without XID
(this actually raises some additional questions that I'll discuss in a
separate message in this thread).


This approach (and also my previous proposal) seems to assume that the value 
returned from nextval() should not be used until the transaction executing that 
nextval() has been committed successfully. But I'm not sure how many 
applications follow this assumption. Some applications might use the return 
value of nextval() immediately, before issuing the commit command. Some might 
use the return value of a nextval() executed in a rolled-back transaction.

If we want to avoid duplicate sequence values even in those cases, ISTM that the 
transaction needs to wait for the WAL flush and sync rep before nextval() returns 
the value. Of course, this might cause other issues like a performance decrease, 
though.
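
For example, a client pattern like the following (the sequence name is
purely for illustration) would not be protected by waiting only at
commit time, because the value escapes the database before the commit
is confirmed by the synchronous standby:

    BEGIN;
    SELECT nextval('order_id_seq');  -- value handed to an external system here,
                                     -- before the transaction has committed
    -- if the primary fails before the commit is confirmed by the sync standby,
    -- a promoted standby could hand out the same value again
    COMMIT;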



On btrfs, it looks like this (the numbers next to nextval are the cache
size, with 1 being the default):

   client  test          master   log-all  page-lsn   log-all  page-lsn
   --------------------------------------------------------------------
   1       insert           829       807       802       97%       97%
           nextval/1      16491       814     16465        5%      100%
           nextval/32     24487     16462     24632       67%      101%
           nextval/64     24516     24918     24671      102%      101%
           nextval/128    32337     33178     32863      103%      102%

   client  test          master   log-all  page-lsn   log-all  page-lsn
   --------------------------------------------------------------------
   4       insert          1577      1590      1546      101%       98%
           nextval/1      45607      1579     21220        3%       47%
           nextval/32     68453     49141     51170       72%       75%
           nextval/64     66928     65534     66408       98%       99%
           nextval/128    83502     81835     82576       98%       99%

The results seem clearly better, I think.


Thanks for benchmarking this! I agree that page-lsn is obviously better than 
log-all.



For "insert" there's no drop at all (same as before), because as soon as
a transaction generates any WAL, it has to flush/wait anyway.

And for "nextval" there's a drop, but only with 4 clients, and it's much
smaller (53% instead of 97%). And increasing the cache size eliminates
even that.


Yes, but a 53% drop would be critical for some applications that don't want to 
increase the cache size for some reason. So IMO it's better to provide an 
option to enable/disable that page-lsn approach.
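
(For readers following along, the cache size being varied in the
benchmark above is the per-sequence CACHE setting, e.g.:

    ALTER SEQUENCE some_seq CACHE 32;  -- example only: each session pre-allocates
                                       -- 32 values, so the sequence relation is
                                       -- touched, and flushed, far less often

with the usual trade-off that cached values can be discarded at session
exit, leaving gaps.)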

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: extended stats on partitioned tables

2022-01-14 Thread Justin Pryzby
On Mon, Dec 13, 2021 at 09:40:09PM +0100, Tomas Vondra wrote:
> 1) If the table is a separate relation (not part of an inheritance
> tree), this should make no difference. -> OK
> 
> 2) If the table is using "old" inheritance, this reverts back to
> pre-regression behavior. So people will keep using the old statistics
> until the ANALYZE, and we need to tell them to ANALYZE or something.
> 
> 3) If the table is using partitioning, it's guaranteed to be empty and
> there are no stats at all. Again, we should tell people to run ANALYZE.

I think these can be mentioned in the commit message, which can end up in the
minor release notes as a recommendation to rerun ANALYZE.

Thanks for pushing 0001.

-- 
Justin




Re: Support for NSS as a libpq TLS backend

2022-01-14 Thread Julien Rouhaud
Hi,

On Wed, Dec 15, 2021 at 11:10:14PM +0100, Daniel Gustafsson wrote:
> 
> I've attached a v50 which fixes the issues found by Joshua upthread, as well 
> as
> rebases on top of all the recent SSL and pgcrypto changes.

The cfbot reports that the patchset doesn't apply anymore:

http://cfbot.cputube.org/patch_36_3138.log
=== Applying patches on top of PostgreSQL commit ID 
74527c3e022d3ace648340b79a6ddec3419f6732 ===
[...]
=== applying patch ./v50-0010-nss-Build-infrastructure.patch
patching file configure
patching file configure.ac
Hunk #3 succeeded at 1566 (offset 1 line).
Hunk #4 succeeded at 2366 (offset 1 line).
Hunk #5 succeeded at 2379 (offset 1 line).
patching file src/backend/libpq/Makefile
patching file src/common/Makefile
patching file src/include/pg_config.h.in
Hunk #3 succeeded at 926 (offset 3 lines).
patching file src/interfaces/libpq/Makefile
patching file src/tools/msvc/Install.pm
Hunk #1 FAILED at 440.
1 out of 1 hunk FAILED -- saving rejects to file src/tools/msvc/Install.pm.rej

Could you send a rebased version, possibly with an updated configure as
reported by Joshua?  In the meantime I will switch the entry to Waiting on
Author.




Re: suboverflowed subtransactions concurrency performance optimize

2022-01-14 Thread Julien Rouhaud
Hi,

On Wed, Dec 08, 2021 at 04:39:11PM +, Simon Riggs wrote:
> 
> This patch shows where I'm going, with changes in GetSnapshotData()
> and XidInMVCCSnapshot() and XactLockTableWait().
> Passes make check, but needs much more, so this is review-only at this
> stage to give a flavour of what is intended.

Thanks a lot to everyone involved in this!

I can't find any entry in the commitfest for the work being done here.  Did I
miss something?  If not could you create an entry in the next commitfest to
make sure that it doesn't get forgotten?




Re: Add sub-transaction overflow status in pg_stat_activity

2022-01-14 Thread Julien Rouhaud
On Fri, Jan 14, 2022 at 07:42:29PM +, Bossart, Nathan wrote:
> On 1/14/22, 8:26 AM, "Tom Lane"  wrote:
> >
> > It feels to me like far too much effort is being invested in fundamentally
> > the wrong direction here.  If the subxact overflow business is causing
> > real-world performance problems, let's find a way to fix that, not put
> > effort into monitoring tools that do little to actually alleviate anyone's
> > pain.
> 
> +1

Agreed, it would be better but if that leads to significant work that doesn't
land in pg15, it would be nice to at least get more monitoring possibilities
in pg15 to help locate problems in applications.

> An easy first step might be to increase PGPROC_MAX_CACHED_SUBXIDS and
> NUM_SUBTRANS_BUFFERS.

There's already something proposed for slru sizing:
https://commitfest.postgresql.org/36/2627/.  Unfortunately it hasn't been
committed yet despite some popularity.  I also don't know how much it improves
workloads that hit the overflow issue.

> This wouldn't be a long-term solution to all
> such performance problems, and we'd still probably want the proposed
> monitoring tools, but maybe it'd kick the can down the road a bit.

Yeah, simply increasing PGPROC_MAX_CACHED_SUBXIDS won't really solve the
problem.  Also, the xid cache is already ~30% of the PGPROC size; increasing it
any further is likely to end up being a loss for everyone that doesn't heavily
rely on needing more than 64 subtransactions.
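
For anyone who wants to reproduce the overflow being discussed: it only
takes a single top-level transaction accumulating more than 64
subtransaction XIDs, for instance via PL/pgSQL exception blocks that
each perform a write.  A minimal sketch (table name just for
illustration):

    BEGIN;
    CREATE TEMP TABLE subxact_demo (i int);
    DO $$
    BEGIN
      FOR i IN 1..70 LOOP
        BEGIN                                    -- each EXCEPTION block opens a subtransaction
          INSERT INTO subxact_demo VALUES (i);   -- the write forces an XID assignment
        EXCEPTION WHEN OTHERS THEN NULL;
        END;
      END LOOP;
    END $$;
    -- PGPROC only caches 64 subxids, so this session is now "overflowed" until
    -- the transaction ends, pushing other backends to pg_subtrans lookups
    COMMIT;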

There's also something proposed at
https://www.postgresql.org/message-id/003201d79d7b$189141f0$49b3c5d0$@tju.edu.cn,
which seems to reach some nice improvement without a major redesign of the
subtransaction system, but I now realize that apparently no one added it to the
commitfest :(




Re: Column Filtering in Logical Replication

2022-01-14 Thread Amit Kapila
On Fri, Jan 14, 2022 at 7:08 PM Alvaro Herrera  wrote:
>
> On 2022-Jan-14, Amit Kapila wrote:
>
> > 1. Replica Identity handling: Currently the column filter patch gives
> > an error during create/alter subscription if the specified column list
> > is invalid (Replica Identity columns are missing). It also gives an
> > error if the user tries to change the replica identity. However, it
> > doesn't deal with cases where the user drops and adds a different
> > primary key that has a different set of columns which can lead to
> > failure during apply on the subscriber.
>
> Hmm, yeah, I suppose we should check that the primary key is compatible
> with the column list in all publications.  (I wonder what happens in the
> interim, that is, what happens to tuples modified after the initial PK
> is dropped and before the new PK is installed.  Are these considered to
> have "replica identiy nothing"?)
>

I think so.

> > I think another issue w.r.t column filter patch is that even while
> > creating publication (even for 'insert' publications) it should check
> > that all primary key columns must be part of published columns,
> > otherwise, it can fail while applying on subscriber as it will try to
> > insert NULL for the primary key column.
>
> I'm not so sure about the primary key aspects, actually; keep in mind
> that the replica can have a different table definition, and it might
> have even a completely different primary key.  I think this part is up
> to the user to set up correctly; we have enough with just trying to make
> the replica identity correct.
>

But OTOH, the primary key is also considered the default replica identity,
so I think users will expect it to work. You are right that this problem
can also happen if the user defines a different primary key on a
replica, but that is a problem even in HEAD (simple inserts will fail).
I am worried, though, about the case where both the publisher and
subscriber have the same primary key, as that works in HEAD.

-- 
With Regards,
Amit Kapila.




Re: Unifying VACUUM VERBOSE and log_autovacuum_min_duration output

2022-01-14 Thread Peter Geoghegan
On Thu, Jan 6, 2022 at 10:21 AM Peter Geoghegan  wrote:
> The patch bitrot again, so attached is a rebased version, v3.

I pushed a version of this patch earlier. This final version didn't go
quite as far as v3 did: it retained a few VACUUM VERBOSE only INFO
messages (it didn't demote them to DEBUG2). See the commit message for
details.

Thank you for your review work, Masahiko and Andres.
-- 
Peter Geoghegan




Re: Refactoring of compression options in pg_basebackup

2022-01-14 Thread Michael Paquier
On Fri, Jan 14, 2022 at 04:53:12PM -0500, Robert Haas wrote:
> On Thu, Jan 13, 2022 at 10:23 PM Michael Paquier  wrote:
>> Using --compression-level=NUMBER and --server-compress=METHOD to
>> specify a server-side compression method with a level is fine by me,
>> but I find the reuse of --compress to specify a compression method
>> confusing as it maps with the past option we have kept in
>> pg_basebackup for a couple of years now.  Based on your suggested set
>> of options, we could then have a --client-compress=METHOD and
>> --compression-level=NUMBER to specify a client-side compression method
>> with a level.  If we do that, I guess that we should then:
>> 1) Block the combination of --server-compress and --client-compress.
>> 2) Remove the existing -Z/--compress and -z/--gzip.
> 
> I could live with that. I'm not sure that --client-compress instead of
> reusing --compress is going to be better ... but I don't think it's
> awful so much as just not my first choice. I also don't think it would
> be horrid to leave -z, --gzip, and -Z as shorthands for the
> --client-compress=gzip with --compression-level also in the last case,
> instead of removing all that stuff.

Okay.  So, based on this feedback, I guess that something like the
attached would be what we are looking for.  I have maximized the
amount of code removed with the removal of -z/-Z,  but I won't fight
hard if the consensus is to keep them, either.  We could also keep
-z/--gzip, and stick -Z to the new --compression-level with
--compress removed.
--
Michael
From 1bbe624a4bea5e00a1997f81fe3b1ec1f637fa44 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sat, 15 Jan 2022 11:51:40 +0900
Subject: [PATCH v4] Rework the option set of pg_basebackup

---
 src/bin/pg_basebackup/pg_basebackup.c| 99 ++--
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 37 
 doc/src/sgml/ref/pg_basebackup.sgml  | 33 ---
 3 files changed, 105 insertions(+), 64 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index aa43fc0924..57b054a8ad 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -113,6 +113,7 @@ static bool showprogress = false;
 static bool estimatesize = true;
 static int	verbose = 0;
 static int	compresslevel = 0;
+static WalCompressionMethod client_method = COMPRESSION_NONE;
 static IncludeWal includewal = STREAM_WAL;
 static bool fastcheckpoint = false;
 static bool writerecoveryconf = false;
@@ -358,8 +359,10 @@ usage(void)
 	printf(_("  --waldir=WALDIRlocation for the write-ahead log directory\n"));
 	printf(_("  -X, --wal-method=none|fetch|stream\n"
 			 " include required WAL files with specified method\n"));
-	printf(_("  -z, --gzip compress tar output\n"));
-	printf(_("  -Z, --compress=0-9 compress tar output with given compression level\n"));
+	printf(_("  --client-compress=METHOD\n"
+			 " method to compress data (client-side)\n"));
+	printf(_("  --compression-level=1-9\n"
+			 " compress tar output with given compression level\n"));
 	printf(_("\nGeneral options:\n"));
 	printf(_("  -c, --checkpoint=fast|spread\n"
 			 " set fast or spread checkpointing\n"));
@@ -524,8 +527,7 @@ LogStreamerMain(logstreamer_param *param)
 	stream.do_sync);
 	else
 		stream.walmethod = CreateWalTarMethod(param->xlog,
-			  (compresslevel != 0) ?
-			  COMPRESSION_GZIP : COMPRESSION_NONE,
+			  client_method,
 			  compresslevel,
 			  stream.do_sync);
 
@@ -976,7 +978,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
 	 bool is_recovery_guc_supported,
 	 bool expect_unterminated_tarfile)
 {
-	bbstreamer *streamer;
+	bbstreamer *streamer = NULL;
 	bbstreamer *manifest_inject_streamer = NULL;
 	bool		inject_manifest;
 	bool		must_parse_archive;
@@ -1035,19 +1037,22 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
 			archive_file = NULL;
 		}
 
+		if (client_method == COMPRESSION_NONE)
+			streamer = bbstreamer_plain_writer_new(archive_filename,
+   archive_file);
 #ifdef HAVE_LIBZ
-		if (compresslevel != 0)
+		else if (client_method == COMPRESSION_GZIP)
 		{
 			strlcat(archive_filename, ".gz", sizeof(archive_filename));
 			streamer = bbstreamer_gzip_writer_new(archive_filename,
   archive_file,
   compresslevel);
 		}
-		else
 #endif
-			streamer = bbstreamer_plain_writer_new(archive_filename,
-   archive_file);
-
+		else
+		{
+			Assert(false);		/* not reachable */
+		}
 
 		/*
 		 * If we need to parse the archive for whatever reason, then we'll
@@ -1745,8 +1750,6 @@ main(int argc, char **argv)
 		{"slot", required_argument, NULL, 'S'},
 		{"tablespace-mapping", required_argument, NULL, 'T'},
 		{"wal-method", required_argument, NULL, 'X'},
-		{"gzip", 

Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-14 Thread Andres Freund
Hi,

On 2022-01-15 13:19:42 +1300, Thomas Munro wrote:
> On Sat, Jan 15, 2022 at 11:44 AM Andres Freund  wrote:
> > > The patch Alexander tested most recently uses a tri-state eof flag [...]
> >
> > What about instead giving WalReceiverConn an internal WaitEventSet, and 
> > using
> > that consistently? I've attached a draft for that.
> >
> > Alexander, could you test with that patch applied?
> 
> Isn't your patch nearly identical to one that I already posted, that
> Alexander tested and reported success with here?

Sorry, somehow I missed that across the many patches in the thread... And yes,
it does look remarkably similar.


> https://www.postgresql.org/message-id/5d507424-13ce-d19f-2f5d-ab4c6a987316%40gmail.com
> 
> I can believe that fixes walreceiver

One thing that still bothers me around this is that we didn't detect the
problem of the dead walreceiver connection, even after missing the
FD_CLOSE. There are plenty of other ways that a connection can get stalled for
prolonged periods that we'd not get notified about either. That's why there's
wal_receiver_timeout, after all.

But from what I can see wal_receiver_timeout doesn't work even halfway
reliably, because of all the calls down to libpqrcv_PQgetResult, where we just
block indefinitely?

This actually seems like a significant issue to me, and not just on windows.



> (if we're sure that there isn't a libpq-changes-the-socket problem)

I just don't see what that problem could be, once the connection is
established. The only way the socket fd could change is a reconnect, which
doesn't happen automatically.

I actually was searching the archives for threads on it, but I didn't find
much besides the references around [1]. And I didn't see a concrete risk
explained there?


> but AFAICS the same problem exists for postgres_fdw and async append.

Perhaps - but I suspect it'll matter far less with them than with walreceiver.



> That's why I moved to trying to
> fix the multiple-WES thing (though of course I agree we should be
> using long lived WESes wherever possible

That approach seems like a very very leaky bandaid, with a decent potential
for unintended consequences. Perhaps there's nothing better that we can do,
but I'd rather try to fix the problem closer to the root...


> I just didn't think that seemed back-patchable, so it's more of a feature
> patch for the future).

Hm, it doesn't seem crazy invasive to me. But I guess we might be looking at a
revert of the shutdown changes for now anyway? In that case we should be
fixing this anyway, but we might be able to afford doing it in master only?

Greetings,

Andres Freund

[1] 
https://www.postgresql.org/message-id/CA%2BhUKG%2BzCNJZBXcURPdQvdY-tjyD0y7Li2wZEC6XChyUej1S5w%40mail.gmail.com




Add last commit LSN to pg_last_committed_xact()

2022-01-14 Thread James Coleman
I'd recently been thinking about monitoring how many bytes behind a
logical slot is and realized that's not really possible to compute
currently. That's easy enough with a physical slot: we can get the
current WAL LSN, and the slot exposes its current LSN positions.
However, for logical slots that naive computation isn't quite right.
The logical slot can't flush past the last commit, so even if there are
100s of megabytes of unflushed WAL on the slot there may be zero lag (in
terms of what's possible to process).
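
To make that concrete, the kind of lag query I'd like to be able to
write is roughly the following, assuming the last commit's LSN were
exposed somewhere, say as a column of pg_last_committed_xact() as
proposed below (the column name here is illustrative only):

    SELECT slot_name,
           pg_wal_lsn_diff((SELECT lsn FROM pg_last_committed_xact()),
                           confirmed_flush_lsn) AS committed_bytes_behind
      FROM pg_replication_slots
     WHERE slot_type = 'logical';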

I've attached a simple patch (sans tests and documentation) to get
feedback early. After poking around this afternoon it seemed to me
that the simplest approach was to hook into the commit timestamps
infrastructure and store the commit's XLogRecPtr in the cache of the
most recent value (but of course don't write it out to disk). That the
downside of making this feature dependent on "track_commit_timestamps
= on", but that seems reasonable:

1. Getting the xid of the last commit is similarly dependent on commit
timestamps infrastructure.
2. It's a simple place to hook into and avoids new shared data and locking.

Thoughts?

Thanks,
James Coleman


v1-0001-Expose-LSN-of-last-commit-via-pg_last_committed_x.patch
Description: Binary data


Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-14 Thread Thomas Munro
On Sat, Jan 15, 2022 at 9:47 AM Andres Freund  wrote:
> Walreceiver only started using WES in
> 2016-03-29 [314cbfc5d] Add new replication mode synchronous_commit = 
> 'remote_ap
>
> With that came the following comment:
>
> /*
>  * Ideally we would reuse a WaitEventSet object repeatedly
>  * here to avoid the overheads of WaitLatchOrSocket on epoll
>  * systems, but we can't be sure that libpq (or any other
>  * walreceiver implementation) has the same socket (even if
>  * the fd is the same number, it may have been closed and
>  * reopened since the last time).  In future, if there is a
>  * function for removing sockets from WaitEventSet, then we
>  * could add and remove just the socket each time, potentially
>  * avoiding some system calls.
>  */
> Assert(wait_fd != PGINVALID_SOCKET);
> rc = WaitLatchOrSocket(MyLatch,
>WL_EXIT_ON_PM_DEATH | 
> WL_SOCKET_READABLE |
>WL_TIMEOUT | WL_LATCH_SET,
>wait_fd,
>NAPTIME_PER_CYCLE,
>WAIT_EVENT_WAL_RECEIVER_MAIN);
>
> I don't really see how libpq could have changed the socket underneath us, as
> long as we get it the first time after the connection successfully was
> established?  I mean, there's a running command that we're processing the
> result of?

Erm, I didn't analyse the situation much back then; I just knew that
libpq could reconnect in early phases.  I can see that once you reach
that stage you can count on socket stability, though, so yeah: as long
as you handle the earlier connection phase correctly (probably using
the other patch I posted and Alexander tested), it should all work
nicely.  You'd probably want to formalise the interface/documentation
on that point.

> Nor do I understand what "any other walreceiver implementation"
> refers to?

I think I meant that it goes via function pointers to talk to
libpqwalreceiver.c, but I know now that we don't actually support
using that to switch to different code, it's just a solution to a
backend/frontend linking problem.  The comment was probably just
paranoia based on the way the interface works.




Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-14 Thread Thomas Munro
On Sat, Jan 15, 2022 at 11:44 AM Andres Freund  wrote:
> > The patch Alexander tested most recently uses a tri-state eof flag [...]
>
> What about instead giving WalReceiverConn an internal WaitEventSet, and using
> that consistently? I've attached a draft for that.
>
> Alexander, could you test with that patch applied?

Isn't your patch nearly identical to one that I already posted, that
Alexander tested and reported success with here?

https://www.postgresql.org/message-id/5d507424-13ce-d19f-2f5d-ab4c6a987316%40gmail.com

I can believe that fixes walreceiver (if we're sure that there isn't a
libpq-changes-the-socket problem), but AFAICS the same problem exists
for postgres_fdw and async append.  That's why I moved to trying to
fix the multiple-WES thing (though of course I agree we should be
using long lived WESes wherever possible, I just didn't think that
seemed back-patchable, so it's more of a feature patch for the
future).




Re: Adding CI to our tree

2022-01-14 Thread Justin Pryzby
On Fri, Jan 14, 2022 at 03:34:11PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2022-01-13 15:27:40 -0500, Andrew Dunstan wrote:
> > I can probably adjust to whatever we decide to do. But I think we're
> > really just tinkering at the edges here. What I think we really need is
> > the moral equivalent of `make check-world` in one invocation of
> > vcregress.pl.
> 
> I agree strongly that we need that. But I think a good chunk of Justin's
> changes are actually required to get there?
> 
> Specifically, unless we want lots of duplicated logic in vcregress.pl, we
> need to make vcregress know how to run NO_INSTALLCHECK test. The option added
> was just so the buildfarm doesn't start to run tests multiple times...

The main reason I made the INSTALLCHECK runs conditional (they only run if a
new option is specified) is because of these comments:

| # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
| # which typical installcheck users do not have (e.g. buildfarm clients).
| NO_INSTALLCHECK = 1

Also, I saw that you saw that Thomas discovered/pointed out that a bunch of TAP
tests aren't being run by CI.   I think vcregress should have an "alltap"
target that runs everything like glob("**/t/").  CI would use that instead of
the existing ssl, auth, subscription, recovery, and bin targets.  The buildfarm
could switch to that after it's been published.

https://www.postgresql.org/message-id/20220114234947.av4kkhuj7netsy5r%40alap3.anarazel.de

-- 
Justin




Re: A test for replay of regression tests

2022-01-14 Thread Andres Freund
Hi,

On 2022-01-15 02:32:35 +1300, Thomas Munro wrote:
> 1. The way I invoke pg_regress doesn't seem to work correctly under
> the build farm client (though it works fine under make), not sure why
> yet, but reproduced here and will figure it out tomorrow.

I think it's just a problem of the buildfarm specifying port names in
extra_opts. E.g.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=eelpout=2022-01-14%2011%3A49%3A36
has

# Checking port 58074
# Found port 58074
Name: primary
...
# Running: 
/home/tmunro/build-farm/buildroot/HEAD/pgsql.build/src/test/recovery/../../../src/test/regress/pg_regress
 --dlpath="/home/tmunro/build-farm/buildroot/HEAD/pgsql.build/src/test/regress" 
--bindir= --port=58074 --schedule=../regress/parallel_schedule 
--max-concurrent-tests=20 --inputdir=../regress 
--outputdir="/home/tmunro/build-farm/buildroot/HEAD/pgsql.build/src/test/recovery"
 --port=5678
(using postmaster on /tmp/1W6qVPVyCv, port 5678)

Note how there's both --port=58074 and --port=5678 in the pg_regress
invocation. The latter comming from EXTRA_REGRESS_OPTS, which the buildfarm
client sets.

The quickest fix would probably be to just move the --port added by
027_stream_regress.pl until after $extra_opts?


> 2. The new test in src/test/modules/t/002_tablespace.pl apparently has
> some path-related problem on Windows

This is the failure you're talking about?

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2022-01-14%2012%3A04%3A55


> that I didn't know about, because CI isn't even running the TAP tests under
> src/test/module/test_misc (and various other locations), but the BF is :-/
> And I was happy because modulescheck was passing...

This we need to fix... But if you're talking about fairywren's failure, it's
more than not running some tests, it's that we do not test windows mingw
outside of cross compilation.

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-01-14 Thread Andres Freund
Hi,

On 2022-01-13 15:27:40 -0500, Andrew Dunstan wrote:
> I can probably adjust to whatever we decide to do. But I think we're
> really just tinkering at the edges here. What I think we really need is
> the moral equivalent of `make check-world` in one invocation of
> vcregress.pl.

I agree strongly that we need that. But I think a good chunk of Justin's
changes are actually required to get there?

Specifically, unless we want lots of duplicated logic in vcregress.pl, we
need to make vcregress know how to run NO_INSTALLCHECK test. The option added
was just so the buildfarm doesn't start to run tests multiple times...

Greetings,

Andres Freund




Re: Windows: Wrong error message at connection termination

2022-01-14 Thread Sergey Shinderuk

On 14.01.2022 13:01, Sergey Shinderuk wrote:

Unexpectedly, this changes the error message:

 postgres=# set idle_session_timeout = '1s';
 SET
 postgres=# select 1;
 could not receive data from server: Software caused connection 
abort (0x2745/10053)


For the record, after more poking I realized that it depends on timing. 
 By injecting delays I can get any of the following from libpq:


* could not receive data from server: Software caused connection abort
* server closed the connection unexpectedly
* no connection to the server



Should we handle ECONNABORTED similarly to ECONNRESET in pqsecure_raw_read?


So this doesn't make sense anymore.

Sorry for the noise.

--
Sergey Shinderuk    https://postgrespro.com/




Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-14 Thread Tom Lane
Thomas Munro  writes:
> (That patch is assuming that we're looking for something simple to
> back-patch, in the event that we decide not to just revert the
> graceful shutdown patch from back branches.  Reverting might be a
> better idea for now, and then we could fix all this stuff going
> forward.)

FWIW, I'm just fine with reverting, particularly in the back branches.
It seems clear that this dank corner of Windows contains even more
creepy-crawlies than we thought.

regards, tom lane




Re: tab completion of enum values is broken

2022-01-14 Thread Tom Lane
I wrote:
> Oh ... experimenting on macOS (with the system-provided libedit)
> shows no bug here.  So I guess we'll need to make this conditional
> somehow, perhaps on USE_FILENAME_QUOTING_FUNCTIONS.  That's another
> reason for not going overboard.

After further fooling with that, I concluded that the only workable
solution is a run-time check for whether the readline library included
the leading quote in what it hands us.  A big advantage of doing it this
way is that it mostly fixes my complaint #1: by crafting the check
properly, we will include quotes if the user hits TAB without having typed
anything, and we won't complete an incorrectly non-quoted identifier.
There's still nothing to be done about single quotes inside an enum
label, but I'm okay with blowing that case off.

So I think the attached is committable.  I've tried it on readline
7.0 (RHEL8) as well as whatever libedit Apple is currently shipping.

regards, tom lane

diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl
index ba98b8190b..d3d1bd650e 100644
--- a/src/bin/psql/t/010_tab_completion.pl
+++ b/src/bin/psql/t/010_tab_completion.pl
@@ -42,7 +42,8 @@ $node->start;
 $node->safe_psql('postgres',
 	"CREATE TABLE tab1 (f1 int, f2 text);\n"
 	  . "CREATE TABLE mytab123 (f1 int, f2 text);\n"
-	  . "CREATE TABLE mytab246 (f1 int, f2 text);\n");
+	  . "CREATE TABLE mytab246 (f1 int, f2 text);\n"
+	  . "CREATE TYPE enum1 AS ENUM ('foo', 'bar', 'baz');\n");
 
 # Developers would not appreciate this test adding a bunch of junk to
 # their ~/.psql_history, so be sure to redirect history into a temp file.
@@ -223,6 +224,16 @@ check_completion(
 
 clear_line();
 
+# check enum label completion
+# some versions of readline/libedit require two tabs here, some only need one
+# also, some versions will offer quotes, some will not
+check_completion(
+	"ALTER TYPE enum1 RENAME VALUE 'ba\t\t",
+	qr|'?bar'? +'?baz'?|,
+	"offer multiple enum choices");
+
+clear_line();
+
 # send psql an explicit \q to shut it down, else pty won't close properly
 $timer->start(5);
 $in .= "\\q\n";
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 39be6f556a..f7ce90da6e 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -278,25 +278,40 @@ do { \
 	matches = rl_completion_matches(text, complete_from_query); \
 } while (0)
 
+/*
+ * libedit will typically include the literal's leading single quote in
+ * "text", while readline will not.  Adapt our offered strings to fit.
+ * If we don't yet have text, include quotes in what's offered, to get the
+ * user off to the right start.
+ */
 #define COMPLETE_WITH_ENUM_VALUE(type) \
 do { \
 	char   *_completion_schema; \
 	char   *_completion_type; \
+	bool	use_quotes; \
 \
 	_completion_schema = strtokx(type, " \t\n\r", ".", "\"", 0, \
  false, false, pset.encoding); \
 	(void) strtokx(NULL, " \t\n\r", ".", "\"", 0, \
    false, false, pset.encoding); \
 	_completion_type = strtokx(NULL, " \t\n\r", ".", "\"", 0, \
-			   false, false, pset.encoding);  \
-	if (_completion_type == NULL)\
+			   false, false, pset.encoding); \
+	use_quotes = (text[0] == '\'' || \
+  start == 0 || rl_line_buffer[start - 1] != '\''); \
+	if (_completion_type == NULL) \
 	{ \
-		completion_charp = Query_for_list_of_enum_values; \
+		if (use_quotes) \
+			completion_charp = Query_for_list_of_enum_values_quoted; \
+		else \
+			completion_charp = Query_for_list_of_enum_values_unquoted; \
 		completion_info_charp = type; \
 	} \
 	else \
 	{ \
-		completion_charp = Query_for_list_of_enum_values_with_schema; \
+		if (use_quotes) \
+			completion_charp = Query_for_list_of_enum_values_with_schema_quoted; \
+		else \
+			completion_charp = Query_for_list_of_enum_values_with_schema_unquoted; \
 		completion_info_charp = _completion_type; \
 		completion_info_charp2 = _completion_schema; \
 	} \
@@ -678,7 +693,7 @@ static const SchemaQuery Query_for_list_of_collations = {
 "   AND (pg_catalog.quote_ident(nspname)='%s' "\
 "OR '\"' || nspname || '\"' ='%s') "
 
-#define Query_for_list_of_enum_values \
+#define Query_for_list_of_enum_values_quoted \
 "SELECT pg_catalog.quote_literal(enumlabel) "\
 "  FROM pg_catalog.pg_enum e, pg_catalog.pg_type t "\
 " WHERE t.oid = e.enumtypid "\
@@ -687,7 +702,16 @@ static const SchemaQuery Query_for_list_of_collations = {
 "OR '\"' || typname || '\"'='%s') "\
 "   AND pg_catalog.pg_type_is_visible(t.oid)"
 
-#define Query_for_list_of_enum_values_with_schema \
+#define Query_for_list_of_enum_values_unquoted \
+"SELECT enumlabel "\
+"  FROM pg_catalog.pg_enum e, pg_catalog.pg_type t "\
+" WHERE t.oid = e.enumtypid "\
+"   AND substring(enumlabel,1,%d)='%s' "\
+"   AND (pg_catalog.quote_ident(typname)='%s' "\
+"OR '\"' || typname || '\"'='%s') "\
+"   AND pg_catalog.pg_type_is_visible(t.oid)"
+
+#define Query_for_list_of_enum_values_with_schema_quoted \
 "SELECT 

Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-14 Thread Andres Freund
Hi,

On 2022-01-15 10:59:00 +1300, Thomas Munro wrote:
> On Sat, Jan 15, 2022 at 9:28 AM Andres Freund  wrote:
> > I think it doesn't even need to touch socket.c to cause breakage. Using two
> > different WaitEventSets is enough.
>
> Right.  I was interested in your observation because so far we'd
> *only* been considering the two-consecutive-WaitEventSets case, which
> we could grok experimentally.

There likely are further problems in other parts, but I think socket.c is
unlikely to be involved in the walreceiver case - there shouldn't be any
socket.c-style socket in walreceiver itself, nor do I think we are doing a
send/recv/select backed by socket.c.


> The patch Alexander tested most recently uses a tri-state eof flag [...]

What about instead giving WalReceiverConn an internal WaitEventSet, and using
that consistently? I've attached a draft for that.

Alexander, could you test with that patch applied?

Greetings,

Andres Freund
>From 8139d324687364164c9abbd3e2b2a54b5d5bbcad Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 14 Jan 2022 14:39:57 -0800
Subject: [PATCH] WIP: use long-lived WaitEventSet for libpqwalreceiver
 connections.

---
 src/include/replication/walreceiver.h |  29 +++--
 .../libpqwalreceiver/libpqwalreceiver.c   | 112 --
 src/backend/replication/logical/tablesync.c   |  11 +-
 src/backend/replication/logical/worker.c  |  13 +-
 src/backend/replication/walreceiver.c |  26 +---
 5 files changed, 112 insertions(+), 79 deletions(-)

diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 92f73a55b8d..07a16470d48 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -253,6 +253,17 @@ typedef void (*walrcv_check_conninfo_fn) (const char *conninfo);
  */
 typedef char *(*walrcv_get_conninfo_fn) (WalReceiverConn *conn);
 
+/*
+ * walrcv_wait_fn
+ *
+ * Waits for socket to become readable, or a latch interrupt.
+ *
+ * timeout is passed on to WaitEventSetWait(), return value is
+ * like WaitLatchOrSocket's.
+ */
+typedef int (*walrcv_wait_fn) (WalReceiverConn *conn, long timeout,
+			   uint32 wait_event_info);
+
 /*
  * walrcv_get_senderinfo_fn
  *
@@ -317,14 +328,13 @@ typedef void (*walrcv_endstreaming_fn) (WalReceiverConn *conn,
 /*
  * walrcv_receive_fn
  *
- * Receive a message available from the WAL stream.  'buffer' is a pointer
- * to a buffer holding the message received.  Returns the length of the data,
- * 0 if no data is available yet ('wait_fd' is a socket descriptor which can
- * be waited on before a retry), and -1 if the cluster ended the COPY.
+ * Receive a message available from the WAL stream.  'buffer' is a pointer to
+ * a buffer holding the message received.  Returns the length of the data, 0
+ * if no data is available yet (see walrcv_wait()), and -1 if the cluster
+ * ended the COPY.
  */
 typedef int (*walrcv_receive_fn) (WalReceiverConn *conn,
-  char **buffer,
-  pgsocket *wait_fd);
+  char **buffer);
 
 /*
  * walrcv_send_fn
@@ -385,6 +395,7 @@ typedef struct WalReceiverFunctionsType
 	walrcv_connect_fn walrcv_connect;
 	walrcv_check_conninfo_fn walrcv_check_conninfo;
 	walrcv_get_conninfo_fn walrcv_get_conninfo;
+	walrcv_wait_fn walrcv_wait;
 	walrcv_get_senderinfo_fn walrcv_get_senderinfo;
 	walrcv_identify_system_fn walrcv_identify_system;
 	walrcv_server_version_fn walrcv_server_version;
@@ -407,6 +418,8 @@ extern PGDLLIMPORT WalReceiverFunctionsType *WalReceiverFunctions;
 	WalReceiverFunctions->walrcv_check_conninfo(conninfo)
 #define walrcv_get_conninfo(conn) \
 	WalReceiverFunctions->walrcv_get_conninfo(conn)
+#define walrcv_wait(conn, timeout, wait_event_info) \
+	WalReceiverFunctions->walrcv_wait(conn, timeout, wait_event_info)
 #define walrcv_get_senderinfo(conn, sender_host, sender_port) \
 	WalReceiverFunctions->walrcv_get_senderinfo(conn, sender_host, sender_port)
 #define walrcv_identify_system(conn, primary_tli) \
@@ -419,8 +432,8 @@ extern PGDLLIMPORT WalReceiverFunctionsType *WalReceiverFunctions;
 	WalReceiverFunctions->walrcv_startstreaming(conn, options)
 #define walrcv_endstreaming(conn, next_tli) \
 	WalReceiverFunctions->walrcv_endstreaming(conn, next_tli)
-#define walrcv_receive(conn, buffer, wait_fd) \
-	WalReceiverFunctions->walrcv_receive(conn, buffer, wait_fd)
+#define walrcv_receive(conn, buffer) \
+	WalReceiverFunctions->walrcv_receive(conn, buffer)
 #define walrcv_send(conn, buffer, nbytes) \
 	WalReceiverFunctions->walrcv_send(conn, buffer, nbytes)
 #define walrcv_create_slot(conn, slotname, temporary, two_phase, snapshot_action, lsn) \
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 0d89db4e6a6..24f062f779c 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -46,6 +46,8 @@ struct WalReceiverConn

Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-14 Thread Thomas Munro
On Sat, Jan 15, 2022 at 10:59 AM Thomas Munro  wrote:
> (That patch is assuming that we're looking for something simple to
> back-patch, in the event that we decide not to just revert the
> graceful shutdown patch from back branches.  Reverting might be a
> better idea for now, and then we could fix all this stuff going
> forward.)

(Though, as mentioned already, reverting isn't really enough either,
because other OSes that Windows might be talking to use lingering
sockets...  and there may still be ways for this to break...)




Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-14 Thread Thomas Munro
On Sat, Jan 15, 2022 at 9:28 AM Andres Freund  wrote:
> I think it doesn't even need to touch socket.c to cause breakage. Using two
> different WaitEventSets is enough.

Right.  I was interested in your observation because so far we'd
*only* been considering the two-consecutive-WaitEventSets case, which
we could grok experimentally.  The patch Alexander tested most
recently uses a tri-state eof flag, so (1) the WES event starts out in
"unknown" state and polls with WSAPoll() to figure out whether the
socket was already closed when it wasn't looking, and then (2) it
switches to believing that we'll definitely get an FD_CLOSE so we
don't need to make the extra call on every wait.  That does seem to do
the job, but if there are *other* places that can eat FD_CLOSE event
once we've switched to believing that it will come, that might be
fatal to the second part of that idea, and we might have to assume
"unknown" all the time, which would be somewhat similar to the way we
do a dummy WSASend() every time when waiting for WRITEABLE.

(That patch is assuming that we're looking for something simple to
back-patch, in the event that we decide not to just revert the
graceful shutdown patch from back branches.  Reverting might be a
better idea for now, and then we could fix all this stuff going
forward.)

> > Issuing a WSAEventSelect for a socket cancels any previous WSAAsyncSelect or
> > WSAEventSelect for the same socket and clears the internal network event
> > record.
>
> Note the bit about clearing the internal network event record. Which seems to
> pretty precisely explain why we're losing FD_CLOSEs?

Indeed.

> And it does also explain why this is more likely after the shutdown changes:
> It's more likely the network stack knows it has readable data *and* that the
> connection closed. Which is recorded in the "internal network event
> record". But once all the data is read, walsender.c will do another
> WaitLatchOrSocket(), which does WSAEventSelect(), clearing the "internal event
> record" and loosing the FD_CLOSE.

Yeah.




Re: Refactoring of compression options in pg_basebackup

2022-01-14 Thread Robert Haas
On Thu, Jan 13, 2022 at 10:23 PM Michael Paquier  wrote:
> Using --compression-level=NUMBER and --server-compress=METHOD to
> specify a server-side compression method with a level is fine by me,
> but I find the reuse of --compress to specify a compression method
> confusing as it maps with the past option we have kept in
> pg_basebackup for a couple of years now.  Based on your suggested set
> of options, we could then have a --client-compress=METHOD and
> --compression-level=NUMBER to specify a client-side compression method
> with a level.  If we do that, I guess that we should then:
> 1) Block the combination of --server-compress and --client-compress.
> 2) Remove the existing -Z/--compress and -z/--gzip.

I could live with that. I'm not sure that --client-compress instead of
reusing --compress is going to be better ... but I don't think it's
awful so much as just not my first choice. I also don't think it would
be horrid to leave -z, --gzip, and -Z as shorthands for
--client-compress=gzip, with -Z also setting --compression-level,
instead of removing all that stuff.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-14 Thread Andres Freund
Hi,

On 2022-01-14 12:28:48 -0800, Andres Freund wrote:
> But once all the data is read, walsender.c will do another
> WaitLatchOrSocket(), which does WSAEventSelect(), clearing the "internal event
> record" and loosing the FD_CLOSE.

Walreceiver only started using WES in
2016-03-29 [314cbfc5d] Add new replication mode synchronous_commit = 'remote_ap

With that came the following comment:

/*
 * Ideally we would reuse a WaitEventSet object repeatedly
 * here to avoid the overheads of WaitLatchOrSocket on epoll
 * systems, but we can't be sure that libpq (or any other
 * walreceiver implementation) has the same socket (even if
 * the fd is the same number, it may have been closed and
 * reopened since the last time).  In future, if there is a
 * function for removing sockets from WaitEventSet, then we
 * could add and remove just the socket each time, potentially
 * avoiding some system calls.
 */
Assert(wait_fd != PGINVALID_SOCKET);
rc = WaitLatchOrSocket(MyLatch,
                       WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
                       WL_TIMEOUT | WL_LATCH_SET,
                       wait_fd,
                       NAPTIME_PER_CYCLE,
                       WAIT_EVENT_WAL_RECEIVER_MAIN);

I don't really see how libpq could have changed the socket underneath us, as
long as we get it the first time after the connection was successfully
established?  I mean, there's a running command that we're processing the
result of?  Nor do I understand what "any other walreceiver implementation"
refers to?

Thomas, I think you wrote that?


Greetings,

Andres Freund




Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-14 Thread Andres Freund
Hi,

On 2022-01-14 20:31:22 +1300, Thomas Munro wrote:
> Andres and I chatted about this stuff off list and he pointed out
> something else about the wrappers in socket.c: there are more paths in
> there that work with socket events, which means more ways to lose the
> precious FD_CLOSE event.

I think it doesn't even need to touch socket.c to cause breakage. Using two
different WaitEventSets is enough.

https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaeventselect
says:

> The FD_CLOSE network event is recorded when a close indication is received
> for the virtual circuit corresponding to the socket. In TCP terms, this
> means that the FD_CLOSE is recorded when the connection goes into the TIME
> WAIT or CLOSE WAIT states. This results from the remote end performing a
> shutdown on the send side or a closesocket. FD_CLOSE being posted after all
> data is read from a socket

So FD_CLOSE is *recorded* internally when the connection is closed. But only
posted to the visible event once all data is read. All good so far. But
combine that with:

> Issuing a WSAEventSelect for a socket cancels any previous WSAAsyncSelect or
> WSAEventSelect for the same socket and clears the internal network event
> record.

Note the bit about clearing the internal network event record. Which seems to
pretty precisely explain why we're losing FD_CLOSEs?

And it does also explain why this is more likely after the shutdown changes:
It's more likely the network stack knows it has readable data *and* that the
connection closed. Which is recorded in the "internal network event
record". But once all the data is read, walsender.c will do another
WaitLatchOrSocket(), which does WSAEventSelect(), clearing the "internal event
record" and loosing the FD_CLOSE.


My first inclination was that we ought to wrap the socket created for windows
in pgwin32_socket() in a custom type with some additional data - including
information about already received events, an EVENT, etc. I think that might
help to remove a lot of the messy workarounds we have in socket.c etc. But: It
wouldn't do much good here, because the socket is not a socket created by
socket.c but by libpq :(.


Greetings,

Andres Freund




Re: Add sub-transaction overflow status in pg_stat_activity

2022-01-14 Thread Bossart, Nathan
On 1/14/22, 8:26 AM, "Tom Lane"  wrote:
> Julien Rouhaud  writes:
>> Like many I previously had to investigate a slowdown due to sub-transaction
>> overflow, and even with the information available in a monitoring view (I had
>> to rely on a quick hackish extension as I couldn't patch postgres) it's not
>> terribly fun to do this way.  On top of that log analyzers like pgBadger 
>> could
>> help to highlight such a problem.
>
> It feels to me like far too much effort is being invested in fundamentally
> the wrong direction here.  If the subxact overflow business is causing
> real-world performance problems, let's find a way to fix that, not put
> effort into monitoring tools that do little to actually alleviate anyone's
> pain.

+1

An easy first step might be to increase PGPROC_MAX_CACHED_SUBXIDS and
NUM_SUBTRANS_BUFFERS.  This wouldn't be a long-term solution to all
such performance problems, and we'd still probably want the proposed
monitoring tools, but maybe it'd kick the can down the road a bit.
Perhaps another improvement could be to store the topmost transaction
along with the parent transaction in the subtransaction log to avoid
the loop in SubTransGetTopmostTransaction().
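To make that idea concrete, here is a toy model in C (illustration only; this
is not the actual subtrans.c code, and all names here are made up) comparing
the current parent-chain walk with a cached topmost-XID lookup:

#include <assert.h>

#define InvalidXid 0
#define MAX_XID    16

static unsigned int parent[MAX_XID];   /* stand-in for pg_subtrans: immediate parent */
static unsigned int topmost[MAX_XID];  /* proposed extra cache: topmost ancestor */

/* Today's approach: one lookup per nesting level. */
static unsigned int
topmost_by_walking(unsigned int xid)
{
    while (parent[xid] != InvalidXid)
        xid = parent[xid];
    return xid;
}

/* Proposed approach: a single lookup, no walk. */
static unsigned int
topmost_by_lookup(unsigned int xid)
{
    return topmost[xid];
}

int
main(void)
{
    /* xid 5 is a subxact of 4, which is a subxact of top-level xid 3 */
    parent[4] = 3;
    parent[5] = 4;
    topmost[3] = 3;
    topmost[4] = 3;
    topmost[5] = 3;

    assert(topmost_by_walking(5) == 3);
    assert(topmost_by_lookup(5) == 3);
    return 0;
}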

Nathan



Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-14 Thread Pavel Borisov
>
> We intend to do the following work on the patch soon:
> 1. Write a detailed README
> 2. Split the patch into several pieces including a separate part for
> XID_FMT. But if committers meanwhile choose to commit Jim's XID_FMT patch
> we also appreciate this and will rebase our patch accordingly.
>  2A. Probably refactor it to store precalculated XMIN/XMAX in memory
> tuple representation instead of t_xid_base/t_multi_base
>  2B. Split the in-memory part of a patch as a separate
> 3. Construct some variants for leaving "double xmax" format as a temporary
> one just after upgrade for having only one persistent on-disk format
> instead of two.
>  3A. By using SQL function "vacuum doublexmax;"
> OR
>  3B. By freeing space on all heap pages for pd_special before
> pg-upgrade.
> OR
>  3C. By automatically repacking all "double xmax" pages after upgrade
> (with a priority specified by common vacuum-related GUCs)
> 4. Intentionally prohibit starting a new transaction with XID difference
> of more than 2^32 from the oldest currently running one. This is to enforce
> some dba's action for cleaning defunct transaction but not binding one:
> he/she can wait if they consider these old transactions not defunct.
> 5. Investigate and add a solution for archs without 64-bit atomic values.
>5A. Provide XID 8-byte alignment for systems where 64-bit atomics
> is provided for 8-byte aligned values.
>5B. Wrap XID reading into PG atomic locks for remaining 32-bit ones
> (they are expected to be rare).
>

Hi, hackers!

PFA a patch with a README for the 64xid proposal. It is the 0003 patch of the
same v6 that was proposed earlier [1].
As always, I very much appreciate your ideas on this README patch, on the
overall 64xid patch [1], and on the roadmap for its improvement quoted above.

 [1]
https://www.postgresql.org/message-id/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


v6-0003-README-for-64bit-xid.patch
Description: Binary data


Re: Parallelize correlated subqueries that execute within each worker

2022-01-14 Thread James Coleman
On Mon, Nov 15, 2021 at 10:01 AM Robert Haas  wrote:
>
> On Wed, Nov 3, 2021 at 1:34 PM James Coleman  wrote:
> > As I understand the current code, parallel plans are largely chosen
> > based not on where it's safe to insert a Gather node but rather by
> > determining if a given path is parallel safe. Through that lens params
> > are a bit of an odd man out -- they aren't inherently unsafe in the
> > way a parallel-unsafe function is, but they can only be used in
> > parallel plans under certain conditions (whether because of project
> > policy, performance, or missing infrastructure).
>
> Right.
>
> > Introducing consider_parallel_rechecking_params and
> > parallel_safe_ignoring_params allows us to keep more context on params
> > and make a more nuanced decision at the proper level of the plan. This
> > is what I mean by "rechecked in the using context", though I realize
> > now that both "recheck" and "context" are overloaded terms in the
> > project, so don't describe the concept particularly clearly. When a
> > path relies on params we can only make a final determination about its
> > parallel safety if we know whether or not the current parallel node
> > can provide the param's value. We don't necessarily know that
> > information until we attempt to generate a full parallel node in the
> > plan (I think what you're describing as "inserting a Gather node")
> > since the param may come from another node in the plan. These new
> > values allow us to do that by tracking tentatively parallel-safe
> > subplans (given proper Gather node placement) and delaying the
> > parallel-safety determination until the point at which a param is
> > available (or not).
>
> So I think I agree with you here. But I don't like all of this
> "ignoring_params" stuff and I don't see why it's necessary. Say we
> don't have both parallel_safe and parallel_safe_ignoring_params. Say
> we just have parallel_safe. If the plan will be parallel safe if the
> params are available, we label it parallel safe. If the plan will not
> be parallel safe even if the params are available, we say it's not
> parallel safe. Then, when we get to generate_gather_paths(), we don't
> generate any paths if there are required parameters that are not
> available. What's wrong with that approach?
>
> Maybe it's clearer to say this: I feel like one extra Boolean is
> either too much or too little. I think maybe it's not even needed. But
> if it is needed, then why just a bool instead of, say, a Bitmapset of
> params that are needed, or something?
>
> I'm sort of speaking from intuition here rather than sure knowledge. I
> might be totally wrong.

Apologies for quite the delay responding to this.

I've been chewing on this a bit, and I was about to go re-read the
code and see how easy it'd be to do exactly what you're suggesting in
generate_gather_paths() (and verifying it doesn't need to happen in
other places). However there's one (I think large) gotcha with that
approach (assuming it otherwise makes sense): it means we do
unnecessary work. In the current patch series we only need to recheck
parallel safety if we're in a situation where we might actually
benefit from doing that work (namely when we have a correlated
subquery we might otherwise be able to execute in a parallel plan). If
we don't track that status we'd have to recheck the full parallel
safety of the path for all paths -- even without correlated
subqueries.

Alternatively we could merge these fields into a single enum field
that tracked these states. Even better, we could use a bitmap to
signify what items are/aren't parallel safe. I'm not sure if that'd
create even larger churn in the patch, but maybe it's worth it either
way. In theory it'd open up further expansions to this concept later
(though I'm not aware of any such ideas).

If you think such an approach would be an improvement I'd be happy to
take a pass at a revised patch.

Thoughts?

Thanks,
James Coleman




Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-14 Thread Bossart, Nathan
On 1/14/22, 3:43 AM, "Maxim Orlov"  wrote:
> The code seems to be in good condition. All the tests are running ok with no 
> errors.

Thanks for your review.

> I like the whole idea of shifting additional checkpointer jobs as much as 
> possible to another worker. In my view, it is more appropriate to call this 
> worker "bg cleaner" or "bg file cleaner" or smth.
>
> It could be useful for systems with high load, which may deal with deleting 
> many files at once, but I'm not sure about "small" installations. Extra bg 
> worker need more resources to do occasional deletion of small amounts of 
> files. I really do not know how to do it better, maybe to have two different 
> code paths switched by GUC?

I'd personally like to avoid creating two code paths for the same
thing.  Are there really cases when this one extra auxiliary process
would be too many?  And if so, how would a user know when to adjust
this GUC?  I understand the point that we should introduce new
processes sparingly to avoid burdening low-end systems, but I don't
think we should be afraid to add new ones when it is needed.

That being said, if making the extra worker optional addresses the
concerns about resource usage, maybe we should consider it.  Justin
suggested using something like max_parallel_maintenance_workers
upthread [0].

> Should we also think about adding WAL preallocation into custodian worker 
> from the patch "Pre-alocationg WAL files" [1] ?

This was brought up in the pre-allocation thread [1].  I don't think
the custodian process would be the right place for it, and I'm also
not as concerned about it because it will generally be a small, fixed,
and configurable amount of work.  In any case, I don't sense a ton of
support for a new auxiliary process in this thread, so I'm hesitant to
go down the same path for pre-allocation.

Nathan

[0] https://postgr.es/m/20211213171935.GX17618%40telsasoft.com
[1] https://postgr.es/m/B2ACCC5A-F9F2-41D9-AC3B-251362A0A254%40amazon.com



Re: Parallelize correlated subqueries that execute within each worker

2022-01-14 Thread James Coleman
On Fri, Dec 3, 2021 at 2:35 AM Michael Paquier  wrote:
>
> On Mon, Nov 15, 2021 at 10:01:37AM -0500, Robert Haas wrote:
> > On Wed, Nov 3, 2021 at 1:34 PM James Coleman  wrote:
> >> As I understand the current code, parallel plans are largely chosen
> >> based not on where it's safe to insert a Gather node but rather by
> >> determining if a given path is parallel safe. Through that lens params
> >> are a bit of an odd man out -- they aren't inherently unsafe in the
> >> way a parallel-unsafe function is, but they can only be used in
> >> parallel plans under certain conditions (whether because of project
> >> policy, performance, or missing infrastructure).
> >
> > Right.
>
> Please note that the CF bot is complaining here, so I have moved this
> patch to the next CF, but changed the status as waiting on author.

I rebased this back in December, but somehow forgot to reply with the
updated patch, so, here it is finally.

Thanks,
James Coleman


v4-0001-Allow-parallel-LATERAL-subqueries-with-LIMIT-OFFS.patch
Description: Binary data


v4-0002-Parallel-query-support-for-basic-correlated-subqu.patch
Description: Binary data


v4-0003-Other-places-to-consider-for-completeness.patch
Description: Binary data


Re: Pluggable toaster

2022-01-14 Thread Teodor Sigaev

I'd like to ask your opinion on the following small questions

1) Maybe we could add a toasteroid column to pg_type to set the default
toaster for a datatype:  ALTER TYPE type SET DEFAULT TOASTER toaster;


2) The name of the default toaster is deftoaster, which was chosen late at
night; maybe heap_toaster is better? "heap" because the default toaster
stores its chunks in a heap table.


Thank you!

--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/




Re: Pluggable toaster

2022-01-14 Thread Teodor Sigaev




In my understanding, we want to be able to
1. Access data from a toasted object one slice at a time, by using
knowledge of the structure
2. If toasted data is updated, then update a minimum number of
slice(s), without rewriting the existing slices
3. If toasted data is expanded, then allow new slices to be appended to
the object without rewriting the existing slices


There are more options:
1) Share common parts not only between versions of a row but between all
rows in a column. It may seem strange, but for example:

  - URLs often have a common prefix, so storing them in a prefix tree (as
    SP-GiST does) can significantly decrease the storage size
  - the same goes for json: a common part of the hierarchical structure
    across values is a frequent case
  - one more use case for json: if the json values follow only a few schemas
    (structures), it is possible to store only the values in toast storage
    and not store the keys and structure
2) The current toast storage stores chunks in the heap access method and
builds an index to provide fast access by toast id. Ideas:

  - store chunks directly in a btree; pgsql's btree already has INCLUDE
    columns, so chunks and visibility data would be stored only in leaf
    pages. Obviously this reduces the number of disk accesses for
    "untoasting".
  - use another access method for chunk storage


ISTM that we would want the toast algorithm to be associated with the
datatype, not the column?
Can you explain your thinking?

Hm. I'll try to explain my motivation.
1) A datatype could have more than one suitable toaster, for different
   use cases: fast retrieval, compact storage, fast updates, etc. As I
   said above, for jsonb there are several optimal toasting strategies:
   for values with only a few distinct structures, for values that are
   close to hierarchical, and for values whose parts have different access
   patterns (it is easy to imagine json with some keys used for search and
   some keys only for output to the user)
2) A toaster could be designed to work with different data types. The
   suggested appendable toaster is designed to work with bytea but could
   work with text as well

Given this, I have doubts about where to store the connection between
toaster and datatype. If we add a toasteroid to pg_type, how do we deal
with several toasters for one datatype? (And we might want different
toasters on one table!) If we add a typoid to pg_toaster, how will it work
with several datatypes? An idea to add a new many-to-many connection table
seems workable, but then there are other questions, such as whether any
toaster will work with any table access method.


To resolve this bundle of questions we propose a validate() method of the
toaster, which would be called during DDL operations, i.e. when a toaster is
assigned to a column or when a column's datatype is changed.
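A hypothetical sketch of what such an API could look like (the struct, the
callback signature and the OID value below are illustrative assumptions, not
the actual API of this patch):

#include <stdbool.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

typedef struct ToasterRoutine
{
    /* toast/detoast/update callbacks would live here as well */

    /*
     * Called from DDL when a toaster is assigned to a column or the
     * column's datatype is changed: return false if this toaster cannot
     * handle the given datatype / table access method combination.
     */
    bool        (*validate) (Oid typid, Oid amoid, int attnum);
} ToasterRoutine;

/* Example: a validator for a toaster that only supports bytea. */
static bool
bytea_only_validate(Oid typid, Oid amoid, int attnum)
{
    const Oid   BYTEAOID = 17;  /* bytea's pg_type OID, for illustration */

    (void) amoid;
    (void) attnum;
    return typid == BYTEAOID;
}

static const ToasterRoutine appendable_toaster = {
    .validate = bytea_only_validate,
};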


One more thought:
Postgres now has two per-column options, storage and compression, and now we
add toaster. To me that seems redundant. It seems storage should be a binary
value: in-place (plain, as now) or toastable. All other variations, such as
the toast limit, whether compression is enabled, and the kind of compression,
should be per-column options of the toaster (that's why we suggest a valid
toaster oid for any column with a varlena/toastable datatype). It looks like
a good abstraction, but we will have a problem with backward compatibility
and I'm afraid I can't implement it very fast.






We already have Expanded toast format, in-memory, which was designed
specifically to allow us to access sub-structure of the datatype
in-memory. So I was expecting to see an Expanded, on-disk, toast
format that roughly matched that concept, since Tom has already shown
us the way. (varatt_expanded). This would be usable by both JSON and
PostGIS.
Hm, I don't understand. varatt_custom has a variable-length tail which the
toaster can use in any way it likes; the appendable toaster uses it to store
the appended tail.





Some other thoughts:

I imagine the data type might want to keep some kind of dictionary
inside the main toast pointer, so we could make allowance for some
optional datatype-specific private area in the toast pointer itself,
allowing a mix of inline and out-of-line data, and/or a table of
contents to the slices.

I'm thinking could also tackle these things at the same time:
* We want to expand TOAST to 64-bit pointers, so we can have more
pointers in a table
* We want to avoid putting the data length into the toast pointer, so
we can allow the toasted data to be expanded without rewriting
everything (to avoid O(N^2) cost)

Right

--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/




Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)

2022-01-14 Thread Pavel Stehule
Hi

I like this feature, but I don't like the introduction of double + too
much. I think it is confusing.

Is it really necessary? Wouldn't just a reorganization of \dn and \dn+ be
enough?

Regards

Pavel

pá 14. 1. 2022 v 17:35 odesílatel Justin Pryzby 
napsal:

> Rebased before Julian asks.
>


Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)

2022-01-14 Thread Justin Pryzby
Rebased before Julian asks.
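For reference, hypothetical usage of the two functions this patch adds
(assuming the patch is applied; pg_size_pretty() is an existing core
function):

  SELECT pg_size_pretty(pg_namespace_size('public'));
  SELECT pg_size_pretty(pg_am_size('btree'));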
>From cf57143c85a2a6088fe6038e6d41771fd60aae34 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 13 Jul 2021 21:25:48 -0500
Subject: [PATCH 1/4] Add pg_am_size(), pg_namespace_size() ..

See also: 358a897fa, 528ac10c7
---
 src/backend/utils/adt/dbsize.c  | 130 
 src/include/catalog/pg_proc.dat |  19 +
 2 files changed, 149 insertions(+)

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 3a2f2e1f99d..6cde01f0587 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -13,18 +13,23 @@
 
 #include 
 
+#include "access/genam.h"
 #include "access/htup_details.h"
 #include "access/relation.h"
+#include "access/table.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_tablespace.h"
 #include "commands/dbcommands.h"
+#include "commands/defrem.h"
 #include "commands/tablespace.h"
 #include "miscadmin.h"
 #include "storage/fd.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
 #include "utils/numeric.h"
 #include "utils/rel.h"
 #include "utils/relfilenodemap.h"
@@ -832,6 +837,131 @@ pg_size_bytes(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(result);
 }
 
+/*
+ * Return the sum of size of relations for which the given attribute of
+ * pg_class matches the specified OID value.
+ */
+static int64
+calculate_size_attvalue(int attnum, Oid attval)
+{
+	int64		totalsize = 0;
+	ScanKeyData 	skey;
+	Relation	pg_class;
+	SysScanDesc	scan;
+	HeapTuple	tuple;
+
+	ScanKeyInit(&skey, attnum,
+			BTEqualStrategyNumber, F_OIDEQ, attval);
+
+	pg_class = table_open(RelationRelationId, AccessShareLock);
+	scan = systable_beginscan(pg_class, InvalidOid, false, NULL, 1, &skey);
+	while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+	{
+		Relation rel;
+		Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple);
+
+		rel = try_relation_open(classtuple->oid, AccessShareLock);
+		if (!rel)
+			continue;
+
+		for (ForkNumber forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+			totalsize += calculate_relation_size(&rel->rd_node, rel->rd_backend, forkNum);
+
+		relation_close(rel, AccessShareLock);
+	}
+
+	systable_endscan(scan);
+	table_close(pg_class, AccessShareLock);
+	return totalsize;
+}
+
+/* Compute the size of relations in a schema (namespace) */
+static int64
+calculate_namespace_size(Oid nspOid)
+{
+	/*
+	 * User must be a member of pg_read_all_stats or have CREATE privilege for
+	 * target namespace.
+	 */
+	if (!is_member_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+	{
+		AclResult aclresult;
+		aclresult = pg_namespace_aclcheck(nspOid, GetUserId(), ACL_CREATE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_SCHEMA,
+		   get_namespace_name(nspOid));
+	}
+
+	return calculate_size_attvalue(Anum_pg_class_relnamespace, nspOid);
+}
+
+Datum
+pg_namespace_size_oid(PG_FUNCTION_ARGS)
+{
+	int64		size;
+	Oid			nspOid = PG_GETARG_OID(0);
+
+	size = calculate_namespace_size(nspOid);
+
+	if (size < 0)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(size);
+}
+
+Datum
+pg_namespace_size_name(PG_FUNCTION_ARGS)
+{
+	int64		size;
+	Name		nspName = PG_GETARG_NAME(0);
+	Oid			nspOid = get_namespace_oid(NameStr(*nspName), false);
+
+	size = calculate_namespace_size(nspOid);
+
+	if (size < 0)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(size);
+}
+
+/* Compute the size of relations using the given access method */
+static int64
+calculate_am_size(Oid amOid)
+{
+	/* XXX acl_check? */
+
+	return calculate_size_attvalue(Anum_pg_class_relam, amOid);
+}
+
+Datum
+pg_am_size_oid(PG_FUNCTION_ARGS)
+{
+	int64		size;
+	Oid			amOid = PG_GETARG_OID(0);
+
+	size = calculate_am_size(amOid);
+
+	if (size < 0)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(size);
+}
+
+Datum
+pg_am_size_name(PG_FUNCTION_ARGS)
+{
+	int64		size;
+	Name		amName = PG_GETARG_NAME(0);
+	Oid			amOid = get_am_oid(NameStr(*amName), false);
+
+	size = calculate_am_size(amOid);
+
+	if (size < 0)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(size);
+}
+
 /*
  * Get the filenode of a relation
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d6bf1f3274b..18248799e73 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7230,6 +7230,25 @@
   descr => 'total disk space usage for the specified tablespace',
   proname => 'pg_tablespace_size', provolatile => 'v', prorettype => 'int8',
   proargtypes => 'name', prosrc => 'pg_tablespace_size_name' },
+
+{ oid => '9410',
+  descr => 'total disk space usage for the specified namespace',
+  proname => 'pg_namespace_size', provolatile => 'v', prorettype => 'int8',
+  proargtypes => 'oid', prosrc => 'pg_namespace_size_oid' },
+{ oid => '9411',
+  descr => 'total disk space usage for the specified namespace',
+  proname => 'pg_namespace_size', provolatile => 'v', prorettype => 'int8',
+  proargtypes => 'name', prosrc => 

Re: Add sub-transaction overflow status in pg_stat_activity

2022-01-14 Thread Tom Lane
Julien Rouhaud  writes:
> Like many I previously had to investigate a slowdown due to sub-transaction
> overflow, and even with the information available in a monitoring view (I had
> to rely on a quick hackish extension as I couldn't patch postgres) it's not
> terribly fun to do this way.  On top of that log analyzers like pgBadger could
> help to highlight such a problem.

It feels to me like far too much effort is being invested in fundamentally
the wrong direction here.  If the subxact overflow business is causing
real-world performance problems, let's find a way to fix that, not put
effort into monitoring tools that do little to actually alleviate anyone's
pain.

regards, tom lane




Re: do only critical work during single-user vacuum?

2022-01-14 Thread John Naylor
I see a CF entry has been created already, and the cfbot doesn't like
my PoC. To prevent confusion, I've taken the liberty of switching the
author to myself and set to Waiting on Author. FWIW, my local build
passed make check-world after applying Justin's fix and changing a
couple other things.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Add sub-transaction overflow status in pg_stat_activity

2022-01-14 Thread Dilip Kumar
On Fri, Jan 14, 2022 at 1:17 PM Julien Rouhaud  wrote:

> On Thu, Jan 13, 2022 at 10:27:31PM +, Bossart, Nathan wrote:
> > Thanks for the new patch!
> >
> > +   
> > +Returns a record of information about the backend's
> subtransactions.
> > +The fields returned are subxact_count
> identifies
> > +number of active subtransaction and subxact_overflow
> > + shows whether the backend's subtransaction cache is
> > +overflowed or not.
> > +   
> > +   
> >
> > nitpick: There is an extra "" here.
>
> Also the sentence looks a bit weird.  I think something like that would be
> better:
>
> > +Returns a record of information about the backend's
> subtransactions.
> > +The fields returned are subxact_count,
> which
> > +identifies the number of active subtransaction and
> subxact_overflow
> > +, which shows whether the backend's subtransaction
> cache is
> > +overflowed or not.
>

Thanks for looking into this, I will work on this next week.


> While on the sub-transaction overflow topic, I'm wondering if we should
> also
> raise a warning (maybe optionally) immediately when a backend overflows
> (so in
> GetNewTransactionId()).
>
> Like many I previously had to investigate a slowdown due to sub-transaction
> overflow, and even with the information available in a monitoring view (I
> had
> to rely on a quick hackish extension as I couldn't patch postgres) it's not
> terribly fun to do this way.  On top of that log analyzers like pgBadger
> could
> help to highlight such a problem.
>
> I don't want to derail this thread so let me know if I should start a
> distinct
> discussion for that.
>

Yeah that seems like a good idea.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Undocumented error

2022-01-14 Thread David G. Johnston
On Friday, January 14, 2022, Tomas Vondra 
wrote:

> ,
>
> On 1/14/22 00:02, Petar Dambovaliev wrote:
>
>>
>> the error code is `-1` and the error text is `invalid ordering of
>> speculative insertion changes`
>>
>
> Which Postgres version is this, exactly? Was the WAL generated by that
> same version, or did you update/upgrade recently?


The OP failed to mention this is Aurora from AWS based off of 12.4 (we
chatted on Discord about this).

David J.


Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2022-01-14 Thread Andrei Zubkov
Hello,

On Sun, 2022-01-02 at 13:28 -0800, Andres Freund wrote:
> Hi,
> 
> This fails with an assertion failure:
> https://cirrus-ci.com/task/5567540742062080?logs=cores#L55
> 
> 
Andres, thank you for your test! I've missed it. Fixed in attached
patch v5.

On Wed, 2021-12-22 at 04:25 +0300, Anton A. Melnikov wrote:
> 
> 
> I completely agree that creating a separate view for these new fields
> is
> the most correct thing to do.

Anton,

I've created a new view named pg_stat_statements_aux. But for now both
views are using the same function pg_stat_statements which returns all
fields. It seems reasonable to me - if sampling solution will need all
values it can query the function.

> Also it might be better to use the term 'auxiliary' and use the same 
> approach as for existent similar vars.

Agreed, renamed to auxiliary term.

> So internally it might look something like this:
> 
> double  aux_min_time[PGSS_NUMKIND];
> double  aux_max_time[PGSS_NUMKIND];
> TimestampTz aux_stats_reset;
> 
> And at the view level:
>aux_min_plan_time float8,
>aux_max_plan_time float8,
>aux_min_exec_time float8,
>aux_max_exec_time float8,
>aux_stats_reset timestamp with time zone
> 
> Functions names might be pg_stat_statements_aux() and 
> pg_stat_statements_aux_reset().
> 

But it seems "stats_reset" term is not quite correct in this case. The
"stats_since" field holds the timestamp of hashtable entry, but not the
reset time. The same applies to aux_stats_since - for new statement it
holds its entry time, but in case of reset it will hold the reset time.

"stats_reset" name seems a little bit confusing to me.

Attached patch v5
-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From e3140e136bab9d44d7818ed86e1c8ff119936532 Mon Sep 17 00:00:00 2001
From: Andrei Zubkov 
Date: Fri, 14 Jan 2022 18:04:53 +0300
Subject: [PATCH] pg_stat_statements: Track statement entry timestamp

This patch adds stats_since column to the pg_stat_statements view. This column
is populated with the current timestamp when a new statement is added to the
pg_stat_statements hashtable. It provides clean information about statistics
collection time interval for each statement. Besides it can be used
by sampling solutions to detect situations when a statement was evicted and
returned back between samples.
Such sampling solution could derive any pg_stat_statements statistic value for
an interval between two samples with except of all min/max statistics. To
address this issue this patch adds auxiliary statistics to the pg_stat_statements
function and a new view named pg_stat_statements_aux:

 View "public.pg_stat_statements_aux"
  Column   |   Type   | Collation | Nullable | Default
---+--+---+--+-
 userid| oid  |   |  |
 dbid  | oid  |   |  |
 toplevel  | boolean  |   |  |
 queryid   | bigint   |   |  |
 query | text |   |  |
 aux_min_plan_time | double precision |   |  |
 aux_max_plan_time | double precision |   |  |
 aux_min_exec_time | double precision |   |  |
 aux_max_exec_time | double precision |   |  |
 stats_since   | timestamp with time zone |   |  |
 aux_stats_since   | timestamp with time zone |   |  |

These auxiliary statistics are resettable independently of statement reset by
the new function pg_stat_statements_aux_reset(userid oid, dbid oid, queryid bigint)

Discussion: https://www.postgresql.org/message-id/flat/72e80e7b160a6eb189df9ef6f068cce3765d37f8.camel%40moonset.ru
---
 contrib/pg_stat_statements/Makefile   |   3 +-
 .../expected/pg_stat_statements.out   |  90 
 .../pg_stat_statements--1.9--1.10.sql | 122 +++
 .../pg_stat_statements/pg_stat_statements.c   | 203 +-
 .../pg_stat_statements.control|   2 +-
 .../sql/pg_stat_statements.sql|  50 +
 doc/src/sgml/pgstatstatements.sgml| 184 +++-
 7 files changed, 640 insertions(+), 14 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 7fabd96f38d..edc40c8bbfb 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
 	pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.8--1.9.sql \
+DATA = pg_stat_statements--1.4.sql \
+	pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \
 	

Re: Logical replication timeout problem

2022-01-14 Thread Fabrice Chapuis
If it doesn't take much work for you, can you please send me a piece of code
with the change needed to do the test?

Thanks

Regards,

Fabrice

On Fri, Jan 14, 2022 at 1:03 PM Amit Kapila  wrote:

> On Fri, Jan 14, 2022 at 3:47 PM Fabrice Chapuis 
> wrote:
> >
> > If I can follow you, I have to make the following changes:
> >
>
> No, not like that but we can try that way as well to see if that helps
> to avoid your problem. Am I understanding correctly that even after the
> modification, you are still seeing the problem? Can you try by calling
> WalSndKeepaliveIfNecessary() instead of WalSndKeepalive()?
>
> --
> With Regards,
> Amit Kapila.
>


Re: [PATCH] Allow multiple recursive self-references

2022-01-14 Thread Denis Hirn


> On 14. Jan 2022, at 13:21, Peter Eisentraut 
>  wrote:

> 
> There is nothing in there that says that certain branches of the UNION in a 
> recursive query mean certain things. In fact, it doesn't even require the 
> query to contain a UNION at all.  It just says to iterate on evaluating the 
> query until a fixed point is reached.  I think this supports my claim that 
> the associativity and commutativity of a UNION in a recursive query still 
> apply.
> 
> This is all very complicated, so I don't claim this to be authoritative, but 
> I just don't see anything in the spec that supports what you are saying.


I disagree. In SQL:2016, it's discussed in 7.16 <query expression>, syntax
rule 3) j) i), which defines:

> 3) Among the WQEi, ... WQEk of a given stratum, there shall be at least one
> <query expression>, say WQEj, such that:
> A) WQEj is a <query expression body> that immediately contains UNION.
> B) WQEj has one operand that does not contain a <query name> referencing
> any of WQNi,
>..., WQNk. This operand is said to be the non-recursive operand of
> WQEj.
> C) WQEj is said to be an anchor expression, and WQNj an anchor name.

Where <query expression body> is defined as:
> <query expression body> ::=
> <query term>
>   | <query expression body> UNION [ALL | DISTINCT]
>   [ <corresponding spec> ] <query term>
>
> <query term> ::=
> <query primary>
>   | ...
>
> <query primary> ::= ...
>   | <left paren> <query expression body> ... <right paren>

This definition pretty much sums up what I have called RUNION.

The SQL standard might not impose a strict order on the UNION branches.
But you have to be able to uniquely identify the anchor expression.
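As a concrete illustration (only meaningful with the proposed patch applied,
since stock PostgreSQL rejects more than one self-reference), the following
query has one anchor branch and two recursive branches:

  WITH RECURSIVE t(n) AS (
      SELECT 1                            -- anchor: no reference to t
    UNION ALL
      SELECT n + 1  FROM t WHERE n < 5    -- recursive branch 1
    UNION ALL
      SELECT n * 10 FROM t WHERE n < 3    -- recursive branch 2
  )
  SELECT * FROM t;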

Best,
  -- Denis



Re: support for MERGE

2022-01-14 Thread Justin Pryzby
Resending some language fixes to the public documentation.
>From 28b8532976ddb3e8b617ca007fae6b4822b36527 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 13 Nov 2021 12:11:46 -0600
Subject: [PATCH] f!typos

---
 doc/src/sgml/mvcc.sgml  |  8 ++---
 doc/src/sgml/ref/create_policy.sgml |  5 +--
 doc/src/sgml/ref/insert.sgml|  2 +-
 doc/src/sgml/ref/merge.sgml | 48 ++---
 doc/src/sgml/trigger.sgml   |  7 +++--
 src/test/regress/expected/merge.out |  4 +--
 src/test/regress/sql/merge.sql  |  4 +--
 7 files changed, 40 insertions(+), 38 deletions(-)

diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index a1ae8423414..61a10c28120 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -441,16 +441,16 @@ COMMIT;
 can specify several actions and they can be conditional, the
 conditions for each action are re-evaluated on the updated version of
 the row, starting from the first action, even if the action that had
-originally matched was later in the list of actions.
+originally matched appears later in the list of actions.
 On the other hand, if the row is concurrently updated or deleted so
 that the join condition fails, then MERGE will
-evaluate the conditions NOT MATCHED actions next,
+evaluate the condition's NOT MATCHED actions next,
 and execute the first one that succeeds.
 If MERGE attempts an INSERT
 and a unique index is present and a duplicate row is concurrently
-inserted then a uniqueness violation is raised.
+inserted, then a uniqueness violation is raised.
 MERGE does not attempt to avoid the
-ERROR by attempting an UPDATE.
+ERROR by executing an UPDATE.

 

diff --git a/doc/src/sgml/ref/create_policy.sgml b/doc/src/sgml/ref/create_policy.sgml
index 3db3908b429..db312681f78 100644
--- a/doc/src/sgml/ref/create_policy.sgml
+++ b/doc/src/sgml/ref/create_policy.sgml
@@ -96,10 +96,11 @@ CREATE POLICY name ON 
 
   
-   No separate policy exists for MERGE. Instead policies
+   No separate policy exists for MERGE. Instead, the policies
defined for SELECT, INSERT,
UPDATE and DELETE are applied
-   while executing MERGE, depending on the actions that are activated.
+   while executing MERGE, depending on the actions that are
+   performed.
   
  
 
diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index 477de2689b6..ad61d757af5 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -592,7 +592,7 @@ INSERT oid count
You may also wish to consider using MERGE, since that
-   allows mixed INSERT, UPDATE and
+   allows mixing INSERT, UPDATE and
DELETE within a single statement.
See .
   
diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml
index 7700d9b9bb1..149df9f0ad9 100644
--- a/doc/src/sgml/ref/merge.sgml
+++ b/doc/src/sgml/ref/merge.sgml
@@ -73,11 +73,11 @@ DELETE
from data_source to
target_table_name
producing zero or more candidate change rows.  For each candidate change
-   row the status of MATCHED or NOT MATCHED
+   row, the status of MATCHED or NOT MATCHED
is set just once, after which WHEN clauses are evaluated
-   in the order specified.  The first clause to match each candidate change
-   row is executed.  No more than one WHEN clause is
-   executed for any candidate change row.  
+   in the order specified.  For each candidate change row, the first clause to
+   evaluate as true executed.  No more than one WHEN clause
+   is executed for any candidate change row.  
   
 
   
@@ -85,14 +85,14 @@ DELETE
regular UPDATE, INSERT, or
DELETE commands of the same names. The syntax of
those commands is different, notably that there is no WHERE
-   clause and no tablename is specified.  All actions refer to the
+   clause and no table name is specified.  All actions refer to the
target_table_name,
though modifications to other tables may be made using triggers.
   
 
   
-   When DO NOTHING action is specified, the source row is
-   skipped. Since actions are evaluated in the given order, DO
+   When DO NOTHING is specified, the source row is
+   skipped. Since actions are evaluated in their specified order, DO
NOTHING can be handy to skip non-interesting source rows before
more fine-grained handling.
   
@@ -178,7 +178,7 @@ DELETE
 
  
   A substitute name for the data source. When an alias is
-  provided, it completely hides whether table or query was specified.
+  provided, it completely hides the actual name of the table or query.
  
 

@@ -202,7 +202,7 @@ DELETE
rows should appear in join_condition.
join_condition subexpressions that
only reference target_table_name
-   columns can only affect which action is taken, often in surprising ways.
+   columns can affect which action is taken, often in surprising ways.
   
  
 
@@ 

Re: Boyer-More-Horspool searching LIKE queries

2022-01-14 Thread Atsushi Ogawa
Thanks for the comments.

> > The conditions under which B-M-H can be used are as follows.
> >
> > (1) B-M-H in LIKE search supports only single-byte character sets and
> > UTF8.
> > Multibyte character sets does not support, because it may contain
> > another
> > characters in the byte sequence. For UTF-8, it works fine, because in
> > UTF-8 the byte sequence of one character cannot contain another
> > character.
>
> You can make it work with any encoding, if you check for that case after
> you find a match. See how text_position() does it.

I looked at text_position(). I would like to do the same.

> > (3) The pattern string should be at least 4 characters.
> > For example, '%AB%' can use B-M-H.
>
> To be precise, the patch checks that the pattern string is at least 4
> *bytes* long. A pattern like E'%\U0001F418%' would benefit too.

*bytes* is precise. I will revise the comment in the code.

> If I'm reading the code correctly, it doesn't account for escapes
> correctly. It will use B-M-H for a pattern like '%\\%', even though
> that's just searching for a single backslash and won't benefit from B-M-H.

You are correct. I will fix it.

> > (4) The first and last character of the pattern string should be '%'.
>
> I wonder if we can do better than that. If you have a pattern like
> '%foo%bar', its pretty obvious (to a human) that you can quickly check
> if the string ends in 'bar', and then check if it also contains the
> substring 'foo'. Is there some way to generalize that?

I think the following optimizations are possible.

(1)%foo%bar
 Check if the string ends with 'bar' and search for 'foo' by B-M-H.

(2)foo%bar%
 Check if the string starts with 'foo' and search for 'bar' by B-M-H.

(3)foo%bar%baz
 Check if the string starts with 'foo' and string ends with 'baz' and
 search for 'bar' by B-M-H.
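The search core that each of these split searches would run is plain
Horspool. For illustration only, here is a minimal self-contained sketch; it
is not the patch's like.c code, and it ignores LIKE escapes and multibyte
encodings:

#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns the offset of the first match of pat in text, or -1. */
static long
bmh_search(const unsigned char *text, size_t tlen,
           const unsigned char *pat, size_t plen)
{
    size_t      skip[256];
    size_t      i;

    if (plen == 0)
        return 0;
    if (plen > tlen)
        return -1;

    /* Bad-character table: default shift is the whole pattern length. */
    for (i = 0; i < 256; i++)
        skip[i] = plen;
    for (i = 0; i < plen - 1; i++)
        skip[pat[i]] = plen - 1 - i;

    /* Slide the window by the shift of its last byte until a match. */
    for (i = 0; i + plen <= tlen; i += skip[text[i + plen - 1]])
    {
        if (memcmp(text + i, pat, plen) == 0)
            return (long) i;
    }
    return -1;
}

int
main(void)
{
    const char *text = "find the word greater in this text";

    /* prints 14, the offset of "greater" */
    printf("%ld\n", bmh_search((const unsigned char *) text, strlen(text),
                               (const unsigned char *) "greater", 7));
    return 0;
}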

> Looking at MatchText() in like.c, there is this piece of code:
>
> >   else if (*p == '%')
> >   {
> >       char        firstpat;
> >
> >       /*
> >        * % processing is essentially a search for a text position at
> >        * which the remainder of the text matches the remainder of the
> >        * pattern, using a recursive call to check each potential match.
> >        *
> >        * If there are wildcards immediately following the %, we can skip
> >        * over them first, using the idea that any sequence of N _'s and
> >        * one or more %'s is equivalent to N _'s and one % (ie, it will
> >        * match any sequence of at least N text characters).  In this way
> >        * we will always run the recursive search loop using a pattern
> >        * fragment that begins with a literal character-to-match, thereby
> >        * not recursing more than we have to.
> >        */
> >       NextByte(p, plen);
> >
> >       while (plen > 0)
> >       {
> >           if (*p == '%')
> >               NextByte(p, plen);
> >           else if (*p == '_')
> >           {
> >               /* If not enough text left to match the pattern, ABORT */
> >               if (tlen <= 0)
> >                   return LIKE_ABORT;
> >               NextChar(t, tlen);
> >               NextByte(p, plen);
> >           }
> >           else
> >               break;      /* Reached a non-wildcard pattern char */
> >       }
>
> Could we use B-M-H to replace that piece of code?

For example, in a pattern such as %foo%bar%, it is possible to first search
for 'foo' by B-M-H, and then search for 'bar' by B-M-H. It would be nice if
such a
process could be generalized to handle various LIKE search patterns.

> How does the performance compare with regular expressions? Would it be
> possible to use this for trivial regular expressions too? Or could we
> speed up the regexp engine to take advantage of B-M-H, and use it for
> LIKE? Or something like that?

I think regular expressions in postgresql are slower than LIKE.
I compared it with the following two SQLs.

(1)LIKE: execution time is about 0.8msec
select count(*) from pg_catalog.pg_description d where d.description like
'%greater%';

(2)regular expression: execution time is about 3.1 msec
select count(*) from pg_catalog.pg_description d where d.description ~
'greater';

For trivial regular expressions, it may be better to use LIKE.

> > I have measured the performance with the following query.
>
> Setting up the B-M-H table adds some initialization overhead, so this
> would be a loss for cases where the LIKE is 

Re: Consistently use the function name CreateCheckPoint instead of CreateCheckpoint in code comments

2022-01-14 Thread Amit Kapila
On Fri, Jan 14, 2022 at 8:55 AM Bharath Rupireddy
 wrote:
>
> The function CreateCheckPoint is specified as CreateCheckpoint in some
> of the code comments whereas in other places it is correctly
> mentioned. Attaching a tiny patch to use CreateCheckPoint consistently
> across code comments.
>

LGTM. I'll take care of this unless someone thinks otherwise.

-- 
With Regards,
Amit Kapila.




Re: Undocumented error

2022-01-14 Thread Tomas Vondra

Hi,

On 1/14/22 00:02, Petar Dambovaliev wrote:

Hello,

I am using the libpq to consume a replication slot.
Very rarely, I would get a very strange error that I can't find any
information on.
It is not mentioned in any documentation and I don't know under what
conditions it triggers.

Any help is appreciated.

The function I am calling is: `PQgetCopyData`
the error code is `-1` and the error text is `invalid ordering of 
speculative insertion changes`


Well, that's strange. I see the error message in the source code, but it 
kinda implies it's something that should not happen. So either there's a 
bug in how we WAL log this stuff, or maybe the decoding is wrong. In any 
case it has to be a very rare issue, because the code has been like this
since 2018 and there have been 0 complaints so far.


Which Postgres version is this, exactly? Was the WAL generated by that 
same version, or did you update/upgrade recently?


Are you able to reproduce the issue? Do you know what did the 
transaction that generated this WAL?


It'd be helpful to see the WAL that triggers this issue - presumably the
error message includes the LSN of the record at which this fails, so use 
pg_waldump to dump that segment and show us a sufficiently large chunk 
from before that LSN. Not sure how much, because I don't know if you use 
subtransactions, how long the transactions are, etc.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: generic plans and "initial" pruning

2022-01-14 Thread Amit Langote
On Thu, Jan 6, 2022 at 3:45 PM Amul Sul  wrote:
> Here are few comments for v1 patch:

Thanks Amul.  I'm thinking about Robert's latest comments addressing
which may need some rethinking of this whole design, but I decided to
post a v2 taking care of your comments.

> +   /* Caller error if we get here without contains_init_steps */
> +   Assert(pruneinfo->contains_init_steps);
>
> -   prunedata = prunestate->partprunedata[i];
> -   pprune = &prunedata->partrelprunedata[0];
>
> -   /* Perform pruning without using PARAM_EXEC Params */
> -   find_matching_subplans_recurse(prunedata, pprune, true, );
> +   if (parentrelids)
> +   *parentrelids = NULL;
>
> You got two blank lines after Assert.

Fixed.

> --
>
> +   /* Set up EState if not in the executor proper. */
> +   if (estate == NULL)
> +   {
> +   estate = CreateExecutorState();
> +   estate->es_param_list_info = params;
> +   free_estate = true;
> }
>
> ... [Skip]
>
> +   if (free_estate)
> +   {
> +   FreeExecutorState(estate);
> +   estate = NULL;
> }
>
> I think this work should be left to the caller.

Done.  Also see below...

> /*
>  * Stuff that follows matches exactly what ExecCreatePartitionPruneState()
>  * does, except we don't need a PartitionPruneState here, so don't call
>  * that function.
>  *
>  * XXX some refactoring might be good.
>  */
>
> +1, while doing it would be nice if foreach_current_index() is used
> instead of the i & j sequence in the respective foreach() block, IMO.

Actually, I rewrote this part quite significantly so that most of the
code remains in its existing place.  I decided to let
GetLockableRelations_walker() create a PartitionPruneState and pass
that to ExecFindInitialMatchingSubPlans() that is now left more or
less as is.  Instead, ExecCreatePartitionPruneState() is changed to be
callable from outside the executor.

The temporary EState is no longer necessary.  ExprContext,
PartitionDirectory, etc. are now managed in the caller,
GetLockableRelations_walker().

> --
>
> +   while ((i = bms_next_member(validsubplans, i)) >= 0)
> +   {
> +   Plan   *subplan = list_nth(subplans, i);
> +
> +   context->relations =
> +   bms_add_members(context->relations,
> +   get_plan_scanrelids(subplan));
> +   }
>
> I think instead of get_plan_scanrelids() the
> GetLockableRelations_worker() can be used; if so, then no need to add
> get_plan_scanrelids() function.

You're right, done.

> --
>
>  /* Nodes containing prunable subnodes. */
> +   case T_MergeAppend:
> +   {
> +   PlannedStmt *plannedstmt = context->plannedstmt;
> +   List   *rtable = plannedstmt->rtable;
> +   ParamListInfo params = context->params;
> +   PartitionPruneInfo *pruneinfo;
> +   Bitmapset  *validsubplans;
> +   Bitmapset  *parentrelids;
>
> ...
> if (pruneinfo && pruneinfo->contains_init_steps)
> {
> int i;
> ...
>return false;
> }
> }
> break;
>
> Most of the declarations need to be moved inside the if-block.

Done.

> Also, initially, I was a bit concerned regarding this code block
> inside GetLockableRelations_worker(), what if (pruneinfo &&
> pruneinfo->contains_init_steps) evaluated to false? After debugging I
> realized that plan_tree_walker() will do the needful -- a bit of
> comment would have helped.

You're right.  Added a dummy else {} block with just the comment saying so.

> +   case T_CustomScan:
> +   foreach(lc, ((CustomScan *) plan)->custom_plans)
> +   {
> +   if (walker((Plan *) lfirst(lc), context))
> +   return true;
> +   }
> +   break;
>
> Why not plan_walk_members() call like other nodes?

Makes sense, done.

Again, most/all of this patch might need to be thrown away, but here
it is anyway.

--
Amit Langote
EDB: http://www.enterprisedb.com


v2-0001-Teach-AcquireExecutorLocks-to-acquire-fewer-locks.patch
Description: Binary data


Re: Column Filtering in Logical Replication

2022-01-14 Thread Alvaro Herrera
On 2022-Jan-14, Amit Kapila wrote:

> 1. Replica Identity handling: Currently the column filter patch gives
> an error during create/alter subscription if the specified column list
> is invalid (Replica Identity columns are missing). It also gives an
> error if the user tries to change the replica identity. However, it
> doesn't deal with cases where the user drops and adds a different
> primary key that has a different set of columns which can lead to
> failure during apply on the subscriber.

Hmm, yeah, I suppose we should check that the primary key is compatible
with the column list in all publications.  (I wonder what happens in the
interim, that is, what happens to tuples modified after the initial PK
is dropped and before the new PK is installed.  Are these considered to
have "replica identiy nothing"?)

> I think another issue w.r.t column filter patch is that even while
> creating publication (even for 'insert' publications) it should check
> that all primary key columns must be part of published columns,
> otherwise, it can fail while applying on subscriber as it will try to
> insert NULL for the primary key column.

I'm not so sure about the primary key aspects, actually; keep in mind
that the replica can have a different table definition, and it might
have even a completely different primary key.  I think this part is up
to the user to set up correctly; we have enough with just trying to make
the replica identity correct.

> 2. Handling of partitioned tables vs. Replica Identity (RI): When
> adding a partitioned table with a column list to the publication (with
> publish_via_partition_root = false), we should check the Replica
> Identity of all its leaf partition as the RI on the partition is the
> one actually takes effect when publishing DML changes. We need to
> check RI while attaching the partition as well, as the newly added
> partitions will automatically become part of publication if the
> partitioned table is part of the publication. If we don't do this the
> later deletes/updates can fail.

Hmm, yeah.

> 3. Tablesync.c handling: Ideally, it would be good if we have a single
> query to fetch both row filters and column filters but even if that is
> not possible in the first version, the behavior should be same for
> both queries w.r.t partitioned tables, For ALL Tables and For All
> Tables In Schema cases.
> 
> Currently, the column filter patch doesn't seem to respect For ALL
> Tables and For All Tables In Schema cases, basically, it just copies
> the columns it finds through some of the publications even if one of
> the publications is defined as For All Tables. The row filter patch
> ignores the row filters if one of the publications is defined as For
> ALL Tables and For All Tables In Schema.

Oh, yeah, if a table appears in two publications and one of them is ALL
TABLES [IN SCHEMA], then we don't consider it as an all-columns
publication.  You're right, that should be corrected.

> We have done some testing w.r.t above cases with both patches and my
> colleague will share the results.

Great, thanks.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: A test for replay of regression tests

2022-01-14 Thread Thomas Munro
On Sat, Jan 15, 2022 at 12:39 AM Thomas Munro  wrote:
> On Wed, Dec 22, 2021 at 11:41 AM Thomas Munro  wrote:
> > Rebased and updated based on feedback.  Responses to multiple emails below:
>
> Pushed, but the build farm doesn't like it with a couple of different
> ways of failing.  I'll collect some results and revert shortly.

Problems:

1. The way I invoke pg_regress doesn't seem to work correctly under
the build farm client (though it works fine under make), not sure why
yet, but reproduced here and will figure it out tomorrow.
2. The new test in src/test/modules/t/002_tablespace.pl apparently has
some path-related problem on Windows that I didn't know about, because
CI isn't even running the TAP tests under src/test/module/test_misc
(and various other locations), but the BF is :-/  And I was happy
because modulescheck was passing...

I reverted the two commits responsible for those failures to keep the
build farm green, and I'll try to fix them tomorrow.




Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-01-14 Thread Bharath Rupireddy
On Thu, Jan 13, 2022 at 11:50 AM Bharath Rupireddy
 wrote:
> Thanks.  IMO, the following format of logging is better, so attaching
> the v2-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch as
> .patch
>
> 2021-12-28 02:52:24.464 UTC [2394396] LOG:  checkpoint completed at
> location=0/212FFC8 with REDO start location=0/212FF90: wrote 451
> buffers (2.8%); 0 WAL file(s) added, 0 removed, 1 recycled;
> write=0.012 s, sync=0.032 s, total=0.071 s; sync files=6,
> longest=0.022 s, average=0.006 s; distance=6272 kB, estimate=6272 kB

One of the test cases was failing with the above style of log
message; changing "checkpoint complete" to "checkpoint completed at
location" doesn't seem to be a good idea. Users or log monitoring
tools might be relying on the text "checkpoint complete", so I don't
want to break that. Here's the v3 patch, which I think will work
better. Please review.

Regards,
Bharath Rupireddy.
From b6f2a53bcaea2d37e317bc3d9922893c10712d3a Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Fri, 14 Jan 2022 12:55:30 +
Subject: [PATCH v3] Add checkpoint and redo LSN to LogCheckpointEnd log
 message

It is useful (for debugging purposes) if the checkpoint end message
has the checkpoint LSN and REDO LSN. It gives more context while
analyzing checkpoint-related issues. The pg_controldata gives the
last checkpoint LSN and REDO LSN, but having this info alongside
the log message helps analyze issues that happened previously,
connect the dots and identify the root cause.
---
 src/backend/access/transam/xlog.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c9d4cbf3ff..b2eabc72f9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8920,11 +8920,14 @@ LogCheckpointEnd(bool restartpoint)
 
 	if (restartpoint)
 		ereport(LOG,
-(errmsg("restartpoint complete: wrote %d buffers (%.1f%%); "
+(errmsg("restartpoint complete: location=%X/%X, REDO start location=%X/%X; "
+		"wrote %d buffers (%.1f%%); "
 		"%d WAL file(s) added, %d removed, %d recycled; "
 		"write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
 		"sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
 		"distance=%d kB, estimate=%d kB",
+		LSN_FORMAT_ARGS(ControlFile->checkPoint),
+		LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo),
 		CheckpointStats.ckpt_bufs_written,
 		(double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
 		CheckpointStats.ckpt_segs_added,
@@ -8940,11 +8943,14 @@ LogCheckpointEnd(bool restartpoint)
 		(int) (CheckPointDistanceEstimate / 1024.0))));
 	else
 		ereport(LOG,
-(errmsg("checkpoint complete: wrote %d buffers (%.1f%%); "
+(errmsg("checkpoint complete: location=%X/%X, REDO start location=%X/%X; "
+		"wrote %d buffers (%.1f%%); "
 		"%d WAL file(s) added, %d removed, %d recycled; "
 		"write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
 		"sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
 		"distance=%d kB, estimate=%d kB",
+		LSN_FORMAT_ARGS(ControlFile->checkPoint),
+		LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo),
 		CheckpointStats.ckpt_bufs_written,
 		(double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
 		CheckpointStats.ckpt_segs_added,
-- 
2.25.1



RE: Column Filtering in Logical Replication

2022-01-14 Thread tanghy.f...@fujitsu.com
On Friday, January 14, 2022 7:52 PM Amit Kapila  wrote:
> 
> On Wed, Jan 12, 2022 at 2:40 AM Justin Pryzby  wrote:
> >
> > Is there any coordination between the "column filter" patch and the "row
> > filter" patch ?  Are they both on track for PG15 ?  Has anybody run them
> > together ?
> >
> 
> The few things where I think we might need to define some common
> behavior are as follows:
> 

I tried some cases for the points you mentioned, which can be taken as a
reference.

> 1. Replica Identity handling: Currently the column filter patch gives
> an error during create/alter subscription if the specified column list
> is invalid (Replica Identity columns are missing). It also gives an
> error if the user tries to change the replica identity. However, it
> doesn't deal with cases where the user drops and adds a different
> primary key that has a different set of columns which can lead to
> failure during apply on the subscriber.
> 

An example for this scenario:
-- publisher --
create table tbl(a int primary key, b int);
create publication pub for table tbl(a);
alter table tbl drop CONSTRAINT tbl_pkey;
alter table tbl add primary key (b);

-- subscriber --
create table tbl(a int, b int);
create subscription sub connection 'port=5432 dbname=postgres' publication pub;

-- publisher --
insert into tbl values (1,1);

-- subscriber --
postgres=# select * from tbl;
 a | b
---+---
 1 |
(1 row)

update tbl set b=1 where a=1;
alter table tbl add primary key (b);

-- publisher --
delete from tbl;


The subscriber reported the following error message and the DELETE failed on
the subscriber.
ERROR:  publisher did not send replica identity column expected by the logical 
replication target relation "public.tbl"
CONTEXT:  processing remote data during "DELETE" for replication target 
relation "public.tbl" in transaction 723 at 2022-01-14 13:11:51.514261+08

-- subscriber
postgres=# select * from tbl;
 a | b
---+---
 1 | 1
(1 row)

> I think another issue w.r.t column filter patch is that even while
> creating publication (even for 'insert' publications) it should check
> that all primary key columns must be part of published columns,
> otherwise, it can fail while applying on subscriber as it will try to
> insert NULL for the primary key column.
> 

For example:
-- publisher --
create table tbl(a int primary key, b int);
create publication pub for table tbl(a);
alter table tbl drop CONSTRAINT tbl_pkey;
alter table tbl add primary key (b);

-- subscriber --
create table tbl(a int, b int primary key);
create subscription sub connection 'port=5432 dbname=postgres' publication pub;

-- publisher --
insert into tbl values (1,1);

The subscriber reported the following error message and the INSERT failed on
the subscriber.
ERROR:  null value in column "b" of relation "tbl" violates not-null constraint
DETAIL:  Failing row contains (1, null).

-- subscriber --
postgres=# select * from tbl;
 a | b
---+---
(0 rows)

> 2. Handling of partitioned tables vs. Replica Identity (RI): When
> adding a partitioned table with a column list to the publication (with
> publish_via_partition_root = false), we should check the Replica
> Identity of all its leaf partitions, as the RI on the partition is the
> one that actually takes effect when publishing DML changes. We need to
> check RI while attaching the partition as well, as the newly added
> partitions will automatically become part of publication if the
> partitioned table is part of the publication. If we don't do this the
> later deletes/updates can fail.
> 

Please see the following 3 cases about partition.

Case1 (publish a parent table which has a partition table):

-- publisher --
create table parent (a int, b int) partition by range (a);
create table child partition of parent default;
create unique INDEX ON child (a,b);
alter table child alter a set not null;
alter table child alter b set not null;
alter table child replica identity using INDEX child_a_b_idx;
create publication pub for table parent(a) 
with(publish_via_partition_root=false);

-- subscriber --
create table parent (a int, b int) partition by range (a);
create table child partition of parent default;
create subscription sub connection 'port=5432 dbname=postgres' publication pub;

-- publisher --
insert into parent values (1,1);

-- subscriber --
postgres=# select * from parent;
 a | b
---+---
 1 |
(1 row)

-- add RI in subscriber to avoid other errors
update child set b=1 where a=1;
create unique INDEX ON child (a,b);
alter table child alter a set not null;
alter table child alter b set not null;
alter table child replica identity using INDEX child_a_b_idx;

-- publisher --
delete from parent;

The subscriber reported the following error message and the DELETE failed on
the subscriber.
ERROR:  publisher did not send replica identity column expected by the logical 
replication target relation "public.child"
CONTEXT:  processing remote data during "DELETE" for replication target 
relation "public.child" in transaction 

Re: autovacuum: change priority of the vacuumed tables

2022-01-14 Thread Marco Garavello
Hello all,

I've just read this thread as I googled for "*how to give autovacuum
priority on a table*".
I'm looking into this because I have a large table with lots of
inserts/updates, and to improve the system I want to force the first free
autovacuum worker to work on that table, of course only if the condition
"such table is in need-to-be-vacuumed state (depending on the
*autovacuum_scale_factor* parameter)" is satisfied.
I found that *autovacuum_scale_factor* can be specified for each table, so
I think this could do the trick, couldn't it?
From my understanding, specifying a lower value on the target table should
give an implicit higher priority to that table.
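
Something along these lines is what I have in mind (the table name and the
values are only illustrative):

-- lower the per-table thresholds so autovacuum considers this table sooner
ALTER TABLE big_busy_table SET (
    autovacuum_vacuum_scale_factor = 0.01,   -- global default is 0.2
    autovacuum_vacuum_threshold    = 1000
);

-- verify the effective per-table settings
SELECT relname, reloptions FROM pg_class WHERE relname = 'big_busy_table';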


*Distinti Saluti / *Kind Regards

*Marco Garavello* | IoT Cloud Operations Specialist



On Fri, Jan 14, 2022 at 12:55, Tomas Vondra <
tomas.von...@2ndquadrant.com> wrote:

>
> On 03/03/2018 10:21 PM, Jim Nasby wrote:
> > On 3/3/18 2:53 PM, Tomas Vondra wrote:
> >> That largely depends on what knobs would be exposed. I'm against adding
> >> some low-level knobs that perhaps 1% of the users will know how to tune,
> >> and the rest will set it incorrectly. Some high-level options that would
> >> specify the workload type might work, but I have no idea about details.
> >
> > Not knowing about details is why we've been stuck here for years: it's
> > not terribly obvious how to create a scheduler that is going to work in
> > all situations. Current autovac is great for 80% of situations, but it
> > simply doesn't handle the remaining 20% by itself. Once you're pushing
> > your IO limits you *have* to start scheduling manual vacuums for any
> > critical tables.
> >
> > At least if we exposed some low level ability to control autovac
> > workers then others could create tools to improve the situation.
> > Currently that's not possible because manual vacuum lacks features
> > that autovac has.
> >
>
> I have my doubts about both points - usefulness of low-level controls
> and viability of tools built on them.
>
> Firstly, my hunch is that if we knew what low-level controls to expose,
> it would pretty much determine how to implement the tool internally. Exposing
> something just because you hope someone will find a use for it seems
> like a dead-end to me. So, which controls would you expose?
>
> Second, all the statistics used to decide which tables need vacuuming
> are already exposed, and we have things like bgworkers etc. So you could
> go and write a custom autovacuum today - copy the autovacuum code, tweak
> the scheduling, done. Yet no such tool emerged yet. Why is that?
>
> >>> One fairly simple option would be to simply replace the logic
> >>> that currently builds a worker's table list with running a query
> >>> via SPI. That would allow for prioritizing important tables. It
> >>> could also reduce the problem of workers getting "stuck" on a ton
> >>> of large tables by taking into consideration the total number of
> >>> pages/tuples a list contains.
> >>>
> >> I don't see why SPI would be needed to do that, i.e. why couldn't
> >> we implement such prioritization with the current approach. Another
> >> thing
> >
> > Sure, it's just a SMOC. But most of the issue here is actually a
> > query problem. I suspect that the current code would actually shrink
> > if converted to SPI. In any case, I'm not wed to that idea.
> >
>
> I disagree this is a "query problem" - it certainly is not the case that
> simply prioritizing the tables differently will make a difference. Or
> more precisely, it certainly does not solve the autovacuum issues I'm
> thinking about. I have no idea which issues are you trying to solve,
> because you haven't really described those.
>
> >> is I really doubt prioritizing "important tables" is a good solution,
> >> as it does not really guarantee anything.
> >
> > If by "important" you mean small tables with high update rates,
> > prioritizing those actually would help as long as you have free workers.
> > By itself it doesn't gain all that much though.
> >
>
> Which is why I mentioned we could have separate pools of autovacuum
> workers - one for regular tables, one for "important" ones.
>
> >>> A more fine-grained approach would be to have workers make a new
> >>> selection after every vacuum they complete. That would provide
> >>> the ultimate in control, since you'd be able to see exactly what
> >>> all the other workers are doing.
> >> That was proposed earlier in this thread, and the issue is it may
> >> starve all the other tables when the "important" tables need
> >> cleanup all the time.
> >
> > There's plenty of other ways to shoot yourself in the foot in that
> > regard already. We can always have safeguards in place if we get too
> > close to wrap-around, just like we currently do.
>
> I haven't mentioned wraparound at all.
>
> My point is that if you entirely ignore some tables because "important"
> ones need constant cleanup (saturating the total autovacuum capacity),
> then you'll end up with extreme bloat in those other tables. And 

Re: [PATCH] Allow multiple recursive self-references

2022-01-14 Thread Peter Eisentraut

On 11.01.22 12:33, Denis Hirn wrote:

I have been studying this a bit more.  I don't understand your argument here.
Why would this query have different semantics than, say

WITH RECURSIVE t(n) AS (
 VALUES (1)
   UNION ALL
 VALUES (2)
   UNION ALL
 SELECT n+1 FROM t WHERE n < 100
) SELECT * FROM t LIMIT 100;

The order of UNION branches shouldn't be semantically relevant.


WITH RECURSIVE (ab)uses the (typically associative and commutative) UNION [ALL] 
clause,
and fundamentally changes the semantics – associativity and commutativity no 
longer apply.
I think your confusion stems from this ambiguity.


The language in the SQL standard does not support this statement.  There 
is nothing in there that says that certain branches of the UNION in a 
recursive query mean certain things.  In fact, it doesn't even require 
the query to contain a UNION at all.  It just says to iterate on 
evaluating the query until a fixed point is reached.  I think this 
supports my claim that the associativity and commutativity of a UNION in 
a recursive query still apply.


This is all very complicated, so I don't claim this to be authoritative, 
but I just don't see anything in the spec that supports what you are saying.





Re: Partial aggregates pushdown

2022-01-14 Thread Julien Rouhaud
Hi,

On Mon, Nov 15, 2021 at 04:01:51PM +0300, Alexander Pyhalov wrote:
> 
> I've updated patch - removed catversion dump.

This version of the patchset doesn't apply anymore:

http://cfbot.cputube.org/patch_36_3369.log
=== Applying patches on top of PostgreSQL commit ID 
025b920a3d45fed441a0a58fdcdf05b321b1eead ===
=== applying patch ./0001-Partial-aggregates-push-down-v07.patch
patching file src/bin/pg_dump/pg_dump.c
Hunk #1 succeeded at 13111 (offset -965 lines).
Hunk #2 FAILED at 14167.
Hunk #3 succeeded at 13228 (offset -961 lines).
Hunk #4 succeeded at 13319 (offset -966 lines).
1 out of 4 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.c.rej

Could you send a rebased version?  In the meantime I will switch the cf entry
to Waiting on Author.




Re: Windows: Wrong error message at connection termination

2022-01-14 Thread Sergey Shinderuk

On 14.01.2022 13:01, Sergey Shinderuk wrote:
When the timeout expires, the server sends the error message and 
gracefully closes the connection by sending a FIN.  Later, psql sends 
another query to the server, and the server responds with a RST.  But 
now recv() returns WSAECONNABORTED(10053) instead of WSAECONNRESET(10054).


On the other hand, I cannot reproduce this behavior with a remote server 
even if I pause psql just before the recv() call to let the RST win the race.


So I get:

postgres=# set idle_session_timeout = '1s';
recv() returned 15 errno 0
SET
recv() returned -1 errno 10035 (WSAEWOULDBLOCK)
postgres=# select 1;
recv() returned 116 errno 0
recv() returned 0 errno 0
recv() returned 0 errno 0
FATAL:  terminating connection due to idle-session timeout
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

recv() signals EOF like on Unix.

Here I connected from a Windows virtual machine to the macOS host, but 
the Wireshark dump looks the same (there is a RST) as for a localhost 
connection.


Is this "error-eating" behavior of RST on Windows specific only to 
localhost connections?


--
Sergey Shinderuk    https://postgrespro.com/




Re: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

2022-01-14 Thread Julien Rouhaud
Hi,

On Thu, Sep 09, 2021 at 02:12:08AM +, houzj.f...@fujitsu.com wrote:
> 
> Attach new version patch set which remove the workaround patch.

This version of the patchset doesn't apply anymore:

http://cfbot.cputube.org/patch_36_3143.log
=== Applying patches on top of PostgreSQL commit ID 
a18b6d2dc288dfa6e7905ede1d4462edd6a8af47 ===
=== applying patch ./v19-0001-CREATE-ALTER-TABLE-PARALLEL-DML.patch
[...]
patching file src/backend/commands/tablecmds.c
Hunk #1 FAILED at 40.
Hunk #2 succeeded at 624 (offset 21 lines).
Hunk #3 succeeded at 670 (offset 21 lines).
Hunk #4 succeeded at 947 (offset 19 lines).
Hunk #5 succeeded at 991 (offset 19 lines).
Hunk #6 succeeded at 4256 (offset 40 lines).
Hunk #7 succeeded at 4807 (offset 40 lines).
Hunk #8 succeeded at 5217 (offset 40 lines).
Hunk #9 succeeded at 6193 (offset 42 lines).
Hunk #10 succeeded at 19278 (offset 465 lines).
1 out of 10 hunks FAILED -- saving rejects to file 
src/backend/commands/tablecmds.c.rej
[...]
patching file src/bin/pg_dump/pg_dump.c
Hunk #1 FAILED at 6253.
Hunk #2 FAILED at 6358.
Hunk #3 FAILED at 6450.
Hunk #4 FAILED at 6503.
Hunk #5 FAILED at 6556.
Hunk #6 FAILED at 6609.
Hunk #7 FAILED at 6660.
Hunk #8 FAILED at 6708.
Hunk #9 FAILED at 6756.
Hunk #10 FAILED at 6803.
Hunk #11 FAILED at 6872.
Hunk #12 FAILED at 6927.
Hunk #13 succeeded at 15524 (offset -1031 lines).
12 out of 13 hunks FAILED -- saving rejects to file 
src/bin/pg_dump/pg_dump.c.rej
[...]
patching file src/bin/psql/describe.c
Hunk #1 succeeded at 1479 (offset -177 lines).
Hunk #2 succeeded at 1493 (offset -177 lines).
Hunk #3 succeeded at 1631 (offset -241 lines).
Hunk #4 succeeded at 3374 (offset -277 lines).
Hunk #5 succeeded at 3731 (offset -310 lines).
Hunk #6 FAILED at 4109.
1 out of 6 hunks FAILED -- saving rejects to file src/bin/psql/describe.c.rej

Could you send a rebased version?  In the meantime I will switch the entry to
Waiting on Author.




Re: Skipping logical replication transactions on subscriber side

2022-01-14 Thread vignesh C
On Fri, Jan 14, 2022 at 7:49 AM Masahiko Sawada  wrote:
>
> On Wed, Jan 12, 2022 at 11:10 PM vignesh C  wrote:
> >
> > On Wed, Jan 12, 2022 at 11:32 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Wed, Jan 12, 2022 at 12:21 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Wed, Jan 12, 2022 at 5:49 AM Masahiko Sawada  
> > > > wrote:
> > > > >
> > > > > On Tue, Jan 11, 2022 at 7:08 PM Amit Kapila  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Jan 11, 2022 at 1:51 PM Masahiko Sawada 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On second thought, the same is true for other cases, for example,
> > > > > > > preparing the transaction and clearing skip_xid while handling a
> > > > > > > prepare message. That is, currently we don't clear skip_xid while
> > > > > > > handling a prepare message but do that while handling 
> > > > > > > commit/rollback
> > > > > > > prepared message, in order to avoid the worst case. If we do both
> > > > > > > while handling a prepare message and the server crashes between 
> > > > > > > them,
> > > > > > > it ends up that skip_xid is cleared and the transaction will be
> > > > > > > resent, which is identical to the worst-case above.
> > > > > > >
> > > > > >
> > > > > > How are you thinking to update the skip xid before prepare? If we do
> > > > > > it in the same transaction then the changes in the catalog will be
> > > > > > part of the prepared xact but won't be committed. Now, say if we do 
> > > > > > it
> > > > > > after prepare, then the situation won't be the same because after
> > > > > > restart the same xact won't appear again.
> > > > >
> > > > > I was thinking to commit the catalog change first in a separate
> > > > > transaction while not updating origin LSN and then prepare an empty
> > > > > transaction while updating origin LSN.
> > > > >
> > > >
> > > > But, won't it complicate the handling if in the future we try to
> > > > enhance this API such that it skips partial changes like skipping only
> > > > for particular relation(s) or particular operations as discussed
> > > > previously in this thread?
> > >
> > > Right. I was thinking that if we accept the situation that the user
> > > has to set skip_xid again in case of the server crashes, we might be
> > > able to accept also the situation that the user has to clear skip_xid
> > > in a case of the server crashes. But it seems the former is less
> > > problematic.
> > >
> > > I've attached an updated patch that incorporated all comments I got so 
> > > far.
> >
> > Thanks for the updated patch, few comments:
>
> Thank you for the comments!
>
> > 1) Currently skip xid is not displayed in describe subscriptions, can
> > we include it too:
> > \dRs+  sub1
> > List of 
> > subscriptions
> >  Name |  Owner  | Enabled | Publication | Binary | Streaming | Two
> > phase commit | Synchronous commit |Conninfo
> > --+-+-+-++---+--++
> >  sub1 | vignesh | t   | {pub1}  | f  | f | e
> >  | off| dbname=postgres host=localhost
> > (1 row)
> >
> > 2) This import "use PostgreSQL::Test::Utils;" is not required:
> > +# Tests for skipping logical replication transactions.
> > +use strict;
> > +use warnings;
> > +use PostgreSQL::Test::Cluster;
> > +use PostgreSQL::Test::Utils;
> > +use Test::More tests => 6;
> >
> > 3) Some of the comments use a punctuation mark and some of them do
> > not; should we keep it consistent:
> > +# Wait for worker error
> > +$node_subscriber->poll_query_until(
> > +   'postgres',
> >
> > +# Set skip xid
> > +$node_subscriber->safe_psql(
> > +   'postgres',
> >
> > +# Create publisher node.
> > +my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
> > +$node_publisher->init(allows_streaming => 'logical');
> >
> >
> > +# Create subscriber node.
> > +my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
> >
> > 4) Should this be changed:
> > + * True if we are skipping all data modification changes (INSERT,
> > UPDATE, etc.) of
> > + * the specified transaction at MySubscription->skipxid.  Once we
> > start skipping
> > + * changes, we don't stop it until the we skip all changes of the
> > transaction even
> > + * if pg_subscription is updated that and MySubscription->skipxid
> > gets changed or
> > to:
> > + * True if we are skipping all data modification changes (INSERT,
> > UPDATE, etc.) of
> > + * the specified transaction at MySubscription->skipxid.  Once we
> > start skipping
> > + * changes, we don't stop it until we skip all changes of the transaction 
> > even
> > + * if pg_subscription is updated that and MySubscription->skipxid
> > gets changed or
> >
> > In "stop it until the we skip all changes", here the is not required.
> >
>
> I agree with all the comments above. I've attached an updated patch.

Thanks 

Re: Logical replication timeout problem

2022-01-14 Thread Amit Kapila
On Fri, Jan 14, 2022 at 3:47 PM Fabrice Chapuis  wrote:
>
> If I can follow you, I have to make the following changes:
>

No, not like that, but we can try that way as well to see if it helps
to avoid your problem. Am I understanding correctly that even after the
modification you are still seeing the problem? Can you try calling
WalSndKeepaliveIfNecessary() instead of WalSndKeepalive()?

-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2022-01-14 Thread Amit Kapila
On Wed, Jan 12, 2022 at 2:40 AM Justin Pryzby  wrote:
>
> Is there any coordination between the "column filter" patch and the "row
> filter" patch ?  Are they both on track for PG15 ?  Has anybody run them
> together ?
>

The few things where I think we might need to define some common
behavior are as follows:

1. Replica Identity handling: Currently the column filter patch gives
an error during create/alter subscription if the specified column list
is invalid (Replica Identity columns are missing). It also gives an
error if the user tries to change the replica identity. However, it
doesn't deal with cases where the user drops and adds a different
primary key that has a different set of columns which can lead to
failure during apply on the subscriber.

I think another issue w.r.t column filter patch is that even while
creating publication (even for 'insert' publications) it should check
that all primary key columns must be part of published columns,
otherwise, it can fail while applying on subscriber as it will try to
insert NULL for the primary key column.

2. Handling of partitioned tables vs. Replica Identity (RI): When
adding a partitioned table with a column list to the publication (with
publish_via_partition_root = false), we should check the Replica
Identity of all its leaf partitions, as the RI on the partition is the
one that actually takes effect when publishing DML changes. We need to
check RI while attaching the partition as well, as the newly added
partitions will automatically become part of publication if the
partitioned table is part of the publication. If we don't do this the
later deletes/updates can fail.

All these cases are dealt with in the row filter patch because of the
on-the-fly check, which means we validate the columns used in row
filters at the time of the actual update/delete operation via
CheckCmdReplicaIdentity and cache the result for future use.
This is in line with the existing checks of RI vs. operations on tables.
The primary reason for this was that we didn't want to handle validation
of row filters in so many places.
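
For reference, the existing RI-vs-operation check this relies on is visible
at the SQL level; a rough sketch (object names are made up, error text
paraphrased):

create table t1 (a int, b int);
create publication pub_upd for table t1 with (publish = 'update, delete');
alter table t1 replica identity nothing;
-- the UPDATE below is rejected up front by CheckCmdReplicaIdentity, roughly:
-- "cannot update table "t1" because it does not have a replica identity
--  and publishes updates"
update t1 set b = 1;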

3. Tablesync.c handling: Ideally, it would be good if we have a single
query to fetch both row filters and column filters but even if that is
not possible in the first version, the behavior should be same for
both queries w.r.t partitioned tables, For ALL Tables and For All
Tables In Schema cases.

Currently, the column filter patch doesn't seem to respect For ALL
Tables and For All Tables In Schema cases, basically, it just copies
the columns it finds through some of the publications even if one of
the publications is defined as For All Tables. The row filter patch
ignores the row filters if one of the publications is defined as For
ALL Tables and For All Tables In Schema.

For the row filter patch, if the publication contains a partitioned table,
the publication parameter publish_via_partition_root determines whether it
uses the partition's row filter (if the parameter is false, the default)
or the root partitioned table's row filter, and this is taken care of
even during the initial tablesync.

For the column filter patch, if the publication contains a partitioned
table, it seems to gather all the columns specified for the tables in its
partition tree across the publications, whether
publish_via_partition_root is true or false.

We have done some testing w.r.t above cases with both patches and my
colleague will share the results.

-- 
With Regards,
Amit Kapila.




Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-14 Thread Maxim Orlov
The code seems to be in good condition. All the tests are running ok with
no errors.

I like the whole idea of shifting additional checkpointer jobs as much
as possible to another worker. In my view, it is more appropriate to call
this worker "bg cleaner" or "bg file cleaner" or smth.

It could be useful for systems with high load, which may deal with deleting
many files at once, but I'm not sure about "small" installations. Extra bg
worker need more resources to do occasional deletion of small amounts of
files. I really do not know how to do it better, maybe to have two
different code paths switched by GUC?

Should we also think about adding WAL preallocation into custodian worker
from the patch "Pre-alocationg WAL files" [1] ?

[1]
https://www.postgresql.org/message-id/flat/20201225200953.jjkrytlrzojbn...@alap3.anarazel.de
-- 
Best regards,
Maxim Orlov.


Re: A test for replay of regression tests

2022-01-14 Thread Thomas Munro
On Wed, Dec 22, 2021 at 11:41 AM Thomas Munro  wrote:
> Rebased and updated based on feedback.  Responses to multiple emails below:

Pushed, but the build farm doesn't like it with a couple of different
ways of failing.  I'll collect some results and revert shortly.




Re: Schema variables - new implementation for Postgres 15

2022-01-14 Thread Pavel Stehule
On Fri, Jan 14, 2022 at 11:49, Marcos Pegoraro 
wrote:

> For example, if I define a variable called "relkind", then psql's \sv
>> meta-command is broken because the query it performs can't distinguish
>> between the column and the variable.
>>
>> If variables use : as prefix you'll never have these conflicts.
>
> select relkind from pg_class where relkind = :relkind
>

This syntax is used for client side variables already.

This is similar to the MSSQL or MySQL philosophy. But the disadvantage of
this method is the impossibility of modularization - all variables are in
one space (although there are nested scopes).

A different syntax would indeed avoid any collision, but it is far from the
more usual standard in this area. And if we introduce special syntax, then
there is no way back. We cannot use :varname - this syntax is used already -
but we could, theoretically, use @var or $var. Personally, though, I don't
want to use it if it is possible to do without it. The special syntax could
maybe be used for direct access to function arguments, or for non-persistent
(temporary) session variables like MSSQL's. There is a relatively big space
of functionality for session variables, and the system that I used is based
on ANSI SQL/PSM and DB2 and is close to Oracle. It has a lot of advantages
for writing stored procedures. On the other hand, for ad hoc work, session
variables like MySQL's (without declaration) can be handy, so I don't want
to use (and block) syntax that could be used for something different.






Re: Schema variables - new implementation for Postgres 15

2022-01-14 Thread Julien Rouhaud
Hi,

On Fri, Jan 14, 2022 at 07:49:09AM -0300, Marcos Pegoraro wrote:
> >
> > For example, if I define a variable called "relkind", then psql's \sv
> > meta-command is broken because the query it performs can't distinguish
> > between the column and the variable.
> >
> If variables use : as prefix you'll never have these conflicts.
> 
> select relkind from pg_class where relkind = :relkind

This is already used by psql client side variables, so this is not an option.
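
For reference, psql interpolates that syntax on the client side before the
query ever reaches the server; a quick illustration (the variable name is
arbitrary):

-- client-side: psql substitutes the value before sending the query
\set relkind r
SELECT count(*) FROM pg_class WHERE relkind = :'relkind';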




Re: Schema variables - new implementation for Postgres 15

2022-01-14 Thread Marcos Pegoraro
>
> For example, if I define a variable called "relkind", then psql's \sv
> meta-command is broken because the query it performs can't distinguish
> between the column and the variable.
>
> If variables use : as prefix you'll never have these conflicts.

select relkind from pg_class where relkind = :relkind


Re: Printing backtrace of postgres processes

2022-01-14 Thread Bharath Rupireddy
On Sat, Nov 20, 2021 at 11:50 AM Bharath Rupireddy
 wrote:
>
> On Fri, Nov 19, 2021 at 4:07 PM vignesh C  wrote:
> > The Attached v15 patch has the fixes for the same.
>
> Thanks. The v15 patch LGTM and the cf bot is happy hence marking it as RfC.

The patch was not applying because of the recent commit [1]. I took
this opportunity and tried a bunch of things without modifying the
core logic of the pg_log_backtrace feature that Vignesh has worked on.

I did the following: moved the duplicate code to a new function
CheckPostgresProcessId which can be used by pg_log_backtrace,
pg_log_memory_contexts, pg_signal_backend and pg_log_query_plan ([2]);
modified the code comments, docs and tests to be more in sync with the
commit [1]; and moved two of the ProcessLogBacktraceInterrupt calls
(archiver and wal writer) to their respective interrupt handlers. Here's
the v16 version that I've come up with.

[1]
commit 790fbda902093c71ae47bff1414799cd716abb80
Author: Fujii Masao 
Date:   Tue Jan 11 23:19:59 2022 +0900

Enhance pg_log_backend_memory_contexts() for auxiliary processes.

[2] 
https://www.postgresql.org/message-id/flat/20220114063846.7jygvulyu6g23kdv%40jrouhaud

Regards,
Bharath Rupireddy.


v16-0001-Add-function-to-log-the-backtrace-of-the-specifi.patch
Description: Binary data


Re: support for MERGE

2022-01-14 Thread Erik Rijkers




Op 13-01-2022 om 13:43 schreef Alvaro Herrera:

Apologies, there was a merge failure and I failed to notice.  Here's the
correct patch.

(This is also in github.com/alvherre/postgres/tree/merge-15)



[v6-0001-MERGE-SQL-Command-following-SQL-2016]


I read through the MERGE docs; some typos:

'For example, given MERGE foo AS f'
'For example, given MERGE INTO foo AS f'

'that clause clause'
'that clause'

'This is same as'
'This is the same as'

'for certain type of action'
'for certain types of action'

'The MERGE allows the user'
'MERGE allows the user'
(from the paragraph 'Read Committed Isolation Level'.  Likely 
copied from the paragraph above: 'The DELETE'; but there it refers to an 
earlier mention of DELETE.)



Erik Rijkers




Re: Logical replication timeout problem

2022-01-14 Thread Fabrice Chapuis
If I follow you correctly, I have to make the following changes:

1. In walsender.c:

static void
WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn,
					 TransactionId xid)
{
	static TimestampTz sendTime = 0;
	TimestampTz now = GetCurrentTimestamp();

	/* Keep the worker process alive */
	WalSndKeepalive(true);

	/*
	 * Track lag no more than once per WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS to
	 * avoid flooding the lag tracker when we commit frequently.
	 */
#define WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS 1000
	if (!TimestampDifferenceExceeds(sendTime, now,
									WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS))
		return;

	LagTrackerWrite(lsn, now);
	sendTime = now;
}

I set the *requestReply* parameter to true, is that correct?

2. In pgoutput.c

/*
 * Sends the decoded DML over wire.
 *
 * This is called both in streaming and non-streaming modes.
 */
static void
pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
				Relation relation, ReorderBufferChange *change)
{
	PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
	MemoryContext old;
	RelationSyncEntry *relentry;
	TransactionId xid = InvalidTransactionId;
	Relation	ancestor = NULL;

	WalSndUpdateProgress(ctx, txn->origin_lsn, change->txn->xid);

	if (!is_publishable_relation(relation))
		return;
...

Make a call to *WalSndUpdateProgress* in the function *pgoutput_change*.

For info: the information in the log after reproducing the problem.

2022-01-13 11:19:46.340 CET [82233] LOCATION:  WalSndKeepaliveIfNecessary,
walsender.c:3389
2022-01-13 11:19:46.340 CET [82233] STATEMENT:  START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2022-01-13 11:19:46.340 CET [82233] LOG:  0: attempt to send keep alive
message
2022-01-13 11:19:46.340 CET [82233] LOCATION:  WalSndKeepaliveIfNecessary,
walsender.c:3389
2022-01-13 11:19:46.340 CET [82233] STATEMENT:  START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2022-01-13 11:19:46.340 CET [82233] LOG:  0: attempt to send keep alive
message
2022-01-13 11:19:46.340 CET [82233] LOCATION:  WalSndKeepaliveIfNecessary,
walsender.c:3389
2022-01-13 11:19:46.340 CET [82233] STATEMENT:  START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2022-01-13 11:20:46.418 CET [82232] ERROR:  XX000: terminating logical
replication worker due to timeout
2022-01-13 11:20:46.418 CET [82232] LOCATION:  LogicalRepApplyLoop,
worker.c:1267
2022-01-13 11:20:46.421 CET [82224] LOG:  0: worker process: logical
replication worker for subscription 26994 (PID 82232) exited with exit code
1
2022-01-13 11:20:46.421 CET [82224] LOCATION:  LogChildExit,
postmaster.c:3625

Thanks a lot for your help.

Fabrice

On Thu, Jan 13, 2022 at 2:59 PM Amit Kapila  wrote:

> On Thu, Jan 13, 2022 at 3:43 PM Fabrice Chapuis 
> wrote:
> >
> > first phase: postgres read WAL files and generate 1420 snap files.
> > second phase: I guess, but on this point maybe you can clarify, postgres
> has to decode the snap files and remove them if no statement must be
> applied on a replicated table.
> > It is from this point that the worker process exit after 1 minute
> timeout.
> >
>
> Okay, I think the problem could be that because we are skipping all
> the changes of transaction there is no communication sent to the
> subscriber and it eventually timed out. Actually, we try to send
> keep-alive at transaction boundaries like when we call
> pgoutput_commit_txn. The pgoutput_commit_txn will call
> OutputPluginWrite->WalSndWriteData. I think to tackle the problem we
> need to try to send such keepalives via WalSndUpdateProgress and
> invoke that in pgoutput_change when we skip sending the change.
>
> --
> With Regards,
> Amit Kapila.
>


Undocumented error

2022-01-14 Thread Petar Dambovaliev
Hello,

I am using libpq to consume a replication slot.
Very rarely, I get a very strange error that I can't find any
information on.
It is not mentioned in any documentation and I don't know under what
conditions it triggers.
Any help is appreciated.

The function I am calling is `PQgetCopyData`;
the error code is `-1` and the error text is `invalid ordering of
speculative insertion changes`.
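
For reference, the consumption loop looks roughly like this (a simplified
sketch, not the exact application code; process_message() is only a
placeholder for the real handler):

#include <stdio.h>
#include <libpq-fe.h>

static void
process_message(const char *buf, int len)
{
	/* placeholder: a real consumer decodes the 'w' (XLogData) and
	 * 'k' (keepalive) messages here */
	(void) buf;
	(void) len;
}

/* Drain the stream after START_REPLICATION ... LOGICAL has put the
 * connection into COPY BOTH mode. */
static void
consume_stream(PGconn *conn)
{
	char	   *buf = NULL;
	int			len;

	while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
	{
		process_message(buf, len);
		PQfreemem(buf);
	}

	if (len == -2)
		fprintf(stderr, "copy failed: %s", PQerrorMessage(conn));
	else if (len == -1)
	{
		/* The copy stream ended; the server's final status (including
		 * ERRORs such as the one quoted above) is reported through
		 * PQgetResult().  A real consumer keeps calling PQgetResult()
		 * until it returns NULL. */
		PGresult   *res = PQgetResult(conn);

		if (PQresultStatus(res) != PGRES_COMMAND_OK)
			fprintf(stderr, "replication ended with: %s", PQerrorMessage(conn));
		PQclear(res);
	}
}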


Re: [PATCH] New default role allowing to change per-role/database settings

2022-01-14 Thread Julien Rouhaud
Hi,

On Wed, Sep 08, 2021 at 07:38:25AM +, shinya11.k...@nttdata.com wrote:
> >Thanks for letting me know, I've attached a rebased v4 of this patch, no 
> >other
> >changes.
> I tried it, but when I used set command, tab completion did not work properly 
> and an error occurred.

Also, the patch doesn't apply anymore:

http://cfbot.cputube.org/patch_36_2918.log

=== Applying patches on top of PostgreSQL commit ID 
5513dc6a304d8bda114004a3b906cc6fde5d6274 ===
=== applying patch 
./v4-0001-Add-new-PGC_ADMINSET-guc-context-and-pg_change_ro.patch
[...]
1 out of 23 hunks FAILED -- saving rejects to file 
src/backend/utils/misc/guc.c.rej

I'm switching the patch to Waiting on Author.




Re: Windows: Wrong error message at connection termination

2022-01-14 Thread Sergey Shinderuk

On 02.12.2021 22:31, Tom Lane wrote:

I'll push the close-only change in a little bit.


Unexpectedly, this changes the error message:

postgres=# set idle_session_timeout = '1s';
SET
postgres=# select 1;
	could not receive data from server: Software caused connection abort 
(0x2745/10053)

The connection to the server was lost. Succeeded.
postgres=#

Without shutdown/closesocket it would most likely be:

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

When the timeout expires, the server sends the error message and 
gracefully closes the connection by sending a FIN.  Later, psql sends 
another query to the server, and the server responds with a RST.  But 
now recv() returns WSAECONNABORTED(10053) instead of WSAECONNRESET(10054).


Without shutdown/closesocket, after the timeout expires, the server 
sends the error message, the client sends an ACK, and the server 
responds with a RST.  Then psql tries to send the next query, but 
nothing is sent at the TCP level, and the next recv() returns WSAECONNRESET.


IIUIC, in both cases we may or may not recv() the error message from the 
server depending on how fast the RST arrives from the server.


Should we handle ECONNABORTED similarly to ECONNRESET in pqsecure_raw_read?
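
If so, a minimal sketch of what that could look like (paraphrasing the
existing switch in pqsecure_raw_read(), not quoting the exact code):

			case EPIPE:
			case ECONNRESET:
			case ECONNABORTED:	/* WSAECONNABORTED on Windows */
				appendPQExpBufferStr(&conn->errorMessage,
									 libpq_gettext("server closed the connection unexpectedly\n"
												   "\tThis probably means the server terminated abnormally\n"
												   "\tbefore or while processing the request.\n"));
				break;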

--
Sergey Shinderuk    https://postgrespro.com/




Re: Multi-Column List Partitioning

2022-01-14 Thread Julien Rouhaud
Hi,

The cfbot reports some clang warning on the last version of the patchset:

https://cirrus-ci.com/task/6721617647632384

[16:35:24.444] partprune.c:2775:8: error: explicitly assigning value of 
variable of type 'int' to itself [-Werror,-Wself-assign]
[16:35:24.444] off = off;
[16:35:24.444]

A quick look at the patch seems to indicate that it's harmless dead code that
should be removed, and there are no other failures reported by the cfbot, but I
definitely didn't read the patch carefully so a confirmation (and an updated
patch) would be welcome.

However I see that Amul raised some concerns, so I will change the patch status
to Waiting on Author.  Feel free to switch it back to Needs Review if you think
it's more appropriate.




Re: MultiXact\SLRU buffers configuration

2022-01-14 Thread Julien Rouhaud
Hi,
On Sun, Dec 26, 2021 at 03:09:59PM +0500, Andrey Borodin wrote:
> 
> 
> PFA rebase of the patchset. Also I've added a patch to combine page_number, 
> page_status, and page_dirty together to touch less cachelines.

The cfbot reports some errors on the latest version of the patch:

https://cirrus-ci.com/task/6121317215764480
[04:56:38.432] su postgres -c "make -s -j${BUILD_JOBS} world-bin"
[04:56:48.270] In file included from async.c:134:
[04:56:48.270] ../../../src/include/access/slru.h:47:28: error: expected 
identifier or ‘(’ before ‘:’ token
[04:56:48.270]47 | typedef enum SlruPageStatus:int16_t
[04:56:48.270]   |^
[04:56:48.270] ../../../src/include/access/slru.h:53:3: warning: data 
definition has no type or storage class
[04:56:48.270]53 | } SlruPageStatus;
[04:56:48.270]   |   ^~
[04:56:48.270] ../../../src/include/access/slru.h:53:3: warning: type defaults 
to ‘int’ in declaration of ‘SlruPageStatus’ [-Wimplicit-int]
[04:56:48.270] ../../../src/include/access/slru.h:58:2: error: expected 
specifier-qualifier-list before ‘SlruPageStatus’
[04:56:48.270]58 |  SlruPageStatus page_status;
[04:56:48.270]   |  ^~
[04:56:48.270] async.c: In function ‘asyncQueueAddEntries’:
[04:56:48.270] async.c:1448:41: error: ‘SlruPageEntry’ has no member named 
‘page_dirty’
[04:56:48.270]  1448 |  NotifyCtl->shared->page_entries[slotno].page_dirty = 
true;
[04:56:48.270]   | ^
[04:56:48.271] make[3]: *** [: async.o] Error 1
[04:56:48.271] make[3]: *** Waiting for unfinished jobs
[04:56:48.297] make[2]: *** [common.mk:39: commands-recursive] Error 2
[04:56:48.297] make[2]: *** Waiting for unfinished jobs
[04:56:54.554] In file included from clog.c:36:
[04:56:54.554] ../../../../src/include/access/slru.h:47:28: error: expected 
identifier or ‘(’ before ‘:’ token
[04:56:54.554]47 | typedef enum SlruPageStatus:int16_t
[04:56:54.554]   |^
[04:56:54.554] ../../../../src/include/access/slru.h:53:3: warning: data 
definition has no type or storage class
[04:56:54.554]53 | } SlruPageStatus;
[04:56:54.554]   |   ^~
[04:56:54.554] ../../../../src/include/access/slru.h:53:3: warning: type 
defaults to ‘int’ in declaration of ‘SlruPageStatus’ [-Wimplicit-int]
[04:56:54.554] ../../../../src/include/access/slru.h:58:2: error: expected 
specifier-qualifier-list before ‘SlruPageStatus’
[04:56:54.554]58 |  SlruPageStatus page_status;
[04:56:54.554]   |  ^~
[04:56:54.554] clog.c: In function ‘TransactionIdSetPageStatusInternal’:
[04:56:54.554] clog.c:396:39: error: ‘SlruPageEntry’ has no member named 
‘page_dirty’
[04:56:54.554]   396 |  XactCtl->shared->page_entries[slotno].page_dirty = true;
[04:56:54.554]   |   ^
[04:56:54.554] In file included from ../../../../src/include/postgres.h:46,
[04:56:54.554]  from clog.c:33:
[04:56:54.554] clog.c: In function ‘BootStrapCLOG’:
[04:56:54.554] clog.c:716:47: error: ‘SlruPageEntry’ has no member named 
‘page_dirty’
[04:56:54.554]   716 |  
Assert(!XactCtl->shared->page_entries[slotno].page_dirty);
[04:56:54.554]   |   ^
[04:56:54.554] ../../../../src/include/c.h:848:9: note: in definition of macro 
‘Assert’
[04:56:54.554]   848 |   if (!(condition)) \
[04:56:54.554]   | ^
[04:56:54.554] clog.c: In function ‘TrimCLOG’:
[04:56:54.554] clog.c:801:40: error: ‘SlruPageEntry’ has no member named 
‘page_dirty’
[04:56:54.554]   801 |   XactCtl->shared->page_entries[slotno].page_dirty = 
true;
[04:56:54.554]   |^
[04:56:54.554] In file included from ../../../../src/include/postgres.h:46,
[04:56:54.554]  from clog.c:33:
[04:56:54.554] clog.c: In function ‘clog_redo’:
[04:56:54.554] clog.c:997:48: error: ‘SlruPageEntry’ has no member named 
‘page_dirty’
[04:56:54.554]   997 |   
Assert(!XactCtl->shared->page_entries[slotno].page_dirty);
[04:56:54.554]   |^
[04:56:54.554] ../../../../src/include/c.h:848:9: note: in definition of macro 
‘Assert’
[04:56:54.554]   848 |   if (!(condition)) \
[04:56:54.554]   | ^
[04:56:54.555] make[4]: *** [: clog.o] Error 1
[04:56:54.555] make[3]: *** [../../../src/backend/common.mk:39: 
transam-recursive] Error 2
[04:56:54.555] make[3]: *** Waiting for unfinished jobs
[04:56:56.405] make[2]: *** [common.mk:39: access-recursive] Error 2
[04:56:56.405] make[1]: *** [Makefile:42: all-backend-recurse] Error 2
[04:56:56.405] make: *** [GNUmakefile:21: world-bin-src-recurse] Error 2
[04:56:56.407]
[04:56:56.407] Exit status: 2

Could you send a new version?  In the meantime I will switch the patch status
to Waiting on Author.




Re: support for MERGE

2022-01-14 Thread Erik Rijkers

Op 13-01-2022 om 13:43 schreef Alvaro Herrera:

Apologies, there was a merge failure and I failed to notice.  Here's the
correct patch.

(This is also in github.com/alvherre/postgres/tree/merge-15)



[20220113/v6-0001-MERGE-SQL-Command-following-SQL-2016.patch]


Good morning,


I got into this crash (may be the same as Jaime's):

#-
# bash

t1=table1
t2=table2

psql -qX << SQL
drop table if exists $t1 cascade;
drop table if exists $t2 cascade;
create table $t1 as select /*cast(id::text as jsonb) js,*/ id from 
generate_series(1,20) as f(id);
create table $t2 as select /*cast(id::text as jsonb) js,*/ id from 
generate_series(1,20) as f(id);

delete from $t1 where id % 2 = 1;
delete from $t2 where id % 2 = 0;
(   select 't1 - target', count(*) t1 from $t1
  union all select 't2 - source', count(*) t2 from $t2 ) order by 1;

merge into $t1 as t1 using $t2 as t2 on t1.id = t2.id
when not matched and (t2.id > 10) and (t1.id > 10)
 then  do nothing
when not matched then  insert values ( id )
when matched then  do nothing ;

(   select 't1 - target', count(*) t1 from $t1
  union all select 't2 - source', count(*) t2 from $t2 ) order by 1 ;

SQL
#-

Referencing the alias 't1' in the WHEN NOT MATCHED member seems to be what 
triggers his crash: when I remove the phrase 'and (t1.id > 10)', the 
statement finishes correctly.



And I don't know if it is related but if I use this phrase:

when not matched and (id > 10)

I get:

ERROR:  column "id" does not exist
LINE 2: when not matched and id > 0 -- (t2.id > 10) and (t1.id > 10)
 ^
HINT:  There is a column named "id" in table "t1", but it cannot be 
referenced from this part of the query.


Is that hint correct? Seems a bit strange.


Thanks,

Erik Rijkers




Re: Schema variables - new implementation for Postgres 15

2022-01-14 Thread Pavel Stehule
On Fri, Jan 14, 2022 at 3:44, Julien Rouhaud  wrote:

> Hi,
>
> On Thu, Jan 13, 2022 at 07:32:26PM +0100, Pavel Stehule wrote:
> > čt 13. 1. 2022 v 19:23 odesílatel Dean Rasheed  >
> > napsal:
> >
> > > On Thu, 13 Jan 2022 at 17:42, Pavel Stehule 
> > > wrote:
> > > >
> > > > I like the idea of prioritizing tables over variables with warnings
> when
> > > collision is detected. It cannot break anything. And it allows to using
> > > short identifiers when there is not collision.
> > >
> > > Yeah, that seems OK, as long as it's clearly documented. I don't think
> > > a warning is necessary.
>
> What should be the behavior for a cached plan that uses a variable when a
> conflicting relation is later created?  I think that it should be the same
> as a
> search_path change and the plan should be discarded.
>

This is a more generic problem - creating a new DDL object doesn't
invalidate plans.

https://www.postgresql.org/message-id/2589876.1641914327%40sss.pgh.pa.us



>
> > The warning can be disabled by default, but I think it should be there.
> > This is a signal, so some in the database schema should be renamed.
> Maybe -
> > session_variables_ambiguity_warning.
>
> I agree that having a way to know that a variable has been bypassed can be
> useful.
>
> > > (FWIW, testing with dbfiddle, that appears to match Db2's behaviour).
> > >
> >
> > Thank you for check
>
> Do you know what's oracle's behavior on that?
>
>
Oracle is very different, because package variables are not visible from
plain SQL, and a change of the interface invalidates dependent objects and
requires recompilation, so it is a little bit more sensitive. If I remember
well, SQL identifiers have higher priority than PL/SQL identifiers (package
variables), so the proposed behavior is very similar to Oracle's behavior
too. The risk of unwanted collisions is reduced (on Oracle) by the local
visibility of package variables, and by package variables being available
only in some environments.



>
> I've been looking at the various dependency handling, and I noticed that
> collation are ignored, while they're accepted syntax-wise:
>
> =# create collation mycollation (locale = 'fr-FR', provider = 'icu');
> CREATE COLLATION
>
> =# create variable myvariable text collate mycollation;
> CREATE VARIABLE
>
> =# select classid::regclass, objid, objsubid, refclassid::regclass,
> refobjid, refobjsubid from pg_depend where classid::regclass::text =
> 'pg_variable' or refclassid::regclass::text = 'pg_variable';
>classid   | objid | objsubid |  refclassid  | refobjid | refobjsubid
> -+---+--+--+--+-
>  pg_variable | 16407 |0 | pg_namespace | 2200 |   0
> (1 row)
>
> =# let myvariable = 'AA';
> LET
>
> =# select 'AA' collate "en-x-icu" < myvariable;
>  ?column?
> --
>  f
> (1 row)
>
> =# select 'AA' collate "en-x-icu" < myvariable collate mycollation;
> ERROR:  42P21: collation mismatch between explicit collations "en-x-icu"
> and "mycollation"
> LINE 1: select 'AA' collate "en-x-icu" < myvariable collate mycollat...
>
> So it's missing both dependency recording for variable's collation and also
> teaching various code that variables can have a collation.
>
> It's also missing some invalidation detection.  For instance:
>
> =# create variable myval text;
> CREATE VARIABLE
>
> =# let myval = 'pg_class';
> LET
>
> =# prepare s(text) as select relname from pg_class where relname = $1 or
> relname = myval;
> PREPARE
>
> =# set plan_cache_mode = force_generic_plan ;
> SET
>
> =# execute s ('');
>  relname
> --
>  pg_class
> (1 row)
>
> =# drop variable myval ;
> DROP VARIABLE
>
> =# create variable myval int;
> CREATE VARIABLE
>
> =# execute s ('');
> ERROR:  XX000: cache lookup failed for session variable 16408
>
> The plan should have been discarded and the new plan should fail for type
> problem.
>
> Strangely, subsequent calls don't error out:
>
> =# execute s('');
>  relname
> -
> (0 rows)
>
> But doing an explain shows that there's a problem:
>
> =# explain execute s('');
> ERROR:  XX000: cache lookup failed for variable 16408
>

Looks like a bug.

Regards

Pavel


Possible fails in pg_stat_statements test

2022-01-14 Thread Anton A. Melnikov

Hello,

There are some places in the pg_stat_statements regression test where the 
bool result of a comparison between the number of rows obtained and the 
wal_records generated by a query should be displayed.
Currently, counting the number of wal_records for a given query in 
pg_stat_statements is done by computing the difference of the global 
pgWalUsage.wal_records counter.
During query execution, extra wal_records may appear that are not 
related to the query.

There are two reasons why this might happen:
1) A page was pruned in the opportunistic pruning function 
(heap_page_prune_opt()).
2) A new page was required for a new xid in clog and 
WriteZeroPageXlogRec() was called.
In both cases an extra WAL record with zero xl_xid is generated, so the 
wal_records counter gives an incremented value for the query and the 
pg_stat_statements test will fail.


This patch introduces an additional counter of WAL records not related 
to the query being executed.
Thanks to this counter, pg_stat_statements knows the number of WAL 
records that are not relevant to the query and does not include them in 
the per-query statistics.

This removes the possibility of the error described above.

There is a way to reproduce this error when the patch is not applied:
1) start the server with "shared_preload_libraries = 'pg_stat_statements'" 
in postgresql.conf;

2) replace the makefile in contrib/pg_stat_statements with the attached one;
3) replace the test file 
contrib/pg_stat_statements/sql/pg_stat_statements.sql and the expected 
results contrib/pg_stat_statements/expected/pg_stat_statements.out

with the shorter versions from the attached files;
4) copy test.sh to contrib/pg_stat_statements and make sure that PGHOME 
points to your server;

5) cd to contrib/pg_stat_statements and execute:
export ITER=1 && while ./start.sh || break; export ITER=$(($ITER+1)); do 
:; done


Usually 100-200 iterations will be enough.
To catch the error faster, one can add the wal_records column to the SELECT
in line 26 of contrib/pg_stat_statements/sql/pg_stat_statements.sql as 
follows:

SELECT query, calls, rows, wal_records,
and replace the contrib/pg_stat_statements/expected/pg_stat_statements.out
with attached pg_stat_statements-fast.out

With best regards,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
commit 3f4659a8d8a390bb24fbc6f82a6add7949fbebe2
Author: Anton A. Melnikov 
Date:   Fri Jan 14 10:54:35 2022 +0300

Fix possible fails in pg_stat_statements test via taking into account non-query wal records.

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 082bfa8f77..bd437aefc3 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1370,7 +1370,7 @@ pgss_store(const char *query, uint64 queryId,
 		e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
 		e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
 		e->counters.usage += USAGE_EXEC(total_time);
-		e->counters.wal_records += walusage->wal_records;
+		e->counters.wal_records += (walusage->wal_records - walusage->non_query_wal_recs);
 		e->counters.wal_fpi += walusage->wal_fpi;
 		e->counters.wal_bytes += walusage->wal_bytes;
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3201fcc52b..41f17ab97c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -20,6 +20,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
+#include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
@@ -208,6 +209,11 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
 	   limited_ts, , NULL);
 
+			/* Take into account that heap_page_prune() just generated a new
+			 * wal record with zero xl_xid that is not related to current query.
+			 */
+			pgWalUsage.non_query_wal_recs++;
+
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
 			 * ndeleted minus the number of newly-LP_DEAD-set items.
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index de787c3d37..e944fc3b1a 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -38,6 +38,7 @@
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
+#include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
@@ -955,6 +956,12 @@ WriteZeroPageXlogRec(int pageno)
 	XLogBeginInsert();
 	XLogRegisterData((char *) (), sizeof(int));
 	(void) XLogInsert(RM_CLOG_ID, CLOG_ZEROPAGE);
+
+	/*
+	 * Consider that a new unrelated to current query wal record
+	 * with zero xl_xid has just been created.
+	 */
+	pgWalUsage.non_query_wal_recs++;
 }
 
 /*
diff --git 

Re: MDAM techniques and Index Skip Scan patch

2022-01-14 Thread Julien Rouhaud
Hi,

On Fri, Jan 14, 2022 at 08:55:26AM +0100, Dmitry Dolgov wrote:
> 
> FYI, I've attached this thread to the CF item as an informational one,
> but as there are some patches posted here, folks may get confused. For
> those who have landed here with no context, I feel obliged to mention
> that now there are two alternative patch series posted under the same
> CF item:
> 
> * the original one lives in [1], waiting for reviews since the last May
> * an alternative one posted here from Floris

Ah, I indeed wasn't sure which patchset(s) should actually be reviewed.
It's nice to have the alternative-approach threads linked in the commit fest,
but it seems that the cfbot will use the most recent attachments as the only
patchset, thus leaving the "original" one untested.

I'm not sure what the best approach is in such a situation.  Maybe create a
different CF entry for each alternative, and link the other CF entry in the CF
app using the "Add annotations" or "Links" feature rather than attaching
threads?




Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-14 Thread Alexander Lakhin
13.01.2022 09:36, Thomas Munro wrote:
> Here's a draft attempt at a fix.  First I tried to use recv(fd, , 1,
> MSG_PEEK) == 0 to detect EOF, which seemed to me to be a reasonable
> enough candidate, but somehow it corrupts the stream (!?), so I used
> Alexander's POLLHUP idea, except I pushed it down to a more principled
> place IMHO.  Then I suppressed it after the initial check because then
> the logic from my earlier patch takes over, so stuff like FeBeWaitSet
> doesn't suffer from extra calls, only these two paths that haven't
> been converted to long-lived WESes yet.  Does this pass the test?
Yes, this fix eliminates the flakiness for me. The test commit_ts (with
002_standby and 003_standby_2) passed 2x200 iterations.
It also makes my test postgres_fdw/001_disconnection pass reliably.
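
For anyone skimming the thread, the two detection approaches mentioned above
amount to roughly the following.  This is an illustration only (the function
names are made up, and the real patch does this inside WaitEventSetWait()
rather than on a bare socket); it also assumes a Linux/BSD-style MSG_DONTWAIT
and POSIX poll(), whereas on Windows the socket would have to be non-blocking
and WSAPoll() used instead:

#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Approach 1: peek one byte; recv() returning 0 means the peer has closed. */
static bool
peer_closed_by_peek(int sock)
{
	char		c;

	return recv(sock, &c, 1, MSG_PEEK | MSG_DONTWAIT) == 0;
}

/*
 * Approach 2: poll() with no requested events.  POLLHUP (like POLLERR and
 * POLLNVAL) cannot be masked, so it is still reported in revents once the
 * peer has hung up, without touching the data stream at all.
 */
static bool
peer_closed_by_pollhup(int sock)
{
	struct pollfd pfd;

	pfd.fd = sock;
	pfd.events = 0;
	pfd.revents = 0;

	return poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLHUP) != 0;
}

The first variant is what reportedly corrupted the stream on Windows; the
second matches the POLLHUP idea that the draft patch pushes down into the
wait-event code.
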
> I wonder if this POLLHUP technique is reliable enough (I know that
> wouldn't work on other systems[1], which is why I was trying to make
> MSG_PEEK work...).
>
> What about environment variable PG_TEST_USE_UNIX_SOCKETS=1, does it
> reproduce with that set, and does the patch fix it?  I'm hoping that
> explains some Windows CI failures from a nearby thread[2].
With PG_TEST_USE_UNIX_SOCKETS=1 the test commit_ts/002_standby fails on
the unpatched HEAD:
t/002_standby.pl  1/4 # poll_query_until timed out executing this query:
# SELECT '0/303C628'::pg_lsn <= pg_last_wal_replay_lsn()
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
# Looks like your test exited with 25 just after 1.
t/002_standby.pl  Dubious, test returned 25 (wstat 6400, 0x1900)

002_standby_primary.log contains:
2022-01-13 18:57:32.925 PST [1448] LOG:  starting PostgreSQL 15devel,
compiled by Visual C++ build 1928, 64-bit
2022-01-13 18:57:32.926 PST [1448] LOG:  listening on Unix socket
"C:/Users/1/AppData/Local/Temp/yOKQPH1FoO/.s.PGSQL.62733"

The same with my postgres_fdw test:
# 03:41:12.533246 result:   0
# 0|0
# 03:41:12.534758 executing query...
# 03:41:14.267594 result:   3
#
# psql::1: WARNING:  no connection to the server
# psql::1: ERROR:  FATAL:  terminating connection due to
administrator command
# server closed the connection unexpectedly
#   This probably means the server terminated abnormally
#   before or while processing the request.
# CONTEXT:  remote SQL command: FETCH 100 FROM c1
# 03:41:14.270449 executing query...
# 03:41:14.334437 result:   3
#
# psql::1: ERROR:  could not connect to server "fpg"
# DETAIL:  connection to server on socket
"C:/Users/1/AppData/Local/Temp/hJWD9mzPHM/.s.PGSQL.57414" failed:
Connection refused (0x274D/10061)
#   Is the server running locally and accepting connections on that
socket?
# 03:41:14.336918 executing query...
# 03:41:14.422381 result:   3
#
# psql::1: ERROR:  could not connect to server "fpg"
# DETAIL:  connection to server on socket
"C:/Users/1/AppData/Local/Temp/hJWD9mzPHM/.s.PGSQL.57414" failed:
Connection refused (0x274D/10061)
#   Is the server running locally and accepting connections on that
socket?
# 03:41:14.425628 executing query...
...hang...

With the patch these tests pass successfully.

I can also confirm that on Windows 10 20H2 (previous tests were
performed on Windows Server 2012) the unpatched HEAD +
PG_TEST_USE_UNIX_SOCKETS=1 hangs on src\test\recovery\001_stream_rep (on
iterations 1, 1, 4 for me).
(v9-0001-Add-option-for-amcheck-and-pg_amcheck-to-check-un.patch [1] not
required to see that.)
001_stream_rep_primary.log contains:
...
2022-01-13 19:46:48.287 PST [1364] LOG:  listening on Unix socket
"C:/Users/1/AppData/Local/Temp/EWzapwiXjV/.s.PGSQL.58248"
2022-01-13 19:46:48.317 PST [6736] LOG:  database system was shut down
at 2022-01-13 19:46:37 PST
2022-01-13 19:46:48.331 PST [1364] LOG:  database system is ready to
accept connections
2022-01-13 19:46:49.536 PST [1088] 001_stream_rep.pl LOG:  statement:
CREATE TABLE tab_int AS SELECT generate_series(1,1002) AS a
2022-01-13 19:46:49.646 PST [3028] 001_stream_rep.pl LOG:  statement:
SELECT pg_current_wal_insert_lsn()
2022-01-13 19:46:49.745 PST [3360] 001_stream_rep.pl LOG:  statement:
SELECT '0/3023268' <= replay_lsn AND state = 'streaming' FROM
pg_catalog.pg_stat_replication WHERE application_name = 'standby_1';
...
2022-01-13 19:50:39.755 PST [4924] 001_stream_rep.pl LOG:  statement:
SELECT '0/3023268' <= replay_lsn AND state = 'streaming' FROM
pg_catalog.pg_stat_replication WHERE application_name = 'standby_1';
2022-01-13 19:50:39.928 PST [5924] 001_stream_rep.pl LOG:  statement:
SELECT '0/3023268' <= replay_lsn AND state = 'streaming' FROM
pg_catalog.pg_stat_replication WHERE application_name = 'standby_1';

Without PG_TEST_USE_UNIX_SOCKETS=1 and without the fix, 001_stream_rep hangs
too (but on iterations 3, 8, 2), so it seems that using Unix sockets increases
the failure rate.

With the fix, 100 iterations with PG_TEST_USE_UNIX_SOCKETS=1 pass, as do 40
(and still counting) iterations without PG_TEST_USE_UNIX_SOCKETS.

So the fix looks as absolutely