Re: Large Pages and Super Pages for PostgreSQL
On Sun, Jan 16, 2022 at 6:03 PM DEVOPS_WwIT wrote:
> Solaris and FreeBSD support large/super pages, and they can be used
> automatically by applications.
>
> It seems Postgres can't use the large/super pages on Solaris and FreeBSD
> (and I think it can't use large/super pages on HP-UX and AIX either);
> is there anyone who could take a look?

Hello,

I can provide some clues and partial answers about page size on three of the OSes you mentioned:

1. Solaris: I haven't used that OS for a long time, but I thought it was supposed to promote memory to larger page sizes transparently with some heuristics. To control page size explicitly, it *looks* like memcntl(2) with command MHA_MAPSIZE_VA could be used; that's what the man page says, anyway. If someone is interested in writing a patch to do that, I'd be happy to review it and test it on illumos...

2. AIX: We *nearly* made this work recently[1]. The summary is that AIX doesn't have a way to control the page size of anonymous shared mmap memory (our usual source of shared memory), so you have to use System V shared memory if you want a non-default page size for shared memory. We got as far as adding the option shared_memory_type=sysv, and the next step is pretty easy: just pass in some magic flags. This just needs someone with access and motivation to pick up that work...

3. FreeBSD: FreeBSD does transparently migrate PostgreSQL memory to "super" pages quite well in my experience, but there is also a new facility in FreeBSD 13 to ask for specific page sizes explicitly. I wrote a quick and dirty patch to enable PostgreSQL's huge_pages and huge_page_size settings to work with that interface, but I haven't yet got as far as testing it very hard or proposing it... but here it is, if you like experimental code[2].

I don't know about HP-UX. I think it might be dead, Jim.
[1] https://www.postgresql.org/message-id/flat/HE1PR0202MB28126DB4E0B6621CC6A1A91286D90%40HE1PR0202MB2812.eurprd02.prod.outlook.com
[2] https://github.com/macdice/postgres/commit/a71aafe5582c2e61005af0d16ca82eed89445a67
Re: XLogReadRecord() error in XlogReadTwoPhaseData()
On Fri, Nov 19, 2021 at 09:18:23PM -0800, Noah Misch wrote:
> On Wed, Nov 17, 2021 at 11:05:06PM -0800, Noah Misch wrote:
>> On Wed, Nov 17, 2021 at 05:47:10PM -0500, Tom Lane wrote:
>>> Noah Misch writes:
>>>> Each of the three failures happened on a sparc64 Debian+gcc machine.
>>>> I had tried ~8000 iterations on thorntail, another sparc64 Debian+gcc
>>>> animal, without reproducing this.
>>>>
>>>> # 'pgbench: error: client 0 script 1 aborted in command 4 query 0:
>>>> ERROR: could not read two-phase state from WAL at 0/159EF88:
>>>> unexpected pageaddr 0/0 in log segment 00010001, offset 5890048
>>>>
>>>> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2021-11-17%2013%3A01%3A24
>>
>> Two others:
>>
>> ERROR: could not read two-phase state from WAL at 0/16F1850: invalid
>> record length at 0/16F1850: wanted 24, got 0
>> -- https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2021-11-12%2013%3A01%3A15
>>
>> ERROR: could not read two-phase state from WAL at 0/1668020: incorrect
>> resource manager data checksum in record at 0/1668020
>> -- https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-11-16%2015%3A00%3A52
>>
>> I don't have a great theory, but here are candidates to examine next:
>>
>> - Run with wal_debug=on to confirm logged write location matches read
>>   location.
>> - Run "PGDATA=contrib/amcheck/tmp_check/t_003_cic_2pc_CIC_2PC_test_data/pgdata
>>   pg_waldump -s 0/0100" at the end of the test.
>> - Dump WAL page binary image at the point of failure.
>> - Log which branches in XLogReadRecord() are taken.
>
> Tom Turelinckx, are you able to provide remote access to kittiwake or
> tadarida? I'd use it to attempt the above things. All else being equal,
> kittiwake is more relevant since it's still supported upstream.

Thanks for setting up access. Summary: this kernel has a bug in I/O syscalls. How practical is it to update that kernel?
(Userland differs across these animals, but the kernel does not.) The kernel on buildfarm member thorntail doesn't exhibit the bug.

For specifics of the kernel bug, see the attached test program. In brief, the bug arises if one process is write()ing or pwrite()ing a file at about the same time that another process is read()ing or pread()ing the same. POSIX says the reader should see the data as it existed before the write or the newly-written data. On this kernel, the reader can see zeros instead. That leads to the $SUBJECT failure. PostgreSQL processes write out a given WAL block 20-30 times in ~10ms, and COMMIT PREPARED reads that block. The writers aren't changing the bytes of interest to COMMIT PREPARED, but the zeros from the kernel bug yield the failure. We could opt to work around that by writing only the not-already-written portion of a WAL block, but I doubt that's worthwhile unless it happens to be a performance win anyway.

Separately, while I don't know of relevance to PostgreSQL, I was interested to see that CentOS 7 pwrite()/pread() fail to have the POSIX-required atomicity.

> The setup of your buildfarm animals provides a clue. I understand kittiwake
> is the same as ibisbill except for build options, and tadarida is the same as
> mussurana except for build options. ibisbill and mussurana haven't failed, so
> one of these is likely needed to reproduce:
>
> absence of --enable-cassert
> CFLAGS='-g -O2 -fstack-protector -Wformat -Werror=format-security '
> CPPFLAGS='-Wdate-time -D_FORTIFY_SOURCE=2'
> LDFLAGS='-Wl,-z,relro -Wl,-z,now'

That was a red herring. ibisbill and mussurana don't use --with-tap-tests. Adding --with-tap-tests has been enough to make their configurations witness the same kinds of failures.

nm

/*
 * Stress-test pread(), pwrite(), read(), and write() to detect a few problems
 * with their handling of regular files:
 *
 * - Lack of atomicity.
 *   https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07
 *   requires atomicity.  (An implementation may always return <= 1; if it
 *   chooses to return higher values, it must maintain atomicity.)
 *
 * - Transiently making readers see zeros when they read concurrently with a
 *   write, even if the file had no zero at that offset before or after the
 *   write.
 *
 * By default, this program will print "mismatch" messages if pwrite() is
 * non-atomic.  Build with other options to test other behaviors:
 *   -DCHANGE_CONTENT=0 tests the zeros bug instead of plain atomicity
 *   -DUSE_SEEK=1 tests write()/read() instead of pwrite()/pread()
 *   -DREOPEN=0 reuses the same file descriptor across iterations
 *   -DXLOG_BLCKSZ=32 tests a different byte count; high values may require
 *    "ulimit -Ss" changes
 *
 * Observed behaviors:
 *
 * Linux 3.10.0-1160.49.1.el7.x86_64 (CentOS 7.9.2009):
 *   pwrite/pread is non-atomic if count>16 (no -D switches)
 *   write/read is atomic (-DUSE
Re: MultiXact/SLRU buffers configuration
> On 15 Jan 2022, at 20:46, Justin Pryzby wrote:
>>> I was planning on running a set of stress tests on these patches.
>>> Could we confirm which ones we plan to include in the commitfest?
>>
>> Many thanks for your interest. Here's the latest version.
>
> This is failing to compile under linux and windows due to bitfield syntax.
> http://cfbot.cputube.org/andrey-borodin.html

Uh, sorry, I formatted a patch from the wrong branch.

Just tested Cirrus. It's wonderful, thanks! Really faster than doing stuff on my machines...

Best regards, Andrey Borodin.

v20-0001-Make-all-SLRU-buffer-sizes-configurable.patch
Description: Binary data

v20-0002-Divide-SLRU-buffers-into-8-associative-banks.patch
Description: Binary data

v20-0003-Pack-SLRU-page_number-page_status-and-page_dirty.patch
Description: Binary data
Large Pages and Super Pages for PostgreSQL
Hi hackers,

Solaris and FreeBSD support large/super pages, and they can be used automatically by applications.

It seems Postgres can't use the large/super pages on Solaris and FreeBSD (and I think it can't use large/super pages on HP-UX and AIX either); is there anyone who could take a look?

The following is my testing:

1. Check the OS-supported large page sizes:

-bash-4.3$ pagesize -a
4096
2097152
1073741824

2. The OS version is 5.11:

-bash-4.3$ uname -a
SunOS 08a6a65f-b5a0-c159-f184-e81c379d1f5d 5.11 hunghu-20220114T101258Z:a3282be5a8 i86pc i386 i86pc
-bash-4.3$

3. PostgreSQL shared buffers are 11GB:

-bash-4.3$ grep -i shared_buffer postgresql.conf
shared_buffers = 11GB        # min 128kB

4. Checked on the Solaris OS: all of the memory is in 4K pages for PostgreSQL; it has not used the 2M or 1G page size.

-bash-4.3$ cat postmaster.pid |head -n 1
31637
-bash-4.3$ pmap -sxa 31637
31637:  /opt/local/bin/postgres
 Address   Kbytes     RSS    Anon  Locked  Pgsz  Mode   Mapped File
 0040           4       4       -       -    4K  r-x--  postgres
 00401000     872      28       -       -     -  r-x--  postgres
 004DB000      84      84       -       -    4K  r-x--  postgres
 004F         184      24       -       -     -  r-x--  postgres
 0051E000     248     248       -       -    4K  r-x--  postgres
 0055C000       8       8       -       -     -  r-x--  postgres
 0055E000       8       8       -       -    4K  r-x--  postgres
 0056          16      12       -       -     -  r-x--  postgres
 00564000       4       4       -       -    4K  r-x--  postgres
 00565000      20      20       -       -     -  r-x--  postgres
 0056A000       4       4       -       -    4K  r-x--  postgres
 0056B000      24      24       -       -     -  r-x--  postgres
 00571000       8       8       -       -    4K  r-x--  postgres
 00573000       4       4       -       -     -  r-x--  postgres
 00574000      16      16       -       -    4K  r-x--  postgres
 00578000      24       4       -       -     -  r-x--  postgres
 0057E000       4       4       -       -    4K  r-x--  postgres
 0057F000       8       8       -       -     -  r-x--  postgres
 00581000       8       8       -       -    4K  r-x--  postgres
 00583000       4       4       -       -     -  r-x--  postgres
 00584000     188     188       -       -    4K  r-x--  postgres
 005B3000      84      28       -       -     -  r-x--  postgres
 005C8000      24      24       -       -    4K  r-x--  postgres
 005CE000      76      40       -       -     -  r-x--  postgres
 005E1000       4       4       -       -    4K  r-x--  postgres
 005E2000     368     280       -       -     -  r-x--  postgres
 0063E000       4       4       -       -    4K  r-x--  postgres
 0063F000      80      36       -       -     -  r-x--  postgres
 00653000      12      12       -       -    4K  r-x--  postgres
 00656000       8       8       -       -     -  r-x--  postgres
 00658000       4       4       -       -    4K  r-x--  postgres
 00659000      12      12       -       -     -  r-x--  postgres
 0065C000       8       8       -       -    4K  r-x--  postgres
 0065E000       4       4       -       -     -  r-x--  postgres
 0065F000       4       4       -       -    4K  r-x--  postgres
 0066          12      12       -       -     -  r-x--  postgres
 00663000       8       8       -       -    4K  r-x--  postgres
 00665000      12      12       -       -     -  r-x--  postgres
 00668000       4       4       -       -    4K  r-x--  postgres
 00669000       4       4       -       -     -  r-x--  postgres
 0066A000       8       8       -       -    4K  r-x--  postgres
 0066C000      32      32       -       -     -  r-x--  postgres
 00674000       4       4       -       -    4K  r-x--  postgres
 00675000       4       4       -       -     -  r-x--  postgres
 00676000       4       4       -       -    4K  r-x--  postgres
 00677000     156     156       -       -     -  r-x--  postgres
 0069E000       4       4       -       -    4K  r-x--  postgres
 0069F000     416     396       -       -     -  r-x--  postgres
 00707000       4
Re: using an end-of-recovery record in all cases
Hi,

On Mon, Oct 18, 2021 at 10:56:53AM +0530, Amul Sul wrote:
> Attached is the rebased and updated version. The patch removes the
> newly introduced PerformRecoveryXLogAction() function. In addition to
> that, removed the CHECKPOINT_END_OF_RECOVERY flag and its related
> code. Also, dropped changes for bgwriter.c and slot.c in this patch,
> which seem unrelated to this work.

The cfbot reports that this version of the patch doesn't apply anymore:
http://cfbot.cputube.org/patch_36_3365.log

=== Applying patches on top of PostgreSQL commit ID 0c53a6658e47217ad3dd416a5543fc87c3ecfd58 ===
=== applying patch ./v3-0001-Always-use-an-end-of-recovery-record-rather-than-.patch
patching file src/backend/access/transam/xlog.c
[...]
Hunk #14 FAILED at 9061.
Hunk #15 FAILED at 9241.
2 out of 15 hunks FAILED -- saving rejects to file src/backend/access/transam/xlog.c.rej

Can you send a rebased version? In the meantime I will switch the cf entry to Waiting on Author.
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Hi,

On Fri, Dec 24, 2021 at 07:21:59PM +0900, Kyotaro Horiguchi wrote:
> Just a complaint..
>
> At Fri, 12 Nov 2021 16:43:27 +0900 (JST), Kyotaro Horiguchi wrote in
>> "" on Japanese (CP-932) environment. I didn't actually
>> tested it on Windows and msys environment ...yet.
>
> Active perl cannot be installed because of (perhaps) a powershell
> version issue... Annoying..
>
> https://community.activestate.com/t/please-update-your-powershell-install-scripts/7897

I'm not very familiar with Windows, but maybe using Strawberry Perl instead ([1]) would fix your problem? I think it's also quite popular and is commonly used to run pgBadger on Windows.

Other than that, I see that the TAP tests are failing on all the environments, due to Perl errors. For instance:

[04:06:00.848] [04:05:54] t/003_promote.pl .
[04:06:00.848] Dubious, test returned 2 (wstat 512, 0x200)

https://api.cirrus-ci.com/v1/artifact/task/4751213722861568/tap/src/bin/pg_basebackup/tmp_check/log/regress_log_020_pg_receivewal

# Initializing node "standby" from backup "my_backup" of node "primary"
Odd number of elements in hash assignment at /tmp/cirrus-ci-build/src/bin/pg_ctl/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 996.
Use of uninitialized value in list assignment at /tmp/cirrus-ci-build/src/bin/pg_ctl/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 996.
Use of uninitialized value $tsp in concatenation (.) or string at /tmp/cirrus-ci-build/src/bin/pg_ctl/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 1008.
Use of uninitialized value $tsp in concatenation (.) or string at /tmp/cirrus-ci-build/src/bin/pg_ctl/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 1009.

That's apparently the same problem in every failure reported. Can you send a fixed patchset? In the meantime I will switch the cf entry to Waiting on Author.

[1] https://strawberryperl.com/
Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit
Hi,

On Thu, Jan 06, 2022 at 05:29:23PM +0900, Etsuro Fujita wrote:
> Done. Attached is a new version.

The patchset doesn't apply anymore:
http://cfbot.cputube.org/patch_36_3392.log

=== Applying patches on top of PostgreSQL commit ID 43c2175121c829c8591fc5117b725f1f22bfb670 ===
=== applying patch ./v2-0001-postgres-fdw-Add-support-for-parallel-commit.patch
patching file contrib/postgres_fdw/connection.c
patching file contrib/postgres_fdw/expected/postgres_fdw.out
Hunk #2 FAILED at 10825.
1 out of 2 hunks FAILED -- saving rejects to file contrib/postgres_fdw/expected/postgres_fdw.out.rej
patching file contrib/postgres_fdw/option.c
patching file contrib/postgres_fdw/sql/postgres_fdw.sql
Hunk #1 FAILED at 3452.
1 out of 1 hunk FAILED -- saving rejects to file contrib/postgres_fdw/sql/postgres_fdw.sql.rej
patching file doc/src/sgml/postgres-fdw.sgml

I also see that Fuji-san raised some questions, so for now I will simply change the patch status to Waiting on Author.
Re: Isn't wait_for_catchup slightly broken?
I wrote:
> Another thing that is bothering me a bit is that a number of the
> callers use $node->lsn('insert') as the target. This also seems
> rather dubious, because that could be ahead of what's been written
> out. These callers are just taking it on faith that something will
> eventually cause that extra WAL to get written out (and become
> available to the standby). Again, that seems to make the test
> slower than it need be, with a worst-case scenario being that it
> eventually times out. Admittedly this is unlikely to be a big
> problem unless some background op issues an abortive transaction
> at just the wrong time. Nonetheless, I wonder if we shouldn't
> standardize on "thou shalt use the write position", because I
> don't think the other alternatives have anything to recommend them.

Here's a version that makes sure that callers specify a write position not an insert position. I also simplified the callers wherever it turned out that they could just use the default parameters.
regards, tom lane

diff --git a/src/bin/pg_basebackup/t/020_pg_receivewal.pl b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
index b616c0ccf1..98dbdab595 100644
--- a/src/bin/pg_basebackup/t/020_pg_receivewal.pl
+++ b/src/bin/pg_basebackup/t/020_pg_receivewal.pl
@@ -290,7 +290,7 @@ $standby->psql(
 	"CREATE_REPLICATION_SLOT $archive_slot PHYSICAL (RESERVE_WAL)",
 	replication => 1);
 # Wait for standby catchup
-$primary->wait_for_catchup($standby, 'replay', $primary->lsn('write'));
+$primary->wait_for_catchup($standby);
 # Get a walfilename from before the promotion to make sure it is archived
 # after promotion
 my $standby_slot = $standby->slot($archive_slot);
diff --git a/src/bin/pg_rewind/t/007_standby_source.pl b/src/bin/pg_rewind/t/007_standby_source.pl
index 62254344f6..239b510c1a 100644
--- a/src/bin/pg_rewind/t/007_standby_source.pl
+++ b/src/bin/pg_rewind/t/007_standby_source.pl
@@ -74,7 +74,7 @@ $node_a->safe_psql('postgres',
 	"INSERT INTO tbl1 values ('in A, before promotion')");
 $node_a->safe_psql('postgres', 'CHECKPOINT');

-my $lsn = $node_a->lsn('insert');
+my $lsn = $node_a->lsn('write');
 $node_a->wait_for_catchup('node_b', 'write', $lsn);
 $node_b->wait_for_catchup('node_c', 'write', $lsn);

@@ -93,8 +93,7 @@ $node_a->safe_psql('postgres',
 	"INSERT INTO tbl1 VALUES ('in A, after C was promoted')");

 # make sure it's replicated to B before we continue
-$lsn = $node_a->lsn('insert');
-$node_a->wait_for_catchup('node_b', 'replay', $lsn);
+$node_a->wait_for_catchup('node_b');

 # Also insert a new row in the standby, which won't be present in the
 # old primary.
@@ -161,7 +160,7 @@ in A, after C was promoted
 $node_a->safe_psql('postgres',
 	"INSERT INTO tbl1 values ('in A, after rewind')");

-$lsn = $node_a->lsn('insert');
+$lsn = $node_a->lsn('write');
 $node_b->wait_for_catchup('node_c', 'replay', $lsn);

 check_query(
diff --git a/src/bin/pg_rewind/t/008_min_recovery_point.pl b/src/bin/pg_rewind/t/008_min_recovery_point.pl
index 9ad7e6b62b..8240229230 100644
--- a/src/bin/pg_rewind/t/008_min_recovery_point.pl
+++ b/src/bin/pg_rewind/t/008_min_recovery_point.pl
@@ -69,8 +69,7 @@ $node_3->init_from_backup($node_1, $backup_name, has_streaming => 1);
 $node_3->start;

 # Wait until node 3 has connected and caught up
-my $lsn = $node_1->lsn('insert');
-$node_1->wait_for_catchup('node_3', 'replay', $lsn);
+$node_1->wait_for_catchup('node_3');

 #
 # Swap the roles of node_1 and node_3, so that node_1 follows node_3.
@@ -106,8 +105,7 @@ $node_2->restart();
 #
 # make sure node_1 is full caught up with node_3 first
-$lsn = $node_3->lsn('insert');
-$node_3->wait_for_catchup('node_1', 'replay', $lsn);
+$node_3->wait_for_catchup('node_1');

 $node_1->promote;
 # Force a checkpoint after promotion, like earlier.
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index e18f27276c..eceb8b2d8b 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2509,8 +2509,11 @@ poll_query_until timeout.

 Requires that the 'postgres' db exists and is accessible.

-target_lsn may be any arbitrary lsn, but is typically $primary_node->lsn('insert').
-If omitted, pg_current_wal_lsn() is used.
+The default value of target_lsn is $node->lsn('write'), which ensures
+that the standby has caught up to what has been committed on the primary.
+If you pass an explicit value of target_lsn, it should almost always be
+the primary's write LSN; so this parameter is seldom needed except when
+querying some intermediate replication node rather than the primary.

 This is not a test. It die()s on failure.
@@ -2531,23 +2534,18 @@ sub wait_for_catchup
 	{
 		$standby_name = $standby_name->name;
 	}
-	my $lsn_expr;
-	if (defined($target_lsn))
+	if (!defined($target_lsn))
 	{
-		$lsn_expr = "'$target_lsn'";
-	}
-	else
-	{
-		$lsn_expr = 'pg_current_wal_lsn()';
+		$target_lsn = $self->lsn('write');
 	}
 	print
Re: sequences vs. synchronous replication
On 1/15/22 06:12, Fujii Masao wrote:
> On 2022/01/12 1:07, Tomas Vondra wrote:
>> I explored the idea of using page LSN a bit
>
> Many thanks!
>
>> The patch from 22/12 simply checks if the change should/would wait for
>> sync replica, and if yes it WAL-logs the sequence increment. There's a
>> couple problems with this, unfortunately:
>
> Yes, you're right.
>
>> So I think this approach is not really an improvement over WAL-logging
>> every increment. But there's a better way, I think - we don't need to
>> generate WAL, we just need to ensure we wait for it to be flushed at
>> transaction end in RecordTransactionCommit().
>>
>> That is, instead of generating more WAL, simply update XactLastRecEnd
>> and then ensure RecordTransactionCommit flushes/waits etc. Attached is
>> a patch doing that - the changes in sequence.c are trivial, changes in
>> RecordTransactionCommit simply ensure we flush/wait even without XID
>> (this actually raises some additional questions that I'll discuss in a
>> separate message in this thread).
>
> This approach (and also my previous proposal) seems to assume that the
> value returned from nextval() should not be used until the transaction
> executing that nextval() has been committed successfully. But I'm not
> sure how many applications follow this assumption. Some applications
> might use the return value of nextval() instantly before issuing the
> commit command. Some might use the return value of nextval() executed
> in a rolled-back transaction.

IMO any application that assumes data from uncommitted transactions is outright broken, and we should not try to fix that because it's quite futile (and likely will affect well-behaving applications). The issue I'm trying to fix in this thread is much narrower - we don't actually meet the guarantees for committed transactions (that only did nextval without generating any WAL).

> If we want to avoid duplicate sequence values even in those cases, ISTM
> that the transaction needs to wait for the WAL flush and sync rep before
> nextval() returns the value. Of course, this might cause other issues
> like a performance decrease, though.

Right, something like that. But that'd hurt well-behaving applications, because by doing the wait earlier (in nextval, not at commit) it increases the probability of waiting. FWIW I'm not against improvements in this direction, but it goes way beyond fixing the original issue.

>> On btrfs, it looks like this (the numbers next to nextval are the
>> cache size, with 1 being the default):
>>
>>   client  test         master   log-all  page-lsn   log-all  page-lsn
>>   ------------------------------------------------------------------
>>        1  insert          829       807       802       97%       97%
>>           nextval/1     16491       814     16465        5%      100%
>>           nextval/32    24487     16462     24632       67%      101%
>>           nextval/64    24516     24918     24671      102%      101%
>>           nextval/128   32337     33178     32863      103%      102%
>>
>>   client  test         master   log-all  page-lsn   log-all  page-lsn
>>   ------------------------------------------------------------------
>>        4  insert         1577      1590      1546      101%       98%
>>           nextval/1     45607      1579     21220        3%       47%
>>           nextval/32    68453     49141     51170       72%       75%
>>           nextval/64    66928     65534     66408       98%       99%
>>           nextval/128   83502     81835     82576       98%       99%
>>
>> The results seem clearly better, I think.
>
> Thanks for benchmarking this! I agree that page-lsn is obviously better
> than log-all.

>> For "insert" there's no drop at all (same as before), because as soon
>> as a transaction generates any WAL, it has to flush/wait anyway. And
>> for "nextval" there's a drop, but only with 4 clients, and it's much
>> smaller (53% instead of 97%). And increasing the cache size eliminates
>> even that.
>
> Yes, but a 53% drop would be critical for some applications that don't
> want to increase the cache size for some reason. So IMO it's better to
> provide an option to enable/disable that page-lsn approach.

I disagree. This drop applies only to extremely simple transactions - once the transaction does any WAL write, it disappears. Even if the transaction does only a couple of reads, it'll go away. I find it hard to believe there's any serious application doing this. So I think we should get it reliable (to not lose data after commit) first, and then maybe see if we can improve this.
regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: extended stats on partitioned tables
On 1/15/22 19:35, Tomas Vondra wrote:
> On 1/15/22 06:11, Justin Pryzby wrote:
>> On Mon, Dec 13, 2021 at 09:40:09PM +0100, Tomas Vondra wrote:
>>> 1) If the table is a separate relation (not part of an inheritance
>>> tree), this should make no difference. -> OK
>>>
>>> 2) If the table is using "old" inheritance, this reverts back to
>>> pre-regression behavior. So people will keep using the old statistics
>>> until the ANALYZE, and we need to tell them to ANALYZE or something.
>>>
>>> 3) If the table is using partitioning, it's guaranteed to be empty and
>>> there are no stats at all. Again, we should tell people to run ANALYZE.
>>
>> I think these can be mentioned in the commit message, which can end up
>> in the minor release notes as a recommendation to rerun ANALYZE.
>
> Good point. I pushed the 0002 part and added a short paragraph suggesting
> ANALYZE might be necessary. I did not go into details about the
> individual cases, because that'd be too much for a commit message.
>
>> Thanks for pushing 0001.
>
> Thanks for posting the patches! I've pushed the second part - attached
> are the two remaining parts. I'll wait a bit before pushing the rebased
> 0001 (which goes into master branch only). Not sure about 0002 - I'm not
> convinced the refactored ACL checks are an improvement, but I'll think
> about it.

BTW when backpatching the first part, I had to decide what to do about tests. The 10 & 11 branches did not have the check_estimated_rows() function. I considered removing the tests, reworking the tests not to need the function, or adding the function. Ultimately I added the function, which seemed like the best option.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Windows: Wrong error message at connection termination
Sergey Shinderuk writes:
> On 14.01.2022 13:01, Sergey Shinderuk wrote:
>> Unexpectedly, this changes the error message:
> ...
> For the record, after more poking I realized that it depends on timing.
> By injecting delays I can get any of the following from libpq:
> * could not receive data from server: Software caused connection abort
> * server closed the connection unexpectedly
> * no connection to the server

Thanks for the follow-up. At the moment I'm not planning to do anything pending the results of the other thread [1]. It seems likely though that we'll end up reverting this explicit-close behavior in the back branches, as the other changes involved look too invasive for back-patching.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BOeoETZQ%3DQw5Ub5h3tmwQhBmDA%3DnuNO3KG%3DzWfUypFAw%40mail.gmail.com
Re: Support tab completion for upper character inputs in psql
Peter Eisentraut writes:
> The rest of the patch seems ok in principle, since AFAICT enums are the
> only query result in tab-complete.c that are not identifiers and thus
> subject to case issues.

I spent some time looking at this patch. I'm not very happy with it, for two reasons:

1. The downcasing logic in the patch bears very little resemblance to the backend's actual downcasing logic, which can be found in src/backend/parser/scansup.c's downcase_identifier(). Notably, the patch's restriction to only convert all-ASCII strings seems indefensible, because that's not how things really work. I fear we can't always exactly duplicate the backend's behavior, because it's dependent on the server's locale and encoding; but I think we should at least get it right in the common case where psql is using the same locale and encoding as the server.

2. I don't think there's been much thought about the larger picture of what is to be accomplished. Right now, we successfully tab-complete inputs that are prefixes of the canonical spelling (per quote_identifier) of the object's name, and don't try at all for non-canonical spellings. I'm on board with trying to allow some of the latter but I'm not sure that this patch represents much forward progress. To be definite about it, suppose we have a DB containing just two tables whose names start with "m", say mytab and mixedTab. Then:

(a) m immediately completes mytab, ignoring mixedTab
(b) "m immediately completes "mixedTab", ignoring mytab
(c) "my fails to find anything
(d) mi fails to find anything
(e) M fails to find anything

This patch proposes to improve case (e), but to my taste cases (a) through (c) are much bigger problems. It'd be nice if (d) worked too --- that'd require injecting a double-quote where the user had not typed one, but we already do the equivalent thing with single-quotes for file names, so why not?
(Although after fighting with readline yesterday to try to get it to handle single-quoted enum labels sanely, I'm not 100% sure if (d) is possible.)

Also, even for case (e), what we have with this patch is that it immediately completes mytab, ignoring mixedTab. Is that what we want? Another example is that miX fails to find anything, which seems like a POLA violation given that mY completes to mytab.

I'm not certain how many of these alternatives can be supported without introducing ambiguity that wasn't there before (which'd manifest as failing to complete in cases where the existing code chooses an alternative just fine). But I really don't like the existing behavior for (b) and (c) --- I should be able to spell a name with double quotes if I want, without losing completion support.

BTW, another thing that maybe we should think about is how this interacts with the pattern matching capability in \d and friends. If people can tab-complete non-canonical spellings, they might expect the same spellings to work in \d. I don't say that this patch has to fix that, but we might want to look and be sure we're not painting ourselves into a corner (especially since I see that we already perform tab-completion in that context).

regards, tom lane
Re: [PATCH] Full support for index LP_DEAD hint bits on standby
On Sat, Jan 15, 2022 at 08:39:14PM +0300, Michail Nikolaev wrote:
> Hello, Junien.
>
> Thanks for your attention.
>
>> The cfbot reports that this patch is currently failing at least on
>> Linux and Windows, e.g. https://cirrus-ci.com/task/6532060239101952.
>
> Fixed. It was an issue with the test - it hangs on Windows because of
> psql + a spurious vacuum sometimes.

It looks like there's still a server crash that caused the CI or client to hang.
https://cirrus-ci.com/task/6350310141591552

2022-01-13 06:31:04.182 GMT [8636][walreceiver] FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
	This probably means the server terminated abnormally before or while processing the request.
2022-01-13 06:31:04.182 GMT [6848][startup] LOG:  invalid record length at 0/3014B58: wanted 24, got 0
2022-01-13 06:31:04.228 GMT [8304][walreceiver] FATAL:  could not connect to the primary server: connection to server on socket "C:/Users/ContainerAdministrator/AppData/Local/Temp/_7R9Pa5CwW/.s.PGSQL.58307" failed: Connection refused (0x274D/10061)
Re: extended stats on partitioned tables
On 1/15/22 06:11, Justin Pryzby wrote: On Mon, Dec 13, 2021 at 09:40:09PM +0100, Tomas Vondra wrote: 1) If the table is a separate relation (not part of an inheritance tree), this should make no difference. -> OK 2) If the table is using "old" inheritance, this reverts back to pre-regression behavior. So people will keep using the old statistics until the ANALYZE, and we need to tell them to ANALYZE or something. 3) If the table is using partitioning, it's guaranteed to be empty and there are no stats at all. Again, we should tell people to run ANALYZE. I think these can be mentioned in the commit message, which can end up in the minor release notes as a recommendation to rerun ANALYZE. Good point. I pushed the 0002 part and added a short paragraph suggesting ANALYZE might be necessary. I did not go into details about the individual cases, because that'd be too much for a commit message. Thanks for pushing 0001. Thanks for posting the patches! I've pushed the second part - attached are the two remaining parts. I'll wait a bit before pushing the rebased 0001 (which goes into master branch only). Not sure about 0002 - I'm not convinced the refactored ACL checks are an improvement, but I'll think about it. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom e065e54d1961a4e295ebe77febbe6a3307d142b5 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Sun, 12 Dec 2021 00:07:27 +0100 Subject: [PATCH 1/2] Add stxdinherit flag to pg_statistic_ext_data The support for extended statistics on inheritance trees was somewhat problematic, because the catalog pg_statistic_ext_data did not have a flag identifying whether the data was built with/without data for the child relations. For partitioned tables we've been able to work around this because the non-leaf relations can't contain data, so there's really just one set of statistics to store. 
But for regular inheritance trees we had to pick one, and there is no clearly better option - we need both. This introduces the pg_statistic_ext_data.stxdinherit flag, analogous to pg_statistic.stainherit, and modifies analyze to build statistics for both cases. This relaxes the relationship between the two catalogs - until now we've created the pg_statistic_ext_data one when creating the statistics, and then only updated it later. There was always a 1:1 relationship between rows in the two catalogs. With this change we no longer insert any data rows while creating statistics, because we don't know which flag value to create, and it seems wasteful to initialize both for every relation. Thus there may be 0, 1 or 2 data rows for each statistics definition. Patch by me, with extensive improvements and fixes by Justin Pryzby. Author: Tomas Vondra, Justin Pryzby Reviewed-by: Tomas Vondra, Justin Pryzby Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com --- doc/src/sgml/catalogs.sgml | 23 +++ src/backend/catalog/system_views.sql| 2 + src/backend/commands/analyze.c | 28 +--- src/backend/commands/statscmds.c| 72 +- src/backend/optimizer/util/plancat.c| 147 src/backend/statistics/dependencies.c | 22 ++- src/backend/statistics/extended_stats.c | 75 +- src/backend/statistics/mcv.c| 9 +- src/backend/statistics/mvdistinct.c | 5 +- src/backend/utils/adt/selfuncs.c| 37 + src/backend/utils/cache/syscache.c | 6 +- src/include/catalog/pg_statistic_ext_data.h | 4 +- src/include/commands/defrem.h | 1 + src/include/nodes/pathnodes.h | 1 + src/include/statistics/statistics.h | 11 +- src/test/regress/expected/rules.out | 2 + src/test/regress/expected/stats_ext.out | 31 +++-- src/test/regress/sql/stats_ext.sql | 13 +- 18 files changed, 254 insertions(+), 235 deletions(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 03e2537b07d..2aeb2ef346e 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -7521,6 +7521,19 @@ 
SCRAM-SHA-256$:&l created with CREATE STATISTICS. + + Normally there is one entry, with stxdinherit = + false, for each statistics object that has been analyzed. + If the table has inheritance children, a second entry with + stxdinherit = true is also created. + This row represents the statistics object over the inheritance tree, i.e., + statistics for the data you'd see with + SELECT * FROM table*, + whereas the stxdinherit = false row + represents the results of + SELECT * FROM ONLY table. + + Like pg_statistic, pg_statistic_ext_data should not be @@ -7560,6 +7573,16 @@ SCRAM-SHA-256$ :&l + + + stxdinherit bool + + + If true, the stats include inheritance child columns, not j
Re: [PATCH] Full support for index LP_DEAD hint bits on standby
Hello, Junien. Thanks for your attention. > The cfbot reports that this patch is currently failing at least on > Linux and Windows, e.g. https://cirrus-ci.com/task/6532060239101952. Fixed. It was an issue with the test: it sometimes hung on Windows because of psql plus a spurious vacuum. > I'm switching this patch on Waiting on Author. I have tested it multiple times on my Github repo; it seems to be stable now. Switching back to Ready for committer. Best regards. Michail. From 9372bac9b56d27cf993e9d1fa66127c86b51f25c Mon Sep 17 00:00:00 2001 From: Michail Nikolaev Date: Sat, 15 Jan 2022 16:21:51 +0300 Subject: [PATCH v7 1/3] code --- src/backend/access/common/bufmask.c | 25 src/backend/access/gist/gistget.c| 43 +++-- src/backend/access/gist/gistxlog.c | 15 + src/backend/access/hash/hash.c | 4 +- src/backend/access/hash/hash_xlog.c | 17 + src/backend/access/hash/hashsearch.c | 18 -- src/backend/access/hash/hashutil.c | 33 +- src/backend/access/heap/heapam.c | 42 +--- src/backend/access/heap/heapam_handler.c | 5 +- src/backend/access/index/genam.c | 20 +++--- src/backend/access/index/indexam.c | 81 +--- src/backend/access/nbtree/nbtinsert.c| 22 +-- src/backend/access/nbtree/nbtree.c | 4 +- src/backend/access/nbtree/nbtsearch.c| 14 +++- src/backend/access/nbtree/nbtutils.c | 33 +- src/backend/access/nbtree/nbtxlog.c | 16 + src/backend/access/table/tableam.c | 4 +- src/backend/access/transam/rmgr.c| 4 +- src/backend/access/transam/xlogutils.c | 6 ++ src/backend/storage/ipc/standby.c| 6 ++ src/bin/pg_rewind/parsexlog.c| 2 +- src/bin/pg_waldump/rmgrdesc.c| 2 +- src/include/access/bufmask.h | 1 + src/include/access/gist.h| 5 ++ src/include/access/gistxlog.h| 1 + src/include/access/hash.h| 2 + src/include/access/hash_xlog.h | 1 + src/include/access/heapam.h | 2 +- src/include/access/nbtree.h | 2 + src/include/access/nbtxlog.h | 1 + src/include/access/relscan.h | 15 - src/include/access/rmgr.h| 2 +- src/include/access/rmgrlist.h| 46 +++--- src/include/access/tableam.h | 14 ++-- 
src/include/access/xlog_internal.h | 4 ++ 35 files changed, 422 insertions(+), 90 deletions(-) diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c index 4e953bfd61..22026482ad 100644 --- a/src/backend/access/common/bufmask.c +++ b/src/backend/access/common/bufmask.c @@ -128,3 +128,28 @@ mask_page_content(Page page) memset(&((PageHeader) page)->pd_upper, MASK_MARKER, sizeof(uint16)); } + +/* + * mask_lp_dead + * + * In some index AMs, line pointer flags can be modified without emitting any + * WAL record. Sometimes it is required to mask LP_DEAD flags set on primary to + * set own values on standby. + */ +void +mask_lp_dead(Page page) +{ + OffsetNumber offnum, + maxoff; + + maxoff = PageGetMaxOffsetNumber(page); + for (offnum = FirstOffsetNumber; + offnum <= maxoff; + offnum = OffsetNumberNext(offnum)) + { + ItemId itemId = PageGetItemId(page, offnum); + + if (ItemIdHasStorage(itemId) && ItemIdIsDead(itemId)) + itemId->lp_flags = LP_NORMAL; + } +} diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c index adbf622c83..1905c04c51 100644 --- a/src/backend/access/gist/gistget.c +++ b/src/backend/access/gist/gistget.c @@ -14,6 +14,7 @@ */ #include "postgres.h" +#include "access/bufmask.h" #include "access/genam.h" #include "access/gist_private.h" #include "access/relscan.h" @@ -49,6 +50,7 @@ gistkillitems(IndexScanDesc scan) Assert(so->curBlkno != InvalidBlockNumber); Assert(!XLogRecPtrIsInvalid(so->curPageLSN)); Assert(so->killedItems != NULL); + Assert(so->numKilled > 0); buffer = ReadBuffer(scan->indexRelation, so->curBlkno); if (!BufferIsValid(buffer)) @@ -62,8 +64,13 @@ gistkillitems(IndexScanDesc scan) * If page LSN differs it means that the page was modified since the last * read. killedItems could be not valid so LP_DEAD hints applying is not * safe. + * + * Another case - standby was promoted after start of current transaction. 
+ * It is not required for correctness, but it is better to just skip + * everything. */ - if (BufferGetLSNAtomic(buffer) != so->curPageLSN) + if ((BufferGetLSNAtomic(buffer) != so->curPageLSN) || + (scan->xactStartedInRecovery && !RecoveryInProgress())) { UnlockReleaseBuffer(buffer); so->numKilled = 0; /* reset counter */ @@ -71,6 +78,20 @@ gistkillitems(IndexScanDesc scan) } Assert(GistPageIsLeaf(page)); + if (GistPageHasLpSafeOnStandby(page) && !scan->xactStartedInRecovery) + { + /* Seems like server was promoted some time ago, + *
Re: MultiXact/SLRU buffers configuration
On Sat, Jan 15, 2022 at 12:16:59PM +0500, Andrey Borodin wrote: > > On 15 Jan 2022, at 03:20, Shawn Debnath wrote: > > On Fri, Jan 14, 2022 at 05:28:38PM +0800, Julien Rouhaud wrote: > >>> PFA rebase of the patchset. Also I've added a patch to combine > >>> page_number, page_status, and page_dirty together to touch less > >>> cachelines. > >> > >> The cfbot reports some errors on the latest version of the patch: > >> > >> https://cirrus-ci.com/task/6121317215764480 > >> [...] > >> Could you send a new version? In the meantime I will switch the patch > >> status to Waiting on Author. > > > > I was planning on running a set of stress tests on these patches. Could > > we confirm which ones we plan to include in the commitfest? > > Many thanks for your interest. Here's the latest version. This is failing to compile under Linux and Windows due to bitfield syntax. http://cfbot.cputube.org/andrey-borodin.html And compile warnings: slru.c: In function ‘SlruAdjustNSlots’: slru.c:161:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] 161 | int nbanks = 1; | ^~~ slru.c: In function ‘SimpleLruReadPage_ReadOnly’: slru.c:530:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] 530 | int bankstart = (pageno & shared->bank_mask) * shared->bank_size; | ^~~ Note that you can test the CI result using any github account without waiting for the cfbot. See ./src/tools/ci/README. -- Justin
Re: Refactoring of compression options in pg_basebackup
On Fri, Jan 14, 2022 at 10:53 PM Robert Haas wrote: > > On Thu, Jan 13, 2022 at 10:23 PM Michael Paquier wrote: > > Using --compression-level=NUMBER and --server-compress=METHOD to > > specify a server-side compression method with a level is fine by me, > > but I find the reuse of --compress to specify a compression method > > confusing as it maps with the past option we have kept in > > pg_basebackup for a couple of years now. Based on your suggested set > > of options, we could then have a --client-compress=METHOD and > > --compression-level=NUMBER to specify a client-side compression method > > with a level. If we do that, I guess that we should then: > > 1) Block the combination of --server-compress and --client-compress. > > 2) Remove the existing -Z/--compress and -z/--gzip. > > I could live with that. I'm not sure that --client-compress instead of > reusing --compress is going to be better ... but I don't think it's > awful so much as just not my first choice. I also don't think it would > be horrid to leave -z, --gzip, and -Z as shorthands for the > --client-compress=gzip with --compression-level also in the last case, > instead of removing all that stuff. It never makes sense to compress *both* in server and client, right? One argument in that case for using --compress would be that we could have that one take options like --compress=gzip (use gzip in the client) and --compress=server-lz4 (use lz4 on the server), and automatically make it impossible to do both. And maybe also accept --compress=client-gzip (which would be the same as just specifying gzip). That would be an argument for actually keeping --compress and not using --client-compress, because obviously it would be silly to have --client-compress=server-lz4... And yes, I agree that considering both server and client compression even if we don't have server compression makes sense, since we don't want to change things around again when we get it. 
We could perhaps also consider accepting --compress=gzip:7 (:) as a way to specify the level, for both the client and server side. I think having --client-compress and --server-compress separately but having --compression-level *not* be separate would be confusing, and I *think* that's what the current patch proposes? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
On Sat, Jan 15, 2022 at 2:59 PM Julien Rouhaud wrote: > > Hi, > > On Sat, Jan 15, 2022 at 02:04:12PM +0530, Bharath Rupireddy wrote: > > > > We had an issue where there were many mapping files generated during > > the crash recovery and end-of-recovery checkpoint was taking a lot of > > time. We had to manually intervene and delete some of the mapping > > files (although it may not sound sensible) to make end-of-recovery > > checkpoint faster. Because of the race condition between manual > > deletion and checkpoint deletion, the unlink error occurred which > > crashed the server and the server entered the recovery again wasting > > the entire earlier recovery work. > > Maybe I'm missing something but wouldn't > https://commitfest.postgresql.org/36/3448/ better solve the problem? The error can cause the new background process proposed in that thread to restart, which is again costly. Since we already have LOG-only-and-continue behavior in CheckPointSnapBuild, having the same behavior for CheckPointLogicalRewriteHeap helps a lot. Regards, Bharath Rupireddy.
Re: Skipping logical replication transactions on subscriber side
On Fri, Jan 14, 2022 at 5:35 PM vignesh C wrote: > > Thanks for the updated patch, few minor comments: > 1) Should "SKIP" be "SKIP (" here: > @@ -1675,7 +1675,7 @@ psql_completion(const char *text, int start, int end) > /* ALTER SUBSCRIPTION */ > else if (Matches("ALTER", "SUBSCRIPTION", MatchAny)) > COMPLETE_WITH("CONNECTION", "ENABLE", "DISABLE", "OWNER TO", > - "RENAME TO", "REFRESH > PUBLICATION", "SET", > + "RENAME TO", "REFRESH > PUBLICATION", "SET", "SKIP", > Won't the following rule added by the patch be sufficient for what you are asking? + /* ALTER SUBSCRIPTION SKIP */ + else if (Matches("ALTER", "SUBSCRIPTION", MatchAny, "SKIP")) + COMPLETE_WITH("("); I might be missing something, but why should the handling of SKIP be any different from what we are doing for SET? -- With Regards, Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Fri, Jan 14, 2022 at 7:49 AM Masahiko Sawada wrote: > > I agree with all the comments above. I've attached an updated patch. > Review comments 1. + + + In this case, you need to consider changing the data on the subscriber so that it The start of this sentence doesn't make sense to me. How about changing it to something like: "To resolve conflicts, you need to ... 2. + pg_subscription.subskipxid) + is cleared. See for + the details of logical replication conflicts. + + + + skip_option specifies options for this operation. + The supported option is: + + + +xid (xid) + + + Specifies the ID of the transaction whose changes are to be skipped + by the logical replication worker. Setting NONE resets + the transaction ID. + The trailing spaces at the ends of lines are inconsistent. I personally use a single space before a new line but I see that others use two spaces and the nearby documentation also uses two spaces in this regard, so I am fine either way, but let's be consistent. 3. + case ALTER_SUBSCRIPTION_SKIP: + { + if (!superuser()) + ereport(ERROR, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + errmsg("must be superuser to skip transaction"))); + + parse_subscription_options(pstate, stmt->options, SUBOPT_XID, &opts); + + if (IsSet(opts.specified_opts, SUBOPT_XID)) .. .. Is there a case when the above 'if (IsSet(..' won't be true? If not, then probably there should be an Assert instead of 'if'. 4. +static TransactionId skipping_xid = InvalidTransactionId; I find this variable name a bit odd. Can we name it skip_xid? 5. + * skipping_xid is valid if we are skipping all data modification changes + * (INSERT, UPDATE, etc.) of the specified transaction at MySubscription->skipxid. + * Once we start skipping changes, we don't stop it until we skip all changes I think it would be better to write the first line of the comment as: "We enable skipping all data modification changes (INSERT, UPDATE, etc.) for the subscription if the user has specified skip_xid. Once we ..." 6. 
+static void +maybe_start_skipping_changes(TransactionId xid) +{ + Assert(!is_skipping_changes()); + Assert(!in_remote_transaction); + Assert(!in_streamed_transaction); + + /* Make sure subscription cache is up-to-date */ + maybe_reread_subscription(); Why do we need to update the cache here by calling maybe_reread_subscription() and at other places in the patch? It is sufficient to get the skip_xid value at the start of the worker via GetSubscription(). 7. In maybe_reread_subscription(), isn't there a need to check whether skip_xid is changed where we exit and launch the worker and compare other subscription parameters? 8. +static void +clear_subscription_skip_xid(TransactionId xid, XLogRecPtr origin_lsn, + TimestampTz origin_timestamp) +{ + Relation rel; + Form_pg_subscription subform; + HeapTuple tup; + bool nulls[Natts_pg_subscription]; + bool replaces[Natts_pg_subscription]; + Datum values[Natts_pg_subscription]; + + memset(values, 0, sizeof(values)); + memset(nulls, false, sizeof(nulls)); + memset(replaces, false, sizeof(replaces)); + + if (!IsTransactionState()) + StartTransactionCommand(); + + LockSharedObject(SubscriptionRelationId, MySubscription->oid, 0, + AccessShareLock); It is important to add a comment as to why we need a lock here. 9. + * needs to be set subskipxid again. We can reduce the possibility by + * logging a replication origin WAL record to advance the origin LSN + * instead but it doesn't seem to be worth since it's a very minor case. You can also add here that there is no way to advance origin_timestamp so that would be inconsistent. 10. +clear_subscription_skip_xid(TransactionId xid, XLogRecPtr origin_lsn, + TimestampTz origin_timestamp) { .. .. + if (!IsTransactionState()) + StartTransactionCommand(); .. .. + CommitTransactionCommand(); .. } The transaction should be committed in this function if it is started here otherwise it should be the responsibility of the caller to commit it. -- With Regards, Amit Kapila.
Re: [PATCH] Allow multiple recursive self-references
> On 14. Jan 2022, at 13:21, Peter Eisentraut wrote: > > There is nothing in there that says that certain branches of the UNION in a recursive query mean certain things. In fact, it doesn't even require the query to contain a UNION at all. It just says to iterate on evaluating the query until a fixed point is reached. I think this supports my claim that the associativity and commutativity of a UNION in a recursive query still apply. > > This is all very complicated, so I don't claim this to be authoritative, but I just don't see anything in the spec that supports what you are saying. Please also have a look at SQL:2016, 7.16 General Rules 2) c), which defines the evaluation semantics of recursive queries. I think that this part of the SQL standard refutes your argument. Best, -- Denis
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Hi, On Sat, Jan 15, 2022 at 02:04:12PM +0530, Bharath Rupireddy wrote: > > We had an issue where there were many mapping files generated during > the crash recovery and end-of-recovery checkpoint was taking a lot of > time. We had to manually intervene and delete some of the mapping > files (although it may not sound sensible) to make end-of-recovery > checkpoint faster. Because of the race condition between manual > deletion and checkpoint deletion, the unlink error occurred which > crashed the server and the server entered the recovery again wasting > the entire earlier recovery work. Maybe I'm missing something but wouldn't https://commitfest.postgresql.org/36/3448/ better solve the problem?
Re: psql - add SHOW_ALL_RESULTS option
Hello Andres, The reason this test constantly fails on cfbot windows is a use-after-free bug. Indeed! Thanks a lot for the catch and the debug! The ClearOrSaveResult function is quite annoying because it may or may not clear the result as a side effect. Attached v14 moves the status extraction before the possible clear. I've added a couple of results = NULL after such calls in the code. -- Fabien.diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out index e0abe34bb6..8f7f93172a 100644 --- a/contrib/pg_stat_statements/expected/pg_stat_statements.out +++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out @@ -50,8 +50,28 @@ BEGIN \; SELECT 2.0 AS "float" \; SELECT 'world' AS "text" \; COMMIT; + float +--- + 2.0 +(1 row) + + text +--- + world +(1 row) + -- compound with empty statements and spurious leading spacing \;\; SELECT 3 + 3 \;\;\; SELECT ' ' || ' !' \;\; SELECT 1 + 4 \;; + ?column? +-- +6 +(1 row) + + ?column? +-- + ! +(1 row) + ?column? -- 5 @@ -61,6 +81,11 @@ COMMIT; SELECT 1 + 1 + 1 AS "add" \gset SELECT :add + 1 + 1 AS "add" \; SELECT :add + 1 + 1 AS "add" \gset + add +- + 5 +(1 row) + -- set operator SELECT 1 AS i UNION SELECT 2 ORDER BY i; i diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 1ab200a4ad..0a22850912 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -127,18 +127,11 @@ echo '\x \\ SELECT * FROM foo;' | psql commands included in the string to divide it into multiple transactions. (See for more details about how the server handles multi-query strings.) - Also, psql only prints the - result of the last SQL command in the string. - This is different from the behavior when the same string is read from - a file or fed to psql's standard input, - because then psql sends - each SQL command separately. 
- Because of this behavior, putting more than one SQL command in a - single -c string often has unexpected results. - It's better to use repeated -c commands or feed - multiple commands to psql's standard input, + If having several commands executed in one transaction is not desired, + use repeated -c commands or feed multiple commands to + psql's standard input, either using echo as illustrated above, or via a shell here-document, for example: @@ -3570,10 +3563,6 @@ select 1\; select 2\; select 3; commands included in the string to divide it into multiple transactions. (See for more details about how the server handles multi-query strings.) -psql prints only the last query result -it receives for each request; in this example, although all -three SELECTs are indeed executed, psql -only prints the 3. @@ -4160,6 +4149,18 @@ bar +SHOW_ALL_RESULTS + + +When this variable is set to off, only the last +result of a combined query (\;) is shown instead of +all of them. The default is on. The off behavior +is for compatibility with older versions of psql. + + + + + SHOW_CONTEXT diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index 3503605a7d..47eabcbb8e 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -34,6 +34,8 @@ static bool DescribeQuery(const char *query, double *elapsed_msec); static bool ExecQueryUsingCursor(const char *query, double *elapsed_msec); static bool command_no_begin(const char *query); static bool is_select_command(const char *query); +static int SendQueryAndProcessResults(const char *query, double *pelapsed_msec, + bool is_watch, const printQueryOpt *opt, FILE *printQueryFout, bool *tx_ended); /* @@ -354,7 +356,7 @@ CheckConnection(void) * Returns true for valid result, false for error state. 
*/ static bool -AcceptResult(const PGresult *result) +AcceptResult(const PGresult *result, bool show_error) { bool OK; @@ -385,7 +387,7 @@ AcceptResult(const PGresult *result) break; } - if (!OK) + if (!OK && show_error) { const char *error = PQerrorMessage(pset.db); @@ -473,6 +475,18 @@ ClearOrSaveResult(PGresult *result) } } +/* + * Consume all results + */ +static void +ClearOrSaveAllResults() +{ + PGresult *result; + + while ((result = PQgetResult(pset.db)) != NULL) + ClearOrSaveResult(result); +} + /* * Print microtiming output. Always print raw milliseconds; if the interval @@ -573,7 +587,7 @@ PSQLexec(const char *query) ResetCancelConn(); - if (!AcceptResult(res)) + if (!AcceptResult(res, true)) { ClearOrSaveResult(res); res = NULL; @@ -596,11 +610,8 @@ int PSQLexecWatch(const char *query, c
Re: pg_replslotdata - a tool for displaying replication slot information
Hi, On Mon, Dec 06, 2021 at 07:16:12PM +, Bossart, Nathan wrote: > On 12/5/21, 11:10 PM, "Michael Paquier" wrote: > > On Thu, Dec 02, 2021 at 08:32:08AM +0530, Bharath Rupireddy wrote: > >> On Thu, Dec 2, 2021 at 4:22 AM Andres Freund wrote: > I don't have any other compelling use cases at the moment, but I will > say > that it is typically nice from an administrative standpoint to be able to > inspect things like this without logging into a running server. > >>> > >>> Weighed against the cost of maintaining (including documentation) a > >>> separate > >>> tool this doesn't seem sufficient reason. > >> > >> IMHO, this shouldn't be a reason to not get something useful (useful > >> IMO and few others in this thread) into the core. The maintenance of > >> the tools generally is low compared to the core server features once > >> they get reviewed and committed. > > > > Well, a bit less maintenance is always better than more maintenance. > > An extra cost that you may be missing is related to the translation of > > the documentation, as well as the translation of any new strings this > > would require. FWIW, I don't directly see a use for this tool that > > could not be solved with an online server. > > Bharath, perhaps you should maintain this outside of core PostgreSQL > for now. If some compelling use-cases ever surface that make it seem > worth the added maintenance burden, this thread could probably be > revisited. Ironically, the patch is currently failing due to a translation problem: https://cirrus-ci.com/task/5467034313031680 [19:12:28.179] su postgres -c "make -s -j${BUILD_JOBS} world-bin" [19:12:44.270] make[3]: *** No rule to make target 'po/cs.po', needed by 'po/cs.mo'. Stop. 
[19:12:44.270] make[2]: *** [Makefile:44: all-pg_replslotdata-recurse] Error 2 [19:12:44.270] make[2]: *** Waiting for unfinished jobs [19:12:44.499] make[1]: *** [Makefile:42: all-bin-recurse] Error 2 [19:12:44.499] make: *** [GNUmakefile:21: world-bin-src-recurse] Error 2 Looking at the thread, I see support from 3 people:
- Bharath
- Japin
- Satyanarayana
while 3 committers think that the extra maintenance effort isn't worth the usage:
- Peter E.
- Andres
- Michael
and a +0.5 from Nathan IIUC. I also personally don't think that this is worth the maintenance effort. This tool being entirely client side, there's no problem with maintaining it in a separate repository, as mentioned by Nathan, including using it on the cloud providers that provide access to at least the data files. Another pro of the external repo is that the tool can be made available immediately and for older releases. Since 3 committers voted against it, I think that the patch should be closed as "Rejected". I will do that in a few days unless there's some compelling objection by then.
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
On Fri, Jan 14, 2022 at 1:08 AM Andres Freund wrote: > > Hi, > > On 2021-12-31 18:12:37 +0530, Bharath Rupireddy wrote: > > Currently the server is erroring out when unable to remove/parse a > > logical rewrite file in CheckPointLogicalRewriteHeap wasting the > > amount of work the checkpoint has done and preventing the checkpoint > > from finishing. > > This seems like it'd make failures to remove the files practically > invisible. Which'd have it's own set of problems? > > What motivated proposing this change? We had an issue where there were many mapping files generated during crash recovery and the end-of-recovery checkpoint was taking a lot of time. We had to manually intervene and delete some of the mapping files (although it may not sound sensible) to make the end-of-recovery checkpoint faster. Because of the race condition between the manual deletion and the checkpoint's deletion, an unlink error occurred which crashed the server, and the server entered recovery again, wasting all of the earlier recovery work. In summary, with the changes (emitting LOG-only messages for unlink failures and continuing with the other files) proposed for CheckPointLogicalRewriteHeap in this thread and the existing code in CheckPointSnapBuild, I'm sure it will help avoid wasting the recovery work that has been done in case unlink fails for any reason. Regards, Bharath Rupireddy.
Re: missing indexes in indexlist with partitioned tables
Hi, On Thu, Oct 28, 2021 at 01:44:31PM +, Arne Roland wrote: > > The attached patch takes that route. I'd appreciate feedback! The cfbot reports that the patch doesn't apply anymore: http://cfbot.cputube.org/patch_36_3452.log === Applying patches on top of PostgreSQL commit ID 025b920a3d45fed441a0a58fdcdf05b321b1eead === === applying patch ./partIndexlistClean.patch patching file src/backend/access/heap/vacuumlazy.c Hunk #1 FAILED at 2375. 1 out of 1 hunk FAILED -- saving rejects to file src/backend/access/heap/vacuumlazy.c.rej patching file src/backend/access/transam/xlog.c Hunk #1 succeeded at 911 with fuzz 1 (offset 5 lines). Hunk #2 FAILED at 5753. [...] 1 out of 6 hunks FAILED -- saving rejects to file src/backend/access/transam/xlog.c.rej [...] patching file src/backend/commands/publicationcmds.c Hunk #1 FAILED at 813. 1 out of 1 hunk FAILED -- saving rejects to file src/backend/commands/publicationcmds.c.rej patching file src/include/nodes/pathnodes.h Hunk #9 FAILED at 1516. [...] 1 out of 17 hunks FAILED -- saving rejects to file src/include/nodes/pathnodes.h.rej Could you send a rebased version? In the meantime I will switch the cf entry to Waiting on Author.
Re: pg14 psql broke \d datname.nspname.relname
Hi, On Tue, Dec 21, 2021 at 10:58:39AM -0800, Mark Dilger wrote: > > Rebased patch attached: This version doesn't apply anymore: http://cfbot.cputube.org/patch_36_3367.log === Applying patches on top of PostgreSQL commit ID 5513dc6a304d8bda114004a3b906cc6fde5d6274 === === applying patch ./v3-0001-Reject-patterns-with-too-many-parts-or-wrong-db.patch [...] 1 out of 52 hunks FAILED -- saving rejects to file src/bin/psql/describe.c.rej Could you send a rebased version? In the meantime I will switch the cf entry to Waiting on Author.
Re: Pre-allocating WAL files
On Sat, Jan 15, 2022 at 1:36 PM Bharath Rupireddy wrote: > > On Thu, Jan 6, 2022 at 3:39 AM Bossart, Nathan wrote: > > > > On 12/30/21, 3:52 AM, "Maxim Orlov" wrote: > > > I did check the patch too and found it to be ok. Check and check-world > > > are passed. > > > Overall idea seems to be good in my opinion, but I'm not sure where is > > > the optimal place to put the pre-allocation. > > > > > > On Thu, Dec 30, 2021 at 2:46 PM Pavel Borisov > > > wrote: > > >> I've checked the patch v7. It applies cleanly, code is good, check-world > > >> tests passed without problems. > > >> I think it's ok to use checkpointer for this and the overall patch can > > >> be committed. But the seen performance gain makes me think again before > > >> adding this feature. I did tests myself a couple of months ago and got > > >> similar results. > > >> Really don't know whether is it worth the effort. > > > > Thank you both for your review. > > It may have been discussed earlier, let me ask this here - IIUC the > whole point of pre-allocating WAL files is that creating new WAL files > of wal_segment_size requires us to write zero-filled empty pages to > the disk which is costly. With the advent of > fallocate/posix_fallocate, isn't file allocation going to be much > faster on platforms where fallocate is supported? IIRC, the > "Asynchronous and "direct" IO support for PostgreSQL." has a way to > use fallocate. If at all, we move ahead and use fallocate, then the > whole point of pre-allocating WAL files becomes unnecessary? > > Having said that, the idea of pre-allocating WAL files is still > relevant, given the portability of fallocate/posix_fallocate. Adding one more point: do we have any numbers on how much total time WAL file allocation usually takes, say on a server under a heavy write load? Regards, Bharath Rupireddy.
Re: Pre-allocating WAL files
On Thu, Jan 6, 2022 at 3:39 AM Bossart, Nathan wrote: > > On 12/30/21, 3:52 AM, "Maxim Orlov" wrote: > > I did check the patch too and found it to be ok. Check and check-world are > > passed. > > Overall idea seems to be good in my opinion, but I'm not sure where is the > > optimal place to put the pre-allocation. > > > > On Thu, Dec 30, 2021 at 2:46 PM Pavel Borisov > > wrote: > >> I've checked the patch v7. It applies cleanly, code is good, check-world > >> tests passed without problems. > >> I think it's ok to use checkpointer for this and the overall patch can be > >> committed. But the seen performance gain makes me think again before > >> adding this feature. I did tests myself a couple of months ago and got > >> similar results. > >> Really don't know whether is it worth the effort. > > Thank you both for your review. It may have been discussed earlier, let me ask this here - IIUC the whole point of pre-allocating WAL files is that creating new WAL files of wal_segment_size requires us to write zero-filled empty pages to the disk, which is costly. With the advent of fallocate/posix_fallocate, isn't file allocation going to be much faster on platforms where fallocate is supported? IIRC, the "Asynchronous and "direct" IO support for PostgreSQL." has a way to use fallocate. If we do move ahead and use fallocate, doesn't the whole point of pre-allocating WAL files become unnecessary? Having said that, the idea of pre-allocating WAL files is still relevant, given the portability of fallocate/posix_fallocate. Regards, Bharath Rupireddy.